From owner-freebsd-bugs@FreeBSD.ORG  Sun Sep 25 03:10:14 2011
Date: Sun, 25 Sep 2011 03:10:13 GMT
Message-Id: <201109250310.p8P3ADNe048286@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Bruce Evans <brde@optusnet.com.au>
Cc: 
Subject: Re: kern/160992: buf_ring(9) statistics accounting not MPSAFE

The following reply was made to PR kern/160992; it has been noted by GNATS.

From: Bruce Evans <brde@optusnet.com.au>
To: Arnaud Lacombe <lacombar@gmail.com>
Cc: freebsd-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Subject: Re: kern/160992: buf_ring(9) statistics accounting not MPSAFE
Date: Sun, 25 Sep 2011 13:01:04 +1000 (EST)

 On Sat, 24 Sep 2011, Arnaud Lacombe wrote:
 
 >> Description:
 > The following block of code, in `sys/sys/buf_ring.h':
 >
 >
 >       /*
 >        * If there are other enqueues in progress
 >        * that preceeded us, we need to wait for them
 >        * to complete
 >        */
 >       while (br->br_prod_tail != prod_head)
 >               cpu_spinwait();
 >       br->br_prod_bufs++;
 >       br->br_prod_bytes += nbytes;
 >       br->br_prod_tail = prod_next;
 >       critical_exit();
 >
 >
 > can be seen at runtime, memory-wise as:
 >
 >      while (br->br_prod_tail != prod_head)
 >              cpu_spinwait();
 >      br->br_prod_tail = prod_next;
 >      br->br_prod_bufs++;
 >      br->br_prod_bytes += nbytes;
 >      critical_exit();
 >
 > That is, there is no memory barrier to enforce completion of the
 > load/increment/store/load/load/addition/store operations before
 > updating what other threads spin on.
 
 The counters are 64 bits, so it also does non-atomic increments of them
 on 32-bit arches.
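
To make the 32-bit problem concrete, here is a minimal single-threaded sketch that models a 64-bit counter as two 32-bit halves and shows what a reader would see if it ran between the two halves of an increment. The names `split64`, `split64_inc_lo`, `split64_inc_hi` and `torn_read_demo` are hypothetical and exist only for this illustration; nothing here is taken from `buf_ring.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit counter as a 32-bit machine stores it. */
struct split64 {
	uint32_t lo;
	uint32_t hi;
};

/* Combine the two halves the way a 64-bit load would. */
static uint64_t
split64_read(const struct split64 *c)
{
	return (((uint64_t)c->hi << 32) | c->lo);
}

/* First half of the increment (like "addl $1, x").  Returns the carry. */
static uint32_t
split64_inc_lo(struct split64 *c)
{
	c->lo += 1;
	return (c->lo == 0);
}

/* Second half of the increment (like "adcl $0, x+4"). */
static void
split64_inc_hi(struct split64 *c, uint32_t carry)
{
	assert(carry <= 1);
	c->hi += carry;
}

/*
 * Increment 0x00000000ffffffff, sampling the counter between the two
 * halves.  Returns the sampled (torn) value and stores the value after
 * the increment has completed in *final_out.
 */
static uint64_t
torn_read_demo(uint64_t *final_out)
{
	struct split64 c = { 0xffffffffu, 0 };
	uint32_t carry;
	uint64_t seen;

	carry = split64_inc_lo(&c);
	seen = split64_read(&c);	/* a concurrent reader would run here */
	split64_inc_hi(&c, carry);
	*final_out = split64_read(&c);
	return (seen);
}
```

In this model the sampled value is neither the old count nor the new one, which is the inconsistency that an unprotected reader or a second unprotected writer can run into.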
 
 > Even if `br_prod_tail' is marked `volatile', there is no guarantee that it will not be re-ordered wrt. non-volatile write (to `br_prod_bufs' and `br_prod_bytes').
 
 Using volatile is generally bogus.  Here it seems to mainly give
 pessimizations and more opportunities for bad memory orders.  The i386
 code for incrementing a 64-bit volatile x is:
 
  	movl	x, %eax
  	movl	x+4, %edx
  	addl	$1, %eax
  	adcl	$0, %edx
  	movl	%eax, x
  	movl	%edx, x+4
 
 while for a 64-bit non-volatile it is:
 
  	addl	$1, x
  	adcl	$0, x+4
 
 so volatile gives more caching in registers instead of less.  The following
 are some of the bad memory orders possible:
 
 with volatile:
         lo = br->br_prod_bytes.lo;
         hi = br->br_prod_bytes.hi;
         br->br_prod_tail = prod_next;
         br->br_prod_bufs++;
         lo += nbytes;
         hi += carry;
         br->br_prod_bytes.hi = hi;
         br->br_prod_bytes.lo = lo;
 
 without volatile:
 
         br->br_prod_bytes.lo += nbytes;
         br->br_prod_tail = prod_next;
         br->br_prod_bufs++;
         br->br_prod_bytes.hi += carry;
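
For contrast with the bad orders above, the following is a rough userland sketch of the ordering one would want, written with C11 `<stdatomic.h>` rather than the kernel's own atomic primitives. It assumes that the tail is the only field other threads spin on and that the counters are only touched by the thread whose turn it is. The struct `ring_stats` and the functions `ring_init` and `ring_commit` are made up for the example and are not the change that was applied to `buf_ring.h`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical cut-down ring: just the tail and the two counters. */
struct ring_stats {
	_Atomic uint32_t prod_tail;
	uint64_t prod_bufs;
	uint64_t prod_bytes;
};

static void
ring_init(struct ring_stats *r)
{
	atomic_init(&r->prod_tail, 0);
	r->prod_bufs = 0;
	r->prod_bytes = 0;
}

/*
 * Wait for our turn, update the counters, then publish the new tail.
 * The acquire load makes the previous producer's counter updates
 * visible to us; the release store keeps our counter updates from
 * being reordered after the tail update.
 */
static void
ring_commit(struct ring_stats *r, uint32_t prod_head, uint32_t prod_next,
    uint64_t nbytes)
{
	assert(prod_next != prod_head);
	while (atomic_load_explicit(&r->prod_tail,
	    memory_order_acquire) != prod_head)
		;	/* the kernel code calls cpu_spinwait() here */
	r->prod_bufs++;
	r->prod_bytes += nbytes;
	atomic_store_explicit(&r->prod_tail, prod_next,
	    memory_order_release);
}
```

With this pairing the plain 64-bit updates are serialized by the hand-off of the tail, so they need not be atomic themselves as long as every access to them happens between the acquire and the release.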
 
 I think the token method would make the non-atomic accesses to the
 counters sufficiently atomic if it worked.  The necessary memory
 barriers would probably include a memory clobber, which effectively
 makes all memory variables transiently volatile.  (Here "volatile"
 does not mean that the variables can change underneath us, since
 holding the token prevents that; it means only that accesses to them
 must really go to memory.)  Declaring the counters permanently
 volatile would then be reduced to a pure pessimization.  Even reads of
 them in sysctls must hold the token to get a consistent snapshot.
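
The last point can be sketched as follows, under the assumption that "holding the token" means having exclusive access to the counters. The token is modeled here with a C11 `atomic_flag` spin flag, which is a stand-in and not how `buf_ring.h` implements it. The names `tok_stats`, `tok_acquire`, `tok_release`, `tok_account` and `tok_snapshot` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statistics block guarded by a token. */
struct tok_stats {
	atomic_flag token;	/* set while somebody holds the token */
	uint64_t bufs;
	uint64_t bytes;
};

static void
tok_acquire(struct tok_stats *s)
{
	while (atomic_flag_test_and_set_explicit(&s->token,
	    memory_order_acquire))
		;	/* spin until the token is free */
}

static void
tok_release(struct tok_stats *s)
{
	atomic_flag_clear_explicit(&s->token, memory_order_release);
}

/* Writer side: plain 64-bit updates, made safe by holding the token. */
static void
tok_account(struct tok_stats *s, uint64_t nbytes)
{
	tok_acquire(s);
	s->bufs++;
	s->bytes += nbytes;
	tok_release(s);
}

/* Reader side (e.g. a sysctl handler): copy both counters under the token. */
static void
tok_snapshot(struct tok_stats *s, uint64_t *bufs, uint64_t *bytes)
{
	assert(bufs != NULL && bytes != NULL);
	tok_acquire(s);
	*bufs = s->bufs;
	*bytes = s->bytes;
	tok_release(s);
}
```

A reader that skipped `tok_acquire()` could observe `bufs` from after an update and `bytes` from before it, or a half-written 64-bit value on a 32-bit machine.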
 
 Bruce