From owner-freebsd-bugs@FreeBSD.ORG Sun Sep 25 03:10:14 2011
Date: Sun, 25 Sep 2011 03:10:13 GMT
Message-Id: <201109250310.p8P3ADNe048286@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Bruce Evans
Subject: Re: kern/160992: buf_ring(9) statistics accounting not MPSAFE

The following reply was made to PR kern/160992; it has been noted by GNATS.
From: Bruce Evans
To: Arnaud Lacombe
Cc: freebsd-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Subject: Re: kern/160992: buf_ring(9) statistics accounting not MPSAFE
Date: Sun, 25 Sep 2011 13:01:04 +1000 (EST)

On Sat, 24 Sep 2011, Arnaud Lacombe wrote:

>> Description:
> The following block of code, in `sys/sys/buf_ring.h':
>
>
>         /*
>          * If there are other enqueues in progress
>          * that preceeded us, we need to wait for them
>          * to complete
>          */
>         while (br->br_prod_tail != prod_head)
>                 cpu_spinwait();
>         br->br_prod_bufs++;
>         br->br_prod_bytes += nbytes;
>         br->br_prod_tail = prod_next;
>         critical_exit();
>
>
> can be seen at runtime, memory-wise as:
>
>         while (br->br_prod_tail != prod_head)
>                 cpu_spinwait();
>         br->br_prod_tail = prod_next;
>         br->br_prod_bufs++;
>         br->br_prod_bytes += nbytes;
>         critical_exit();
>
> That is, there is no memory barrier to enforce completion of the
> load/increment/store/load/load/addition/store operations before
> updating what other thread spin on.

The counters are 64 bits, so it also does non-atomic increments of them
on 32-bit arches.

> Even if `br_prod_tail' is marked `volatile', there is no guarantee
> that it will not be re-ordered wrt. non-volatile write (to
> `br_prod_bufs' and `br_prod_bytes').

Using volatile is generally bogus.  Here it seems to mainly give
pessimizations and more opportunities for bad memory orders.  The i386
code for incrementing a 64-bit volatile x is:

        movl    x, %eax
        movl    x+4, %edx
        addl    $1, %eax
        adcl    $0, %edx
        movl    %eax, x
        movl    %edx, x+4

while for a 64-bit non-volatile it is:

        addl    $1, x
        adcl    $0, x+4

so volatile gives more caching in registers instead of less.
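
As an illustration of why the addl/adcl pair above is not atomic as a
whole, here is a minimal single-threaded sketch that models the two
halves of a 64-bit increment as separate steps.  The names (split64,
split64_add_lo, split64_add_hi, split64_read) are hypothetical and do
not appear in buf_ring.h; the point is only what an unsynchronized
reader could observe between the two steps.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit counter as two 32-bit words. */
struct split64 {
	uint32_t lo;
	uint32_t hi;
};

/* What a reader that takes no lock or token would compute. */
static uint64_t
split64_read(const struct split64 *c)
{
	return (((uint64_t)c->hi << 32) | c->lo);
}

/* First half of the increment (the addl); returns the carry out. */
static uint32_t
split64_add_lo(struct split64 *c, uint32_t n)
{
	uint32_t old = c->lo;

	c->lo = old + n;	/* wraps modulo 2^32 */
	return (c->lo < old);	/* 1 if the addition wrapped, else 0 */
}

/* Second half of the increment (the adcl). */
static void
split64_add_hi(struct split64 *c, uint32_t carry)
{
	c->hi += carry;
}
```

Starting from 0xffffffff and adding 1, a read taken after
split64_add_lo() but before split64_add_hi() yields 0, which is
neither the old value nor the new one (0x100000000).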

The following are some of the bad memory orders possible:

with volatile:
        lo = br->br_prod_bytes.lo;
        hi = br->br_prod_bytes.hi;
        br->br_prod_tail = prod_next;
        br->br_prod_bufs++;
        lo += nbytes;
        hi += carry;
        br->br_prod_bytes.hi = hi;
        br->br_prod_bytes.lo = lo;

without volatile:
        br->br_prod_bytes.lo += nbytes;
        br->br_prod_tail = prod_next;
        br->br_prod_bufs++;
        br->br_prod_bytes.hi += carry;

I think the token method would make the nonatomic accesses to the
counters sufficiently atomic if it worked.  The necessary memory
barriers would probably have a memory clobber which effectively makes
all memory variables transiently volatile (where volatile actually
means non-volatile with respect to them changing -- holding the token
prevents them changing -- but actually means volatile with respect to
their memory accesses) so the effect of declaring the counters
permanently volatile would be reduced to a pessimization.  Even reads
of them in sysctls must hold the token to get a consistent snapshot.

Bruce
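
For illustration, the ordering that the report says is missing could be
sketched in portable C11 roughly as follows.  This is only a sketch
under stated assumptions: struct ring_sketch and ring_sketch_publish()
are hypothetical stand-ins, not the buf_ring(9) code, and the kernel
has its own atomic(9) primitives rather than <stdatomic.h>.  It shows a
release store on the tail, so that the counter updates cannot become
visible after the tail update, paired with an acquire load in the spin
loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct buf_ring. */
struct ring_sketch {
	_Atomic uint32_t prod_tail;	/* what waiting producers spin on */
	uint64_t prod_bufs;		/* statistics counters */
	uint64_t prod_bytes;
};

static void
ring_sketch_publish(struct ring_sketch *r, uint32_t prod_head,
    uint32_t prod_next, uint64_t nbytes)
{
	/* Wait for earlier enqueues; acquire pairs with the release below. */
	while (atomic_load_explicit(&r->prod_tail, memory_order_acquire) !=
	    prod_head)
		;	/* the quoted kernel code calls cpu_spinwait() here */

	/* Only the producer whose turn it is reaches this point. */
	r->prod_bufs++;
	r->prod_bytes += nbytes;

	/* Release: the counter stores above are ordered before this store. */
	atomic_store_explicit(&r->prod_tail, prod_next, memory_order_release);
}
```

This orders the counter updates only with respect to threads that
synchronize through prod_tail.  A reader that does not, such as a
sysctl handler, could still see a half-updated 64-bit counter on a
32-bit machine.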