From owner-svn-src-all@FreeBSD.ORG  Sat Oct  9 09:33:13 2010
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2E8DC106564A;
	Sat,  9 Oct 2010 09:33:13 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail09.syd.optusnet.com.au (mail09.syd.optusnet.com.au
	[211.29.132.190])
	by mx1.freebsd.org (Postfix) with ESMTP id A05348FC0C;
	Sat,  9 Oct 2010 09:33:12 +0000 (UTC)
Received: from c122-106-146-165.carlnfd1.nsw.optusnet.com.au
	(c122-106-146-165.carlnfd1.nsw.optusnet.com.au [122.106.146.165])
	by mail09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o999X8Tm011920
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 9 Oct 2010 20:33:10 +1100
Date: Sat, 9 Oct 2010 20:33:08 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Andriy Gapon <avg@freebsd.org>
In-Reply-To: <201010090807.o9987nG0030939@svn.freebsd.org>
Message-ID: <20101009191600.N3531@besplex.bde.org>
References: <201010090807.o9987nG0030939@svn.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
	src-committers@freebsd.org
Subject: Re: svn commit: r213648 - head/sys/kern
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
	user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Oct 2010 09:33:13 -0000

On Sat, 9 Oct 2010, Andriy Gapon wrote:

> Log:
>  panic_cpu variable should be volatile
>
>  This is to prevent caching of its value in a register when it is checked
>  and modified by multiple CPUs in parallel.
>  Also, move the variable  into the scope of the only function that uses it.
>
>  Reviewed by:	jhb
>  Hint from:	mdf
>  MFC after:	1 week

I doubt that this is either necessary or sufficient.  Most variables
aren't volatile while they are locked by a mutex.  But panic() cannot use
normal mutex locking, so it should access this variable using atomic
ops with memory barriers.  The compiler part of the memory barriers
effectively make _all_ variables temporily volatile.  However, 2 of the
the 4 accesses to this variable doesn't use an atomic op.

% #ifdef SMP
% 	/*
% 	 * We don't want multiple CPU's to panic at the same time, so we
% 	 * use panic_cpu as a simple spinlock.  We have to keep checking
% 	 * panic_cpu if we are spinning in case the panic on the first
% 	 * CPU is canceled.
% 	 */
% 	if (panic_cpu != PCPU_GET(cpuid))

This access doesn't use an atomic op.

% 		while (atomic_cmpset_int(&panic_cpu, NOCPU,
% 		    PCPU_GET(cpuid)) == 0)

This access uses an atomic op.  Not all atomic ops use memory barriers,
at least on i386.  However, this one does, at least on i386.

   (I'm always confused about what atomic_any() without acq or release
   means.  Do they mean that you don't want a memory barrier (this is
   what the mostly give, at least on i386), and if so, what use are
   they?  There are far too many atomic ops, for far too many never-used
   widths, with alternative spellings to encourage using a wrong one.
   cmpset is is especially confusing since it you can spell it as cmpset,
   cmpset_acq or compset_rel and always get the barrier, at least on
   i386.  At least cmpset only supports 1 width, at least on i386.)

% 			while (panic_cpu != NOCPU)

This access doesn't use an atomic op.

% 				; /* nothing */
% #endif
% ...
% #ifdef RESTARTABLE_PANICS
% 	/* See if the user aborted the panic, in which case we continue. */
% 	if (panicstr == NULL) {
% #ifdef SMP
% 		atomic_store_rel_int(&panic_cpu, NOCPU);

This access uses an atomic op with a memory barrier, at least on i386.
Now its rel semantics are clear.

panicstr is non-volatile and never accessed by an atomic op, so it probably
strictly needs to be declared volatile even more than panic_cpu.  I think
the only thing that makes it work now is that it is bogusly pubic, and
compilers can't analyze the whole program yet -- if they could, then they
would see that it is only set in panic().

% #endif
% 		return;
% 	}
% #endif
% #endif

Now, why don't the partial memory barriers prevent caching the variable?

% 	if (panic_cpu != PCPU_GET(cpuid))
% 		while (atomic_cmpset_int(&panic_cpu, NOCPU,
% 		    PCPU_GET(cpuid)) == 0)
% 			while (panic_cpu != NOCPU)
% 				; /* nothing */

The very first access can't reasonably use a cachec value.  atomic_cmpset()
can change panic_cpu, so even without the memory barrier, panic_cpu must
be reloaded for the third access the first time.  But then when the third
access is repeated in the second while loop, the missing atomic op with
barrier makes things very broken.  panic_cpu isn't changed by the loop,
and the compiler thinks that it isn't changed by anything else either, so
the compiler may reduce the loop to:

% 			if (panic_cpu != NOCPU)
% 				for (;;)
% 					; /* nothing */

except I've seen claims that even an endless for loop can be optimized
to nothing.  Declaring panic_cpu as volatile prevents the compiler doing
this, but I think it is insufficient.  We really do want to see panic_cpu
changed by other CPUs, and what is the point of atomic_load_acq*() if not
to use for this -- if declaring things volatile were sufficient, then we
could just use *(volatile *)&var.  atomic_load/store_acq/rel*() on i386
used to do exactly that for the UP case, but use a lock prefix and also
a memory barrier for the SMP case.  Current versions also use the memory
barrier in the UP case, and have a relevant comment aboat this:

% Index: atomic.h
% ===================================================================
% RCS file: /home/ncvs/src/sys/i386/include/atomic.h,v
% retrieving revision 1.32.2.1
% retrieving revision 1.55
% diff -u -1 -r1.32.2.1 -r1.55
% --- atomic.h	24 Nov 2004 18:10:02 -0000	1.32.2.1
% +++ atomic.h	20 May 2010 06:18:03 -0000	1.55
% @@ -181,5 +207,5 @@
%   * SMP kernels.  For UP kernels, however, the cache of the single processor
% - * is always consistent, so we don't need any memory barriers.
% + * is always consistent, so we only need to take care of compiler.
%   */

i386 doesn't really have memory barriers (maybe l/s/mfence are, but we
don't use them even on amd64), but we have to use "memoy" clobbers to
take care of the compiler, or we would have to declare too many things
as volatile.

% -#define ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
% +#define	ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
%  static __inline u_##TYPE				\
% @@ -187,3 +213,7 @@
%  {							\
% -	return (*p);					\
% +	u_##TYPE tmp;					\
% +							\
% +	tmp = *p;					\
% +	__asm __volatile("" : : : "memory");		\
% +	return (tmp);					\
%  }							\
% @@ -193,2 +223,3 @@
%  {							\
% +	__asm __volatile("" : : : "memory");		\
%  	*p = v;						\

5-STABLE is still missing the memory clobber, so a newer compiler on it
might compile away the while loop even when it is fixed to use
atomic_load_acq_int().

% ...
% 	if (panicstr == NULL) {
% 		atomic_store_rel_int(&panic_cpu, NOCPU);

Smaller problems with this.  Just the unlocked access to panicstr.
Maybe not a problem, except to understand why it isn't one.  I think
the earlier while loops prevent concurrent modification of panicstr.
Recursive panics are possible, but since they are possible the compiler
should see that they are possible even if it analyzes the whole program,
and thus know that it cannot cache panicstr.

Bruce