From owner-cvs-src@FreeBSD.ORG Fri Dec 17 07:06:38 2004 Return-Path: Delivered-To: cvs-src@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2CCE716A4CE; Fri, 17 Dec 2004 07:06:38 +0000 (GMT) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id 807EF43D49; Fri, 17 Dec 2004 07:06:37 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.0.87])iBH76ZKP026985; Fri, 17 Dec 2004 18:06:35 +1100 Received: from epsplex.bde.org (katana.zip.com.au [61.8.7.246]) iBH76V7M004466; Fri, 17 Dec 2004 18:06:32 +1100 Date: Fri, 17 Dec 2004 18:06:31 +1100 (EST) From: Bruce Evans X-X-Sender: bde@epsplex.bde.org To: John Baldwin In-Reply-To: <200412161449.23781.jhb@FreeBSD.org> Message-ID: <20041217165951.Q1181@epsplex.bde.org> References: <200411300618.iAU6IkQX065609@repoman.freebsd.org> <20041216144239.T1723@epsplex.bde.org> <200412161449.23781.jhb@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: cvs-src@FreeBSD.org cc: Nate Lawson cc: cvs-all@FreeBSD.org cc: src-committers@FreeBSD.org cc: Kris Kennaway Subject: Re: cvs commit: src/sys/i386/i386 vm_machdep.c X-BeenThere: cvs-src@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: CVS commit messages for the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Dec 2004 07:06:38 -0000 On Thu, 16 Dec 2004, John Baldwin wrote: > On Wednesday 15 December 2004 10:51 pm, Bruce Evans wrote: > > On Wed, 15 Dec 2004, Kris Kennaway wrote: > > ... > > > I often > > > get overlapping panics from the other CPUs on this machine, and it > > > often locks up when trying to enter DDB, or while printing the panic > > > string (the other day it only got as far as 'p' before hanging). > > > > panic() needs much the same locking as ddb to prevent concurrent entry. > > It must be fairly likely for all CPUs to panic on the same asertion. > > This is like all CPUs entering ddb on the same breakpoint. > > The thing is, panic does have locking, but it appears to be ineffective: > > #ifdef SMP > /* > * We don't want multiple CPU's to panic at the same time, so we > * use panic_cpu as a simple spinlock. We have to keep checking > * panic_cpu if we are spinning in case the panic on the first > * CPU is canceled. > */ > if (panic_cpu != PCPU_GET(cpuid)) > while (atomic_cmpset_int(&panic_cpu, NOCPU, > PCPU_GET(cpuid)) == 0) > while (panic_cpu != NOCPU) > ; /* nothing */ > #endif Oops, I forgot bout that (and never really noticed that it was the very first thing in panic() as it needs to be). > In the smpng branch in p4, I have the lock changed to be based on the thread > rather than the CPU to account for problems coming from migration due to > preemption while in a panic, but I haven't observed any noticeable > improvement from the change: I can't see any serious problems with the above. Perhaps it should try to stop the other CPUs like ddb, since panic() is going to do drastic things. I think it needs to use atomic_load_acq_int() in the inner loop, but that only affects the RESTARTABLE_PANICS case. > --- //depot/vendor/freebsd/src/sys/kern/kern_shutdown.c 2004/11/05 19:00:32 > +++ //depot/projects/smpng/sys/kern/kern_shutdown.c 2004/11/05 19:22:55 > @@ -473,7 +473,7 @@ > } > > #ifdef SMP > -static u_int panic_cpu = NOCPU; > +static struct thread *panic_thread = NULL; > #endif > > /* > @@ -494,15 +494,14 @@ > #ifdef SMP > /* > * We don't want multiple CPU's to panic at the same time, so we > - * use panic_cpu as a simple spinlock. We have to keep checking > - * panic_cpu if we are spinning in case the panic on the first > + * use panic_thread as a simple spinlock. We have to keep checking > + * panic_thread if we are spinning in case the panic on the first > * CPU is canceled. > */ > - if (panic_cpu != PCPU_GET(cpuid)) > - while (atomic_cmpset_int(&panic_cpu, NOCPU, > - PCPU_GET(cpuid)) == 0) > - while (panic_cpu != NOCPU) > - ; /* nothing */ > + if (panic_thread != curthread) > + while (atomic_cmpset_ptr(&panic_thread, NULL, curthread) == 0) > + while (panic_thread != NULL) > + cpu_spinwait(); > #endif > > bootopt = RB_AUTOBOOT | RB_DUMP; > @@ -538,7 +537,7 @@ > /* See if the user aborted the panic, in which case we continue. */ > if (panicstr == NULL) { > #ifdef SMP > - atomic_store_rel_int(&panic_cpu, NOCPU); > + atomic_store_rel_ptr(&panic_thread, NULL); > #endif > return; > } Hmm, this has a long history. It was committed more than 2 years ago in rev.1.132, then immediately backed out since it wasn't ready. I slightly prefer to use the cpuid, since panic() can occur in non-thread context and during context switching. PCPU_GET(cpuid) is also better than curthread->td_oncpu, since changing the latter is not atomic with changing curthread. Bruce