From: John Baldwin
To: Bruce Evans
Cc: cvs-src@FreeBSD.org, Nate Lawson, cvs-all@FreeBSD.org,
    src-committers@FreeBSD.org, Kris Kennaway
Date: Thu, 16 Dec 2004 14:49:23 -0500
Subject: Re: cvs commit: src/sys/i386/i386 vm_machdep.c
Message-Id: <200412161449.23781.jhb@FreeBSD.org>
In-Reply-To: <20041216144239.T1723@epsplex.bde.org>
References: <200411300618.iAU6IkQX065609@repoman.freebsd.org>
    <20041215151526.GA3462@xor.obsecurity.org>
    <20041216144239.T1723@epsplex.bde.org>

On Wednesday 15 December 2004 10:51 pm, Bruce Evans wrote:
> On Wed, 15 Dec 2004, Kris Kennaway wrote:
> > On Tue, Dec 14, 2004 at 09:48:48PM -0500, John Baldwin wrote:
> > > On Tuesday 14 December 2004 07:10 pm, Kris Kennaway wrote:
> > > > NB: DDB often isn't usable on SMP machines these days, and will hang
> > > > when a panic tries to enter it.
> > >
> > > Try debug.kdb.stop_cpus=0 (sysctl and tunable) to prevent KDB from
> > > trying to stop the other CPUs.  Another possible fix that ups@ has
> > > talked about is changing IPI_STOP to use an NMI rather than a vector
> > > (you can send NMI IPIs via the local APIC) so that IPI_STOP is more
> > > reliable.
> >
> > This is already set, and it doesn't always fix the problem.
>
> debug.kdb.stop_cpus=0 should be expected to increase problems.  Given time,
> the other CPUs are quite likely to enter ddb for whatever reason the first
> one did.  Then they stomp on ddb's global state (starting with ddb_regs).
>
> The NMI would need locking to prevent the CPUs stopping each other.
>
> > I often
> > get overlapping panics from the other CPUs on this machine, and it
> > often locks up when trying to enter DDB, or while printing the panic
> > string (the other day it only got as far as 'p' before hanging).
>
> panic() needs much the same locking as ddb to prevent concurrent entry.
> It must be fairly likely for all CPUs to panic on the same assertion.
> This is like all CPUs entering ddb on the same breakpoint.

The thing is, panic does have locking, but it appears to be ineffective:

#ifdef SMP
	/*
	 * We don't want multiple CPU's to panic at the same time, so we
	 * use panic_cpu as a simple spinlock.  We have to keep checking
	 * panic_cpu if we are spinning in case the panic on the first
	 * CPU is canceled.
	 */
	if (panic_cpu != PCPU_GET(cpuid))
		while (atomic_cmpset_int(&panic_cpu, NOCPU,
		    PCPU_GET(cpuid)) == 0)
			while (panic_cpu != NOCPU)
				; /* nothing */
#endif

In the smpng branch in p4, I have the lock changed to be based on the thread
rather than the CPU to account for problems coming from migration due to
preemption while in a panic, but I haven't observed any noticeable improvement
from the change:

--- //depot/vendor/freebsd/src/sys/kern/kern_shutdown.c	2004/11/05 19:00:32
+++ //depot/projects/smpng/sys/kern/kern_shutdown.c	2004/11/05 19:22:55
@@ -473,7 +473,7 @@
 }
 
 #ifdef SMP
-static u_int panic_cpu = NOCPU;
+static struct thread *panic_thread = NULL;
 #endif
 
 /*
@@ -494,15 +494,14 @@
 #ifdef SMP
 	/*
 	 * We don't want multiple CPU's to panic at the same time, so we
-	 * use panic_cpu as a simple spinlock.  We have to keep checking
-	 * panic_cpu if we are spinning in case the panic on the first
+	 * use panic_thread as a simple spinlock.  We have to keep checking
+	 * panic_thread if we are spinning in case the panic on the first
 	 * CPU is canceled.
 	 */
-	if (panic_cpu != PCPU_GET(cpuid))
-		while (atomic_cmpset_int(&panic_cpu, NOCPU,
-		    PCPU_GET(cpuid)) == 0)
-			while (panic_cpu != NOCPU)
-				; /* nothing */
+	if (panic_thread != curthread)
+		while (atomic_cmpset_ptr(&panic_thread, NULL, curthread) == 0)
+			while (panic_thread != NULL)
+				cpu_spinwait();
 #endif
 
 	bootopt = RB_AUTOBOOT | RB_DUMP;
@@ -538,7 +537,7 @@
 	/* See if the user aborted the panic, in which case we continue. */
 	if (panicstr == NULL) {
 #ifdef SMP
-		atomic_store_rel_int(&panic_cpu, NOCPU);
+		atomic_store_rel_ptr(&panic_thread, NULL);
 #endif
 		return;
 	}

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
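
For anyone who wants to poke at the owner-keyed spinlock pattern outside the kernel, below is a minimal userland sketch in C11. It is not the kernel code and only illustrates the shape of the logic in the diff above. The names panic_owner, panic_lock_acquire() and panic_lock_release() are hypothetical, an integer ID stands in for curthread, NO_OWNER stands in for NULL/NOCPU, and atomic_compare_exchange_strong() from <stdatomic.h> stands in for atomic_cmpset_ptr().

```c
#include <stdatomic.h>
#include <stdint.h>

#define NO_OWNER ((uintptr_t)0)	/* hypothetical stand-in for NULL / NOCPU */

/* Hypothetical stand-in for panic_thread: ID of the current lock owner. */
static atomic_uintptr_t panic_owner = NO_OWNER;

/*
 * Acquire the lock on behalf of 'self'.  Returns 1 if the lock was newly
 * taken and 0 if 'self' already held it, which mirrors the
 * "if (panic_thread != curthread)" check that lets a recursive panic on
 * the owning thread fall straight through.
 */
int
panic_lock_acquire(uintptr_t self)
{
	uintptr_t expected;

	if (atomic_load(&panic_owner) == self)
		return (0);
	for (;;) {
		/* The compare-exchange overwrites 'expected' on failure. */
		expected = NO_OWNER;
		if (atomic_compare_exchange_strong(&panic_owner, &expected,
		    self))
			return (1);
		/*
		 * Spin on plain loads until the owner lets go, in case the
		 * first panic is canceled, then retry the compare-exchange.
		 */
		while (atomic_load(&panic_owner) != NO_OWNER)
			; /* nothing */
	}
}

/* Release with release ordering, as atomic_store_rel_ptr() does. */
void
panic_lock_release(void)
{
	atomic_store_explicit(&panic_owner, NO_OWNER, memory_order_release);
}
```

Called from a single thread, panic_lock_acquire(1) takes the lock, a second panic_lock_acquire(1) returns immediately as a recursive entry, and after panic_lock_release() a different ID can take the lock. The sketch says nothing about why the real lock appears ineffective; it only makes the intended behavior easy to test in isolation.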