From owner-freebsd-current@FreeBSD.ORG Thu Dec 1 17:41:30 2005
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 0B95F16A43B;
    Thu, 1 Dec 2005 17:41:30 +0000 (GMT)
    (envelope-from Lonnie.Vanzandt@ngc.com)
Received: from xcgmd812.northgrum.com (xcgmd812.northgrum.com [155.104.240.108])
    by mx1.FreeBSD.org (Postfix) with ESMTP id CBFC643D7D;
    Thu, 1 Dec 2005 17:41:19 +0000 (GMT)
    (envelope-from Lonnie.Vanzandt@ngc.com)
Received: from xbhm0001.northgrum.com ([155.104.118.90])
    by xcgmd812.northgrum.com with InterScan Messaging Security Suite;
    Thu, 01 Dec 2005 09:37:57 -0800
Received: from xcgco501.northgrum.com ([158.114.104.53])
    by xbhm0001.northgrum.com with Microsoft SMTPSVC(6.0.3790.211);
    Thu, 1 Dec 2005 12:36:23 -0500
Received: from [192.168.170.130] ([158.114.106.12])
    by xcgco501.northgrum.com with Microsoft SMTPSVC(5.0.2195.6713);
    Thu, 1 Dec 2005 10:35:14 -0700
From: Lonnie VanZandt
Organization: Northrop Grumman
To: John Baldwin
Date: Thu, 1 Dec 2005 10:33:29 -0700
User-Agent: KMail/1.8.3
References: <200509220742.10364.lonnie.vanzandt@ngc.com>
    <200511031327.18011.jhb@freebsd.org>
    <200511031229.53501.lonnie.vanzandt@ngc.com>
In-Reply-To: <200511031229.53501.lonnie.vanzandt@ngc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-6"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200512011033.31006.lonnie.vanzandt@ngc.com>
X-OriginalArrivalTime: 01 Dec 2005 17:35:14.0294 (UTC)
    FILETIME=[965D9160:01C5F69D]
Cc: freebsd-current@freebsd.org, marcel@freebsd.org
Subject: Re: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lonnie.vanzandt@ngc.com
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Thu, 01 Dec 2005 17:41:30 -0000

Well, a patch in this area is still needed for 6.0 STABLE/REL_ENG. I just
completed our 6.0 upgrade and got back to doing some kgdb debugging on our
SMP box and blip! immediately encountered this kernel panic.

So, now motivated, I'm applying your alternative patch and will report back
should it not suffice.

Lonnie.

On Thursday 03 November 2005 12:29 pm, Lonnie VanZandt wrote:
> I think I follow the proposal. Sure, I'll apply your patch and run with it
> on my SMP box. It may take a while to reach a conclusion on its merits due
> to the racy nature of the crash.
>
> On Thursday 03 November 2005 11:27 am, John Baldwin wrote:
> > On Sunday 09 October 2005 05:49 pm, Lonnie VanZandt wrote:
> > > Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4
> > > STABLE. (the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v
> > > 1.5.2.2.2.1 2005/05/01 05:38:14 dwhite Exp $"); )
> >
> > I've looked at this, but I think it could maybe be done slightly
> > differently. Here's a suggested patch that would close the race you are
> > seeing I think while allowing semantics such that if two CPUs try to
> > enter KDB at the same time, they would serialize and the second CPU would
> > enter kdb after the first had exited. Could you at least test it to see
> > if it addresses your race condition?
> >
> > --- //depot/projects/smpng/sys/kern/subr_kdb.c 2005/10/27 19:51:50
> > +++ //depot/user/jhb/ktrace/kern/subr_kdb.c 2005/11/03 18:24:38
> > @@ -39,6 +39,7 @@
> >  #include
> >  #include
> >
> > +#include
> >  #include
> >  #include
> >
> > @@ -462,12 +463,21 @@
> >          return (0);
> >
> >      /* We reenter the debugger through kdb_reenter(). */
> > -    if (kdb_active)
> > +    if (kdb_active == PCPU_GET(cpuid) + 1)
> >          return (0);
> >
> >      critical_enter();
> >
> > -    kdb_active++;
> > +    /*
> > +     * If more than one CPU tries to enter KDB at the same time
> > +     * then force them to serialize and go one at a time.
> > +     */
> > +    while (!atomic_cmpset_int(&kdb_active, 0, PCPU_GET(cpuid) + 1)) {
> > +        critical_exit();
> > +        while (kdb_active)
> > +            cpu_spinwait();
> > +        critical_enter();
> > +    }
> >
> >  #ifdef SMP
> >      if ((did_stop_cpus = kdb_stop_cpus) != 0)
> > @@ -484,13 +494,17 @@
> >
> >      handled = kdb_dbbe->dbbe_trap(type, code);
> >
> > +    /*
> > +     * We have to exit KDB before resuming the other CPUs so that they
> > +     * may run in a debugger-less context.
> > +     */
> > +    kdb_active = 0;
> > +
> >  #ifdef SMP
> >      if (did_stop_cpus)
> >          restart_cpus(stopped_cpus);
> >  #endif
> >
> > -    kdb_active--;
> > -
> >      critical_exit();
> >
> >      return (handled);
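
For anyone following the thread who wants to poke at the serialization idea
outside the kernel, here is a rough userland sketch of the same pattern. It
is not the kernel code: it assumes C11 <stdatomic.h> in place of
atomic_cmpset_int(), takes the CPU id as a plain argument instead of
PCPU_GET(cpuid), and dbg_active, dbg_enter(), dbg_exit() and spin_hint() are
made-up names for illustration only. As in the quoted patch, the flag holds 0
when nobody owns the debugger and the owner's id + 1 otherwise, so a reentry
from the owning CPU can be told apart from a second CPU that has to wait.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means no owner; otherwise the owner's CPU id + 1, as in the patch. */
static atomic_int dbg_active;

/* Hypothetical stand-in for cpu_spinwait(); a real one would issue a
 * pause-style hint to the processor. */
static void spin_hint(void)
{
}

/*
 * Returns false if this CPU already owns the debugger (a reentry),
 * true once ownership has been acquired.
 */
static bool dbg_enter(int cpuid)
{
    int expected;

    if (atomic_load(&dbg_active) == cpuid + 1)
        return false;

    for (;;) {
        expected = 0;
        /* Claim the flag only if nobody holds it. */
        if (atomic_compare_exchange_strong(&dbg_active, &expected,
            cpuid + 1))
            return true;
        /* Another CPU owns it: wait for release, then retry. */
        while (atomic_load(&dbg_active) != 0)
            spin_hint();
    }
}

/* Drop ownership; in the patch this happens before the other CPUs
 * are restarted. */
static void dbg_exit(void)
{
    atomic_store(&dbg_active, 0);
}
```

The sketch leaves out critical_enter()/critical_exit() and the stop/restart
of the other CPUs, since those have no userland equivalent; it only shows the
compare-and-set claim, the spin while another owner is active, and the
explicit reset to 0 on exit.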