From owner-freebsd-current@FreeBSD.ORG Thu Dec 1 21:07:47 2005
Return-Path:
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DFB4F16A41F;
	Thu, 1 Dec 2005 21:07:46 +0000 (GMT)
	(envelope-from Lonnie.Vanzandt@ngc.com)
Received: from xcgmd812.northgrum.com (xcgmd812.northgrum.com [155.104.240.108])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D390C43D55;
	Thu, 1 Dec 2005 21:07:45 +0000 (GMT)
	(envelope-from Lonnie.Vanzandt@ngc.com)
Received: from xbhm0001.northgrum.com ([155.104.118.90])
	by xcgmd812.northgrum.com with InterScan Messaging Security Suite;
	Thu, 01 Dec 2005 12:58:42 -0800
Received: from xcgco501.northgrum.com ([158.114.104.52])
	by xbhm0001.northgrum.com with Microsoft SMTPSVC(6.0.3790.211);
	Thu, 1 Dec 2005 15:57:09 -0500
Received: from [192.168.170.130] ([158.114.106.12])
	by xcgco501.northgrum.com with Microsoft SMTPSVC(5.0.2195.6713);
	Thu, 1 Dec 2005 13:55:39 -0700
From: Lonnie VanZandt
Organization: Northrop Grumman
To: John Baldwin
Date: Thu, 1 Dec 2005 13:55:03 -0700
User-Agent: KMail/1.8.3
References: <200509220742.10364.lonnie.vanzandt@ngc.com>
	<200511031229.53501.lonnie.vanzandt@ngc.com>
	<200512011033.31006.lonnie.vanzandt@ngc.com>
In-Reply-To: <200512011033.31006.lonnie.vanzandt@ngc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-6"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200512011355.04464.lonnie.vanzandt@ngc.com>
X-OriginalArrivalTime: 01 Dec 2005 20:55:39.0293 (UTC) FILETIME=[95D2CCD0:01C5F6B9]
Cc: freebsd-current@freebsd.org, marcel@freebsd.org
Subject: Re: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lonnie.vanzandt@ngc.com
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 01 Dec 2005 21:07:47 -0000

Early testing seems to indicate that the proposed patch doesn't work as well
as the one I came up with. (I'm not saying mine is better, because there may
be other issues which I neglected that yours addresses. I'm just saying that,
for the immediate need of being able to reliably trap into and out of kdb on
my SMP box, my patch worked better.)

So, I will revert to the edits that I had and retry the current testing. If
the mi_switch/kdb_enter issues continue, then I can let you know if my patch
also isn't a viable solution.

Lonnie.

PS: It's hard to be precise because I'm attempting to debug kernel code (KLDs)
which very likely can step out of bounds.

On Thursday 01 December 2005 10:33 am, Lonnie VanZandt wrote:
> Well, a patch in this area remains needed for 6.0 STABLE/REL_ENG. I just
> completed our 6.0 upgrade and got back to doing some kgdb debugging on our
> SMP box and blip! immediately encountered this kernel panic.
>
> So, now motivated, I'm applying your alternative patch and will report back
> should it not suffice.
>
> Lonnie.
>
> On Thursday 03 November 2005 12:29 pm, Lonnie VanZandt wrote:
> > I think I follow the proposal. Sure, I'll apply your patch and run with
> > it on my SMP box. It may take a while to reach a conclusion on its merits
> > due to the racy nature of the crash.
> >
> > On Thursday 03 November 2005 11:27 am, John Baldwin wrote:
> > > On Sunday 09 October 2005 05:49 pm, Lonnie VanZandt wrote:
> > > > Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4
> > > > STABLE. (the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v
> > > > 1.5.2.2.2.1 2005/05/01 05:38:14 dwhite Exp $"); )
> > >
> > > I've looked at this, but I think it could maybe be done slightly
> > > differently. Here's a suggested patch that would close the race you are
> > > seeing, I think, while allowing semantics such that if two CPUs try to
> > > enter KDB at the same time, they would serialize and the second CPU
> > > would enter kdb after the first had exited. Could you at least test it
> > > to see if it addresses your race condition?
> > >
> > > --- //depot/projects/smpng/sys/kern/subr_kdb.c 2005/10/27 19:51:50
> > > +++ //depot/user/jhb/ktrace/kern/subr_kdb.c 2005/11/03 18:24:38
> > > @@ -39,6 +39,7 @@
> > >  #include
> > >  #include
> > >
> > > +#include
> > >  #include
> > >  #include
> > >
> > > @@ -462,12 +463,21 @@
> > >  		return (0);
> > >
> > >  	/* We reenter the debugger through kdb_reenter(). */
> > > -	if (kdb_active)
> > > +	if (kdb_active == PCPU_GET(cpuid) + 1)
> > >  		return (0);
> > >
> > >  	critical_enter();
> > >
> > > -	kdb_active++;
> > > +	/*
> > > +	 * If more than one CPU tries to enter KDB at the same time
> > > +	 * then force them to serialize and go one at a time.
> > > +	 */
> > > +	while (!atomic_cmpset_int(&kdb_active, 0, PCPU_GET(cpuid) + 1)) {
> > > +		critical_exit();
> > > +		while (kdb_active)
> > > +			cpu_spinwait();
> > > +		critical_enter();
> > > +	}
> > >
> > >  #ifdef SMP
> > >  	if ((did_stop_cpus = kdb_stop_cpus) != 0)
> > > @@ -484,13 +494,17 @@
> > >
> > >  	handled = kdb_dbbe->dbbe_trap(type, code);
> > >
> > > +	/*
> > > +	 * We have to exit KDB before resuming the other CPUs so that they
> > > +	 * may run in a debugger-less context.
> > > +	 */
> > > +	kdb_active = 0;
> > > +
> > >  #ifdef SMP
> > >  	if (did_stop_cpus)
> > >  		restart_cpus(stopped_cpus);
> > >  #endif
> > >
> > > -	kdb_active--;
> > > -
> > >  	critical_exit();
> > >
> > >  	return (handled);
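
For readers following the quoted patch, the serialization idea can be tried
outside the kernel. What follows is only a minimal user-space sketch, assuming
C11 <stdatomic.h> as a stand-in for the kernel's atomic_cmpset_int(). The names
dbg_owner, dbg_enter() and dbg_exit() are hypothetical and do not appear in
subr_kdb.c, and the critical section and the stop/restart of other CPUs are
left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 means "no owner"; otherwise owner id + 1, as in the quoted patch. */
static atomic_int dbg_owner = 0;

/*
 * Hypothetical stand-in for the entry path of kdb_trap().
 * Returns 0 if `cpu` already owns the debugger (the re-entry case),
 * and 1 once `cpu` has taken ownership.
 */
static int
dbg_enter(int cpu)
{
	int expected;

	/* Re-entry by the current owner is refused, not serialized. */
	if (atomic_load(&dbg_owner) == cpu + 1)
		return (0);

	for (;;) {
		/* Claim ownership only if nobody holds it (0 -> cpu + 1). */
		expected = 0;
		if (atomic_compare_exchange_strong(&dbg_owner, &expected,
		    cpu + 1))
			return (1);

		/* Another CPU owns it: spin until it is released, then retry. */
		while (atomic_load(&dbg_owner) != 0)
			;
	}
}

/* Hypothetical stand-in for the exit path: release before others resume. */
static void
dbg_exit(int cpu)
{
	assert(atomic_load(&dbg_owner) == cpu + 1);
	atomic_store(&dbg_owner, 0);
}
```

Storing the owner id plus one, rather than a simple counter, is what allows
the entry check to tell "this CPU is already inside" apart from "some other
CPU is inside": the first case returns at once, the second waits its turn.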