From owner-freebsd-current@FreeBSD.ORG  Thu Dec  1 17:41:30 2005
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0B95F16A43B;
	Thu,  1 Dec 2005 17:41:30 +0000 (GMT)
	(envelope-from Lonnie.Vanzandt@ngc.com)
Received: from xcgmd812.northgrum.com (xcgmd812.northgrum.com
	[155.104.240.108])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CBFC643D7D;
	Thu,  1 Dec 2005 17:41:19 +0000 (GMT)
	(envelope-from Lonnie.Vanzandt@ngc.com)
Received: from xbhm0001.northgrum.com ([155.104.118.90]) by
	xcgmd812.northgrum.com with InterScan Messaging Security Suite;
	Thu, 01 Dec 2005 09:37:57 -0800
Received: from xcgco501.northgrum.com ([158.114.104.53]) by
	xbhm0001.northgrum.com with Microsoft SMTPSVC(6.0.3790.211); 
	Thu, 1 Dec 2005 12:36:23 -0500
Received: from [192.168.170.130] ([158.114.106.12]) by xcgco501.northgrum.com
	with Microsoft SMTPSVC(5.0.2195.6713); 
	Thu, 1 Dec 2005 10:35:14 -0700
From: Lonnie VanZandt <lonnie.vanzandt@ngc.com>
Organization: Northrop Grumman
To: John Baldwin <jhb@freebsd.org>
Date: Thu, 1 Dec 2005 10:33:29 -0700
User-Agent: KMail/1.8.3
References: <200509220742.10364.lonnie.vanzandt@ngc.com>
	<200511031327.18011.jhb@freebsd.org>
	<200511031229.53501.lonnie.vanzandt@ngc.com>
In-Reply-To: <200511031229.53501.lonnie.vanzandt@ngc.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-6"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200512011033.31006.lonnie.vanzandt@ngc.com>
X-OriginalArrivalTime: 01 Dec 2005 17:35:14.0294 (UTC)
	FILETIME=[965D9160:01C5F69D]
Cc: freebsd-current@freebsd.org, marcel@freebsd.org
Subject: Re: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4
	STABLE
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lonnie.vanzandt@ngc.com
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Dec 2005 17:41:30 -0000

Well, a patch in this area remains needed for 6.0 STABLE/REL_ENG. I just 
completed our 6.0 upgrade and got back to doing some kgdb debugging on our 
SMP box and blip! immediately encountered this kernel panic.

So, now motivated, I'm applying your alternative patch and will report back 
should it not suffice.

Lonnie.

On Thursday 03 November 2005 12:29 pm, Lonnie VanZandt wrote:
> I think I follow the proposal. Sure, I'll apply your patch and run with it
> on my SMP box. It may take a while to reach a conclusion on its merits due
> to the racy nature of the crash.
>
> On Thursday 03 November 2005 11:27 am, John Baldwin wrote:
> > On Sunday 09 October 2005 05:49 pm, Lonnie VanZandt wrote:
> > > Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4
> > > STABLE. (the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v
> > > 1.5.2.2.2.1 2005/05/01 05:38:14 dwhite Exp $"); )
> >
> > I've looked at this, but I think it could maybe be done slightly
> > differently. Here's a suggested patch that would close the race you are
> > seeing I think while allowing semantics such that if two CPUs try to
> > enter KDB at the same time, they would serialize and the second CPU would
> > enter kdb after the first had exited.  Could you at least test it to see
> > if it addresses your race condition?
> >
> > --- //depot/projects/smpng/sys/kern/subr_kdb.c	2005/10/27 19:51:50
> > +++ //depot/user/jhb/ktrace/kern/subr_kdb.c	2005/11/03 18:24:38
> > @@ -39,6 +39,7 @@
> >  #include <sys/smp.h>
> >  #include <sys/sysctl.h>
> >
> > +#include <machine/cpu.h>
> >  #include <machine/kdb.h>
> >  #include <machine/pcb.h>
> >
> > @@ -462,12 +463,21 @@
> >  		return (0);
> >
> >  	/* We reenter the debugger through kdb_reenter(). */
> > -	if (kdb_active)
> > +	if (kdb_active == PCPU_GET(cpuid) + 1)
> >  		return (0);
> >
> >  	critical_enter();
> >
> > -	kdb_active++;
> > +	/*
> > +	 * If more than one CPU tries to enter KDB at the same time
> > +	 * then force them to serialize and go one at a time.
> > +	 */
> > +	while (!atomic_cmpset_int(&kdb_active, 0, PCPU_GET(cpuid) + 1)) {
> > +		critical_exit();
> > +		while (kdb_active)
> > +			cpu_spinwait();
> > +		critical_enter();
> > +	}
> >
> >  #ifdef SMP
> >  	if ((did_stop_cpus = kdb_stop_cpus) != 0)
> > @@ -484,13 +494,17 @@
> >
> >  	handled = kdb_dbbe->dbbe_trap(type, code);
> >
> > +	/*
> > +	 * We have to exit KDB before resuming the other CPUs so that they
> > +	 * may run in a debugger-less context.
> > +	 */
> > +	kdb_active = 0;
> > +
> >  #ifdef SMP
> >  	if (did_stop_cpus)
> >  		restart_cpus(stopped_cpus);
> >  #endif
> >
> > -	kdb_active--;
> > -
> >  	critical_exit();
> >
> >  	return (handled);
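
For anyone following the thread without a kernel tree handy, the
serialization idea in the quoted patch can be tried in userland. What
follows is only a sketch under stated assumptions, not the kernel code:
C11 <stdatomic.h> stands in for atomic_cmpset_int(), a plain integer
argument stands in for PCPU_GET(cpuid), an empty loop stands in for
cpu_spinwait(), and the names dbg_active, dbg_try_enter, dbg_enter and
dbg_exit are made up for illustration. As in the patch, the word holds 0
when nobody is in the debugger and "owner id + 1" otherwise, so a second
CPU waits its turn while the owner can still recognize its own reentry.

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 means nobody is in the debugger; otherwise holds owner id + 1. */
static atomic_int dbg_active;

enum dbg_result { DBG_ENTERED, DBG_REENTRANT, DBG_BUSY };

/* One non-blocking attempt to claim the debugger for cpuid (hypothetical helper). */
enum dbg_result
dbg_try_enter(int cpuid)
{
	int expected = 0;

	/* The owner trying to enter again is treated as reentry. */
	if (atomic_load(&dbg_active) == cpuid + 1)
		return (DBG_REENTRANT);

	/* Claim the word only if it is still 0; stand-in for atomic_cmpset_int(). */
	if (atomic_compare_exchange_strong(&dbg_active, &expected, cpuid + 1))
		return (DBG_ENTERED);

	return (DBG_BUSY);
}

/* Blocking variant: wait for the current owner to leave, then retry the claim. */
enum dbg_result
dbg_enter(int cpuid)
{
	enum dbg_result r;

	while ((r = dbg_try_enter(cpuid)) == DBG_BUSY) {
		while (atomic_load(&dbg_active) != 0)
			;	/* busy-wait; the kernel patch calls cpu_spinwait() here */
	}
	return (r);
}

/* Release ownership so a waiting CPU can take its turn. */
void
dbg_exit(void)
{
	assert(atomic_load(&dbg_active) != 0);
	atomic_store(&dbg_active, 0);
}
```

Storing the owner's id rather than a bare counter is what lets the
reentry check tell "I am already in the debugger" apart from "some other
CPU is in the debugger", which the old kdb_active++ could not do.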