Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 3 Nov 2005 12:29:52 -0700
From:      Lonnie VanZandt <lonnie.vanzandt@ngc.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-current@freebsd.org, marcel@freebsd.org
Subject:   Re: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE
Message-ID:  <200511031229.53501.lonnie.vanzandt@ngc.com>
In-Reply-To: <200511031327.18011.jhb@freebsd.org>
References:  <200509220742.10364.lonnie.vanzandt@ngc.com> <200511031327.18011.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
I think I follow the proposal. Sure, I'll apply your patch and run with it on 
my SMP box. It may take a while to reach a conclusion on its merits due to 
the racy nature of the crash.

On Thursday 03 November 2005 11:27 am, John Baldwin wrote:
> On Sunday 09 October 2005 05:49 pm, Lonnie VanZandt wrote:
> > Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4 STABLE.
> > (the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v 1.5.2.2.2.1
> > 2005/05/01 05:38:14 dwhite Exp $"); )
>
> I've looked at this, but I think t could maybe be done slightly
> differently. Here's a suggested patch that would close the race you are
> seeing I think while allowing semantics such that if two CPUs try to enter
> KDB at the same time, they would serialize and the second CPU would enter
> kdb after the first had exited.  Could you at least test it to see if it
> addresses your race condition?
>
> --- //depot/projects/smpng/sys/kern/subr_kdb.c	2005/10/27 19:51:50
> +++ //depot/user/jhb/ktrace/kern/subr_kdb.c	2005/11/03 18:24:38
> @@ -39,6 +39,7 @@
>  #include <sys/smp.h>
>  #include <sys/sysctl.h>
>
> +#include <machine/cpu.h>
>  #include <machine/kdb.h>
>  #include <machine/pcb.h>
>
> @@ -462,12 +463,21 @@
>  		return (0);
>
>  	/* We reenter the debugger through kdb_reenter(). */
> -	if (kdb_active)
> +	if (kdb_active == PCPU_GET(cpuid) + 1)
>  		return (0);
>
>  	critical_enter();
>
> -	kdb_active++;
> +	/*
> +	 * If more than one CPU tries to enter KDB at the same time
> +	 * then force them to serialize and go one at a time.
> +	 */
> +	while (!atomic_cmpset_int(&kdb_active, 0, PCPU_GET(cpuid) + 1)) {
> +		critical_exit();
> +		while (kdb_active)
> +			cpu_spinwait();
> +		critical_enter();
> +	}
>
>  #ifdef SMP
>  	if ((did_stop_cpus = kdb_stop_cpus) != 0)
> @@ -484,13 +494,17 @@
>
>  	handled = kdb_dbbe->dbbe_trap(type, code);
>
> +	/*
> +	 * We have to exit KDB before resuming the other CPUs so that they
> +	 * may run in a debugger-less context.
> +	 */
> +	kdb_active = 0;
> +
>  #ifdef SMP
>  	if (did_stop_cpus)
>  		restart_cpus(stopped_cpus);
>  #endif
>
> -	kdb_active--;
> -
>  	critical_exit();
>
>  	return (handled);




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200511031229.53501.lonnie.vanzandt>