From owner-freebsd-current@FreeBSD.ORG  Sun Oct  9 21:47:58 2005
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2173616A41F
	for <freebsd-current@freebsd.org>; Sun,  9 Oct 2005 21:47:58 +0000 (GMT)
	(envelope-from lonniev@predictableresponse.com)
Received: from predictableresponse.com (s1.n31.vds2000.com [66.84.31.1])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8AB7E43D45
	for <freebsd-current@freebsd.org>; Sun,  9 Oct 2005 21:47:57 +0000 (GMT)
	(envelope-from lonniev@predictableresponse.com)
Received: from PREDICTABLE (216-58-174-109.speedtrail.net [216.58.174.109])
	by predictableresponse.com (8.11.6/8.11.6) with ESMTP id j99LluZ14614
	for <freebsd-current@freebsd.org>; Sun, 9 Oct 2005 17:47:56 -0400
From: "Lonnie VanZandt" <lonnie.vanzandt@ngc.com>
Sender: "Lonnie VanZandt" <lonniev@predictableresponse.com>
To: <freebsd-current@freebsd.org>
Date: Sun, 9 Oct 2005 15:49:23 -0600
Organization: Predictable Response Consulting
Message-ID: <200509220742.10364.lonnie.vanzandt@ngc.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_NextPart_000_007D_01C5CCE9.060ED9D0"
X-Mailer: Microsoft Outlook, Build 10.0.6626
X-OriginalArrivalTime: 22 Sep 2005 13:43:53.0149 (UTC)
	FILETIME=[ABA46ED0:01C5BF7B]
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180
Subject: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lonnie.vanzandt@ngc.com
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 09 Oct 2005 21:47:58 -0000

This is a multi-part message in MIME format.

------=_NextPart_000_007D_01C5CCE9.060ED9D0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4 STABLE.
(the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v 1.5.2.2.2.1
2005/05/01 05:38:14 dwhite Exp $"); )

On Wednesday 21 September 2005 03:14 pm, Lonnie VanZandt wrote:
> Yes, the situation is greatly improved with this edit (plus a wee bit 
> more). Once the target reboots, I'll send along its revised subr_kdb.c 
> file. Feel free to tell me where to officially submit a bug report and 
> a proposed patch.
>
> On Wednesday 21 September 2005 01:14 pm, Lonnie VanZandt wrote:
> > No, critical_enter() is _not_ SMP-safe. So, on an SMP system, a CPU 
> > outside of the one currently in KDB can resume running and trap on 
> > any inserted breakpoint _before_ kdb_active is decremented back to 
> > 0. Et voila! mi_switch will panic.
> >
> > I'll move the decrement, rebuild the kernel, try the result, and 
> > report back later what I find.
> >
> > On Wednesday 21 September 2005 01:09 pm, Lonnie VanZandt wrote:
> > > In this snippet,
> > >
> > > #ifdef SMP
> > >         if (did_stop_cpus)
> > >                 restart_cpus(stopped_cpus);
> > > #endif
> > >
> > >         kdb_active--;
> > >
> > >         critical_exit();
> > >
> > > Wouldn't it be better (or even required) to move the decrement of 
> > > kdb_active _before_ restarting stopped CPUs? Maybe
> > > critical_enter()/critical_exit() implies an SMP-safe region?
> > >
> > > On Wednesday 21 September 2005 01:03 pm, Lonnie VanZandt wrote:
> > > > A gtags/global search for kdb_active reports that the variable 
> > > > is never explicitly set back to 0 and is only set in 
> > > > kern/subr_kdb.c in kdb_trap(). There, it is incremented on entry 
> > > > and decremented on exit. That seems appropriate. Maybe there is 
> > > > an SMP oversight and the second CPU is triggering a trap back 
> > > > into the debugger? (That would imply that the other CPU wasn't 
> > > > really stopped or that there is a race condition with setting 
> > > > kdb_active back to false and CPUs coming out of the stopped 
> > > > state.)
> > > >
> > > > Or perhaps something is not quite right with the ddb/kdb/gdb 
> > > > interactions for kdb_reenter()?
> > > >
> > > > On Wednesday 21 September 2005 12:32 pm, Lonnie VanZandt wrote:
> > > > > In 5.4 Stable, when attempting to debug kernels remotely, I 
> > > > > all too frequently encounter panics within the target kernel 
> > > > > as it attempts to return to the stopped thread.
> > > > >
> > > > > The panic report shows that this code is getting triggered:
> > > > >
> > > > >         /*
> > > > >          * Don't perform context switches from the debugger.
> > > > >          */
> > > > >         if (kdb_active) {
> > > > >                 mtx_unlock_spin(&sched_lock);
> > > > >                 kdb_backtrace();
> > > > >                 kdb_reenter();
> > > > >                 panic("%s: did not reenter debugger", __func__);
> > > > >         }
> > > > >
> > > > > My initial guess is that somewhere kdb_active is not getting 
> > > > > set back to 0/False.
> > > > >
> > > > > Is there a post-5.4 fix for this?

-------------------------------------------------------

------=_NextPart_000_007D_01C5CCE9.060ED9D0
Content-Type: text/x-diff;
	name="cdiff.out"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="cdiff.out"

*** subr_kdb.c.orig	Wed Sep 21 15:27:16 2005
--- /tmp/subr_kdb.c	Wed Sep 21 15:23:23 2005
***************
*** 444,455 ****
  
  	makectx(tf, &kdb_pcb);
  
  	critical_enter();
  
! 	kdb_active++;
! 	kdb_frame = tf;
! 	kdb_thr_select(curthread);
! 
  #ifdef SMP
  	if ((did_stop_cpus = kdb_stop_cpus) != 0)
  	  {
--- 444,454 ----
  
  	makectx(tf, &kdb_pcb);
  
+ 	// disable interrupts to our own CPU
  	critical_enter();
  
! 	// halt any other CPUs who might trip over our
! 	// traps and try to rush into KDB before us
  #ifdef SMP
  	if ((did_stop_cpus = kdb_stop_cpus) != 0)
  	  {
***************
*** 462,479 ****
  	  }
  #endif
  
  	/* Let MD code do its thing first... */
  	kdb_cpu_trap(type, code);
  
  	handled = kdb_dbbe->dbbe_trap(type, code);
  
  #ifdef SMP
  	if (did_stop_cpus)
  		restart_cpus(stopped_cpus);
  #endif
  
! 	kdb_active--;
! 
  	critical_exit();
  
  	return (handled);
--- 461,497 ----
  	  }
  #endif
  
+ 	// we can't be interrupted now and we must be
+ 	// the only running CPU. So, if kdb_active remains 0
+ 	// then we have won the race to enter KDB, otherwise
+ 	// some other CPU beat us through the gate...
+ 	if ( 0 != kdb_active )
+ 	{
+ 		return 0;
+ 	}
+ 
+ 	// claim the prize
+ 	kdb_active++;
+ 
+ 	// set the kdb context
+ 	kdb_frame = tf;
+ 	kdb_thr_select( curthread );
+ 
  	/* Let MD code do its thing first... */
  	kdb_cpu_trap(type, code);
  
  	handled = kdb_dbbe->dbbe_trap(type, code);
  
+ 	// release the prize
+ 	kdb_active--;
+ 
+ 	// let other CPUs have a chance to run
  #ifdef SMP
  	if (did_stop_cpus)
  		restart_cpus(stopped_cpus);
  #endif
  
! 	// let our own ISRs to run
  	critical_exit();
  
  	return (handled);

------=_NextPart_000_007D_01C5CCE9.060ED9D0--