From owner-freebsd-current@FreeBSD.ORG Sun Oct 9 21:47:58 2005 Return-Path: X-Original-To: freebsd-current@freebsd.org Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2173616A41F for ; Sun, 9 Oct 2005 21:47:58 +0000 (GMT) (envelope-from lonniev@predictableresponse.com) Received: from predictableresponse.com (s1.n31.vds2000.com [66.84.31.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8AB7E43D45 for ; Sun, 9 Oct 2005 21:47:57 +0000 (GMT) (envelope-from lonniev@predictableresponse.com) Received: from PREDICTABLE (216-58-174-109.speedtrail.net [216.58.174.109]) by predictableresponse.com (8.11.6/8.11.6) with ESMTP id j99LluZ14614 for ; Sun, 9 Oct 2005 17:47:56 -0400 From: "Lonnie VanZandt" Sender: "Lonnie VanZandt" To: Date: Sun, 9 Oct 2005 15:49:23 -0600 Organization: Predictable Response Consulting Message-ID: <200509220742.10364.lonnie.vanzandt@ngc.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_007D_01C5CCE9.060ED9D0" X-Mailer: Microsoft Outlook, Build 10.0.6626 X-OriginalArrivalTime: 22 Sep 2005 13:43:53.0149 (UTC) FILETIME=[ABA46ED0:01C5BF7B] X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 Subject: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lonnie.vanzandt@ngc.com List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Oct 2005 21:47:58 -0000 This is a multi-part message in MIME format. ------=_NextPart_000_007D_01C5CCE9.060ED9D0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4 STABLE. (the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v 1.5.2.2.2.1 2005/05/01 05:38:14 dwhite Exp $"); ) On Wednesday 21 September 2005 03:14 pm, Lonnie VanZandt wrote: > Yes, the situation is greatly improved with this edit (plus a wee bit > more). Once the target reboots, I'll send along its revised subr_kdb.c > file. Feel free to tell me where to officially submit a bug report and > a proposed patch. > > On Wednesday 21 September 2005 01:14 pm, Lonnie VanZandt wrote: > > No, critical_enter() is _not_ SMP-safe. So, on an SMP system, a CPU > > outside of the one currently in KDB can resume running and trap on > > any inserted breakpoint _before_ kdb_active is decremented back to > > 0. Et voila! mi_switch will panic. > > > > I'll move the decrement, rebuild the kernel, try the result, and > > report back later what I find. > > > > On Wednesday 21 September 2005 01:09 pm, Lonnie VanZandt wrote: > > > In this snippet, > > > > > > #ifdef SMP > > > if (did_stop_cpus) > > > restart_cpus(stopped_cpus); > > > #endif > > > > > > kdb_active--; > > > > > > critical_exit(); > > > > > > Wouldn't it be better (or even required) to move the decrement of > > > kdb_active _before_ restarting stopped CPUs? Maybe > > > critical_enter()/critical_exit() implies an SMP-safe region? > > > > > > On Wednesday 21 September 2005 01:03 pm, Lonnie VanZandt wrote: > > > > A gtags/global search for kdb_active reports that the variable > > > > is never explicitly set back to 0 and is only set in > > > > kern/subr_kdb.c in kdb_trap(). There, it is incremented on entry > > > > and decremented on exit. That seems appropriate. Maybe there is > > > > an SMP oversight and the second CPU is triggering a trap back > > > > into the debugger? (That would imply that the other CPU wasn't > > > > really stopped or that there is a race condition with setting > > > > kdb_active back to false and CPUs coming out of the stopped > > > > state.) > > > > > > > > Or perhaps something is not quite right with the ddb/kdb/gdb > > > > interactions for kdb_reenter()? > > > > > > > > On Wednesday 21 September 2005 12:32 pm, Lonnie VanZandt wrote: > > > > > In 5.4 Stable, when attempting to debug kernels remotely, I > > > > > all too frequently encounter panics within the target kernel > > > > > as it attempts to return to the stopped thread. > > > > > > > > > > The panic report shows that this code is getting triggered: > > > > > > > > > > /* > > > > > * Don't perform context switches from the debugger. > > > > > */ > > > > > if (kdb_active) { > > > > > mtx_unlock_spin(&sched_lock); > > > > > kdb_backtrace(); > > > > > kdb_reenter(); > > > > > panic("%s: did not reenter debugger", __func__); > > > > > } > > > > > > > > > > My initial guess is that somewhere kdb_active is not getting > > > > > set back to 0/False. > > > > > > > > > > Is there a post-5.4 fix for this? ------------------------------------------------------- ------=_NextPart_000_007D_01C5CCE9.060ED9D0 Content-Type: text/x-diff; name="cdiff.out" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="cdiff.out" *** subr_kdb.c.orig Wed Sep 21 15:27:16 2005 --- /tmp/subr_kdb.c Wed Sep 21 15:23:23 2005 *************** *** 444,455 **** makectx(tf, &kdb_pcb); critical_enter(); ! kdb_active++; ! kdb_frame = tf; ! kdb_thr_select(curthread); ! #ifdef SMP if ((did_stop_cpus = kdb_stop_cpus) != 0) { --- 444,454 ---- makectx(tf, &kdb_pcb); + // disable interrupts to our own CPU critical_enter(); ! // halt any other CPUs who might trip over our ! // traps and try to rush into KDB before us #ifdef SMP if ((did_stop_cpus = kdb_stop_cpus) != 0) { *************** *** 462,479 **** } #endif /* Let MD code do its thing first... */ kdb_cpu_trap(type, code); handled = kdb_dbbe->dbbe_trap(type, code); #ifdef SMP if (did_stop_cpus) restart_cpus(stopped_cpus); #endif ! kdb_active--; ! critical_exit(); return (handled); --- 461,497 ---- } #endif + // we can't be interrupted now and we must be + // the only running CPU. So, if kdb_active remains 0 + // then we have won the race to enter KDB, otherwise + // some other CPU beat us through the gate... + if ( 0 != kdb_active ) + { + return 0; + } + + // claim the prize + kdb_active++; + + // set the kdb context + kdb_frame = tf; + kdb_thr_select( curthread ); + /* Let MD code do its thing first... */ kdb_cpu_trap(type, code); handled = kdb_dbbe->dbbe_trap(type, code); + // release the prize + kdb_active--; + + // let other CPUs have a chance to run #ifdef SMP if (did_stop_cpus) restart_cpus(stopped_cpus); #endif ! // let our own ISRs to run critical_exit(); return (handled); ------=_NextPart_000_007D_01C5CCE9.060ED9D0--