Date:      Tue, 25 Oct 2005 15:53:19 GMT
From:      Lonnie VanZandt <lonniev@predictableresponse.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/87990: SMP Race Condition in kdb_enter/kdb_exit code
Message-ID:  <200510251553.j9PFrJJB093937@www.freebsd.org>
Resent-Message-ID: <200510251600.j9PG0UlO010738@freefall.freebsd.org>

>Number:         87990
>Category:       kern
>Synopsis:       SMP Race Condition in kdb_enter/kdb_exit code
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct 25 16:00:30 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Lonnie VanZandt
>Release:        5.4 STABLE
>Organization:
Predictable Response Consulting
>Environment:
FreeBSD aperiodic.lab.local 5.4-STABLE FreeBSD 5.4-STABLE #4: Tue Sep 13 10:02:27 MDT 2005     lonniev@predictableresponse.com:/usr/obj/usr/src/sys/APERIODIC  i386

>Description:
Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4-STABLE
(the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v 1.5.2.2.2.1
2005/05/01 05:38:14 dwhite Exp $");). The quoted thread below, newest
message first, describes the race that the patch addresses.

On Wednesday 21 September 2005 03:14 pm, Lonnie VanZandt wrote:
> Yes, the situation is greatly improved with this edit (plus a wee bit 
> more). Once the target reboots, I'll send along its revised subr_kdb.c 
> file. Feel free to tell me where to officially submit a bug report and 
> a proposed patch.
>
> On Wednesday 21 September 2005 01:14 pm, Lonnie VanZandt wrote:
> > No, critical_enter() is _not_ SMP-safe. So, on an SMP system, a CPU 
> > outside of the one currently in KDB can resume running and trap on 
> > any inserted breakpoint _before_ kdb_active is decremented back to 
> > 0. Et voila! mi_switch will panic.
> >
> > I'll move the decrement, rebuild the kernel, try the result, and 
> > report back later what I find.
> >
> > On Wednesday 21 September 2005 01:09 pm, Lonnie VanZandt wrote:
> > > In this snippet,
> > >
> > > #ifdef SMP
> > >         if (did_stop_cpus)
> > >                 restart_cpus(stopped_cpus);
> > > #endif
> > >
> > >         kdb_active--;
> > >
> > >         critical_exit();
> > >
> > > Wouldn't it be better (or even required) to move the decrement of 
> > > kdb_active _before_ restarting stopped CPUs? Maybe
> > > critical_enter()/critical_exit() implies an SMP-safe region?
> > >
> > > On Wednesday 21 September 2005 01:03 pm, Lonnie VanZandt wrote:
> > > > A gtags/global search for kdb_active reports that the variable 
> > > > is never explicitly set back to 0 and is only set in 
> > > > kern/subr_kdb.c in kdb_trap(). There, it is incremented on entry 
> > > > and decremented on exit. That seems appropriate. Maybe there is 
> > > > an SMP oversight and the second CPU is triggering a trap back 
> > > > into the debugger? (That would imply that the other CPU wasn't 
> > > > really stopped or that there is a race condition with setting 
> > > > kdb_active back to false and CPUs coming out of the stopped 
> > > > state.)
> > > >
> > > > Or perhaps something is not quite right with the ddb/kdb/gdb 
> > > > interactions for kdb_reenter()?
> > > >

>How-To-Repeat:
> > > > On Wednesday 21 September 2005 12:32 pm, Lonnie VanZandt wrote:
> > > > > In 5.4 Stable, when attempting to debug kernels remotely, I 
> > > > > all too frequently encounter panics within the target kernel 
> > > > > as it attempts to return to the stopped thread.
> > > > >
> > > > > The panic report shows that this code is getting triggered:
> > > > >
> > > > >         /*
> > > > >          * Don't perform context switches from the debugger.
> > > > >          */
> > > > >         if (kdb_active) {
> > > > >                 mtx_unlock_spin(&sched_lock);
> > > > >                 kdb_backtrace();
> > > > >                 kdb_reenter();
> > > > >                 panic("%s: did not reenter debugger", __func__);
> > > > >         }
> > > > >
> > > > > My initial guess is that somewhere kdb_active is not getting 
> > > > > set back to 0/False.
>Fix:
-------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cdiff.out
Type: text/x-diff
Size: 1691 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-current/attachments/20051009/59aa3853/cdiff.bin
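
Since the attachment is not reproduced in this archive, here is a rough
user-space sketch of the reordering discussed in the Description. It is not
the submitted patch, which by the submitter's account contained "a wee bit
more". The functions below are hypothetical stand-ins that only reuse the
kernel names kdb_active, restart_cpus() and critical_exit(); the CPU mask
0x2 and the variable seen_on_restart are invented for illustration. The
model ignores memory ordering and real inter-processor signalling, so it
only shows which value a restarted CPU could observe.

```c
#include <assert.h>

/* Hypothetical user-space stand-ins; none of this is kernel code. */
static int kdb_active;       /* nonzero while the debugger is active */
static int seen_on_restart;  /* kdb_active as a restarted CPU would see it */

/*
 * Stand-in for restart_cpus(): a released CPU may hit a breakpoint at
 * once, so record what it would find in kdb_active at that instant.
 */
void restart_cpus(unsigned int map)
{
	(void)map;
	seen_on_restart = kdb_active;
}

/* Stand-in for critical_exit(): it only concerns the local CPU. */
void critical_exit(void)
{
}

/* Exit path in the order quoted in the report: restart, then decrement. */
int exit_as_reported(void)
{
	kdb_active = 1;        /* as if kdb_trap() had incremented it */
	restart_cpus(0x2);     /* assumed mask: CPU 1 had been stopped */
	kdb_active--;
	critical_exit();
	assert(kdb_active == 0);
	return seen_on_restart;
}

/* Proposed order: decrement first, then let the other CPUs run. */
int exit_reordered(void)
{
	kdb_active = 1;        /* as if kdb_trap() had incremented it */
	kdb_active--;
	restart_cpus(0x2);     /* assumed mask: CPU 1 had been stopped */
	critical_exit();
	assert(kdb_active == 0);
	return seen_on_restart;
}
```

In this model exit_as_reported() returns 1: the restarted CPU still sees the
debugger as active, which is the condition under which the mi_switch check
quoted under How-To-Repeat panics. exit_reordered() returns 0, so a CPU that
traps immediately after being restarted no longer finds kdb_active set.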

>Release-Note:
>Audit-Trail:
>Unformatted:


