Date:      Thu, 24 Feb 2005 13:33:42 -0500
From:      John Baldwin <jhb@FreeBSD.org>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        julian@FreeBSD.org
Subject:   Re: panic: Assertion td->td_sleepqueue != NULL failed at /usr/src/sys/kern/subr_sleepqueue.c:258
Message-ID:  <200502241333.42942.jhb@FreeBSD.org>
In-Reply-To: <20050224013447.GA51370@xor.obsecurity.org>
References:  <20050223235405.GB19137@xor.obsecurity.org> <20050223235515.GA19260@xor.obsecurity.org> <20050224013447.GA51370@xor.obsecurity.org>

On Wednesday 23 February 2005 08:34 pm, Kris Kennaway wrote:
> On Wed, Feb 23, 2005 at 03:55:15PM -0800, Kris Kennaway wrote:
> > On Wed, Feb 23, 2005 at 03:54:05PM -0800, Kris Kennaway wrote:
> > > I got this on a 12-processor e4500 running RELENG_5:
> > >
> > > panic: Assertion td->td_sleepqueue != NULL failed at
> > > /usr/src/sys/kern/subr_sleepqueue.c:258 cpuid = 0
> > > KDB: enter: panic
> > > [thread pid 1 tid 100003 ]
> > > Stopped at      kdb_enter+0x38: ta              %xcc, 1
> > > db> wh
> > > Tracing pid 1 tid 100003 td 0xfffff801385067b0
> > > panic() at panic+0x19c
> > > sleepq_add() at sleepq_add+0x168
> > > cv_wait() at cv_wait+0x174
> > > _sx_xlock() at _sx_xlock+0x64
> > > kern_wait() at kern_wait+0x3c
> > > wait4() at wait4+0x18
> > > syscall() at syscall+0x220
> > > -- syscall (7, FreeBSD ELF64, wait4) %o7=0x10a7b0 --
> >
> >     1 fffff80138505ab8    0     0     1 0004200 [SLPQ proctree
> > 0xc03de0c8][CPU 0] init
> >
> > > About the only nonstandard thing I did was set
> > > kern.sched.ipiwakeup.onecpu=1 which was suggested for working around
> > > other deadlocks.  I don't recall if preemption is enabled for this
> > > machine (I didn't set it up).  Is there any other online debugging I
> > > can do?
>
> No preemption.  I did get a core, and I'll add KTR_PROC per discussion
> of this same panic when Peter Holm reported it in December.
>
> Kris

I've seen this locally once, and Peter and others have seen it as well.  It
always happens with proctree, which is probably the most heavily contended
sx(9) lock in the system.  The symptoms are that a thread was asleep on the
proctree sleep queue and was made runnable by something other than the sleep
queue code (so somehow TDI_SLEEPING was cleared somewhere besides
subr_sleepqueue.c).  When it resumes, it leaves its sleep queue object
behind in the sleepqueue chains table, which is why the td_sleepqueue
assertion in sleepq_add() trips the next time it goes to sleep.  It also
still has td_wchan and td_wmesg set.  (sleepq_remove_thread() clears those
two when it takes a thread off of a sleep queue and gives it a sleep queue
object.)
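
To make the failure mode described above concrete, here is a rough
user-space sketch of the invariant.  It is not the code in
subr_sleepqueue.c: every toy_* name is made up for illustration, the chain
holds at most one sleeper, and there is no locking.  It only shows why a
wakeup that bypasses the remove path leaves td_sleepqueue NULL with
td_wchan and td_wmesg still set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical names), not the kernel's real structures. */
struct toy_sleepqueue { int unused; };

struct toy_thread {
	struct toy_sleepqueue *td_sleepqueue;	/* spare object; NULL while queued */
	const void *td_wchan;			/* what the thread sleeps on */
	const char *td_wmesg;			/* human-readable sleep reason */
	bool sleeping;				/* stands in for TDI_SLEEPING */
};

/* One slot of the chains table, simplified to a single sleeper. */
struct toy_chain {
	struct toy_sleepqueue *donated;		/* object lent by the sleeper */
};

/*
 * Queue a thread.  Returns false where the kernel would hit the
 * "td->td_sleepqueue != NULL" assertion.
 */
static bool
toy_sleepq_add(struct toy_chain *c, struct toy_thread *td,
    const void *wchan, const char *wmesg)
{
	if (td->td_sleepqueue == NULL)
		return (false);
	c->donated = td->td_sleepqueue;		/* thread lends its object */
	td->td_sleepqueue = NULL;
	td->td_wchan = wchan;
	td->td_wmesg = wmesg;
	td->sleeping = true;
	return (true);
}

/* Proper wakeup: hand an object back and clear the wait fields. */
static void
toy_sleepq_remove_thread(struct toy_chain *c, struct toy_thread *td)
{
	assert(c->donated != NULL);		/* thread must really be queued */
	td->td_sleepqueue = c->donated;
	c->donated = NULL;
	td->td_wchan = NULL;
	td->td_wmesg = NULL;
	td->sleeping = false;
}

/* Rogue wakeup: only the sleeping flag is cleared, nothing else. */
static void
toy_rogue_wakeup(struct toy_thread *td)
{
	td->sleeping = false;
}
```

After a rogue wakeup in this model the thread is runnable, but its object
is still parked in the chain, so its next toy_sleepq_add() call fails the
same way the real assertion does.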

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


