Date:      Thu, 6 Jan 2005 22:45:54 +0100
From:      Peter Holm <peter@holm.cc>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Assertion td->td_sleepqueue != NULL failed at kern/subr_sleepqueue.c:270
Message-ID:  <20050106214554.GA45533@peter.osted.lan>
In-Reply-To: <200501061617.49967.jhb@FreeBSD.org>
References:  <20050105122636.GA31684@peter.osted.lan> <200501061617.49967.jhb@FreeBSD.org>

On Thu, Jan 06, 2005 at 04:17:49PM -0500, John Baldwin wrote:
> On Wednesday 05 January 2005 07:26 am, Peter Holm wrote:
> > With GENERIC HEAD from Dec 31 09:28 UTC + bmilekic@'s uma_core
> > patch + alc's patch I got the following strange assert:
> >
> > panic(c0827c46,c082dd18,c082dc8d,10e,c08f4660) at panic+0x190
> > sleepq_add(c08eec90,c08ee6e8,c082a9bf,1,c08ee6e8,0,c0827ca9,7d)
> >    at sleepq_add+0x156
> > cv_wait(c08eec90,c08ee6e8,c151de30,0,ffffffff) at cv_wait+0x100
> > _sx_xlock(c08eec60,c0828867,247,0,c151ddc8) at _sx_xlock+0x59
> > kern_wait(c151e450,ffffffff,cbc67c90,0,0) at kern_wait+0x4b
> > wait4(c151e450,cbc67d14,4,3f8,282) at wait4+0x29
> > syscall(2f,2f,bfbf002f,2,0) at syscall+0x128
> > Xint0x80_syscall() at Xint0x80_syscall+0x1f
> > --- syscall (7, FreeBSD ELF32, wait4), eip = 0x805170b, esp =
> > 0xbfbfedbc, ebp = 0xbfbfedd8 ---
> >
> > Looks like td->td_sleepqueue is NULL!
> >
> > Details at http://www.holm.cc/stress/log/cons100.html
> 
> This is a truly odd panic.  The basic theory of operation with sleep queues is 
> that every thread that is not already queued on a sleep queue carries a sleep 
> queue structure around that they donate to a wait channel when they block on 
> it.  Once they are resumed, they reclaim a sleep queue from the waitchannel.  
> This resuming bit happens in sleepq_remove_thread() in subr_sleepqueue.c.  As 
> you can see, in addition to assigning a sleepqueue to the thread being 
> removed from a queue, it also clears td_wchan and td_wmesg.  The thread in 
> question has both fields set (as if it were asleep on "proctree", which is 
> what it is trying to go back to sleep on now).  However, it is not on a sleep 
> queue (td_slpq.tqe_next is NULL).  So, apparently, it seems that a thread was 
> removed from the sleep queue and resumed (made runnable) but 
> sleepq_remove_thread() wasn't called.  Do you have any local patches that 
> might affect this btw?  I notice you get a lot of trap 9's in your dmesg 
> which is somewhat unsettling.
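
To make the donate/reclaim scheme described above concrete, here is a
minimal user-space sketch in C. It is not the code in subr_sleepqueue.c:
the model_* function names, struct chan (a single slot standing in for the
kernel's wait-channel lookup), and the nblocked/free_next fields are all
assumptions made for illustration. Only td_sleepqueue, td_wchan and
td_wmesg are names taken from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sleep queue: a count plus a list of spares. */
struct sleepq {
    int nblocked;               /* threads currently blocked here */
    struct sleepq *free_next;   /* spare queues donated by later sleepers */
};

struct thread {
    struct sleepq *td_sleepqueue; /* spare carried while not asleep */
    const void *td_wchan;         /* wait channel, NULL when not asleep */
    const char *td_wmesg;         /* wait message, NULL when not asleep */
};

/* Hypothetical stand-in for the wait-channel lookup: one slot. */
struct chan {
    struct sleepq *sq;
};

/* Block td on c: the thread donates the queue it was carrying. */
void model_sleepq_add(struct chan *c, struct thread *td, const char *wmesg)
{
    struct sleepq *sq = td->td_sleepqueue;

    /* Models the assertion from the panic: a runnable thread must
     * still be carrying a queue to donate. */
    assert(sq != NULL);

    if (c->sq == NULL) {
        /* First sleeper: its queue becomes the channel's queue. */
        sq->nblocked = 0;
        sq->free_next = NULL;
        c->sq = sq;
    } else {
        /* Later sleeper: its queue is parked on the spare list. */
        sq->free_next = c->sq->free_next;
        c->sq->free_next = sq;
    }
    c->sq->nblocked++;
    td->td_sleepqueue = NULL;
    td->td_wchan = c;
    td->td_wmesg = wmesg;
}

/* Resume td: it reclaims some queue and its wait fields are cleared. */
void model_sleepq_remove_thread(struct chan *c, struct thread *td)
{
    struct sleepq *head = c->sq;

    assert(head != NULL && td->td_wchan == c);
    head->nblocked--;
    if (head->free_next != NULL) {
        /* Other sleepers remain: take one of the spares. */
        td->td_sleepqueue = head->free_next;
        head->free_next = td->td_sleepqueue->free_next;
        td->td_sleepqueue->free_next = NULL;
    } else {
        /* Last sleeper: take the channel's queue itself. */
        assert(head->nblocked == 0);
        td->td_sleepqueue = head;
        c->sq = NULL;
    }
    td->td_wchan = NULL;
    td->td_wmesg = NULL;
}
```

In this model, the state seen in the panic (td_wchan and td_wmesg still
set, td_sleepqueue NULL, thread not on any queue) can only be reached if a
thread is made runnable without going through the remove step, which is
the scenario suggested above.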

These are the modifications:
http://www.holm.cc/stress/log/mods.html

The trap 9s are not uncommon for the test suite.

> 
> -- 
> John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

-- 
Peter Holm


