Date: Tue, 4 Nov 2003 19:51:04 +1100 (EST)
From: Bruce Evans
To: John Baldwin
cc: Soren Schmidt
cc: current@freebsd.org
Subject: Re: NULL td passed to propagate_priority() when using xmms...
Message-ID: <20031104192418.Q5684@gamplex.bde.org>
List-Id: Discussions about the use of FreeBSD-current

On Mon, 3 Nov 2003, John Baldwin wrote:

> On 01-Nov-2003 Soren Schmidt wrote:
> > It seems Sean Chittenden wrote:
> >> Howdy.  I'm not sure if this is a ULE bug or a KSE bug, or both, but,
> >> for those interested (this is using ule 1.67, rebuilding world now),
> >> here's my stack.  I couldn't figure out where td was being set to
> >> NULL.  :(  Oh!  Where is TD_SET_LOCK defined?  egrep -r didn't turn up
> >> anything.  -sc
> >
> > Its not ULE, I'm running 4BSD and has gotten this on boot for over a
> > week now, rendering -current totally useless...
>
> Having a kernel panic with INVARIANTS on would really help narrow down
> where the bug is.

I found something that causes this bug fairly reliably:
- configure ddb so that db_print_backtrace() is called on panics.
- break the fd driver so that the panic() in fdstrategy() is called on
  floppy accesses.
- attempt to access a floppy so that fdstrategy() is called.
- db_print_backtrace() then does bad things.  It never completes here,
  though it works in other contexts.  Usually it prints only the first
  line or two.  Then quite often ddb is called for a null pointer panic
  in propagate_priority().

More details about the null pointer panic:

This seems to have nothing to do with scheduling.  propagate_priority()
is not called with a null td of course, but it sometimes follows a null m:

%%%
		/*
		 * Pick up the mutex that td is blocked on.
		 */
		m = td->td_blocked;
		MPASS(m != NULL);

		/*
		 * Check if the thread needs to be moved up on
		 * the blocked chain
		 */
		if (td == TAILQ_FIRST(&m->mtx_blocked)) {
			continue;
		}
%%%

I don't have invariants enabled, so MPASS(m != NULL) doesn't do anything,
but m is null so attempting to load m->mtx_blocked causes a panic.

For the backtrace context, propagate_priority() gets called when
attempting to acquire a lock in softclock().  Tasks like the softclock
task get scheduled despite the system being in panic().  ps seemed to
show that the user process doing the floppy access no longer existed.
I don't know how that could happen, since the panic() is done in the
context of that process.

More details about bugs in db_print_backtrace():

Maybe the stack is messed up.  Attempting to access invalid stack offsets
can cause problems.  My version of db_print_backtrace() has extra code to
attempt not to access invalid offsets, but there is normally no problem
since ddb's trap handler fixes up the problem.  But backtrace() bogusly
calls db_print_backtrace() in non-ddb context and then the longjmp in the
trap handler goes to hyperspace if anywhere.
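
To make the MPASS() point above concrete, here is a minimal userland
sketch, assuming an assertion macro that is compiled out unless INVARIANTS
is defined.  The macro body, struct mtx_sketch, struct thread_sketch and
would_follow_null_mutex() are hypothetical stand-ins for illustration only;
none of this is the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's MPASS(); the real macro lives in
 * the kernel headers and is not reproduced here.
 */
#ifdef INVARIANTS
#define MPASS(ex)	assert(ex)
#else
#define MPASS(ex)	((void)0)
#endif

/* Hypothetical, stripped-down stand-ins for struct mtx and struct thread. */
struct mtx_sketch {
	int	dummy;
};

struct thread_sketch {
	struct mtx_sketch *td_blocked;
};

/*
 * Mirror the first step of the loop body: load td_blocked and assert on it.
 * Returns 1 if the next step (reading through m) would dereference a null
 * pointer, 0 otherwise.
 */
static int
would_follow_null_mutex(const struct thread_sketch *td)
{
	const struct mtx_sketch *m;

	m = td->td_blocked;
	MPASS(m != NULL);	/* checks nothing unless INVARIANTS is defined */
	return (m == NULL);
}
```

Built without -DINVARIANTS, the assertion is a no-op and the function
reports that a thread with a null td_blocked would reach the dereference.
Built with -DINVARIANTS, the assertion fires first, which is what would
turn the null pointer trap into a more informative panic.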

Bugs tripped over while debugging this:

Putting a breakpoint in fdopen() didn't work, because fd.c:fdopen()
conflicts with kern_descrip.c:fdopen().  This was broken in fd.c 1.259.
There are hundreds of similar conflicts in GENERIC, some for obviously
broken things like the same malloc type being static in several files.

Bruce