From owner-freebsd-current Sun Jun 27 1:44:21 1999
Delivered-To: freebsd-current@freebsd.org
Received: from overcee.netplex.com.au (overcee.netplex.com.au [202.12.86.7])
        by hub.freebsd.org (Postfix) with ESMTP id 86EC614C96
        for ; Sun, 27 Jun 1999 01:44:14 -0700 (PDT)
        (envelope-from peter@netplex.com.au)
Received: from netplex.com.au (localhost [127.0.0.1])
        by overcee.netplex.com.au (Postfix) with ESMTP id 24D4D81;
        Sun, 27 Jun 1999 16:44:14 +0800 (WST)
        (envelope-from peter@netplex.com.au)
X-Mailer: exmh version 2.0.2 2/24/98
To: Matthew Dillon
Cc: current@FreeBSD.ORG, mckusick@mckusick.com
Subject: Re: BUF_LOCK() related panic..
In-reply-to: Your message of "Sun, 27 Jun 1999 01:15:43 MST."
             <199906270815.BAA10773@apollo.backplane.com>
Date: Sun, 27 Jun 1999 16:44:14 +0800
From: Peter Wemm
Message-Id: <19990627084414.24D4D81@overcee.netplex.com.au>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Matthew Dillon wrote:
>     Ah, yes, some of us were just discussing this in a small mailing list.
>     Hopefully Kirk will pick up on it soon.  Ah well.. someone else gets to be
>     the brunt of it for a change :-).  Kirk doesn't have an SMP box so he
>     didn't see the bug.
>
>     I have tentatively tracked the problem down to the apparent inability of
>     lockmgr() locks to function from interrupts, even when used in a
>     non-blocking manner, due to the simplelocks it uses internally.  The
>     new buffer cache code Kirk committed switched from B_BUSY (manually
>     implemented locks) to lockmgr() locks.  I think what is going on is
>     that mainline code is getting a simplelock and then an interrupt is
>     coming along and also trying to get the same lock, but I can't be sure
>     because my DDB backtraces are somewhat munged.

It seems to me the main problem (so far) is the buftimelock..

        simple_lock(&buftimelock);
        bp->b_lock.lk_wmesg = buf_wmesg;
        bp->b_lock.lk_prio = PRIBIO + 4;
        bp->b_lock.lk_timo = 0;
        return (lockmgr(&(bp)->b_lock, locktype, &buftimelock, curproc));

Inside lockmgr():

        simple_lock(&lkp->lk_interlock);
        if (flags & LK_INTERLOCK)
                simple_unlock(interlkp);
                              ^^^^^^^^ <--- &buftimelock;

Note that there is no LK_INTERLOCK in any of the calls to lockmgr().
On UP, simplelocks are no-ops.  On SMP, they are real and nothing is ever
freeing buftimelock.

But that doesn't fix the UP problem where cluster_wbuild() tries to
recursively re-lock a buf that the current process already owns.  I have a
few ideas about that one though, I just don't understand the clustering
well enough yet to fix it.

Speaking of SMP and simple locks, I'd like to turn on the debugging
simplelocks that keep a reference count and check before switching to make
sure that a process doesn't sleep holding a lock.  This is a pretty
fundamental sanity check and would have found the LK_INTERLOCK problem above
before it got committed.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
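
For anyone who wants to poke at the interlock hand-off outside the kernel, here is a rough user-space sketch of the two points above: the caller's simplelock is only dropped when the interlock flag is passed, and a count of held simplelocks checked before sleeping would catch the leak. This is not kernel code. Every model_* name and MODEL_LK_INTERLOCK is a hypothetical stand-in invented for illustration, and the real lockmgr() does far more than this.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_LK_INTERLOCK 0x1  /* hypothetical stand-in for LK_INTERLOCK */

/* Toy simplelock: only records whether it is currently held. */
struct model_simplelock {
    bool held;
};

/* Number of simplelocks currently held (models the debugging count). */
static int model_held_count;

static void model_simple_lock(struct model_simplelock *l)
{
    assert(!l->held);           /* a second acquire would spin forever */
    l->held = true;
    model_held_count++;
}

static void model_simple_unlock(struct model_simplelock *l)
{
    assert(l->held);
    l->held = false;
    model_held_count--;
}

/* The sanity check: nothing may be held when a process sleeps. */
static bool model_safe_to_sleep(void)
{
    return model_held_count == 0;
}

struct model_lock {
    struct model_simplelock interlock;  /* plays the role of lk_interlock */
    int exclusive_count;
};

/* Mirrors the quoted fragment: the caller's interlock is released
 * only when the interlock flag is set. */
static int model_lockmgr(struct model_lock *lkp, int flags,
    struct model_simplelock *interlkp)
{
    model_simple_lock(&lkp->interlock);
    if (flags & MODEL_LK_INTERLOCK)
        model_simple_unlock(interlkp);
    lkp->exclusive_count++;
    model_simple_unlock(&lkp->interlock);
    return 0;
}

/* Models the BUF_LOCK() sequence; returns true if no simplelock is
 * still held once model_lockmgr() has returned. */
static bool model_buf_lock_is_clean(int flags)
{
    struct model_simplelock buftime = { false };
    struct model_lock lk = { { false }, 0 };
    bool clean;

    model_simple_lock(&buftime);
    model_lockmgr(&lk, flags, &buftime);
    clean = model_safe_to_sleep();
    if (buftime.held)           /* tidy up so the model can be reused */
        model_simple_unlock(&buftime);
    return clean;
}
```

Calling model_buf_lock_is_clean(0) models the calls as committed (flag missing, caller's lock still held on return), while model_buf_lock_is_clean(MODEL_LK_INTERLOCK) models passing the flag so the hand-off completes.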