From owner-freebsd-current Mon Jun 28 0:56:26 1999 Delivered-To: freebsd-current@freebsd.org Received: from flamingo.McKusick.COM (flamingo.mckusick.com [209.31.233.178]) by hub.freebsd.org (Postfix) with ESMTP id A0DDD153B3; Mon, 28 Jun 1999 00:56:20 -0700 (PDT) (envelope-from mckusick@flamingo.McKusick.COM) Received: from flamingo.McKusick.COM (mckusick@localhost.concentric.net [127.0.0.1]) by flamingo.McKusick.COM (8.9.3/8.9.0) with ESMTP id UAA01061; Sun, 27 Jun 1999 20:47:57 -0700 (PDT) Message-Id: <199906280347.UAA01061@flamingo.McKusick.COM> To: Matthew Dillon Subject: Re: Found the startup panic - ccd ( patch included ) Cc: Alan Cox , Julian Elischer , Mike Smith , Peter Wemm , "John S. Dyson" , dg@root.com, dyson@iquest.net, current@freebsd.org, Greg Lehey In-reply-to: Your message of "Sat, 26 Jun 1999 17:51:05 PDT." <199906270051.RAA05366@apollo.backplane.com> Date: Sun, 27 Jun 1999 20:47:51 -0700 From: Kirk McKusick Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG First off, sorry for the silence. This is gay freedom weekend in San Francisco, and I've been rather busy socially. It was poor timing on my part to commit such a big change just before I was going to be gone all weekend. I have tried to answer several questions below. Date: Sat, 26 Jun 1999 17:51:05 -0700 (PDT) From: Matthew Dillon To: Alan Cox Cc: Julian Elischer , Kirk McKusick , Mike Smith , "John S. Dyson" , dg@root.com, dyson@iquest.net Subject: Found the startup panic - ccd ( patch included ) :I'm seeing the same thing here. : : Alan : :P.S. Julian, perhaps a "mild" warning to -smp would be appropriate. :(I'm off to dinner.) I think I found it. The CCD device is malloc()ing buffers and the new lock is not being initialized. diff included below. Also, in the ccd driver, it was inheriting the B_BUSY by copying b_flags - but Kirk must-a missed it because it was not explicitly specifying B_BUSY. So we also have to call BUF_LOCK() after the BUF_LOCKINIT(). Bleh. I'm not sure I want to know why ccd is malloc()ing its own buffer structures. I'll bet we could improve ccd's performance significantly by having it use getpbuf(). Alan, et all, apply the patch then recompile and reinstall your /usr/src/sys/modules, recompile and reinstall your kernel, and reboot. Kirk, please review and then commit the patch if all looks ok! Thanks! My test machine gets to where I can login, now. I am running more tests. -Matt Matthew Dillon These changes look absolutely correct to me. I see that Peter has committed them. Index: dev/ccd/ccd.c =================================================================== RCS file: /home/ncvs/src/sys/dev/ccd/ccd.c,v retrieving revision 1.48 diff -u -r1.48 ccd.c --- ccd.c 1999/05/30 16:51:18 1.48 +++ ccd.c 1999/06/27 00:33:52 @@ -915,6 +915,8 @@ cbp->cb_buf.b_data = addr; cbp->cb_buf.b_vp = ci->ci_vp; LIST_INIT(&cbp->cb_buf.b_dep); + BUF_LOCKINIT(&cbp->cb_buf); + BUF_LOCK(&cbp->cb_buf, LK_EXCLUSIVE); cbp->cb_buf.b_resid = 0; if (cs->sc_ileave == 0) cbc = dbtob((off_t)(ci->ci_size - cbn)); @@ -947,6 +949,8 @@ cbp->cb_buf.b_dev = ci2->ci_dev; cbp->cb_buf.b_vp = ci2->ci_vp; LIST_INIT(&cbp->cb_buf.b_dep); + BUF_LOCKINIT(&cbp->cb_buf); + BUF_LOCK(&cbp->cb_buf, LK_EXCLUSIVE); cbp->cb_comp = ci2 - cs->sc_cinfo; cb[1] = cbp; /* link together the ccdbuf's and clear "mirror done" flag */ From: Peter Wemm Date: Sun, 27 Jun 1999 16:44:14 +0800 To: Matthew Dillon Subject: Re: BUF_LOCK() related panic.. Cc: current@FreeBSD.ORG, mckusick@mckusick.com In-reply-to: Your message of "Sun, 27 Jun 1999 01:15:43 MST." <199906270815.BAA10773@apollo.backplane.com> It seems to me the main problem (so far) is the buftimelock.. simple_lock(&buftimelock); bp->b_lock.lk_wmesg = buf_wmesg; bp->b_lock.lk_prio = PRIBIO + 4; bp->b_lock.lk_timo = 0; return (lockmgr(&(bp)->b_lock, locktype, &buftimelock, curproc)); Inside lockmgr(): simple_lock(&lkp->lk_interlock); if (flags & LK_INTERLOCK) simple_unlock(interlkp); ^^^^^^^^ <--- &buftimelock; Note that there is no LK_INTERLOCK in any of the calls to lockmgr().. On UP, simplelocks are noops. On SMP, they are real and nothing is ever freeing buftimelock. But that doesn't fix the UP problem where cluster_wbuild() tries to recursively re-lock a buf that the current process already owns. I have a few ideas about that one though, I just don't understand the clustering well enough yet to fix it. Speaking of SMP and simple locks, I'd like to turn on the debugging simplelocks that keep a reference count and check before switching to make sure that a process doesn't sleep holding a lock. This is a pretty fundamental sanity check and would have found the LK_INTERLOCK problem above before it got committed. Cheers, -Peter -- Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au The above along with the splbio locking is the correct fix. I see that Peter has already committed it. On the subject of BUF_KERNPROC, this must be done as I have coded it, not later in biodone. The problem is when the filesystem does a readahead. It starts the I/O on the buffer that it really wants, then starts ASYNC reads on the readahead buffers. When it finishes with the buffer that it wanted and the application enters the kernel to get the next buffer, the filesystem will attempt to lock the buffer on which it earlier initiated the read. If the I/O has not yet finished one of two bad things will happen. If the lock is not recursive, it will panic with `locking against myself' since it will hold the lock from the time when it initiated the readahead. If recursive locking is allowed, then it will get the buffer (since it holds the exclusive lock) and use it before it has been filled. So, it is semantically important that ASYNC read buffers be disowned (i.e., given to the kernel) at the time they are created. That way, only the kernel can access the buffer until the read I/O is done. I agree that this code is ugly, but it is necessary. From: Peter Wemm Date: Mon, 28 Jun 1999 06:47:13 +0800 To: Matthew Dillon Subject: Re: BUF_LOCK() related panic.. Cc: current@FreeBSD.ORG, mckusick@mckusick.com In-reply-to: Your message of "Sun, 27 Jun 1999 13:06:13 MST." <199906272006.NAA15499@apollo.backplane.com> Matthew Dillon wrote: > : > :But that doesn't fix the UP problem where cluster_wbuild() tries to > :recursively re-lock a buf that the current process already owns. I have a > :few ideas about that one though, I just don't understand the clustering > :well enough yet to fix it. > > Ok, I just hit this testing the lockmgr changes. > > I think the problem is that cluster_wbuild's algorithm was polluted > a little by Kirk's commit. > > Previously it tested for B_BUSY to determine if it could then lock it > to include in the cluster. Yep, I was aware of this before, but I didn't rip it out since I didn't know what Kirk's intentions were. I'm assuming he's got his experience with making the BSD/OS vfs reentrant in mind, so I don't want to break anything that gets closer to that. A seperate test and then lock would not be reentrant. (Sure, there are far bigger problems than this, but every bit helps when we get there) > Kirk changed this to actually attempt a lock, and then include it > if the lock succeeded and not include it if the lock failed. > > The problem is that if the buffer was already locked by the same process, > this change results in a panic instead of a simple failure to obtain the > lock. > The solution is to re-tool the code to use the original algorithm ( test > the lock before trying to get it, rather then simply trying to get it ), > but with the new locks. I do not have time today to do this but I believ e > I have given sufficient information for Peter, Kirk, or Alan to make the > fix. Actually, I think there is another set if missing BUF_KERNPROC() calls, cluster_callback() frees buffers, so all buffers submitted with it had better be reassigned. This is (I think) part of the problem that cluster_wbuild() is hitting - things were supposed to have been reassigned but are still hanging onto the current process. > I believe there are two or three areas in Kirk's patchset where he > replaced an explicit test with an attempt to actually gain the lock where > this sort of panic can occur. I think Kirk was trying to optimize the > code :-) Heh heh. Just goes to show that combining functional > replacements with optimizations all in one go does not always work. > > -Matt > Matthew Dillon > > > :Cheers, > :-Peter > :-- > :Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au > > > Cheers, -Peter -- Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au The BUF_LOCK cannot simply be replaced by a check to see if the buffer has a non-zero count. If recursive locks are allowed, they need to be taken into consideration. If shared locks are involved there may be multiple of them allowed (though obviously they are not yet used). I propose the following change to the buffer locking code to basically avoid the locking against myself panic when doing a polling lock (since if recursion is not allowed it will return an error rather than blocking). I am going to leave this running on my test machine overnight to check for problems. I am committing it now as I believe it to be correct and that it will solve some of the locking problems that we have seen. Index: kern_lock.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_lock.c,v retrieving revision 1.26 diff -c -r1.26 kern_lock.c *** kern_lock.c 1999/06/26 02:46:00 1.26 --- kern_lock.c 1999/06/28 07:28:21 *************** *** 337,348 **** * Recursive lock. */ #if !defined(MAX_PERF) ! if ((extflags & LK_CANRECURSE) == 0) panic("lockmgr: locking against myself"); #endif ! lkp->lk_exclusivecount++; ! COUNT(p, 1); ! break; } /* * If we are just polling, check to see if we will sleep. --- 337,350 ---- * Recursive lock. */ #if !defined(MAX_PERF) ! if ((extflags & (LK_NOWAIT | LK_CANRECURSE)) == 0) panic("lockmgr: locking against myself"); #endif ! if ((extflags & LK_CANRECURSE) != 0) { ! lkp->lk_exclusivecount++; ! COUNT(p, 1); ! break; ! } } /* * If we are just polling, check to see if we will sleep. I do not see the problem that you are pointing out with missing BUF_KERNPROC in cluster_callback, but it is well past midnight, so I may not be thinking clearly. Greg Lehey has sent me a panic with the buffer locking in the NFS code. I am too tired to attack it tonight, but will look at it in the morning. Kirk To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message