From owner-freebsd-current  Mon Jun 28  0:56:26 1999
Delivered-To: freebsd-current@freebsd.org
Received: from flamingo.McKusick.COM (flamingo.mckusick.com [209.31.233.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id A0DDD153B3; Mon, 28 Jun 1999 00:56:20 -0700 (PDT)
	(envelope-from mckusick@flamingo.McKusick.COM)
Received: from flamingo.McKusick.COM (mckusick@localhost.concentric.net [127.0.0.1])
	by flamingo.McKusick.COM (8.9.3/8.9.0) with ESMTP id UAA01061;
	Sun, 27 Jun 1999 20:47:57 -0700 (PDT)
Message-Id: <199906280347.UAA01061@flamingo.McKusick.COM>
To: Matthew Dillon <dillon@apollo.backplane.com>
Subject: Re: Found the startup panic - ccd ( patch included ) 
Cc: Alan Cox <alc@cs.rice.edu>, Julian Elischer <julian@whistle.com>,
	Mike Smith <mike@smith.net.au>, Peter Wemm <peter@freebsd.org>,
	"John S. Dyson" <toor@dyson.iquest.net>, dg@root.com,
	dyson@iquest.net, current@freebsd.org, Greg Lehey <grog@lemis.com>
In-reply-to: Your message of "Sat, 26 Jun 1999 17:51:05 PDT."
             <199906270051.RAA05366@apollo.backplane.com> 
Date: Sun, 27 Jun 1999 20:47:51 -0700
From: Kirk McKusick <mckusick@flamingo.McKusick.COM>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

First off, sorry for the silence. This is gay freedom weekend
in San Francisco, and I've been rather busy socially. It was
poor timing on my part to commit such a big change just before
I was going to be gone all weekend. I have tried to answer
several questions below.

	Date: Sat, 26 Jun 1999 17:51:05 -0700 (PDT)
	From: Matthew Dillon <dillon@apollo.backplane.com>
	To: Alan Cox <alc@cs.rice.edu>
	Cc: Julian Elischer <julian@whistle.com>,
		Kirk McKusick <mckusick@flamingo.mckusick.com>,
		Mike Smith <mike@smith.net.au>,
		"John S. Dyson" <toor@dyson.iquest.net>, dg@root.com,
		dyson@iquest.net
	Subject: Found the startup panic - ccd ( patch included )

	:I'm seeing the same thing here.
	:
	:	Alan
	:
	:P.S. Julian, perhaps a "mild" warning to -smp would be appropriate.
	:(I'm off to dinner.)

	I think I found it.  The CCD device is malloc()ing buffers and
	the new lock is not being initialized.  diff included below.
	Also, in the ccd driver, it was inheriting the B_BUSY by copying
	b_flags - but Kirk must-a missed it because it was not explicitly
	specifying B_BUSY.  So we also have to call BUF_LOCK() after the 
	BUF_LOCKINIT().

	Bleh.  I'm not sure I want to know why ccd is malloc()ing its own
	buffer structures.  I'll bet we could improve ccd's performance 
	significantly by having it use getpbuf().

	Alan, et all, apply the patch then recompile and reinstall your
	/usr/src/sys/modules, recompile and reinstall your kernel, and reboot.

	Kirk, please review and then commit the patch if all looks ok!  Thanks!

	My test machine gets to where I can login, now.  I am running more
	tests.

					    -Matt
					    Matthew Dillon 
						<dillon@backplane.com>


These changes look absolutely correct to me. I see that Peter has
committed them.

	Index: dev/ccd/ccd.c
	===================================================================
	RCS file: /home/ncvs/src/sys/dev/ccd/ccd.c,v
	retrieving revision 1.48
	diff -u -r1.48 ccd.c
	--- ccd.c	1999/05/30 16:51:18	1.48
	+++ ccd.c	1999/06/27 00:33:52
	@@ -915,6 +915,8 @@
		cbp->cb_buf.b_data = addr;
		cbp->cb_buf.b_vp = ci->ci_vp;
		LIST_INIT(&cbp->cb_buf.b_dep);
	+	BUF_LOCKINIT(&cbp->cb_buf);
	+	BUF_LOCK(&cbp->cb_buf, LK_EXCLUSIVE);
		cbp->cb_buf.b_resid = 0;
		if (cs->sc_ileave == 0)
		       cbc = dbtob((off_t)(ci->ci_size - cbn));
	@@ -947,6 +949,8 @@
			cbp->cb_buf.b_dev = ci2->ci_dev;
			cbp->cb_buf.b_vp = ci2->ci_vp;
			LIST_INIT(&cbp->cb_buf.b_dep);
	+		BUF_LOCKINIT(&cbp->cb_buf);
	+		BUF_LOCK(&cbp->cb_buf, LK_EXCLUSIVE);
			cbp->cb_comp = ci2 - cs->sc_cinfo;
			cb[1] = cbp;
			/* link together the ccdbuf's and clear "mirror done" flag */


    From:    Peter Wemm <peter@netplex.com.au>
    Date:    Sun, 27 Jun 1999 16:44:14 +0800
    To:      Matthew Dillon <dillon@apollo.backplane.com>
    Subject: Re: BUF_LOCK() related panic.. 
    Cc: current@FreeBSD.ORG, mckusick@mckusick.com
    In-reply-to: Your message of "Sun, 27 Jun 1999 01:15:43 MST."
		 <199906270815.BAA10773@apollo.backplane.com> 

    It seems to me the main problem (so far) is the buftimelock..

	    simple_lock(&buftimelock);
	    bp->b_lock.lk_wmesg = buf_wmesg;
	    bp->b_lock.lk_prio = PRIBIO + 4;
	    bp->b_lock.lk_timo = 0; 
	    return (lockmgr(&(bp)->b_lock, locktype, &buftimelock, curproc));

    Inside lockmgr():

	    simple_lock(&lkp->lk_interlock);
	    if (flags & LK_INTERLOCK)
		    simple_unlock(interlkp);
				  ^^^^^^^^ <--- &buftimelock;

    Note that there is no LK_INTERLOCK in any of the calls to lockmgr()..  On
    UP, simplelocks are noops.  On SMP, they are real and nothing is ever
    freeing buftimelock.

    But that doesn't fix the UP problem where cluster_wbuild() tries to
    recursively re-lock a buf that the current process already owns.  I have a
    few ideas about that one though, I just don't understand the clustering
    well enough yet to fix it.

    Speaking of SMP and simple locks, I'd like to turn on the debugging
    simplelocks that keep a reference count and check before switching to make
    sure that a process doesn't sleep holding a lock.  This is a pretty
    fundamental sanity check and would have found the LK_INTERLOCK problem
    above before it got committed.

    Cheers,
    -Peter
    --
    Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au

The above along with the splbio locking is the correct fix.
I see that Peter has already committed it.

On the subject of BUF_KERNPROC, this must be done as I have coded
it, not later in biodone. The problem is when the filesystem does
a readahead. It starts the I/O on the buffer that it really wants,
then starts ASYNC reads on the readahead buffers. When it finishes
with the buffer that it wanted and the application enters the kernel
to get the next buffer, the filesystem will attempt to lock the
buffer on which it earlier initiated the read. If the I/O has not
yet finished one of two bad things will happen. If the lock is not
recursive, it will panic with `locking against myself' since it will
hold the lock from the time when it initiated the readahead. If
recursive locking is allowed, then it will get the buffer (since
it holds the exclusive lock) and use it before it has been filled.
So, it is semantically important that ASYNC read buffers be disowned
(i.e., given to the kernel) at the time they are created. That way,
only the kernel can access the buffer until the read I/O is done.
I agree that this code is ugly, but it is necessary.

    From:    Peter Wemm <peter@netplex.com.au>
    Date:    Mon, 28 Jun 1999 06:47:13 +0800
    To:      Matthew Dillon <dillon@apollo.backplane.com>
    Subject: Re: BUF_LOCK() related panic.. 
    Cc: current@FreeBSD.ORG, mckusick@mckusick.com
    In-reply-to: Your message of "Sun, 27 Jun 1999 13:06:13 MST."
		 <199906272006.NAA15499@apollo.backplane.com> 

    Matthew Dillon wrote:
    > :
    > :But that doesn't fix the UP problem where cluster_wbuild() tries to
    > :recursively re-lock a buf that the current process already owns.  I have a
    > :few ideas about that one though, I just don't understand the clustering
    > :well enough yet to fix it.
    > 
    >     Ok, I just hit this testing the lockmgr changes.
    > 
    >     I think the problem is that cluster_wbuild's algorithm was polluted
    >     a little by Kirk's commit.
    > 
    >     Previously it tested for B_BUSY to determine if it could then lock it
    >     to include in the cluster.

    Yep, I was aware of this before, but I didn't rip it out since I didn't
    know what Kirk's intentions were.  I'm assuming he's got his experience
    with making the BSD/OS vfs reentrant in mind, so I don't want to break
    anything that gets closer to that.  A seperate test and then lock would
    not be reentrant.  (Sure, there are far bigger problems than this, but
    every bit helps when we get there)

    >     Kirk changed this to actually attempt a lock, and then include it
    >     if the lock succeeded and not include it if the lock failed.
    > 
    >     The problem is that if the buffer was already locked by the same process,
    >     this change results in a panic instead of a simple failure to obtain the
    >     lock.

    >     The solution is to re-tool the code to use the original algorithm ( test
    >     the lock before trying to get it, rather then simply trying to get it ),
    >     but with the new locks.  I do not have time today to do this but I believ
	e
    >     I have given sufficient information for Peter, Kirk, or Alan to make the
    >     fix.

    Actually, I think there is another set if missing BUF_KERNPROC() calls,
    cluster_callback() frees buffers, so all buffers submitted with it had
    better be reassigned.  This is (I think) part of the problem that
    cluster_wbuild() is hitting - things were supposed to have been reassigned
    but are still hanging onto the current process.

    >     I believe there are two or three areas in Kirk's patchset where he 
    >     replaced an explicit test with an attempt to actually gain the lock where
    >     this sort of panic can occur.  I think Kirk was trying to optimize the
    >     code :-)  Heh heh.  Just goes to show that combining functional 
    >     replacements with optimizations all in one go does not always work.
    > 
    > 					-Matt
    > 					Matthew Dillon 
    > 					<dillon@backplane.com>
    > 
    > :Cheers,
    > :-Peter
    > :--
    > :Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
    > 
    > 
    > 

    Cheers,
    -Peter
    --
    Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au

The BUF_LOCK cannot simply be replaced by a check to see if the
buffer has a non-zero count. If recursive locks are allowed, they
need to be taken into consideration. If shared locks are involved
there may be multiple of them allowed (though obviously they are
not yet used). I propose the following change to the buffer locking
code to basically avoid the locking against myself panic when doing
a polling lock (since if recursion is not allowed it will return
an error rather than blocking). I am going to leave this running
on my test machine overnight to check for problems. I am committing 
it now as I believe it to be correct and that it will solve some
of the locking problems that we have seen.

Index: kern_lock.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.26
diff -c -r1.26 kern_lock.c
*** kern_lock.c	1999/06/26 02:46:00	1.26
--- kern_lock.c	1999/06/28 07:28:21
***************
*** 337,348 ****
  			 *	Recursive lock.
  			 */
  #if !defined(MAX_PERF)
! 			if ((extflags & LK_CANRECURSE) == 0)
  				panic("lockmgr: locking against myself");
  #endif
! 			lkp->lk_exclusivecount++;
! 			COUNT(p, 1);
! 			break;
  		}
  		/*
  		 * If we are just polling, check to see if we will sleep.
--- 337,350 ----
  			 *	Recursive lock.
  			 */
  #if !defined(MAX_PERF)
! 			if ((extflags & (LK_NOWAIT | LK_CANRECURSE)) == 0)
  				panic("lockmgr: locking against myself");
  #endif
! 			if ((extflags & LK_CANRECURSE) != 0) {
! 				lkp->lk_exclusivecount++;
! 				COUNT(p, 1);
! 				break;
! 			}
  		}
  		/*
  		 * If we are just polling, check to see if we will sleep.

I do not see the problem that you are pointing out with missing
BUF_KERNPROC in cluster_callback, but it is well past midnight,
so I may not be thinking clearly.

Greg Lehey has sent me a panic with the buffer locking in the NFS code.
I am too tired to attack it tonight, but will look at it in the morning.

	Kirk


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message