Date:      Sat, 27 Oct 2001 15:24:40 -0400
From:      Jules Gilbert <jules@aasp.net>
To:        freebsd-questions@freebsd.org
Cc:        pg@eth1.com, wfaxon@gis.net, david@catwhisker.org, green@freebsd.org, mckusick@mckusick.com
Subject:   panic: bqrelse: multiple ref .. thought fixed a year ago
Message-ID:  <3BDB09F8.F6E0D3A0@aasp.net>

Hello folks:

We are having a big problem that is interfering with a lot of
things.  We are running FreeBSD 4.3 and are seeing the infamous
"bqrelse: multiple refs" panic.
The machine panics, dumps, and syncs disks.  We thought this was fixed
over a year ago in vfs_bio.c.

This problem occurred most recently when I exited a remote ssh session.
The exit took several seconds, which made me suspect something was
wrong.  I then logged back in, and sure enough, we had an 'sh.core' dump
file (of zero size) and my running jobs had died.  (The machine dumped.)

Later, I could not get in at all (of course, the machine was dead at
that point), and the other machines doing NFS writes failed as well.  The
NFS structure is:

The machine with the problem, call it PRIME1, serves NFS to 6 other
FreeBSD 4.3 machines as clients.  They ALL mount PRIME1's /mnt/public, and
all 6 write their own files into this directory.

So, this morning, searching the net, we found several references to
"bqrelse", but none of them said what the fix was.  Does a fix exist?
By the way, I maintain multiple FreeBSD boxes and do lots of NFS
activity, in addition to my occasional SSH login.

I am willing to make queues larger, change parameters, or do whatever
else it takes to make this work.

Pls help us.

===================================================================
Our search netted the following from July 2000:

Search Result 1
From: Kirk McKusick (mckusick@mckusick.com)
Subject: Re: Panic: bqrelse: multiple refs
Newsgroups: mailing.freebsd.current
Date: 2000/07/26


Date: Tue, 25 Jul 2000 11:47:03 -0400 (EDT)
 From: Brian Fundakowski Feldman <green@FreeBSD.org>
 To: Ollivier Robert <roberto@eurocontrol.fr>
 Cc: "FreeBSD Current Users' list" <freebsd-current@FreeBSD.org>,
  mckusick@mckusick.com
 Subject: Panic: lockmgr: pid 5, not exclusive lock holder 0 unlocking
 In-Reply-To: <20000725170455.F636@caerdonn.eurocontrol.fr>

 On Tue, 25 Jul 2000, Ollivier Robert wrote:

 > According to Brian Fundakowski Feldman:
 > > Actually, I'm pretty certain this is the fix:
 >
 > Well, it won't panic, but isn't it putting the problem under the
 > carpet?
 > I agree the panic seems to be here temporarily, but...

 No, I'm really certain this isn't the case.  You see, struct buf has
 a b_lock that until recently was a plain, exclusive lockmgr lock.  In
 Kirk's last round of changes, he converted b_lock to be LK_CANRECURSE,
 which means that the lock, while still an exclusive lock, may be
 relocked multiple times by the same caller.

 The panics are plain wrong.  What's left is to determine what is the
 proper thing to do in each of these cases, which I'm certain many
 people already know (you see, I'm still a bit green ;). What I
 am _almost_ sure about is that the right thing is just to remove one
 of the locks and let it get freed back up the call chain.  I'm almost
 certain this is the case because if you are grabbing exclusive locks
 and recursing upon them, your call chain is the only consumer and in
 a recursive-locking-callchain, you will have multiple symmetric lock
 and unlock pairs.  Anything else horribly complicates things, and this
 makes me a good 95% certain that this is exactly the right fix, not
 that it's sweeping any true bugs under the carpet.

 Allowing recursive locks is pretty much the only way to solve many of
 the problems here because it's simply not possible to support all code
 paths without allowing for this recursion.  The code would either be
 horribly complicated or non-functional.  I'm certain Kirk may be able
 to back me up here.  It seems that the cleanup is meant to make the
 locks recursive mostly to facilitate correct/proper call chains, and
 that's consistent with my understanding at least :)

 Indeed, if you look at the comment in brelse() from the delta, you
 will see that the intention of allowing this very situation to occur
 and simply BUF_UNLOCK() was planned for and the panic()s were for
 debugging during the previous time that b_locks weren't LK_CANRECURSE.

 As always, take what I say with a grain of salt since I'm definitely
 not a VFS guru in any manner; I just happen to think I understand this
 one :)

 > --
 > Ollivier ROBERT -=- Eurocontrol EEC/ITM -=- Ollivier.Robert@eurocontrol.fr
 > The Postman hits! The Postman hits! You have new mail.

 --
  Brian Fundakowski Feldman           \  FreeBSD: The Power to Serve!  /
  green@FreeBSD.org                    `------------------------------'

The above explanation is correct. When I made the change to allow
recursive buffer locks, I should have removed that panic (but forgot
that I had put it in there, sigh). I have just made the change on
freefall. Sorry for the problems caused by that change.

 Kirk McKusick




