Date:      Sun, 27 Aug 2006 22:28:04 +0200
From:      Greg Byshenk <gbyshenk@byshenk.net>
To:        Michael Abbott <michael@araneidae.co.uk>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: NFS locking: lockf freezes (rpc.lockd problem?)
Message-ID:  <20060827202803.GP633@core.byshenk.net>
In-Reply-To: <20060827183903.G52383@saturn.araneidae.co.uk>
References:  <20060827102135.B49194@saturn.araneidae.co.uk> <20060827135434.GH79046@deviant.kiev.zoral.com.ua> <20060827183903.G52383@saturn.araneidae.co.uk>

On Sun, Aug 27, 2006 at 07:17:34PM +0000, Michael Abbott wrote:
> On Sun, 27 Aug 2006, Kostik Belousov wrote:

> >Make sure that rpc.statd is running.
> Yep.  Took me some while to figure that one out, but the first lockf test 
> failed without that.
 
[...]
 
> As for the other test, let's have a look.  Here we are before the test 
> (NFS server, 4.11, is saturn, test machine, 6.1, is venus):
 
> saturn$ ps auxww | grep rpc\\.
> root    48917  0.0  0.1   980  640  ??  Is    7:56am   0:00.01 rpc.lockd
> root      115  0.0  0.1 263096  536  ??  Is   18Aug06   0:00.00 rpc.statd
 
[...]
 
> Well, how odd: as soon as I start the test process 515 on venus goes away. 
> Now to wait for it to fail... (doesn't take too long):
 
[...] 
 
> In conclusion: I agree with Greg Byshenk that the NFS server is bound to 
> be the one at fault, BUT, is this "freeze until reboot" behaviour really 
> what we want?  I remain astonished (and irritated) that `kill -9` doesn't 
> work!

The problem here is that the process is waiting for something, and 
thus not listening to signals (including your 'kill').

I'm not an expert on this, but my first guess would be that saturn (your
server) is offering something that it can't deliver.  That is, the client
asks the server "can you do X?", and the server says "yes I can", so the
client says "do X" and waits -- and the server never does it.
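
If it helps to watch that wait from the client side, here is a minimal
probe, assuming Python is available on the client.  The function name
probe_lock is just my own label, not part of any tool mentioned in this
thread.  It asks for an exclusive lock once, without queueing behind
other holders, and reports what came back.  Note the assumption: on an
NFS mount the request still has to go through the lock daemons, so if
the server never answers, this call may well hang too, which would at
least confirm where the wait is.

```python
import errno
import fcntl
import os


def probe_lock(path):
    """Make one non-blocking attempt at an exclusive lock on path.

    Returns "ok" if the lock was granted (it is released again at once)
    or "busy" if another process already holds a conflicting lock.
    Any other error is passed on to the caller.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB: fail immediately instead of waiting for a holder.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            # POSIX allows either errno for "someone else has the lock".
            if e.errno in (errno.EACCES, errno.EAGAIN):
                return "busy"
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "ok"
    finally:
        os.close(fd)
```

Calling probe_lock() on a scratch file under the NFS mount (the path is
whatever you choose) and comparing the result with the same call on a
local file should show whether only the NFS case stalls.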

Or alternatively (based on your rpc.statd dying), rpc.lockd on your
client is trying to use rpc.statd to communicate with your server.  And
it starts successfully, but then rpc.statd dies (for some reason) and
your lock ends up waiting forever for it to answer.


I would recommend starting both rpc.lockd and rpc.statd with the '-d'
flag, to see if this provides any information as to what is going on.
There may well be a bug somewhere, but you need to find where it is.
I suspect that it is not actually in rpc.statd, as nothing in the
source has changed since January 2005.

An alternative would be to update to RELENG_6 (or at least RELENG_6_1)
and then try again.


-- 
greg byshenk  -  gbyshenk@byshenk.net  -  Leiden, NL


