Date:      Wed, 5 Jul 2006 23:49:13 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Francisco Reyes <lists@stringsutils.com>
Cc:        freebsd-stable@freebsd.org, Michel Talon <talon@lpthe.jussieu.fr>
Subject:   Re: NFS Locking Issue
Message-ID:  <20060705234514.I70011@fledge.watson.org>
In-Reply-To: <cone.1152136419.991036.72616.1000@zoraida.natserv.net>
References:  <E1FxzUU-000MMw-5m@cs1.cs.huji.ac.il> <20060705100403.Y80381@fledge.watson.org> <cone.1152136419.991036.72616.1000@zoraida.natserv.net>

On Wed, 5 Jul 2006, Francisco Reyes wrote:

>> can you trigger it using work on just one client against a server, without 
>> client<->client interactions?  This makes tracking and reproduction a lot 
>> easier
>
> Personally I am experiencing two problems.
> 1- NFS clients freeze/hang if the server goes away.
> We have clients with several mounts so if one of the servers dies then the 
> entire operation of the client is put in jeopardy.
>
> This I can reproduce every single time with a 6.X client.. with both a 5.X 
> and a 6.X server.
>
> "umount -f" hangs too.

The problems you are experiencing are almost certainly not related to 
rpc.lockd, but rather to bugs in the NFS client.

Let's just look at the normal use hang for now, and revisit umount -f after 
that.

>> as multi-client test cases are really tricky!
>
> The second case only happens under heavy load and restarting nfsd makes it 
> go away. Basically 'b' column in vmstat goes high and the performance of 
> the machine falls to the floor.
>
> Going to try 
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> And reading up on how to debug with DDB. Have another user who volunteered 
> to give me some pointers.. so will try that.. so I am able to actually 
> produce more helpful info.

If you can get into DDB when the hang has occurred, output via serial console 
for the following commands would be very helpful:

show pcpu
show allpcpu
ps
trace
traceall
show locks
show alllocks
show uma
show malloc
show lockedvnods

Note that show locks and show alllocks will only work if you compile WITNESS 
in -- WITNESS significantly changes kernel timing, so you may find it closes 
whatever race you're running into.  If you can reproduce the problem with 
WITNESS and 
INVARIANTS, that would be very useful.  The above output will hopefully tell 
us the basic state of the system with respect to processes, threads, locking, 
and so on, and may help us track things down.  For the above, you definitely 
want a serial console as it will be quite a bit of output.
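
As a rough sketch of what the above implies for the kernel build, assuming a 
custom kernel configuration file derived from GENERIC (the option names below 
come from the stock FreeBSD NOTES file, not from this thread, so check them 
against your release):

   # Assumed additions to a custom kernel config; verify against NOTES
   options         KDB                 # kernel debugger framework
   options         DDB                 # interactive DDB debugger backend
   options         INVARIANTS          # extra run-time consistency checks
   options         INVARIANT_SUPPORT   # support code required by INVARIANTS
   options         WITNESS             # lock order checking; enables show locks

Rebuild and install the kernel after adding these, and expect a noticeable 
slowdown while WITNESS is enabled.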

Also, can you send the output of the 'mount' command from the un-hung state? 
I notice a lot of threads stuck in 'ufs'.

Finally, while doing the above, please disable background file system 
checking by placing the following in /etc/rc.conf:

   background_fsck="NO"

Then boot to single user mode and run a full fsck -p before going multi-user, 
to make sure the file system is in a good state before beginning.

Robert N M Watson
Computer Laboratory
University of Cambridge


