Date:      Thu, 18 Mar 1999 22:49:10 +0800
From:      Peter Wemm <peter@netplex.com.au>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Pierre Beyssac <beyssac@enst.fr>, freebsd-current@FreeBSD.ORG
Subject:   Re: panic: vfs_busy: unexpected lock failure 
Message-ID:  <199903181449.WAA33699@spinner.netplex.com.au>
In-Reply-To: Your message of "Wed, 17 Mar 1999 17:11:00 PST." <199903180111.RAA34092@apollo.backplane.com> 

Matthew Dillon wrote:
> :On Tue, Mar 16, 1999 at 12:52:32PM -0800, Matthew Dillon wrote:
> :>     Ahhhh..  And if you make those AMD mounts normal nfs mounts it doesn't
> :>     fry?  If so, then we have a bug in AMD somewhere.
> :
> :I tried the cp several times again on a regular NFS mount, to make
> :sure, and no, it doesn't seem to panic. So yes, that seems to be
> :AMD-related.  Can't it be in the vfs layer though?
> :-- 
> :Pierre Beyssac		pb@enst.fr
> 
>     It's probably AMD.  I'm not really up on how AMD works... hasn't someone
>     done some work on it recently to fix other breakages?  Maybe they could
>     look at this panic.

AMD is easy to upset, and that's bad because it's holding a mountpoint in /
(i.e. /host), which is often hit by every single getcwd() call when it
does an lstat("/host"...) or whatever.  I think this is the single largest
source of load on the amd process.
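
To make the mechanism concrete, here is a rough sketch of a userland
getcwd() that climbs ".." and lstat()s the entries of each parent directory
until it finds the one it came from.  This is an illustration only, not the
FreeBSD libc code: the function name walk_up_getcwd and the probed list are
made up for the example, and a real implementation may skip the lstat()
calls when the parent is on the same device.  The point is just that
siblings of the path, such as /host in /, can get stat'ed along the way.

```python
import os

def walk_up_getcwd(start=".", probed=None):
    """Rebuild the absolute path of `start` by climbing ".." one level at a time.

    Hypothetical helper: `probed`, if given, collects the name of every
    directory entry that gets lstat()ed on the way up.
    """
    names = []
    path = start
    cur = os.stat(path)
    while True:
        parent = os.path.join(path, "..")
        up = os.stat(parent)
        if (up.st_dev, up.st_ino) == (cur.st_dev, cur.st_ino):
            break  # ".." of the root directory is the root itself
        match = None
        for name in os.listdir(parent):
            if probed is not None:
                probed.append(name)
            try:
                # In /, this is where a mountpoint such as /host gets touched.
                st = os.lstat(os.path.join(parent, name))
            except OSError:
                continue
            if (st.st_dev, st.st_ino) == (cur.st_dev, cur.st_ino):
                match = name
                break
        if match is None:
            raise FileNotFoundError("no entry in parent matches " + path)
        names.append(match)
        path, cur = parent, up
    return "/" + "/".join(reversed(names))
```

Calling walk_up_getcwd(".", probed) and then inspecting probed shows which
names were stat'ed in each directory on the way to the root, including any
entries of / that were listed before the matching one.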

The other problem is that amd is an RPC client: it depends on the libc RPC
code for robustness, and "robust" is not the first word that springs to mind
when I think of it...  When amd hangs on a DNS lookup, there are all sorts
of VFS locking cascades and NFS wedges while the kernel keeps retransmitting
packets to amd's pseudo-NFS server port.  It's been found to be the primary
cause of the 'nfsrcv' hangs: processes wedged in getcwd()-style situations
trying to stat /host.

IMHO, /host needs to move down a level to get it out of the way of 
getcwd().  NFS mounts should probably move away from / as well, as they 
cause traffic on each getcwd().

I think the default settings should look something like this:

/net			- amd and nfs related stuff
/net/sysname/mount1	- nfs mount created by amd
/net/sysname/mount2	- nfs mount created by amd
/net/host		- /host lives here instead.

and a symlink:
/host -> /net/host

I think that'll stop amd from being hammered by all those lstat() calls in
getcwd() and friends in the root directory.

And instead of mounting NFS filesystems as /a, mount them as /net/a and
use a symlink (/a -> /net/a).
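
As a small illustration of the proposed layout, the sketch below builds it
under a scratch directory standing in for / and then lstat()s each
top-level entry.  The helper names build_layout and top_level_kinds are
made up for the example, plain directories stand in for the amd and NFS
mountpoints, and the symlink targets are relative so they resolve inside
the scratch tree (the real ones would point at /net/host and /net/a).
What it shows is that lstat() on /host or /a then sees only a local
symlink and does not follow it down to the mountpoint.

```python
import os
import stat

def build_layout(root):
    """Create the proposed tree under `root` (a stand-in for /)."""
    for sub in ("net/sysname/mount1", "net/sysname/mount2", "net/host", "net/a"):
        os.makedirs(os.path.join(root, sub))
    # Compatibility symlinks at the top level, as in "/host -> /net/host".
    os.symlink("net/host", os.path.join(root, "host"))
    os.symlink("net/a", os.path.join(root, "a"))

def top_level_kinds(root):
    """lstat() every entry directly under `root` and classify it."""
    kinds = {}
    for name in os.listdir(root):
        mode = os.lstat(os.path.join(root, name)).st_mode
        if stat.S_ISLNK(mode):
            kinds[name] = "symlink"
        elif stat.S_ISDIR(mode):
            kinds[name] = "directory"
        else:
            kinds[name] = "other"
    return kinds
```

With this layout the only top-level entries are the ordinary directory net
and the two symlinks, so a scan of the root directory with lstat() stays on
the local filesystem.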

This isn't a "fix", it's just trying to move a particularly weak link out
of the direct line of fire.  A real solution would be a proper userfs
interface that could cope with kernel<->user_process protocol timeouts,
process deaths, etc.  Of course, there's always the option of an in-kernel
autofs, etc.

Cheers,
-Peter







