Date:      Fri, 18 Oct 1996 14:46:27 +0100 (BST)
From:      Doug Rabson <dfr@render.com>
To:        Michael Hancock <michaelh@cet.co.jp>
Cc:        Karl Denninger <karl@Mcs.Net>, freebsd-hackers@FreeBSD.org
Subject:   Re: NFS node: disappearing directory
Message-ID:  <Pine.BSF.3.95.961018144130.14906D-100000@minnow.render.com>
In-Reply-To: <Pine.SV4.3.93.961018143541.1217A-100000@parkplace.cet.co.jp>

On Fri, 18 Oct 1996, Michael Hancock wrote:

> On Thu, 17 Oct 1996, Karl Denninger wrote:
> 
> > Background:	Server is a BSDI 2.0 machine
> > 		Client is a FreeBSD 2.2-CURRENT system
> > 
> > Symptom:	Randomly, "getcwd()" fails.
> > 
> > Analysis thus far:
> > 		The search up-directory in "getcwd()" walks up the directory
> > 		tree from "." to the root (determined by the device numbers
> > 		and inodes for root when it gets there) looking at each path
> > 		component and inserting it in the returned string.
> > 
> > 		Thus, what happens in getcwd() is this:
> > 
> > 			Save our inode number
> > 			Open ".."
> > 			Read through it, looking for the inode number.
> > 			Save the path component
> > 
> > 			Iterate until you reach "/"
> > 
> > 			Return the path to the user.
> > 
> > 		Now, some problems we've found.
> > 
> > 		1) When it fails, the up-movement works but the inode
> > 		number is NOT seen in the FIRST component when the directory
> > 		read is performed.  (i.e., the directory is
> > 		"/user/contrib/swilson", the first component is 'swilson',
> > 		and that is not found at the first "step-up".)
> 
> Is this remote /user mounted on local /user?  This might not be relevant,
> but it helps in understanding the execution path.
>  
> > 		2) A bug in libc was found where the path was not being
> > 		null-terminated, which led to comparisons looking for really
> > 		bizarre names (i.e., 1024-byte random strings).  We thought
> > 		this might be the cause of the problem, but it wasn't.  I
> > 		have sent in a commit (already accepted) to fix this.
> > 
> > 		3) Two consecutive "mv"s (rename to a different name, then
> > 		back) clear the problem on that given directory - but it
> > 		does eventually come back.
> > 
> > 		4) The problem is *random* and comes and goes for a given
> > 		directory.  That it exists one minute does not imply that it
> > 		will 2 minutes later.
> > 
> > 		5) Some people have reported that if they actually do an "ls"
> > 		of the directory up one level, the affected paths are now
> > 		showing up.  I've not been able to nail this, but it's
> > 		consistent with the failure noted in (1) above.
> > 
> > Current speculation is that this is a vnode cache handling problem of some 
> > kind, where the vnode for the desired directory is being "flushed" but 
> > never reloaded into the cache.  We're still investigating and searching for
> > the root cause.
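
For anyone following along, the single lookup step that fails in (1) can
be approximated from the shell.  This is only a sketch under assumptions:
inode_of and name_in_parent are made-up helper names, not anything in
libc; the loop compares inode numbers only, whereas getcwd() also has to
compare device numbers, so it is only meaningful within one filesystem;
and it stats each entry of the parent instead of reading raw directory
entries.

```shell
# inode_of PATH: hypothetical helper, prints the inode number of PATH.
inode_of() {
    # "ls -di" prints "<inode> <name>"; keep only the inode number
    ls -di "$1" 2>/dev/null | awk '{ print $1; exit }'
}

# name_in_parent DIR: hypothetical helper, not part of libc.
# Prints the name under which DIR appears in its parent directory,
# found by matching inode numbers, like one step of the walk-up.
name_in_parent() {
    target=$(inode_of "$1")
    [ -n "$target" ] || return 1           # DIR itself could not be stat'ed
    # Note: the second pattern skips names that begin with two dots.
    for entry in "$1"/../* "$1"/../.[!.]*; do
        [ -e "$entry" ] || continue        # skip unmatched glob patterns
        if [ "$(inode_of "$entry")" = "$target" ]; then
            basename "$entry"
            return 0
        fi
    done
    return 1                               # no entry in ".." has our inode
}
```

Running name_in_parent on an affected directory while the problem is
showing should return non-zero if the parent's listing really is missing
the entry, which is the condition described in (1).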

I have a pretty good idea what is happening here.  If I run this script:

for i in a b c d e f; do
	for j in 0 1 2 3 4 5 6 7 8 9; do
		for k in 0 1 2 3 4 5 6 7 8 9; do
			for l in 0 1 2 3 4 5 6 7 8 9; do
				mkdir test/$i$j$k$l
			done
		done
	done
done

and at the same time continually list the contents of the directory
'test', then once the script gets past about 'a252' the client can no
longer see past the first 4k of the directory.  This is because the client
detects that the directory has been modified and tries to trash its
buffered copy of the directory contents with nfs_invaldir() and
nfs_vinvalbuf().  This confuses nfs_getcookie() and the directory appears
to be truncated.  I am working on a fix.
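
The "continually list" half of the test can be sketched like this, to be
run in a second shell on the client while the mkdir script is going.  The
function names count_entries and watch_count are made up for
illustration, and the loop is bounded by a round count so that it
terminates.

```shell
# count_entries DIR: print how many names "ls" currently sees in DIR
# (hypothetical helper; "." and ".." are not counted).
count_entries() {
    # the arithmetic expansion strips the padding some wc versions emit
    echo $(( $(ls "$1" | wc -l) ))
}

# watch_count DIR ROUNDS: list DIR repeatedly, printing one count per round.
watch_count() {
    dir=$1
    rounds=$2
    n=0
    while [ "$n" -lt "$rounds" ]; do
        count_entries "$dir"
        n=$((n + 1))
    done
}
```

For example, "watch_count test 1000".  The mkdir script creates 6 * 10 *
10 * 10 = 6000 directories in total, so the count should only ever grow
towards 6000.  A count that stalls or drops while the mkdirs are still
running would match the truncated listing described above.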

--
Doug Rabson, Microsoft RenderMorphics Ltd.	Mail:  dfr@render.com
						Phone: +44 171 734 3761
						FAX:   +44 171 734 6426



