Date:      Fri, 18 Oct 1996 14:52:47 +0900 (JST)
From:      Michael Hancock <michaelh@cet.co.jp>
To:        Karl Denninger <karl@Mcs.Net>
Cc:        freebsd-hackers@freebsd.org
Subject:   NFS node: disappearing directory
Message-ID:  <Pine.SV4.3.93.961018143541.1217A-100000@parkplace.cet.co.jp>
In-Reply-To: <199610171810.NAA00720@Jupiter.Mcs.Net>

On Thu, 17 Oct 1996, Karl Denninger wrote:

> Background:	Server is a BSDI 2.0 machine
> 		Client is a FreeBSD 2.2-CURRENT system
> 
> Symptom:	Randomly, "getcwd()" fails.
> 
> Analysis thus far:
> 		The search up-directory in "getcwd()" walks up the directory
> 		tree from "." to the root (determined by the device numbers
> 		and inodes for root when it gets there) looking at each path
> 		component and inserting it in the returned string.
> 
> 		Thus, what happens in getcwd() is this:
> 
> 			Save our inode number
> 			Open ".."
> 			Read through it, looking for the inode number.
> 			Save the path component
> 
> 			Iterate until you reach "/"
> 
> 			Return the path to the user.
> 
> 		Now, some problems we've found.
> 
> 		1) When it fails, the up-movement works but the inode
> 		number is NOT seen in the FIRST component when the directory
> 		read is performed (i.e., the directory is
> 		"/user/contrib/swilson", the first component is 'swilson',
> 		and that is not found at the first "step-up").

Is this remote /user mounted on local /user?  This might not be relevant,
but it helps in understanding the execution path.
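
The walk-up quoted above can be sketched in Python as follows. This is a minimal userland sketch, not the libc code, and the name `walk_up_getcwd` is hypothetical. It follows the quoted steps: remember the current device and inode, open "..", scan it for that inode, and save the matching name until the root is reached. The split between the two matching strategies mirrors, as far as I know, the 4.4BSD-derived libc getcwd(). On the same device, the inode number stored in the directory entry is compared directly. When ".." is on a different device (a mount point), that stored inode refers to the covered directory, so each entry has to be lstat()ed instead. That mount-point case is why it matters whether the remote /user is mounted on the local /user.

```python
import os


def walk_up_getcwd(start: str = ".") -> str:
    """Rebuild the absolute path of `start` by walking up through "..".

    Hypothetical helper following the quoted steps: remember the current
    (device, inode) pair, open "..", scan it for that inode, save the
    matching name, and repeat until the root directory is reached.
    """
    root = os.stat("/")
    here = os.stat(start)
    up = start
    components = []

    # The root is recognized by its device and inode numbers.
    while (here.st_dev, here.st_ino) != (root.st_dev, root.st_ino):
        up = os.path.join(up, "..")
        parent = os.lstat(up)
        name = None
        # Listing ".." needs read permission on it; a parent that is
        # search-only for this user raises PermissionError here.
        with os.scandir(up) as entries:
            for entry in entries:
                if parent.st_dev == here.st_dev:
                    # Same file system: compare the inode number stored
                    # in the directory entry (d_ino) directly.
                    if entry.inode() == here.st_ino:
                        name = entry.name
                        break
                else:
                    # Crossed a mount point: the entry's d_ino refers to the
                    # covered directory, so lstat() each entry instead.
                    st = os.lstat(os.path.join(up, entry.name))
                    if (st.st_dev, st.st_ino) == (here.st_dev, here.st_ino):
                        name = entry.name
                        break
        if name is None:
            # The failure reported in (1): the child's inode number never
            # appears in the parent's directory listing.
            raise FileNotFoundError(f"inode {here.st_ino} not found in {up}")
        components.append(name)
        here = parent

    return "/" + "/".join(reversed(components))


if __name__ == "__main__":
    print(walk_up_getcwd())
```

From a directory whose ancestors are all readable, `walk_up_getcwd()` should return the same path as `os.getcwd()` (modulo symlinks). Below a directory whose listing is denied, it raises PermissionError at the `os.scandir()` call.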
 
> 		2) A bug in libc was found where the path was not being
> 		null-terminated, which led to comparisons looking for really
> 		bizarre names (i.e., 1024-byte random strings).   We thought
> 		this might be the cause of the problem, but it wasn't.  I
> 		have sent in a commit (already accepted) to fix this.
> 
> 		3) Two consecutive "mv"s (rename to a different name, then
> 		back) clear the problem on that given directory - but it
> 		does eventually come back.
> 
> 		4) The problem is *random* and comes and goes for a given
> 		directory.  That it exists one minute does not imply that it
> 		will 2 minutes later.
> 
> 		5) Some people have reported that if they actually do an "ls"
> 		of the directory up one level, the affected paths are now
> 		showing up.  I've not been able to nail this down, but it's
> 		consistent with the failure noted in (1) above.
> 
> Current speculation is that this is a vnode cache handling problem of some 
> kind, where the vnode for the desired directory is being "flushed" but 
> never reloaded into the cache.  We're still investigating and searching for
> the root cause.

But 3) says it does get reloaded.

> Note that this appears to happen on directories with LARGE numbers of 
> subdirectory entries -- and not on ones with small numbers of directories.
> I've never seen it occur, for example, on MY home directory -- but I'm on a
> disk pack with maybe 20 directories at the same level that I'm on.  
> 
> The places where it happens frequently have perhaps 3,000 - 4,000 directories
> at the same level, which is common on our big user disks.
> 
> That's what we know right now.
> 
> BTW, the heuristic in getcwd() needs some work, but I'm not sure how to
> accomplish it as of yet.  The reason is that we'd REALLY like to be able to 
> protect the directories involved from a listing -- that is, make them mode 
> 711.  However, doing this causes logins to fail and all the shells to bitch
> loudly.
> 
> An example:
> 
> /			- 755
> /user			- 755
> /user/contrib		- 711
> /user/contrib/who-am-i	- 700
> 
> The user is "who-am-i", and in that directory.
> 
> getcwd() will return an error in this environment, as when it tries to READ
> /user/contrib to find the inode match for the "who-am-i" component it is
> unable to open that directory for this purpose.
> 
> I'm doing a brain-search on ways to make it possible to protect things in
> this fashion and still have the getcwd() call succeed, but I don't know if
> it's even possible.

The above permissions work under SysV.
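
Karl's 711 example comes down to the difference between search (x) and read (r) permission on a directory. Looking up a known name such as /user/contrib/who-am-i only needs x on /user/contrib. Listing /user/contrib, which a readdir-based getcwd() must do to find the matching inode, needs r. The sketch below is a hypothetical demonstration under stated assumptions. It builds a throwaway contrib/who-am-i tree in a temporary directory. Because the process owns that tree, it uses mode 0111 rather than 0711 on contrib so that the owner sees what "other" would see under 711. Run as a non-root user, it should report that the stat succeeds while the listing fails. Root normally bypasses these checks, so both succeed there.

```python
import os
import shutil
import tempfile


def probe_711_layout() -> dict:
    """Hypothetical probe: can we stat contrib/who-am-i, and can we list contrib?"""
    base = tempfile.mkdtemp()
    contrib = os.path.join(base, "contrib")
    home = os.path.join(contrib, "who-am-i")
    os.makedirs(home)
    os.chmod(home, 0o700)
    # Stand-in for mode 711 as seen by "other": this process owns the
    # directory, so read is removed from the owner bits as well.
    os.chmod(contrib, 0o111)
    result = {"stat_home": False, "list_contrib": False}
    try:
        try:
            os.stat(home)  # name lookup only needs search (x) on contrib
            result["stat_home"] = True
        except PermissionError:
            pass
        try:
            with os.scandir(contrib) as entries:  # listing needs read (r)
                for _ in entries:
                    pass
            result["list_contrib"] = True
        except PermissionError:
            pass
    finally:
        os.chmod(contrib, 0o755)  # restore so the temp tree can be removed
        shutil.rmtree(base)
    return result


if __name__ == "__main__":
    print(probe_711_layout())
```

A getcwd() that only walks ".." and reads directories cannot get past the second check. That is why the mode-711 layout breaks it on systems that compute the path this way.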



