Date:      Mon, 24 Jul 1995 11:50:14 +0100 (BST)
From:      Doug Rabson <dfr@render.com>
To:        Terry Lambert <terry@cs.weber.edu>
Cc:        peter@haywire.dialix.com, freebsd-current@freebsd.org
Subject:   Re: what's going on here? (NFSv3 problem?)
Message-ID:  <Pine.BSF.3.91.950724112353.12542B-100000@minnow.render.com>
In-Reply-To: <9507212036.AA06811@cs.weber.edu>

On Fri, 21 Jul 1995, Terry Lambert wrote:

> > NFSv3 defines a mechanism to validate the cookies used to read directory 
> > entries.  Each readdir request returns a set of directory entries, each 
> > with a cookie which can be used to start another readdir just after the 
> > entry.  To read from the beginning of the directory, one passes a NULL 
> > cookie.
> > 
> > NFSv3 also returns a 'cookie verifier' which must be passed with the next
> > readdir, along with the cookie representing the place to read from.  If the
> > directory block was compacted, then the server should use the verifier to
> > detect this and can return an error to the client to force it to retry the
> > read from the beginning of the directory. 
> 
> Most file systems do not provide a generation count on directory blocks
> with which to validate the "cookie".
> 
> With that in mind, the "cookie" is typically interpreted either as an
> entry offset or as a byte offset of entry, either in the block or in
> the directory.

The NFSv3 code in -current uses the modification time of the directory as 
the verifier.  This is perhaps a slightly pessimistic solution but it 
should detect compactions.  The client reacts to bad cookie errors by 
flushing its cached information about the directory.  This seems to be a 
reasonable reaction to the directory being modified.

Can the ufs code ever compact a directory block *without* the 
modification time of the directory changing?  Presumably it only ever 
does this as a result of some other operation on the directory.

> 
> It is this use which one uses to resynchronize entries in the case of
> block compaction, with the inevitable problem potential this has
> associated with it (duplicate vs. skipped entries, as you point out).
> 
> > > The buffer crap that got done to avoid a file system top end user
> > > presentation layer is totally bogus, and remains the cause of the
> > > problem.  If no one is interested in fixing it, I suggest reducing
> > > the transfer size to the page size or smaller.
> > 
> > I can't parse this one.
> 
> The stat structure passed around internally is larger than the stat
> structure expected by NFS.
> 
> Rather than fix the view of things at the time it was exported to
> NFS, the internal buffer representation for all file systems capable
> of being exported was changed.
> 
> I can't say I'm not glad that this is coming back to haunt us.
> 

At the time, I was more interested in fixing the completely stupid 
assumption the NFS server was making about the FS implementation which 
only ever worked for UFS.  Adding a whole new layer of code between NFS 
and the VFS would have added maintenance problems, consistency problems 
(we would be caching directory information; when is the cache invalid?  
when should stuff be removed from it?) and needless complication.

I added code as part of this fix which deals with unaligned UFS
directory reads, more or less along the lines of the approach you
suggested.  The FS reads from the aligned address.  NFS then scans the
information returned by the FS for the first entry whose cookie is
greater than or equal to the cookie sent by the client.  The only
restriction this places on the VFS for directory cookies is that they
increase monotonically within a directory.

In the case of a compacted directory block, the client may receive 
filenames it has already seen or it may miss a few entries.  It will 
never receive corrupt information.

> > > And, of course, at the same time eat the increased and otherwise
> > > unnecessary overhead in the read/write path transfers that will
> > > result from doing this "fix".
> > 
> > I don't think that any fix is needed.  The NFSv2 behaviour is adequate 
> > and NFSv3 has the mechanism to detect this problem.
> 
> This is the "drop the buffer size" fix, not the detection fix (which would
> be unnecessary if the buffer size "fix" wasn't there).
> 
> 
> It should also be noted that NFSv3 classifies the blocked directory
> entry retrieval as an *optional* implementation for the server, and the
> problem would also go away were the option declined and versioning more
> strictly enforced.
> 
> I don't think this would be the canonically correct thing to do.

The current v2 server has an adequate strategy for dealing with directory
compaction for all read sizes, IMHO.  The directory verifier is *not*
optional in NFSv3.  The only optional part AFAIK is the use of READDIRPLUS by
the client to read file attributes with the names.  Both READDIR and
READDIRPLUS *must* implement a verifier strategy. 

A server *can* choose to return zero for a verifier but only if the 
cookies it generates are *always* valid, e.g. for read-only media.  From 
rfc1813, section 3.3.16:

      One implementation of the cookie-verifier mechanism might
      be for the server to use the modification time of the
      directory. This might be overly restrictive, however. A
      better approach would be to record the time of the last
      directory modification that changed the directory
      organization in a way that would make it impossible to
      reliably interpret a cookie. Servers in which directory
      cookies are always valid are free to use zero as the
      verifier always.

--
Doug Rabson, Microsoft RenderMorphics Ltd.	Mail:  dfr@render.com
						Phone: +44 171 251 4411
						FAX:   +44 171 251 0939
