Date:      Mon, 30 Aug 1999 02:20:33 +0400
From:      Dmitrij Tejblum <tejblum@arc.hq.cti.ru>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Dmitrij Tejblum <tejblum@arc.hq.cti.ru>, Doug Rabson <dfr@nlsystems.com>, current@FreeBSD.ORG
Subject:   Re: NFSv3 on freebsd<-->solaris 
Message-ID:  <199908292220.CAA00778@tejblum.pp.ru>
In-Reply-To: Your message of "Sun, 29 Aug 1999 13:12:31 PDT." <199908292012.NAA06936@apollo.backplane.com> 

>     It isn't possible to do this and still remain synchronized.  If the
>     directory changes on the server, the client has no way of knowing 
>     whether a cookie corresponds to the same file if you always return
>     a valid response.  This breaks the protocol.
> 
>     A local filesystem getdirentries() call is monotonic, stateful, and
>     cache coherent.  An NFS readdir rpc is stateless, not monotonic, and can
>     only approximate cache coherency.

Perhaps I am mistaken, but I disagree. The getdirentries() call is not 
monotonic, and it is stateless. Let's see:

To read a directory with the getdirentries() call, the application has 
to open it just like any other file and get a file descriptor. Like 
every other file descriptor, the open directory has an associated 
offset, or pointer. 

The getdirentries() syscall supplies the directory pointer to 
VOP_READDIR as uio_offset. (The cookie sent by an NFS client is supplied 
to VOP_READDIR as uio_offset too.) After VOP_READDIR returns, the 
uio_offset is stored back in the file descriptor offset. The file offset 
is the only state saved.
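
The flow above can be sketched in a few lines of C. This is only a toy 
model, not the kernel code: every toy_* name is made up for 
illustration, and the "directory" is just an array where the cookie is 
the array index. The point it tries to show is that the readdir 
routine itself remembers nothing between calls, whether the offset 
comes from a file descriptor or from an NFS client.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy directory: the entry with cookie i is names[i]. */
struct toy_dir {
    const char *names[8];
    size_t count;
};

/* Hypothetical open directory: the offset is the only per-open state. */
struct toy_file {
    const struct toy_dir *dir;
    long offset;
};

/*
 * Stand-in for VOP_READDIR: copy up to max names starting at *offset and
 * advance *offset. It keeps no memory of earlier calls, and an
 * out-of-range (bogus) offset simply yields no entries.
 */
static size_t
toy_readdir(const struct toy_dir *d, long *offset, const char **out,
    size_t max)
{
    size_t n = 0;

    while (n < max && *offset >= 0 && (size_t)*offset < d->count)
        out[n++] = d->names[(*offset)++];
    return n;
}

/* Local path: the descriptor offset goes in and is stored back. */
static size_t
toy_getdirentries(struct toy_file *f, const char **out, size_t max)
{
    long off = f->offset;
    size_t n = toy_readdir(f->dir, &off, out, max);

    f->offset = off;
    return n;
}

/* NFS path: the client-supplied cookie plays the role of the offset. */
static size_t
toy_nfs_readdir(const struct toy_dir *d, long cookie, long *next_cookie,
    const char **out, size_t max)
{
    long off = cookie;
    size_t n = toy_readdir(d, &off, out, max);

    *next_cookie = off;
    return n;
}
```

In this sketch both callers go through the same toy_readdir(), and the 
only thing that differs is where the offset is kept between calls.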

Note also that the offset has nothing to do with the size of the data 
transferred by getdirentries(), especially if the filesystem is not 
UFS. That is, the offset is actually just a handy place to store the 
cookie. (OTOH, for any local filesystem I am aware of, it is indeed the 
offset in the physical directory.)

Note that the application can do lseek on the directory, that is, 
change the next cookie used. seekdir() relies on this. (And, of course, 
the application may lseek to anywhere it likes, and the filesystem will 
have to deal with the bogus cookie.)
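
From the application side this can be seen with the standard POSIX 
telldir()/seekdir() calls. The helper below is only a sketch: the name 
seekdir_roundtrip is made up, and it assumes the directory does not 
change between the two reads. It saves the position, reads one entry, 
seeks back, and checks that the same entry comes out again.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose telldir()/seekdir() under strict modes */
#endif
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first entry of path, seek back to the saved position and read
 * again. Returns 1 if both reads name the same entry, 0 if they differ,
 * and -1 on error.
 */
static int
seekdir_roundtrip(const char *path)
{
    char first[1024];
    struct dirent *de;
    long pos;
    int same;
    DIR *dp = opendir(path);

    if (dp == NULL)
        return -1;
    pos = telldir(dp);              /* position of the next entry */
    de = readdir(dp);
    if (de == NULL) {
        closedir(dp);
        return -1;
    }
    /* readdir() may reuse its buffer, so keep a copy of the name. */
    snprintf(first, sizeof(first), "%s", de->d_name);
    seekdir(dp, pos);               /* go back to the saved position */
    de = readdir(dp);
    same = (de != NULL && strcmp(first, de->d_name) == 0);
    closedir(dp);
    return same;
}
```

Nothing but the saved position is needed to resume the scan, which is 
the same role the cookie plays for an NFS client.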

>     * an NFS readdir rpc is stateless and not monotonic.  The server cannot
>       tell the difference between a new rpc, a retry, or several different
>       processes on the client scanning the same directory (running at different
>       points in the directory).

With local applications, VOP_READDIR cannot tell the difference 
either. There may be several programs scanning one directory, or a 
program may do seekdir(). The only thing known is the uio_offset, that 
is, the cookie.

> 
>     * An NFS readdir rpc can only approximate cache coherency, but that
>       doesn't mean you can throw cache coherency out the window.  

What cache coherency? No one ever mmap()s a directory, I hope. After the 
getdirentries() syscall finishes, someone may change the directory in 
any way (just as after a read() call on a regular file). After the NFS 
readdir reply is sent to the client, someone may change the directory in 
any way. Again, I don't see any difference. 

> It 
>       approximates cache coherency through the use of the verifier key.  If
>       the verifier key supplied by the client is wrong, the server has to
>       tell it so.  Otherwise the client's directory cache will get out of
>       sync.

Nope, the verifier is there so that the server can validate the cookie. 
Cache validation needs to be done by checking the mtime, as with regular 
files. What if the client cached the whole directory, and then the 
directory changed? So cache coherency with directories is no worse than 
with regular files.

Note that, just as the READ call returns file attributes that can be 
used for cache validation, the READDIR call returns the directory 
attributes, which can be used for the same purpose.
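
As a rough sketch of what I mean by mtime-based validation (all the 
toy_* types and functions here are hypothetical, not taken from any 
real NFS client): the client remembers the directory attributes it saw 
when it filled its cache, and treats the cache as stale when the 
attributes returned by a later call differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the attributes returned with a READDIR reply. */
struct toy_attr {
    long mtime_sec;
    long mtime_nsec;
};

/* Hypothetical client-side directory cache header. */
struct toy_dircache {
    struct toy_attr attr;   /* attributes seen when the cache was filled */
    bool filled;
};

/* The cache is usable only if it was filled and the mtime is unchanged. */
static bool
toy_cache_fresh(const struct toy_dircache *c, const struct toy_attr *now)
{
    return c->filled &&
        c->attr.mtime_sec == now->mtime_sec &&
        c->attr.mtime_nsec == now->mtime_nsec;
}

/* Record the attributes that go with freshly read directory contents. */
static void
toy_cache_fill(struct toy_dircache *c, const struct toy_attr *now)
{
    c->attr = *now;
    c->filled = true;
}
```

This is the same check a client would make for a regular file, which is 
why I say directories are no worse off.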

>     Furthermore, the NFS readdir rpc has no notion of 'dead' directory entries
>     as far as I can tell.  This means that from the point of view of an NFS
>     client, directories are always 'compacted'.  Since clients may implement
>     a block cache for directories, the server cannot afford to return a valid
>     response if the verifier mismatches because it will screw up the client's
>     block cache for the directory.  This is very different from the way most
>     local directories are scanned - filesystems such as UFS maintain dead
>     directory entries and thus allow a directory data block to be scanned 
>     without any locking.  We cannot use this trick with NFS.
> 
>     Add on top of that the fact that the NFS directory 'block size' may
>     be different than a local filesystem's.  NFS must translate padding 
>     characteristics between the local filesystem and the NFS client's notion
>     of the directory.  Even if we did support the notion of dead directory
>     entries in NFS, trying to translate the padding characteristics at the
>     same time would be fairly difficult to accomplish.

Umm, I don't understand what the translation has to do with the issue. 
BTW, not all local filesystems are UFS.

> 
> :> Our NFS client used to have the same problem (a long time ago) and I put
> :> code into it to re-read the directory if its cookies are stale.
> :
> :(According to a mail recently sent to -hackers, that doesn't work. 
> :In -current, the recovery code has a debugging printf(), so I guess 
> :the code only triggered in very rare cases (see above).)
> 
>     This works on FreeBSD clients as far as I know.  That is what I thought
>     that email sent to hackers said... that it works w/ FreeBSD clients but
>     not with certain Sun clients.

The email titled "readdir() broken?" says that he can work around this 
bug with the workaround designed for SunOS 4.1.4 (and local 
filesystems). His NFS client and server are -STABLE.

> 
> :Anyway, I don't actually care what is correct NFS client behavior. I am 
> :saying that sending "bad cookie" error is not useful for FreeBSD server.
> :
> :Dima
> 
>     My understanding is that it is part of the protocol spec.  We are not
>     going to become incompatible with the spec.

I think this is a misinterpretation of the spec (though that passage 
apparently cannot be interpreted correctly). Again, since Sun, who 
invented NFS and wrote the NFS spec, has had the "bug" all along 
(in Solaris 2.5, 2.7 ...), it must not be a bug.

Dima







