Date:      Fri, 20 Jun 2003 13:39:05 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        uitm@blackflag.ru
Cc:        freebsd-hackers@FreeBSD.org
Subject:   Re: open() and ESTALE error
Message-ID:  <200306202039.h5KKd5M7060679@gw.catspoiler.org>
In-Reply-To: <200306201835.WAA00763@slt.oz>

On 20 Jun, Andrey Alekseyev wrote:
> Don,
> 
>> One case where there is a difference between timing out old file handles
>> and just invalidating them on ESTALE:
> 
> Frankly, I just didn't find any mechanism in the STABLE kernel that
> does "timing out" for file handles. Do you mean, it would be nice to have
> it or are you trying to point it out to me? ;-P

If there isn't such a mechanism, there should be.

>> client% cmd1 > file1; cmd2 > file2
>> server% mv file1 tmpfile; mv file2 file1; mv tmpfile file2
>> 
>> wait an hour
>> 
>> client% cat /dev/null > file1
>> 
>> If file handles are cached indefinitely, and the client didn't recycle
>> the vnode for file1, which file on the server got truncated?  Since
>> neither file was deleted on the server, you can't rely on ESTALE to
>> detect this situation.
> 
> Eh, but the generation number for file1 should have been changed! This will
> result in a definite ESTALE error for file1 from the server. That is, I
> believe that if you attempt to open("file1", O_CREAT) after an hour, you'll
> get ESTALE from the server (on which nfs_request() will invalidate "file1"
> namecache entry and vnode+nfsnode+old-file-handle) and the second vn_open()
> will re-lookup file1 and get a valid new file handle.

If the client still has a cached copy of the file handle for file1,
won't it just use that and truncate file2 on the server?  The handle
never goes stale because the file was never deleted on the server.

> Actually, this is what indeed happens if the second open() comes from the
> userland application :)  I'm just trying to eliminate the need of modifying
> a generic application.
> 
> For my example with moves, the next "cat" will always(!) succeed.
> 
>> Question: does the timeout of the directory attributes cause open() to
>> do an NFS lookup on the file, or does open() just find the vnode in the
>> cache and use its cached handle?
> 
> Well, for open() without O_CREAT the sequence is this:
> open() -> vn_open() -> namei() -> lookup() -> VOP_LOOKUP() -> nfs_lookup()
>           |
>           VOP_ACCESS() -> nfs_access() [ -> nfs3_access_otw() <possibly>]
>           |
>           VOP_OPEN() -> nfs_open()
> 
> Lookup is always done first (obviously). It may return cached name which
> contains a pointer to a cached vnode/nfsnode. Cached vnode/nfsnode is used
> further in VOP_ACCESS() and VOP_OPEN(). Either function may or may not
> update file attributes cached inside nfsnode. Neither VOP_ACCESS() nor
> VOP_OPEN() ever updates the *file handle*. File handle comes from
> VOP_LOOKUP().  And VOP_LOOKUP() only places it there if vnode/nfsnode isn't
> cached.  Which I believe happens only if there is no cached filename in
> the namecache. I really tried to do my best to describe everything in:
> http://www.blackflag.ru/patches/nfs_attr.txt
> Please take a look.

If the client is mostly idle, then the cached filename is unlikely to be
flushed, so even after a long period of time, namei() will return the
old vnode and its associated file handle.  If the file on the server was
renamed and not deleted, the server won't return ESTALE for the handle
and open() will return a descriptor for the original file on the server
that has since been renamed, not for the new file on the server that
lives at the path name passed to open() on the client.
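
To make the failure concrete, here is a toy model of it, not kernel
code.  The Server and CachingClient classes and the integer "handles"
are made up for illustration; the only assumption carried over from the
discussion is that the client name cache maps a name to a handle and
never revalidates it.

```python
import errno


class Server:
    """Toy server: names map to handles, handles map to file contents."""

    def __init__(self):
        self.names = {}   # path name -> handle
        self.files = {}   # handle -> contents

    def create(self, name, data):
        handle = len(self.files) + 1
        self.files[handle] = data
        self.names[name] = handle
        return handle

    def rename(self, old, new):
        self.names[new] = self.names.pop(old)

    def truncate(self, handle):
        # ESTALE is only possible if the file behind the handle is gone.
        if handle not in self.files:
            raise OSError(errno.ESTALE, "stale file handle")
        self.files[handle] = ""


class CachingClient:
    """Toy client whose name cache entries never time out."""

    def __init__(self, server):
        self.server = server
        self.namecache = {}   # path name -> cached handle

    def create(self, name, data):
        self.namecache[name] = self.server.create(name, data)

    def truncate(self, name):
        # The cached handle is used without asking the server again.
        self.server.truncate(self.namecache[name])


def run_scenario():
    server = Server()
    client = CachingClient(server)
    client.create("file1", "cmd1 output")
    client.create("file2", "cmd2 output")
    # server% mv file1 tmpfile; mv file2 file1; mv tmpfile file2
    server.rename("file1", "tmpfile")
    server.rename("file2", "file1")
    server.rename("tmpfile", "file2")
    # client% cat /dev/null > file1   (no ESTALE is raised here)
    client.truncate("file1")
    return {name: server.files[h] for name, h in server.names.items()}
```

In this model the truncate succeeds without any error, and the file
that ends up empty is the one the server now calls file2.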

Another example:

client% cmd1 > file1
client% cmd2 > file2
client% more file1
        ^Z
        suspended

server% mv file1 tmpfile; mv file2 file1; mv tmpfile file2

wait 24 hours

client% cat /dev/null > file1
client% fg

The last cat command should truncate file1 on the server, which is the
output of cmd2.  When the more command resumes, it should still be able
to see the output of cmd1.  The old file1 vnode and file handle
should remain valid, but the lookup to open file1 for the last cat
command needs to know that the cache entry has timed out and that the
handle associated with the cached vnode for file1 hasn't been validated
in a while.  Lookup() needs to bypass the cache in that case and pass the
lookup request to the server.  If the file handle returned is the same
as before, the cache entry should be freshened; if the file handle is
different, then a new vnode needs to be allocated and associated with the
name cache entry and the new handle.  The old vnode and its handle need
to be retained until either an rpc using this handle returns ESTALE, or
the file is closed and the vnode is recycled.
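
A rough sketch of that lookup policy, again as a toy model rather than
kernel code.  The Server and RevalidatingClient classes, the integer
handles, the 60 second timeout and the explicit "now" argument are all
assumptions made for the example; a real implementation would live in
the NFS lookup path and deal with vnodes.

```python
class Server:
    """Toy server: names map to handles, handles map to file contents."""

    def __init__(self):
        self.names = {}   # path name -> handle
        self.files = {}   # handle -> contents

    def create(self, name, data):
        handle = len(self.files) + 1
        self.files[handle] = data
        self.names[name] = handle

    def rename(self, old, new):
        self.names[new] = self.names.pop(old)


class RevalidatingClient:
    """Toy client whose name cache entries expire after `timeout` seconds."""

    def __init__(self, server, timeout):
        self.server = server
        self.timeout = timeout
        self.namecache = {}   # path name -> (handle, time last validated)

    def lookup(self, name, now):
        entry = self.namecache.get(name)
        if entry is not None and now - entry[1] < self.timeout:
            return entry[0]               # fresh entry: trust the cache
        # Timed out or missing: bypass the cache and ask the server.
        handle = self.server.names[name]
        # Same handle: the entry is freshened.  Different handle: the
        # name is rebound to it.  The old handle is not invalidated, so
        # anything already holding it keeps working.
        self.namecache[name] = (handle, now)
        return handle


def run_scenario():
    server = Server()
    client = RevalidatingClient(server, timeout=60)
    server.create("file1", "cmd1 output")
    server.create("file2", "cmd2 output")
    more_handle = client.lookup("file1", now=0)      # more file1, then ^Z
    # server% mv file1 tmpfile; mv file2 file1; mv tmpfile file2
    server.rename("file1", "tmpfile")
    server.rename("file2", "file1")
    server.rename("tmpfile", "file2")
    # 24 hours later: client% cat /dev/null > file1
    cat_handle = client.lookup("file1", now=24 * 3600)
    server.files[cat_handle] = ""
    seen_by_more = server.files[more_handle]         # client% fg
    file1_now = server.files[server.names["file1"]]
    return seen_by_more, file1_now, more_handle != cat_handle
```

Here the suspended more still reads the output of cmd1 through its old
handle, while the cat truncates the file that the server currently
names file1.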


> Whether ESTALE came from VOP_ACCESS() or VOP_OPEN() depends on several
> factors. Namely, the value of nfsaccess_cache_timeout sysctl, acmin/acmax
> and the age of the file in question.
> 
> Generally speaking, if nfsaccess_cache_timeout is less than acmin,
> VOP_ACCESS() that comes right before VOP_OPEN() in vn_open() will try to do
> an "access" RPC request and it'll fail if the file handle is stale. If
> nfsaccess_cache_timeout is greater than acmin, then it's possible that
> VOP_ACCESS() will answer "yes" based on the cached attributes, but
> VOP_GETATTR(), which is called from nfs_open() (which is VOP_OPEN() for
> NFS) will in turn "go to the wire" and still nfs_request() will fail with
> ESTALE.
> 
> Hope, I'm making it clear :)

Yeah, but the solution that you propose doesn't fix the case where
ESTALE is not returned but namei() returns a cached vnode associated
with a file on the server that doesn't exist at the specified path name.

Also, fixing open() doesn't fix similar problems that can occur with
other syscalls that take path names, such as stat() and readlink().

If the lookup code is changed so that it more frequently revalidates the
name->vnode->handle entries, then the window where open() can fail due
to ESTALE would be greatly reduced.


