Date:      Thu, 19 Jun 2003 23:17:36 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        uitm@blackflag.ru
Cc:        freebsd-hackers@FreeBSD.org
Subject:   Re: open() and ESTALE error
Message-ID:  <200306200617.h5K6HaM7058935@gw.catspoiler.org>
In-Reply-To: <200306190955.NAA00538@slt.oz>

On 19 Jun, Andrey Alekseyev wrote:
> Hello,
> 
> I've been trying lately to develop a solution for the problem with
> open() that manifests itself in ESTALE error in the following situation:
> 
> 1. NFS server: echo "1111" > file01
> 2. NFS client: cat file01
> 3. NFS server: echo "2222" > file02 && mv file02 file01
> 4. NFS client: cat file01 (either old file01 contents or ESTALE)
> 
> My study shows that actually the problem appears to be in VOP_ACCESS()
> which is called from vn_open(). If nfs_access() decides to "go to the wire"
> in #4, it then uses a cached file handle which is indeed stale. Thus,
> open() eventually fails with ESTALE too (ESTALE comes from underlying
> nfs_request()).
> 
> I understand all the fundamental NFS-related integrity problems, but not
> this one :) That is, I see no reason for open() to fail to open a file for
> reading or writing if the system knows the problem is its own. Why not
> just do another lookup and try to obtain a valid file handle?
> 
> I was playing with different parts of the kernel while "fixing" this for
> myself. However, I believe the simplest patch would be for
> vfs_syscalls.c:open() (I've also made a working patch against vn_open(),
> though).
> 
> Could anyone please be so kind as to comment on this issue?
> 
> TIA
> 
> --- kern/vfs_syscalls.c.orig	Thu Jun 19 13:22:50 2003
> +++ kern/vfs_syscalls.c	Thu Jun 19 13:29:11 2003
> @@ -1008,6 +1008,7 @@
>  	int type, indx, error;
>  	struct flock lf;
>  	struct nameidata nd;
> +	int stale = 0;
>  
>  	oflags = SCARG(uap, flags);
>  	if ((oflags & O_ACCMODE) == O_ACCMODE)
> @@ -1025,8 +1026,15 @@
>  	 * the descriptor while we are blocked in vn_open()
>  	 */
>  	fhold(fp);
> +again:
>  	error = vn_open(&nd, flags, cmode);
>  	if (error) {
> +		/*
> +		 * if the underlying filesystem returns ESTALE
> +		 * we must have used a cached file handle.
> +		 */
> +		if (error == ESTALE && stale++ == 0)
> +			goto again;
>  		/*
>  		 * release our own reference
>  		 */

I can't get very enthusiastic about changing the file system independent
code to fix a deficiency in the NFS implementation.

If the name of the file you are attempting to open is relative to your
current working directory, and your current working directory is nuked
on the server, vn_open will return ESTALE, and your patch above will
loop forever.
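
One way to keep any ESTALE retry from looping indefinitely is to bound it
explicitly.  The following is only a user-space sketch of that idea, not
kernel code and not the patch above: open_fn, sys_open(),
open_retry_estale() and the always_stale() stub are hypothetical names
made up for illustration, and whether a second lookup really yields a
fresh file handle depends on the client implementation.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical type for the operation being retried, so the wrapper
 * can be exercised without an NFS mount. */
typedef int (*open_fn)(const char *path, int flags);

/* Thin adapter around open(2) that matches open_fn. */
int sys_open(const char *path, int flags)
{
	return open(path, flags);
}

/* Test stub (hypothetical): always fails with ESTALE and counts calls. */
int stale_calls = 0;
int always_stale(const char *path, int flags)
{
	(void)path;
	(void)flags;
	stale_calls++;
	errno = ESTALE;
	return -1;
}

/*
 * Call do_open(); if it fails with ESTALE, retry at most max_retries
 * more times.  Any other failure, or ESTALE once the retries are used
 * up, is returned to the caller with errno left as set by do_open().
 */
int open_retry_estale(open_fn do_open, const char *path, int flags,
    int max_retries)
{
	int fd, attempt;

	assert(max_retries >= 0);
	for (attempt = 0; ; attempt++) {
		fd = do_open(path, flags);
		if (fd >= 0 || errno != ESTALE || attempt >= max_retries)
			return fd;
	}
}
```

With max_retries set to 1, a directory that has really been removed on
the server costs one extra attempt and then fails with ESTALE as before.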

NFS really doesn't work very well if modifications are made by both a
client and the server, or by multiple clients.  Solaris attempts to
compensate with a mount option:
           noac  Suppress data and attribute caching.  The data
                 caching that is suppressed is the write-behind.
                 The local page cache is still maintained, but
                 data copied into it is immediately written to
                 the server.


If the rename on the server was done within the attribute validity time
on the client, vn_open() will succeed even without your patch, but you
may encounter the ESTALE error when you actually try to read or write
the file.

Unless you have some sort of locking protocol or other way of
synchronizing this sequence of operations on the client and server, the
server could do the rename while the client has the file open, after
which some I/O operation on the client will encounter ESTALE.

If the problem is that open() is failing a long time after the server
did the rename, then the best solution may be for the client to time out
file handles more aggressively.  If the vnode on the client is closed,
the file handle could be timed out after acregmin/acregmax or
acdirmin/acdirmax, or a new handle timeout parameter.  This may decrease
performance, but nothing is free ...
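
The handle timeout check could look roughly like the following.  This is
a minimal sketch under stated assumptions: struct fh_cache_entry and
fh_expired() are hypothetical names, not the real nfsnode or any existing
kernel interface, and fh_timeout stands in for whichever of
acregmin/acregmax, acdirmin/acdirmax, or a new parameter ends up being
used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical cache entry; not the real FreeBSD nfsnode. */
struct fh_cache_entry {
	time_t	fetched;	/* when the handle came from the server */
	int	open_count;	/* active opens of the vnode on the client */
};

/*
 * Return true if the cached file handle should be discarded so that
 * the next open() does a fresh lookup on the server.
 */
bool fh_expired(const struct fh_cache_entry *e, time_t now,
    time_t fh_timeout)
{
	assert(e != NULL);
	if (e->open_count > 0)
		return false;	/* never time out a handle that is in use */
	return (now - e->fetched) >= fh_timeout;
}
```

A shorter fh_timeout narrows the window in which a stale handle can be
reused, at the cost of more lookups going to the wire.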


