Date:      Fri, 20 Jun 2003 18:56:32 -0400
From:      John <jwd@bsdwins.com>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        uitm@blackflag.ru
Subject:   Re: open() and ESTALE error
Message-ID:  <20030620225632.GA29485@BSDWins.Com>
In-Reply-To: <3EF2CF89.3E5542F5@mindspring.com>
References:  <200306200617.h5K6HaM7058935@gw.catspoiler.org> <3EF2CF89.3E5542F5@mindspring.com>

----- Terry Lambert's Original Message -----
> Specifically, see the underline part of:
> 
> > > +             if (error == ESTALE && stale++ == 0)
>                                       ---------------
> 
> ...he exits it after retrying it fails, and falls into the
> standard ESTALE return case.
> 
> If this gets committed (which I think it shouldn't because I
> can see a genuinely bad handle getting converted to a good one
> in a couple of cases), that line should probably be rewritten
> to be more obvious (e.g. move the "stale++" before the "if"
> statement and adjust the compare to compensate for the difference
> so no one else reads it the way we did).

hi folks,

   After looking at his original patch, I suggested modifying
it for clarity to be of the form:

   error = vn_open(&nd, flags, cmode);
   if (error == ESTALE)
      error = vn_open(&nd, flags, cmode); /* single retry */


   While I understand that a number of you have reservations about
this change, I think it is worth serious consideration. Unless
someone is willing to go into each of the individual fs layers
and deal with ESTALE, this appears to be a relatively
straightforward and easy-to-understand approach.

   Most of the main applications I run on clusters have had
their open routines recoded along the following lines (this one
is from ftpd):

   int try = 0;
   /* Retry on ESTALE, at most 3 times after the initial attempt. */
   while ((fin = fopen(name, "r")) == NULL && errno == ESTALE && try++ < 3) {
      if (logging > 1)
         syslog(LOG_INFO, "fopen(\"%s\"): %m: attempting retry", name);
   }
   if (fin == NULL && logging > 1)
      syslog(LOG_INFO, "get fopen(\"%s\"): %m", name);


   This is a real problem when using FreeBSD in high-load,
high-throughput situations where highly sequenced operations are
performed on a common set of data files from multiple machines. An
example of this environment can be seen here:

http://www.freebsd.org/~jwd/images/cluster.jpg

   If no one has any patches which can provide a better solution
for handling ESTALE, I would like to see Andrey's patch given
a chance.

   Of course, if we don't want to do this, then I think it is
high time we documented that open(2) can return ESTALE and provided
a library routine that wraps open() with a retry :-)
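
   For illustration, such a wrapper might look roughly like the
sketch below. The name open_retry_estale and the limit of 3 retries
are made up for this example (nothing like it exists in libc), and
the mode is passed explicitly instead of through varargs to keep
the sketch short:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical limit; chosen to match the ftpd loop, not a tuned value. */
#define ESTALE_MAX_RETRIES 3

/*
 * Hypothetical helper: call open(2) and, if it fails with ESTALE,
 * retry up to ESTALE_MAX_RETRIES more times. Any other error, or
 * success, is returned to the caller immediately. errno is left as
 * set by the last open(2) call.
 */
int
open_retry_estale(const char *path, int flags, mode_t mode)
{
	int fd;
	int tries = 0;

	do {
		fd = open(path, flags, mode);
	} while (fd == -1 && errno == ESTALE && tries++ < ESTALE_MAX_RETRIES);

	return (fd);
}
```

   A caller would use it exactly like open(), e.g.
open_retry_estale(name, O_RDONLY, 0), and check for -1 as usual.
This only papers over the problem in userland; it does not address
the reservations raised about retrying inside the kernel.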

-John


