Date:      Wed, 16 Dec 1998 23:44:57 -0700 (MST)
From:      David G Andersen <danderse@cs.utah.edu>
To:        bright@hotjobs.com (Alfred Perlstein)
Cc:        danderse@cs.utah.edu, karl@Denninger.Net, hackers@FreeBSD.ORG
Subject:   Re: yup, found it (NFS)
Message-ID:  <199812170644.XAA03482@lal.cs.utah.edu>
In-Reply-To: <Pine.BSF.4.05.9812170116200.378-100000@bright.fx.genx.net> from "Alfred Perlstein" at Dec 17, 98 01:27:30 am

Lo and behold, Alfred Perlstein once said:
> as an interrupt) is sent to a process waiting for an NFS server,
> the corresponding I/O system call returns with a transient error.
> (*1)  Normally, the process is terminated by the signal.(*2)

  Right.  This is simply having the system call return EINTR, one of the
options I discussed.  However, it's not clear to me that this is the best
option when the action was triggered by a close() on a file descriptor.
Certainly, if it happens during a write, most processes know how to cope
with it -- and the kernel does the right thing, returning EINTR.  However,
if you get an interrupt while flushing a dirty block during a close, what
do you expect to do?
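
  As a rough illustration of what "coping" with EINTR on a write looks
like from userland, here is a minimal sketch.  The helper name write_all
is hypothetical (it is not from the FreeBSD tree), and it assumes a
blocking file descriptor:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer to fd, retrying when
 * write() is interrupted by a signal.  Returns 0 on success, -1 on
 * any other error (errno is left as set by write()).
 */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;          /* a real error: give up */
        }
        buf += n;               /* partial write: advance and go on */
        len -= (size_t)n;
    }
    return 0;
}
```

  The retry is easy here because the caller still holds the data and the
descriptor.  No equally obvious recovery exists once the interrupted call
is a close().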

  Other BSDs don't have this problem because they call sleep() in this
context, which isn't interruptible.  That leads to the other simple fix I
noted, changing the tsleep() call to not have the interruptible flag.
(Thus making us equivalent to blahBSD, etc).  But that means that you
could potentially wedge a process on an NFS server that hung, despite the
'intr' flag.  It also explains why removing the nfsiods works: the buffer
flushing then occurs as a part of the write() (which is interruptible
and behaves properly), instead of being delayed until the close.
> 
> This isn't exactly good, a normal write should proceed as normal
> correct?  Maybe it can delay the signal and try an extra 4-5 times
> and delay the signal untill after the syscall?

  It's not a write; most programs that are signal aware will notice the
EINTR and retry the write.  It's the close -- when was the last time you
saw a program which checked the return value from close()?  (The answer is
about 20% of the time in usr.bin).
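
  For reference, checking the return value from close() takes only a few
lines.  This is a sketch under stated assumptions, not code from any
existing program; close_checked and its name argument are hypothetical:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: close fd and report a failure instead of
 * silently ignoring it.  Returns 0 on success, -1 on error with
 * errno preserved for the caller.
 */
static int close_checked(int fd, const char *name)
{
    if (close(fd) == -1) {
        int saved = errno;      /* fprintf() may clobber errno */
        fprintf(stderr, "close %s: %s\n", name, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

  Note that the sketch only reports the error.  It does not retry on
EINTR, because POSIX leaves the state of the descriptor unspecified after
an interrupted close().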

   Solaris isn't affected by this.  It's FreeBSD specific due to the
sleep() -> tsleep() change.

   -Dave
