Date:      Sun, 1 Mar 1998 23:26:53 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dyson@FreeBSD.ORG
Cc:        nrice@emu.sourcee.com, karl@mcs.net, jb@cimlogic.com.au, joe@via.net, hackers@FreeBSD.ORG
Subject:   Re: help - make world fails
Message-ID:  <199803012326.QAA04854@usr08.primenet.com>
In-Reply-To: <199803011531.KAA02458@dyson.iquest.net> from "John S. Dyson" at Mar 1, 98 10:31:21 am

> > > I think that the system is very close to stable again, with the
> > > NFS caveat.  Once I can solve the (very reproduceable) problem,
> > > I will be much happier with NFS.  There are also some outstanding
> > > bugfixes for NFS, which I am working with in my local tree...
> > 
> > Would any of those outstanding ``bug fixes'' resolve the issue with
> > NFS client freezing the system when the server is non-responsive?
>
> Not yet.  I am working on things that are *more* severe than that
> right now.  Not discounting the above problem though as not being
> severe.

IMO, this is a problem in the RPC state machine not being sensitive
to remote resets in the middle of an operation.

Basically, an RPC call is made, your request is ack'ed or nak'ed,
and if it was ack'ed, you go into a state from which you can only
emerge with more data from the server.

Probably this needs to time out back to a retry, as if you had not
been ack'ed.  I have not looked very deeply into what this would
mean in terms of needing to unwind state in the case where the
original request can no longer be validly served (e.g. open/unlink
an NFS file, which results in a rename, and continue to do I/O).
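
To make the proposed change concrete, here is a minimal sketch of the
client-side call states described above.  It is a toy model, not the
actual FreeBSD RPC code: the state and event names, the `step` and `run`
helpers, and the choice that a nak ends the call in FAILED are all
assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()   # nothing outstanding; the request may be (re)sent
    SENT = auto()    # request sent, waiting for an ack or a nak
    ACKED = auto()   # server ack'ed; waiting for the result data
    DONE = auto()    # result data received
    FAILED = auto()  # request was nak'ed (assumed to be terminal here)


class Event(Enum):
    SEND = auto()
    ACK = auto()
    NAK = auto()
    DATA = auto()
    TIMEOUT = auto()


# Transitions without the proposed fix: ACKED can only be left via DATA.
BASE = {
    (State.READY, Event.SEND): State.SENT,
    (State.SENT, Event.ACK): State.ACKED,
    (State.SENT, Event.NAK): State.FAILED,
    (State.SENT, Event.TIMEOUT): State.READY,  # un-ack'ed request: retry
    (State.ACKED, Event.DATA): State.DONE,
}


def step(state, event, retry_after_ack):
    """Return the next state; unknown (state, event) pairs are ignored."""
    if retry_after_ack and state is State.ACKED and event is Event.TIMEOUT:
        # Proposed behaviour: fall back to a retry as if never ack'ed.
        return State.READY
    return BASE.get((state, event), state)


def run(events, retry_after_ack):
    """Feed a sequence of events to a fresh call and return the final state."""
    state = State.READY
    for event in events:
        state = step(state, event, retry_after_ack)
    return state
```

With `retry_after_ack=False`, the sequence SEND, ACK, TIMEOUT leaves the
call stuck in ACKED, which is the freeze being discussed.  With
`retry_after_ack=True`, the same sequence returns the call to READY so it
can be resent.  The sketch deliberately says nothing about unwinding
state for a request that can no longer be served.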

One thing that would help is server signalling.  This is basically
the job of rpc.statd.  The operation could then be retried before
the timeout expires.
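
As a rough illustration of retrying on a server signal rather than
waiting for the timer, the following sketch keeps a table of outstanding
calls per server.  The `PendingCalls` class and its `on_server_restart`
hook are hypothetical names; how a restart notification would actually
be delivered (by rpc.statd or otherwise) is not modelled.

```python
class PendingCalls:
    """Toy table of calls that are outstanding, keyed by server name."""

    def __init__(self):
        self._by_server = {}

    def add(self, server, call_id):
        """Record a call that has been sent to `server`."""
        self._by_server.setdefault(server, []).append(call_id)

    def complete(self, server, call_id):
        """Forget a call once its result has arrived."""
        calls = self._by_server.get(server, [])
        if call_id in calls:
            calls.remove(call_id)

    def on_server_restart(self, server):
        """Hypothetical hook: return the calls to resend immediately.

        The calls stay in the table, since they are still outstanding
        after being resent.
        """
        return list(self._by_server.get(server, []))
```

A caller would resend every call id returned by `on_server_restart`
instead of leaving each one to wait out its own timeout.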

One real pain is that for a long-delay link, e.g. satellite, Sprint (;-)),
etc., if you were to restart the call that was ACK'ed and wait for
another ACK, you would have to accept a response-without-ACK to
make yourself robust.  If the operation was a "delete file" or
whatever, it is not idempotent: unlike a block write, you can't
replay the event with no ill effect.
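
The idempotency point can be shown with a toy in-memory server.  This is
only a sketch: `ToyServer`, its method names, and the "ok"/"ENOENT"
return strings are made up for illustration and are not the NFS
procedures themselves.

```python
class ToyServer:
    """In-memory stand-in for a file server, for illustration only."""

    def __init__(self):
        self.blocks = {}
        self.files = {"a.txt"}

    def write_block(self, number, data):
        # Idempotent: replaying the same write leaves the same state
        # and produces the same reply.
        self.blocks[number] = data
        return "ok"

    def remove(self, name):
        # Not idempotent: a replay finds the file already gone.
        if name not in self.files:
            return "ENOENT"
        self.files.remove(name)
        return "ok"


def replay(operation, *args):
    """Apply the same operation twice, as a blind retry would."""
    return operation(*args), operation(*args)
```

Replaying `write_block` yields "ok" both times, while replaying `remove`
yields "ok" and then "ENOENT", even though the file was in fact deleted
by the first request.  A client that discards the late reply to the
first request and only trusts the replay would report a failure for an
operation that succeeded.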


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



