Date:      Fri, 18 Oct 2002 23:44:14 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Maxim Sobolev <sobomax@FreeBSD.ORG>, hackers@FreeBSD.ORG
Subject:   Re: Patch to allow a driver to report unrecoverable write errors to the  buf layer
Message-ID:  <3DB0FF3E.E4096707@mindspring.com>
References:  <3DB048B5.21097613@FreeBSD.org> <200210181807.g9II7cBY024485@apollo.backplane.com> <3DB0516F.9BE00F57@FreeBSD.org> <200210181835.g9IIZsBX061970@apollo.backplane.com> <20021019051202.GB14922@vega.vega.com> <200210190613.g9J6Debh023134@apollo.backplane.com>

Matthew Dillon wrote:
> :Hmm, the current approach doesn't look all that "right" to me, because we are
> :retrying operation even though the upper-layer code that initiated it was
> :already notified about the failure (e.g. received EIO), so that it should not
> :assume that the data was actually written successfully. Or I am missing
> :something?
> 
>     Yah, most writes issued through the buffer cache are asynchronous or
>     delayed.  So the VFS layer that initiated the write is not necessarily
>     going to be notified of a failure.  Thus the failure notification does
>     not help us here.

First off, the failure notification needs to cascade all the way up:

o	retry by disk electronics
o	retry by controller
o	retry by driver
o	retry by FS
o	retry by application

At any one of those layers, you could insert a "media perfection layer";
for example, using "GEOM", you could insert BAD144 support between the
FS and the driver.
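
To make the cascade concrete, here is a minimal sketch in C, not code
from any real driver or from GEOM.  The names "struct layer",
"layered_write", and "flaky_media" are hypothetical, and the retry
counts are arbitrary.  Each layer retries the layer below it a bounded
number of times, and only when its own budget is exhausted does the
error move up one level.  A "media perfection layer" would simply be
one more entry in the stack.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback that touches the media: 0 on success, else errno. */
typedef int (*media_io_fn)(void *ctx);

/* One retrying layer; the stack is ordered top (application) to bottom. */
struct layer {
    const char *name;   /* label only, e.g. "FS" or "driver" */
    int attempts;       /* how many times this layer tries the layer below */
};

/*
 * Issue one write through the stack.  layers[0] retries everything
 * beneath it; an error is returned to the caller only after every
 * layer has used up its attempts.
 */
int
layered_write(const struct layer *layers, size_t nlayers,
    media_io_fn media, void *ctx)
{
    int error = EIO;

    assert(media != NULL);
    if (nlayers == 0)
        return (media(ctx));            /* bottom of the stack */
    for (int i = 0; i < layers[0].attempts; i++) {
        error = layered_write(layers + 1, nlayers - 1, media, ctx);
        if (error == 0)
            break;                      /* a lower layer recovered */
    }
    return (error);                     /* 0, or the failure cascades up */
}

/* Toy media: fails while the counter behind ctx is positive. */
int
flaky_media(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return (EIO);
    }
    return (0);
}
```

With a stack of { "application", 1 }, { "FS", 2 }, { "driver", 2 } the
media is touched at most 1 * 2 * 2 = 4 times.  Three transient failures
are absorbed below the application; a fourth surfaces as EIO at the top.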

The argument about the failure not going to the request that caused
the failure is bogus; if we are actually talking about a request to
a file where the semantics of the write are such that the request
returns successfully before the write is guaranteed to be successful,
OK: then the failure is there for the next operation.  And that's
fine, and reasonable.

If people care about their data, they will use synchronous I/O, or
they will use an asynchronous I/O interface with explicit completion
notification (e.g. "aiowrite").  If they don't care about their data,
then signalling a failure preemptively on the next attempt -- e.g.
by closing a descriptor out from under them, or marking it read-only
following a write failure -- is the right thing to do.
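
As an illustration of the "people who care" case, a minimal sketch
using only standard POSIX calls (open, write, fsync, close).  The
helper name "write_durably" is hypothetical, and the flags and mode are
assumptions chosen for the example.  The point is that every call that
can report a deferred write failure has its return value checked, so
the error reaches the caller rather than being dropped.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and do not report
 * success until fsync() says the data reached stable storage.
 * Returns 0 on success, or an errno value on failure.
 */
int
write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd, error = 0;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return (errno);

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            error = errno;
            break;
        }
        p += n;                 /* short write: continue with the rest */
        len -= (size_t)n;
    }

    /* A delayed write failure, if any, is reported here. */
    if (error == 0 && fsync(fd) == -1)
        error = errno;

    /* close() can also fail; keep the first error seen. */
    if (close(fd) == -1 && error == 0)
        error = errno;
    return (error);
}
```

A caller that ignores the return value is back in the "don't care"
camp.  Opening the file with O_SYNC is the other standard route to the
same semantics, at the cost of making every write() wait.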

In reality, people *do* care, even when they say they don't care...
or they would be opening /dev/null instead of a file, to receive
their writes.  All we are really arguing about here is delayed
notification because of intentional acceptance of bogus semantics
surrounding the commit to stable storage.

-- Terry
