From owner-freebsd-hackers  Fri Oct 18 22:12:20 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B04E937B401
	for <hackers@FreeBSD.ORG>; Fri, 18 Oct 2002 22:12:18 -0700 (PDT)
Received: from baraca.united.net.ua (ns.united.net.ua [193.111.8.193])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7C8EC43EA3
	for <hackers@FreeBSD.ORG>; Fri, 18 Oct 2002 22:12:17 -0700 (PDT)
	(envelope-from max@vega.com)
Received: from vega.vega.com (xDSL-2-2.united.net.ua [193.111.9.226])
	by baraca.united.net.ua (8.11.6/8.11.6) with ESMTP id g9J5C6d14498;
	Sat, 19 Oct 2002 08:12:07 +0300 (EEST)
	(envelope-from max@vega.com)
Received: from vega.vega.com (max@localhost [127.0.0.1])
	by vega.vega.com (8.12.6/8.12.5) with ESMTP id g9J5C5aJ015040;
	Sat, 19 Oct 2002 08:12:05 +0300 (EEST)
	(envelope-from sobomax@FreeBSD.org)
Received: (from max@localhost)
	by vega.vega.com (8.12.6/8.12.5/Submit) id g9J5C3QJ015039;
	Sat, 19 Oct 2002 08:12:03 +0300 (EEST)
Date: Sat, 19 Oct 2002 08:12:02 +0300
From: Maxim Sobolev <sobomax@FreeBSD.ORG>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: hackers@FreeBSD.ORG
Subject: Re: Patch to allow a driver to report unrecoverable write errors to the buf layer
Message-ID: <20021019051202.GB14922@vega.vega.com>
References: <3DB048B5.21097613@FreeBSD.org> <200210181807.g9II7cBY024485@apollo.backplane.com> <3DB0516F.9BE00F57@FreeBSD.org> <200210181835.g9IIZsBX061970@apollo.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
In-Reply-To: <200210181835.g9IIZsBX061970@apollo.backplane.com>
User-Agent: Mutt/1.4i
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

On Fri, Oct 18, 2002 at 11:35:54AM -0700, Matthew Dillon wrote:
> 
> :> :
> :> :There is a very easy way to trigger the problem: insert blank floppy
> :> :...
> :> 
> :>     Your patch looks slightly incomplete to me, but the concept is reasonable.
> :>     The BIO_NORETRY test that sets B_INVAL should probably be done in
> :>     brelse(), not in bufwait().  It is the code in brelse() that actually
> :>     does the re-dirtying of the buffer in case of a write-error.
> :
> :Ah, actually I've initially put it into brelse() but then reconsidered
> :a decision and moved it down into bufwait(). I'll move it back. ;)
> 
>     Heh heh.  Well, it seems to me that since it is the BUF abstraction
>     that has the error check / redirtying / retry code, then the BUF
>     abstraction should probably be responsible for the no-retry case as
>     well.  The BIO abstraction is really designed to hold an I/O operation,
>     not really to hold meta operations.  You could still specify a BIO
>     flag for it since it's a media hack of sorts, but the BUF code should
>     be responsible for processing it.

OK, thank you for the detailed explanation.
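
For illustration only, here is a minimal user-space sketch of the arrangement
described above: the release path of the BUF layer decides between re-dirtying
a buffer after a failed write and invalidating it when the driver has flagged
the error as unrecoverable. It is a toy model, not kernel code and not the
actual patch. The struct toy_buf type, the toy_brelse() function, the
b_is_write field and all flag values are assumptions made for this example;
only the flag names follow the ones mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Toy flag values chosen for this sketch; not the kernel's definitions. */
#define B_DELWRI    0x01  /* buffer is dirty and will be written again */
#define B_INVAL     0x02  /* buffer contents are invalid and may be discarded */
#define BIO_ERROR   0x01  /* the I/O operation completed with an error */
#define BIO_NORETRY 0x02  /* proposed flag: driver says a retry is pointless */

/* Hypothetical, heavily simplified stand-in for a buffer header. */
struct toy_buf {
    int b_flags;     /* BUF-level state (B_*) */
    int b_ioflags;   /* I/O result flags (BIO_*) */
    int b_is_write;  /* non-zero if the completed operation was a write */
};

/*
 * Model of the decision taken when a buffer is released after I/O.
 * Only failed writes are affected; everything else is left alone.
 */
static void
toy_brelse(struct toy_buf *bp)
{
    assert(bp != NULL);

    if (!bp->b_is_write || (bp->b_ioflags & BIO_ERROR) == 0)
        return;

    if (bp->b_ioflags & BIO_NORETRY) {
        /* Unrecoverable media error: give up and drop the buffer. */
        bp->b_flags |= B_INVAL;
        bp->b_flags &= ~B_DELWRI;
    } else {
        /* Ordinary write error: clear it and keep the data dirty for retry. */
        bp->b_ioflags &= ~BIO_ERROR;
        bp->b_flags |= B_DELWRI;
    }
}
```

In this model a failed write without BIO_NORETRY ends up dirty again with the
error cleared, while a failed write with BIO_NORETRY ends up marked B_INVAL
and is never queued for another attempt.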

>     I dunno about a formal abstraction.  We need to differentiate between
>     media which can and cannot remap blocks.  A 'perfect' solution
>     would be far more complex.  File data blocks would have to be
>     remapped at the filesystem level and meta-data would have to be 
>     invalidated in-core (bitmap, inode blocks with write errors), and
>     the filesystem would have to be marked dirty on unmount.  Then unmount
>     could safely destroy the buffers representing the write-error'd meta
>     data. 
> 
>     The VFS layer would definitely need to be involved.  We have the
>     advantage in that the buffer cache is already logically mapped, but
>     it would still be a fairly sophisticated piece of work.
> 
> :>     This re-dirtying is necessary in most cases to prevent filesystem
> :>     corruption.  Otherwise the buffer may be thrown away and a re-read
> :>     may return the original pre-modified data, causing massive filesystem
> :>     corruption elsewhere (consider what that would mean for a bitmap block).
> :> 
> :>     I think it's perfectly reasonable to do away with the buffer in the
> :>     case of a floppy error, though.
> 
>     Just a bit of history.  Originally the buffer cache did not retry error'd
>     out writes.  I changed it several years ago because the mechanism
>     was producing massive filesystem corruption in the face of disk write
>     errors.  The floppy issue was a known issue at the time and I am quite
>     happy that someone is tackling the problem now!

Hmm, the current approach doesn't look all that "right" to me: we retry the
operation even though the upper-layer code that initiated it has already been
notified of the failure (e.g. received EIO) and therefore should not assume
that the data was actually written successfully. Or am I missing something?

-Maxim
