Date: Sat, 19 Oct 2002 08:12:02 +0300
From: Maxim Sobolev
To: Matthew Dillon
Cc: hackers@FreeBSD.ORG
Subject: Re: Patch to allow a driver to report unrecoverable write errors to the buf layer
Message-ID: <20021019051202.GB14922@vega.vega.com>
References: <3DB048B5.21097613@FreeBSD.org> <200210181807.g9II7cBY024485@apollo.backplane.com> <3DB0516F.9BE00F57@FreeBSD.org> <200210181835.g9IIZsBX061970@apollo.backplane.com>
In-Reply-To: <200210181835.g9IIZsBX061970@apollo.backplane.com>

On Fri, Oct 18, 2002 at 11:35:54AM -0700, Matthew Dillon wrote:
>
> :> :
> :> :There is a very easy way to trigger the problem: insert blank floppy
> :> :...
> :>
> :> Your patch looks slightly incomplete to me, but the concept is reasonable.
> :> The BIO_NORETRY test that sets B_INVAL should probably be done in
> :> brelse(), not in bufwait(). It is the code in brelse() that actually
> :> does the re-dirtying of the buffer in case of a write-error.
> :
> :Ah, actually I've initially put it into brelse() but then reconsidered
> :a decision and moved it down into bufwait(). I'll move it back. ;)
>
> Heh heh. Well, it seems to me that since it is the BUF abstraction
> that has the error check / redirtying / retry code, then the BUF
> abstraction should probably be responsible for the no-retry case as
> well. The BIO abstraction is really designed to hold an I/O operation,
> not really to hold meta operations. You could still specify a BIO
> flag for it since it's a media hack of sorts, but the BUF code should
> be responsible for processing it.

OK, thank you for the detailed explanation.

> I dunno about a formal abstraction. We need to differentiate between
> media which can and cannot remap blocks. A 'perfect' solution
> would be far more complex. File data blocks would have to be
> remapped at the filesystem level and meta-data would have to be
> invalidated in-core (bitmap, inode blocks with write errors), and
> the filesystem would have to be marked dirty on unmount. Then unmount
> could safely destroy the buffers representing the write-error'd meta
> data.
>
> The VFS layer would definitely need to be involved. We have the
> advantage in that the buffer cache is already logically mapped, but
> it would still be a fairly sophisticated piece of work.
>
> :> This re-dirtying is necessary in most cases to prevent filesystem
> :> corruption. Otherwise the buffer may be thrown away and a re-read
> :> may return the original pre-modified data, causing massive filesystem
> :> corruption elsewhere (consider what that would mean for a bitmap block).
> :>
> :> I think it's perfectly reasonable to do away with the buffer in the
> :> case of a floppy error, though.
>
> Just a bit of history. Originally the buffer cache did not retry error'd
> out writes. I changed it several years ago because the mechanism
> was producing massive filesystem corruption in the face of disk write
> errors. The floppy issue was a known issue at the time and I am quite
> happy that someone is tackling the problem now!

Hmm, the current approach doesn't look all that "right" to me, because
we are retrying the operation even though the upper-layer code that
initiated it was already notified about the failure (e.g. received EIO),
so it should not assume that the data was actually written
successfully. Or am I missing something?

-Maxim
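
For readers following the thread, the release-time decision under discussion
can be sketched as a small userland C model. This is only an illustration of
the idea (re-dirty a buffer after a failed write unless the driver has said
that retrying is pointless), not the kernel's brelse(). Every name below
(struct sk_buf, sk_release(), the SK_* flags) is hypothetical and invented
for this sketch; the real code handles many more cases.

```c
/*
 * Userland model of the "re-dirty on write error unless no-retry" idea.
 * All identifiers here are hypothetical; they are NOT the kernel's
 * definitions of B_DELWRI, B_INVAL, BIO_ERROR or BIO_NORETRY.
 */
#define SK_DELWRI   0x01u   /* buffer is dirty and will be written again */
#define SK_INVAL    0x02u   /* buffer contents are invalid; discard them */
#define SK_ERROR    0x04u   /* the last I/O on this buffer failed */
#define SK_NORETRY  0x08u   /* driver reports the error as unrecoverable */

enum sk_iocmd { SK_READ, SK_WRITE };

struct sk_buf {
    enum sk_iocmd cmd;      /* last operation issued on the buffer */
    unsigned int  flags;    /* combination of the SK_* flags above */
};

/* Decide what happens to a buffer when it is released after I/O. */
static void
sk_release(struct sk_buf *bp)
{
    /* Only failed writes are of interest in this sketch. */
    if (bp->cmd != SK_WRITE || (bp->flags & SK_ERROR) == 0)
        return;

    if (bp->flags & SK_NORETRY) {
        /* Unrecoverable (e.g. unformatted floppy): throw the data away. */
        bp->flags |= SK_INVAL;
        bp->flags &= ~SK_DELWRI;
    } else {
        /* Recoverable: keep the data and mark it dirty for a later retry. */
        bp->flags &= ~SK_ERROR;
        bp->flags |= SK_DELWRI;
    }
}
```

Keeping the check in the release path, as suggested in the quoted text, puts
the retry and no-retry decisions in one place, so a driver only has to set
the flag and does not need to know how the buffer layer reacts to it.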