From owner-freebsd-fs@FreeBSD.ORG  Thu Mar 29 21:14:02 2007
Date: Fri, 30 Mar 2007 07:13:57 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@besplex.bde.org
To: Ivan Voras <ivoras@fer.hr>
In-Reply-To: <euh5hh$iis$1@sea.gmane.org>
Message-ID: <20070330062726.I2388@besplex.bde.org>
References: <euca4b$6l8$1@sea.gmane.org> <20070328100536.S6916@besplex.bde.org>
	<euh5hh$iis$1@sea.gmane.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org
Subject: Re: gvirstor & UFS

On Thu, 29 Mar 2007, Ivan Voras wrote:

> Bruce Evans wrote:
>
>> The following old patch may help.  vfs retries too hard after write
>> errors.  Retrying after EIO is bad enough (since most parts of the
>> kernel still expect the old treatment of not retrying), but retrying
>> after a non-recoverable error is just a bug.
>
> I've tried the patch - it resulted in a panic :(
> 
> g_vfs_done():virstor/foo[WRITE(offset=17353104384, length=131072)]error = 28
> /bla: got error 28 while accessing file system
> panic: softdep_deallocate_dependencies: unrecovered I/O error
> cpuid=0

That is hard to fix.  The change to vfs_bio.c to not discard buffer contents
after a write error (rev.1.196 of vfs_bio.c) may even have been triggered
by this and similar panics in soft updates.  However, I think it is a bug
for file systems to not be able to deal with i/o errors.  Rev.1.196 could
have reasonably left the buffer alone instead of discarding it as before
or clearing its error indicator and dirty flag as now, so that file system
code could deal with the error a little later.  Then I think the above
panic would still occur, since soft updates can't deal with the error.
Soft updates is apparently depending on not even seeing the error.  But
some errors are non-recoverable, so not seeing them is no solution.
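
To make the distinction concrete, here is a minimal userland sketch of
the kind of policy I mean.  Everything in it is hypothetical: the names
write_error_is_retryable() and MAX_WRITE_RETRIES are invented for
illustration and are not in vfs_bio.c or anywhere else in the kernel.
It only assumes that the caller counts its attempts.  The error 28 in
the panic message above is ENOSPC.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define	MAX_WRITE_RETRIES	3	/* arbitrary bound, for illustration */

/*
 * Hypothetical policy helper (not an existing kernel function).
 * Decide whether a failed write is worth queueing again.
 */
static bool
write_error_is_retryable(int error, int attempts)
{
	assert(attempts >= 0);
	switch (error) {
	case EIO:
		/* Possibly transient; retry, but only a bounded number of times. */
		return (attempts < MAX_WRITE_RETRIES);
	case ENOSPC:
		/* The backing store is full; rewriting the same block cannot help. */
	default:
		/* Treat anything unknown as non-recoverable. */
		return (false);
	}
}
```

With a check like this the buffer would be handed back to the file
system with its error still set once the function returns false, and
the file system would still have to cope with that, which is exactly
what soft updates cannot do at present.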

> The file system on the virstor device was created with softupdates
> enabled, as shown...
>
> backtrace:
> panic() ... softdep_deallocate_dependencies() ... brelse()  ...
> bufdone_finish() ... bufdone() ... cluster_callback() ... bufdone() ...
> g_vfs_done() ... bio_done() ... g_io_schedule_up(), ...

Apparently it gets past the media size check in g_io_check() to give
ENOSPC instead of EIO because g_io_check() only checks the virtual
size.  To support virtual overcommitted media, it is necessary for
file systems to either do a physical check whenever they allocate a
block (just checking that the block number is <= the maximum allocated
one like most file systems do is insufficient if the media is
overcommitted), or deal with ENOSPC-type errors later when they occur
at write time.  I once worked on a toy file system that did the former
-- allocation was essentially malloc() and done by the file system,
and deallocation was essentially free() and also done by the file
system.  FreeBSD seems to only have support for the free() part of
this -- BIO_DELETE.  For malloc()ed md disks, BIO_DELETE gives the
free(), but allocation is done by just writing to a block.  The malloc()
for this uses M_WAITOK, so when a malloc()ed md disk is overcommitted
and full, ENOSPC is not returned -- the system hangs instead :-(.
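
A rough userland sketch of the malloc()/free() style of interface
described above follows.  It is not GEOM code and not what the toy file
system actually did; struct thin_store, thin_check_range(),
thin_alloc() and thin_free() are all names made up for this example,
and the sizes are arbitrary.  The point is only that a range check
against the virtual size passes for a block that cannot be backed, while
an explicit allocation step can report ENOSPC before any data is
written.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define	VIRT_BLOCKS	8	/* size advertised to the file system */
#define	PHYS_BLOCKS	4	/* backing store actually available */

/* Hypothetical overcommitted store; zero-initialize before use. */
struct thin_store {
	bool	mapped[VIRT_BLOCKS];	/* virtual block has backing */
	size_t	phys_used;		/* physical blocks handed out */
};

/* Virtual-size check only; it still passes when the store is full. */
static int
thin_check_range(size_t blkno)
{
	return (blkno < VIRT_BLOCKS ? 0 : EIO);
}

/* The "malloc()" half: reserve backing for a block before writing it. */
static int
thin_alloc(struct thin_store *ts, size_t blkno)
{
	int error;

	assert(ts != NULL);
	if ((error = thin_check_range(blkno)) != 0)
		return (error);
	if (ts->mapped[blkno])
		return (0);		/* already backed */
	if (ts->phys_used == PHYS_BLOCKS)
		return (ENOSPC);	/* overcommitted and full */
	ts->mapped[blkno] = true;
	ts->phys_used++;
	return (0);
}

/* The "free()" half, similar in spirit to BIO_DELETE. */
static void
thin_free(struct thin_store *ts, size_t blkno)
{
	assert(ts != NULL);
	if (blkno < VIRT_BLOCKS && ts->mapped[blkno]) {
		ts->mapped[blkno] = false;
		ts->phys_used--;
	}
}
```

A file system using something like thin_alloc() at block allocation
time would see ENOSPC synchronously, where it can still fail the
write(2) cleanly, instead of at bufdone() time.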

Bruce