Date:      Tue, 10 Apr 2007 12:42:29 -0500
From:      Eric Anderson <anderson@freebsd.org>
To:        rick-freebsd@kiwi-computer.com
Cc:        freebsd-geom@freebsd.org
Subject:   Re: volume management
Message-ID:  <461BCC85.2080900@freebsd.org>
In-Reply-To: <20070410172604.GA21036@keira.kiwi-computer.com>
References:  <20070409152401.GG76673@garage.freebsd.pl>	<20070409153203.GA88082@harmless.hu> <461A5EC6.8010000@freebsd.org>	<20070409154407.GA88621@harmless.hu> <evfqtt$n23$1@sea.gmane.org>	<20070410111957.GA85578@garage.freebsd.pl> <461B75B2.40201@fer.hr>	<20070410114115.GB85578@garage.freebsd.pl>	<20070410161445.GA18858@keira.kiwi-computer.com>	<20070410162129.GI85578@garage.freebsd.pl> <20070410172604.GA21036@keira.kiwi-computer.com>

On 04/10/07 12:26, Rick C. Petty wrote:
> On Tue, Apr 10, 2007 at 06:21:29PM +0200, Pawel Jakub Dawidek wrote:
>> The choice you currently have is to panic and lose the last few seconds of
>> your data but keep the file system in a consistent state, or to return
> 
> How can you guarantee the FS is consistent at that point?  Are you looking
> through the list of blocks to be written?  Granted, with soft updates this
> is less risky, because presumably the metadata blocks haven't been written
> until the data blocks are.
> 
>> ENOSPC, which nobody is going to handle and which may in the end corrupt
>> your file system to a state that fsck won't be able to fix.
> 
> Is a file system thread waiting on the block to be written, or, because
> it's in a write cache, is the caller lost forever?  I thought the UFS soft
> updates code was blocking on the write, even though the userland caller had
> a successful return.  If so, the FS should handle the error and avoid
> inconsistencies.
> 
> I certainly see this type of behavior in gvinum when a disk is lost and a
> write to a slice cannot finish successfully.  I'm very glad the box doesn't
> panic as often because I can sometimes go in and bring the drive back up.
> 
>> This is not about a simple write operation to the disk. Those operations
>> are delayed anyway; your userland process will see that the write
>> operation succeeded. This is about kernel and file system consistency.
> 
> I'm aware of that, but what's the call stack leading up to the GEOM
> failure?  I was under the impression that UFS was blocked waiting for a
> write operation, which is all done in the kernel anyway.


I think the issue is that UFS doesn't expect to see ENOSPC from the
storage, since it believes it's on a provider that should be big enough.
Is the right thing to do to teach UFS to recognize ENOSPC and pass that on
to userland?


>> It would be great to just fix everything in the kernel to handle errors
>> properly, but good luck with that.
> 
> That's a worthy goal and something we should be pursuing.  After all,
> FreeBSD used to be noted for its stability.  I wouldn't call panics a sign
> of stability.  You're better off invalidating all the geom consumers and
> leaving the rest of the system up so an admin can try to recover critical
> data, or so the remaining geom providers can continue to function.

There's been talk in the past about making the mount read-only instead 
of a panic in some situations, but I know nothing more than that.
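For illustration only, the trade-off discussed in this thread (writes that
userland has already seen succeed, followed by ENOSPC from the provider at
flush time) can be modeled in a few lines. This is a minimal Python sketch,
not UFS or GEOM code: SimMount, write() and flush() are hypothetical names,
and the provider is assumed to be a fixed number of blocks. On ENOSPC the
model degrades the mount to read-only instead of panicking, so later writes
fail with EROFS.

```python
import errno
from collections import deque


class SimMount:
    """Hypothetical model of a mount with delayed writes (not FreeBSD code)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks  # assumed fixed provider size
        self.used = 0                    # blocks actually written to the provider
        self.read_only = False
        self.pending = deque()           # acknowledged to userland, not yet flushed

    def write(self, nblocks):
        """Userland write: returns 0 at once, the real I/O is deferred."""
        if self.read_only:
            return errno.EROFS
        self.pending.append(nblocks)
        return 0

    def flush(self):
        """Push deferred writes to the provider.

        If the provider runs out of space, switch to read-only and report
        ENOSPC instead of panicking; the block that did not fit stays queued.
        """
        while self.pending:
            n = self.pending[0]
            if self.used + n > self.capacity:
                self.read_only = True
                return errno.ENOSPC
            self.used += n
            self.pending.popleft()
        return 0
```

In this model the write that was acknowledged but never reached the provider
is still not durable; degrading to read-only only stops new writes from
piling up behind it, which is why it is an alternative to a panic rather
than a fix for the lost data.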

Eric

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?461BCC85.2080900>