From: Bruce Evans
Date: Fri, 30 Mar 2007 07:13:57 +1000 (EST)
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: gvirstor & UFS
Message-ID: <20070330062726.I2388@besplex.bde.org>
References: <20070328100536.S6916@besplex.bde.org>

On Thu, 29 Mar 2007, Ivan Voras wrote:

> Bruce Evans wrote:
>
>> The following old patch may help.  vfs retries too hard after write
>> errors.  Retrying after EIO is bad enough (since most parts of the
>> kernel still expect the old treatment of not retrying), but retrying
>> after a non-recoverable error is just a bug.
>
> I've tried the patch - it resulted in a panic :(
>
> g_vfs_done():virstor/foo[WRITE(offset=17353104384, length=131072)]error = 28
> /bla: got error 28 while accessing file system
> panic: softdep_deallocate_dependencies: unrecovered I/O error
> cpuid=0

That is hard to fix.  The change to vfs_bio.c to not discard buffer
contents after a write error (rev.1.196 of vfs_bio.c) may even have
been triggered by this and similar panics in soft updates.  However, I
think it is a bug for file systems to not be able to deal with i/o
errors.  Rev.1.196 could have reasonably left the buffer alone instead
of discarding it as before or clearing its error indicator and dirty
flag as now, so that file system code could deal with the error a
little later.  Then I think the above panic would still occur, since
soft updates can't deal with the error.  Soft updates is apparently
depending on not even seeing the error.  But some errors are
non-recoverable, so not seeing them is no solution.

> The file system on the virstor device was created with softupdates
> enables, as shown...
>
> backtrace:
> panic() ... softdep_deallocate_dependencies() ... brelse() ...
> bufdone_finish() ... bufdone() ... cluster_callback() ... bufdone() ...
> g_vfs_done() ... bio_done() ... g_io_schedule_up(), ...

Apparently it gets past the media size check in g_io_check() to give
ENOSPC instead of EIO because g_io_check() only checks the virtual
size.

To support virtual overcommitted media, it is necessary for file
systems to either do a physical check whenever they allocate a block
(just checking that the block number is <= the maximum allocated one
like most file systems do is insufficient if the media is
overcommitted), or deal with ENOSPC-type errors later when they occur
at write time.  I once worked on a toy file system that did the former
-- allocation was essentially malloc() and done by the file system,
and deallocation was essentially free() and also done by the file
system.
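
A minimal userland sketch of the allocation-time physical check described
above may make the idea concrete.  It is not FreeBSD code: toy_store,
toy_alloc(), toy_free() and the two size constants are hypothetical names
invented for illustration.  The point is only that a range check on the
block number passes for an overcommitted store, so the allocator must also
ask whether physical backing is left.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: the store advertises more blocks than it can back. */
#define TOY_VIRT_BLOCKS 16	/* size the file system believes it has */
#define TOY_PHYS_BLOCKS 4	/* backing store actually available */

struct toy_store {
	bool   mapped[TOY_VIRT_BLOCKS];	/* virtual block has physical backing */
	size_t phys_used;		/* physical blocks handed out so far */
};

static void
toy_init(struct toy_store *ts)
{
	memset(ts, 0, sizeof(*ts));
}

/*
 * Allocation-time check: succeed only if the block number is in range
 * AND physical backing is still available.  Returns 0 or an errno value,
 * so the caller sees ENOSPC at allocation time, not later at write time.
 */
static int
toy_alloc(struct toy_store *ts, size_t blkno)
{
	if (blkno >= TOY_VIRT_BLOCKS)
		return (EINVAL);	/* the usual virtual-size check */
	if (ts->mapped[blkno])
		return (0);		/* already backed, nothing to do */
	if (ts->phys_used >= TOY_PHYS_BLOCKS)
		return (ENOSPC);	/* in range, but overcommitted and full */
	ts->mapped[blkno] = true;
	ts->phys_used++;
	return (0);
}

/* Deallocation: give the physical backing for one block back. */
static void
toy_free(struct toy_store *ts, size_t blkno)
{
	if (blkno < TOY_VIRT_BLOCKS && ts->mapped[blkno]) {
		ts->mapped[blkno] = false;
		ts->phys_used--;
	}
}
```

With these sizes, the fifth distinct block fails with ENOSPC even though
its number is well inside the virtual range, and it succeeds again once
another block has been freed.
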
FreeBSD seems to only have support for the free() part of this --
BIO_DELETE.  For malloc()ed md disks, BIO_DELETE gives the free(), but
allocation is done by just writing to a block.  The malloc() for this
uses M_WAITOK, so when a malloc()ed md disk is overcommitted and full,
ENOSPC is not returned -- the system hangs instead :-(.

Bruce