From owner-freebsd-fs@FreeBSD.ORG Thu Nov  1 00:09:36 2012
From: "Steven Hartland"
To: "Peter Jeremy"
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: ZFS corruption due to lack of space?
Date: Thu, 1 Nov 2012 00:09:33 -0000

On 2012-Oct-31 17:25:09 -0000, Steven Hartland wrote:
>> Been running some tests on new hardware here to verify all
>> is good. One of the tests was to fill the zfs array, which
>> seems like it's totally corrupted the tank.
>
> I've accidentally "filled" a pool, and had multiple processes try to
> write to the full pool, without either emptying the free space reserve
> (so I could still delete the offending files) or corrupting the pool.

Same here, but this is the first time I've had a ZIL in place when it
happened, so I'm wondering if that may be a factor.

> Had you tried to read/write the raw disks before you tried the
> ZFS testing?

Yes, and I didn't see any issues, but nothing was checksumming at that
point, so to be honest I wouldn't have noticed if it was silently
corrupting data.

> Do you have compression and/or dedupe enabled on the pool?

Nope, it's a bog-standard raidz2 with no additional settings.

>> 1. Given the information it seems like the multiple writes filling
>> the disk may have caused metadata corruption?
>
> I don't recall seeing this reported before.

Nor have I, and we've been using ZFS for years, but we've never filled a
pool with this sort of known simultaneous access plus a ZIL before.

>> 2. Is there any way to stop the scrub?
>
> Other than freeing up some space, I don't think so. If this is a test
> pool that you don't need, you could try destroying it and re-creating
> it - that may be quicker and easier than recovering the existing pool.

Artem's trick of cat /dev/null > /tank2/ worked and I've now managed to
stop the scrub :)
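For anyone who hits the same thing, here is a minimal sketch of that
recovery path. It assumes there is a large file on the full dataset you
can sacrifice (the file name below is only a placeholder), and
zpool scrub -s is simply the standard stop command, not something quoted
from above:

    # Truncate an existing file in place; this can succeed on a
    # completely full pool where a plain rm may fail with ENOSPC.
    # "somebigfile" is a placeholder for a real file on the dataset.
    cat /dev/null > /tank2/somebigfile

    # With a little space freed, stop the scrub the normal way and
    # confirm it is no longer running.
    zpool scrub -s tank2
    zpool status tank2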
>> 3. Surely low space should never prevent stopping a scrub?
>
> As Artem noted, ZFS is a copy-on-write filesystem. It is supposed to
> reserve some free space to allow metadata updates (stop scrubs, delete
> files, etc) even when it is "full", but I have seen reports of this not
> working correctly in the past. A truncate-in-place may work.

Yes it did, thanks. But as you said, if this metadata update was failing
due to lack of space, that lends credibility to the idea that the same
lack of space, and hence failure to update metadata, could also have
caused the corruption in the first place.

It's interesting to note that the zpool is reporting plenty of free
space even while the root zfs dataset was showing 0 available, so you
would expect there to be plenty of space for it to be able to stop the
scrub, but apparently not, which is definitely interesting and could
point to the underlying cause?

zpool list tank2
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
tank2   19T  18.7T  304G  98%  1.00x  ONLINE  -

zfs list tank2
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank2  13.3T     0  13.3T  /tank2

Current state is:
  scan: scrub in progress since Wed Oct 31 16:13:53 2012
        1.64T scanned out of 18.7T at 62.8M/s, 79h12m to go
        280M repaired, 8.76% done

Something else that was interesting: while the scrub was running, devd
was using a good amount of CPU (around 40% of a 3.3GHz core), which I've
never seen before. Any ideas why its usage would be so high?

    Regards
    Steve
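On the devd question, a rough sketch of how one might start digging,
using standard FreeBSD tools (none of this comes from the thread, so
treat it as a generic suggestion):

    # See what devd is doing in the kernel while the scrub runs.
    procstat -kk $(pgrep -x devd)

    # Watch its system calls; a flood of devctl events (for example
    # repeated disk error notifications during the scrub) would show
    # up here.
    truss -p $(pgrep -x devd)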