Message-ID: <47A88ADE.7050503@skyrush.com>
Date: Tue, 05 Feb 2008 09:12:14 -0700
From: Joe Peterson <joe@skyrush.com>
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
MIME-Version: 1.0
To: Dag-Erling Smørgrav <des@des.no>
References: <47A73C8D.3000107@skyrush.com>
	<86prvby5o1.fsf@ds4.des.no>	<47A864D9.4060504@skyrush.com>
	<864pcnxz8f.fsf@ds4.des.no>
In-Reply-To: <864pcnxz8f.fsf@ds4.des.no>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: freebsd-fs@freebsd.org
Subject: Re: Forcing full file read in ZFS even when checksum error
	encountered

Dag-Erling Smørgrav wrote:
> There is no way to "read the bad data" since an unrecoverable checksum
> error means that ZFS has no idea which of the multiple versions of the
> affected block is the right one.

Nope, no mirror, no RAIDZ - just one partition.  But as far as I know, there
were no read errors, just a checksum error.  I've also done a couple of
surface scans of the drive, and no problems.  So all I can imagine is that
either data got "changed" on the disk (due to who knows what), or the metadata
got messed up (either hardware or some SW bug).

I'd like to figure out what ZFS thinks the bytes in the file really are and
why they are showing as a checksum error.  Since I only have one copy
(i.e. no RAID/mirror), I should be able to tell ZFS to "go ahead and read
the bytes, not stopping when it hits the checksum mismatch."  Then I could do
an analysis of the data, compared to what the file should contain.
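
As a rough sketch of that analysis (assuming the damaged file can be read at
all, and that a known-good copy exists, e.g. from backup), something like the
following Python would list the byte ranges that differ.  The function name
and the file names in the usage note are hypothetical, not anything ZFS
provides.

```python
def diff_ranges(recovered: bytes, reference: bytes):
    """Return half-open (start, end) byte ranges where the buffers differ.

    Bytes past the end of the shorter buffer count as differing.
    """
    shortest = min(len(recovered), len(reference))
    longest = max(len(recovered), len(reference))
    ranges = []
    start = None  # start offset of the differing run currently being tracked
    for i in range(longest):
        differs = i >= shortest or recovered[i] != reference[i]
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, longest))
    return ranges
```

Usage would be along the lines of
diff_ranges(open("recovered.bin", "rb").read(), open("backup.bin", "rb").read()).
The size and alignment of the reported ranges might hint at whether a few
bytes changed or a whole block went bad.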

I assume I can hack the ZFS source to "disable" stopping on the checksum
problem, but I figured there might be some debug mode that would let me do
this without delving into the code.

> (I assume this was a raidz pool; if not, imagine Nelson Muntz from the
> Simpsons yelling "ha ha!" at you)

Ah, don't worry, I have backups (I'm just playing around with ZFS at the
moment...  :)

> My advice is to use 'dd conv=noerror' with a sufficiently small block
> size to recover what parts you can.

I haven't lost anything, so no need to do that.  I just want to see what's up
with this particular ZFS issue.  If it's a bug, at least I could submit it to Sun.
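
For anyone who does need to salvage data, the small-block approach suggested
above can be approximated in Python as below.  This is only a sketch: it
assumes a Unix-like system (os.pread), and it assumes the checksum failure
shows up to the reader as an I/O error on the affected block.  The function
name and the default block size are made up for illustration.

```python
import os

def read_skipping_errors(path, block_size=512):
    """Read a file block by block, zero-filling any block that fails to read.

    Returns (data, bad_offsets), where bad_offsets lists the starting
    offsets of the blocks that raised an I/O error.
    """
    size = os.path.getsize(path)
    chunks = []
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in range(0, size, block_size):
            want = min(block_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Unreadable block: note where it was and keep going.
                bad_offsets.append(offset)
                data = b""
            # Pad failed or short reads with zeros so offsets stay aligned.
            chunks.append(data.ljust(want, b"\0"))
    finally:
        os.close(fd)
    return b"".join(chunks), bad_offsets
```

This behaves roughly like 'dd conv=noerror,sync': errors are skipped rather
than fatal, and the output keeps the same length as the original file.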

							Thanks, Joe