From: Joe Peterson
Date: Tue, 04 Mar 2008 17:15:22 -0700
To: Eric Anderson
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Analysis of disk file block with ZFS checksum error

Eric Anderson wrote:
> I'm starting to think there is a timing issue or some such problem with
> ZFS, since I can use the same drives in a gmirror with UFS, and never
> have any data problems (md5 checksums confirm it over-and-over).
> I highly doubt that everyone is seeing similar issues and it just is
> because ZFS is so intense.  I've had plenty of systems under severe disk
> load that have never exhibited corrupt files because of something like
> this.

I also wondered about this, i.e. whether ZFS was triggering a certain
timing behavior that revealed the problem.  Still, if that is the case, it
seems to me that the problem lies in the ATA subsystem, since that layer
should prevent higher-level code like ZFS from being able to create bad
timings (or am I not thinking about this correctly?).  Also, I think there
were some reports of problems with DMA/ATA when *not* using ZFS.

> I wish we could get our hands on this issue..  Seems like some common
> threads are ATA/SATA disks.  Is your setup running 32bit or 64bit
> FreeBSD?  (if you already mentioned it, I'm sorry, I missed it)

This was on 32-bit FreeBSD with PATA.  I am the one who had no SMART
issues and no DMA errors reported under Linux.  Changing the cable may
have "fixed" it, since I did not see errors in some further testing.  Even
so, my theory is that there is some edge case (timing?) that the FreeBSD
ATA drivers were sensitive to, and perhaps my change of cables pushed the
problem to the other side of the threshold.  Since I never saw errors
under Linux (and I had been using that cable for a couple of years), I do
not necessarily think the cable was actually "defective".

-Joe