Message-id: <98238FC8-0FC4-4410-829F-EF2EA16A57B8@mac.com>
From: Chuck Swiger <cswiger@mac.com>
To: Jeremy Chadwick <koitsu@FreeBSD.org>
In-reply-to: <20081020132208.GA3847@icarus.home.lan>
Date: Mon, 20 Oct 2008 09:44:50 -0700
References: <200810171530.45570.joao@matik.com.br>
	<E3C2EAB9-12ED-4D3E-B07A-E2FF5892D26A@mac.com>
	<200810200837.40451.joao@matik.com.br>
	<20081020132208.GA3847@icarus.home.lan>
X-Mailer: Apple Mail (2.929.2)
Cc: freebsd-stable@freebsd.org, JoaoBR <joao@matik.com.br>
Subject: Re: constant zfs data corruption

Hi, all--

On Oct 20, 2008, at 6:22 AM, Jeremy Chadwick wrote:
[ ...JoaoBR wrote... ]
>> well, hardware seems to be ok and not older than 6 month, also happens
>> not only on one machine ... smartctl do not report any hw failures on
>> disk
>>
>> regarding jumpering the drives to 150 you suspect a driver problem?
>
> It's not because of a driver problem.  There are known SATA chipsets
> which do not properly work with SATA300 (particularly VIA and SiS
> chipsets); they claim to support it, but data is occasionally corrupted.
> Capping the drive to SATA150 fixes this problem.
>
> http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs

Exactly so.  As a general principle, if you've got sporadic data
corruption, turning the I/O and system buses down a notch and retesting
is a useful first step: it shows whether the problem is repeatable and
whether it points towards hardware or software.  That said, the ZFS
checksumming code has reportedly been carefully reviewed and tested, so
when it logs errors, that is supposed to be a fairly sure sign that the
hardware isn't behaving right.
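
As a rough illustration of the "retest and see whether it repeats" idea
above, here is a minimal Python sketch that writes a file of random
data, then re-reads it several times and compares SHA-256 digests.  The
function names, file size, and pass count are assumptions made up for
this example; none of it comes from ZFS or from this thread.  Re-reads
may well be served from the OS cache rather than the disk, so a clean
run proves little; a mismatch is the interesting result.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_and_verify(path, size_bytes=8 << 20, passes=3):
    """Write random data to path, then re-read it `passes` times.

    Returns the list of pass numbers whose digest did not match the
    digest of the data as it was held in memory before writing.
    """
    data = os.urandom(size_bytes)
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the data to the device
    failed = []
    for n in range(1, passes + 1):
        if sha256_of_file(path) != expected:
            failed.append(n)
    return failed


if __name__ == "__main__":
    # Hypothetical probe file in a temporary directory; point this at
    # the filesystem under suspicion for a real test.
    with tempfile.TemporaryDirectory() as tmp:
        bad = write_and_verify(os.path.join(tmp, "probe.bin"))
        print("mismatched passes:", bad if bad else "none")
```

Running the same probe before and after slowing the bus down gives a
crude before/after comparison, which is all this is meant to show.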

> There are also known problems with Silicon Image chipsets (on Linux,
> Windows, and FreeBSD).

Particularly with the 3112/4/x variants.  My understanding is that the
later 312x/313x chipsets are "better" only in a relative sense: an
improvement over something bad does not amount to "approval".  :-)

> Because you didn't provide your smartctl output, I can't really tell if
> the drives are in "good shape" or not.  :-)
>
> Also, do you not think it's a little odd that the only data corruption
> occurring for you are related to RRDtool?

RRD tends to involve lots of small writes, so its files are going to be
changed often compared to other things that might be running.  I would
expect a busy webserver or mailserver to generate more I/O to logfiles
and the queue/mailspool, but who knows what the machine in question is
being used for?
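
To make the write-pattern point concrete, here is a small Python sketch
of a workload made of many tiny in-place updates to a fixed-size file,
checked afterwards against a copy kept in memory.  It is only loosely
modelled on a round-robin layout: the 8-byte slot format, the function
name, and the slot and update counts are assumptions for illustration
and are not RRDtool's actual file format.

```python
import os
import struct

SLOT = struct.Struct("<d")  # assumed layout: one 8-byte double per slot


def small_write_workload(path, slots=1024, updates=10_000):
    """Overwrite slots in place, one small write at a time.

    Returns the indexes of slots whose on-disk bytes differ from the
    in-memory mirror of what was written.
    """
    mirror = bytearray(slots * SLOT.size)
    with open(path, "wb") as fh:
        fh.write(mirror)  # preallocate a zero-filled fixed-size file
    # buffering=0 so each update is issued as its own small write call
    with open(path, "r+b", buffering=0) as fh:
        for i in range(updates):
            offset = (i % slots) * SLOT.size
            record = SLOT.pack(float(i))
            fh.seek(offset)
            fh.write(record)
            mirror[offset:offset + SLOT.size] = record
        os.fsync(fh.fileno())
    with open(path, "rb") as fh:
        on_disk = fh.read()
    return [
        s for s in range(slots)
        if on_disk[s * SLOT.size:(s + 1) * SLOT.size]
        != mirror[s * SLOT.size:(s + 1) * SLOT.size]
    ]
```

A file that is rewritten this often simply gets more chances to hit a
flaky path than one that is written once and left alone.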

Regards,
-- 
-Chuck