From owner-freebsd-stable@FreeBSD.ORG Mon Oct 20 17:15:17 2008
Date: Mon, 20 Oct 2008 10:15:16 -0700
From: Jeremy Chadwick
To: JoaoBR
Cc: freebsd-stable@freebsd.org
Subject: Re: constant zfs data corruption
Message-ID: <20081020171516.GA8551@icarus.home.lan>
In-Reply-To: <200810201507.30778.joao@matik.com.br>
References: <200810171530.45570.joao@matik.com.br> <200810200837.40451.joao@matik.com.br> <20081020132208.GA3847@icarus.home.lan> <200810201507.30778.joao@matik.com.br>
List-Id: Production branch of FreeBSD source code

On Mon,
Oct 20, 2008 at 03:07:30PM -0200, JoaoBR wrote:
> On Monday 20 October 2008 11:22:08 you wrote:
> > On Mon, Oct 20, 2008 at 08:37:40AM -0200, JoaoBR wrote:
> > > On Friday 17 October 2008 15:39:59 Chuck Swiger wrote:
> > > > On Oct 17, 2008, at 11:30 AM, JoaoBR wrote:
> > > > > constantly I find data corruption on ZFS volums, ever from rrdtool,
> > > > > this
> > > > > corrupt data happens on SATA disks, never seem on SCSI
> > > >
> > > > Presumably your SATA drives are correctly being reported by ZFS as
> > > > corrupting data, and you should do something like replace cables, the
> > > > drives themselves, perhaps try downgrading to SATA-150 rather than
> > > > -300 if you are using the later. Also consider running a drive
> > > > diagnostic utility from the mfgr (or smartmontools) and doing an
> > > > extended self-test or destructive write surface check.
> > >
> > > well, hardware seems to be ok and not older than 6 month, also happens
> > > not only on one machine ... smartctl do not report any hw failures on
> > > disk
> > >
> > > regarding jumpering the drives to 150 you suspect a driver problem?
> >
> > It's not because of a driver problem. There are known SATA chipsets
> > which do not properly work with SATA300 (particularly VIA and SiS
> > chipsets); they claim to support it, but data is occasionally corrupted.
> > Capping the drive to SATA150 fixes this problem.
> >
> > http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
> >
> > There are also known problems with Silicon Image chipsets (on Linux,
> > Windows, and FreeBSD).
> >
> > Because you didn't provide your smartctl output, I can't really tell if
> > the drives are in "good shape" or not. :-)
> >
>
> ok then here it comes
> {snip}

Yup, looks fine. All attributes are quite decent, except Temperature,
which is high (46C, highest seen is 52C -- blazing hot).
However, I refuse to believe that a high drive temperature would
manifest itself as data corruption on only certain kinds of files. :-)
So I think your drive is in OK shape.

> > Also, do you not think it's a little odd that the only data corruption
> > occurring for you are related to RRDtool?
>
> this yes I think is suspitious

Chuck's probably spot-on with regards to explaining why this is.
Something to keep in mind is that RRDtool has a history of bugs, so I
wouldn't be surprised if the issue turned out to be there. It's really
too bad we have no decent, actively-maintained alternatives to RRDtool.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |