From: JoaoBR
Organization: Infomatik
To: Chuck Swiger
Cc: Jeremy Chadwick, freebsd-stable@freebsd.org
Date: Mon, 20 Oct 2008 15:12:12 -0200
Subject: Re: constant zfs data corruption

On Monday 20 October 2008 14:44:50 Chuck Swiger wrote:
> Hi, all--
>
> On Oct 20, 2008, at 6:22 AM, Jeremy Chadwick wrote:
> [ ...JoaoBR wrote... ]
>
> >> well, hardware seems to be ok and not older than 6 months, also
> >> happens not only on one machine ... smartctl does not report any
> >> hw failures on disk
> >>
> >> regarding jumpering the drives to 150 you suspect a driver problem?
> >
> > It's not because of a driver problem.  There are known SATA chipsets
> > which do not properly work with SATA300 (particularly VIA and SiS
> > chipsets); they claim to support it, but data is occasionally
> > corrupted.
> > Capping the drive to SATA150 fixes this problem.
> >
> > http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
>
> Exactly so.  Just as a general principle, if you've got sporadic data
> corruption, turning I/O and system busses down a notch and retesting
> is a useful starting point towards identifying whether the issue is
> repeatable and whether it leans towards a hardware issue or software.
> However, ZFS file checksumming supposedly is code that has been
> carefully reviewed and tested so when it logs problems that is
> supposed to be a fairly sure sign that the hardware isn't behaving
> right.

ok, I will jumper the drives on some machines and see if the error comes
back, even though mine are Nvidia SATA

> > Because you didn't provide your smartctl output, I can't really tell
> > if the drives are in "good shape" or not. :-)
> >
> > Also, do you not think it's a little odd that the only data corruption
> > occurring for you are related to RRDtool?
>
> RRD tends to involve lots of small writes so its files are going to
> be changed often compared to other things that might be running; a
> busy webserver or mailserver would involve more I/O to logfiles and
> queue/mailspool, or so I would expect, but who knows what the machine
> in question is being used for?

these servers are transparent proxies (squid) on top of small ISP
networks, with IPFW bandwidth control for the clients; rrdtool collects
the client traffic and some other data every 5 minutes

very occasionally I get data corruption on a squid_cache file, normally
2 days after the rrdtool error first appears

--
João