Date:      Mon, 20 Oct 2008 15:12:12 -0200
From:      JoaoBR <joao@matik.com.br>
To:        Chuck Swiger <cswiger@mac.com>
Cc:        Jeremy Chadwick <koitsu@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: constant zfs data corruption
Message-ID:  <200810201512.12926.joao@matik.com.br>
In-Reply-To: <98238FC8-0FC4-4410-829F-EF2EA16A57B8@mac.com>
References:  <200810171530.45570.joao@matik.com.br> <20081020132208.GA3847@icarus.home.lan> <98238FC8-0FC4-4410-829F-EF2EA16A57B8@mac.com>

On Monday 20 October 2008 14:44:50 Chuck Swiger wrote:
> Hi, all--
>
> On Oct 20, 2008, at 6:22 AM, Jeremy Chadwick wrote:
> [ ...JoaoBR wrote... ]
>
> >> well, the hardware seems to be ok and is not older than 6 months; it
> >> also happens on more than one machine ... smartctl does not report
> >> any hw failures on the disks
> >>
> >> regarding jumpering the drives to 150, do you suspect a driver problem?
> >
> > It's not because of a driver problem.  There are known SATA chipsets
> > which do not properly work with SATA300 (particularly VIA and SiS
> > chipsets); they claim to support it, but data is occasionally
> > corrupted.
> > Capping the drive to SATA150 fixes this problem.
> >
> > http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
>
> Exactly so.  Just as a general principle, if you've got sporadic data
> corruption, turning I/O and system busses down a notch and retesting
> is a useful starting point towards identifying whether the issue is
> repeatable and whether it leans towards a hardware issue or software.
> However, ZFS file checksumming supposedly is code that has been
> carefully reviewed and tested so when it logs problems that is
> supposed to be a fairly sure sign that the hardware isn't behaving
> right.
>

OK, I will jumper the drives on some machines and see whether the error
comes back, even though mine are Nvidia SATA.
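
To illustrate the point about checksumming quoted above, here is a minimal
Python sketch. It is not ZFS code: the function names and the use of SHA-256
are assumptions made for illustration, and the checksum algorithm in use on a
given pool may differ. It records a checksum when a block is written and
recomputes it on read, so a single bit flipped anywhere between controller
and disk shows up as a mismatch:

```python
import hashlib


def write_block(data):
    """Return the block together with the checksum recorded at write time."""
    return data, hashlib.sha256(data).hexdigest()


def verify_block(data, recorded):
    """Recompute the checksum at read time and compare it with the recorded one."""
    return hashlib.sha256(data).hexdigest() == recorded


def flip_bit(data, byte_index, bit):
    """Simulate one bit being corrupted in transit or on the medium."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


# Hypothetical payload standing in for a block of an RRD file.
block, checksum = write_block(b"rrd sample data " * 8)
damaged = flip_bit(block, 10, 3)

print(verify_block(block, checksum))    # intact block
print(verify_block(damaged, checksum))  # block with one flipped bit
```

Since the check is only a recomputation and a comparison, a mismatch says
that the bytes read back differ from the bytes written, which is why it
leans towards the hardware path rather than the filesystem code.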

>
> > Because you didn't provide your smartctl output, I can't really tell
> > if the drives are in "good shape" or not.  :-)
> >
> > Also, do you not think it's a little odd that the only data corruption
> > occurring for you is related to RRDtool?
>
> RRD tends to involve lots of small writes so its files are going to
> be changed often compared to other things that might be running; a
> busy webserver or mailserver would involve more I/O to logfiles and
> queue/mailspool, or so I would expect, but who knows what the machine
> in question is being used for?
>

These servers are transparent proxies (squid) at the top of small ISP
networks, with IPFW bandwidth control for the clients. rrdtool collects
the client traffic and some other data every 5 minutes.

Very occasionally I get data corruption on a squid_cache file, normally
2 days after the rrdtool error first appears.

-- 

João









