Date:      Wed, 06 Feb 2008 11:00:10 +0100
From:      Dag-Erling Smørgrav <des@des.no>
To:        Brooks Davis <brooks@freebsd.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Forcing full file read in ZFS even when checksum error encountered
Message-ID:  <86tzkmv1zp.fsf@ds4.des.no>
In-Reply-To: <20080205173102.GA85735@lor.one-eyed-alien.net> (Brooks Davis's message of "Tue\, 5 Feb 2008 11\:31\:02 -0600")
References:  <47A73C8D.3000107@skyrush.com> <86prvby5o1.fsf@ds4.des.no> <47A864D9.4060504@skyrush.com> <864pcnxz8f.fsf@ds4.des.no> <47A88ADE.7050503@skyrush.com> <86abmfwc6h.fsf@ds4.des.no> <20080205173102.GA85735@lor.one-eyed-alien.net>

Brooks Davis <brooks@freebsd.org> writes:
> We've also experienced several situations where zfs was detecting
> corruption caused by bad cabling or bad controller firmware so SMART
> had nothing to report.

I once worked on a FreeBSD-based turn-key web hosting solution.  Our
first (and, eventually, only) customer had a four-server rig where two
of the servers shared a split rack-mount SCSI cabinet.  Either the
cables or the cabinet (most likely the latter) were defective, and at
some point, all data written to disk became silently corrupted.

To add insult to injury, the backups were unusable.  I don't recall why;
either the customer hadn't put tapes in the backup server, or the
backups had been corrupted as well, or maybe the backup solution I'd
written simply didn't work and hadn't been properly tested.

It was around that time I discovered that FreeBSD's version of GNU tar,
when run with certain command-line options (such as those typically used
by *shiver* Amanda), would create archives that no tar implementation -
itself included - could extract.

That rig was a disaster in so many ways...  When I first ordered it, I
also ordered spares for everything - spare PSUs, spare cooling fans,
spare disks etc.  Of course, the spares we actually got didn't fit.  I
can understand delivering the wrong spare fan when the original is
factory-installed, but when you ship a SCSI cabinet full of disks along
with spare disks that *don't match*, you know you're about to lose a
customer.

Oh, and the SCSI cables we used between the servers and the cabinet
didn't fit or lock properly, so we kept losing touch with the disks
until we replaced the cables.

Strangely enough, that supplier later lost their exclusive contract with
the University of Oslo, and went out of business.  Somebody apparently
figured that Dell could more easily handle larger orders, was less
likely to ship the wrong parts, and more likely to admit it and quickly
send a replacement if it did happen (although I've heard of a case where
someone complained to Dell about six missing screws, and shortly
thereafter received six separate cardboard boxes, each containing one
screw).

(oh shit, I'm turning into a rambling old fart!)

DES
-- 
Dag-Erling Smørgrav - des@des.no