Date:      Tue, 8 Jan 2008 09:38:22 +0100
From:      Bernd Walter <ticso@cicely12.cicely.de>
To:        Scott Long <scottl@samsco.org>
Cc:        freebsd-fs@freebsd.org, Brooks Davis <brooks@freebsd.org>, ticso@cicely.de, Tz-Huan Huang <tzhuan@csie.org>
Subject:   Re: ZFS i/o errors - which disk is the problem?
Message-ID:  <20080108083822.GL76422@cicely12.cicely.de>
In-Reply-To: <47830BC0.5060100@samsco.org>
References:  <477B16BB.8070104@freebsd.org> <20080102070146.GH49874@cicely12.cicely.de> <477B8440.1020501@freebsd.org> <200801031750.31035.peter.schuller@infidyne.com> <477D16EE.6070804@freebsd.org> <20080103171825.GA28361@lor.one-eyed-alien.net> <6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com> <20080107135925.GF65134@cicely12.cicely.de> <47830BC0.5060100@samsco.org>

On Mon, Jan 07, 2008 at 10:36:00PM -0700, Scott Long wrote:
> Bernd Walter wrote:
> >On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
> >>2008/1/4, Brooks Davis <brooks@freebsd.org>:
> >The data is corrupted by controller and/or disk subsystem.
> >You have no other data sources for the broken data, so it is lost.
> >The only guaranteed way is to get it back from backup.
> >Maybe older snapshots/clones are still readable - I don't know.
> >Nevertheless the data is corrupted, and that's the purpose of alternative
> >data sources such as raidz/mirror and, as a last resort, backup.
> >You shouldn't have ignored those errors at first, because you are
> >running with faulty hardware.
> >Without ZFS checksumming the system would just process the broken
> >data with unpredictable results.
> >If all those errors are fresh then you likely used a broken RAID
> >controller below ZFS, which silently corrupted synchronicity and then
> >blew up when the disk state changed.
> >Unfortunately many RAID controllers are broken and therefore useless.
> >
> 
> Huh?  Could you be any more vague?  Which controllers are broken?  Have 
> you contacted anyone about the breakage?  Can you describe the breakage?
> I call bullshit, pure and simple.

Just go back a few mails in the same thread, where someone fixed CRC
errors by updating the RAID controller firmware.
I'm amazed how often I read something like this lately.
And if you read the whole thread, you will notice that we are
currently talking about another person who has corrupted data on
a RAID disk - not sure if this is the controller, a drive or the
drivers, but something is faulty here and I wouldn't be surprised
if it is the controller.
And then there are so many RAID controllers without battery-backed memory
or another mechanism to guarantee synchronicity across the disks, which I
call broken by design.
You know yourself how important synchronicity is for RAID, especially
when it comes to parity-based RAID, and you know how fragile it is
when it comes to power failure.
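
To illustrate why lost synchronicity is so dangerous for parity-based
RAID, here is a minimal sketch in Python. It is a toy model, not how any
particular controller works: the block contents, the three-data-disk
layout, and the helper names xor_parity and rebuild are all made up for
this example. It simulates a power failure after a data block was written
but before the matching parity update reached disk, and then a later
rebuild of a different disk from that stale parity.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(surviving_blocks, parity):
    """Reconstruct the one missing data block from the survivors plus parity."""
    return xor_parity(surviving_blocks + [parity])

# A consistent stripe: three data blocks and their parity (hypothetical data).
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_parity([d0, d1, d2])

# Normal case: the disk holding d1 dies, and d1 can be rebuilt correctly.
ok_rebuild = rebuild([d0, d2], parity)

# Simulated power failure: the new d0 reached the disk,
# but the parity update did not, so the parity is now stale.
d0_new = b"ZZZZ"
stale_parity = parity

# Later the disk holding d1 fails. The rebuild mixes new data with
# stale parity and returns garbage without reporting any error.
bad_rebuild = rebuild([d0_new, d2], stale_parity)

print("rebuilt from consistent stripe:", ok_rebuild)
print("rebuilt from stale parity:     ", bad_rebuild)
```

The second rebuild returns a block that was never written by anyone, and
nothing at the RAID level notices. A checksum above the RAID layer, as ZFS
keeps, is what turns this into a reported error instead of silently wrong
data.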

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de


