Date:      Sat, 25 Aug 2007 00:38:23 -0700 (PDT)
From:      Tom Samplonius <tom@samplonius.org>
To:        Martin Nilsson <martin@gneto.com>
Cc:        Artem Kuchin <matrix@itlegion.ru>, freebsd-stable@freebsd.org
Subject:   Re: A little story of failed raid5 (3ware 8000 series)
Message-ID:  <27560580.441188027503141.JavaMail.root@ly.sdf.com>
In-Reply-To: <46CA7681.3070909@gneto.com>

----- "Martin Nilsson" <martin@gneto.com> wrote:

> That is what patrol read is intended to detect before it is a problem.
> 
> In a RAID5 array the checksums are only used when reconstructing data.
> If you have a bad block in a checksum sector, it will not be detected
> until a drive has failed and you try to rebuild the array;
> unfortunately, at that time it is too late...
>
> Beware that OS software solutions like diskcheckd will not find this,
> as they only read the data, not the checksums; it must be done on the
> controller.

  This isn't really accurate.  First of all, if the RAID controller isn't verifying checksums before handing the data to the OS, what exactly is the checksum for?  It is supposed to detect data corruption, so if the card isn't using the checksum, it's kind of useless.  I know some RAID systems do fake their checksums: they don't actually validate data against the checksums during normal reads because they don't have the processing power.  I'd stay away from these types of systems (cough ... Blue Arc ... cough).
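
  For reference, the per-stripe redundancy in RAID5 is XOR parity.  The following is only a rough sketch of the difference between checking that parity on a read and using it solely for a rebuild; it is not how any particular controller is implemented, and the function names (xor_blocks, stripe_is_consistent, rebuild_missing) are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_is_consistent(data_blocks, parity):
    """Verify-on-read: recompute parity from the data and compare."""
    return xor_blocks(data_blocks) == parity

def rebuild_missing(surviving_blocks, parity):
    """Rebuild: the lost block is the XOR of the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])

# A three-disk-wide stripe plus its parity block (toy 4-byte blocks).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate a parity block that has silently gone bad on disk.
rotted = bytes([parity[0] ^ 0xFF]) + parity[1:]

# If nobody checks parity until the second disk dies, the rebuild
# quietly produces wrong data for the lost block.
rebuilt = rebuild_missing([data[0], data[2]], rotted)
```

  Note that a parity mismatch only says the stripe is inconsistent; by itself it cannot say which block is the wrong one.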

  Second, most RAID systems don't use their own checksums anymore.  NetApp is quite famous for their ZCS (zone checksum) drives, and still uses a variation of this technology on their newer systems (which use 512-byte sectors).  But most RAID vendors just rely on the drives' own error detection and correction systems (usually Hamming code based, which is actually pretty solid).  I'm pretty sure that 3ware doesn't use any checksums.

  However, in this particular case, validating checksums would not have helped, since the disk was unreadable.  diskcheckd would have detected this issue, and would probably have prevented the problem if it had been running beforehand.
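
  The idea behind a host-side read check is simply to read everything and note where the reads fail.  Below is a minimal sketch of that idea, not diskcheckd's actual code; find_unreadable_chunks is a hypothetical helper, and it assumes the size can be found with lseek (true for regular files; a raw device may need its size obtained some other way).

```python
import os
import tempfile

def find_unreadable_chunks(path, chunk_size=64 * 1024):
    """Read `path` front to back; return (bytes_read, offsets_that_failed)."""
    bad_offsets = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                bytes_read += len(os.pread(fd, want, offset))
            except OSError:
                # The drive could not return this range: record it and move on.
                bad_offsets.append(offset)
            offset += want
    finally:
        os.close(fd)
    return bytes_read, bad_offsets

# Demo against an ordinary temporary file standing in for a disk.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"\0" * 200_000)
    tmp.flush()
    total, bad = find_unreadable_chunks(tmp.name)
```

  A real checker would also throttle itself so it does not compete with normal I/O, and would log or report the bad offsets.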

  ZFS is also a good option.  It has block-level checksumming.  ZFS never trusts the disks, and is super paranoid.  And ZFS can do background scrubbing too.  I can't wait for ZFS in FreeBSD 7, because ZFS in software is going to be 10x better than anything 3ware has.
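
  To make "never trusts the disks" concrete, here is a toy model of the general approach: keep a checksum apart from the data, verify it on every read, and walk everything during a scrub.  This is an illustration only, not ZFS's on-disk format or code; ChecksummedStore is a made-up class, and SHA-256 is used purely for convenience.

```python
import hashlib

class ChecksummedStore:
    """Toy block store that keeps checksums separate from the data."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes (stands in for the disk)
        self.checksums = {}  # block id -> digest, stored apart from the data

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.checksums[block_id] = hashlib.sha256(data).digest()

    def _is_valid(self, block_id):
        actual = hashlib.sha256(self.blocks[block_id]).digest()
        return actual == self.checksums[block_id]

    def read(self, block_id):
        # Every read is verified; corrupt data is never handed back silently.
        if not self._is_valid(block_id):
            raise IOError(f"checksum mismatch on block {block_id}")
        return self.blocks[block_id]

    def scrub(self):
        # Background scrub: check every block, even ones nobody is reading.
        return sorted(b for b in self.blocks if not self._is_valid(b))

store = ChecksummedStore()
store.write(0, b"hello")
store.write(1, b"world")
store.blocks[1] = b"w0rld"  # simulate silent corruption on the disk
damaged = store.scrub()
```

  With redundancy underneath, a mismatch found this way could be repaired from a good copy instead of only being reported.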


> Regards,
> Martin


Tom


