Date:      Mon, 01 Nov 2004 14:01:40 +0100
From:      Martin Nilsson <martin@gneto.com>
To:        Brad Knowles <brad@stop.mail-abuse.org>
Cc:        current@freebsd.org
Subject:   Re: Gvinum RAID5 performance
Message-ID:  <418633B4.80004@gneto.com>
In-Reply-To: <p06002006bdabc1160a6a@[10.0.1.3]>
References:  <002401c4bf9c$c4fee8e0$0201000a@riker> <p06002002bdab24905ad8@[10.0.1.3]> <1099286568.4185c82881654@picard.newmillennium.net.au> <p06002006bdabc1160a6a@[10.0.1.3]>

You guys seem to be confusing byte-level striping with block-level
striping and the use of the parity disk. Adaptec has a nice whitepaper
that explains this here: http://graphics.adaptec.com/pdfs/ACSP_RAID_Ch4.pdf

Brad Knowles wrote:
> At 4:22 PM +1100 2004-11-01, Alastair D'Silva wrote:
>> offshoot
>>  of this is that to ensure data integrity, a background process is run
>>  periodically to verify the parity.

That process (LSI Logic calls it "patrol read") is there more to
exercise the disks and spot seldom-used marginal blocks in time, just
like diskcheckd in ports.

>     Keep in mind that if you've got a five disk RAID-5 array, then for 
> any given block, four of those disks are data and would have to be 
> accessed on every read operation anyway, and only one disk would be 
> parity.  

No, RAID5 is block striped, so for any read operation you only have to
read the block(s) where the data is stored. Use as large a stripe block
as possible to avoid accessing more than one drive per transaction.
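
To make that concrete, here is a rough C sketch of how a block-striped
RAID5 array with rotating parity might map a logical block to a disk.
This is only an illustration of the layout idea, not gvinum's actual
code; map_block(), ndisks and lblock are made-up names:

#include <stdio.h>

/*
 * Map a logical block to (disk, stripe) in a block-striped RAID5
 * array with one rotating parity block per stripe.
 */
static void
map_block(long lblock, int ndisks, int *disk, long *stripe)
{
	long data_per_stripe = ndisks - 1;	/* one block is parity */
	long idx = lblock % data_per_stripe;	/* position in stripe */
	int pdisk;

	*stripe = lblock / data_per_stripe;
	pdisk = *stripe % ndisks;		/* parity rotates per stripe */
	*disk = (pdisk + 1 + idx) % ndisks;	/* skip the parity disk */
}

int
main(void)
{
	int disk;
	long stripe;

	for (long b = 0; b < 8; b++) {
		map_block(b, 5, &disk, &stripe);
		printf("block %ld -> disk %d, stripe %ld\n", b, disk, stripe);
	}
	return (0);
}

Note that with five disks a single-block read touches exactly one
drive; the parity disk for that stripe is not involved at all.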

>     Even if you could get away from reading from all disks in the stripe 
> (and delaying the parity calculations to a background process), you're 
> not going to get away from writing to all disks in the stripe, because 
> those parity bits have to be written at the same time as the data and 
> you cannot afford a lazy evaluation here.

When writing, you read the old data block and the parity block, do
some XOR magic, and then write the two blocks out again. This is why
RAID5 is so painfully slow with writes: it has to do four disk
transactions for every single write transaction. A large battery-backed
writeback cache can help with this, both to order the accesses better
and to delay write bursts until the disks are less busy.
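
The XOR magic itself is nothing more than this (a sketch; the buffer
names are made up):

#include <stddef.h>
#include <stdint.h>

/*
 * RAID5 small-write parity update: XOR the old data out of the
 * parity and the new data in. Done after reading the old data and
 * old parity blocks, before writing both back.
 */
static void
update_parity(uint8_t *parity, const uint8_t *old_data,
    const uint8_t *new_data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		parity[i] ^= old_data[i] ^ new_data[i];
}

Count the I/Os: read old data, read old parity, write new data, write
new parity -- four disk transactions for one logical write.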


The parity is used to reconstruct a failed drive, not to check the
integrity of data on the drives when reading. Drives have very good
error detection when reading; if data is returned by a read operation,
it can be assumed to be correct. If a read fails, the RAID system
should mark the drive as down/failed and treat the array as degraded,
i.e. use the parity on reads.
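
Reconstruction in degraded mode is the same XOR trick run the other
way: the missing block is the XOR of the corresponding blocks on all
surviving drives, parity included. A sketch (names made up):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Rebuild the block of a failed drive from the corresponding blocks
 * of the nsurviving remaining drives (data and parity alike).
 */
static void
rebuild_block(uint8_t *out, uint8_t *const surviving[], int nsurviving,
    size_t len)
{
	memset(out, 0, len);
	for (int d = 0; d < nsurviving; d++)
		for (size_t i = 0; i < len; i++)
			out[i] ^= surviving[d][i];
}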

	/Martin


