FreeBSD Mail Archives

Date:      Wed, 2 Feb 2000 12:33:17 +1030
From:      Greg Lehey <grog@lemis.com>
To:        "Justin T. Gibbs" <gibbs@FreeBSD.org>
Cc:        Gary Palmer <gjp@in-addr.com>, scsi@FreeBSD.org, up@3.am, Wilko Bulte <wilko@yedi.iaf.nl>
Subject:   Definitions of RAID levels (was: hardware vs software stripping)
Message-ID:  <20000202123317.P55303@freebie.lemis.com>
In-Reply-To: <200002020129.SAA00438@caspian.plutotech.com>
References:  <20000202112755.L55303@freebie.lemis.com> <200002020129.SAA00438@caspian.plutotech.com>

On Tuesday,  1 February 2000 at 18:29:30 -0700, Justin T. Gibbs wrote:
> [summary of RAID types]
>
>> Is this all they say about it?
>
> That's from the summary section.

Ah.

>> It begs the question why RAID-3 must access all members of the disk
>> at a time.  The only reason I can think of is that the data is
>> interleaved in such a manner that you can't get *any* useful data
>> without reading them all.  This rather agrees with the idea that
>> the data is spread in units of less than a sector.  It also doesn't
>> say why RAID-4 is less suitable for large file transfers.
>
> The point is that the complexity of RAID 4 buys little if all you
> want to do is write large files.

Right, but my understanding was that the RAID levels all describe
different layouts, not access methods.  I'm sure that the various
strategies for accessing RAID-5 are very different.

>> My understanding is that RAID-3, effectively striping at a sub-sector
>> level, can give much higher data rates without buffering, and that's
>> its raison d'être.
>
> If you stripe at the sub-sector level, you must perform RMW.  This makes
> absolutely no sense.

I think you're misunderstanding my use of the term "stripe".  I'm not
talking about "transactions" here, I'm talking about layout.  If I
have a 9 disk RAID-[345] set with a stripe size of 64 bytes, I can
read one sector from each of the 8 data disks and have a total of 8
sectors.  I can do the same thing if each disk contains an individual
bit of a byte.  Older disk and drum technology used a very similar
method (multiple heads) to speed up transfer times.  With relatively
simple hardware support, this would make a lot of sense, and if RAID-3
is really what you say, it makes me wonder why people haven't thought
of this alternative.
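
To make the layout idea concrete, here is a minimal sketch of a
byte-interleaved set.  It assumes a one-byte interleave unit (not the
64-byte stripe or the bit-per-disk variant mentioned above), 512-byte
sectors, and XOR parity.  The function names are made up for
illustration and do not come from any RAID implementation.

```python
from functools import reduce

SECTOR = 512      # bytes per sector (assumed)
DATA_DISKS = 8    # 9-disk set: 8 data disks plus one parity disk

def interleave(data, ndisks=DATA_DISKS):
    """Byte-interleave: byte i of the stream goes to disk i % ndisks."""
    return [data[d::ndisks] for d in range(ndisks)]

def xor_parity(columns):
    """Parity contents: byte-wise XOR across the given disks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*columns))

def deinterleave(columns):
    """Reassemble the original byte stream; needs every data disk."""
    return bytes(b for group in zip(*columns) for b in group)

# Eight logical sectors (4096 bytes) become one physical sector per disk.
data = bytes(range(256)) * (SECTOR * DATA_DISKS // 256)
disks = interleave(data)
parity = xor_parity(disks)

# A single disk holds only every 8th byte, so no logical sector can be
# read without touching all eight data disks.
# A missing disk can be rebuilt from the other seven plus parity.
rebuilt = xor_parity(disks[:3] + disks[4:] + [parity])
```

With this layout one read of a single sector per disk returns eight
logical sectors at once, but there is no way to fetch just one of them
from a single drive.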

>>>>> In RAID4, it is supposed to be a multiple of your transaction
>>>>> size so you can perform partial read (assuming you don't need
>>>>> parity verification)
>>>>
>>>> Where do you get the term "transaction" from?  I haven't seen it in
>>>
>>> From the dictionary?  8-)
>>>
>>> The point is that your system is such that you may be able to
>>> satisfy a request by only reading one component of the stripe.
>>
>> That's one point.  My point is that a transaction may be of various
>> sizes, whereas the stripe has a fixed size.
>
> If your transaction is larger, perhaps you satisfy it by modifying 1 or
> more full stripes and only partially modifying the border stripes.
> The point is still the same.

Well, I can't see that.  You're saying that RAID-4 stripes should be a
multiple of the transaction size, and I'm saying the "transaction"
size is variable.  The "point" seems to be that this is the main
difference in your definitions of RAID-3 and RAID-4.

>>>> any RAID documentation.  In ufs, there is no fixed size.
>>>
>>> Sure there is, the block size (i.e. 8k.)
>>
>> ufs has a block size, sure, but the transfers are very seldom equal to
>> the block size.
>
> Let's say that you do 64k "strips" on each drive.  To satisfy an 8k
> transaction, you only need to touch one drive

(or maybe two if it goes off the end of the first strip)

> (and the parity disk on a write).  To satisfy a 128k transaction,
> you touch at most 4 (3 if your transaction is aligned).  You don't
> need to touch all N.

Sure.
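
The arithmetic in the quoted paragraph can be checked with a small
sketch.  It assumes 64k strips, consecutive strips on consecutive data
drives, and one parity drive that is touched only on writes.  The
function names and the choice of 8 data drives are hypothetical.

```python
STRIP = 64 * 1024   # bytes per strip on each drive

def data_strips_touched(offset, length, strip=STRIP):
    """Number of strips covered by the byte range [offset, offset+length)."""
    first = offset // strip
    last = (offset + length - 1) // strip
    return last - first + 1

def drives_touched(offset, length, ndata, write=False, strip=STRIP):
    """Drives touched: distinct data drives, plus parity on a write."""
    n = min(data_strips_touched(offset, length, strip), ndata)
    return n + (1 if write else 0)

K = 1024
small_aligned = drives_touched(0, 8 * K, ndata=8)                    # 1
small_crossing = drives_touched(60 * K, 8 * K, ndata=8)              # 2
large_aligned = drives_touched(0, 128 * K, ndata=8, write=True)      # 3
large_unaligned = drives_touched(4 * K, 128 * K, ndata=8, write=True)  # 4
```

The figures of 3 and 4 for the 128k case match the quoted numbers only
if the parity drive is counted, that is, for a write.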

> That is the difference.

From what?  This thread was about the differences between RAID-3 and
RAID-4.  I don't see anything different in these definitions except
possibly the software.

>>>> I'd call both of these RAID-4, considering that RAID doesn't use the
>>>> term "transaction".
>>>
>>> Sure it does.
>>
>> Is this in The Book as well?  How is it defined?
>
> The same way it is defined in the dictionary.  

OK, let me get a dictionary:

  Transaction: 1.  The adjustment of a dispute between parties by
                   mutual concession; compromise.
               2.  The action of transacting.
               3.  That which is or has been transacted; a piece of
                   business.
               4.  The action of passing or making over a thing from
                   one person, thing, or state to another.
               5.  (pl) The record of its proceedings published by a
                   learned society.

I don't really think any of those definitions even come close to what
we're talking about.  Even in computer science, the term "transaction"
normally has a different meaning.  We *do* need to define what we're
talking about here.

> The way you determine which RAID type is appropriate for you is by
> looking at the number of disks you have, the efficient disk strip
> size, as well as the transaction type and size of your application.
> That's what this is all about.

Yes, but in order to do that you need to know what the RAID types are,
and so far we have no agreement on what RAID-3 *is*.  

>>> In RAID-3, your transaction size *is* the stripe size.  In RAID-4,
>>> it may be less than the stripe size.
>>
>> So what is it in the Pluto implementation that stops you from
>> reading only part of a RAID-3 stripe?
>
> We could read part of a RAID-3 stripe if we decided the software
> complexity warranted it.  In our application, it makes more sense
> to read the entire stripe and cache it rather than read individual
> chunks.

OK, and you could do this without changing the physical layout?  In
that case, I'd suggest this is RAID-4, not RAID-3.  Note that the text
you quote states:

   Unlike RAID Level 3, however, a RAID Level 4 array's member disks
   are independently accessible.

This still suggests to me that there is something about RAID-3 layout,
not the software implementation, which makes it impossible to access
drives individually.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20000202123317.P55303>
