Date:      Tue, 1 Feb 2000 19:04:41 +1030
From:      Greg Lehey <grog@lemis.com>
To:        "Justin T. Gibbs" <gibbs@narnia.plutotech.com>, Gary Palmer <gjp@in-addr.com>
Cc:        scsi@FreeBSD.org, up@3.am, Wilko Bulte <wilko@yedi.iaf.nl>
Subject:   Re: hardware vs software stripping
Message-ID:  <20000201190440.Q76348@freebie.lemis.com>
In-Reply-To: <200001311432.HAA32638@narnia.plutotech.com>
References:  <up@3.am> <87942.949373872@in-addr.com> <Pine.BSF.4.10.10001301401360.60037-100000@server.b0x.com> <200001311432.HAA32638@narnia.plutotech.com>

On Monday, 31 January 2000 at  7:32:31 -0700, Justin T. Gibbs wrote:
> In article <20000131104827.A62824@freebie.lemis.com> you wrote:
>>
>> I suppose you mean striping.  RAID-5 doesn't stripe at the byte level,
>> it stripes at the block level.  RAID-3 stripes at the byte level.
>
> I've heard you say this several times, but it is simply not true.

It's not simply true, anyway :-)

I think one of the problems is that I can't find an authoritative
definition of the levels.  I was going to buy one of those
super-expensive books that you probably have, but in the meantime I've
been limited to various web pages.  At
http://www.fdma.com/info/raidinto.html (now dead), I was told that
RAID-3 stripes at the byte level, and RAID-4 stripes at a block level.

At
http://www.lib.ox.ac.uk/internet/news/faq/archive/arch-storage.part1.html,
I read:

Raid Level 3 - Data protection disk - mathematical ECC type code
               calculated from multiple spindles and stored on another
               spindle.

Raid Level 4??? similar to 3, with block striping instead of byte.

Raid Level 5 - Striping plus data protection - stripe data across
               multiple spindles (as in RAID Level 0) and have data
               protection calculations (as in RAID level 3) but don't
               put all the calculated figures onto one spindle, but
               spread it out.

That appears to be less than authoritative.

At http://www.adaptec.com/technology/whitepapers/raid.html, I read:


RAID Level 3 stripes data at a byte level across several drives, with
             parity stored on one drive. It is otherwise similar to
             level 4. Byte-level striping requires hardware support
             for efficient use.

RAID Level 4 stripes data at a block level across several drives, with
             parity stored on one drive.  The parity information
             allows recovery from the failure of any single drive. The
             performance of a level 4 array is very good for reads
             (the same as level 0). Writes, however, require that
             parity data be updated each time. This slows small random
             writes, in particular, though large writes or sequential
             writes are fairly fast. Because only one drive in the
             array stores redundant data, the cost per megabyte of a
             level 4 array can be fairly low.

RAID Level 5 is similar to level 4, but distributes parity among the
             drives. This can speed small writes in multiprocessing
             systems, since the parity disk does not become a
             bottleneck. Because parity data must be skipped on each
             drive during reads, however, the performance for reads
             tends to be considerably lower than a level 4 array. The
             cost per megabyte is the same as for level 4.

Later in this page, I read:

  RAID Level Uses 
  
  Level 0 (striping) 
    Any application which requires very high speed storage, but does not
    need redundancy. Photoshop temporary files are a good example.
  
  Level 1 (mirroring) 
    Applications which require redundancy with fast random writes;
    entry-level systems where only two drives are available. Small file
    servers are an example.
  
  Level 4 (parity) 
    Applications which require redundancy at low cost, or with
    high-speed reads. This is good for archival storage. Larger file
    servers are an example.

  Level 5 (distributed parity) 
    Similar to level 4, but may provide higher performance if most I/O
    is random and in small chunks. Database servers are an example.

Note that they don't mention RAID-2 or RAID-3.  I'd agree with all
this except for RAID-4: there's no real advantage to RAID-4 over
RAID-5.

At http://www.baydel.com/tutorial.html I read:

RAID Levels

  The 1988 RAID paper proposed 5 levels: 1: mirroring. 3: byte
  striping with dedicated parity. 4: block striping with dedicated
  parity. 5: block striping with distributed parity. (RAID2 was
  superseded by RAID3)

  RAID3 was considered to be ideally suited to large 'scientific'
  transfers and RAID5 to OLTP, or Transaction Processing.
  Inexplicably, the researchers gave a strong implication that RAID3
  write performance would be bottlenecked on the parity drive. In
  fact, RAID3 'parallel' write performance is far better than with
  RAID5 or 'independent' RAIDs. Also, over the years, OLTP
  applications have been exhibiting an increasing write load with a
  small I/O size, resulting in a negation of the benefit of
  RAID5. Other applications such as NFS fileserving, Novell,
  Multi-media etc have I/O granularity above a size ideally suited to
  RAID5.

This is an interesting viewpoint.  In many cases it's true: if you
always transfer complete stripes, the pre-reads of RAID-[45] aren't
needed, which nearly doubles the write performance.
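
To make the pre-read argument concrete, here is a minimal Python
sketch of the two write paths under XOR parity.  The function names
and the I/O counts they return are illustrative assumptions, not taken
from any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_write(new_data: list[bytes]) -> tuple[bytes, int, int]:
    """Write every data block of a stripe at once.

    Parity is computed from the new data alone, so nothing is read first.
    Returns (parity, disk_reads, disk_writes).
    """
    parity = xor_blocks(*new_data)
    return parity, 0, len(new_data) + 1


def small_write(old_data: bytes, old_parity: bytes,
                new_data: bytes) -> tuple[bytes, int, int]:
    """Update a single data block (read-modify-write).

    The old data and old parity must be read before the new parity can
    be computed.  Returns (new_parity, disk_reads, disk_writes).
    """
    new_parity = xor_blocks(old_parity, old_data, new_data)
    return new_parity, 2, 2
```

A small write costs two reads and two writes to change one data block,
where a full-stripe write needs no reads at all.  Both paths arrive at
the same parity for the same final stripe contents.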

Most of the other URLs I had have died, but they said much the same
sort of thing.

Finally, at http://www.whatis.com/raid.htm I read:

  RAID-3. This type uses striping and dedicates one drive to storing
    parity information. The embedded error checking (ECC) information
    is used to detect errors. Data recovery is accomplished by
    calculating the exclusive OR (XOR) of the information recorded on
    the other drives. Since an I/O operation addresses all drives at
    the same time, RAID-3 cannot overlap I/O. For this reason, RAID-3
    is best for single-user systems with long record applications.

  RAID-4. This type uses large stripes, which means you can read
    records from any single drive. This allows you to take advantage
    of overlapped I/O for read operations. Since all write operations
    have to update the parity drive, no I/O overlapping is
    possible. RAID-4 offers no advantage over RAID-5.

  RAID-5. This type includes a rotating parity array, thus addressing
    the write limitation in RAID-4. Thus, all read and write
    operations can be overlapped. RAID-5 stores parity information but
    not redundant data (but parity information can be used to
    reconstruct data). RAID-5 requires at least three and usually five
    disks for the array. It's best for multi-user systems in which
    performance is not critical or which do few write operations.

This comes closest to your definition by not using the term 'byte' in
describing RAID-3, but it doesn't deny the possibility either.  In
general, it's a bit vague.  Theoretically, the RAID-3 unit could be
sectors, but in my view that would make it a special case of RAID-4.
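
The difference between the two striping units can be sketched in a few
lines of Python.  This is only an illustration: `stripe` and
`disks_touched` are hypothetical helpers, parity is left out, and
`unit` is the stripe unit in bytes (1 for byte striping, 512 for
sector-sized striping).

```python
def stripe(data: bytes, ndisks: int, unit: int) -> list[bytes]:
    """Deal data round-robin across ndisks data disks, unit bytes at a time."""
    disks = [bytearray() for _ in range(ndisks)]
    for i in range(0, len(data), unit):
        disks[(i // unit) % ndisks] += data[i:i + unit]
    return [bytes(d) for d in disks]


def disks_touched(offset: int, length: int, ndisks: int, unit: int) -> set[int]:
    """Data disks that a request of length bytes at offset has to access."""
    first = offset // unit
    last = (offset + length - 1) // unit
    return {u % ndisks for u in range(first, last + 1)}
```

With `unit=1` a one-sector request touches every data disk, so
requests can't overlap.  With `unit=512` the same request touches a
single disk, which is the RAID-4 data layout.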

This page is also inaccurate in its description of RAID-4 and RAID-5:
RAID-4 *can* overlap read operations, and RAID-5 can't always overlap
write operations.  In fact, there's very little difference in the
amount of mutual exclusion needed on writes.

> RAID-3 is the same as RAID4 without the optimization for partial
> stripe writes.  In other words, in RAID-3, you must read or write a
> full stripe where RAID-4 adds the ability to perform RMW operations
> on the parity block of the stripe for sub-stripe updates. 

I'm not sure I follow you here.  Are you saying that the data layout
is the same and the difference is in the implementation of the
software?  That doesn't seem to justify a separate level. 

> Pluto uses a RAID-3 system in its video server products and it is
> certainly not striped on a byte level.

So how exactly is it striped?

> (Just as an aside, given the minimum 512 byte sector size of most
> magnetic media, striping on a per byte basis would be really
> wasteful).

Agreed, unless you use a PLA to split the data.

Evidently the manufacturer of your RAID-3 box uses the term
differently from the way it's defined above.  There's clearly some
confusion, and I don't know who is right, but I would have thought
Adaptec knew what they were talking about (especially since they
point out the need for hardware support).

On Monday, 31 January 2000 at 21:57:52 -0500, Gary Palmer wrote:
> up@3.am wrote in message ID
> <Pine.BSF.4.10.10001311627210.14233-100000@richard2.pil.net>:
>> IIRC, the main difference between 3 and 5 is that 3 puts all of the parity
>> blocks on one spindle, whereas 5 distributes them across all of the
>> spindles.
>
> You're confusing RAID3 with RAID4.  RAID4 is RAID 0 with parity (on
> one spindle) and RAID 5 is RAID 0 with striped parity.

I'd call RAID-5 rotated parity, not striped.  The way I see it,
RAID-[3-5] are all striped.
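
As a rough illustration of dedicated versus rotated parity, the
following Python sketch prints which disk holds parity for each
stripe.  The rotation rule used here (parity moves one disk to the
left per stripe) is an assumption chosen for the example; real
implementations use various rotations, and the helper names are
hypothetical.

```python
def parity_disk(stripe_no: int, ndisks: int, rotated: bool) -> int:
    """Index of the disk holding parity for the given stripe."""
    if rotated:
        # Assumed rotation: start on the last disk, move left each stripe.
        return (ndisks - 1 - stripe_no) % ndisks
    # Dedicated parity: always the last disk.
    return ndisks - 1


def layout(nstripes: int, ndisks: int, rotated: bool) -> list[str]:
    """One text row per stripe: 'P' for parity, 'Dn' for data block n."""
    rows = []
    for s in range(nstripes):
        p = parity_disk(s, ndisks, rotated)
        block = s * (ndisks - 1)  # first data block number in this stripe
        row = []
        for d in range(ndisks):
            if d == p:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(" ".join(row))
    return rows


if __name__ == "__main__":
    print("\n".join(layout(3, 4, rotated=False)))
    print()
    print("\n".join(layout(3, 4, rotated=True)))
```

In both layouts the data blocks are striped across the disks in the
same way; only the position of the parity block changes from stripe to
stripe.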

Before you reply to these messages telling me where I'm wrong, please
check out http://www.lemis.com/vinum/implementation.html and tell me
where you disagree with what I say there.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

