Date:      Tue, 14 Jul 1998 06:01:13 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        grog@lemis.com (Greg Lehey)
Cc:        tlambert@primenet.com, gibbs@plutotech.com, andre@pipeline.ch, Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject:   Re: Software RAID-5 performance
Message-ID:  <199807140601.XAA00726@usr06.primenet.com>
In-Reply-To: <19980714122952.L754@freebie.lemis.com> from "Greg Lehey" at Jul 14, 98 12:29:52 pm

> Non-interleaved I/O, on the other hand, can be a big penalty (if we're
> talking about the same thing).  If I have an array with 5 drives, each
> capable of a realistic 5 MB/s, and a stripe width of 64 kB, and I
> write 256 kB to it, I need to do:

We are talking about read(2) calls occurring serially, and write(2)
calls occurring non-serially, presuming write(2) is implemented
correctly for the non-blocking I/O case.

This means that reads will trigger a fetch, but return an EWOULDBLOCK,
and writes copy to a buffer (presuming, as you did, that you will be
writing a stripe, which puts it on a page boundary with a page increment).
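
For anyone who wants to see the EWOULDBLOCK behavior concretely, here
is a minimal sketch in C.  It is an illustration only, not RAIDFrame
code: the helper names set_nonblocking() and try_read() are made up
for this example, and it assumes a pipe or socket descriptor, since on
most systems O_NONBLOCK has no effect on read(2) from a regular disk
file.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Try to read without sleeping.
 * Returns the byte count, 0 at end of file, -1 on a hard error,
 * or -2 if no data is ready yet (EAGAIN/EWOULDBLOCK).
 */
long try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return (long)n;
}
```

Called on an empty pipe, try_read() returns -2 right away instead of
sleeping; once the other end has written something, the same call
returns the byte count.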

> 2.  Calculate parity.  On the 486/66, this looks like being about 8
>     ms.

This is the overhead I was referring to.
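
The parity step in a RAID-5 write is a byte-wise XOR across the data
blocks of a stripe.  A minimal sketch, assuming plain in-memory
buffers; xor_blocks() is a name invented here, and a real
implementation would work a machine word at a time rather than a byte
at a time:

```c
#include <stddef.h>
#include <string.h>

/*
 * XOR `nblocks` buffers of `len` bytes each into `out`.
 * With the data blocks as input, `out` is the parity block; with the
 * surviving data blocks plus the parity block as input, `out` is the
 * missing data block.
 */
void xor_blocks(const unsigned char *const *blocks, size_t nblocks,
                size_t len, unsigned char *out)
{
    memset(out, 0, len);
    for (size_t b = 0; b < nblocks; b++)
        for (size_t i = 0; i < len; i++)
            out[i] ^= blocks[b][i];
}
```

For the 5-drive example that is four 64 kB data blocks in and one
64 kB parity block out.  Running the same routine over the three
surviving data blocks plus the parity block regenerates a lost block.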

> 3.  Write the blocks.  If you can do this in parallel, it'll take
>     about 13 ms.  Serially, it'll take about 50 ms.

Writes occur in parallel if they are queued when the write is requested,
and success is indicated by permission to write and available buffer
space.  Hard errors are a separate issue.  Due to the nature of an
async fd, I think it's safe to say that writes, at least in page
increments on page boundaries, complete immediately if there is buffer
space, and therefore writes from multiple user space threads are
interleaved.

For this case, the parity overhead takes the write from 13 ms to
21 ms, or, to be generous, one and a half times slower.
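
The arithmetic behind that figure, as a rough sketch using Greg's
numbers (8 ms of parity calculation, 13 ms for the parallel write);
write_slowdown() is just an illustrative name:

```c
/* Ratio of (parity + write) time to the write time alone. */
double write_slowdown(double parity_ms, double write_ms)
{
    return (parity_ms + write_ms) / write_ms;
}
```

write_slowdown(8.0, 13.0) is 21/13, a little over 1.6.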


> > Software RAID is a data integrity issue, not a performance one,
> > and I think making the performance argument for whatever reason
> > (protection domain crossing, interleaved I/O, SMP scalability,
> > etc.) is a strawman at best.
> 
> I'm not sure that I understand what you're saying here.  Obviously
> offloading the checksum calculation (or anything else, for that
> matter) to an external box will offload the CPU.  And I can't see any
> particular difference in data integrity between the two approaches.

If you have a specific need for RAID-5 assurances, then performance
is a secondary consideration.  The next consideration after assumed
fault tolerance requirements is the performance/money trade-off for
hardware RAID 5 vs. software.

I think performance will be secondary, so a performance argument is
beside the point; you can throw money at the RAID-5 performance
issues to make them go away.

So even if there is a significant performance penalty (going from
13 ms to 21 ms, roughly 1.6 times slower, is a significant penalty,
IMO), if your application requires
RAID-5, then it requires it at any cost.

And if performance isn't an issue at that point, then pointing at
user space threads as a bottleneck is the wrong thing to do.  It
isn't even the bottleneck it is blamed as being: the overhead from
non-interleaved reads (which are effectively interleaved read-ahead
requests, followed by serial copy-from-cache, which means "about as
fast as you can get, since you have to copy anyway") is negligible
compared to the overhead you are already willing to eat to get the
fault tolerance.  There's a tiny increment, true, but it's much less
than the additional overhead you'd get from kernel thread context
switching on a UP kernel, or even on an SMP kernel, if you didn't
have thread-CPU affinity.

So the fact that FreeBSD currently has user space threads is pretty
much a red herring.  The performance penalty for user space threads,
and the performance benefit for kernel space threads, won't affect
this particular (I/O bound) application anyway, unless your box
(1) is SMP and (2) has CPU affinity code; if it does, you can cut the
time from 16 ms for two operations to 8 ms for two operations (still
128% of the time it would take with hardware RAID-5).

The losses from user space threading in RAIDFrame are negligible.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
