Date:      Mon, 25 Oct 2004 20:07:14 -0400
From:      Chuck Swiger <cswiger@mac.com>
To:        Scott Long <scottl@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: FreeBSD 5.3b7and poor ata performance
Message-ID:  <417D9532.5000103@mac.com>
In-Reply-To: <417D812F.1040404@freebsd.org>
References:  <14479.1098695558@critter.freebsd.dk> <417D25E8.6080804@ng.fadesa.es> <200410251928.01536.victor@alf.dyndns.ws> <"200410251837.58257.Thomas.Sparrevohn"@btinternet.com> <417D3F12.20302@DeepCore.dk> <417D40A1.9030802@ng.fadesa.es> <417D45F1.9090504@freebsd.org> <77F3FD4D-26BE-11D9-9A2F-003065ABFD92@mac.com> <F5F15CA0-26C5-11D9-9A2F-003065ABFD92@mac.com> <417D65F1.2040809@freebsd.org> <p0600205fbda318006656@[10.0.1.3]> <417D6F4C.9000404@freebsd.org> <p06002061bda3224cd029@[10.0.1.3]> <64029B30-26D2-11D9-9A2F-003065ABFD92@mac.com> <417D812F.1040404@freebsd.org>

Scott Long wrote:
> Charles Swiger wrote:
[ ...let's pray to the format=flowed gods... ]
>> If you prefer...            ...consider using:
>> ----------------------------------------------
>> performance, reliability:     RAID-1 mirroring
>> performance, cost:            RAID-0 striping
>> reliability, performance:     RAID-1 mirroring (+ hot spare, if possible)
>> reliability, cost:            RAID-5 (+ hot spare)
>> cost, reliability:            RAID-5
>> cost, performance:            RAID-0 striping
> 
> It's more complex than that.

Certainly.  I plead guilty both to generalizing and to simplifying 
matters...but I'm not the first one to do so!  :-)  For example, I didn't 
mention RAID-10 or -50, although both can do very well if you've got enough disks.

Still, I suspect the table above may be helpful to someone.
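
For anyone who would rather have the table in executable form, here is a 
minimal Python sketch.  The names RAID_SUGGESTIONS and suggest_raid are 
made up for illustration; the entries are just the six rows above, and 
carry the same caveats about generalizing.

```python
# Ordered priority pairs (first priority, second priority) mapped to the
# RAID level suggested in the table. All names here are illustrative.
RAID_SUGGESTIONS = {
    ("performance", "reliability"): "RAID-1 mirroring",
    ("performance", "cost"): "RAID-0 striping",
    ("reliability", "performance"): "RAID-1 mirroring (+ hot spare, if possible)",
    ("reliability", "cost"): "RAID-5 (+ hot spare)",
    ("cost", "reliability"): "RAID-5",
    ("cost", "performance"): "RAID-0 striping",
}


def suggest_raid(first, second):
    """Return the table's suggestion for an ordered pair of priorities."""
    try:
        return RAID_SUGGESTIONS[(first, second)]
    except KeyError:
        # The table only covers pairs of two different priorities.
        raise ValueError(f"no suggestion for priorities {first!r}, {second!r}")
```

The order of the two arguments matters: suggest_raid("cost", "reliability") 
and suggest_raid("reliability", "cost") both land on RAID-5, but only the 
latter adds the hot spare.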

> Are you talking software RAID, PCI RAID, or external RAID?

I didn't specify.  I am not sure that considering a specific example would 
change the generalization above, since other considerations like the ratio of 
reads to writes also have a significant impact on whether, say, RAID-5 or 
RAID-1 is a better choice for a particular case.

However, if you can point me to general counterexamples where this issue would 
change the recommendations I made above, I would be happy to consider them.

> That affects all three quite a bit.  Also, how do you define reliability?

At the physical component layer, reliability gets defined by MTBF #s for the 
various failure modes, things like spindle bearing wear, # of start-stop 
cycles, etc.  SMART provides some helpful parameters for disks, and there are 
the I2C or SMBus mechanisms for doing hardware-level checking of the 
controller cards or the MB.

At the logical level, considering a RAID system as a whole, reliability 
equates to "availability", which can be measured by how long (or whether) the 
data on the RAID volume is _correctly_ available to the system.
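
One common way to put a number on "how long" is the steady-state 
availability figure MTBF / (MTBF + MTTR).  The sketch below assumes that 
definition; the function names are hypothetical, and the formula says 
nothing about whether the data that is available is also _correct_.

```python
def availability(mtbf_hours, mttr_hours):
    """Fraction of time the volume is usable, assuming the standard
    steady-state definition MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail):
    """Expected hours per (365-day) year that the volume is unavailable."""
    return (1.0 - avail) * 24 * 365
```

For instance, a volume that fails on average every 99 hours and takes 1 hour 
to bring back has an availability of 0.99.  Those inputs are chosen only to 
make the arithmetic easy, not taken from any real hardware.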

> Do you verify reads on RAID-1 and 5?

This is answered by how you value the performance vs. reliability tradeoff.

> Also, what about error recovery?

Are you talking about issues like, "what are your chances of losing data if 
two drives fail"?
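
If so, that particular question can be worked out by brute force.  The 
following Python sketch enumerates every pair of failed drives and counts 
how many pairs lose data.  It assumes both failures happen before any 
rebuild completes, that every pair is equally likely, and that RAID-10 
mirrors drives (0,1), (2,3), and so on; the function names are hypothetical.

```python
from itertools import combinations


def loss_probability_two_failures(n_drives, survives):
    """Fraction of two-drive failure combinations that lose data."""
    pairs = list(combinations(range(n_drives), 2))
    lost = sum(1 for pair in pairs if not survives(set(pair)))
    return lost / len(pairs)


def raid5_survives(failed):
    # Single parity tolerates at most one failed drive.
    return len(failed) <= 1


def raid10_survives(failed):
    # Assumed layout: drives (0,1), (2,3), ... are mirror pairs, so
    # d ^ 1 is the mirror partner of drive d. Data is lost only when
    # both halves of the same mirror pair have failed.
    return not any((d ^ 1) in failed for d in failed)
```

Under those assumptions, a second failure always loses a RAID-5 volume, 
while a four-drive RAID-10 loses data in 2 of the 6 possible pairs.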

>> That rule dates back to the early days of SCSI-2, where you could fit 
>> about four drives worth of aggregate throughput over a 40MB/s 
>> ultra-wide bus.  The idea behind it is still sound, although the 
>> number of drives you can fit obviously changes depending on whether 
>> you talk about ATA-100 or SATA-150.
> 
> The formula here is simple:
> 
> ATA: 2
> SATA: 1
> 
> So the channel transport starts becoming irrelevant now (except when you
> talk about SAS and having bonded channels going to switches).  The
> limiting factor again becomes PCI.

I absolutely agree that your consumer-grade 32-bit, 33MHz PCI is a significant 
limiting factor and will probably act as a bottleneck even to a 
four-disk RAID config.
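
The arithmetic behind that is short enough to sketch in Python.  The bus 
figures are theoretical peaks (width times clock), and the 50 MB/s 
per-drive sustained rate is an assumption picked for illustration, not a 
measurement; the function names are hypothetical.

```python
import math


def bus_bandwidth_mb_s(width_bits, clock_mhz):
    """Theoretical peak bus bandwidth in MB/s (10^6 bytes per second)."""
    return width_bits / 8 * clock_mhz


def drives_to_saturate(bus_mb_s, per_drive_mb_s):
    """Smallest number of drives whose combined rate reaches the bus peak."""
    return math.ceil(bus_mb_s / per_drive_mb_s)


pci_32_33 = bus_bandwidth_mb_s(32, 33.33)   # about 133 MB/s
pci_x_100 = bus_bandwidth_mb_s(64, 100)     # 800 MB/s

# Assumed sustained platter rate per drive; real drives vary.
ASSUMED_DRIVE_MB_S = 50
drives_on_plain_pci = drives_to_saturate(pci_32_33, ASSUMED_DRIVE_MB_S)
```

With those assumptions, three drives already exceed what plain PCI can carry 
even in theory, and real-world PCI throughput falls short of the peak.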

> An easy example is the software RAID cards that are based on the Marvell 8
> channel SATA chip.  It can drive all 8 drives at max platter speed if you
> have enough PCI bandwidth (and I've tested this recently with FreeBSD 5.3,
> getting >200 MB/s across 4 drives).  However, you're talking about
> PCI-X-100 bandwidth at that point, which is not what most people have in
> their desktop systems.

True, although that will gradually change over the next year or two as PCI-X 
systems like the AMD Opteron and the G5 Macs get adopted.

Besides, given the quality trends of consumer-grade hard drives, more and more 
people are using RAID to save them from a 16-month old dead drive (brought to 
you courtesy of vendors H, I, or Q).

> And for reasons of reliability, I wouldn't consider software RAID to
> be something that you would base your server-class storage on other than
> to mirror the boot drive so a failure there doesn't immediately bring
> you down.

If you cannot trust your OS to handle your data properly via software RAID, 
why should you trust the OS to pass valid data on to a hardware RAID 
controller?

For example, it seems to me that a failure mode such as a bad memory chip 
would result in incorrect data going to the disks regardless of whether you 
were using software or hardware RAID.

Ditto for an application-level bug which generates the wrong results. [1]

[ ... ]
> What is interesting is measuring how many single-sector transfers can be
> done per second and how much CPU that consumes.  I used to be able
> to get about 11,000 io/s on an aac card on a 5.2-CURRENT system from
> last winter.  Now I can only get about 7,000.  I'm not sure where the
> problem is yet, unfortunately.  I'm using KSE pthreads to generate a
> lot of parallel requests with as little overhead as possible, so maybe
> something there has changed, or maybe something in the I/O path above
> the driver has changed, or maybe something in interrupt handling or
> scheduling has changed.  It would be interesting to figure this out
> since this definitely shows a problem.

Thanks for your thoughts.

-- 
-Chuck

[1]: This is why RAID is still not a substitute for good backups...


