Date:      Sun, 10 Oct 2004 22:35:08 -0600
From:      "Kenneth D. Merry" <ken@freebsd.org>
To:        Roisin Murphy <Roisin.Murphy@gmail.com>
Cc:        freebsd-hardware@freebsd.org
Subject:   Re: sata raid & write cache state
Message-ID:  <20041011043508.GA72113@nargothrond.kdm.org>
In-Reply-To: <b21e6cca041010181932879aeb@mail.gmail.com>
References:  <b21e6cca041010181932879aeb@mail.gmail.com>

On Sun, Oct 10, 2004 at 18:19:11 -0700, Roisin Murphy wrote:
> hi
> 
> I'm thinking of setting up a RAID array with SATA disks.
> So far this Intel controller seems to be winning :)
> http://www.intel.com/design/servers/RAID/SRCS16/
> 
> The Handbook says (section 11.12.1.5, hw.ata.wc):
> FreeBSD 4.3 flirted with turning off IDE write caching. This reduced
> write bandwidth to IDE disks but was considered necessary due to
> serious data consistency issues introduced by hard drive vendors. The
> problem is that IDE drives lie about when a write completes. With IDE
> write caching turned on, IDE hard drives not only write data to disk
> out of order, but will sometimes delay writing some blocks
> indefinitely when under heavy disk loads. A crash or power failure may
> cause serious file system corruption. FreeBSD's default was changed to
> be safe. Unfortunately, the result was such a huge performance loss
> that we changed write caching back to on by default after the release.
> You should check the default on your system by observing the hw.ata.wc
> sysctl variable. If IDE write caching is turned off, you can turn it
> back on by setting the kernel variable back to 1. This must be done
> from the boot loader at boot time. Attempting to do it after the
> kernel boots will have no effect.
> 
> Well, this is also what I heard from a friend of mine: he has seen
> too many dead IDE RAID setups because the ATA command set has no way
> to report the state of the write cache contents.
> 
> 1. Now, the question is: is this the same with the SATA command set?
> Or is SATA more like SCSI in this respect?

Things are mostly the same with SATA disks, especially if the disk doesn't
support tagged queueing.

> 2. I haven't read much about RAID controllers yet, but I would think
> that with a proper hardware RAID-5 controller, there's no need for the
> write cache to be enabled on the actual disks, as the controller with
> its own cache could optimize the disk writes. Is this so? Does a proper
> hardware RAID controller switch the cache off on its disks?

A proper hardware RAID controller does switch the cache off on its disks,
but I seriously doubt you will find an ATA or SATA RAID controller that
does that.  You basically get incredibly lousy performance without write
caching turned on.

There are two reasons for this.  One is obvious; the other I'm not certain
about, so you'll need to check the specs:

1.  Many (most) ATA and SATA drives do not support tagged queueing.
Because of this, you can only have one command outstanding to the drive at
a time.  A write cache masks this, since it will immediately tell you it
has completed the write command when in fact the data for that command has
just gone into cache.  That's the cool thing that a write cache does for
you -- it boosts throughput without tagged queueing.  (Note that many
Hitachi (formerly IBM) disks do support queueing.  Some other vendors
may as well; I don't know for sure.)

2.  The way tagged queueing in ATA (and SATA) works is that the status
phase for a given command must come directly after the data phase.  I'm not
sure if that is because they don't include a tag number in the status phase
or just what.  Anyway, this is the part I haven't verified for myself in
the specs.  If that is true, what it means is that even if your disk
supports tagged queueing, you won't be able to submit data for a bunch of
different write commands and then get separate status back.  As soon as you
submit data for one write, you have to wait for status to come back.  So
it's about as bad as having a drive that can't do tagged queueing.  

The only thing a scheme like that would give you is a bit better
performance on random reads.

> 3. Is this the case with SCSI also? If the disk could fully report its
> write cache state, the array couldn't mess up/die the way people report
> with IDE RAID setups, right? Is the disk write cache enabled on SCSI
> RAID-5 setups?

Well, any decent SCSI RAID controller will either just disable the write
cache altogether, or it will give the user the option of disabling the
write cache.

With SCSI disks, you don't have the same performance problem that you do
with ATA disks doing tagged queueing (or not), because:

1.  All modern SCSI disks can do tagged queueing.

2.  With SCSI disks, the status phase is completely decoupled from the data
phase.  There is no ordering constraint.  Since a tag number comes along
with the status, the controller knows which command completed.

Also, keep in mind that with any RAID controller that does RAID-5 or
RAID-1, you should get a battery-backed cache.  It may be sold as an
option, but you should get it.  This will protect you from the RAID-5/RAID-1
hole.  That is, when you have a crash, you don't know:

1.  What writes you have outstanding.
2.  Whether all, part, or none of those writes got committed.

So without a battery-backed cache, you will have to scrub your entire
array to make sure the parity is consistent, and you still will not know
whether some of your data was corrupted.  All you can really do is sync the
parity.

Of course, a battery-backed cache is useless if write caching is enabled
on your drives: the controller treats a write as safe once the drive
acknowledges it, but the data may still be sitting in the drive's volatile
cache.  So it will be a useless feature with most ATA or SATA RAID
controllers, because it's unlikely that their vendors would want to tank
performance by disabling write caching.

With that sort of setup, you should just run with a UPS, and make sure your
machine shuts down cleanly in the event of a power outage.

Don't get me wrong, I'm not bashing ATA, SATA or SATA RAID controllers.
The disks are much cheaper, and make a lot of sense for some applications.
As long as you know the limitations, you can use that sort of hardware
successfully.  (FWIW, my day job has a lot to do with SATA RAID.)

> 4. With that Intel SRCS16 controller, would the hw.ata.wc sysctl work?
> Could I turn the cache off on my SATA disks like that? Or do I need
> the manufacturer's DOS floppy with utilities to turn the defaults
> on/off on my disks? (Ideally the controller would take care of this.)

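Nope, the sysctl wouldn't work if it is a processor-based RAID controller.
hw.ata.wc only affects disks attached through the ata(4) driver; disks
behind a RAID processor are managed by the controller's own firmware.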

> 5. Also, that Intel SRCS16 controller should support 'online capacity
> expansion'. Does that mean that if I start with 3 disks, I can add more
> disks if I need more storage, without having to recreate the array from
> scratch?

If it does support online capacity expansion, then yes, you would be able
to add capacity to the array.  What they probably do is re-stripe (or
re-create) the array on the fly.  That, combined with growfs(8), would let
you add space to an existing file system.  (You could also just add
another partition if that's more convenient.)

Ken
-- 
Kenneth Merry
ken@FreeBSD.org


