Date:      Fri, 12 Sep 2008 09:32:07 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Zaphod Beeblebrox <zbeeble@gmail.com>
Cc:        freebsd-hackers@freebsd.org, kpielorz_lst@tdx.co.uk
Subject:   Re: ZFS w/failing drives - any equivalent of Solaris FMA?
Message-ID:  <20080912163207.GE60094@icarus.home.lan>
In-Reply-To: <5f67a8c40809120904o49b6e410l5b65a20f5216202@mail.gmail.com>
References:  <C984A6E7B1C6657CD8C4F79E@Slim64.dmpriest.net.uk> <200809121544.m8CFiRHQ099725@lurza.secnetix.de> <5f67a8c40809120904o49b6e410l5b65a20f5216202@mail.gmail.com>

On Fri, Sep 12, 2008 at 12:04:27PM -0400, Zaphod Beeblebrox wrote:
> On Fri, Sep 12, 2008 at 11:44 AM, Oliver Fromme <olli@lurza.secnetix.de> wrote:
> > Did you try "atacontrol detach" to remove the disk from
> > the bus?  I haven't tried that with ZFS, but gmirror
> > automatically detects when a disk has gone away, and
> > doesn't try to do anything with it anymore.  It certainly
> > should not hang the machine.  After all, what's the
> > purpose of a RAID when you have to reboot upon drive
> > failure.  ;-)
> 
> To be fair, many "home" users run RAID without the expectation of being able
> to hot swap the drives.  RAID can provide high availability, but it
> can also provide simple data security.

RAID only ensures a very, very tiny part of "data security", and how much
depends greatly on which RAID implementation you use.  No RAID
implementation I know of protects against silent data corruption
("bit-rot"), and many RAID controllers and RAID drivers have bugs that
induce corruption (offenders to date: very old ATA HighPoint chips, and
nVidia/nForce, JMicron, and Silicon Image chips, all of which are
used on consumer boards).
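
To make the bit-rot point concrete, here is a minimal Python sketch.  The
function names are hypothetical, and this is a toy model, not how any real
RAID driver or ZFS is implemented.  A plain two-way mirror can at best
notice that its copies disagree (many implementations simply read one side
and never compare), and it has no way to know which copy is the good one.
Storing a checksum separately from the data, similar in spirit to what ZFS
does, lets the reader pick the intact copy and flag the bad one for repair.

```python
import hashlib
from typing import List, Tuple


def sha256(block: bytes) -> str:
    """Return the hex SHA-256 digest of a data block."""
    return hashlib.sha256(block).hexdigest()


def plain_mirror_read(copies: List[bytes]) -> bytes:
    """Mirror without checksums: it can notice that the copies differ,
    but cannot tell which one is correct."""
    if all(c == copies[0] for c in copies):
        return copies[0]
    raise ValueError("copies disagree; cannot tell which is correct")


def checksummed_mirror_read(copies: List[bytes],
                            expected_digest: str) -> Tuple[bytes, List[int]]:
    """Mirror with a checksum stored apart from the data: return the first
    copy whose digest matches, plus the indexes of copies that need repair."""
    good = None
    bad = []
    for i, c in enumerate(copies):
        if sha256(c) == expected_digest:
            if good is None:
                good = c
        else:
            bad.append(i)
    if good is None:
        raise ValueError("no copy matches the stored checksum")
    return good, bad


original = b"important data"
digest = sha256(original)       # checksum recorded at write time
rotted = b"importent data"      # one copy silently corrupted on disk

try:
    plain_mirror_read([rotted, original])
except ValueError as e:
    print("plain mirror:", e)

data, bad = checksummed_mirror_read([rotted, original], digest)
print("checksummed mirror:", data, "bad copies:", bad)
```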

A big problem is also that end-users *still* think RAID is a replacement
for doing backups.  :-(

> To your point... I suppose you have to reboot at some point after the drive
> failure, but my experience has been that the reboot has been under my
> control some time after the failure (usually when I have the replacement
> drive).

For home use, sure.  Since most home/consumer systems do not include
hot-swappable drive bays, rebooting is required.  That said, more and more
consumer motherboards now offer AHCI, which is the only reliable way
you'll get hot-swap capability with SATA.

In my case, with servers in a co-lo, it's not acceptable.  Our systems
contain SATA backplanes that support hot-swapping, and on Linux it works
the way it should (yank the disk, replace it with a new one); there is no
need for the hoopla required on FreeBSD (such as manually detaching and
reattaching the channel with atacontrol).  On FreeBSD, going through that
hoopla, you also take the risk of inducing a kernel panic.  That risk does
not sit well with me, but thankfully I've only been in that situation
(replacing a bad disk + using hot-swapping) once, and it did work.
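
For readers who haven't been through the FreeBSD hoopla, here is a rough
sketch of the sequence as a dry-run Python helper.  It assumes a ZFS pool
on ata(4) disks; the pool name `tank`, the disk `ad8`, and the channel
`ata4` are hypothetical placeholders (check `atacontrol list` and
`zpool status` for the real names), and the ordering is one cautious
approach, not an official procedure.  `zpool offline`, `atacontrol
detach`/`attach`, and `zpool replace` are real commands, but check them
against your release's man pages before running anything for real.

```python
import shlex
import subprocess
from typing import List, Tuple

Command = List[str]


def replace_disk_plan(pool: str, device: str,
                      channel: str) -> Tuple[List[Command], List[Command]]:
    """Return (before_swap, after_swap) command lists for replacing a failed
    ata(4) disk in a ZFS pool.  The ordering is an assumption, not official."""
    before_swap = [
        ["zpool", "offline", pool, device],  # tell ZFS to stop using the disk
        ["atacontrol", "detach", channel],   # detach the channel's devices from the bus
    ]
    after_swap = [
        ["atacontrol", "attach", channel],   # re-probe the channel for the new disk
        ["zpool", "replace", pool, device],  # resilver onto the disk in the same slot
    ]
    return before_swap, after_swap


def run(commands: List[Command], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Hypothetical names; substitute your own pool, disk and channel.
before, after = replace_disk_plan("tank", "ad8", "ata4")
run(before)  # prints "zpool offline tank ad8" and "atacontrol detach ata4"
# ... physically swap the drive here ...
run(after)
```

With the default `dry_run=True` nothing is executed; pass `dry_run=False`
to run the commands, and do the physical swap between the two phases.
Splitting the plan in two is deliberate, so nothing re-probes the channel
before the new drive is in place.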

At home, I have a pseudo-NAS system running FreeBSD.  The case is a
Supermicro mid-tower with a SATA backplane that supports hot-swapping.
I use ZFS on this system, with three disks in the pool plus a fourth
(non-ZFS) disk for boot/OS.  But because I'm using ata(4), the same
hot-swap caveats described above apply.

Individuals on -stable and other lists using ZFS have posted their
experiences with disk failures.  I believe to date I've seen one report
where it worked flawlessly, while the others reported strange issues with
resilvering or, in a couple of cases, lost all their zpools permanently.
Of course, it's very rare in this day and age for people to mail a
mailing list reporting *successes* with something -- people usually only
mail if something *fails*.  :-)

That said, pjd@'s dedication to getting ZFS working reliably on FreeBSD
is outstanding.  It's a great filesystem replacement, and even the Linux
folks are a bit jealous of how simple and painless it is.  I share
their jealousy: I've looked at the LVM docs... never again.

> About the only real improvement I'd like to see in this setup is the ability
> to spin down idle drives.  That would be an ideal setup for the home RAID
> array.

There is a FreeBSD port that handles this, although such a feature
should ideally be part of the ata(4) subsystem (as should TCQ/NCQ support
and a slew of other things, some of which are being worked on).
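
The email doesn't name the port, so as a rough illustration of what such a
tool has to decide, here is a minimal idle-timeout sketch in Python.  The
class name, the 600-second timeout, and the ad4/ad6 disk names are
hypothetical, and actually sending the standby command to the drive is
deliberately left out; this only models the "idle long enough?" policy.

```python
import time
from typing import Callable, Dict, List, Set


class IdleSpindown:
    """Hypothetical idle-timeout policy.  Tracks the last I/O time per disk
    and reports disks idle for at least `timeout` seconds.  Sending the
    actual standby command to the drive is left to the caller."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_io: Dict[str, float] = {}
        self.spun_down: Set[str] = set()

    def record_io(self, disk: str) -> None:
        """Note I/O on `disk`; a spun-down drive is considered awake again."""
        self.last_io[disk] = self.clock()
        self.spun_down.discard(disk)

    def due_for_spindown(self) -> List[str]:
        """Return (sorted) disks that just crossed the idle threshold and
        mark them as spun down so each is reported only once."""
        now = self.clock()
        due = sorted(d for d, t in self.last_io.items()
                     if d not in self.spun_down and now - t >= self.timeout)
        self.spun_down.update(due)
        return due


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
policy = IdleSpindown(timeout=600, clock=lambda: fake_now[0])
policy.record_io("ad4")
policy.record_io("ad6")
fake_now[0] = 300.0
policy.record_io("ad6")           # ad6 stays busy
fake_now[0] = 650.0
print(policy.due_for_spindown())  # ['ad4']
```

Injecting the clock keeps the policy testable without waiting in real
time; a real tool would call `record_io` from whatever I/O statistics it
can observe.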

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |