Date:      Mon, 13 Dec 2004 10:28:53 -0800 (PST)
From:      Doug White <dwhite@gumbysoft.com>
To:        Joe Rhett <jrhett@meer.net>
Cc:        Søren Schmidt <sos@DeepCore.dk>
Subject:   Re: drive failure during rebuild causes page fault
Message-ID:  <20041213102333.V92964@carver.gumbysoft.com>
In-Reply-To: <20041213060549.GE78120@meer.net>
References:  <20041213052628.GB78120@meer.net> <20041213054159.GC78120@meer.net> <20041213060549.GE78120@meer.net>

On Sun, 12 Dec 2004, Joe Rhett wrote:

> On Sun, Dec 12, 2004 at 09:59:16PM -0800, Doug White wrote:
> > That's a nice shotgun you have there.
>
> Yessir.  And that's what testing is designed to uncover.  The question is
> why this works, and how do we prevent it?

I'm sure Soren appreciates you donating your feet to the cause :)

Why it works: the system assumes the administrator is competent enough
not to yank a disk that is the target of a rebuild.

> Is there a proper way to handle this sort of event?  If so, where is it
> documented?
>
> And fyi just pulling the drives causes the same failure so that means that
> RAID1 buys you nothing because your system will also crash.

This is why I don't trust ATA RAID for fault tolerance -- it'll save your
data, but the system will tank.  Since the disk state is maintained by
the OS and not abstracted by a separate processor, if a disk dies in a
particularly bad way the system may not be able to cope.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org


