From owner-freebsd-stable@FreeBSD.ORG Mon Dec 13 18:28:53 2004
Date: Mon, 13 Dec 2004 10:28:53 -0800 (PST)
From: Doug White
To: Joe Rhett
cc: freebsd-stable@freebsd.org
cc: Søren Schmidt
Subject: Re: drive failure during rebuild causes page fault
In-Reply-To: <20041213060549.GE78120@meer.net>
Message-ID: <20041213102333.V92964@carver.gumbysoft.com>
References: <20041213052628.GB78120@meer.net> <20041213054159.GC78120@meer.net> <20041213060549.GE78120@meer.net>

On Sun, 12 Dec 2004, Joe Rhett wrote:

> On Sun, Dec 12, 2004 at 09:59:16PM -0800, Doug White wrote:
> > Thats a nice shotgun you have there.
>
> Yessir. And that's what testing is designed to uncover. The question is
> why this works, and how do we prevent it?

I'm sure Soren appreciates you donating your feet to the cause :)

Why it works: the system assumes the administrator is competent enough to
not yank a disk that is being rebuilt to.

> Is there a proper way to handle these sort of events? If so, where is it
> documented?
>
> And fyi just pulling the drives causes the same failure so that means that
> RAID1 buys you nothing because your system will also crash.

This is why I don't trust ATA RAID for fault tolerance -- it'll save your
data, but the system will tank. Since the disk state is maintained by the
OS and not abstracted by a separate processor, if a disk dies in a
particularly bad way the system may not be able to cope.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org