From owner-freebsd-stable@FreeBSD.ORG  Fri Jan 13 18:00:06 2006
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Message-Id: <6.2.3.4.0.20060113125258.045378d8@64.7.153.2>
X-Mailer: QUALCOMM Windows Eudora Version 6.2.3.4
Date: Fri, 13 Jan 2006 12:59:37 -0500
To: Doug Ambrisko <ambrisko@ambrisko.com>
From: Mike Tancsa <mike@sentex.net>
In-Reply-To: <200601131659.k0DGxmob083744@ambrisko.com>
References: <200601122020.59843.jkim@FreeBSD.org>
	<200601131659.k0DGxmob083744@ambrisko.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
X-Virus-Scanned: by amavisd-new
X-Scanned-By: MIMEDefang 2.51 on 205.211.164.50
Cc: freebsd-stable@freebsd.org
Subject: Re: 6.0 on Dell 1850 with PERC4e/DC RAID?

At 11:59 AM 13/01/2006, Doug Ambrisko wrote:
>|
>| That's lame.  Under what condition does it happen, do you know?
>
>Running RAID 10, a drive was swapped and the rebuild started on the
>replacement drive.  The rebuild complained about the source drive
>for the mirror rebuild having read errors that couldn't be recovered.
>It continued on and finished re-creating the mirror.  Then the RAID
>proceeded to a background init, as they normally do, and kept
>failing that and restarting the background init over and over again.
>The box changed the RAID from degraded to optimal when the rebuild
>completed (with errors).  Doing a dd of the entire RAID logical device
>returned an error at the bad sector since it couldn't recover that.
>The RAID controller reported an I/O error and still left the RAID as
>optimal.
>
>We reported this and were told that's the way it is designed :-(
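
For anyone wanting to repeat that dd-style check in a way that keeps
going past the first bad spot, a rough sketch follows. It reads a
file or device in fixed-size chunks and records the offsets that
return an I/O error. The function name, the chunk size, and the use of
lseek() to find the device size are my own assumptions for
illustration and have not been tested against any particular
controller or driver.

```python
import os


def find_unreadable_chunks(path, chunk_size=64 * 1024):
    """Return byte offsets of chunks that raise an I/O error when read.

    Hypothetical helper: 'path' may be a regular file or a device node.
    Assumes seeking to the end reports the size, which may not hold for
    every device type.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError:
                # Unrecoverable read: note where it happened and skip ahead.
                bad_offsets.append(offset)
                offset += chunk_size
                continue
            if not data:
                # Unexpected end of data; stop rather than loop forever.
                break
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

An empty list only means every chunk could be read. It says nothing
about whether the data that came back is correct.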


Interesting timing, as I ran into this sort of situation on the 
weekend on a 3ware RAID1 array. The card had complained for a week 
about read errors on drive 1. We thought we would wait until the 
weekend maintenance window to swap it out.  Sadly, before that 
window, drive zero totally died a horrible death.  We popped in a new 
drive on port zero, started the rebuild, and it crapped out saying 
there was a read error on drive 1.  However, there is a check box 
that says continue the build, even with errors on the source drive.

This setup seems to give you the best of both worlds.  We did a quick 
check of the resultant files compared to backups and only a couple 
were toasted. (The box is going to be retired in a month, so even if 
there is other hidden fs corruption, as long as it holds out for 
another 3 weeks we don't care too much.) The correct approach would be 
to do a total restore, of course, but this was good enough for us in 
this situation.  I guess the question is: is this RAID1 really a 
proper mirror, given that there are hard errors on the drive on port 1?
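
For what it's worth, the kind of quick check against backups described
above could be sketched roughly as below. This is not the procedure we
actually ran, just an illustration: it walks a restored copy of the
backup and reports files that are missing, unreadable, or different on
the live filesystem. The function names and the choice of SHA-256 are
assumptions made for the example.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(live_root, backup_root):
    """Return sorted (relative_path, status) pairs for suspect files.

    Hypothetical helper: every file under backup_root is looked up
    under live_root and flagged as 'missing', 'unreadable' or 'differs'.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            backup_path = os.path.join(dirpath, name)
            rel = os.path.relpath(backup_path, backup_root)
            live_path = os.path.join(live_root, rel)
            if not os.path.isfile(live_path):
                problems.append((rel, "missing"))
                continue
            try:
                same = file_digest(live_path) == file_digest(backup_path)
            except OSError:
                # A read error here is what a bad sector would look like.
                problems.append((rel, "unreadable"))
                continue
            if not same:
                problems.append((rel, "differs"))
    return sorted(problems)
```

Files that changed legitimately since the backup was taken will also
show up as "differs", so the output is a list of candidates to look at
rather than a list of confirmed damage.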

         ---Mike