From: Mike Tancsa
To: Doug Ambrisko
Cc: freebsd-stable@freebsd.org
Date: Fri, 13 Jan 2006 12:59:37 -0500
Subject: Re: 6.0 on Dell 1850 with PERC4e/DC RAID?

At 11:59 AM 13/01/2006, Doug Ambrisko wrote:
>|
>| That's lame. Under what condition does it happen, do you know?
>
>Running RAID 10, a drive was swapped and the rebuild started on the
>replacement drive. The rebuild complained about the source drive
>for the mirror rebuild having read errors that couldn't be recovered.
>It continued on and finished re-creating the mirror. Then the RAID
>proceeded on to a background init, which it normally does, and started
>failing that and restarting the background init over and over again.
>The box changed the RAID from degraded to optimal when the rebuild
>completed (with errors). Doing a dd of the entire RAID logical device
>returned an error at the bad sector, since it couldn't recover that.
>The RAID controller reported an I/O error and still left the RAID as
>optimal.
>
>We reported this and were told that's the way it is designed :-(

Interesting timing, as I ran into this sort of situation on the weekend on a 3ware RAID1 array. The card had complained for a week about read errors on drive 1. We thought we would wait until the weekend maintenance window to swap it out. Sadly, before that window, drive zero totally died a horrible death. We popped in a new drive on port zero, started the rebuild, and it crapped out saying there was a read error on drive 1. However, there is a check box that says continue the build, even with errors on the source drive. This setup seems to give you the best of both worlds. We did a quick check of the resultant files against backups and only a couple were toasted. (The box is going to be retired in a month, so if there is other hidden fs corruption, we don't care too much as long as it holds out for another 3 weeks.) The correct approach would of course be a total restore, but this was good enough for us in this situation.

I guess the question is: is this RAID1 a proper mirror, given that there are hard errors on the drive on port 1?

---Mike
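Doug's check, reading the whole RAID logical device with dd to see whether any sector comes back with an I/O error, can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function name `scan_device`, the 64 KiB chunk size and the optional `size` argument are hypothetical choices for illustration, not anything from the thread or from a controller utility. Unlike a plain dd, which stops at the first read error, it records the offset of each failing chunk and keeps reading.

```python
import os
from typing import List, Optional, Tuple


def scan_device(path: str, chunk_size: int = 64 * 1024,
                size: Optional[int] = None) -> Tuple[int, List[int]]:
    """Read `path` from start to end in fixed-size chunks, roughly what
    `dd if=<path> of=/dev/null bs=64k conv=noerror` does, recording where
    reads fail instead of stopping at the first error.

    Hypothetical helper (not from the thread). Returns a tuple of
    (bytes read successfully, byte offsets of chunks that failed).
    If `size` is None, reading stops at the first zero-byte read (end of data).
    """
    bad_offsets: List[int] = []
    bytes_ok = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while size is None or offset < size:
            length = chunk_size if size is None else min(chunk_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                # An unrecoverable sector usually surfaces as EIO; note the
                # chunk and move past it, like dd's conv=noerror.
                bad_offsets.append(offset)
                offset += length
                continue
            if not data:
                break  # end of device or file
            bytes_ok += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_ok, bad_offsets
```

It would be run as root against the logical volume's device node, e.g. `scan_device("/dev/<logical-volume>")`, where the path is a placeholder. A clean pass only shows that the controller could return every block; on a mirror the controller may serve reads from either member, so by itself it does not prove both halves hold the same data.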