From owner-freebsd-questions@FreeBSD.ORG Thu Jun 19 15:06:38 2008
From: Tuc at T-B-O-H <ml@t-b-o-h.net>
To: daniel_k_eriksson@telia.com (Daniel Eriksson)
Cc: freebsd-questions@freebsd.org, ryan.coleman@cwis.biz
Date: Thu, 19 Jun 2008 11:06:25 -0400 (EDT)
Subject: Re: "Fixing" a RAID
In-Reply-To: <4F9C9299A10AE74E89EA580D14AA10A61A1947@royal64.emp.zapto.org> from "Daniel Eriksson" at Jun 19, 2008 11:02:14 AM
Message-Id: <200806191506.m5JF6PSm021061@vjofn.tucs-beachin-obx-house.com>

>
> I recently had this happen to me on an 8 x 1 TB RAID-5 array on a
> Highpoint RocketRAID 2340 controller. For some unknown reason two
> drives developed unreadable sectors within hours of each other. To
> make a long story short, the way I "fixed" this was to:
>
Not FreeBSD related, so you can delete now if not interested...

We had a 1.5 TB NetApp filer at my previous place. It was originally
backed up by another 1.5 TB filer taking snapshots every few hours.
After a few years the customer decided it was "too safe", so they used
the second filer for something else.

A month later we had a double disk failure in the same volume. The
NetApp freaked out and rebooted, but when it came back up it marked one
disk dead and the other as fine. Since there was a hot spare, it started
a rebuild. That took 9 hours for a 72 GB disk, and the half-failed drive
sounded like it was putting the head through the media with lead shot in
it. The filer ran at about half speed the whole time.

The SECOND the rebuild finished and the software claimed the array was
back in optimal mode, we pulled the bad disk out and replaced it with a
fresh one. That rebuild went fine. Then we pulled the disk that had been
marked dead and put another disk in as the hot spare.

Not sure if it's a testament to NetApp, or to our and the customer's
luck. They had specifically not wanted backups, and rebuilding the data
would have taken months, many man-hours, and lost revenue for the site.

Ever since then I try to get disks made at different times and from
different batches. You figure that if they were MADE around the same
time, they will most likely DIE around the same time. :)

                Tuc
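
As a rough sketch of how you might check the drives you already have
(assuming smartmontools is installed and the disks show up as da0-da7;
adjust the device names for your controller), dumping each drive's model
and serial number makes it easy to spot disks that came off the line in
the same batch:

    # Run as root. Print model and serial for each drive; serial numbers
    # that are close together usually mean the same manufacturing run.
    for d in /dev/da[0-7]; do
        echo "== $d =="
        smartctl -i $d | egrep -i 'model|serial'
    done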