Date:      Sat, 20 Jul 2013 16:07:40 -0400 (EDT)
From:      Daniel Feenberg <feenberg@nber.org>
To:        "Steve O'Hara-Smith" <steve@sohara.org>
Cc:        frank2@fjl.co.uk, freebsd-questions@freebsd.org
Subject:   Re: to gmirror or to ZFS
Message-ID:  <Pine.GSO.4.64.1307201558490.29785@nber6>
In-Reply-To: <20130720201214.90206565e00675611996176d@sohara.org>
References:  <4DFBC539-3CCC-4B9B-AB62-7BB846F18530@gmail.com> <alpine.BSF.2.00.1307152211180.74094@wonkity.com> <976836C5-F790-4D55-A80C-5944E8BC2575@gmail.com> <51E51558.50302@ShaneWare.Biz> <51E52190.7020008@fjl.co.uk> <CAOaKuAVULVuZxtExp=mNi-J7kMNbsxbLJVsv8nKTA0-Ru6M3+w@mail.gmail.com> <6CE5718E-2646-4D8C-AF98-37384B8851C5@mac.com> <CAOaKuAU8nhaoq+6hCVkB+b-ppiBvYPKANdWJRnYcmKaPdecwZA@mail.gmail.com> <DCC017BE-A293-4C1B-8B6F-D9AF6F50125B@mac.com> <51EAC56C.4030801@fjl.co.uk> <20130720201214.90206565e00675611996176d@sohara.org>



On Sat, 20 Jul 2013, Steve O'Hara-Smith wrote:

> On Sat, 20 Jul 2013 18:14:20 +0100
> Frank Leonhardt <frank2@fjl.co.uk> wrote:
>
>> It's worth noting, as a warning for anyone who hasn't been there, that
>> the number of times a second drive in a RAID system fails during a
>> rebuild is higher than would be expected. During a rebuild the remaining
>> drives get thrashed and run hot, and if they're on the edge, that's when
>> they're going to go. And at the most inconvenient time. Okay - obvious
>> when you think about it, but that realisation tends to come too late.
>
> 	Having the cabinet stuffed full of nominally identical drives
> bought at the same time from the same supplier tends to add to the
> probability that more than one drive is on the edge when one goes. It's a
> pity there are now only two manufacturers of spinning rust.

Often this is presumed to be the reason for double failures close in 
time; common-mode failures such as the environment, a defective power 
supply, or excess voltage can also be blamed. But I have to think that the 
most common "cause" of a second failure soon after the first is that a failed 
drive often isn't detected until a particular sector is read or written. 
Since a full mirror rebuild, as with gmirror, reads and writes every sector 
on multiple disks, including unused sectors, it can "detect" latent problems 
that may have existed since the drive was new but have never held data, 
or that have gone bad since the last write but haven't been read since.
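
If you want to surface those latent errors on your own schedule rather 
than in the middle of a rebuild, a full-surface read of each member will 
do it. A minimal sketch, assuming FreeBSD ada(4) device names (substitute 
your own) and smartctl from the sysutils/smartmontools port:

    # Read every sector of each mirror member; I/O errors will show
    # up on the console and in dd's own error output.
    for d in ada0 ada1; do
        dd if=/dev/$d of=/dev/null bs=1m conv=noerror
    done

    # Or have the drive test its own surface in the background:
    smartctl -t long /dev/ada0
    smartctl -a /dev/ada0    # check the self-test log once it finishes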

The ZFS scrub processes only sectors with data, so it provides only 
partial protection against double failures.
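
Regular scrubs are still worth scheduling, since they at least catch rot 
in the sectors that matter. Running one by hand is just (assuming a pool 
named "tank"):

    zpool scrub tank         # read and verify every allocated block
    zpool status -v tank     # progress, plus any errors found so far

If memory serves, FreeBSD's periodic(8) can also run scrubs on a schedule 
via daily_scrub_zfs_enable in /etc/periodic.conf.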

Daniel Feenberg
NBER


>
> -- 
> Steve O'Hara-Smith <steve@sohara.org>


