Date: Mon, 18 Apr 2016 17:05:22 -0500 (CDT)
Subject: Re: Raid 1+0
From: "Valeri Galtsev" <galtsev@kicp.uchicago.edu>
To: "Kevin P. Neal"
Cc: "Shamim Shahriar", freebsd-questions@freebsd.org

On Mon, April 18, 2016 4:02 pm, Kevin P. Neal wrote:
> On Mon, Apr 18, 2016 at 09:07:07PM +0100, Shamim Shahriar wrote:
>> On 18/04/2016 20:22, Bernt Hansson wrote:
>> > Hello list
>> >
>> > Used gstripe to stripe the arrays raid/r0 + r1 into stripe0
>> >
>> Hi
>>
>> I'm sure there are people with more expertise than I have, and they can
>> confirm either way. But in my mind, given that you used RAID1 first
>> (mirror) and then used those two RAID1 arrays to create a RAID0, this is
>> logically RAID 1+0. In other words, if you lose one disc from each of
>> the RAID1 arrays you are still safe. If you lose both from one single
>> mirror array (highly unlikely), the stripe is unlikely to be of any use.
>
> Not that unlikely. If you take identical disks from the same company and
> subject them to identical load then the probability that they will fail
> around the same time is much higher than random.

Not correct. First of all, in most cases the failures of the individual
drives are independent events. (The exception is the odd setup where you
have a huge RAID with many drives and never scan the whole surface of each
drive; then the failure of one drive and the resulting rebuild may trigger
the discovery of another "bad" drive. Strictly speaking, though, that other
drive was already in a bad state earlier, say with a large bad area you had
never tried to access, so the failures are not simultaneous, and the
situation can be avoided by correct RAID configuration: scan the surface of
the drives periodically!)

Now, let's look at the event of a simultaneous failure of two drives. Let
me call "simultaneous" a failure of the second drive within the time needed
to rebuild the array after the first drive fails. Let's take that window to
be as long as 3 days (my RAIDs rebuild much faster), and let's consider an
array of 6-year-old drives.

What percentage of drives die between their 6th and 7th year in service?
Let's take a really big number: 1% (I'm lucky; my drives last longer). So
the probability that a given drive dies during a particular 3-day window
near its 6th year of age is about 0.01 * 3 (days) / 365 (days), roughly ten
to the minus fourth power (0.0001). The probability of two drives dying
simultaneously (or rather, the second drive dying within 3 days of the
first), these being independent events, is just the product of the
individual probabilities, i.e. the single-drive probability squared: about
ten to the minus eighth power (0.00000001).
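If you want to check the arithmetic, here is a minimal back-of-envelope
sketch in Python. It simply restates the estimate above; the 1% annual
failure rate and the 3-day rebuild window are the illustrative assumptions
from the previous paragraphs, not measured numbers.

  # Back-of-envelope estimate: two drives dying within one rebuild window.
  # The inputs are the illustrative assumptions from the text, not data.
  annual_failure_rate = 0.01   # assumed: 1% of drives die in their 7th year
  rebuild_window_days = 3      # assumed: rebuild takes at most 3 days

  # Probability that one drive dies inside a particular 3-day window.
  p_single = annual_failure_rate * rebuild_window_days / 365

  # Treating the failures as independent, the chance that a second drive
  # dies within the rebuild window of the first is the product of the
  # single-drive probabilities, i.e. p_single squared.
  p_double = p_single ** 2

  print(f"one drive in a 3-day window:  {p_single:.1e}")   # ~8e-05
  print(f"two drives within the window: {p_double:.1e}")   # ~7e-09

Running it prints roughly 8e-05 and 7e-09, which matches the powers of ten
estimated above.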
Note that while the failure of a single drive, though not that likely, is
still something you do observe, the event of two drives dying
simultaneously (we estimated it as one dying within 3 days of the other) is
far less likely. So I would say that, provided the RAID is configured well,
it is quite unlikely you will have two drives die on you, whether they are
the same manufacturer/model/batch or something completely different.

The RAID horror stories in which several drives die "simultaneously" - and
I have heard them more than once, including from people it happened to -
turn out under closer scrutiny to be just poorly configured RAIDs. Namely,
several drives are already half dead and sit like that until the moment one
drive is actually kicked out of the array, triggering a rebuild, and thus
access to the whole surface of all drives, and the discovery of the other
drives that had been sitting "almost dead" undetected simply because RAID
verification was not run every so often. I find that even with a
full-surface verification scan as rare as once a month I am reasonably safe
from multiple drives dying and causing data loss. Somebody with better
knowledge of probability theory will correct me if I'm wrong somewhere.

This, of course, has nothing to do with occasional "small" data loss if no
measures beyond simply keeping the data on RAID are taken. What I mean is
that RAID has no truly reliable mechanism to discover which drive returns a
corrupted stripe on read (due to a bad block whose content the drive
firmware did not recover correctly). If the parity doesn't match in RAID-5,
there is no way to determine mathematically which stripe is corrupted (one
could suspect a particular drive if its response time is slower, as its
firmware might be busy trying to recover the data by multiple reads and
superimposition of the results). With RAID-6 the situation is better: by
excluding one drive at a time you can find a self-consistent group of N-1
drives, and thus know which drive's stripe needs to be corrected. But I'm
not sure RAID adapter firmware actually does this. Which leads us to the
conclusion: we need to take advantage of filesystems that ensure data
integrity (ZFS comes to mind).

Valeri

> That's why when I set up a mirror I always build it with drives from
> different companies. And I make it a three way mirror if I can.
> --
> Kevin P. Neal                                http://www.pobox.com/~kpn/
>
> "Good grief, I've just noticed I've typed in a rant. Sorry chaps!"
>                           Keir Finlow Bates, circa 1998

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++