Date:      Thu, 7 Aug 2014 10:35:36 +0200 (CEST)
From:      Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
To:        Scott Bennett <bennett@sdf.org>
Cc:        freebsd@qeng-ho.org, freebsd-questions@freebsd.org
Subject:   Re: gvinum raid5 vs. ZFS raidz
Message-ID:  <alpine.BSF.2.11.1408071034510.64214@mail.fig.ol.no>
In-Reply-To: <201408070831.s778VhJc015365@sdf.org>
References:  <201408020621.s726LsiA024208@sdf.org> <alpine.BSF.2.11.1408020356250.1128@wonkity.com> <53DCDBE8.8060704@qeng-ho.org> <201408060556.s765uKJA026937@sdf.org> <53E1FF5F.1050500@qeng-ho.org> <201408070831.s778VhJc015365@sdf.org>

On Thu, 7 Aug 2014 03:31-0500, Scott Bennett wrote:

> Arthur Chance <freebsd@qeng-ho.org> wrote:
> 
> > On 06/08/2014 06:56, Scott Bennett wrote:
> > > Arthur Chance <freebsd@qeng-ho.org> wrote:
> > >> 
> > >> [stuff deleted --SB]
> > >       I wonder if what varies is the amount of space taken up by the
> > > checksums.  If there's a checksum for each block, then the block size
> > > would change the fraction of the space lost to checksums, and the parity
> > > for the checksums would thus also change.  Enough to matter?  Maybe.
> >
> > I'm not a file system guru, but my (high level) understanding is as 
> > follows. Corrections from anyone more knowledgeable welcome.
> >
> > 1. UFS and ZFS both use tree structures to represent files, with the 
> > data stored at the leaves and bookkeeping stored in the higher nodes. 
> > Therefore the overhead scales as the log of the data size, which is a 
> > negligible fraction for any sufficiently large amount of data.
> >
> > 2. UFS doesn't have data checksums, it relies purely on the hardware 
> > checksums. (This is the area I'm least certain of.)
> 
>      What hardware checksums are there?  I wasn't aware that this sort of
> hardware kept any.

To quote http://en.wikipedia.org/wiki/Disk_sector:

In disk drives, each physical sector is made up of three basic parts, 
the sector header, the data area and the error-correcting code (ECC).
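
As a rough illustration of what a per-sector check value buys you, here is a minimal Python sketch. It is only an assumption-laden stand-in: real drives use much stronger error-correcting codes in firmware, whereas this uses CRC-32 (detection only, no correction), and the names `write_sector`, `read_sector` and `flip_bit` as well as the 512-byte sector size are made up for the example.

```python
import zlib

SECTOR_SIZE = 512  # assumed sector size, for illustration only


def write_sector(data):
    """Return the data area together with a check value stored alongside it."""
    assert len(data) == SECTOR_SIZE
    return data, zlib.crc32(data)


def read_sector(data, stored_check):
    """Recompute the check value and refuse to return data that fails it."""
    if zlib.crc32(data) != stored_check:
        raise IOError("sector check mismatch")
    return data


def flip_bit(data, bit):
    """Return a copy of data with a single bit inverted."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


payload, check = write_sector(bytes(range(256)) * 2)
damaged = flip_bit(payload, 1234)  # simulate one flipped bit on the medium
```

Reading `payload` back with `check` succeeds, while reading `damaged` with the same `check` raises, because a CRC always catches a single-bit error.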

> > 3. ZFS keeps its checksums in a Merkle tree 
> > (http://en.wikipedia.org/wiki/Merkle_tree) so the checksums are held in 
> > the bookkeeping blocks, not in the data blocks. This simply changes the 
> > constant multiplier in front of the logarithm for the overhead. Also, I 
> > believe ZFS doesn't use fixed size data blocks, but aggregates writes 
> > into blocks of up to 128K.
> >
> > Personally, I don't worry about the overheads of checksumming as the 
> > cost of the parity stripe(s) in raidz is dominant. It's a cost well 
> > worth paying though - I have a 3 disk raidz1 pool and a disk went bad 
> > within 3 months of building it (the manufacturer turned out to be having 
> > a few problems at the time) but I didn't lose a byte.
> >
>      Good testimonial.  I'm not worried about the checksum space either.
> I figure the benefits make it cheap at the price.  Of more concern to me
> now is how I'm going to come up with at least two more 2 TB drives to set
> up a raidz2 with a tolerably small fraction of the total space being tied
> up in combined ZFS overhead (i.e., bookkeeping, parity, checksums, etc.).
> 
> 
>                                   Scott Bennett, Comm. ASMELG, CFIAG
> **********************************************************************
> * Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
> *--------------------------------------------------------------------*
> * "A well regulated and disciplined militia, is at all times a good  *
> * objection to the introduction of that bane of all free governments *
> * -- a standing army."                                               *
> *    -- Gov. John Hancock, New York Journal, 28 January 1790         *
> **********************************************************************
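
To put rough numbers on the checksum-versus-parity comparison discussed above, here is a small Python sketch. It rests on two stated assumptions: one 32-byte (256-bit) checksum per data block, and an idealized parity share of "parity disks divided by total disks" that ignores padding, metadata and allocation details. The function names are invented for the example.

```python
CHECKSUM_BYTES = 32  # assumption: one 256-bit checksum per data block


def checksum_fraction(block_size):
    """Extra space for one checksum, relative to a block of block_size bytes."""
    return CHECKSUM_BYTES / block_size


def parity_fraction(disks, parity):
    """Idealized share of raw raidz capacity consumed by parity."""
    return parity / disks


# Checksum cost shrinks as the block size grows.
for size in (512, 4096, 131072):
    print(f"{size:>6}-byte blocks: checksums ~ {checksum_fraction(size):.4%}")

# Parity cost depends only on the pool layout in this simplified model.
for disks, parity in ((3, 1), (4, 2), (6, 2)):
    print(f"{disks} disks, raidz{parity}: parity ~ {parity_fraction(disks, parity):.1%}")
```

Under these assumptions the checksum share stays far below the parity share for every layout shown, which is consistent with the view that parity dominates the overhead.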

-- 
+-------------------------------+------------------------------------+
| Vennlig hilsen,               | Best regards,                      |
| Trond Endrestøl,              | Trond Endrestøl,                   |
| IT-ansvarlig,                 | System administrator,              |
| Fagskolen Innlandet,          | Gjøvik Technical College, Norway,  |
| tlf. mob.   952 62 567,       | Cellular...: +47 952 62 567,       |
| sentralbord 61 14 54 00.      | Switchboard: +47 61 14 54 00.      |
+-------------------------------+------------------------------------+
From owner-freebsd-questions@FreeBSD.ORG  Thu Aug  7 09:37:51 2014
From: Scott Bennett <bennett@sdf.org>
Message-Id: <201408070936.s779akMv017524@sdf.org>
Date: Thu, 07 Aug 2014 04:36:46 -0500
To: Trond.Endrestol@fagskolen.gjovik.no
Subject: Re: gvinum raid5 vs. ZFS raidz
References: <201408020621.s726LsiA024208@sdf.org>
 <alpine.BSF.2.11.1408020356250.1128@wonkity.com>
 <53DCDBE8.8060704@qeng-ho.org> <201408060556.s765uKJA026937@sdf.org>
 <53E1FF5F.1050500@qeng-ho.org> <201408070831.s778VhJc015365@sdf.org>
 <alpine.BSF.2.11.1408071034510.64214@mail.fig.ol.no>
In-Reply-To: <alpine.BSF.2.11.1408071034510.64214@mail.fig.ol.no>
Cc: freebsd@qeng-ho.org, freebsd-questions@freebsd.org

Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no> wrote:
> On Thu, 7 Aug 2014 03:31-0500, Scott Bennett wrote:
> > Arthur Chance <freebsd@qeng-ho.org> wrote:
> > > On 06/08/2014 06:56, Scott Bennett wrote:
> > > > Arthur Chance <freebsd@qeng-ho.org> wrote:
> > > >> 
> > > >> [stuff deleted --SB]
> > > >       I wonder if what varies is the amount of space taken up by the
> > > > checksums.  If there's a checksum for each block, then the block size
> > > > would change the fraction of the space lost to checksums, and the parity
> > > > for the checksums would thus also change.  Enough to matter?  Maybe.
> > >
> > > I'm not a file system guru, but my (high level) understanding is as 
> > > follows. Corrections from anyone more knowledgeable welcome.
> > >
> > > 1. UFS and ZFS both use tree structures to represent files, with the 
> > > data stored at the leaves and bookkeeping stored in the higher nodes. 
> > > Therefore the overhead scales as the log of the data size, which is a 
> > > negligible fraction for any sufficiently large amount of data.
> > >
> > > 2. UFS doesn't have data checksums, it relies purely on the hardware 
> > > checksums. (This is the area I'm least certain of.)
> > 
> >      What hardware checksums are there?  I wasn't aware that this sort of
> > hardware kept any.
>
> To quote http://en.wikipedia.org/wiki/Disk_sector:
>
> In disk drives, each physical sector is made up of three basic parts, 
> the sector header, the data area and the error-correcting code (ECC).

     That's interesting, and I know it was true in the days of minicomputers.
However, it appears to be out of date, based upon 1) the observed fact that
corrupted data *do* get recorded onto today's PC-style disk drives with no
indication that an error has occurred, and 2) the absence of parity bits in
the processor chips, memory cards, motherboards, and PATA/SATA/SCSI/etc.
controllers, as well as in the disk drives themselves, the last point as
confirmed by the technical support guy I spoke with about it at
Seagate/Samsung recently.  That guy said that there is *no parity-checking*
of data written to/read from the disks and that some silent errors are now
considered to be "normal" on disks whose capacities exceed 1 TB.
>
> [remainder deleted  --SB]
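
For what it's worth, the Merkle-tree arrangement Arthur described earlier in the thread is the kind of thing that would catch such silent errors. Here is a minimal Python sketch of the idea, not ZFS's actual code: the `Parent` class, the SHA-256 choice and the sample blocks are all assumptions made for illustration. The point is only that the checksum lives in the bookkeeping node rather than next to the data, so data that comes back wrong without any error report still fails verification.

```python
import hashlib


def checksum(block):
    """Checksum of one data block (SHA-256 chosen only for the example)."""
    return hashlib.sha256(block).digest()


class Parent:
    """Bookkeeping node: keeps its children's checksums, not their data."""

    def __init__(self, blocks):
        self.child_sums = [checksum(b) for b in blocks]

    def verify(self, index, block):
        """True if the block read back matches the checksum recorded for it."""
        return checksum(block) == self.child_sums[index]


disk = [b"alpha" * 100, b"beta" * 100]  # stand-in for data blocks on disk
parent = Parent(disk)

# Simulate silent corruption: the stored bytes change, nothing reports an error.
disk[1] = b"bEta" + disk[1][4:]
```

Here `parent.verify(0, disk[0])` still holds, while `parent.verify(1, disk[1])` fails, so the damage is noticed on read even though the "drive" never complained.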


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************


