Date:      Thu, 21 Aug 2014 05:33:32 -0500
From:      Scott Bennett <bennett@sdf.org>
To:        freebsd-questions@freebsd.org, paul@kraus-haus.org
Subject:   Re: some ZFS questions
Message-ID:  <201408211033.s7LAXWbN006985@sdf.org>
In-Reply-To: <D3447347-512D-4DFF-BC4A-8ECE5B77F3D4@kraus-haus.org>
References:  <201408070816.s778G9ug015988@sdf.org> <53E3A836.9080904@my.hennepintech.edu> <201408080706.s78765xs022311@sdf.org> <D3447347-512D-4DFF-BC4A-8ECE5B77F3D4@kraus-haus.org>

Paul Kraus <paul@kraus-haus.org> wrote:

> On Aug 8, 2014, at 3:06, Scott Bennett <bennett@sdf.org> wrote:
>
> >     Well, I need space for encrypted file systems and for unencrypted file
> > systems at a roughly 1:3 ratio.  I have four 2 TB drives for the purpose
> > already, but apparently need at least two more.  If zvol+geli+UFS is not the
> > way and using multiple slices/partitions is not the way either, then how
> > should I set it up?
>
> How much data do you need to store?

     Initially, probably about 1.4 - 1.5 TB.  The rest would be space for
continuing to build the archives.
>
> With four 2TB drives I would set up either a 2x2 mirror (if I needed random I/O performance) or a 4-drive RAIDz2 if I needed reliability (the RAIDz2 configuration has a substantially higher MTTDL than a 2-way mirror).

     I'm now aiming for six drives in a raidz2.
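     (For reference, creating a six-drive raidz2 pool would look roughly
like the following; the pool name "tank" and the ada device names are
only placeholders for whatever the real names turn out to be:

# zpool create tank raidz2 ada1 ada2 ada3 ada4 ada5 ada6
# zpool status tank

Such a pool keeps all data intact through the loss of any two of the
six drives.)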
>
> How much does the geli encryption cost in terms of space and speed? Is there a strong reason not to encrypt ALL the data? It can be in different ZFS datasets (the ZFS term for a filesystem). In fact, I am NOT a fan of using the base dataset that is created with every zpool; I always create additional ZFS datasets below the root of the zpool.

     Copying a file from one .eli partition to another can easily drive
each of two cores of a Q6600 up to about 25% CPU utilization.
     I didn't realize that there would be a "base data set"; I thought that
one had to create at least one file system or zvol in a pool in order to
use the space at all.
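     (If I follow Paul's suggestion, the layout once the pool exists
would look something like the following; the names are only placeholders:

# zfs create tank/archive
# zfs create tank/scratch
# zfs set mountpoint=/archive tank/archive

i.e., all data go into child datasets, and the top-level "tank" dataset
itself stays empty.)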
>
> Note that part of the reason it is not recommended to create more than one vdev per physical device is that load on one zpool can then affect the performance of the other. It also means that you cannot readily predict the performance of *either*, as they will interact with each other. Neither of the above may apply to you, but knowing *why* can help you choose to ignore a recommendation :-)
>
     Again, this will be long-term, archival storage, so demand should
be quite low nearly all of the time.
> > 
> >     I see.  I had gathered from the zpool(8) man page's two forms of the
> > "replace" subcommand that the form shown above should be used if the failing
> > disk were still somewhat usable, but that the other form should be used if
> > the failing disk were already a failed disk.  I figured from that that ZFS
> > would try to get whatever it could from the failing disk and only recalculate
> > from the rest for blocks that couldn't be read intact from the failing disk.
> > If that is not the case, then why bother to have two forms of the "replace"
> > subcommand?  Wouldn't it just be simpler to unplug the failing drive, plug
> > in the new drive, and then use the other form of the "replace" subcommand,
> > in which case that would be the *only* form of the subcommand?
>
> I suspect that is legacy usage. The behavior of resilver (and scrub) operations changed a fair bit in the first couple of years of ZFS's use in the real world. One of the HUGE advantages of the OpenSolaris project was the very fast feedback from the field directly to the developers. You still see that today in the OpenZFS project. While I am not a developer, I do subscribe to the ZFS-developer mailing list to read what is being worked on and why.
>
> >     In any case, that is why I was asking what would happen in the
> > mid-rebuild failure situation.  If both subcommands are effectively identical,
> > then I guess it shouldn't be a big problem.
>
> IIRC, at some point the replace operation (resilver) was modified to use a "failing" device to speed the process if it were still available. You still need to read the data and compare it to the checksum, but it can be faster if you have the bad drive for some of the data. But my memory here may also be faulty; this is a good question to ask over on the ZFS list.

     Okay.
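     (For the record, the two forms from zpool(8) would look like

# zpool replace tank ada3 ada7
      (failing ada3 still attached, new drive showing up as ada7)
# zpool replace tank ada3
      (ada3 already pulled and the replacement installed in its slot)

where "tank" and the ada names are, again, only placeholders.)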
>
> >     How does one set a limit?  Is there an undocumented sysctl variable
> > for it?
>
> $ sysctl -a | grep vfs.zfs
>
> to find all the zfs handles (not all may be tunable)

     Are they all documented somewhere?
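     (It does look as though sysctl itself can print a one-line
description for most of them, e.g.

$ sysctl -d vfs.zfs.arc_max

though that is hardly full documentation.)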
>
> Set them in /boot/loader.conf
>
> vfs.zfs.arc_max="nnnM" is what you want :-)

     Thanks.  Already noted elsewhere, too.
>
> If /boot/loader.conf does not exist, create it, same format as /boot/defaults/loader.conf (but do not change things there, they may be overwritten by OS updates/upgrades).

     It already has many settings in it.
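     For illustration, the cap itself is just one more line in
/boot/loader.conf, in the same name="value" format as the existing
entries (the 4 GB figure here is only a placeholder, not a
recommendation):

vfs.zfs.arc_max="4096M"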
>
> >     However, no one seems to have tackled my original question 4) regarding
> > "options KVA_PAGES=n".  Care to take a stab at it?
>
> See the writeup at https://wiki.freebsd.org/ZFSTuningGuide
>
> I have not needed to make these tunings, so I cannot confirm them, but they have been out there for long enough that I suspect if they were wrong (or bad) they would have been corrected or removed.
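
     If I am reading that wiki page correctly, the i386 case comes down
to adding something like

options         KVA_PAGES=512

to the kernel configuration file and rebuilding the kernel, plus an ARC
cap in /boot/loader.conf, but I will check the page itself rather than
rely on memory for the number.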
>
> >     If ZFS has no way to prevent thrashing in that situation, then that is
> > a serious design deficiency in ZFS.
>
> Before you start making claims about "design deficiencies" in ZFS I suggest you take a good hard look at the actual design and the criteria it was designed to fulfill. ZFS was NOT designed to be easy on drives. Nor was it designed to be easy on any of the other hardware (CPU or RAM). It WAS designed to be as fault tolerant as any physical system can be. It WAS designed to be incredibly scalable. It WAS designed to be very portable. It was NOT designed to be cheap.
>
> >     Does that then leave me with just the zvol+geli+UFS way to proceed?
> > I mean, I would love to be wealthy enough to throw thrice as many drives
> > into this setup, but I'm not.  I can get by with using a single set of drives
> > for the two purposes that need protection against device failure and silent
> > data corruption and then finding a smaller, cheaper drive or two for the
> > remaining purposes, but devoting a whole set of drives to each purpose is
> > not an option.  If ZFS really needs to be used that way, then that is another
> > serious design flaw,
>
> You seem to be annoyed that ZFS was not designed for your specific requirements. I would not say that ZFS has a "serious design flaw" simply because it was not designed for the exact configuration you need. What you need is the Oracle implementation of encryption under ZFS, which you can get by paying for it.

     Not at all.  I was commenting about something that would, in fact,
be a design flaw in any kind of device- and space-management system if
it were the case.  However, since I posted that comment, we have
established elsewhere in this thread that the limitation does not, in
fact, exist, so it's a non-issue anyway.
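     (For completeness, the zvol+geli+UFS arrangement I had been
considering would be roughly the following, with placeholder names and
sizes throughout:

# zfs create -V 500g tank/encvol
# geli init -s 4096 /dev/zvol/tank/encvol
# geli attach /dev/zvol/tank/encvol
# newfs -U /dev/zvol/tank/encvol.eli
# mount /dev/zvol/tank/encvol.eli /secure

i.e., a UFS file system on a geli provider that is itself backed by a
ZFS volume.)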


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************


