Date:      Fri, 8 Aug 2014 12:36:09 -0400
From:      Paul Kraus <paul@kraus-haus.org>
To:        Scott Bennett <bennett@sdf.org>, FreeBSD Questions !!!! <freebsd-questions@freebsd.org>
Cc:        Andrew Berg <aberg010@my.hennepintech.edu>
Subject:   Re: some ZFS questions
Message-ID:  <D3447347-512D-4DFF-BC4A-8ECE5B77F3D4@kraus-haus.org>
In-Reply-To: <201408080706.s78765xs022311@sdf.org>
References:  <201408070816.s778G9ug015988@sdf.org> <53E3A836.9080904@my.hennepintech.edu> <201408080706.s78765xs022311@sdf.org>

On Aug 8, 2014, at 3:06, Scott Bennett <bennett@sdf.org> wrote:

>     Well, I need space for encrypted file systems and for unencrypted file
> systems at a roughly 1:3 ratio.  I have four 2 TB drives for the purpose
> already, but apparently need at least two more.  If zvol+geli+UFS is not the
> way and using multiple slices/partitions is not the way either, then how
> should I set it up?

How much data do you need to store?

With four 2TB drives I would set up either a 2x2 mirror (if I needed
random I/O performance) or a 4-drive RAIDz2 if I needed reliability
(the RAIDz2 configuration has a substantially higher MTTDL than a
2-way mirror).
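
For example, those two layouts would be created along these lines (the
pool name "tank" and the device names ada1 through ada4 are just
placeholders for your actual disks):

   # zpool create tank mirror ada1 ada2 mirror ada3 ada4
or
   # zpool create tank raidz2 ada1 ada2 ada3 ada4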

How much does the geli encryption cost in terms of space and speed? Is
there a strong reason to not encrypt ALL the data? It can be in
different zfs datasets (the ZFS term for a filesystem). In fact, I am
NOT a fan of using the base dataset that is created with every zpool; I
always create additional zfs datasets below the root of the zpool.
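
For example, keeping the two kinds of data in separate datasets under
the pool root might look like this (pool and dataset names are just
placeholders):

   # zfs create tank/plain
   # zfs create tank/private

Each dataset gets its own mountpoint and its own properties
(compression, quota, and so on), which is most of the point of not
putting everything in the pool's base dataset.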

Note that part of the reason it is not recommended to create more than
one vdev per physical device is that load on one zpool can then affect
the performance of the other. It also means that you cannot readily
predict the performance of *either*, as they will interact with each
other. Neither of the above may apply to you, but knowing *why* can
help you choose to ignore a recommendation :-)

>
>     I see.  I had gathered from the zpool(8) man page's two forms of the
> "replace" subcommand that the form shown above should be used if the failing
> disk were still somewhat usable, but that the other form should be used if
> the failing disk were already a failed disk.  I figured from that that ZFS
> would try to get whatever it could from the failing disk and only recalculate
> from the rest for blocks that couldn't be read intact from the failing disk.
> If that is not the case, then why bother to have two forms of the "replace"
> subcommand?  Wouldn't it just be simpler to unplug the failing drive, plug
> in the new drive, and then use the other form of the "replace" subcommand,
> in which case that would be the *only* form of the subcommand?

I suspect that is legacy usage. The behavior of resilver (and scrub)
operations changed a fair bit in the first couple of years of ZFS's use
in the real world. One of the HUGE advantages of the OpenSolaris
project was the very fast feedback from the field directly to the
developers. You still see that today in the OpenZFS project. While I am
not a developer, I do subscribe to the ZFS developer mailing list to
read what is being worked on and why.

>     In any case, that is why I was asking what would happen in the
> mid-rebuild failure situation.  If both subcommands are effectively identical,
> then I guess it shouldn't be a big problem.

IIRC, at some point the replace operation (resilver) was modified to
use a "failing" device to speed the process if it were still available.
You still need to read the data and compare it to the checksum, but the
resilver can be faster if you still have the bad drive for some of the
data. But my memory here may also be faulty; this is a good question to
ask over on the ZFS list.
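
For reference, the two forms described in zpool(8) look roughly like
this (pool and device names are placeholders):

   # zpool replace tank ada2 ada5
     (the failing disk is still attached; the new disk sits alongside it)
   # zpool replace tank ada2
     (the old disk has already been swapped for a new one in the same slot)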

>     How does one set a limit?  Is there an undocumented sysctl variable
> for it?

$ sysctl -a | grep vfs.zfs

to find all the zfs handles (not all may be tunable)

Set them in /boot/loader.conf

vfs.zfs.arc_max="nnnM" is what you want :-)

If /boot/loader.conf does not exist, create it, using the same format
as /boot/defaults/loader.conf (but do not change things there; they may
be overwritten by OS updates/upgrades).
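
For example, to cap the ARC at 4 GB (the value is purely illustrative;
pick one that leaves enough RAM for everything else on the box),
/boot/loader.conf would contain:

   vfs.zfs.arc_max="4G"

After a reboot you can confirm what actually took effect with
"sysctl vfs.zfs.arc_max" (it reports the limit in bytes).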

>     However, no one seems to have tackled my original question 4) regarding
> "options KVA_PAGES=n".  Care to take a stab at it?

See the writeup at https://wiki.freebsd.org/ZFSTuningGuide

I have not needed to make these tunings myself, so I cannot confirm
them, but they have been out there for long enough that I suspect that
if they were wrong (or bad) they would have been corrected or removed.
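
If memory serves, the i386 section of that guide suggests a kernel
configuration entry along these lines (the exact value depends on how
much kernel address space you need, so treat it as an illustration
rather than a recommendation):

   options KVA_PAGES=512

followed by rebuilding and installing the kernel, since KVA_PAGES is a
compile-time option and cannot be changed on a running system.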

>     If ZFS has no way to prevent thrashing in that situation, then that is
> a serious design deficiency in ZFS.

Before you start making claims about "design deficiencies" in ZFS, I
suggest you take a good hard look at the actual design and the criteria
it was designed to fulfill. ZFS was NOT designed to be easy on drives.
Nor was it designed to be easy on any of the other hardware (CPU or
RAM). It WAS designed to be as fault tolerant as any physical system
can be. It WAS designed to be incredibly scalable. It WAS designed to
be very portable. It was NOT designed to be cheap.

>     Does that then leave me with just the zvol+geli+UFS way to proceed?
> I mean, I would love to be wealthy enough to throw thrice as many drives
> into this setup, but I'm not.  I can get by with using a single set of drives
> for the two purposes that need protection against device failure and silent
> data corruption and then finding a smaller, cheaper drive or two for the
> remaining purposes, but devoting a whole set of drives to each purpose is
> not an option.  If ZFS really needs to be used that way, then that is another
> serious design flaw,

You seem to be annoyed that ZFS was not designed for your specific
requirements. I would not say that ZFS has a "serious design flaw"
simply because it was not designed for the exact configuration you
need. What you need is the Oracle implementation of encryption under
ZFS, which you can get by paying for it.

--
Paul Kraus
paul@kraus-haus.org



