From owner-freebsd-questions@FreeBSD.ORG Fri Aug  8 16:36:14 2014
Subject: Re: some ZFS questions
From: Paul Kraus
Date: Fri, 8 Aug 2014 12:36:09 -0400
To: Scott Bennett, FreeBSD Questions
Cc: Andrew Berg

On Aug 8, 2014, at 3:06, Scott Bennett wrote:

> Well, I need space for encrypted file systems and for unencrypted
> file systems at a roughly 1:3 ratio.  I have four 2 TB drives for the
> purpose already, but apparently need at least two more.  If
> zvol+geli+UFS is not the way and using multiple slices/partitions is
> not the way either, then how should I set it up?

How much data do you need to store?

With four 2TB drives I would set up either a 2x2 mirror (if I needed
random I/O performance) or a 4-drive RAIDz2 if I needed reliability
(the RAIDz2 configuration has a substantially higher MTTDL, mean time
to data loss, than a 2-way mirror).

How much does the geli encryption cost in terms of space and speed? Is
there a strong reason to not encrypt ALL the data? It can be in
different ZFS datasets (a dataset is the ZFS term for a filesystem).
In fact, I am NOT a fan of using the base dataset that is created with
every zpool; I always create additional ZFS datasets below the root of
the zpool.
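To make that concrete, here is roughly what those layouts look like
from the command line. This is only an untested sketch; the device
names (ada1 through ada4), the pool name "tank", and the dataset names
are made up, so substitute your own:

  # two 2-way mirror vdevs (better random I/O)
  zpool create tank mirror ada1 ada2 mirror ada3 ada4

  # or a single 4-drive RAIDz2 vdev (better MTTDL, any two drives can fail)
  zpool create tank raidz2 ada1 ada2 ada3 ada4

  # then keep data in datasets below the pool root, not in the root itself
  zfs create tank/plain
  zfs create tank/sensitive

And if you did decide to encrypt everything, one common approach on
FreeBSD (again, only a sketch) is to put geli underneath the pool and
build it on the .eli providers instead of the raw disks:

  geli init -s 4096 /dev/ada1   # repeat for ada2..ada4; prompts for a passphrase
  geli attach /dev/ada1         # repeat for ada2..ada4; creates /dev/ada1.eli
  zpool create tank raidz2 ada1.eli ada2.eli ada3.eli ada4.eli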
Note that part of the reason it is not recommended to create more than
one vdev per physical device is that load on one zpool can then affect
the performance of the other. It also means that you cannot readily
predict the performance of *either*, as they will interact with each
other. Neither of the above may apply to you, but knowing *why* can
help you choose to ignore a recommendation :-)

>     I see.  I had gathered from the zpool(8) man page's two forms of
> the "replace" subcommand that the form shown above should be used if
> the failing disk were still somewhat usable, but that the other form
> should be used if the failing disk were already a failed disk.  I
> figured from that that ZFS would try to get whatever it could from
> the failing disk and only recalculate from the rest for blocks that
> couldn't be read intact from the failing disk.  If that is not the
> case, then why bother to have two forms of the "replace" subcommand?
> Wouldn't it just be simpler to unplug the failing drive, plug in the
> new drive, and then use the other form of the "replace" subcommand,
> in which case that would be the *only* form of the subcommand?

I suspect that is legacy usage. The behavior of resilver (and scrub)
operations changed a fair bit in the first couple of years of ZFS's
use in the real world. One of the HUGE advantages of the OpenSolaris
project was the very fast feedback from the field directly to the
developers. You still see that today in the OpenZFS project. While I
am not a developer, I do subscribe to the ZFS developer mailing list
to read what is being worked on and why.

>     In any case, that is why I was asking what would happen in the
> mid-rebuild failure situation.  If both subcommands are effectively
> identical, then I guess it shouldn't be a big problem.

IIRC, at some point the replace operation (resilver) was modified to
use a "failing" device to speed the process if it were still
available. You still need to read the data and compare it to the
checksum, but it can be faster if you still have the bad drive for
some of the data. But my memory here may also be faulty; this is a
good question to ask over on the ZFS list.

>     How does one set a limit?  Is there an undocumented sysctl
> variable for it?

Run

$ sysctl -a | grep vfs.zfs

to find all the ZFS handles (not all of them may be tunable), then set
them in /boot/loader.conf. vfs.zfs.arc_max="nnnM" is what you want :-)

If /boot/loader.conf does not exist, create it; it uses the same
format as /boot/defaults/loader.conf (but do not change things there,
as they may be overwritten by OS updates/upgrades).

>     However, no one seems to have tackled my original question 4)
> regarding "options KVA_PAGES=n".  Care to take a stab at it?

See the writeup at https://wiki.freebsd.org/ZFSTuningGuide

I have not needed to make these tunings, so I cannot confirm them, but
they have been out there long enough that I suspect that if they were
wrong (or bad) they would have been corrected or removed.
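Pulling those two together, and purely as a rough illustration (the
2048M figure is a placeholder; size it to leave enough RAM for
everything else on the box), /boot/loader.conf ends up looking
something like:

  # cap the ZFS ARC; example value only
  vfs.zfs.arc_max="2048M"

and, if memory serves, the i386 section of that guide suggests
rebuilding the kernel with a larger kernel virtual address space,
along the lines of

  options KVA_PAGES=512

in the kernel configuration file (that one is a kernel option, not a
loader.conf tunable). After a reboot you can confirm what took effect
with:

  $ sysctl vfs.zfs.arc_max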
>     If ZFS has no way to prevent thrashing in that situation, then
> that is a serious design deficiency in ZFS.

Before you start making claims about "design deficiencies" in ZFS, I
suggest you take a good hard look at the actual design and the
criteria it was designed to fulfill. ZFS was NOT designed to be easy
on drives. Nor was it designed to be easy on any of the other hardware
(CPU or RAM). It WAS designed to be as fault tolerant as any physical
system can be. It WAS designed to be incredibly scalable. It WAS
designed to be very portable. It was NOT designed to be cheap.

>     Does that then leave me with just the zvol+geli+UFS way to
> proceed?  I mean, I would love to be wealthy enough to throw thrice
> as many drives into this setup, but I'm not.  I can get by with using
> a single set of drives for the two purposes that need protection
> against device failure and silent data corruption and then finding a
> smaller, cheaper drive or two for the remaining purposes, but
> devoting a whole set of drives to each purpose is not an option.  If
> ZFS really needs to be used that way, then that is another serious
> design flaw,

You seem to be annoyed that ZFS was not designed for your specific
requirements. I would not say that ZFS has a "serious design flaw"
simply because it was not designed for the exact configuration you
need. What you need is the Oracle implementation of encryption under
ZFS, which you can get by paying for it.

--
Paul Kraus
paul@kraus-haus.org