Date: Fri, 8 Aug 2014 12:36:09 -0400
From: Paul Kraus <paul@kraus-haus.org>
To: Scott Bennett <bennett@sdf.org>, FreeBSD Questions !!!! <freebsd-questions@freebsd.org>
Cc: Andrew Berg <aberg010@my.hennepintech.edu>
Subject: Re: some ZFS questions
Message-ID: <D3447347-512D-4DFF-BC4A-8ECE5B77F3D4@kraus-haus.org>
In-Reply-To: <201408080706.s78765xs022311@sdf.org>
References: <201408070816.s778G9ug015988@sdf.org> <53E3A836.9080904@my.hennepintech.edu> <201408080706.s78765xs022311@sdf.org>
On Aug 8, 2014, at 3:06, Scott Bennett <bennett@sdf.org> wrote:

> Well, I need space for encrypted file systems and for unencrypted file
> systems at a roughly 1:3 ratio. I have four 2 TB drives for the purpose
> already, but apparently need at least two more. If zvol+geli+UFS is not the
> way and using multiple slices/partitions is not the way either, then how
> should I set it up?

How much data do you need to store?

With four 2 TB drives I would set up either a 2x2 mirror (if I needed random I/O performance) or a 4-drive RAIDz2 if I needed reliability (the RAIDz2 configuration has a substantially higher MTTDL than a 2-way mirror). Rough sketches of both layouts are below.

How much does the geli encryption cost in terms of space and speed? Is there a strong reason to not encrypt ALL the data? It can be in different ZFS datasets (the ZFS term for a filesystem). In fact, I am NOT a fan of using the base dataset that is created with every zpool; I always create additional ZFS datasets below the root of the zpool.

Note that part of the reason it is not recommended to create more than one vdev per physical device is that load on one zpool can then affect the performance of the other. It also means that you cannot readily predict the performance of *either*, as the two will interact with each other. Neither of the above may apply to you, but knowing *why* can help you choose to ignore a recommendation :-)

> I see. I had gathered from the zpool(8) man page's two forms of the
> "replace" subcommand that the form shown above should be used if the failing
> disk were still somewhat usable, but that the other form should be used if
> the failing disk were already a failed disk. I figured from that that ZFS
> would try to get whatever it could from the failing disk and only recalculate
> from the rest for blocks that couldn't be read intact from the failing disk.
> If that is not the case, then why bother to have two forms of the "replace"
> subcommand? Wouldn't it just be simpler to unplug the failing drive, plug
> in the new drive, and then use the other form of the "replace" subcommand,
> in which case that would be the *only* form of the subcommand?

I suspect that is legacy usage. The behavior of resilver (and scrub) operations changed a fair bit in the first couple of years of ZFS's use in the real world. One of the HUGE advantages of the OpenSolaris project was the very fast feedback from the field directly to the developers. You still see that today in the OpenZFS project. While I am not a developer, I do subscribe to the ZFS developer mailing list to read what is being worked on and why.

> In any case, that is why I was asking what would happen in the
> mid-rebuild failure situation. If both subcommands are effectively identical,
> then I guess it shouldn't be a big problem.

IIRC, at some point the replace operation (resilver) was modified to use a "failing" device to speed the process if it were still available. You still need to read the data and compare it to the checksum, but it can be faster if you have the bad drive for some of the data. But my memory here may also be faulty; this is a good question to ask over on the ZFS list.
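For reference, the two forms of "replace" discussed above look roughly like this (the pool and device names are placeholders, not from this thread; zpool(8) has the exact syntax for your release):

    # failing disk still attached, replacement disk on a spare port:
    zpool replace tank da2 da6

    # failing disk already pulled, replacement inserted in the same slot:
    zpool replace tank da2

    # either way, the resilver runs in the background:
    zpool status tank

The one-argument form simply defaults the new device to the old device's location; both forms kick off the same resilver.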
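And picking up the four-drive layout question from the top of this reply, the two setups I described would be built something like this (again, "tank" and the da* names are placeholders):

    # two mirrored pairs, striped (better random I/O):
    zpool create tank mirror da2 da3 mirror da4 da5

    # or a single 4-drive RAIDz2 (survives ANY two drive failures):
    zpool create tank raidz2 da2 da3 da4 da5

    # then create datasets below the pool root rather than using it directly:
    zfs create tank/data
    zfs create tank/private

Both layouts yield roughly two drives' worth of usable space; the MTTDL difference comes from the fact that the mirror layout only survives a second failure if it lands in the other pair, while RAIDz2 survives any two.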
> How does one set a limit? Is there an undocumented sysctl variable
> for it?

Run

    $ sysctl -a | grep vfs.zfs

to find all the ZFS handles (not all may be tunable). Set them in /boot/loader.conf; vfs.zfs.arc_max="nnnM" is what you want :-)

If /boot/loader.conf does not exist, create it, using the same format as /boot/defaults/loader.conf (but do not change things there; they may be overwritten by OS updates/upgrades).

> However, no one seems to have tackled my original question 4) regarding
> "options KVA_PAGES=n". Care to take a stab at it?

See the writeup at https://wiki.freebsd.org/ZFSTuningGuide

I have not needed these tunings myself, so I cannot confirm them, but they have been out there long enough that I suspect if they were wrong (or bad) they would have been corrected or removed.

> If ZFS has no way to prevent thrashing in that situation, then that is
> a serious design deficiency in ZFS.

Before you start making claims about "design deficiencies" in ZFS, I suggest you take a good hard look at the actual design and the criteria it was designed to fulfill. ZFS was NOT designed to be easy on drives. Nor was it designed to be easy on any of the other hardware (CPU or RAM). It WAS designed to be as fault tolerant as any physical system can be. It WAS designed to be incredibly scalable. It WAS designed to be very portable. It was NOT designed to be cheap.

> Does that then leave me with just the zvol+geli+UFS way to proceed?
> I mean, I would love to be wealthy enough to throw thrice as many drives
> into this setup, but I'm not. I can get by with using a single set of drives
> for the two purposes that need protection against device failure and silent
> data corruption and then finding a smaller, cheaper drive or two for the
> remaining purposes, but devoting a whole set of drives to each purpose is
> not an option. If ZFS really needs to be used that way, then that is another
> serious design flaw,

You seem to be annoyed that ZFS was not designed for your specific requirements. I would not say that ZFS has a "serious design flaw" simply because it was not designed for the exact configuration you need. What you need is the Oracle implementation of encryption under ZFS, which you can get by paying for it.
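For what it is worth, the usual no-cost route on FreeBSD is to put the encryption under the pool rather than inside it: build the vdevs on geli(8) providers, which is the "encrypt ALL the data" approach from earlier in this message. A rough sketch only, with placeholder device and pool names; read geli(8) and test before trusting real data to it:

    # one-time setup, repeated for each disk (da2 through da5 here);
    # -s 4096 gives the encrypted provider a 4K sector size
    geli init -s 4096 /dev/da2
    geli attach /dev/da2

    # build the pool on the .eli providers, not the raw disks:
    zpool create tank raidz2 da2.eli da3.eli da4.eli da5.eli

    # after a reboot: geli attach each disk again, then zpool import tank

You give up per-dataset encryption, but the disks themselves never see plaintext.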
--
Paul Kraus
paul@kraus-haus.org

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?D3447347-512D-4DFF-BC4A-8ECE5B77F3D4>