From owner-freebsd-questions@FreeBSD.ORG Fri Aug  8 16:36:14 2014
Subject: Re: some ZFS questions
From: Paul Kraus
Date: Fri, 8 Aug 2014 12:36:09 -0400
To: Scott Bennett, FreeBSD Questions
Cc: Andrew Berg

On Aug 8, 2014, at 3:06, Scott Bennett wrote:

> Well, I need space for encrypted file systems and for unencrypted
> file systems at a roughly 1:3 ratio.  I have four 2 TB drives for the
> purpose already, but apparently need at least two more.  If
> zvol+geli+UFS is not the way and using multiple slices/partitions is
> not the way either, then how should I set it up?

How much data do you need to store?

With four 2TB drives I would set up either a 2x2 mirror (if I needed
random I/O performance) or a 4-drive RAIDz2 if I needed reliability
(the RAIDz2 configuration has a substantially higher MTTDL, mean time
to data loss, than a 2-way mirror).

How much does the geli encryption cost in terms of space and speed? Is
there a strong reason to not encrypt ALL the data? It can be in
different ZFS datasets (a dataset is the ZFS term for a filesystem).
In fact, I am NOT a fan of using the base dataset that is created with
every zpool; I always create additional ZFS datasets below the root of
the zpool.
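To make that concrete, here is roughly what those layouts look like
from the command line. This is only an untested sketch; the device
names (ada1 through ada4), the pool name "tank", and the dataset names
are made up, so substitute your own:

  # two 2-way mirror vdevs (better random I/O)
  zpool create tank mirror ada1 ada2 mirror ada3 ada4

  # or a single 4-drive RAIDz2 vdev (better MTTDL, any two drives can fail)
  zpool create tank raidz2 ada1 ada2 ada3 ada4

  # then keep data in datasets below the pool root, not in the root itself
  zfs create tank/plain
  zfs create tank/sensitive

And if you did decide to encrypt everything, one common approach on
FreeBSD (again, only a sketch) is to put geli underneath the pool and
build it on the .eli providers instead of the raw disks:

  geli init -s 4096 /dev/ada1   # repeat for ada2..ada4; prompts for a passphrase
  geli attach /dev/ada1         # repeat for ada2..ada4; creates /dev/ada1.eli
  zpool create tank raidz2 ada1.eli ada2.eli ada3.eli ada4.eli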
Note that part of the reason it is not recommended to create more than
one vdev per physical device is that load on one zpool can then affect
the performance of the other. It also means that you cannot readily
predict the performance of *either*, as they will interact with each
other. Neither of the above may apply to you, but knowing *why* can
help you choose to ignore a recommendation :-)

>     I see.  I had gathered from the zpool(8) man page's two forms of
> the "replace" subcommand that the form shown above should be used if
> the failing disk were still somewhat usable, but that the other form
> should be used if the failing disk were already a failed disk.  I
> figured from that that ZFS would try to get whatever it could from
> the failing disk and only recalculate from the rest for blocks that
> couldn't be read intact from the failing disk.  If that is not the
> case, then why bother to have two forms of the "replace" subcommand?
> Wouldn't it just be simpler to unplug the failing drive, plug in the
> new drive, and then use the other form of the "replace" subcommand,
> in which case that would be the *only* form of the subcommand?

I suspect that is legacy usage. The behavior of resilver (and scrub)
operations changed a fair bit in the first couple of years of ZFS's
use in the real world. One of the HUGE advantages of the OpenSolaris
project was the very fast feedback from the field directly to the
developers. You still see that today in the OpenZFS project. While I
am not a developer, I do subscribe to the ZFS developer mailing list
to read what is being worked on and why.

>     In any case, that is why I was asking what would happen in the
> mid-rebuild failure situation.  If both subcommands are effectively
> identical, then I guess it shouldn't be a big problem.

IIRC, at some point the replace operation (resilver) was modified to
use a "failing" device to speed the process if it were still
available. You still need to read the data and compare it to the
checksum, but it can be faster if you still have the bad drive for
some of the data. But my memory here may also be faulty; this is a
good question to ask over on the ZFS list.

>     How does one set a limit?  Is there an undocumented sysctl
> variable for it?

Run

$ sysctl -a | grep vfs.zfs

to find all the ZFS handles (not all of them may be tunable), then set
them in /boot/loader.conf. vfs.zfs.arc_max="nnnM" is what you want :-)

If /boot/loader.conf does not exist, create it; it uses the same
format as /boot/defaults/loader.conf (but do not change things there,
as they may be overwritten by OS updates/upgrades).

>     However, no one seems to have tackled my original question 4)
> regarding "options KVA_PAGES=n".  Care to take a stab at it?

See the writeup at https://wiki.freebsd.org/ZFSTuningGuide

I have not needed to make these tunings, so I cannot confirm them, but
they have been out there long enough that I suspect that if they were
wrong (or bad) they would have been corrected or removed.
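Pulling those two together, and purely as a rough illustration (the
2048M figure is a placeholder; size it to leave enough RAM for
everything else on the box), /boot/loader.conf ends up looking
something like:

  # cap the ZFS ARC; example value only
  vfs.zfs.arc_max="2048M"

and, if memory serves, the i386 section of that guide suggests
rebuilding the kernel with a larger kernel virtual address space,
along the lines of

  options KVA_PAGES=512

in the kernel configuration file (that one is a kernel option, not a
loader.conf tunable). After a reboot you can confirm what took effect
with:

  $ sysctl vfs.zfs.arc_max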
>     If ZFS has no way to prevent thrashing in that situation, then
> that is a serious design deficiency in ZFS.

Before you start making claims about "design deficiencies" in ZFS, I
suggest you take a good hard look at the actual design and the
criteria it was designed to fulfill. ZFS was NOT designed to be easy
on drives. Nor was it designed to be easy on any of the other hardware
(CPU or RAM). It WAS designed to be as fault tolerant as any physical
system can be. It WAS designed to be incredibly scalable. It WAS
designed to be very portable. It was NOT designed to be cheap.

>     Does that then leave me with just the zvol+geli+UFS way to
> proceed?  I mean, I would love to be wealthy enough to throw thrice
> as many drives into this setup, but I'm not.  I can get by with using
> a single set of drives for the two purposes that need protection
> against device failure and silent data corruption and then finding a
> smaller, cheaper drive or two for the remaining purposes, but
> devoting a whole set of drives to each purpose is not an option.  If
> ZFS really needs to be used that way, then that is another serious
> design flaw,

You seem to be annoyed that ZFS was not designed for your specific
requirements. I would not say that ZFS has a "serious design flaw"
simply because it was not designed for the exact configuration you
need. What you need is the Oracle implementation of encryption under
ZFS, which you can get by paying for it.

--
Paul Kraus
paul@kraus-haus.org