From owner-freebsd-stable@FreeBSD.ORG Mon Oct 13 22:51:04 2014
Date: Mon, 13 Oct 2014 15:50:55 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: getting to 4K disk blocks in ZFS
To: lyndon@orthanc.ca
Cc: freebsd-stable@FreeBSD.org

On 13 Oct, Lyndon Nerenberg wrote:
>
> On Oct 13, 2014, at 1:47 PM, Don Lewis wrote:
>
>> Combine that with raidz and now the overhead is about 40%.  Ouch!
>
> But ~33% of that 40% is generic RAID overhead, and has nothing to do
> with the 4K block size issue.  (I.e., you would have a 33% hit even
> if you were running UFS on a three-disk RAID5.)

Actually, I meant this as just the fragmentation overhead.  I was
incorrectly thinking that there was one parity block per stripe: with a
three-disk raidz1, I'd get two data sectors and one parity sector per
stripe, so even a tiny file would consume two data sectors and one
parity sector.  The 40% wasted space was calculated based on an
effective 8K sector size; parity would then add another 50% on top of
that.

It turns out that this isn't the way it works, at least according to my
understanding of the previously posted link.  A tiny file is allocated
only a single data sector, but it also gets its own dedicated parity
sector.  This reduces the waste from fragmentation, but increases the
parity overhead.  With ashift=12, if you filled the filesystem with 4K
files, half the space would be consumed by parity blocks instead of the
expected 1/3.  If you shrink the files to 512 bytes, you still can't
fit in any more of them, because each file would still require a 4K
data sector and a 4K parity sector, so only 6.25% of the space consumed
would hold useful data.

It looks like my mail spool would grow by a factor of about 1.9 because
of fragmentation and parity overhead.  That's actually not as bad as I
feared, but not much more efficient than a mirror.
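To make that arithmetic concrete, here is a minimal Python sketch of
the allocation model described above, as I understand it.  It ignores
raidz padding and metadata, and raidz1_alloc is just an illustrative
name, not anything out of ZFS:

    # Sketch of the raidz1 space model described above: file data is
    # rounded up to whole ashift-sized sectors, and one parity sector
    # is charged per (ndisks - 1) data sectors, so even a tiny file
    # gets one data sector plus its own dedicated parity sector.
    import math

    def raidz1_alloc(file_bytes, ashift=12, ndisks=3):
        """Return (data_sectors, parity_sectors, total_bytes) for one file."""
        sector = 1 << ashift                     # 4096 bytes when ashift=12
        data = max(1, math.ceil(file_bytes / sector))
        parity = math.ceil(data / (ndisks - 1))  # 1 parity sector per stripe
        return data, parity, (data + parity) * sector

    for size in (512, 4096, 128 * 1024):
        data, parity, total = raidz1_alloc(size)
        print(f"{size:>7}B file: {data} data + {parity} parity sectors, "
              f"{size / total:6.2%} of the allocated space is file data")

This reproduces the numbers above: a 4K file is half parity, a
512-byte file is only 6.25% useful data, and a large file approaches
the expected 1/3 parity overhead.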
> On any real-world system where you're running ZFS, it's unlikely the
> 4K block overhead is really going to be an issue.  And the underlying
> disk hardware is moving to 4K physical sectors, anyway.  Sooner or
> later you're just going to have to suck it up.

If you're storing enough small files to make optimizing for that case
worthwhile, you might be better off with older 512-byte-sector drives
than with newer 4K-sector drives, which would have to be 8x larger to
store the same number of tiny files.  Drive capacities aren't growing
nearly as fast as they used to.  How long does it take drive capacity
to grow by a factor of 8?
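The 8x figure is just sector arithmetic.  A trivial sketch to make it
explicit (hypothetical 1 TB drives, filesystem metadata ignored):

    def files_per_drive(capacity_bytes, sector_bytes, file_bytes=512):
        # A tiny file can't share a physical sector, so each one
        # consumes at least one whole sector.
        return capacity_bytes // max(file_bytes, sector_bytes)

    TB = 10**12
    old = files_per_drive(1 * TB, 512)   # legacy 512-byte-sector drive
    new = files_per_drive(1 * TB, 4096)  # 4K "Advanced Format" drive
    print(old // new)                    # -> 8

So for a workload of sub-512-byte files, the 4K-sector drive needs
eight times the raw capacity to hold the same number of files.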