From owner-freebsd-stable@FreeBSD.ORG Mon Oct 13 22:51:04 2014
Date: Mon, 13 Oct 2014 15:50:55 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: getting to 4K disk blocks in ZFS
To: lyndon@orthanc.ca
Cc: freebsd-stable@FreeBSD.org

On 13 Oct, Lyndon Nerenberg wrote:
>
> On Oct 13, 2014, at 1:47 PM, Don Lewis wrote:
>
>> Combine that with raidz and now the overhead is about 40%.  Ouch!
>
> But ~33% of that 40% is generic RAID overhead, and has nothing to do
> with the 4K block size issue.  (I.e., you would have a 33% hit even
> if you were running UFS on a three-disk RAID5.)

Actually, I meant this as just the fragmentation overhead.  I was
incorrectly thinking that there was one parity block per stripe: with a
three-disk raidz1, I'd get two data sectors and one parity sector per
stripe, so even a tiny file would consume two data sectors and one
parity sector.  The 40% wasted space was calculated based on an
effective 8K sector size; parity would then add another 50% on top of
that.

It turns out that this isn't the way it works, at least according to my
understanding of the previously posted link.  A tiny file is allocated
only a single data sector, but it also gets its own dedicated parity
sector.  This reduces the waste from fragmentation, but increases the
parity overhead.  With ashift=12, if you filled the filesystem with 4K
files, half the space would be consumed by parity blocks instead of the
expected 1/3.  If you shrink the files to 512 bytes, you still can't
fit in any more of them, because each file would still require a 4K
data sector and a 4K parity sector, so only 6.25% of the space consumed
would hold useful data.

It looks like my mail spool would grow by a factor of about 1.9 because
of fragmentation and parity overhead.  That's actually not as bad as I
feared, but not much more efficient than a mirror.
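To make that arithmetic concrete, here is a minimal Python sketch of
the allocation model described above, as I understand it.  It ignores
raidz padding and metadata, and raidz1_alloc is just an illustrative
name, not anything out of ZFS:

    # Sketch of the raidz1 space model described above: file data is
    # rounded up to whole ashift-sized sectors, and one parity sector
    # is charged per (ndisks - 1) data sectors, so even a tiny file
    # gets one data sector plus its own dedicated parity sector.
    import math

    def raidz1_alloc(file_bytes, ashift=12, ndisks=3):
        """Return (data_sectors, parity_sectors, total_bytes) for one file."""
        sector = 1 << ashift                     # 4096 bytes when ashift=12
        data = max(1, math.ceil(file_bytes / sector))
        parity = math.ceil(data / (ndisks - 1))  # 1 parity sector per stripe
        return data, parity, (data + parity) * sector

    for size in (512, 4096, 128 * 1024):
        data, parity, total = raidz1_alloc(size)
        print(f"{size:>7}B file: {data} data + {parity} parity sectors, "
              f"{size / total:6.2%} of the allocated space is file data")

This reproduces the numbers above: a 4K file is half parity, a
512-byte file is only 6.25% useful data, and a large file approaches
the expected 1/3 parity overhead.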
> On any real-world system where you're running ZFS, it's unlikely the
> 4K block overhead is really going to be an issue.  And the underlying
> disk hardware is moving to 4K physical sectors, anyway.  Sooner or
> later you're just going to have to suck it up.

If you're storing enough small files to make optimizing for that case
worthwhile, you might be better off with older 512-byte-sector drives
than with newer 4K-sector drives, which would have to be 8x larger to
store the same number of tiny files.  Drive capacities aren't growing
nearly as fast as they used to.  How long does it take drive capacity
to grow by a factor of 8?
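The 8x figure is just sector arithmetic.  A trivial sketch to make it
explicit (hypothetical 1 TB drives, filesystem metadata ignored):

    def files_per_drive(capacity_bytes, sector_bytes, file_bytes=512):
        # A tiny file can't share a physical sector, so each one
        # consumes at least one whole sector.
        return capacity_bytes // max(file_bytes, sector_bytes)

    TB = 10**12
    old = files_per_drive(1 * TB, 512)   # legacy 512-byte-sector drive
    new = files_per_drive(1 * TB, 4096)  # 4K "Advanced Format" drive
    print(old // new)                    # -> 8

So for a workload of sub-512-byte files, the 4K-sector drive needs
eight times the raw capacity to hold the same number of files.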