Date: Sun, 6 May 2012 09:49:33 -0400
From: Michael Shuey <shuey@fmepnet.org>
To: Miroslav Lachman <000.fbsd@quip.cz>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS 4K drive overhead
Message-ID: <CAELRr5kZohBbKMBoScezNeYBw5mV1k9XcLN%2B74W2qavUnwvYUQ@mail.gmail.com>
In-Reply-To: <4FA63A68.3050607@quip.cz>
References: <CA%2BZnuqw8YpOD3fV6%2BeoGLqH6J%2Bpmafaw=c_iaMUNRK7TUe39%2Bw@mail.gmail.com> <4FA63A68.3050607@quip.cz>

A couple of months back, I finished rebuilding my zpools to use ashift=12,
after trying a 4k drive in a pool with ashift=9. If you try a 4k drive on an
ashift=9 pool, you're going to have a bad time.

Performance for occasional IO (particularly streaming) isn't too bad with
misaligned sectors. However, resilvering time is MUCH, MUCH, MUCH higher - I
saw estimates for resilver completion go up by over an order of magnitude,
and pool performance became nearly unusable while a resilver was in
operation.

ZFS will dynamically adjust block size for a file, between the smallest
block size the media supports and 128k or so (IIRC). That means that even if
you align a partition on your 4k disk, or use the raw disk itself (so ZFS
starts on an aligned sector), after the first small file is written you'll
be doing unaligned IOs. Resilvering a 1.5 TB drive was estimated at over
230 hours for me; it actually took less time to abort and rebuild the server
from backups.

Given the prevalence of 4k drives on the market now, and the likelihood that
they'll be the only product available in the future, I'd highly recommend
using ashift=12 on any new zpools. It's time to stop using ashift=9.

On Sun, May 6, 2012 at 4:46 AM, Miroslav Lachman <000.fbsd@quip.cz> wrote:
> Chris wrote:
>>
>> Hi all,
>>
>> I'm planning on making a raidz2 with 6 2 TB drives - all 4K sectors,
>> all reporting as 512 bytes. I've been reading some disturbing things
>> about ZFS when used on 4K drives. In this discussion
>>
>> (http://mail.opensolaris.org/pipermail/zfs-discuss/2011-October/049959.html),
>> Jim Klimov pointed out that when ZFS is used with ashift=12, the
>> metadata overhead for a filesystem with a lot of small files can reach
>> 100%
>> (http://mail.opensolaris.org/pipermail/zfs-discuss/2011-October/049960.html)!
>> That seems pretty bad to me. My questions are:
>>
>> Does anyone on this list have experience using ZFS on 4K drives with
>> ashift=12? Is the overhead per file, such that having a relatively
>> large average filesize, say, 19 MB, would render it insignificant? Or
>> would the overhead be large regardless?
>
>
> An average size of 19 MB is much larger than 4k (metadata), so the overhead
> will not be as high as with really small files (files a few kB in size).
>
>
>> What is the speed penalty for using ashift=9 on the array? Is the
>> safety of the data on the array an issue (due to how ZFS can't write
>> to a 512 byte sector but it's coded with the assumption that it can
>> thus making it no longer strictly copy-on-write)? Does anyone have any
>> experience with ashift=9 arrays on 4K drives?
>
>
> Even if the overhead is larger, the speed penalty is much higher. You
> should read about it in some posts on this blog:
>
> http://blog.des.no/search/label/freebsd
>
> There are various articles with benchmarks of 4k-sector drives, and some of
> them are almost useless with unaligned writes. So I strongly recommend you
> use 4k (ashift=12).
>
> Use ashift=9 only if performance doesn't matter and you are concerned only
> with available space.
>
> Miroslav Lachman
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
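
For anyone who wants to see the arithmetic behind the two points in this thread (unaligned IO after the first small write on ashift=9, and padding for small files on ashift=12), here is a rough sketch. It is a toy model, not how the ZFS allocator actually works: it assumes blocks are placed back to back, each rounded up to 2**ashift bytes, on a drive with 4096-byte physical sectors. The function names and the sample sizes are made up for illustration, and metadata, raidz parity and compression are ignored.

```python
PHYSICAL_SECTOR = 4096  # assumed physical sector size of a "4k" drive


def allocated_size(size: int, ashift: int) -> int:
    """Round size up to a multiple of the allocation unit, 2**ashift bytes."""
    unit = 1 << ashift
    return -(-size // unit) * unit  # ceiling division, then scale back up


def layout(sizes, ashift):
    """Place blocks back to back (a simplification) and return (offset, length) pairs."""
    offset = 0
    placed = []
    for size in sizes:
        length = allocated_size(size, ashift)
        placed.append((offset, length))
        offset += length
    return placed


def count_unaligned(placed, physical=PHYSICAL_SECTOR):
    """Count blocks whose start or length is not a multiple of the physical sector."""
    return sum(1 for offset, length in placed
               if offset % physical or length % physical)


if __name__ == "__main__":
    # Hypothetical workload: one small 1.5 KiB block, then two 128 KiB blocks.
    sizes = [1536, 128 * 1024, 128 * 1024]
    for ashift in (9, 12):
        placed = layout(sizes, ashift)
        print(f"ashift={ashift}: unit={1 << ashift} bytes, "
              f"unaligned blocks={count_unaligned(placed)} of {len(placed)}")

    # Padding added by ashift=12 for a tiny file versus a 19 MiB file.
    for size in (1536, 19 * 1024 * 1024):
        padding = allocated_size(size, 12) - size
        print(f"size={size} bytes: padding={padding} bytes "
              f"({100 * padding / size:.2f}% of the data)")
```

In this model a single small block on ashift=9 shifts every later block off the 4096-byte boundary, which is the effect described above. The rounding cost of ashift=12 is large relative to a tiny file and vanishes for a file that is already a multiple of 4096 bytes, which is in line with Miroslav's point about a 19 MB average file size.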