Date: Wed, 12 Oct 2011 08:59:38 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Larry Rosenman <ler@lerctr.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: AF (4096 byte sector) drives: Can you mix/match in a ZFS pool?
Message-ID: <20111012155938.GA24649@icarus.home.lan>
In-Reply-To: <4E95AE08.7030105@lerctr.org>
References: <4E95AE08.7030105@lerctr.org>

On Wed, Oct 12, 2011 at 10:11:04AM -0500, Larry Rosenman wrote:
> I have a root on ZFS box with 6 drives, all 400G (except one 500G)
> in a pool.
>
> I want to upgrade to 2T or 3T drives, but was wondering if you can
> mix/match while doing the drive by drive
> replacement.
>
> This is on 9.0-BETA3 if that matters.

This is a very good question, and opens a large can of worms. My gut
feeling tells me this discussion is going to be very long.

I'm going to say that no, mixing 512-byte and 4096-byte sector drives in
a single vdev is a bad idea. Here's why:

The procedure I've read for doing this is as follows:

    ada0 = 512-byte sector disk
    ada1 = 4096-byte sector disk
    ada2 = 512-byte sector disk

    gnop create -S 4096 ada1
    zpool create mypool raidz ada0 ada1.nop ada2
    zdb | grep ashift
    <should show "ashift: 12" for 4096-byte alignment
     or "ashift: 9" for 512-byte alignment>
    zpool export mypool
    gnop destroy ada1.nop
    zpool import mypool

There's an example of this procedure here, but the author does not
disclose if he's using three (3) WD20EARS drives, or if he's only using
one (1) WD20EARS drive (shown as ada0 in his list). I have a feeling
he's using multi-sized-sector drives, which means his performance
probably sucks:

http://blog.monsted.dk/?q=node/1

Here's the kicker: the "ashift" parameter -- which is what in ZFS land
helps with the alignment issue -- is defined on a per-vdev basis. It's
hard to explain. Look at the below zdb output for a 2-disk mirror that
consists of a single vdev:

    mypool:
        name: 'mypool'
        ...
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            ...
            children[0]:
                type: 'mirror'
                id: 0
                ...
                ashift: 9
                ...
                children[0]:
                    type: 'disk'
                    id: 0
                    ...
                    path: '/dev/ada1'
                    phys_path: '/dev/ada1'
                    ...
                children[1]:
                    type: 'disk'
                    id: 1
                    ...
                    path: '/dev/ada3'
                    phys_path: '/dev/ada3'
                    ...

Note where the "ashift" parameter is located in the above tree. (I
imagine a pool with multiple vdevs would therefore have one ashift
parameter per vdev set).

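For anyone wondering where the 9 and 12 come from: ashift is the base-2
logarithm of the sector size ZFS assumes for the vdev, so 2^9 = 512 and
2^12 = 4096. Below is a rough sketch of that arithmetic in plain sh. It
also counts how many 512-byte-granular offsets happen to land on a
4096-byte boundary. The function names are made up for illustration;
they are not ZFS or FreeBSD commands, and the script touches no disks.

```sh
#!/bin/sh
# Sketch only: ashift_for_sector and is_aligned are hypothetical helper
# names, not part of ZFS or FreeBSD.

# Print log2 of a power-of-two sector size (512 -> 9, 4096 -> 12).
ashift_for_sector() {
    size=$1
    shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_bits=$((shift_bits + 1))
    done
    echo "$shift_bits"
}

# Succeed if a byte offset falls on a sector boundary of the given size.
is_aligned() {
    offset=$1
    sector=$2
    [ $((offset % sector)) -eq 0 ]
}

# With ashift=9 a block may start at any multiple of 512 bytes.
# Count how many of the first eight such offsets line up with a
# 4096-byte physical sector.
aligned=0
i=0
while [ "$i" -lt 8 ]; do
    if is_aligned $((i * 512)) 4096; then
        aligned=$((aligned + 1))
    fi
    i=$((i + 1))
done

echo "ashift for 512-byte sectors:  $(ashift_for_sector 512)"
echo "ashift for 4096-byte sectors: $(ashift_for_sector 4096)"
echo "512-byte-granular offsets aligned to 4096: $aligned of 8"
```

Only one of those eight offsets is 4096-byte aligned, which is why a 4K
drive in an ashift=9 vdev can end up doing extra work on writes that
start or end in the middle of a physical sector.
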
Circling back to the procedure I stated above: this would result in an
ashift=12 alignment for all I/O to all underlying disks. How do you
think your 512-byte sector drives are going to perform when doing reads
and writes? (Answer: badly)

Likewise, what if you just skip the whole gnop thing, stick the drive
in, and treat it without alignment (e.g. ashift=9, which is the default
I believe)? You'll suffer from bad write performance (up to ~30% worse)
due to lack of proper alignment on that one drive. Meaning, that drive
will effectively become a bottleneck for your writes to the pool.

So my advice is: do not mix and match 512-byte and 4096-byte sector
disks in a vdev that consists of multiple disks.

If your next question is "what if I just make the 4096-byte sector disk
its own vdev and the 512-byte ones their own vdev?" then the answer is:
don't do this if you care about your pool. E.g. a pool with three
512-byte sector disks in a raidz1 vdev plus one 4096-byte sector disk in
its own vdev means that if the 4096-byte sector disk dies, your pool is
screwed, because that single-disk vdev has no redundancy.

If your next question is "what about a pool that consists of two mirror
vdevs (ada0 + ada1, then another as ada2 + ada3), where, say, disk ada2
is a 4096-byte sector drive; will that hurt the entire pool or just the
vdev?", I do not have an answer.

If you use ZFS with a single-disk pool (e.g. zpool create blah ada1),
then you should absolutely be able to use the above procedure and not
run into any issues.

As I finish this email I'm certain folks will come along and tell me I'm
wrong, but given the above data I don't see how that'd be the case.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |