Date:      Wed, 12 Oct 2011 08:59:38 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Larry Rosenman <ler@lerctr.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: AF (4096 byte sector) drives: Can you mix/match in a ZFS pool?
Message-ID:  <20111012155938.GA24649@icarus.home.lan>
In-Reply-To: <4E95AE08.7030105@lerctr.org>
References:  <4E95AE08.7030105@lerctr.org>

On Wed, Oct 12, 2011 at 10:11:04AM -0500, Larry Rosenman wrote:
> I have a root on ZFS box with 6 drives, all 400G (except one 500G)
> in a pool.
> 
> I want to upgrade to 2T or 3T drives, but was wondering if you can
> mix/match while doing the drive by drive
> replacement.
> 
> This is on 9.0-BETA3 if that matters.

This is a very good question, and it opens a large can of worms.  My
gut feeling tells me this discussion is going to be very long.

I'm going to say that no, mixing 512-byte and 4096-byte sector drives in
a single vdev is a bad idea.  Here's why:

The procedure I've read for doing this is as follows:

ada0 =  512-byte sector disk
ada1 = 4096-byte sector disk
ada2 =  512-byte sector disk

gnop create -S 4096 ada1     # overlay ada1 with a device reporting 4096-byte sectors
zpool create mypool raidz ada0 ada1.nop ada2
zdb | grep ashift
   <should show "ashift: 12" for 4096-byte alignment or "ashift: 9" for
    512-byte alignment>
zpool export mypool
gnop destroy ada1.nop        # safe; ZFS re-finds ada1 via its on-disk labels
zpool import mypool
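
One thing worth noting for the drive-by-drive replacement plan: as far
as I know, ashift is fixed when the vdev is created, so the gnop trick
only matters at "zpool create" (or "zpool add") time.  Replacing disks
one at a time inside an existing ashift=9 vdev should leave the vdev at
ashift=9 no matter what the new drives report.  A quick way to convince
yourself (device names here are made up):

zdb | grep ashift                # note the vdev's current value
zpool replace mypool ada1 ada4   # swap in the new drive
zpool status mypool              # wait for the resilver to finish
zdb | grep ashift                # same value -- replace didn't change it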

There's an example of this procedure here, but the author does not
disclose whether he's using three (3) WD20EARS drives or only one (1)
WD20EARS drive (shown as ada0 in his list).  I have a feeling he's
mixing drives with different sector sizes, which means his performance
probably sucks:

http://blog.monsted.dk/?q=node/1

Here's the kicker: the "ashift" parameter -- which is what handles the
alignment issue in ZFS land -- is defined on a per-vdev basis.  It's
easier to show than to explain.  Look at the zdb output below for a
2-disk mirror that consists of a single vdev:

mypool:
    name: 'mypool'
    ...
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        ...
        children[0]:
            type: 'mirror'
            id: 0
            ...
            ashift: 9
            ...
            children[0]:
                type: 'disk'
                id: 0
                ...
                path: '/dev/ada1'
                phys_path: '/dev/ada1'
                ...
            children[1]:
                type: 'disk'
                id: 1
                ...
                path: '/dev/ada3'
                phys_path: '/dev/ada3'
                ...

Note where the "ashift" parameter is located in the above tree.  (I
imagine a pool with multiple vdevs would therefore have one ashift
parameter per vdev set).
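
For example, I'd expect a pool built from two mirror vdevs -- one
created with the gnop trick and one without -- to show one ashift per
vdev, something like this (hypothetical, heavily trimmed output):

mypool:
    name: 'mypool'
    vdev_children: 2
    vdev_tree:
        type: 'root'
        ...
        children[0]:
            type: 'mirror'
            ashift: 9
            ...
        children[1]:
            type: 'mirror'
            ashift: 12
            ...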

Circling back to the procedure I stated above: this would result in an
ashift=12 alignment for all I/O to all underlying disks.  How do you
think your 512-byte sector drives are going to perform when doing reads
and writes?  (Answer: badly)

Likewise, what if you just screw the whole gnop thing, stick the drive
in, and use it without alignment (i.e. ashift=9, which I believe is the
default)?  You'll suffer from bad write performance (up to ~30%) due to
the lack of proper alignment on that one drive.  Meaning, that drive
will effectively become a delay bottleneck for your writes to the pool.
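
By the way, if you're not sure what a given drive reports, diskinfo
can tell you.  Most AF drives lie and claim 512-byte logical sectors,
but on recent FreeBSD (9.x at least, I believe) the stripesize field
can give away the real 4096-byte physical sector.  Illustrative
output:

diskinfo -v ada1 | egrep 'sectorsize|stripesize'
        512             # sectorsize (what the drive claims)
        4096            # stripesize (hints at the real physical sector)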

So my advice is: do not mix/match 512-byte and 4096-byte sector disks
in a vdev that consists of multiple disks.

If your next question is "what if I just make the 4096-byte sector disk
its own vdev and the 512-byte ones their own vdev?" then the answer is:
don't do this if you care about your pool.  ZFS stripes data across all
top-level vdevs, and a single-disk vdev has no redundancy.  E.g. a pool
with three 512-byte sector disks in a raidz1 vdev + one 4096-byte
sector disk as its own vdev means that if the 4096-byte sector disk
dies, your pool is screwed.
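
In other words, don't do something like this (device names are
illustrative; zpool will even complain about the mismatched
replication level and make you force it with -f):

zpool create mypool raidz ada0 ada1 ada2   # three 512-byte disks
gnop create -S 4096 ada3
zpool add mypool ada3.nop                  # lose ada3, lose the pool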

If your next question is "what about if I had a mirror that consisted of
two vdevs (ada0 + ada1, then another as ada2 + ada3), and say disk ada2
is a 4096-byte sector drive, will that hurt the entire pool or just the
vdev?", I do not have an answer.
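
For reference, that layout would be built with something like the
following (illustrative; I'd expect each mirror vdev to end up with
its own ashift, but I haven't tested it):

gnop create -S 4096 ada2
zpool create mypool mirror ada0 ada1 mirror ada2.nop ada3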

If you use ZFS with a single-disk pool (e.g. zpool create blah ada1),
then you should absolutely be able to use the above procedure and not
run into any issues.
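
That is, the single-disk version of the procedure is just the same
steps as above:

gnop create -S 4096 ada1
zpool create blah ada1.nop
zdb | grep ashift        # should show ashift: 12
zpool export blah
gnop destroy ada1.nop
zpool import blah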

As I finish this email I'm certain folks will come along and tell me
I'm wrong, but given the above data I don't see how that'd be the case.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
