Date:      Mon, 7 Dec 2020 17:23:18 -0500
From:      Paul Mather <>
Subject:   Re: effect of differing spindle speeds on prospective zfs vdevs
Message-ID:  <>
In-Reply-To: <>
References:  <>

On Sat, 5 Dec 2020 19:16:33 +0000, tech-lists <> wrote:
> Hi,
> On Sat, Dec 05, 2020 at 08:51:08AM -0500, Paul Mather wrote:
>> IIRC, ZFS pools have a single ashift for the entire pool, so you
>> set it to accommodate the 4096/4096 devices to avoid performance
>> degradation.  I believe it defaults to that now, and should
>> anyway.  But, in a mixed setup of vdevs like you have, you should be
>> using ashift=12.
>> I believe having an ashift=9 on your mixed-drive setup would have the
>> biggest performance impact in terms of reducing performance.
> Part of my confusion about the ashift thing is I thought ashift=9 was
> for 512/512 logical/physical. Is this still the case?
> On a different machine which has been running since FreeBSD 12 was
> current, one of the disks in the array went bang. zdb shows ashift=9 (as was
> the default when it was created). The only available replacement was an otherwise
> identical disk but 512 logical/4096 physical. zpool status mildly
> complains about performance degradation like this:
> ada2    ONLINE       0     0     0  block size: 512B configured, 4096B native
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>      Expect reduced performance.
> action: Replace affected devices with devices that support the
>      configured block size, or migrate data to a properly configured
>      pool.
> The other part of my confusion is that I understood zfs to set its own
> blocksize on the fly.

You're correct in that ZFS has its own concept of a block size (the
"recordsize" property), but this is not the same as the block size that
ashift concerns.  When "zpool" complains about a "non-native block size"
it is talking about the physical block size of the underlying vdev.
That is the smallest unit of data that is read from or written to the
device.  (It also has an impact on where partitions can be addressed.)
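To make the relationship concrete: ashift is just the base-2 log of the
sector size ZFS assumes for a top-level vdev.  A quick sketch (the pool
name "tank" and device "ada2" in the comments are placeholders):

```shell
# ashift is log2 of the sector size ZFS uses for a top-level vdev:
echo $((1 << 9))    # ashift=9  -> 512-byte sectors
echo $((1 << 12))   # ashift=12 -> 4096-byte sectors

# To inspect an existing pool (as root; "tank" is a placeholder name):
#   zdb -C tank | grep ashift
# To see what the drive itself reports on FreeBSD:
#   diskinfo -v ada2 | grep -E 'sectorsize|stripesize'
```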

When hard drives became larger, the number of bits used to address
logical blocks (LBAs) became insufficient to reference all blocks on the
device.  One way around this, and to enable devices to store more total
data, was to make the referenced blocks larger.  (Larger block sizes are
also good in that they require relatively less space for ECC data.)
Hence, the 4K "advanced format" drives arrived.  Before that, block
(a.k.a. sector) sizes typically had been 512 bytes for hard drives.
After, it became 4096 bytes.
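As a rough illustration of the addressing limit, using the old 28-bit
ATA LBA scheme as the example:

```shell
# 28-bit LBAs with 512-byte sectors top out at 128 GiB:
echo $(( (1 << 28) * 512 / (1024 * 1024 * 1024) ))    # -> 128
# The same 28 bits with 4096-byte sectors address 8x as much:
echo $(( (1 << 28) * 4096 / (1024 * 1024 * 1024) ))   # -> 1024
```

(48-bit LBA addressed the capacity side; the ECC/space-efficiency
argument is what also pushed drives to 4K sectors.)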

For some drives, the device actually utilises 4096-byte sectors but
advertises a 512-byte sector size to the outside world.  From a read
standpoint this doesn't create a problem.  It is when writing that you
can incur performance issues.  This is because writing/updating a
512-byte sector within a 4096-byte physical sector involves a
read-modify-write operation: the original 4096-byte contents must be
read, then the 512-byte subset updated, and finally the new 4096-byte
whole re-written back to disk.  That involves more than simply writing a
512-byte block as-is to a 512-byte sector.  (In similar fashion,
partitions not aligned on a 4K boundary can incur performance
degradation for 4096-byte physical sectors that advertise as 512-byte.)
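The arithmetic behind the read-modify-write penalty and the alignment
check, as a sketch:

```shell
# One 512-byte logical write into a 4096-byte physical sector moves the
# whole sector twice: read 4096 bytes, modify 512 of them, write 4096 back.
echo $(( 4096 + 4096 ))         # bytes transferred for one 512-byte update

# A partition is 4K-aligned iff its starting byte offset mod 4096 is zero.
# Old fdisk default start (LBA 63) vs a modern default (LBA 2048):
echo $(( 63 * 512 % 4096 ))     # -> 3584 (misaligned)
echo $(( 2048 * 512 % 4096 ))   # -> 0    (aligned)
```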

> (I guess there must be some performance degradation but it's not
> yet enough for me to notice. Or it might only be noticeable if low on
> space.)
ZFS has a lot of caching, and the ZIL batches writes, all of which can
ameliorate the effects of misaligned block sizes and partition
boundaries.  (Large sequential writes are best for performance,
especially on spinning disks, which incur penalties for head movement
and rotational delays.)  But if you have a write-intensive pool, you are
unnecessarily causing yourself a performance hit by not using the
correct ashift and/or partition alignment.
BTW, low space mainly affects performance due to fragmentation.  It is a
different issue from mismatched block size (ashift).

When I replaced my ashift=9 512-byte drives I eventually recreated the
pool with ashift=12.  Using ashift=12 on pools with 512-byte sector
drives will not incur any performance penalty, which is why ashift
defaults to 12 nowadays.  (I wouldn't be surprised if the default
changes to ashift=13 due to the prevalence of SSDs these days.)
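For what it's worth, if you're recreating a pool you can pin the ashift
explicitly rather than relying on detection (the pool and device names
in the comments are hypothetical):

```shell
# ashift=13 would mean 8192-byte sectors, matching common SSD page sizes:
echo $((1 << 13))   # -> 8192

# OpenZFS accepts ashift as a property at pool creation time, e.g.:
#   zpool create -o ashift=12 tank mirror ada0 ada1
# On FreeBSD you can also raise the auto-detected floor:
#   sysctl vfs.zfs.min_auto_ashift=12
```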


