Date:      Fri, 18 May 2018 16:20:23 -0500
From:      Eric Borisch <eborisch@gmail.com>
To:        Paul Esson <paul.esson@redstor.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Unexpected zvol usage
Message-ID:  <CAMsT2=kybGc0os70yGmEPFeBZU1DMKB0wtDPzBC9rJJwPACuoQ@mail.gmail.com>
In-Reply-To: <HE1PR0102MB25880FD0731B56770F19D7F29E900@HE1PR0102MB2588.eurprd01.prod.exchangelabs.com>
References:  <HE1PR0102MB25880FD0731B56770F19D7F29E900@HE1PR0102MB2588.eurprd01.prod.exchangelabs.com>

You're hitting the raidz-N layout rules: each individual allocation must be
a multiple of (N+1) sectors (3 for raidz-2), where the sector size comes
from ashift (ashift=12 -> 4k sectors). This is because each allocation
carries its own parity, and because padding to that multiple avoids leaving
holes on the drives that are too small to reuse when the allocation is
freed:
https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz

So for an 8K block on raidz-2, with D being data, P being parity, and X
being padding, you're at DDP,PXX (',' marking the allowable allocation
multiples of 3). That means 4 of the 6 sectors, or 2/3 (66.7%) of your
storage, go to parity + padding, compared with the 2/12 = 1/6 (16.7%) you
likely expected for 2 parity drives out of 12: an "extra" 50% overhead.
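The rule and the 8K example above can be checked with a short Python
sketch. The function names are hypothetical, and the allocation logic is
modeled on my reading of vdev_raidz_asize() in the ZFS source rather than
copied from it. The second function adds a further assumption: that ZFS
scales the reported "used" on raidz by the allocation ratio of a 128K
block, so 128K records look roughly 1:1. If that holds, 8K blocks should
report about 2.29x their logical size, which lines up with the 99.7G used
vs 43.6G logicalused in the quoted host zvol output below.

```python
# Sketch of the raidz-N allocation rule; modeled on my reading of
# vdev_raidz_asize() in ZFS, not an official API.

def raidz_alloc_sectors(psize, ndisks, nparity, ashift):
    """Return (data, parity, padding) sectors for one block of psize bytes."""
    data = -(-psize // (1 << ashift))            # round up to whole sectors
    rows = -(-data // (ndisks - nparity))        # stripe rows needed for the data
    parity = nparity * rows                      # every row carries N parity sectors
    total = data + parity
    padded = -(-total // (nparity + 1)) * (nparity + 1)  # round up to a multiple of N+1
    return data, parity, padded - total


def reported_used_per_logical(volblocksize, ndisks, nparity, ashift):
    """Expected used/logicalused ratio for a zvol (assumes 128K-based accounting)."""
    ref = 128 * 1024
    ref_alloc = sum(raidz_alloc_sectors(ref, ndisks, nparity, ashift))
    blk_alloc = sum(raidz_alloc_sectors(volblocksize, ndisks, nparity, ashift))
    # Assumption: 'used' is raw allocation scaled by ref / (raw size of a 128K block).
    return (blk_alloc / ref_alloc) * (ref / volblocksize)


# Paul's pool: 12 disks, raidz2, ashift=12, volblocksize=8K.
data, parity, pad = raidz_alloc_sectors(8 * 1024, ndisks=12, nparity=2, ashift=12)
print(data, parity, pad)                                        # 2 2 2, i.e. DDP,PXX
print(f"{(parity + pad) / (data + parity + pad):.1%}")          # 66.7%
print(f"{reported_used_per_logical(8 * 1024, 12, 2, 12):.2f}")  # 2.29
print(f"{99.7 / 43.6:.2f}")                                     # 2.29 (used / logicalused)
```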

The "extra" overhead (beyond the expected 2/12) for various volblocksizes
in this layout is:
  4k: 50.0%
  8k: 50.0%
 16k: 16.7%
 32k: 16.7%
 64k:  7.1%
128k:  7.1%
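The table can be reproduced with the Python sketch below (function names
are hypothetical). It takes "extra" to mean the non-data share of each
allocation minus the 2/12 you would expect from parity alone, which is my
interpretation of how the numbers were derived; it gives the same values.
The allocation rule is again modeled on my reading of the raidz allocation
logic, not taken from ZFS itself.

```python
# Reproduce the "extra" overhead table for 12 disks, raidz2, ashift=12.

def alloc_sectors(psize, ndisks, nparity, ashift):
    """Total sectors (data + parity + padding) allocated for one block."""
    data = -(-psize // (1 << ashift))                  # whole data sectors
    parity = nparity * -(-data // (ndisks - nparity))  # N parity sectors per stripe row
    n = nparity + 1
    return -(-(data + parity) // n) * n                # pad to a multiple of N+1


def extra_overhead(volblocksize, ndisks=12, nparity=2, ashift=12):
    """Non-data fraction of the allocation minus the ideal nparity/ndisks."""
    data = -(-volblocksize // (1 << ashift))
    total = alloc_sectors(volblocksize, ndisks, nparity, ashift)
    return (1 - data / total) - nparity / ndisks


for kib in (4, 8, 16, 32, 64, 128):
    print(f"{kib:>4}k: {extra_overhead(kib * 1024):5.1%}")
```

The steps in the table come from the two roundings: data sectors are grouped
into stripe rows of 10, each paying 2 parity sectors, and the total is then
padded to a multiple of 3. At 64k and above, both roundings cost little
relative to the data.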

That's why Kevin is suggesting 128k. For this particular layout, 64k would
give very similar space efficiency [2] with better latency than 128k.
Alternatively, 16k cuts the "extra" overhead from 50% to 16.7% with the
smallest latency impact.

I calculate these "extra" overheads here [1], if you're interested.

Larger block sizes also leave more room for compression, which helps
counteract this overhead on compressible workloads, so they win on two
fronts if you enable a low-overhead compressor such as lz4. You really need
to test with your workload to find your "optimal" choice, however.
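To see why larger blocks give compression more room to pay off, here is a
hedged Python sketch that applies the same raidz rounding to a block's size
after compression. The compressed sizes are made-up illustrations, the
function name is hypothetical, and the sketch ignores ZFS's own rules about
when it keeps a compressed copy at all.

```python
# Allocated bytes for one block on a 12-disk raidz2, ashift=12 vdev, given the
# block's size after compression. Ignores ZFS's rules for when compression is kept.

def alloc_bytes(psize, ndisks=12, nparity=2, ashift=12):
    """Bytes allocated for a block stored as psize bytes (data + parity + padding)."""
    data = -(-psize // (1 << ashift))                   # whole sectors of data
    parity = nparity * -(-data // (ndisks - nparity))   # N parity sectors per stripe row
    n = nparity + 1
    return (-(-(data + parity) // n) * n) << ashift     # pad to a multiple of N+1 sectors


# (logical size, hypothetical compressed size) pairs, in bytes.
cases = [(8 * 1024, 5 * 1024), (8 * 1024, 4 * 1024), (128 * 1024, 80 * 1024)]
for logical, compressed in cases:
    before, after = alloc_bytes(logical), alloc_bytes(compressed)
    print(f"{logical // 1024}K -> {compressed // 1024}K compressed: "
          f"{before // 1024}K -> {after // 1024}K allocated")
```

With 4k sectors, an 8K block has to compress by at least half before raidz
allocates any less (24K either way at 5K), while a 128K block that shrinks to
80K drops from 168K to 96K of raw space.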

I ran into this myself when spinning up some VMs, and put together the
linked sheet (based on one from Matt Ahrens) to help myself and others when
selecting array and zvol layouts and settings.
   - Eric

[1] https://docs.google.com/spreadsheets/d/1kQJJpUtbWB_Poyc7jcO3mNFrFqeHSWuQ8U8Y5UC3dHY/edit?usp=sharing
[2] Similar, but I'm guessing not exact; there is presumably some extra
overhead in tracking twice as many blocks, but it's fairly hidden from
userland.

On Fri, May 18, 2018 at 10:42 AM, Paul Esson <paul.esson@redstor.com> wrote:

> Hi Folks,
>
> I have an 11.1-RELEASE system being used as a host for a bhyve guest.
> There is a large zpool on the host created from 12 x 10TB HDDs using raidz2
> redundancy with ashift=12.  I have created a sparse zvol within the pool
> using default settings and presented that to the bhyve VM as an ahci-hd
> disk type.  The guest has a zpool and filesystem dataset built on this
> disk.  When I start to write to the filesystem on the guest, I find that
> the used/referenced values on the host's zvol are more than double those
> on the guest.  The logicalused/logicalreferenced values on the host zvol
> are more in line with the equivalent guest values, but my problem is that
> the host zvol is likely to fill before I have written all intended data to
> the guest.
>
>
> I have included below information from both the host and guest before and
> after writing.  This output shows that the zvol uses the default 8K
> volblocksize and that the guest zfs is therefore ashift=13.  I also tried
> creating the zvol with a 4K volblocksize and the guest zfs with ashift=12 so
> that 4K blocks were consistent across host and guest, but still saw the
> amplification on writes to the zvol.
>
> Any insight greatly appreciated.
>
>
>
> HOST
>
> Zpool
> RAIDZ2 12 x HDDs, ashift 12
>
> NAME       PROPERTY              VALUE                   SOURCE
> dc1-hn-01  type                  filesystem              -
> dc1-hn-01  creation              Mon Apr 23 14:35 2018   -
> dc1-hn-01  used                  32.0G                   -
> dc1-hn-01  available             78.2T                   -
> dc1-hn-01  referenced            201K                    -
> dc1-hn-01  compressratio         1.00x                   -
> dc1-hn-01  mounted               yes                     -
> dc1-hn-01  quota                 none                    default
> dc1-hn-01  reservation           none                    default
> dc1-hn-01  recordsize            128K                    default
> dc1-hn-01  mountpoint            /export/data/dc1-hn-01  local
> dc1-hn-01  sharenfs              off                     default
> dc1-hn-01  checksum              on                      default
> dc1-hn-01  compression           off                     default
> dc1-hn-01  atime                 on                      default
> dc1-hn-01  devices               on                      default
> dc1-hn-01  exec                  on                      default
> dc1-hn-01  setuid                on                      default
> dc1-hn-01  readonly              off                     default
> dc1-hn-01  jailed                off                     default
> dc1-hn-01  snapdir               hidden                  default
> dc1-hn-01  aclmode               discard                 default
> dc1-hn-01  aclinherit            restricted              default
> dc1-hn-01  canmount              on                      default
> dc1-hn-01  xattr                 off                     temporary
> dc1-hn-01  copies                1                       default
> dc1-hn-01  version               5                       -
> dc1-hn-01  utf8only              off                     -
> dc1-hn-01  normalization         none                    -
> dc1-hn-01  casesensitivity       sensitive               -
> dc1-hn-01  vscan                 off                     default
> dc1-hn-01  nbmand                off                     default
> dc1-hn-01  sharesmb              off                     default
> dc1-hn-01  refquota              none                    default
> dc1-hn-01  refreservation        none                    default
> dc1-hn-01  primarycache          all                     default
> dc1-hn-01  secondarycache        all                     default
> dc1-hn-01  usedbysnapshots       0                       -
> dc1-hn-01  usedbydataset         201K                    -
> dc1-hn-01  usedbychildren        32.0G                   -
> dc1-hn-01  usedbyrefreservation  0                       -
> dc1-hn-01  logbias               latency                 default
> dc1-hn-01  dedup                 off                     default
> dc1-hn-01  mlslabel                                      -
> dc1-hn-01  sync                  standard                default
> dc1-hn-01  refcompressratio      1.00x                   -
> dc1-hn-01  written               201K                    -
> dc1-hn-01  logicalused           2.89G                   -
> dc1-hn-01  logicalreferenced     36.5K                   -
> dc1-hn-01  volmode               default                 default
> dc1-hn-01  filesystem_limit      none                    default
> dc1-hn-01  snapshot_limit        none                    default
> dc1-hn-01  filesystem_count      none                    default
> dc1-hn-01  snapshot_count        none                    default
> dc1-hn-01  redundant_metadata    all                     default
>
> NAME                               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dc1-hn-01                          78.2T  32.0G         0    201K              0      32.0G
> dc1-hn-01/vm                       78.2T  31.9G         0    990M              0      30.9G
> dc1-hn-01/vm/dc1-olbp-sn-11        78.2T  30.9G         0    238K              0      30.9G
> dc1-hn-01/vm/dc1-olbp-sn-11/disk0  78.2T  30.9G         0   4.35G          26.6G          0
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  78.2T  4.50M         0   4.50M              0          0
>
> Sparse ZVOL - baseline
>
> NAME                               PROPERTY              VALUE                   SOURCE
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  type                  volume                  -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  creation              Fri May 18 15:36 2018   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  used                  4.50M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  available             78.2T                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  referenced            4.50M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  compressratio         1.00x                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  reservation           none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volsize               28T                     local
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volblocksize          8K                      -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  checksum              on                      default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  compression           off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  readonly              off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  copies                1                       default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  refreservation        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  primarycache          all                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  secondarycache        all                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbysnapshots       0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbydataset         4.50M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbychildren        0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbyrefreservation  0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logbias               latency                 default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  dedup                 off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  mlslabel                                      -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  sync                  standard                default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  refcompressratio      1.00x                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  written               4.50M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logicalused           1.89M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logicalreferenced     1.89M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volmode               dev                     local
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  snapshot_limit        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  snapshot_count        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  redundant_metadata    all                     default
>
>
> GUEST - baseline
>
> 1 x vdisk from host ZVOL ashift 13
>
> NAME                AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dc1-sn-11           26.9T   632K         0    176K              0       456K
>
> NAME       PROPERTY              VALUE                   SOURCE
> dc1-sn-11  type                  filesystem              -
> dc1-sn-11  creation              Fri May 18 15:40 2018   -
> dc1-sn-11  used                  632K                    -
> dc1-sn-11  available             26.9T                   -
> dc1-sn-11  referenced            176K                    -
> dc1-sn-11  compressratio         1.00x                   -
> dc1-sn-11  mounted               yes                     -
> dc1-sn-11  quota                 none                    default
> dc1-sn-11  reservation           none                    default
> dc1-sn-11  recordsize            128K                    default
> dc1-sn-11  mountpoint            /export/data/dc1-sn-11  local
> dc1-sn-11  sharenfs              off                     default
> dc1-sn-11  checksum              on                      default
> dc1-sn-11  compression           off                     default
> dc1-sn-11  atime                 on                      default
> dc1-sn-11  devices               on                      default
> dc1-sn-11  exec                  on                      default
> dc1-sn-11  setuid                on                      default
> dc1-sn-11  readonly              off                     default
> dc1-sn-11  jailed                off                     default
> dc1-sn-11  snapdir               hidden                  default
> dc1-sn-11  aclmode               discard                 default
> dc1-sn-11  aclinherit            restricted              default
> dc1-sn-11  canmount              on                      default
> dc1-sn-11  xattr                 off                     temporary
> dc1-sn-11  copies                1                       default
> dc1-sn-11  version               5                       -
> dc1-sn-11  utf8only              off                     -
> dc1-sn-11  normalization         none                    -
> dc1-sn-11  casesensitivity       sensitive               -
> dc1-sn-11  vscan                 off                     default
> dc1-sn-11  nbmand                off                     default
> dc1-sn-11  sharesmb              off                     default
> dc1-sn-11  refquota              none                    default
> dc1-sn-11  refreservation        none                    default
> dc1-sn-11  primarycache          all                     default
> dc1-sn-11  secondarycache        all                     default
> dc1-sn-11  usedbysnapshots       0                       -
> dc1-sn-11  usedbydataset         176K                    -
> dc1-sn-11  usedbychildren        456K                    -
> dc1-sn-11  usedbyrefreservation  0                       -
> dc1-sn-11  logbias               latency                 default
> dc1-sn-11  dedup                 off                     default
> dc1-sn-11  mlslabel                                      -
> dc1-sn-11  sync                  standard                default
> dc1-sn-11  refcompressratio      1.00x                   -
> dc1-sn-11  written               176K                    -
> dc1-sn-11  logicalused           49K                     -
> dc1-sn-11  logicalreferenced     11.5K                   -
> dc1-sn-11  volmode               default                 default
> dc1-sn-11  filesystem_limit      none                    default
> dc1-sn-11  snapshot_limit        none                    default
> dc1-sn-11  filesystem_count      none                    default
> dc1-sn-11  snapshot_count        none                    default
> dc1-sn-11  redundant_metadata    all                     default
>
> After writing some data to the guest
>
> HOST ZVOL
>
> NAME                               PROPERTY              VALUE                   SOURCE
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  type                  volume                  -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  creation              Fri May 18 15:36 2018   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  used                  99.7G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  available             78.1T                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  referenced            99.7G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  compressratio         1.00x                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  reservation           none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volsize               28T                     local
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volblocksize          8K                      -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  checksum              on                      default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  compression           off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  readonly              off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  copies                1                       default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  refreservation        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  primarycache          all                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  secondarycache        all                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbysnapshots       0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbydataset         99.7G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbychildren        0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbyrefreservation  0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logbias               latency                 default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  dedup                 off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  mlslabel                                      -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  sync                  standard                default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  refcompressratio      1.00x                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  written               99.7G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logicalused           43.6G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logicalreferenced     43.6G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volmode               dev                     local
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  snapshot_limit        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  snapshot_count        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  redundant_metadata    all                     default
>
> GUEST ZFS
>
> NAME       PROPERTY              VALUE                   SOURCE
> dc1-sn-11  type                  filesystem              -
> dc1-sn-11  creation              Fri May 18 15:40 2018   -
> dc1-sn-11  used                  44.3G                   -
> dc1-sn-11  available             26.8T                   -
> dc1-sn-11  referenced            176K                    -
> dc1-sn-11  compressratio         1.00x                   -
> dc1-sn-11  mounted               no                      -
> dc1-sn-11  quota                 none                    default
> dc1-sn-11  reservation           none                    default
> dc1-sn-11  recordsize            128K                    default
> dc1-sn-11  mountpoint            /export/data/dc1-sn-11  local
> dc1-sn-11  sharenfs              off                     default
> dc1-sn-11  checksum              on                      default
> dc1-sn-11  compression           off                     default
> dc1-sn-11  atime                 on                      default
> dc1-sn-11  devices               on                      default
> dc1-sn-11  exec                  on                      default
> dc1-sn-11  setuid                on                      default
> dc1-sn-11  readonly              off                     default
> dc1-sn-11  jailed                off                     default
> dc1-sn-11  snapdir               hidden                  default
> dc1-sn-11  aclmode               discard                 default
> dc1-sn-11  aclinherit            restricted              default
> dc1-sn-11  canmount              on                      default
> dc1-sn-11  xattr                 on                      default
> dc1-sn-11  copies                1                       default
> dc1-sn-11  version               5                       -
> dc1-sn-11  utf8only              off                     -
> dc1-sn-11  normalization         none                    -
> dc1-sn-11  casesensitivity       sensitive               -
> dc1-sn-11  vscan                 off                     default
> dc1-sn-11  nbmand                off                     default
> dc1-sn-11  sharesmb              off                     default
> dc1-sn-11  refquota              none                    default
> dc1-sn-11  refreservation        none                    default
> dc1-sn-11  primarycache          all                     default
> dc1-sn-11  secondarycache        all                     default
> dc1-sn-11  usedbysnapshots       0                       -
> dc1-sn-11  usedbydataset         176K                    -
> dc1-sn-11  usedbychildren        44.3G                   -
> dc1-sn-11  usedbyrefreservation  0                       -
> dc1-sn-11  logbias               latency                 default
> dc1-sn-11  dedup                 off                     default
> dc1-sn-11  mlslabel                                      -
> dc1-sn-11  sync                  standard                default
> dc1-sn-11  refcompressratio      1.00x                   -
> dc1-sn-11  written               176K                    -
> dc1-sn-11  logicalused           44.2G                   -
> dc1-sn-11  logicalreferenced     11.5K                   -
> dc1-sn-11  volmode               default                 default
> dc1-sn-11  filesystem_limit      none                    default
> dc1-sn-11  snapshot_limit        none                    default
> dc1-sn-11  filesystem_count      none                    default
> dc1-sn-11  snapshot_count        none                    default
> dc1-sn-11  redundant_metadata    all                     default
>
>
> Regards,
>
>
> Paul Esson
> t  +44 (0)118 951 5235  |   m  +44 (0)776 690 6514
> e  paul.esson@redstor.com
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>


