From:      mike tancsa <mike@sentex.net>
Cc:        Matthew Grooms <mgrooms@shrew.net>, stable@freebsd.org
In-Reply-To: <CANCZdfqa0=1kJQYpbZQ3z2xdxt8x6L7iYJSjQUc7SGUap8KP5Q@mail.gmail.com>
References:  <5e1b5097-c1c0-4740-a491-63c709d01c25@sentex.net> <67721332-fa1d-4b3c-aa57-64594ad5d77a@shrew.net> <77e203b3-c555-408b-9634-c452cb3a57ac@sentex.net> <CANCZdfqx_vhNb2BukbM0bxrf8NH_9sXPKW+Uf=LdoXjw_2w=Dg@mail.gmail.com> <a6a53e96-a8ee-48c0-ae76-1e4150679f13@sentex.net> <CANCZdfqa0=1kJQYpbZQ3z2xdxt8x6L7iYJSjQUc7SGUap8KP5Q@mail.gmail.com>


On 5/2/2024 10:34 AM, Warner Losh wrote:
>
>
> On Thu, May 2, 2024 at 8:19 AM mike tancsa <mike@sentex.net> wrote:
>
>     On 5/2/2024 10:16 AM, Warner Losh wrote:
>>
>>     When trims are fast, you want to send them to the drive as soon
>>     as you
>>     know the blocks are freed. UFS always does this (if trim is
>>     enabled at all).
>>     ZFS has a lot of knobs to control when / how / if this is done.
>>
>>     vfs.zfs.vdev.trim_min_active: 1
>>     vfs.zfs.vdev.trim_max_active: 2
>>     vfs.zfs.trim.queue_limit: 10
>>     vfs.zfs.trim.txg_batch: 32
>>     vfs.zfs.trim.metaslab_skip: 0
>>     vfs.zfs.trim.extent_bytes_min: 32768
>>     vfs.zfs.trim.extent_bytes_max: 134217728
>>     vfs.zfs.l2arc.trim_ahead: 0
>>
>>     I've not tried to tune these in the past, but you can see how they affect things.
>>
>     Thanks Warner, I will try and play around with these values to see
>     if they impact things.  BTW, do you know what / why things would
>     be "skipped" during trim events ?
>
>     kstat.zfs.zrootoffs.misc.iostats.trim_bytes_failed: 0
>     kstat.zfs.zrootoffs.misc.iostats.trim_extents_failed: 0
>     kstat.zfs.zrootoffs.misc.iostats.trim_bytes_skipped: 5968330752
>     kstat.zfs.zrootoffs.misc.iostats.trim_extents_skipped: 503986
>     kstat.zfs.zrootoffs.misc.iostats.trim_bytes_written: 181593186304
>     kstat.zfs.zrootoffs.misc.iostats.trim_extents_written: 303115
>
>
> A quick look at the code suggests that it is when the extent to be 
> trimmed is smaller than the extent_bytes_min parameter.
>
> The minimum seems to be a trade off between too many trims to the 
> drive and making sure that the trims that you do send are maximally 
> effective. By specifying a smaller size, you'll be freeing up more 
> holes in the underlying NAND blocks. In some drives, this triggers 
> more data copying (and more write amp), so you want to set it a bit 
> higher for those. In other drives, it improves the efficiency of the 
> GC algorithm, allowing each underlying block groomed to recover more 
> space for future writes. In the past, I've found that ZFS' defaults 
> are decent for 2018ish level of SATA SSDs, but a bit too trim avoidy 
> for newer nvme drives, even the cheap consumer ones. Though that's 
> just a coarse generalization from my buildworld workload. Other work 
> loads will have other data patterns, ymmv, so you need to measure it.
>
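
For reference, those per-pool trim counters and the extent_bytes_min tunable can be read back with sysctl on FreeBSD; roughly like this, using the pool name from the output above:

    # current trim extent-size tunables
    sysctl vfs.zfs.trim.extent_bytes_min vfs.zfs.trim.extent_bytes_max
    # per-pool trim counters; per Warner's reading of the code, the _skipped
    # counters cover freed extents smaller than extent_bytes_min
    sysctl kstat.zfs.zrootoffs.misc.iostats | grep trim
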
OK, some updates.  Since a new version of ZFS was MFC'd into RELENG14, I 
thought I would try again, and to my pleasant surprise it is working 
*really* well.  My test of zfs send ${a} | zfs recv ${a}2, followed by 
zfs destroy ${a}2 and a zpool trim -w, where ${a} is a ~300G dataset 
with millions of files of various sizes, is now very predictable over 
the course of a dozen loops.  Previously, this would start to slow down 
by a factor of 3 or 4 after 3 iterations, which roughly corresponds to 
the 1TB size of the drives.
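
The loop itself is nothing fancy; roughly the following sketch, with the dataset and pool names as placeholders and each pass timed so any slowdown shows up right away:

    #!/bin/sh
    # Rough sketch of the test: replicate the ~300G dataset, destroy the
    # copy, trim, and time each pass so slowdown across iterations is obvious.
    # Dataset/pool names are placeholders; @snap1 is the snapshot being sent,
    # and destroy -r removes the received snapshot along with the copy.
    a=quirk-test/bull1
    pool=quirk-test
    for i in $(seq 1 12); do
        echo "pass ${i}:"
        /usr/bin/time -h sh -c "
            zfs send ${a}@snap1 | zfs recv ${a}2 &&
            zfs destroy -r ${a}2 &&
            zpool trim -w ${pool}
        "
    done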


zfs-2.2.4-FreeBSD_g256659204
zfs-kmod-2.2.4-FreeBSD_g256659204

FreeBSD r-14mfitest 14.1-STABLE FreeBSD 14.1-STABLE stable/14-45764d1d4 
GENERIC amd64

Same hardware as before, same ZFS datasets.  One thing that did seem to 
change, and I am not sure if it's an issue or not, is that I had booted 
TrueNAS's Linux variant to run the tests, which also worked as expected.  
But looking at zpool history I don't see anything obvious that would 
change the behaviour when I re-ran the tests under FreeBSD.  zpool 
history shows:

2024-04-25.10:16:35 zpool export quirk-test
2024-04-25.10:16:48 zpool import quirk-test
2024-04-26.11:44:25 zpool export quirk-test
2024-04-26.13:11:18 py-libzfs: zpool import 13273111966766428207 quirk-test
2024-04-26.13:11:22 py-libzfs: zfs inherit -r quirk-test/junk
2024-04-26.13:11:23 py-libzfs: zfs inherit -r quirk-test/bull1
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o xattr=sa quirk-test/.system
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o quota=1G -o xattr=sa quirk-test/.system/cores
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o xattr=sa quirk-test/.system/samba4
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o xattr=sa quirk-test/.system/configs-ae32c386e13840b2bf9c0083275e7941
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o xattr=sa quirk-test/.system/netdata-ae32c386e13840b2bf9c0083275e7941
2024-04-26.13:22:10 zfs snapshot quirk-test/bull1@snap1

Nothing else seems to have been done to the pool params.  I tried the 
tests with vfs.zfs.trim.extent_bytes_min at its default as well as 
divided by 2, but that didn't seem to make any difference; the sketch 
below shows roughly what I did.  I have a stack of fresh WD 1TB and 2TB 
Blue SSDs that I might pop into the test box later this week to see if 
all is still good, in case Linux did something to these disks, although 
the output of camcontrol identify doesn't show any difference from 
before the import/export, so nothing seems to have changed with the 
drives.
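
For completeness, and assuming the tunable can be changed at runtime, the extent_bytes_min experiment was just along these lines (32768 is the default from Warner's list above, and the pool name is mine):

    # halve the minimum trim extent size, re-run the loop, compare the
    # skipped vs. written trim counters, then put the default back
    sysctl vfs.zfs.trim.extent_bytes_min            # default: 32768
    sysctl vfs.zfs.trim.extent_bytes_min=16384
    # ... re-run the send/recv/destroy/trim loop from above ...
    sysctl kstat.zfs.quirk-test.misc.iostats | grep trim
    sysctl vfs.zfs.trim.extent_bytes_min=32768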

     ---Mike



