Date:      Thu, 2 May 2024 08:34:22 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        mike tancsa <mike@sentex.net>
Cc:        Matthew Grooms <mgrooms@shrew.net>, stable@freebsd.org
Subject:   Re: how to tell if TRIM is working
Message-ID:  <CANCZdfqa0=1kJQYpbZQ3z2xdxt8x6L7iYJSjQUc7SGUap8KP5Q@mail.gmail.com>
In-Reply-To: <a6a53e96-a8ee-48c0-ae76-1e4150679f13@sentex.net>
References:  <5e1b5097-c1c0-4740-a491-63c709d01c25@sentex.net> <67721332-fa1d-4b3c-aa57-64594ad5d77a@shrew.net> <77e203b3-c555-408b-9634-c452cb3a57ac@sentex.net> <CANCZdfqx_vhNb2BukbM0bxrf8NH_9sXPKW+Uf=LdoXjw_2w=Dg@mail.gmail.com> <a6a53e96-a8ee-48c0-ae76-1e4150679f13@sentex.net>

On Thu, May 2, 2024 at 8:19 AM mike tancsa <mike@sentex.net> wrote:

> On 5/2/2024 10:16 AM, Warner Losh wrote:
>
>
> When trims are fast, you want to send them to the drive as soon as you
> know the blocks are freed. UFS always does this (if trim is enabled at
> all).
> ZFS has a lot of knobs to control when / how / if this is done.
>
> vfs.zfs.vdev.trim_min_active: 1
> vfs.zfs.vdev.trim_max_active: 2
> vfs.zfs.trim.queue_limit: 10
> vfs.zfs.trim.txg_batch: 32
> vfs.zfs.trim.metaslab_skip: 0
> vfs.zfs.trim.extent_bytes_min: 32768
> vfs.zfs.trim.extent_bytes_max: 134217728
> vfs.zfs.l2arc.trim_ahead: 0
>
>
> I've not tried to tune these in the past, but you can see how they affect
> things.
>
>
> Thanks Warner, I will try and play around with these values to see if they
> impact things.  BTW, do you know what / why things would be "skipped"
> during trim events?
>
> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_failed: 0
> kstat.zfs.zrootoffs.misc.iostats.trim_extents_failed: 0
> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_skipped: 5968330752
> kstat.zfs.zrootoffs.misc.iostats.trim_extents_skipped: 503986
> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_written: 181593186304
> kstat.zfs.zrootoffs.misc.iostats.trim_extents_written: 303115
>

A quick look at the code suggests that an extent is counted as skipped when
it is smaller than the extent_bytes_min parameter.
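To make that concrete, here's a small self-contained sketch using the kstat
counters quoted above (the literal values are copied from the message; on a
live system you'd read them via sysctl(8)). The average skipped extent works
out well below the 32768-byte extent_bytes_min, which is consistent with the
explanation:

```shell
# Average extent sizes from the kstat counters quoted above.
skipped_bytes=5968330752
skipped_extents=503986
written_bytes=181593186304
written_extents=303115

# Integer average size of a skipped extent vs. a written one.
avg_skipped=$((skipped_bytes / skipped_extents))   # ~11842 bytes, under the 32768-byte minimum
avg_written=$((written_bytes / written_extents))   # ~599090 bytes

echo "average skipped extent: ${avg_skipped} bytes"
echo "average written extent: ${avg_written} bytes"
```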

The minimum seems to be a trade-off between sending too many trims to the
drive and making sure that the trims you do send are maximally effective. By
specifying a smaller size, you'll be freeing up more holes in the underlying
NAND blocks. In some drives this triggers more data copying (and more write
amplification), so you want to set it a bit higher for those. In other
drives it improves the efficiency of the GC algorithm, allowing each groomed
block to recover more space for future writes. In the past, I've found that
ZFS's defaults are decent for 2018-era SATA SSDs, but a bit too trim-averse
for newer NVMe drives, even the cheap consumer ones. Though that's just a
coarse generalization from my buildworld workload. Other workloads will have
other data patterns, ymmv, so you need to measure it.
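As a hypothetical example of that kind of tuning (the value is illustrative,
not a recommendation from the thread), lowering the threshold so that
smaller freed extents get trimmed rather than skipped might look like:

```shell
# Sketch only; 16384 is an illustrative value, not a recommendation.
# Inspect the current threshold:
sysctl vfs.zfs.trim.extent_bytes_min
# Lower it so smaller freed extents are trimmed instead of skipped:
sysctl vfs.zfs.trim.extent_bytes_min=16384
# To persist across reboots, add to /etc/sysctl.conf:
#   vfs.zfs.trim.extent_bytes_min=16384
```

Re-checking the trim_bytes_skipped / trim_extents_skipped kstats before and
after a change like this is one way to see whether it had the intended
effect.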

Another way to get statistics, one that I've not been able to measure a
slowdown from, is to enable CAM_IOSCHED_DYNAMIC. You then get many more
statistics about the I/Os in the system, including latency measurements. In
theory, that also allows you to traffic-shape the trims to the drive, but
I've had only limited success with that and haven't had the time to make it
much better.
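For reference, a minimal sketch of enabling that option in a custom kernel
(the config name is made up, and the exact iosched sysctl path is an
assumption to verify on your own system):

```shell
# Sketch, assuming a stock amd64 source tree; the KERNCONF name is made up.
# /usr/src/sys/amd64/conf/IOSCHED would contain:
#   include GENERIC
#   ident   IOSCHED
#   options CAM_IOSCHED_DYNAMIC
cd /usr/src
make buildkernel installkernel KERNCONF=IOSCHED
# After a reboot, per-device scheduler statistics appear under sysctl,
# e.g. (path from memory, verify locally):
sysctl kern.cam.ada.0.iosched
```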

Warner




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfqa0=1kJQYpbZQ3z2xdxt8x6L7iYJSjQUc7SGUap8KP5Q>