Date:      Tue, 17 May 2016 07:37:23 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 209571] ZFS and NVMe performing poorly. TRIM requests stall I/O activity
Message-ID:  <bug-209571-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209571

            Bug ID: 209571
           Summary: ZFS and NVMe performing poorly. TRIM requests stall
                    I/O activity
           Product: Base System
           Version: 10.3-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: borjam@sarenet.es

Created attachment 170388
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=170388&action=edit
throughput graphs for two bonnie++ runs

On a test system with 10 Intel P3500 NVMe SSDs I have found that TRIM
activity can cause a severe I/O stall. After running several bonnie++
tests, the ZFS file system was almost unusable for 15 minutes (yes,
FIFTEEN!).



HOW TO REPRODUCE:

- Create a ZFS pool; in this case, a raidz2 pool with the 10 NVMe drives.

- Create a dataset without compression (we want to test actual I/O
performance).

- Run bonnie++. Since a single bonnie++ process quickly saturates one CPU
core and therefore cannot generate enough bandwidth for this setup, I run
four bonnie++ processes concurrently. To demonstrate this issue, each
bonnie++ performs two runs (see the command sketch below). So,
( bonnie++ -s 512g -x 2 -f ) & # four times.
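
A minimal sketch of the reproduction steps follows. The device names
(nvd0-nvd9), the pool name and the mountpoint are assumptions, not the
exact commands used on the test system:

    # assumed device and pool names; adjust to the actual system
    zpool create tank raidz2 nvd0 nvd1 nvd2 nvd3 nvd4 nvd5 nvd6 nvd7 nvd8 nvd9
    zfs create -o compression=off tank/bench
    # four concurrent bonnie++ processes, two 512 GB runs each;
    # -f skips the slow per-character tests
    for i in 1 2 3 4; do
        ( bonnie++ -d /tank/bench -s 512g -x 2 -f -u root > /tmp/bonnie.$i.log 2>&1 ) &
    done
    wait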


Graphs are included. They were made with devilator (an Orca-compatible data
collector) pulling data from devstat(9). The disk shown is just one out of
10 (the other 9 graphs are identical, as expected).
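
The same per-disk activity can be watched live from the devstat counters
with the base system tools; for example (the nvd device names here are an
assumption):

    # per-device throughput and latency, refreshed every second
    iostat -x -w 1 nvd0
    # or, via GEOM statistics, all nvd providers at once
    gstat -f '^nvd'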

The first run of four bonnie++ processes completes without flaws. On graph
1 (TwoBonniesTput) we have the first bonnie++ run from the start of the
graph to around 08:30 (the green line is the "Intelligent reading" phase),
with the second bonnie++ run starting right after it.

Bonnie++ performs several tests, beginning with a write test (blue line
showing around 230 MBps, from the start to 07:40), followed by a
read/write test (from 07:40 to 08:15 on the graphs) showing
read/write/delete activity, and finally a read test (green line showing
250 MBps from 08:15 to 08:30, more or less). After bonnie++ ends, the
files it created are deleted. In this particular test, four concurrent
bonnie++ processes created four files of 512 GB each, a total of 2 TB.

After the first run, the disks show TRIM activity going on at a rate of
around 200 MB/s. That seems quite slow: a test I did at home on an OCZ
Vertex4 SSD (albeit a single one, not a pool) gave a peak of 2 GB/s. But I
understand that the ada driver coalesces TRIM requests, while the nvd
driver doesn't.
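
If the kernel exports them (an assumption on 10.3-RELEASE; the sysctl names
may differ), the ZFS TRIM counters give a rough idea of how much TRIM work
is being issued from the ZFS side:

    # cumulative TRIM statistics kept by ZFS (bytes, success, failed, unsupported)
    sysctl kstat.zfs.misc.zio_trim
    # whether ZFS-level TRIM is enabled at all
    sysctl vfs.zfs.trim.enabled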

The trouble is: the second bonnie++ run starts right after the first one,
and THERE IS ALMOST NO WRITE ACTIVITY FOR 15 MINUTES. The write activity is
just frozen, and it doesn't pick up until about 08:45, stalling again,
although for a shorter time, around 08:50.

On exhibit 2, "TwoBonniesTimes", it can be seen that the write latency
during the stall is zero, which means (unless I am wrong) that no write
commands are actually reaching the disks.

During the stalls the ZFS system was unresponsive. Even simple commands
such as "zfs list" were painfully slow, taking minutes to complete.



EXPECTED BEHAVIOR:

I understand that heavy TRIM activity must have an impact, but in this case
it's causing a complete starvation of the rest of the ZFS I/O activity,
which is clearly wrong. This behavior could cause a severe problem, for
example, when destroying a large snapshot. In this case, the system is
deleting 2 TB of data.




ATTEMPTS TO MITIGATE IT:

The first thing I tried was to reduce the priority of the TRIM operations in
the I/O scheduler:

    vfs.zfs.vdev.trim_max_pending=100
    vfs.zfs.vdev.trim_max_active=1
    vfs.zfs.vdev.async_write_min_active=8

with no visible effect.
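
For reference, and assuming these are run-time writable on this kernel (they
may need to be set from /boot/loader.conf instead), settings like the above
can be applied with sysctl(8):

    # allow more TRIMs to queue, keep only one active at a time,
    # and raise the minimum number of active async writes
    sysctl vfs.zfs.vdev.trim_max_pending=100
    sysctl vfs.zfs.vdev.trim_max_active=1
    sysctl vfs.zfs.vdev.async_write_min_active=8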

After reading the article describing the ZFS I/O scheduler I suspected that
the TRIM activity might be activating the write throttle, so I just disabled
it:

    vfs.zfs.delay_scale=0

But it didn't help either. The writing processes still got stuck, but on
dp->dp_s rather than on dmu_tx_delay.
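
Those names are the wait channels the blocked processes were sleeping on.
A quick way to see them (the process selection is just an example) is:

    # show process state and wait channel of the writers
    ps -axo pid,state,wchan,command | grep bonnie
    # or dump the kernel stack of one stuck process (replace <pid>)
    procstat -kk <pid>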



There are two problems here. It seems that the nvd driver doesn't coalesce
TRIM requests, while ZFS is dumping a lot of TRIM requests on the assumption
that the lower layer will coalesce them.

I don't think it's a good idea for ZFS to make such an assumption blindly.
Furthermore, I think that there should be some throttling mechanism applied
to TRIM requests.
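
As a quick way to confirm that TRIM is the trigger (not proposed as a fix),
ZFS TRIM can be disabled entirely for a control run. As far as I know,
vfs.zfs.trim.enabled is a loader tunable, so this assumes a reboot:

    # /boot/loader.conf -- disable ZFS-level TRIM for a control run
    vfs.zfs.trim.enabled="0"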

-- 
You are receiving this mail because:
You are the assignee for the bug.


