Date: Tue, 17 May 2016 07:37:23 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 209571] ZFS and NVMe performing poorly. TRIM requests stall I/O activity
Message-ID: <bug-209571-8@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209571

            Bug ID: 209571
           Summary: ZFS and NVMe performing poorly. TRIM requests stall
                    I/O activity
           Product: Base System
           Version: 10.3-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: borjam@sarenet.es

Created attachment 170388
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=170388&action=edit
throughput graphs for two bonnie++ runs

On a test system with 10 Intel P3500 NVMe drives I have found that TRIM
activity can cause a severe I/O stall. After running several bonnie++ tests,
the ZFS file system was almost unusable for 15 minutes (yes, FIFTEEN!).

HOW TO REPRODUCE:

- Create a ZFS pool; in this case, a raidz2 pool built from the 10 NVMe drives.
- Create a dataset without compression (we want to test actual I/O
  performance).
- Run bonnie++. Because bonnie++ can quickly saturate a single CPU core and
  is therefore unable to generate enough bandwidth for this setup, I run four
  bonnie++ processes concurrently. To demonstrate this issue, each bonnie++
  performs two runs:

    ( bonnie++ -s 512g -x 2 -f ) &   # four times

  A command sketch of these steps appears after the EXPECTED BEHAVIOR section
  below.

Graphs are included. They were made with devilator (an Orca-compatible data
collector) pulling data from devstat(9). The disk shown is just one out of
the 10 (the other 9 graphs are identical, as expected).

The first run of four bonnie++ processes completes without flaws. On graph 1
(TwoBonniesTput) the first bonnie++ runs from the start of the graph to
around 08:30 (the green line is the "Intelligent reading" phase), and a
second bonnie++ starts right after it. Bonnie++ performs several tests,
beginning with a write test (blue line showing around 230 MBps, from the
start to 07:40), followed by a read/write test (from 07:40 to 08:15 on the
graphs) showing read/write/delete activity, and finally a read test (green
line showing 250 MBps from roughly 08:15 to 08:30).

After bonnie++ ends, the files it created are deleted. In this particular
test, four concurrent bonnie++ processes created four files of 512 GB each,
a total of 2 TB. After the first run, the disks show TRIM activity going on
at a rate of around 200 MB/s. That seems quite slow: a test I did at home on
an OCZ Vertex4 SSD (albeit a single drive, not a pool) gave a peak of
2 GB/s. But I understand that the ada driver coalesces TRIM requests, while
the nvd driver doesn't.

The trouble is: the second bonnie++ run starts right after the first one,
and THERE IS ALMOST NO WRITE ACTIVITY FOR 15 MINUTES. Write activity is
simply frozen, and it doesn't pick up until about 08:45, stalling again,
although for a shorter time, around 08:50.

On exhibit 2 (TwoBonniesTimes) it can be seen that the write latency during
the stall is zero, which means (unless I am wrong) that no write commands
are actually reaching the disks.

During the stalls the ZFS system was unresponsive. Commands as simple as
"zfs list" were painfully slow, taking even several minutes to complete.

EXPECTED BEHAVIOR:

I understand that heavy TRIM activity must have an impact, but in this case
it causes complete starvation of the rest of the ZFS I/O activity, which is
clearly wrong. This behavior could cause a severe problem, for example, when
destroying a large snapshot. In this case, the system is deleting 2 TB of
data.
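For reference, a minimal sketch of the reproduction steps above. The pool
name "tank", the dataset "tank/test", the device names nvd0 through nvd9,
and the extra bonnie++ -d/-u arguments are assumptions for illustration,
not taken from the original report:

    # raidz2 pool from the 10 NVMe devices (device names assumed)
    zpool create tank raidz2 nvd0 nvd1 nvd2 nvd3 nvd4 nvd5 nvd6 nvd7 nvd8 nvd9

    # dataset without compression, so the test measures real I/O throughput
    zfs create -o compression=off tank/test

    # four concurrent bonnie++ processes, each performing two 512 GB runs
    for i in 1 2 3 4; do
        ( bonnie++ -d /tank/test -s 512g -x 2 -f -u root ) &
    done
    wait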
ATTEMPTS TO MITIGATE IT:

The first thing I tried was to reduce the priority of the TRIM operations in
the I/O scheduler:

vfs.zfs.vdev.trim_max_pending=100
vfs.zfs.vdev.trim_max_active=1
vfs.zfs.vdev.async_write_min_active=8

This had no visible effect.

After reading the article describing the ZFS I/O scheduler, I suspected that
the TRIM activity might be activating the write throttle, so I simply
disabled it:

vfs.zfs.delay_scale=0

But that didn't help either. The writing processes still got stuck, but on
dp->dp_s rather than dmu_tx_delay.

There are two problems here. The nvd driver doesn't coalesce TRIM requests,
while ZFS is dumping a large number of TRIM requests on the assumption that
the lower layer will coalesce them. I don't think it's a good idea for ZFS
to make such an assumption blindly. On the other hand, I think some
throttling mechanism should be applied to TRIM requests.

-- 
You are receiving this mail because:
You are the assignee for the bug.
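For completeness, a sketch of how the tunables above could be applied. It is
assumed here that they are adjustable at runtime via sysctl(8); some
vfs.zfs.* knobs on 10.3 may instead require /boot/loader.conf and a reboot:

    # lower TRIM priority in the ZFS I/O scheduler
    sysctl vfs.zfs.vdev.trim_max_pending=100
    sysctl vfs.zfs.vdev.trim_max_active=1
    sysctl vfs.zfs.vdev.async_write_min_active=8

    # disable the ZFS write throttle
    sysctl vfs.zfs.delay_scale=0

    # or set the same name=value lines in /boot/loader.conf to make them
    # persistent across reboots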