Date:      Sat, 14 Mar 2020 16:36:49 +0100
From:      Peter Eriksson <pen@lysator.liu.se>
To:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   ZFS/NFS hickups and some tools to monitor stuff...
Message-ID:  <CFD0E4E5-EF2B-4789-BF14-F46AC569A191@lysator.liu.se>

The last couple of weeks I've been fighting with a severe case of NFS users complaining about slow response times from our (5) FreeBSD 11.3-RELEASE-p6 file servers. Even though our SMB (Windows) users (thankfully, since they number around 500 per server vs 50 NFS users) didn't see the same slowdown (or at least didn't complain about it), the root cause is probably ZFS-related.

We've identified a number of cases where some ZFS operation can cause a severe slowdown of NFS operations, and I've been trying to figure out what the cause is and ways to mitigate the problem...

Some operations that have caused issues:

1. Resilver (basically made NFS service useless during the week it took...), with response times for NFS operations regularly up to 10 seconds or more (vs the normal 1-10 ms)

2. Recursive deferred snapshot destruction ("zfs destroy -dr DATA@snapnam"). Especially bad together with filesystems at or near quota.

3. Rsync cloning of data onto the servers. Response times of up to 15 minutes were seen... Yes, 15 minutes to do a mkdir("test-dir"). Possibly in conjunction with #1 above....

Previously, #1 and #2 haven't caused that many problems, and neither has #3. Something has changed in the last half year or so, but so far I haven't been able to figure out what.

In order to test/figure things out I've written a couple of tools that others might find useful:


A couple of DTrace scripts to watch and monitor the timing of kernel calls related to NFS, GSS & ZFS operations (the "nfsuser" script is a simple "nfs user top" tool). Also a patch that adds a sysctl that can be set to have the NFS kernel code be verbose about slow NFS operations:

  - https://github.com/ptrrkssn/freebsd-stuff

With the kernel patch (nfsd-verbose-timing-11.3.patch or nfsd-verbose-timing-12.1.patch) applied, it'll look like this:

(Alert if an op takes more than 1000 µs):
# sysctl vfs.nfsd.verbose_timing=1000

nfsrvd_dorpc(vers=4.1, uid=65534, procnum=9, repstat=0) took 2853 µs
nfsrvd_dorpc(vers=4.0, uid=1003258, procnum=17, repstat=10009) took 5433 µs
nfsrvd_dorpc(vers=4.1, uid=65534, procnum=9, repstat=0) took 2026 µs

(The DTrace scripts can do similar stuff, but just in case, I wanted a real in-kernel way to see whether delays are caused by something on the machine or by something else (network delays etc.).)



A very simple C program that, in a loop, tests some simple filesystem operations and measures the time they take:

  - https://github.com/ptrrkssn/pfst


Mount a number of file servers on /mnt/filur01, /mnt/filur02, /mnt/filur03 etc, then run this to have it complain if an operation takes more than 100 ms.
(With "-v" it will print the time each operation takes.)

$ ./pfst -t100ms /mnt/filur0*
2020-03-14 10:56:36 [ 109 ms]: /mnt/filur03: mkdir("t-omnibus-615-53202") [Time limit exceeded]
2020-03-14 11:14:22 [ 536 s ]: /mnt/filur03: mkdir("t-omnibus-615-53729") [Time limit exceeded]
2020-03-14 11:31:47 [  13 m ]: /mnt/filur03: mkdir("t-omnibus-615-53965") [Time limit exceeded]
2020-03-14 11:35:44 [ 182 ms]: /mnt/filur03: mkdir("t-omnibus-615-54201") [Time limit exceeded]
2020-03-14 12:20:03 [  15 m ]: /mnt/filur03: mkdir("t-omnibus-615-55908") [Time limit exceeded]
2020-03-14 12:39:09 [  15 m ]: /mnt/filur03: mkdir("t-omnibus-615-56103") [Time limit exceeded]
2020-03-14 12:50:58 [ 466 s ]: /mnt/filur03: mkdir("t-omnibus-615-56344") [Time limit exceeded]

With "-v":
2020-03-14 16:27:48 [1349 µs]: /mnt/filur01: mkdir("t-omnibus-637-2")
2020-03-14 16:27:48 [ 327 µs]: /mnt/filur01: rmdir("t-omnibus-637-2")


It would be interesting to see if others, too, are seeing ZFS and/or NFS slowdowns during heavy writing operations (resilver, snapshot destroy, rsync)...


Our DATA pools are basically 2xRAIDZ2(4+2) of 10 TB 7200 rpm disks + 400 GB SSDs for ZIL + 400 GB SSDs for L2ARC. 256 GB RAM, configured with ARC max set to 64 GB (it used to be 128 GB, but we ran into out-of-memory situations with the 500+ Samba smbd daemons competing for the RAM...)
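(For anyone wanting to reproduce that ARC cap: on FreeBSD the usual way is the vfs.zfs.arc_max loader tunable; a sketch, with the 64 GB value from our setup above expressed in bytes:)

```
# /boot/loader.conf -- cap the ZFS ARC at 64 GB
vfs.zfs.arc_max="68719476736"
```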


We've tried it with and without L2ARC, and replaced the SSDs. Disabled TRIM. Not much difference. Tried tweaking various sysctls, but no difference seen so far. Annoying problem, this...

- Peter



