Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 2 Dec 2019 15:32:39 +0100
From:      Peter Eriksson <pen@lysator.liu.se>
To:        freebsd-fs@freebsd.org
Subject:   Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Message-ID:  <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>

next in thread | raw e-mail | index | archive | help
I=E2=80=99ve been looking at trying to figure out why our servers take =
so long to reboot, where the most time is spent doing a =E2=80=9Cshutdown=E2=
=80=9D. We=E2=80=99ve seen examples where it has taken 10-20 minutes (or =
more).

This is Dell PowerEdge R730xd servers with 256GB RAM and ~140TB of =
disks. FreeBSD 11.3. With ~24000 filsystems per server.
We normally cap ARC to 128GB RAM.

Adding a lot of debugging printf() calls to relevant parts of the code =
points to:

kern_shutdown() ->
  EVENTHANDLER_INVOKE(shutdown_post_sync) ->
  zfsshutdown() ->=20
  zfs__fini() ->=20
  spa_fini() ->=20
  zio_fini():

Debug output from a test run:
zio_fini: kmem_cache_destroy(zio_buf_cache & zio_data_buf_cache):

kmem_cache_destroy: uma_zfree_arg(0xfffff803465eec00) [zio_buf_12288] =
took 16 seconds
kmem_cache_destroy(zio_buf_cache[20]) took 16 seconds

kmem_cache_destroy: uma_zfree_arg(0xfffff803465eeb00) [zio_buf_16384] =
took 61 seconds
kmem_cache_destroy(zio_buf_cache[28]) took 61 seconds

kmem_cache_destroy: uma_zfree_arg(0xfffff8034c9018c0) [zio_buf_131072] =
took 87 seconds
kmem_cache_destroy(zio_buf_cache[224]) took 87 seconds

kmem_cache_destroy: uma_zfree_arg(0xfffff8034c901880) =
[zio_data_buf_131072] took 5 seconds
kmem_cache_destroy(zio_data_buf_cache[224]) took 5 seconds

(I modified the code here to print the time spent if it took 2 seconds =
or more)

This is on a newly rebooted server (with all filesystems mounted). Seems =
like uma_zfree_arg() is taking really long to execute. Now that code =
isn=E2=80=99t exactly easy to read (for me atleast)=E2=80=A6 Lot=E2=80=99s=
 of barrier/locks and stuff.

I wonder why this code should take so long? There shouldn=E2=80=99t be =
any disk I/O involved and it=E2=80=99s just a cache so I wonder if there =
might be some way to get rid of it quicker? Any UMA experts online? :-)

Reason for this is that I=E2=80=99d like to be able to make sure a =
server reboots more quickly in case of problems. Now with the parallell =
ZFS mount stuff being done at boot time that part is much quicker :-).

- Peter




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AD17E454-6A51-436D-A853-07F04A406EC9>