Date: Mon, 21 May 2018 20:27:12 +0800
From: 崔灏 (CUI Hao) <cuihao.leo@gmail.com>
To: freebsd-questions@freebsd.org
Subject: Bulk deletion of dedup-ed files: zfskern stuck for long time
Message-ID: <CAJm2p=K7ax%2B%2Bo8OxDBPQX4HcmZX6S1v4GfA7HPdY7gjg-asEgA@mail.gmail.com>
New to this list. Hello everyone.

I have been using Linux for years, but I am new to FreeBSD. I set up a FreeBSD ZFS file server for my lab. It is not an enterprise setup, just a small tower server with 16GB of memory and 8x 2TB SAS disks. We store many small files (about 10 million now) in several ZFS datasets.

Wishing to try advanced ZFS features, I enabled dedup on our ZFS pool at first. Later I realized it was a terrible idea, when we experienced severe performance degradation. 16GB seems to be too little memory for dedup. Leaving out the long story of my dedup nightmare: I have now disabled dedup, but the existing dedup-ed files are kept.

Everything seems fine as long as I only read these dedup-ed files. The problem comes when I delete many dedup-ed files at once. Bulk deletion can render the ZFS pool non-responsive for anywhere from minutes to an hour. Every ZFS access is stuck in D state. While it is stuck, gstat shows that none of the vdevs is fully utilized (%busy is about 20%), which I think is abnormal.

Today I destroyed a ZFS pool with about a million dedup-ed files. For two hours, the pool became non-responsive from time to time. I spent some time debugging it. "procstat -kk" on zfskern shows one suspicious stuck thread:

    51021 100695 zfskern txg_thread_enter mi_switch+0xe5 sleepq_wait+0x3a
    _cv_wait+0x169 zio_wait+0x8b dbuf_read+0x71a dmu_buf_hold_by_dnode+0x3d
    zap_get_leaf_byblk+0x4d fzap_remove+0x90 zap_remove_uint64+0xc7
    ddt_sync+0x553 dsl_scan_sync+0x3f8 spa_sync+0x9d7 txg_sync_thread+0x3f3
    fork_exit+0x85 fork_trampoline+0xe

or sometimes:

    51021 100695 zfskern txg_thread_enter mi_switch+0xe5 sleepq_wait+0x3a
    _cv_wait+0x169 zio_wait+0x8b dbuf_read+0x71a dmu_buf_hold_by_dnode+0x3d
    zap_get_leaf_byblk+0x4d fzap_update+0x103 zap_update_uint64+0xf8
    ddt_zap_update+0x69 ddt_sync+0x754 dsl_scan_sync+0x3f8 spa_sync+0x9d7
    txg_sync_thread+0x3f3 fork_exit+0x85 fork_trampoline+0xe

I know little about ZFS internals. It seems the thread was stuck in some sync action (cleaning the dedup table?).
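To make the two traces easier to compare, here is a minimal sketch of how a "procstat -kk" line could be split into frames and checked for the pattern seen above. The helper names parse_kstack and waits_on_ddt_read are hypothetical, and the interpretation (a sync thread blocked in zio_wait somewhere below ddt_sync is waiting for a read issued while updating the dedup table) is an assumption drawn only from the frame order in the trace, not a confirmed diagnosis.

```python
import re

# One line of `procstat -kk` output, copied from the first trace above.
SAMPLE = (
    "51021 100695 zfskern txg_thread_enter mi_switch+0xe5 sleepq_wait+0x3a "
    "_cv_wait+0x169 zio_wait+0x8b dbuf_read+0x71a dmu_buf_hold_by_dnode+0x3d "
    "zap_get_leaf_byblk+0x4d fzap_remove+0x90 zap_remove_uint64+0xc7 "
    "ddt_sync+0x553 dsl_scan_sync+0x3f8 spa_sync+0x9d7 txg_sync_thread+0x3f3 "
    "fork_exit+0x85 fork_trampoline+0xe"
)

# A kernel stack frame is printed as function+0xoffset.
FRAME = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+$")


def parse_kstack(line):
    """Return (pid, tid, frames); frames are function names, innermost first.

    Hypothetical helper: fields without a +0x offset (command and thread
    name) are skipped rather than parsed.
    """
    fields = line.split()
    pid, tid = int(fields[0]), int(fields[1])
    frames = [m.group(1) for m in map(FRAME.match, fields[2:]) if m]
    return pid, tid, frames


def waits_on_ddt_read(frames):
    """True if zio_wait appears deeper in the stack than ddt_sync.

    Assumption: this frame order means the thread is waiting for I/O
    that was issued from inside the dedup-table sync.
    """
    if "zio_wait" not in frames or "ddt_sync" not in frames:
        return False
    return frames.index("zio_wait") < frames.index("ddt_sync")


if __name__ == "__main__":
    pid, tid, frames = parse_kstack(SAMPLE)
    print(pid, tid, waits_on_ddt_read(frames))
```

Running such a check on samples taken every few seconds would show how much of the stall is spent in this one code path.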

Can someone explain why the ZFS pool was stuck for such a long time when neither vdev I/O nor CPU was saturated? Is this the expected behaviour?

--
崔灏 / CUI Hao
Homepage: i-yu.me
Twitter: @cuihaoleo
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJm2p=K7ax%2B%2Bo8OxDBPQX4HcmZX6S1v4GfA7HPdY7gjg-asEgA>