Date: Thu, 11 May 2017 17:06:23 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 219224] umount hanging in _cv_wait zil_commit zfs_sync sys_sync
Message-ID: <bug-219224-8@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219224

            Bug ID: 219224
           Summary: umount hanging in _cv_wait zil_commit zfs_sync sys_sync
           Product: Base System
           Version: 11.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: t@sdf.org

I'm observing hanging processes on a ZFS system, apparently after umount
tried to sync data before unmounting a jail's devfs.

tools-1-b# service jail restart git-1-b
Stopping jails: git-1-b
load: 1.77  cmd: umount 79620 [zilog->zl_cv_batch[0]] 568.08r 0.00u 0.01s 0% 2000k
load: 1.65  cmd: umount 79620 [zilog->zl_cv_batch[0]] 575.16r 0.00u 0.01s 0% 2000k
load: 2.32  cmd: umount 79620 [zilog->zl_cv_batch[0]] 580.44r 0.00u 0.01s 0% 2000k

tools-1-b# procstat -kk 79620
  PID    TID COMM             TDNAME           KSTACK
79620 100216 umount           -                mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 zil_commit+0x85 zfs_sync+0xad sys_sync+0x146
amd64_syscall+0x4ce Xfast_syscall+0xfb

tools-1-b# kgdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)
...
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
#0  0xffffffff80b0cafb in sched_switch ()
(kgdb) tid 100216
[Switching to thread 2772 (Thread 100216)]#0  0xffffffff80b0cafb in sched_switch ()
(kgdb) bt
#0  0xffffffff80b0cafb in sched_switch ()
#1  0xffffffff80ae4ae2 in mi_switch ()
#2  0xffffffff80b3226a in sleepq_wait ()
#3  0xffffffff80a66464 in _cv_wait ()
#4  0xffffffff82252945 in ?? ()
#5  0xfffffe201d5308e0 in ?? ()
#6  0xfffff805d1af7500 in ?? ()
#7  0xfffffe201d530870 in ?? ()
#8  0xffffffff80bae521 in __mnt_vnode_markerfree_all ()
#9  0xffffffff8227aded in ?? ()
#10 0x00000000fffff7ff in ?? ()
#11 0xfffffe201d530968 in ?? ()
#12 0xfffffe201d530990 in ?? ()
#13 0xffffffff80bb48a6 in sys_sync ()
Previous frame identical to this frame (corrupt stack?)
(kgdb)

Two other processes are hanging too:

tools-1-b# ps ax | awk '$3~"D"&&$5~"^/"'
36031  -  DsJ   0:00.49 /var/cfengine/bin/cf-agent -Dfrom_cfexecd,scheduled_
40395  -  DJ    0:00.01 /var/cfengine/bin/cf-promises -c /var/cfengine/input
79620  0  D+    0:00.01 /sbin/umount /jails/git-1-b/dev

tools-1-b# procstat -kk 36031
  PID    TID COMM             TDNAME           KSTACK
36031 103064 cf-agent         -                mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 zil_commit+0x85 zfs_freebsd_putpages+0x71e
VOP_PUTPAGES_APV+0x8d vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
vinactive+0x17d

tools-1-b# procstat -kk 40395
  PID    TID COMM             TDNAME           KSTACK
40395 103070 cf-promises      -                mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 zil_commit+0x85 zfs_freebsd_putpages+0x71e
VOP_PUTPAGES_APV+0x8d vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
vinactive+0x17d

Everything is on ZFS, with both the mothership and the jail (in
zroot/jails/cfengine-1-b) sharing pool "zroot":

tools-1-b# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data   2.91T  1.43M  2.91T         -     0%     0%  2.14x  ONLINE  -
zroot   744G  19.6G   724G         -    19%     2%  1.60x  ONLINE  -

I also observe another process, likewise inside the cfengine jail, being
reported at 100% CPU usage, which is highly unusual:

tools-1-b# ps w 2988
 PID TT  STAT      TIME COMMAND
2988  -  TsJ  118:40.91 /usr/local/sbin/cf-serverd

tools-1-b# procstat -kk 2988
  PID    TID COMM             TDNAME           KSTACK
 2988 100835 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
 2988 100922 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
 2988 100963 cf-serverd       -                dmu_object_info_from_db+0x2d
zfs_zget+0xde zfs_get_data+0x40 zil_commit+0x6a1 zfs_freebsd_putpages+0x71e
VOP_PUTPAGES_APV+0x8d vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
vinactive+0x17d
 2988 100966 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
 2988 100970 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
 2988 100973 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f
[ ... 1000 more threads with exactly "mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f" ]

This thread 100963 seems to be looping around:

tools-1-b# {repeat 20 procstat -k 2988} | awk '$2=="100963"' | cut -d- -f3- | sort | uniq
 dbuf_hold_impl dbuf_hold dnode_hold_impl dmu_bonus_hold zfs_zget
zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim
VOP_RECLAIM_APV vgonel vrecycle
 dbuf_read dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV
 dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data
zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
zfs_freebsd_inactive
 dmu_bonus_hold zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages
VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV vinactive
 dmu_object_info_from_db zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV vinactive
 dmu_object_info_from_dnode dmu_object_info_from_db zfs_zget
zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim
VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV
 dmu_zfetch dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget
zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim
VOP_RECLAIM_APV vgonel vrecycle
 dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV
 dnode_rele dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV
 zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages
VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV vinactive vputx

Only the processes listed above are affected; there are ~200 other
processes running (and writing to the zroot pool).

-- 
You are receiving this mail because:
You are the assignee for the bug.
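An aside on the sampling pipeline above: `repeat 20 ...` is csh/tcsh syntax. The same sample-filter-deduplicate idea can be sketched in POSIX sh; in this sketch `procstat -k 2988` is replaced by a stub function (`sample_stacks`, a name invented here) emitting canned procstat-style lines, so only the pipeline itself is demonstrated. On the live system you would call the real `procstat -k` instead.

```shell
#!/bin/sh
# Stub standing in for `procstat -k 2988`: two canned lines imitating
# procstat output for thread 100963 (hypothetical sample data).
sample_stacks() {
    printf '2988 100963 cf-serverd -          zil_commit zfs_get_data\n'
    printf '2988 100963 cf-serverd -          dbuf_read dmu_bonus_hold\n'
}

# Take 20 samples, keep only thread 100963, strip everything up to the
# TDNAME dash with cut, then reduce to the set of distinct stacks seen.
unique_stacks=$(
    i=0
    while [ "$i" -lt 20 ]; do
        sample_stacks
        i=$((i + 1))
    done | awk '$2 == "100963"' | cut -d- -f3- | sort | uniq
)
echo "$unique_stacks"
```

With the stub's two alternating stacks, the 40 sampled lines collapse to two distinct entries, which is exactly how the original one-liner condenses a busy-looping thread's stacks into a small cycle of call chains.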