Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 11 May 2017 17:06:23 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 219224] umount hanging in _cv_wait zil_commit zfs_sync sys_sync
Message-ID:  <bug-219224-8@https.bugs.freebsd.org/bugzilla/>

next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D219224

            Bug ID: 219224
           Summary: umount hanging in _cv_wait zil_commit zfs_sync
                    sys_sync
           Product: Base System
           Version: 11.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: t@sdf.org

I'm observing hanging processes on a zfs system, apparently after umount
tried to sync data before unmounting a jails devfs.

tools-1-b# service jail restart git-1-b
Stopping jails: git-1-b
load: 1.77  cmd: umount 79620 [zilog->zl_cv_batch[0]] 568.08r 0.00u 0.01s 0%
2000k
load: 1.65  cmd: umount 79620 [zilog->zl_cv_batch[0]] 575.16r 0.00u 0.01s 0%
2000k
load: 2.32  cmd: umount 79620 [zilog->zl_cv_batch[0]] 580.44r 0.00u 0.01s 0%
2000k

tools-1-b# procstat -kk 79620
  PID    TID COMM             TDNAME           KSTACK=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
79620 100216 umount           -                mi_switch+0xd2 sleepq_wait+0=
x3a
_cv_wait+0x194 zil_commit+0x85 zfs_sync+0xad sys_sync+0x146 amd64_syscall+0=
x4ce
Xfast_syscall+0xfb=20

tools-1-b# kgdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain condition=
s.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols
found)
...
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
#0  0xffffffff80b0cafb in sched_switch ()
(kgdb) tid 100216
[Switching to thread 2772 (Thread 100216)]#0  0xffffffff80b0cafb in
sched_switch ()
(kgdb) bt
#0  0xffffffff80b0cafb in sched_switch ()
#1  0xffffffff80ae4ae2 in mi_switch ()
#2  0xffffffff80b3226a in sleepq_wait ()
#3  0xffffffff80a66464 in _cv_wait ()
#4  0xffffffff82252945 in ?? ()
#5  0xfffffe201d5308e0 in ?? ()
#6  0xfffff805d1af7500 in ?? ()
#7  0xfffffe201d530870 in ?? ()
#8  0xffffffff80bae521 in __mnt_vnode_markerfree_all ()
#9  0xffffffff8227aded in ?? ()
#10 0x00000000fffff7ff in ?? ()
#11 0xfffffe201d530968 in ?? ()
#12 0xfffffe201d530990 in ?? ()
#13 0xffffffff80bb48a6 in sys_sync ()
Previous frame identical to this frame (corrupt stack?)
(kgdb)=20

Two other processes are hanging too:

tools-1-b# ps ax | awk '$3~"D"&&$5~"^/"'
36031  -  DsJ      0:00.49 /var/cfengine/bin/cf-agent -Dfrom_cfexecd,schedu=
led_
40395  -  DJ       0:00.01 /var/cfengine/bin/cf-promises -c /var/cfengine/i=
nput
79620  0  D+       0:00.01 /sbin/umount /jails/git-1-b/dev

tools-1-b# procstat -kk 36031
  PID    TID COMM             TDNAME           KSTACK=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
36031 103064 cf-agent         -                mi_switch+0xd2 sleepq_wait+0=
x3a
_cv_wait+0x194 zil_commit+0x85 zfs_freebsd_putpages+0x71e VOP_PUTPAGES_APV+=
0x8d
vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
vinactive+0x17d

tools-1-b# procstat -kk 40395
  PID    TID COMM             TDNAME           KSTACK=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
40395 103070 cf-promises      -                mi_switch+0xd2 sleepq_wait+0=
x3a
_cv_wait+0x194 zil_commit+0x85 zfs_freebsd_putpages+0x71e VOP_PUTPAGES_APV+=
0x8d
vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
vinactive+0x17d

Everything is on zfs, with both the mothership and the
jail in zroot/jails/cfengine-1-b sharing pool "zroot":

tools-1-b# zpool list=20
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data   2.91T  1.43M  2.91T         -     0%     0%  2.14x  ONLINE  -
zroot   744G  19.6G   724G         -    19%     2%  1.60x  ONLINE  -

I observe another process being reported at 100%
cpu usage, that's highly unusual, also inside the cfengine-jail:

tools-1-b# ps w 2988
 PID TT  STAT      TIME COMMAND
2988  -  TsJ  118:40.91 /usr/local/sbin/cf-serverd

tools-1-b# procstat -kk 2988
  PID    TID COMM             TDNAME           KSTACK=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
 2988 100835 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f=20
 2988 100922 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f=20
 2988 100963 cf-serverd       -                dmu_object_info_from_db+0x2d
zfs_zget+0xde zfs_get_data+0x40 zil_commit+0x6a1 zfs_freebsd_putpages+0x71e
VOP_PUTPAGES_APV+0x8d vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
vinactive+0x17d=20
 2988 100966 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f=20
 2988 100970 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f=20
 2988 100973 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f

[ ... 1000 more threads with exactly
"mi_switch+0xd2 thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f"]

This thread 100963 seems to be looping around:

tools-1-b# {repeat 20 procstat -k 2988} | awk '$2=3D=3D"100963"' | cut -d- =
-f3- |
sort | uniq=20
                dbuf_hold_impl dbuf_hold dnode_hold_impl dmu_bonus_hold
zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV
vgonel vrecycle=20
                dbuf_read dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV=20
                dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_d=
ata
zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
zfs_freebsd_inactive=20
                dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV vinactive=20
                dmu_object_info_from_db zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV vinactive=20
                dmu_object_info_from_dnode dmu_object_info_from_db zfs_zget
zfsget_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV
vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV=20
                dmu_zfetch dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget
zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV
vgonel vrecycle=20
                dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data zil_co=
mmit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_o
bject_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV=20
                dnode_rele dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV=20
                zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages
VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV vinactive vputx

Only the above listed processes are affected, there
are ~200 other ones running (and writing to zroot pool).

--=20
You are receiving this mail because:
You are the assignee for the bug.=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-219224-8>