Date:      Sun, 13 Nov 2016 12:06:00 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Henri Hennebert <hlh@restart.be>, freebsd-stable@FreeBSD.org
Cc:        Konstantin Belousov <kib@FreeBSD.org>
Subject:   Re: Freebsd 11.0 RELEASE - ZFS deadlock
Message-ID:  <32686283-948a-6faf-7ded-ed8fcd23affb@FreeBSD.org>
In-Reply-To: <599c5a5b-aa08-2030-34f3-23ff19d09a9b@restart.be>
References:  <0c223160-b76f-c635-bb15-4a068ba7efe7@restart.be> <aaf2df40-b0df-2141-9ed8-5b947d8d5a33@FreeBSD.org> <43c9d4d4-1995-5626-d70a-f92a5b456629@FreeBSD.org> <a14d508d-351f-71f4-c7cc-ac73dbcde357@restart.be> <9d1f9a76-5a8d-6eca-9a50-907d55099847@FreeBSD.org> <6bc95dce-31e1-3013-bfe3-7c2dd80f9d1e@restart.be> <e4878992-a362-3f12-e743-8efa1347cabf@FreeBSD.org> <23a66749-f138-1f1a-afae-c775f906ff37@restart.be> <8e7547ef-87f7-7fab-6f45-221e8cea1989@FreeBSD.org> <6d991cea-b420-531e-12cc-001e4aeed66b@restart.be> <67f2e8bd-bff0-f808-7557-7dabe5cad78c@FreeBSD.org> <1cb09c54-5f0e-2259-a41a-fefe76b4fe8b@restart.be> <d25c8035-b710-5de9-ebe3-7990b2d0e3b1@FreeBSD.org> <9f20020b-e2f1-862b-c3fc-dc6ff94e301e@restart.be> <c1b7aa94-1f1d-7edd-8764-adb72fdc053c@FreeBSD.org> <599c5a5b-aa08-2030-34f3-23ff19d09a9b@restart.be>

On 12/11/2016 14:40, Henri Hennebert wrote:
> I attach it

Thank you!
So, these two threads are trying to get the lock in exclusive mode (both vget
frames show the same vnode, vp=0xfffff80049c2c000):
Thread 687 (Thread 101243):
#0  sched_switch (td=0xfffff800b642b500, newtd=0xfffff8000285ea00, flags=<value
optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0xffffffff80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0xffffffff8052f854 in sleeplk (lk=<value optimized out>, flags=<value
optimized out>, ilk=<value optimized out>, wmesg=0xffffffff813be535 "zfs",
pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0xffffffff8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value
optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>,
pri=<value optimized out>, timo=<value optimized out>, file=<value optimized
out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0xffffffff80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0xffffffff8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value
optimized out>) at vnode_if.c:2087
#7  0xffffffff8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864,
file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=548864,
td=0xfffff800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0xffffffff806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value
optimized out>, cnp=<value optimized out>, tsp=<value optimized out>,
ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0xffffffff806133dc in vfs_cache_lookup (ap=<value optimized out>) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0xffffffff80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value
optimized out>) at vnode_if.c:127
#12 0xffffffff8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0xffffffff8061c492 in namei (ndp=<value optimized out>) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0xffffffff80509395 in kern_execve (td=<value optimized out>, args=<value
optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0xffffffff80508ccc in sys_execve (td=0xfffff800b642b500,
uap=0xfffffe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
#16 0xffffffff808d449e in amd64_syscall (td=<value optimized out>, traced=0) at
subr_syscall.c:135
#17 0xffffffff808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396

Thread 681 (Thread 101147):
#0  sched_switch (td=0xfffff80065f4e500, newtd=0xfffff8000285f000, flags=<value
optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0xffffffff80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0xffffffff8052f854 in sleeplk (lk=<value optimized out>, flags=<value
optimized out>, ilk=<value optimized out>, wmesg=0xffffffff813be535 "zfs",
pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0xffffffff8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value
optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>,
pri=<value optimized out>, timo=<value optimized out>, file=<value optimized
out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0xffffffff80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0xffffffff8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value
optimized out>) at vnode_if.c:2087
#7  0xffffffff8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864,
file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=548864,
td=0xfffff80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0xffffffff806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value
optimized out>, cnp=<value optimized out>, tsp=<value optimized out>,
ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0xffffffff806133dc in vfs_cache_lookup (ap=<value optimized out>) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0xffffffff80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value
optimized out>) at vnode_if.c:127
#12 0xffffffff8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0xffffffff8061c492 in namei (ndp=<value optimized out>) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0xffffffff80509395 in kern_execve (td=<value optimized out>, args=<value
optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0xffffffff80508ccc in sys_execve (td=0xfffff80065f4e500,
uap=0xfffffe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
#16 0xffffffff808d449e in amd64_syscall (td=<value optimized out>, traced=0) at
subr_syscall.c:135
#17 0xffffffff808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396
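
The flags=548864 value in the _vn_lock and vget frames above can be decoded to
check that the request is an exclusive one.  What follows is only a sketch: the
LK_* values are assumed to match sys/sys/lockmgr.h in the 11.0 tree and should
be verified against the source, and decode_lk_flags is a hypothetical helper,
not part of any FreeBSD tool.

```python
# Lock-request flag values, ASSUMED to match sys/sys/lockmgr.h in FreeBSD 11;
# verify them against your own source tree before relying on the result.
LK_FLAGS = {
    0x002000: "LK_NODDLKTREAT",
    0x004000: "LK_VNHELD",
    0x080000: "LK_EXCLUSIVE",
    0x200000: "LK_SHARED",
}


def decode_lk_flags(value):
    """Return names of the known flag bits set in value, plus leftover bits in hex."""
    names = [name for bit, name in sorted(LK_FLAGS.items()) if value & bit]
    known = sum(bit for bit in LK_FLAGS if value & bit)
    leftover = value & ~known
    if leftover:
        # Bits not present in the table above are reported rather than guessed.
        names.append(hex(leftover))
    return names


# 548864 is the decimal flags value printed by gdb in frames #7 and #8.
print(hex(548864), decode_lk_flags(548864))
```

Under those assumed values, 548864 is 0x86000, which includes the LK_EXCLUSIVE
bit and not the LK_SHARED bit.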

And the original stuck thread wants to get the lock in shared mode.
There should also be another thread that already holds the lock in shared
mode, but I am not able to identify it.  I wonder if the original thread could
be trying to get the lock recursively...
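
One way to narrow the search is to scan a saved all-threads backtrace for every
thread whose frames mention the contended vnode pointer.  This is a rough
sketch, assuming the backtrace was saved as plain text in the same format as
the excerpts above; threads_by_vnode is a hypothetical helper.  A holder whose
vp argument was optimized out in every frame will not be found this way.

```python
import re
from collections import defaultdict

# Matches gdb thread headers such as "Thread 687 (Thread 101243):".
THREAD_RE = re.compile(r"^Thread \d+ \(Thread (\d+)\):")
# Matches vnode arguments that were not optimized out, e.g. "vp=0xfffff80049c2c000".
# The word boundary keeps "dvp=" and similar names from matching.
VP_RE = re.compile(r"\bvp=(0x[0-9a-f]+)")


def threads_by_vnode(backtrace_text):
    """Map each vnode pointer to the sorted kernel thread ids whose frames mention it."""
    result = defaultdict(set)
    tid = None
    for line in backtrace_text.splitlines():
        header = THREAD_RE.match(line.strip())
        if header:
            tid = int(header.group(1))
            continue
        if tid is None:
            continue  # skip anything before the first thread header
        for vp in VP_RE.findall(line):
            result[vp].add(tid)
    return {vp: sorted(tids) for vp, tids in result.items()}


# Frame #8 of the two threads quoted in this message.
sample = """\
Thread 687 (Thread 101243):
#8  0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=548864,
td=0xfffff800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
Thread 681 (Thread 101147):
#8  0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=548864,
td=0xfffff80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
"""
print(threads_by_vnode(sample))
```

Run over the full backtrace, any additional thread id listed for
0xfffff80049c2c000 would be a candidate worth inspecting by hand.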

It would be interesting to get more details from thread 101112.
You can switch to it using the tid command and use 'fr' to select frames.  Then
'info locals' and 'info args' show which variables are available (not optimized
out), and you can print any that look interesting.  It would be nice to get the
file path and the directory vnode for which the lookup is called.

Thank you.

-- 
Andriy Gapon


