Date:      Thu, 22 Aug 2013 03:01:07 -0400
From:      J David <j.david.lists@gmail.com>
To:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   Re: NFS deadlock on 9.2-Beta1
Message-ID:  <CABXB=RQZNWg7wmajNWrBLQAiUsAYXqMFAF1GVpFTMf2QvqLqWw@mail.gmail.com>
In-Reply-To: <461961460.12238255.1377133690607.JavaMail.root@uoguelph.ca>
References:  <20130821131032.GX4972@kib.kiev.ua> <461961460.12238255.1377133690607.JavaMail.root@uoguelph.ca>

Now that a kernel with INVARIANTS/WITNESS is finally available on a
machine with a serial console, I am having terrible trouble provoking
the deadlock.  (The machine grinds to a halt if I put the usual test
load on it, due to all the debug code in the kernel.)

I did get this interesting LOR, though it did not cause a deadlock:

lock order reversal:
 1st 0xfffffe000adb9f30 so_snd_sx (so_snd_sx) @ /usr/src/sys/kern/uipc_sockbuf.c:145
 2nd 0xfffffe000aa5b098 newnfs (newnfs) @ /usr/src/sys/kern/uipc_syscalls.c:2062
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff834c3995c0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff834c399670
witness_checkorder() at witness_checkorder+0xc0a/frame 0xffffff834c3996f0
__lockmgr_args() at __lockmgr_args+0x390/frame 0xffffff834c399810
nfs_lock1() at nfs_lock1+0x87/frame 0xffffff834c399840
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xffffff834c399870
_vn_lock() at _vn_lock+0x63/frame 0xffffff834c3998d0
kern_sendfile() at kern_sendfile+0x812/frame 0xffffff834c399ac0
do_sendfile() at do_sendfile+0x92/frame 0xffffff834c399b20
amd64_syscall() at amd64_syscall+0x259/frame 0xffffff834c399c30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff834c399c30
--- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801b24f4c, rsp = 0x7fffffffcf58, rbp = 0x7fffffffd290 ---
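
For anyone reading along who has not dealt with LOR reports before, here
is a rough sketch of the idea behind the check, in Python.  It is a toy
model, not the kernel's WITNESS code: the OrderChecker class and its
methods are hypothetical, it only tracks direct lock pairs, and the
"earlier" acquisition order is assumed for illustration, since the trace
above only shows the sendfile path and not the path that established the
opposite order.

```python
# Toy lock-order checker. Hypothetical names; NOT FreeBSD's WITNESS code.
# It only remembers direct (earlier, later) pairs, with no transitive orders.

class OrderChecker:
    def __init__(self):
        self.seen = set()   # (earlier, later) pairs observed so far
        self.held = []      # locks currently held, in acquisition order

    def acquire(self, name):
        """Record an acquisition and return any reversals it causes."""
        reports = []
        for h in self.held:
            # The opposite order was seen before: report a reversal.
            if (name, h) in self.seen:
                reports.append((h, name))
            self.seen.add((h, name))
        self.held.append(name)
        return reports

    def release(self, name):
        self.held.remove(name)


checker = OrderChecker()

# Assumed for illustration: some earlier path took the vnode lock first,
# then the socket send-buffer lock.
checker.acquire("newnfs")
checker.acquire("so_snd_sx")
checker.release("so_snd_sx")
checker.release("newnfs")

# The sendfile path in the trace takes them the other way around.
checker.acquire("so_snd_sx")
reversals = checker.acquire("newnfs")
print(reversals)
```

A reversal report like this only says that two orders have been seen, not
that two threads were actually stuck on each other, which fits with the
LOR above showing up without a deadlock.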

Once the real deal pops up, collecting the full requested info should
be no problem, but it could take a while to happen with only one
machine that can't run the full test battery.  So if a "real" fix is
dependent on this, reverting r250907 for 9.2-RELEASE is probably the
way to go. With that configuration, releng/9.2 continues to be pretty
solid for us.

Thanks!

(Since this doesn't contain the requested info, I heavily trimmed the
Cc: list.  It is not my intention to waste the time of everybody
involved.)
