From: John Hickey
Date: Fri, 28 Jan 2011 17:10:41 -0800
To: freebsd-stable@freebsd.org
Subject: nfsd hung on ufs vnode lock

There was a previous thread about this, but it doesn't look like there was any resolution:

http://lists.freebsd.org/pipermail/freebsd-stable/2010-May/056986.html

I run a fileserver for an Emulab (www.emulab.net) system. As such, the exports table is constantly modified as experiments are swapped in and out. We also get a lot of researchers using NFS for strange things. In this case, the exclusive lock was for a cache directory shared by about 36 machines running Ubuntu 8.04 and mounting with NFSv2. Eventually, all our nfsd processes get stuck since the exclusive lock for the directory is never released. I could use any and all pointers on getting this fixed.
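For readers triaging a similar hang, the locked-vnode listing later in this message can be summarised mechanically to find which exclusive holder has waiters queued behind it. The following is a minimal Python sketch and not part of the original report. The names ENTRY, summarize, and contended are hypothetical, and the regular expression assumes the entry layout shown in this message rather than any documented output format.

```python
import re

# One entry of the locked-vnode listing, as formatted in this message.
# The pattern is inferred from that sample alone (an assumption).
ENTRY = re.compile(
    r"(0x[0-9a-f]+):\s+tag\s+(\w+),\s+type\s+(\w+)"             # address, tag, type
    r".*?lock\s+type\s+\w+:\s+(EXCL|SHARED)\s+\(count\s+\d+\)"  # lock mode
    r"(?:\s+by\s+thread\s+(0x[0-9a-f]+)\s+\(pid\s+(\d+)\))?"    # holder, if listed
    r"(?:\s+with\s+(\d+)\s+pending)?",                          # waiters, if listed
    re.DOTALL,
)


def summarize(listing):
    """Return one dict per distinct vnode address, in first-seen order."""
    vnodes = {}
    for m in ENTRY.finditer(listing):
        addr, tag, vtype, mode, thread, pid, pending = m.groups()
        # Keep the first occurrence; a repeated address is the same vnode.
        vnodes.setdefault(addr, {
            "addr": addr,
            "tag": tag,
            "type": vtype,
            "mode": mode,
            "thread": thread,
            "pid": int(pid) if pid else None,
            "pending": int(pending) if pending else 0,
        })
    return list(vnodes.values())


def contended(listing):
    """Exclusively held vnodes that have waiters, most waiters first."""
    held = [v for v in summarize(listing)
            if v["mode"] == "EXCL" and v["pending"]]
    return sorted(held, key=lambda v: v["pending"], reverse=True)


# Entries copied from this message; the second vnode is repeated on purpose.
SAMPLE = """
0xce089cf0: tag syncer, type VNON
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
    lock type syncer: EXCL (count 1) by thread 0xcdb4b000 (pid 46)
0xd1f72678: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 67 mountedhere 0
    flags ()
    v_object 0xd1e90e80 ref 0 pages 1
    lock type ufs: EXCL (count 1) by thread 0xce1146c0 (pid 866) with 62 pending
    ino 143173560, on dev mfid0s1f
0xd1e6f228: tag ufs, type VDIR
    usecount 1, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xd180f480 ref 0 pages 1
    lock type ufs: SHARED (count 1)
    ino 19268907, on dev mfid0s1f
0xd1a37564: tag ufs, type VNON
    usecount 1, writecount 0, refcount 1 mountedhere 0
    flags ()
    lock type ufs: EXCL (count 1) by thread 0xcdb4c240 (pid 871)
    ino 115689129, on dev mfid1s1d
0xd1f72678: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 67 mountedhere 0
    flags ()
    v_object 0xd1e90e80 ref 0 pages 1
    lock type ufs: EXCL (count 1) by thread 0xce1146c0 (pid 866) with 62 pending
    ino 143173560, on dev mfid0s1f
"""

if __name__ == "__main__":
    for v in contended(SAMPLE):
        print(f'{v["addr"]} ({v["tag"]} {v["type"]}): '
              f'pid {v["pid"]} holds EXCL, {v["pending"]} pending')
```

The sketch only ranks lock holders by the number of pending waiters. It does not follow the chain from a holder to the lock that holder is itself sleeping on, which still requires the per-process backtrace.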

What I am running:

jjh@users: ~$ uname -a
FreeBSD users.isi.deterlab.net 7.3-RELEASE-p2 FreeBSD 7.3-RELEASE-p2 #9:
    Tue Sep 14 16:24:57 PDT 2010
    root@users.isi.deterlab.net:/usr/obj/usr/src/sys/USERS7 i386

Here are the sleepchains for my system (note that 0xd1f72678 appears twice):

0xce089cf0: tag syncer, type VNON
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
    lock type syncer: EXCL (count 1) by thread 0xcdb4b000 (pid 46)
0xd1f72678: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 67 mountedhere 0
    flags ()
    v_object 0xd1e90e80 ref 0 pages 1
    lock type ufs: EXCL (count 1) by thread 0xce1146c0 (pid 866) with 62 pending
    ino 143173560, on dev mfid0s1f
0xd1e6f228: tag ufs, type VDIR
    usecount 1, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xd180f480 ref 0 pages 1
    lock type ufs: SHARED (count 1)
    ino 19268907, on dev mfid0s1f
0xd1a37564: tag ufs, type VNON
    usecount 1, writecount 0, refcount 1 mountedhere 0
    flags ()
    lock type ufs: EXCL (count 1) by thread 0xcdb4c240 (pid 871)
    ino 115689129, on dev mfid1s1d

0xce089cf0: tag syncer, type VNON
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
    lock type syncer: EXCL (count 1) by thread 0xcdb4b000 (pid 46)
0xd1f72678: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 67 mountedhere 0
    flags ()
    v_object 0xd1e90e80 ref 0 pages 1
    lock type ufs: EXCL (count 1) by thread 0xce1146c0 (pid 866) with 62 pending
    ino 143173560, on dev mfid0s1f
0xd1e6f228: tag ufs, type VDIR
    usecount 1, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xd180f480 ref 0 pages 1
    lock type ufs: SHARED (count 1)
    ino 19268907, on dev mfid0s1f
0xd1a37564: tag ufs, type VNON
    usecount 1, writecount 0, refcount 1 mountedhere 0
    flags ()
    lock type ufs: EXCL (count 1) by thread 0xcdb4c240 (pid 871)
    ino 115689129, on dev mfid1s1d

Here is process 866:

(kgdb) proc 866
[Switching to thread 66 (Thread 100104)]#0  sched_switch (td=0xce1146c0, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1936
1936
(kgdb) bt
#0  sched_switch (td=0xce1146c0, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1936
#1  0xc080a4a6 in mi_switch (flags=Variable "flags" is not available.
) at /usr/src/sys/kern/kern_synch.c:444
#2  0xc0837aab in sleepq_switch (wchan=Variable "wchan" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xc08380f6 in sleepq_wait (wchan=0xd4176394)
    at /usr/src/sys/kern/subr_sleepqueue.c:580
#4  0xc080a92a in _sleep (ident=0xd4176394, lock=0xc0ceb498, priority=80,
    wmesg=0xc0bb656e "ufs", timo=0)
    at /usr/src/sys/kern/kern_synch.c:230
#5  0xc07ea9fa in acquire (lkpp=0xcd7375a0, extflags=Variable "extflags" is not available.
) at /usr/src/sys/kern/kern_lock.c:151
#6  0xc07eb2ec in _lockmgr (lkp=0xd4176394, flags=8194, interlkp=0xd41763c4,
    td=0xce1146c0, file=0xc0bc20c8 "/usr/src/sys/kern/vfs_subr.c", line=2062)
    at /usr/src/sys/kern/kern_lock.c:384
#7  0xc0a24765 in ffs_lock (ap=0xcd737608)
    at /usr/src/sys/ufs/ffs/ffs_vnops.c:377
#8  0xc0b26876 in VOP_LOCK1_APV (vop=0xc0ca4740, a=0xcd737608)
    at vnode_if.c:1618
#9  0xc0896d76 in _vn_lock (vp=0xd417633c, flags=8194, td=0xce1146c0,
    file=0xc0bc20c8 "/usr/src/sys/kern/vfs_subr.c", line=2062)
    at vnode_if.h:851
#10 0xc0889da4 in vget (vp=0xd417633c, flags=8194, td=0xce1146c0)
    at /usr/src/sys/kern/vfs_subr.c:2062
#11 0xc087bd23 in vfs_hash_get (mp=0xce0962d0, hash=143173100, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/vfs_hash.c:81
#12 0xc0a1e429 in ffs_vgetf (mp=0xce0962d0, ino=143173100, flags=2,
    vpp=0xcd737800, ffs_flags=0)
    at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1400
#13 0xc0a1e95e in ffs_vget (mp=0xce0962d0, ino=143173100, flags=2,
    vpp=0xcd737800)
    at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1380
#14 0xc0a00765 in ffs_valloc (pvp=0xd1f72678, mode=33152, cred=0xcf024700,
    vpp=0xcd737800)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:970
#15 0xc0a30945 in ufs_makeinode (mode=33152, dvp=0xd1f72678, vpp=0xcd737a64,
    cnp=0xcd737a78)
    at /usr/src/sys/ufs/ufs/ufs_vnops.c:2254
#16 0xc0a310c0 in ufs_create (ap=0xcd73799c)
    at /usr/src/sys/ufs/ufs/ufs_vnops.c:193
#17 0xc0b26ed2 in VOP_CREATE_APV (vop=0xc0ca4740, a=0xcd73799c)
    at vnode_if.c:206
#18 0xc09c02ad in nfsrv_create (nfsd=0xcde57500, slp=0xcde37000,
    td=0xce1146c0, mrq=0xcd737c58)
    at vnode_if.h:112
#19 0xc09c7a61 in nfssvc (td=0xce1146c0, uap=0xcd737cfc)
    at /usr/src/sys/nfsserver/nfs_syscalls.c:456
#20 0xc0b108e5 in syscall (frame=0xcd737d38)
    at /usr/src/sys/i386/i386/trap.c:1101
#21 0xc0af4290 in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:262
#22 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

John Hickey
jjh@deterlab.net