From: John Baldwin
To: Jeff Wheelhouse
Cc: freebsd-hackers@freebsd.org
Date: Thu, 25 Sep 2008 08:45:05 -0400
Subject: Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
Message-Id: <200809250845.06042.jhb@freebsd.org>
In-Reply-To: <57DCDBC7-8542-4082-8893-5B96DA92DA9A@wheelhouse.org>

On Thursday 25 September 2008 01:34:06 am Jeff Wheelhouse wrote:
>
> On Sep 24, 2008, at 12:34 PM, John Baldwin wrote:
>
> > On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:
> >> panic: lockmgr: thread 0xffffff0050858350, not exclusive lock holder
> >> 0xffffff00074959f0 unlocking
> >> cpuid = 0
> >> KDB: stack backtrace:
> >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> >> panic() at panic+0x17a
> >> _lockmgr() at _lockmgr+0x872
> >> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> >> null_unlock() at null_unlock+0xff
> >> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> >> nullfs_mount() at nullfs_mount+0x244
> >> vfs_donmount() at vfs_donmount+0xe4d
> >> nmount() at nmount+0xa5
> >> syscall() at syscall+0x254
> >> Xfast_syscall() at Xfast_syscall+0xab
> >> --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =
> >> 0x7fffffffdfc8, rbp = 0x7fffffffdfd0 ---
> >
> > Can you use gdb or the like to get the source file/line for the
> > nullfs_mount+0x244 frame?
>
> Got it again, this time with the full debug kernel, and I'm getting
> the same weird results from gdb, so I'll go ahead and post it:
>
> panic: lockmgr: thread 0xffffff0003e499f0, not exclusive lock holder
> 0xffffff000a5e16a0 unlocking
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> panic() at panic+0x17a
> _lockmgr() at _lockmgr+0x872
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> null_unlock() at null_unlock+0xff
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> nullfs_mount() at nullfs_mount+0x244
> vfs_donmount() at vfs_donmount+0xe4d
> nmount() at nmount+0xa5
> syscall() at syscall+0x254
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =
> 0x7fffffffe1c8, rbp = 0x7fffffffe1d0 ---
>
> $ gdb /boot/kernel/nullfs.ko
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> (gdb) l *nullfs_mount+0x244
> 0x9c4 is in nullfs_mount (namei.h:163).
> 158         struct thread *td)
> 159 {
> 160         ndp->ni_cnd.cn_nameiop = op;
> 161         ndp->ni_cnd.cn_flags = flags;
> 162         ndp->ni_segflg = segflg;
> 163         ndp->ni_dirp = namep;
> 164         ndp->ni_cnd.cn_thread = td;
> 165 }
> 166
> 167 #define NDF_NO_DVP_RELE 0x00000001
> (gdb)
>
> (That's NDINIT(), but line 163 doesn't look like it belongs in the
> middle of a call stack. There's a VOP_UNLOCK a few lines above
> NDINIT() in mount_nullfs(), and another one some ways farther on in
> the function.)

It's probably the one just before the NDINIT (note that the return address in
the call stack is pointing to the next instruction to be executed after the
call to VOP_UNLOCK(), so sometimes it can end up referring to the next line in
the source code from the actual function call):

        if ((mp->mnt_vnodecovered->v_op == &null_vnodeops) &&
            VOP_ISLOCKED(mp->mnt_vnodecovered)) {
                VOP_UNLOCK(mp->mnt_vnodecovered, 0);
                isvnunlocked = 1;
        }
        /*
         * Find lower node
         */
        NDINIT(ndp, LOOKUP, FOLLOW|LOCKLEAF, UIO_SYSSPACE, target, td);
        error = namei(ndp);

Can you 'p *mp'? I'm curious if mp->mnt_vnodecovered is NULL (in which case,
why didn't the two tests in the if() fail?)

> The good news is we took this particular machine out of production and
> came up with a synthetic test based on our in-house code that can
> probably reliably reproduce this within a few minutes. As you might
> expect, the test involves hammering the same nullfs mount point with
> mounts and umounts from multiple processes without any external
> synchronization.

Ok. Reproducibility is good. :)

-- 
John Baldwin
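
To illustrate the return-address point above, here is a small user-space sketch. It is not kernel code: fake_unlock() and fake_mount() are made-up stand-ins for VOP_UNLOCK() and nullfs_mount(), and it assumes GCC or Clang, since it relies on the __builtin_return_address builtin. It shows that the address recorded for the caller's frame lies after the call instruction, which is why a "function+offset" frame can resolve to the source line following the call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical demo names; none of these exist in the FreeBSD sources. */
static volatile uintptr_t saved_return_address;

/* Stand-in for VOP_UNLOCK(): records the address it will return to. */
__attribute__((noinline)) static void
fake_unlock(void)
{
    saved_return_address = (uintptr_t)__builtin_return_address(0);
}

/* Stand-in for nullfs_mount(): makes the call, then keeps working. */
__attribute__((noinline)) static uintptr_t
fake_mount(void)
{
    fake_unlock();
    /*
     * Execution resumes here. The saved address refers to this
     * point, not to the call instruction on the previous line.
     */
    assert(saved_return_address != 0);
    return (saved_return_address);
}

/*
 * Offset of the return address from the start of fake_mount(),
 * analogous to the "+0x244" in "nullfs_mount+0x244".
 */
uintptr_t
return_offset(void)
{
    return (fake_mount() - (uintptr_t)&fake_mount);
}
```

Calling return_offset() gives a small positive offset into fake_mount(), just past the call. When a debugger maps such an address back to a source line, it may therefore pick the statement after the call; asking for the address one byte earlier usually lands inside the call instruction itself, though that is a rule of thumb rather than a guarantee.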