From owner-freebsd-stable@FreeBSD.ORG Thu Feb 26 15:25:54 2009 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 952421065672 for ; Thu, 26 Feb 2009 15:25:54 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 57CC88FC0A for ; Thu, 26 Feb 2009 15:25:54 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (pool-98-109-39-197.nwrknj.fios.verizon.net [98.109.39.197]) by cyrus.watson.org (Postfix) with ESMTPSA id CBD0346B45; Thu, 26 Feb 2009 10:25:53 -0500 (EST) Received: from localhost (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.14.3/8.14.3) with ESMTP id n1QFPOR1042853; Thu, 26 Feb 2009 10:25:47 -0500 (EST) (envelope-from jhb@freebsd.org) From: John Baldwin To: freebsd-stable@freebsd.org Date: Thu, 26 Feb 2009 10:12:24 -0500 User-Agent: KMail/1.9.7 References: <49A46AB4.3080003@palisadesys.com> <200902250915.57562.jhb@freebsd.org> <49A55D78.1070604@palisadesys.com> In-Reply-To: <49A55D78.1070604@palisadesys.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200902261012.24325.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Thu, 26 Feb 2009 10:25:47 -0500 (EST) X-Virus-Scanned: ClamAV 0.94.2/9051/Thu Feb 26 08:08:01 2009 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: Guy Helmer Subject: Re: 7.1 hangs in cache_lookup mutex? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Feb 2009 15:25:54 -0000 On Wednesday 25 February 2009 10:02:16 am Guy Helmer wrote: > John Baldwin wrote: > > On Tuesday 24 February 2009 4:46:28 pm Guy Helmer wrote: > > > >> I think I may have found a clue regarding some of the hangs I'm seeing > >> on FreeBSD 7.1. > >> I have a program (kvoop), compiled under FreeBSD 6 and using > >> compatibility libraries under FreeBSD 7, that seems to be consistently > >> involved during these hangs. This time, I noticed that many processes > >> are stopped, waiting on the ufs lock. I can't tell for certain, but I > >> assume this kvoop process 33203 is blocking the other processes. The > >> kvoop process looks to me like it is waiting for a mutex, but nothing > >> shows up being deadlocked. Am I on the right track? Is there something > >> else I should be looking for? > >> > >> (kgdb) proc 33203 > >> [Switching to thread 129 (Thread 100241)]#0 sched_switch ( > >> td=0xffffff0019109a50, newtd=0x0, flags=1) > >> at ../../../kern/sched_ule.c:1944 > >> 1944 cpuid = PCPU_GET(cpuid); > >> (kgdb) where > >> #0 sched_switch (td=0xffffff0019109a50, newtd=0x0, flags=1) > >> at ../../../kern/sched_ule.c:1944 > >> #1 0xffffffff803a573b in mi_switch (flags=1, newtd=0x0) > >> at ../../../kern/kern_synch.c:440 > >> #2 0xffffffff803d1214 in turnstile_wait (ts=Variable "ts" is not available. > >> ) > >> at ../../../kern/subr_turnstile.c:748 > >> #3 0xffffffff80392db0 in _mtx_lock_sleep (m=0xffffffff8083c220, > >> tid=18446742974618442320, opts=Variable "opts" is not available. > >> ) at ../../../kern/kern_mutex.c:420 > >> #4 0xffffffff80392ea6 in _mtx_lock_flags (m=0xffffffff8083c220, opts=0, > >> file=0xffffffff805bd438 "../../../kern/vfs_cache.c", line=327) > >> at ../../../kern/kern_mutex.c:186 > >> #5 0xffffffff80403bc6 in cache_lookup (dvp=0xffffff00013e0dc8, > >> vpp=0xffffffffa2d93a18, cnp=0xffffffffa2d93a40) > >> at ../../../kern/vfs_cache.c:327 > >> #6 0xffffffff80404093 in vfs_cache_lookup (ap=Variable "ap" is not > >> available. > >> ) > >> at ../../../kern/vfs_cache.c:674 > >> #7 0xffffffff805628a0 in VOP_LOOKUP_APV (vop=0xffffffff8076e440, > >> a=0xffffffffa2d936b0) at vnode_if.c:99 > >> #8 0xffffffff80409afd in lookup (ndp=0xffffffffa2d939f0) at vnode_if.h:57 > >> #9 0xffffffff8040a824 in namei (ndp=0xffffffffa2d939f0) > >> at ../../../kern/vfs_lookup.c:219 > >> #10 0xffffffff8041dc4f in vn_open_cred (ndp=0xffffffffa2d939f0, > >> flagp=0xffffffffa2d9393c, cmode=0, cred=0xffffff0001588600, > >> fp=0xffffff0001964200) at ../../../kern/vfs_vnops.c:188 > >> #11 0xffffffff8041ccc7 in kern_open (td=0xffffff0019109a50, > >> path=0xffffac30
, pathseg=Variable > >> "pathseg" is not available. > >> ) > >> at ../../../kern/vfs_syscalls.c:1032 > >> #12 0xffffffff80544660 in ia32_syscall (frame=0xffffffffa2d93c80) > >> at ../../../amd64/ia32/ia32_syscall.c:182 > >> #13 0xffffffff805014d0 in Xint0x80_syscall () at ia32_exception.S:65 > >> #14 0x0000000028284037 in ?? () > >> > >> (kgdb) frame 4 > >> #4 0xffffffff80392ea6 in _mtx_lock_flags (m=0xffffffff8083c220, opts=0, > >> file=0xffffffff805bd438 "../../../kern/vfs_cache.c", line=327) > >> at ../../../kern/kern_mutex.c:186 > >> 186 _get_sleep_lock(m, curthread, opts, file, line); > >> (kgdb) list > >> 181 ("mtx_lock() of spin mutex %s @ %s:%d", > >> m->lock_object.lo_name, > >> 182 file, line)); > >> 183 WITNESS_CHECKORDER(&m->lock_object, opts | LOP_NEWORDER > >> | LOP_EXCLUSIVE, > >> 184 file, line); > >> 185 > >> 186 _get_sleep_lock(m, curthread, opts, file, line); > >> 187 LOCK_LOG_LOCK("LOCK", &m->lock_object, opts, > >> m->mtx_recurse, file, > >> 188 line); > >> 189 WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE, > >> file, line); > >> 190 curthread->td_locks++; > >> > >> (kgdb) print *m > >> $2 = {lock_object = {lo_name = 0xffffffff805bd5d2 "Name Cache", > >> lo_type = 0xffffffff805bd5d2 "Name Cache", lo_flags = 16973824, > >> lo_witness_data = {lod_list = {stqe_next = 0xffffffff807cd600}, > >> lod_witness = 0xffffffff807cd600}}, mtx_lock = 4, mtx_recurse = 0} > >> > > > > Is this from a coredump or while the system is live? mtx_lock = 4 indicates > > the mutex is unlocked, so there shouldn't be any threads waiting on it. > > > > > That was from running kgdb on a vmcore file. Would the state of the > mutex be different than if I was running kgdb on a live kernel? Well, if it was live perhaps the thread wasn't hung but was changing states while you were watching it. However, that is not true in a coredump. This is a very disturbing bug as it means something is broken in the mutex implementation itself. Have you reproduced this on multiple machines? -- John Baldwin