From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Attilio Rao, pluknet, Anton Yuzhaninov
Subject: Re: panic in deadlkres
Date: Mon, 28 Jun 2010 11:32:57 -0400
Message-Id: <201006281132.57541.jhb@freebsd.org>

On Friday 25 June 2010 4:52:22 pm pluknet wrote:
> On 25 June 2010 13:50, Anton Yuzhaninov wrote:
> > I've got a panic on 9-current from Jun 25 2010
> >
> > Maybe this is a bug in the deadlock resolver
> >
> > panic: blockable sleep lock (sleep mutex) process lock @
> > /usr/src/sys/kern/kern_clock.c:203
> >
> > db> show alllocks
> > Process 0 (kernel) thread 0xc4dcd270 (100047)
> > shared sx allproc (allproc) r = 0 (0xc0885ebc) locked @
> > /usr/src/sys/kern/kern_clock.c:193
> >
> > db> show lock 0xc4dcd270
> > class: spin mutex
> > name: D
> > flags: {SPIN, RECURSE}
> > state: {OWNED}
> >
> > (kgdb) bt
> > #0 doadump () at pcpu.h:248
> > #1 0xc05ae59f in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
> > #2 0xc05ae825 in panic (fmt=Variable "fmt" is not available.
> > ) at /usr/src/sys/kern/kern_shutdown.c:590
> > #3 0xc048ff45 in db_panic (addr=Could not find the frame base for "db_panic".
> > ) at /usr/src/sys/ddb/db_command.c:478
> > #4 0xc0490533 in db_command (last_cmdp=0xc086ef1c, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:445
> > #5 0xc0490662 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
> > #6 0xc04923ef in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:229
> > #7 0xc05dade6 in kdb_trap (type=3, code=0, tf=0xc4b31bd0) at /usr/src/sys/kern/subr_kdb.c:535
> > #8 0xc078696b in trap (frame=0xc4b31bd0) at /usr/src/sys/i386/i386/trap.c:692
> > #9 0xc076ca0b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
> > #10 0xc05daf30 in kdb_enter (why=0xc07ea02d "panic", msg=0xc07ea02d "panic") at cpufunc.h:71
> > #11 0xc05ae806 in panic (fmt=0xc07efd94 "blockable sleep lock (%s) %s @ %s:%d") at /usr/src/sys/kern/kern_shutdown.c:573
> > #12 0xc05ee30b in witness_checkorder (lock=0xc5148088, flags=9, file=0xc07e3b20 "/usr/src/sys/kern/kern_clock.c", line=203, interlock=0x0)
> > at /usr/src/sys/kern/subr_witness.c:1067
> > #13 0xc05a093c in _mtx_lock_flags (m=0xc5148088, opts=0, file=0xc07e3b20 "/usr/src/sys/kern/kern_clock.c", line=203)
> > at /usr/src/sys/kern/kern_mutex.c:200
> > #14 0xc05706a9 in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
> > #15 0xc0588721 in fork_exit (callout=0xc05705ea, arg=0x0, frame=0xc4b31d38) at /usr/src/sys/kern/kern_fork.c:843
> > #16 0xc076ca80 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270
>
> Hi!
>
> [throw in ideas (just ignore them if they're dumb, thinking badly atm).]
>
> AFAIK, that indicates that some thread already has
> a spin mutex and then it tries to acquire a sleep mutex.
>
> Looks like kern/kern_clock.c v1.213 (SVN rev 206482)
> has a regression in handling ticks wrap-up
> w.r.t. it doesn't release a thread mutex, does it?

This looks like a correct analysis to me.

> From subr_witness.c:
> 1062:  * Since spin locks include a critical section, this check
> 1063:  * implicitly enforces a lock order of all sleep locks before
> 1064:  * all spin locks.
> 1065:  */
> 1066: if (td->td_critnest != 0 && !kdb_active)
> 1067:     panic("blockable sleep lock (%s) %s @ %s:%d",
> 1068:         class->lc_name, lock->lo_name, file, line);
>
> From kern_clock.c, v1.213 (in several places, while holding a thread lock):
> +	/* Handle ticks wrap-up. */
> +	if (ticks < td->td_blktick)
> +		continue;
>
> Shouldn't it be like this:
> +	/* Handle ticks wrap-up. */
> +	if (ticks < td->td_blktick) {
> +		thread_unlock(td);
> +		continue;
> +	}
>
> The precondition idea to reproduce it is to lock a subject thread
> in some deadlkres callout, handle re-wrap condition, then try
> to lock a process to which the thread belongs in (n+m)'th deadlkres
> callout, or in different context.

-- 
John Baldwin
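
The fix proposed in the quoted text (drop the thread lock before the
early "continue") can be illustrated outside the kernel. What follows is
a minimal userland sketch, not the actual kern_clock.c code. The names
struct fake_thread, scan_threads and all_unlocked are hypothetical, and
a pthread mutex stands in for the kernel's per-thread spin lock, so the
WITNESS panic itself is not reproduced here; only the lock-leak pattern
and its repair are shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel thread and its per-thread lock. */
struct fake_thread {
	pthread_mutex_t lock;
	int blktick;	/* tick at which the thread blocked */
};

/*
 * Scan every thread and count those blocked for more than `threshold`
 * ticks.  Every path out of the loop body releases the lock, including
 * the early "continue" taken on ticks wrap-up.
 */
static int
scan_threads(struct fake_thread *tds, size_t n, int ticks, int threshold)
{
	int stuck = 0;

	assert(tds != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct fake_thread *td = &tds[i];

		pthread_mutex_lock(&td->lock);
		/* Handle ticks wrap-up: skip, but unlock first. */
		if (ticks < td->blktick) {
			pthread_mutex_unlock(&td->lock);
			continue;
		}
		if (ticks - td->blktick > threshold)
			stuck++;
		pthread_mutex_unlock(&td->lock);
	}
	return (stuck);
}

/* Return 1 if no lock in the array is still held, 0 otherwise. */
static int
all_unlocked(struct fake_thread *tds, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (pthread_mutex_trylock(&tds[i].lock) != 0)
			return (0);
		pthread_mutex_unlock(&tds[i].lock);
	}
	return (1);
}
```

If the pthread_mutex_unlock() call inside the wrap-up branch is removed,
the lock of any thread with blktick greater than ticks stays held after
the scan and all_unlocked() returns 0, which is the userland analogue of
deadlkres carrying a thread lock into the later process lock
acquisition.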