From owner-freebsd-current@FreeBSD.ORG  Mon Jun 28 15:38:33 2010
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Date: Mon, 28 Jun 2010 11:32:57 -0400
User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; )
References: <i01u5g$39n$1@dough.gmane.org>
	<AANLkTinam4rwsVtFPSAidVOzdzrlx-whNBId8cMg_ySQ@mail.gmail.com>
In-Reply-To: <AANLkTinam4rwsVtFPSAidVOzdzrlx-whNBId8cMg_ySQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201006281132.57541.jhb@freebsd.org>
Cc: Attilio Rao <attilio@freebsd.org>, pluknet <pluknet@gmail.com>,
	Anton Yuzhaninov <citrin@citrin.ru>
Subject: Re: panic in deadlkres
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>

On Friday 25 June 2010 4:52:22 pm pluknet wrote:
> On 25 June 2010 13:50, Anton Yuzhaninov <citrin@citrin.ru> wrote:
> > I got a panic on 9-CURRENT from Jun 25 2010
> >
> > Maybe this is a bug in the deadlock resolver.
> >
> > panic: blockable sleep lock (sleep mutex) process lock @
> > /usr/src/sys/kern/kern_clock.c:203
> >
> > db> show alllocks
> > Process 0 (kernel) thread 0xc4dcd270 (100047)
> > shared sx allproc (allproc) r = 0 (0xc0885ebc) locked @
> > /usr/src/sys/kern/kern_clock.c:193
> >
> > db> show lock 0xc4dcd270
> >  class: spin mutex
> >  name: D
> >  flags: {SPIN, RECURSE}
> >  state: {OWNED}
> >
> > (kgdb) bt
> > #0  doadump () at pcpu.h:248
> > #1  0xc05ae59f in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
> > #2  0xc05ae825 in panic (fmt=Variable "fmt" is not available.
> > ) at /usr/src/sys/kern/kern_shutdown.c:590
> > #3  0xc048ff45 in db_panic (addr=Could not find the frame base for "db_panic".
> > ) at /usr/src/sys/ddb/db_command.c:478
> > #4  0xc0490533 in db_command (last_cmdp=0xc086ef1c, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:445
> > #5  0xc0490662 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
> > #6  0xc04923ef in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:229
> > #7  0xc05dade6 in kdb_trap (type=3, code=0, tf=0xc4b31bd0) at /usr/src/sys/kern/subr_kdb.c:535
> > #8  0xc078696b in trap (frame=0xc4b31bd0) at /usr/src/sys/i386/i386/trap.c:692
> > #9  0xc076ca0b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
> > #10 0xc05daf30 in kdb_enter (why=0xc07ea02d "panic", msg=0xc07ea02d "panic") at cpufunc.h:71
> > #11 0xc05ae806 in panic (fmt=0xc07efd94 "blockable sleep lock (%s) %s @ %s:%d") at /usr/src/sys/kern/kern_shutdown.c:573
> > #12 0xc05ee30b in witness_checkorder (lock=0xc5148088, flags=9, file=0xc07e3b20 "/usr/src/sys/kern/kern_clock.c", line=203, interlock=0x0)
> >    at /usr/src/sys/kern/subr_witness.c:1067
> > #13 0xc05a093c in _mtx_lock_flags (m=0xc5148088, opts=0, file=0xc07e3b20 "/usr/src/sys/kern/kern_clock.c", line=203)
> >    at /usr/src/sys/kern/kern_mutex.c:200
> > #14 0xc05706a9 in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
> > #15 0xc0588721 in fork_exit (callout=0xc05705ea <deadlkres>, arg=0x0, frame=0xc4b31d38) at /usr/src/sys/kern/kern_fork.c:843
> > #16 0xc076ca80 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270
> 
> Hi!
> 
> [throw in ideas (just ignore them if they're dumb, thinking badly atm).]
> 
> AFAIK, that indicates that a thread already holds
> a spin mutex and then tries to acquire a sleep mutex.
> 
> It looks like kern/kern_clock.c v1.213 (SVN rev 206482)
> introduced a regression in its handling of ticks wrap-around:
> it doesn't release the thread lock, does it?

This looks like a correct analysis to me.

> From subr_witness.c:
> 1062:                 * Since spin locks include a critical section, this check
> 1063:                 * implicitly enforces a lock order of all sleep locks before
> 1064:                 * all spin locks.
> 1065:                 */
> 1066:                if (td->td_critnest != 0 && !kdb_active)
> 1067:                        panic("blockable sleep lock (%s) %s @ %s:%d",
> 1068:                            class->lc_name, lock->lo_name, file, line);
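Since taking any spin mutex (the thread lock included) enters a critical section, the td_critnest != 0 test above is what turns "spin lock held, now acquiring a sleep lock" into this panic. Below is a minimal user-space C model of that rule, assuming a bare nesting counter; struct toy_thread and the toy_* functions are hypothetical stand-ins, not kernel APIs, and the model reports the violation instead of panicking (it also ignores the kdb_active escape).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct thread; only the counter matters. */
struct toy_thread {
	int td_critnest;	/* critical-section nesting depth */
};

/* Acquiring a spin mutex enters a critical section. */
static void
toy_spin_lock(struct toy_thread *td)
{
	td->td_critnest++;
}

/* Releasing it leaves the critical section; it must have been held. */
static void
toy_spin_unlock(struct toy_thread *td)
{
	assert(td->td_critnest > 0);
	td->td_critnest--;
}

/*
 * The WITNESS rule modeled here: a blockable (sleep) lock may only be
 * acquired outside any critical section, i.e. with no spin lock held.
 */
static bool
toy_sleep_lock_allowed(const struct toy_thread *td)
{
	return (td->td_critnest == 0);
}
```

For example, with td = { 0 }, toy_sleep_lock_allowed(&td) is true; after toy_spin_lock(&td) it is false until the matching toy_spin_unlock(&td).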
> 
> From kern_clock.c, v1.213 (in several places, while holding a thread lock):
> +					/* Handle ticks wrap-up. */
> +					if (ticks < td->td_blktick)
> +						continue;
> 
> Shouldn't it look like this instead:
> +					/* Handle ticks wrap-up. */
> +					if (ticks < td->td_blktick) {
> +						thread_unlock(td);
> +						continue;
> +					}
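To see why the missing thread_unlock() matters, here is a toy C model of the per-thread scan, assuming each thread_lock()/thread_unlock() pair just raises and lowers the critical-section depth. struct toy_blocked and toy_scan() are hypothetical, the process-lock layer is left out, and the real loop in kern_clock.c does much more per thread. A nonzero return value means a thread lock was leaked, so the next PROC_LOCK() would trip the WITNESS check quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blocked thread; only td_blktick matters. */
struct toy_blocked {
	int td_blktick;		/* value of ticks when the thread blocked */
};

/* Models thread_lock(): taking a spin mutex enters a critical section. */
static void
toy_thread_lock(int *critnest)
{
	(*critnest)++;
}

/* Models thread_unlock(): may only be called with the lock held. */
static void
toy_thread_unlock(int *critnest)
{
	assert(*critnest > 0);
	(*critnest)--;
}

/*
 * Toy version of the scan.  Returns the critical-section depth left
 * over afterwards; anything nonzero means a thread lock was leaked.
 * "fixed" selects the proposed thread_unlock() before continue.
 */
static int
toy_scan(const struct toy_blocked *tds, size_t n, int ticks, bool fixed)
{
	int critnest = 0;	/* models curthread->td_critnest */

	for (size_t i = 0; i < n; i++) {
		toy_thread_lock(&critnest);
		/* Handle ticks wrap-up. */
		if (ticks < tds[i].td_blktick) {
			if (fixed)
				toy_thread_unlock(&critnest);
			continue;	/* without the fix, the lock stays held */
		}
		/* ... the real loop examines the blocked thread here ... */
		toy_thread_unlock(&critnest);
	}
	return (critnest);
}
```

With three threads blocked at ticks 100, 50 and 200 and ticks == 150, only the third takes the wrap-up branch: the v1.213 behaviour (fixed == false) returns 1, while the proposed fix returns 0.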
> 
> A rough recipe for reproducing it: lock a thread in one deadlkres
> pass and hit the wrap-around condition, then try to lock the process
> that the thread belongs to in a later ((n+m)th) deadlkres pass, or
> from a different context.
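On the wrap-around condition itself: ticks is a signed int, so once it passes INT_MAX it wraps to negative values on the usual two's-complement targets, and the ticks < td->td_blktick test then fires for a thread that blocked shortly before the wrap. A small C sketch, assuming a two's-complement int; toy_ticks_advance() and toy_ticks_wrapped() are hypothetical helpers, and the advance is done in unsigned arithmetic to avoid signed-overflow undefined behaviour.

```c
#include <limits.h>	/* INT_MAX: the point at which ticks wraps */
#include <stdbool.h>

/*
 * Hypothetical helper: advance a tick count by n with wraparound.
 * The addition is done in unsigned arithmetic (well defined); the
 * conversion back to int is implementation-defined but wraps on the
 * usual two's-complement compilers.
 */
static int
toy_ticks_advance(int ticks, unsigned int n)
{
	return ((int)((unsigned int)ticks + n));
}

/*
 * The wrap-up test from v1.213: after ticks wraps, it is smaller than
 * a td_blktick that was recorded just before the wrap.
 */
static bool
toy_ticks_wrapped(int ticks, int td_blktick)
{
	return (ticks < td_blktick);
}
```

For example, if a thread blocked at INT_MAX - 5, ten ticks later toy_ticks_advance() yields a negative value and toy_ticks_wrapped() returns true, while three ticks later it is still false.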

-- 
John Baldwin