Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 20 Dec 2011 16:17:46 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        mdf@freebsd.org, "O. Hartmann" <ohartman@zedat.fu-berlin.de>, freebsd-current@freebsd.org, Robert Watson <rwatson@freebsd.org>
Subject:   Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662
Message-ID:  <CAJ-FndDVvZ12gkwAjK4mD4ZZnTMcj4EPTPG8J6OmvYr7a5tVEA@mail.gmail.com>
In-Reply-To: <201112200935.30772.jhb@freebsd.org>
References:  <4EED2F1C.2060409@zedat.fu-berlin.de> <201112200852.23300.jhb@freebsd.org> <CAJ-FndB7o_vjbJefz1Bxa%2B=DEVZDxBoGPdKcVr5vNHdu-pFEFA@mail.gmail.com> <201112200935.30772.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
2011/12/20 John Baldwin <jhb@freebsd.org>:
> On Tuesday, December 20, 2011 9:20:09 am Attilio Rao wrote:
>> 2011/12/20 John Baldwin <jhb@freebsd.org>:
>> > On Saturday, December 17, 2011 10:41:15 pm mdf@freebsd.org wrote:
>> >> On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev <kabaev@gmail.com> =
wrote:
>> >> > On Sun, 18 Dec 2011 01:09:00 +0100
>> >> > "O. Hartmann" <ohartman@zedat.fu-berlin.de> wrote:
>> >> >
>> >> >> Sleeping thread (tid 100033, pid 16) owns a non sleepable lock
>> >> >> panic: sleeping thread
>> >> >> cpuid =3D 0
>> >> >>
>> >> >> PID 16 is always USB on my box.
>> >> >
>> >> > You really need to give us a backtrace when you quote panics. It is
>> >> > impossible to make any sense of the above panic message without mor=
e
>> >> > context.
>> >>
>> >> In the case of this panic, the stack of the thread which panics is
>> >> useless; it's someone trying to propagate priority that discovered it=
.
>> >> =C2=A0A backtrace on tid 100033 would be useful.
>> >>
>> >> With WITNESS enabled, it's possible to have this panic display the
>> >> stack of the incorrectly sleeping thread at the time it acquired the
>> >> lock, as well, but this code isn't in CURRENT or any release. =C2=A0I=
 have
>> >> a patch at $WORK I can dig up on Monday.
>> >
>> > Huh? =C2=A0The stock kernel dumps a stack trace of the offending threa=
d if you have
>> > DDB enabled:
>> >
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/*
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * If the threa=
d is asleep, then we are probably about
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * to deadlock.=
 =C2=A0To make debugging this easier, just
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * panic and te=
ll the user which thread misbehaved so
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * they can hop=
efully get a stack trace from the truly
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * misbehaving =
thread.
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 */
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (TD_IS_SLEEP=
ING(td)) {
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0printf(
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"Sleeping threa=
d (tid %d, pid %d) owns a non-sleepable lock\n",
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_tid, td->td_proc->p_pid);
>> > #ifdef DDB
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0db_trace_thread(td, -1);
>> > #endif
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0panic("sleeping thread");
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
>> >
>> > It may be that we can make use of the STACK API here instead to output=
 this
>> > trace even when DDB isn't enabled. =C2=A0The patch below tries to do t=
hat
>> > (untested). =C2=A0It does some odd thigns though since it is effective=
ly running
>> > from a panic context already, so it uses a statically allocated 'struc=
t stack'
>> > rather than using stack_create() and uses stack_print_ddb() since it i=
s
>> > holding spin locks and can't possibly grab an sx lock:
>> >
>> > Index: subr_turnstile.c
>> > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> > --- subr_turnstile.c =C2=A0 =C2=A0(revision 228534)
>> > +++ subr_turnstile.c =C2=A0 =C2=A0(working copy)
>> > @@ -72,6 +72,7 @@ __FBSDID("$FreeBSD$");
>> > =C2=A0#include <sys/proc.h>
>> > =C2=A0#include <sys/queue.h>
>> > =C2=A0#include <sys/sched.h>
>> > +#include <sys/stack.h>
>> > =C2=A0#include <sys/sysctl.h>
>> > =C2=A0#include <sys/turnstile.h>
>> >
>> > @@ -175,6 +176,7 @@ static void turnstile_fini(void *mem, int size);
>> > =C2=A0static void
>> > =C2=A0propagate_priority(struct thread *td)
>> > =C2=A0{
>> > + =C2=A0 =C2=A0 =C2=A0 static struct stack st;
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0struct turnstile *ts;
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0int pri;
>> >
>> > @@ -217,8 +219,10 @@ propagate_priority(struct thread *td)
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0printf(
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"Sleeping threa=
d (tid %d, pid %d) owns a non-sleepable lock\n",
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_tid, td->td_proc->p_pid);
>> > -#ifdef DDB
>> > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 db_trace_thread(td, -1);
>> > +#ifdef STACK
>> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 stack_zero(&st);
>> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 stack_save_td(&st, td);
>> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 stack_print_ddb(&st);
>> > =C2=A0#endif
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0panic("sleeping thread");
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
>> >
>> > --
>>
>> I'm not sure it is a wise idea to trimm the DDB part, because it is
>> much more common than STACK enabled. Note that stack(9) is working if
>> you define DDB too, so I'd say to do that for both.
>> Also, I don't think you need the stack_zero() prior to set it.
>
> Err, STACK is enabled in GENERIC in release kernels but DDB is not, so I =
think
> STACK is the more common one. =C2=A0As far as stack_zero(), I was just be=
ing paranoid.

And what is the point for not having
#ifdef STACK
as
#if defined(STACK) || defined(DDB)

?

>> As we are here, however, I have a question for Robert here: do you
>> think we should support the _ddb() variant of options even in the case
>> DDB is not enabled in the kernel?
>> Probabilly the way it is nowadays is easier to have stack(9) both
>> defined for DDB and STACK, but in general I wouldn't advertise that.
>
> The _ddb variants are always enabled by my reading. =C2=A0They just use d=
ifferent
> entry points into the linker that don't use locking.

My question is different: why we define them anyway even when DDB is
not enabled?

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndDVvZ12gkwAjK4mD4ZZnTMcj4EPTPG8J6OmvYr7a5tVEA>