Date:      Thu, 18 Aug 2011 22:11:44 +0200
From:      Attilio Rao <attilio@freebsd.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID:  <CAJ-FndCaTSoAU2Ycj=WEppzc1RmbQ6ugqiuuyCqUpYZuGXKt_g@mail.gmail.com>
In-Reply-To: <4E4D717F.3090802@FreeBSD.org>
References:  <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk> <A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk> <4E4380C0.7070908@FreeBSD.org> <EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> <4E4BBA7F.30907@FreeBSD.org> <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4E4D717F.3090802@FreeBSD.org>

2011/8/18 Andriy Gapon <avg@freebsd.org>:
> on 17/08/2011 23:21 Andriy Gapon said the following:
>>
>> It seems like everything starts with some kind of a race between terminating
>> processes in a jail and termination of the jail itself.  This is where the
>> details are very thin so far.  What we see is that a process (http) is in
>> exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
>> flag is set and even past the place where p_limit is freed and reset to NULL.
>> At that place the thread calls prison_proc_free(), which calls prison_deref().
>> Then, we see that in prison_deref() the thread gets a page fault because of what
>> seems like a NULL pointer dereference.  That's just the start of the problem and
>> its root cause.
>>
>> Then, trap_pfault() gets invoked and, because addresses close to NULL look like
>> userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes
>> on to call vm_map_growstack.  First thing that vm_map_growstack does is a call
>> to lim_cur(), but because p_limit is already NULL, that call results in a NULL
>> pointer dereference and a page fault.  Go to the beginning of this paragraph.
>>
>> So we get this recursion of sorts, which only ends when a stack is exhausted and
>> a CPU generates a double-fault.
>
> BTW, does anyone have an idea why the thread in question would "disappear" from
> kgdb's point of view?
>
> (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
> $3 = 102057
> (kgdb) tid 102057
> invalid tid
>
> info threads also doesn't list the thread.
>
> Is it because the panic happened while the thread was somewhere in exit1()?
> Is there an easy way to examine its stack in this case?

Yes, that is likely the reason.

The 'tid' command should look the thread up through tid_to_thread() (or a
similarly named function), which returns NULL here. That means the thread
has already gone past the point where it was still in the lookup table.
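
To make the point concrete, here is a minimal user-space sketch of the
situation, not the actual kernel or kgdb code. Every name in it
(thread_sketch, tid_insert, tid_remove, tid_lookup, TID_BUCKETS) is
hypothetical and only illustrates the idea: a thread that has been
unlinked from the tid lookup table can no longer be found by tid, even
though a direct pointer to it (like pc_curthread) is still valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real FreeBSD structures. */
#define TID_BUCKETS 64

struct thread_sketch {
	int tid;                          /* thread id, like td_tid */
	struct thread_sketch *hash_next;  /* next entry in the same bucket */
};

/* The tid lookup table: an array of singly linked bucket chains. */
static struct thread_sketch *tid_buckets[TID_BUCKETS];

static unsigned
tid_hash(int tid)
{
	return ((unsigned)tid % TID_BUCKETS);
}

/* Make the thread visible to lookups by tid. */
static void
tid_insert(struct thread_sketch *td)
{
	unsigned b = tid_hash(td->tid);

	assert(td->hash_next == NULL);
	td->hash_next = tid_buckets[b];
	tid_buckets[b] = td;
}

/* Unlink the thread from the table, as an exiting thread would be. */
static void
tid_remove(struct thread_sketch *td)
{
	struct thread_sketch **pp = &tid_buckets[tid_hash(td->tid)];

	while (*pp != NULL) {
		if (*pp == td) {
			*pp = td->hash_next;
			td->hash_next = NULL;
			return;
		}
		pp = &(*pp)->hash_next;
	}
}

/* Look a thread up by tid; returns NULL if it is not in the table. */
static struct thread_sketch *
tid_lookup(int tid)
{
	struct thread_sketch *td = tid_buckets[tid_hash(tid)];

	while (td != NULL && td->tid != tid)
		td = td->hash_next;
	return (td);
}
```

In this sketch, once tid_remove() has run for a thread, tid_lookup() on
its tid returns NULL, while a pointer saved earlier still reaches the
structure and its tid field. That is the same pattern as in the session
above: pc_curthread still gives the tid, but a lookup by tid fails.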

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


