Date:      Mon, 15 Aug 2011 11:31:35 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID:  <4E48D967.9060804@FreeBSD.org>
In-Reply-To: <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
References:  <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>

on 14/08/2011 17:43 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>>
>> Maybe test it on couple of machines first just in case I overlooked something
>> essential, although I have a report from another user that the patch didn't break
>> anything for him (it was tested for an unrelated issue).
> 
> We've got this running on ~40 machines and just had the first panic
> since the update. Unfortunately it doesn't seem to have changed anything :(
> 
> We have 352 thread entries starting with:-
> #0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0,
> flags=Variable "flags" is not available.
> 23 with:-
> cpustop_handler () at atomic.h:285
> and 16 with:-
> #0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562

I would like to get the full output of 'thread apply all bt'.

> The main message being:-
> panic: double fault
> 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15

Does this line indicate a shutdown of a jail or of the whole system?

> Fatal double fault
> rip = 0xffffffff8053b691

Can you please provide the output of 'list *0xffffffff8053b691' in kgdb?

> rsp = 0xffffff8d8f356fb0
> rbp = 0xffffff8d8f357210
> cpuid = 2; apic id = 02
> panic: double fault
> cpuid = 2
> KDB: stack backtrace:
> #0 0xffffffff803bb75e at kdb_backtrace+0x5e
> #1 0xffffffff8038956e at panic+0x2ae
> #2 0xffffffff805802b6 at dblfault_handler+0x96
> #3 0xffffffff8056900d at Xdblfault+0xad

I think (though I'm not 100% sure) that with DDB compiled into the kernel we could
get a better backtrace here, possibly including the pre-double-fault stack frames,
because the DDB backend is a bit smarter than the trivial stack(9) printer.

> stack: 0xffffff8d8f357000, 4

One thing I can say is that this looks like a double fault caused by stack
exhaustion (the most typical cause): the rsp value is below td_kstack.

Can you please also provide the following information:
p *((struct pcb *)((char *)0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
where KSTACK_PAGES is the value of the KSTACK_PAGES kernel option (the amd64
default is 4) and PAGE_SIZE is 4096.

> rsp = 0xffffff800009ae10

[snip]

> There are some indications that stopping jails could be the
> cause of the panics, so on one test box I've added INVARIANTS
> to see if anything shows up from that.

OK.

-- 
Andriy Gapon


