FreeBSD Mail Archives

Date:      Wed, 10 Aug 2011 08:12:56 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID:  <20110810151256.GA38601@icarus.home.lan>
In-Reply-To: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk>
References:  <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk>

On Wed, Aug 10, 2011 at 03:22:52PM +0100, Steven Hartland wrote:
> The base stack reported is a double fault with no additional
> details and CTRL+ALT+ESC fails to break to the debugger as
> does an NMI, even though it at least tries printing the
> following many times some quite jumbled:-
> NMI ... going to debugger

You may be interested in these system tunables (not sysctls).  These
come from sys/amd64/amd64/trap.c (i386 has the same):

machdep.kdb_on_nmi    (defaults to 1)
machdep.panic_on_nmi  (defaults to 1)

If what you're seeing is a hardware NMI that fires, followed by the
machine panicking, the above tunables are probably responsible.  A
hardware NMI could indicate an actual hardware issue of sorts, depending
on how the motherboard vendor implemented what they did.  For example,
on a series of mainboards we have at my workplace, the BIOS can be
configured to generate either an NMI or SMI# when different kinds of ECC
RAM errors happen (either single-bit or multi-bit parity errors).  I
don't know if that's what you're seeing.

If you're generating the NMI yourself (possibly via the KVM, etc.) then
okay, that's different.  I'm trying to discern whether or not *you're*
generating the NMI, or if the NMI just happens and causes a panic for
you and that's what you're worried about.
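
For illustration, loader tunables like these are set in /boot/loader.conf
and take effect at the next boot.  A minimal sketch, assuming you want to
rule out the NMI handling as the trigger while you investigate (the
values shown are an assumption, not a recommendation):

```
# /boot/loader.conf
# Do not enter the kernel debugger when an NMI arrives (default is 1)
machdep.kdb_on_nmi="0"
# Do not panic when an NMI arrives (default is 1)
machdep.panic_on_nmi="0"
```

Bear in mind that if the NMI comes from real hardware trouble, turning
these off hides the symptom rather than fixing anything.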

Now to discuss the "jumbled console output":

The interspersing of kernel text output has plagued FreeBSD for a very
long time (since approximately 6.x).  There have been statements from
kernel coders that you can decrease the likelihood of it happening by
increasing the PRINTF_BUFR_SIZE (not a typo) option in your kernel
configuration.  The issue is exacerbated by use of SMP (either
multi-core or multi-CPU).

The default (assuming your kernel configs are based off of GENERIC
within the past 4-5 years) is 128.  However, the same developers stated
that they have great reservations over increasing this number
dramatically (meaning, something like 256 will probably work, but larger
"may have repercussions which are unknown at this time").
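
For reference, the option is a single line in the kernel configuration
file, and changing it requires rebuilding and installing the kernel.  A
sketch only, with 256 taken from the "will probably work" figure above
rather than from any testing:

```
# Kernel configuration file (GENERIC ships with 128)
options         PRINTF_BUFR_SIZE=256    # Value is an assumption, not a tested setting
```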

I have stated publicly before, and will do so again now, that this option
does not solve the problem.  I acknowledge it may make it "less likely
to happen" or may decrease the amount of interspersed output, but in my
experience neither of those proves true.  I've talked on-list with John
Baldwin about this problem in the past, and he had some pretty good ideas
of how to solve it.

I should point out that Solaris 10 and OpenSolaris (not sure about
present-day releases) both have this problem as well, especially during
kernel panics or MCEs.  Linux addressed this issue by implementing a
ring-based cyclic buffer for its kernel messages (syslog/klogd), and the
model is extremely well-documented (quite clever too):

http://www.mjmwired.net/kernel/Documentation/trace/ring-buffer-design.txt

I'm still surprised not a single GSoC project has attempted to solve
this for FreeBSD.  It really is a serious matter, as it makes getting
kernel backtraces and crash data a serious pain in the butt.  It can
also impact real-time debugging.  These are the *worst* times to have to
tolerate something like this.

I can point you to old threads about this, and my old FreeBSD wiki page
("Commonly reported issues") touches on this as well.  The point I want
to get across is that PRINTF_BUFR_SIZE does not solve the problem.

> We've configured the dump device but that also seems to fail
> to capture any details just sitting there after panic with
> Dumping 4465MB:
> 
> The machines are single disk ZFS root install and the dump
> device is configured using the gptid, could this be what's
> preventing the dump happening?

I can tell you that others have reported this problem where the kernel
panic/dump begins but either locks up after showing the first progress
meter/amount, or during the dumping itself.

I give everyone the same advice: please make sure that you have a swap
partition that's large enough to fit your entire memory contents
(preferably a swap that's 2x or 1.5x the amount of physical RAM), and
please make sure it's on a dedicated slice (e.g. ada0s1b).  I do not
advise any sort of "abstraction" layer between swap and the rest of the
system.  It might seem like a great/fun/awesome idea followed by
"whatever jdc, it works!" but when a crash happens -- which is when you
need it most -- and it doesn't work, I won't sympathise.  :-)
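
As a rough sketch of that layout, assuming the example device ada0s1b
from above (substitute your own disk and slice), swap sits directly on
the partition and the dump device is named explicitly:

```
# /etc/fstab: swap directly on a BSD partition of a dedicated slice
/dev/ada0s1b    none    swap    sw    0    0

# /etc/rc.conf: point crash dumps at that same partition
dumpdev="/dev/ada0s1b"
```

The partition needs to be at least as large as physical RAM for a full
dump to fit.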

As for the GPT aspects of things: I'm familiar with GPT as a technology,
but not with using it in practice.

> The kernel is compiled with:-
> options     KDB         # Kernel debugger related code
> options     KDB_TRACE       # Print a stack trace for a panic
> 
> We have remote KVM but not remote serial on most of the
> machines.

As long as remote KVM provides actual VGA-level redirection, then that's
sufficient (though it makes copy-pasting output basically impossible).  We
use serial console and tend to use these options; the DDB and GDB
options may be helpful for you, but not if the system is behaving the
way you describe.

# Debugging options
options         BREAK_TO_DEBUGGER       # Sending a serial BREAK drops to DDB
options         ALT_BREAK_TO_DEBUGGER   # Permit <CR>~<Ctrl-b> to drop to DDB
options         KDB                     # Enable kernel debugger support
options         KDB_TRACE               # Print stack trace automatically on panic
options         DDB                     # Support DDB
options         GDB                     # Support remote GDB

In combination with this, we use the following in /etc/rc.conf (the
dumpdev line is important, else savecore won't pick up anything):

dumpdev="auto"
ddb_enable="yes"

We do not use any ddb scripts, though; I keep ddb_enable in there Just In Case.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
