Date:      Wed, 10 Aug 2011 09:00:19 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID:  <20110810160018.GA40279@icarus.home.lan>
In-Reply-To: <ADF5E597D1C0428D8FB838D94BDEB3A4@multiplay.co.uk>
References:  <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk> <20110810151256.GA38601@icarus.home.lan> <ADF5E597D1C0428D8FB838D94BDEB3A4@multiplay.co.uk>

On Wed, Aug 10, 2011 at 04:46:17PM +0100, Steven Hartland wrote:
> >On Wed, Aug 10, 2011 at 03:22:52PM +0100, Steven Hartland wrote:
> >>The base stack reported is a double fault with no additional
> >>details and CTRL+ALT+ESC fails to break to the debugger as
> >>does an NMI, even though it at least tries printing the
> >>following many times, some quite jumbled:-
> >>NMI ... going to debugger
> 
> >If you're generating the NMI yourself (possibly via the KVM, etc.) then
> >okay, that's different.  I'm trying to discern whether or not *you're*
> >generating the NMI, or if the NMI just happens and causes a panic for
> >you and that's what you're worried about.
> 
> Yes, generating it after the panic in order to try and get to the debugger :)

Understood, thanks for clarifying.

> >Now to discuss the "jumbled console output":
> ...
> >The default (assuming your kernel configs are based off of GENERIC
> >within the past 4-5 years) is 128.  However, the same developers stated
> >that they have great reservations over increasing this number
> >dramatically (meaning, something like 256 will probably work, but larger
> >"may have repercussions which are unknown at this time").
> 
> Might try that if it will help but with so many production machines to
> action I'd like to try and avoid if possible.

I've used PRINTF_BUFR_SIZE=256 with success on our systems, but since it
doesn't actually *solve* the problem, I just use the default of 128 and
grit my teeth when we experience it.  It's the larger values (e.g. 512
or 1024) that there is concern over.
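
For reference, a minimal sketch of how that option looks in a kernel
config file.  The file name MYKERNEL is hypothetical; the assumption is
that you've copied GENERIC and changed its existing PRINTF_BUFR_SIZE
line rather than adding a second one:

```
# Custom kernel config (MYKERNEL is a hypothetical name), copied from
# GENERIC with the existing PRINTF_BUFR_SIZE line edited in place.
options 	PRINTF_BUFR_SIZE=256	# GENERIC default is 128
```

The kernel has to be rebuilt and installed for the change to take
effect, which is why it's a nuisance across many production machines.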

> >In combination with this, we use the following in /etc/rc.conf (the
> >dumpdev line is important, else savecore won't pick up anything):
> >
> >dumpdev="auto"
> 
> I thought this was meant to be the default from back in the 6.x days but
> it didn't seem to work, so I added the gptid device from /etc/fstab.

/etc/defaults/rc.conf has dumpdev="NO", which affects two scripts:
/etc/rc.d/dumpon (this script is a little tricky; you really have to
read it slowly and pay close attention to what's going on) and
/etc/rc.d/savecore.
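
A minimal sketch of the override in /etc/rc.conf, which is plain sh
variable syntax.  The dumpdev value is the one quoted above; the dumpdir
line is my assumption of the stock default and is shown only to make
clear where savecore(8) will write, so check /etc/defaults/rc.conf on
your release before relying on it:

```shell
# /etc/rc.conf fragment: override the dumpdev="NO" default.
# "auto" lets /etc/rc.d/dumpon pick a swap device listed in /etc/fstab.
dumpdev="auto"
# Assumed stock default; the directory where savecore(8) stores dumps.
dumpdir="/var/crash"
```

A specific device name can be given for dumpdev instead of "auto" if
the automatic selection doesn't work on a particular machine.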

I've always wondered why dumpdev="NO" is the default, not "auto", since
on a system with no swap devices in /etc/fstab dumpdev="auto" should
behave the same as "NO".  Possibly the idea of the default is to ensure
that savecore(8) never gets run, since there's no guarantee someone has
/var/crash, or a /var that's big enough to hold a crash dump (embedded
systems or NFS-only systems, for example).

Touchy subject I guess.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



