From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 10 16:00:21 2011
Date: Wed, 10 Aug 2011 09:00:19 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Steven Hartland <killing@multiplay.co.uk>
Message-ID: <20110810160018.GA40279@icarus.home.lan>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk>
	<20110810151256.GA38601@icarus.home.lan>
	<ADF5E597D1C0428D8FB838D94BDEB3A4@multiplay.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ADF5E597D1C0428D8FB838D94BDEB3A4@multiplay.co.uk>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE

On Wed, Aug 10, 2011 at 04:46:17PM +0100, Steven Hartland wrote:
> >On Wed, Aug 10, 2011 at 03:22:52PM +0100, Steven Hartland wrote:
> >>The base stack reported is a double fault with no additional
> >>details, and CTRL+ALT+ESC fails to break to the debugger, as
> >>does an NMI, even though it at least tries printing the
> >>following many times, some of it quite jumbled:
> >>NMI ... going to debugger
> 
> >If you're generating the NMI yourself (possibly via the KVM, etc.) then
> >okay, that's different.  I'm trying to discern whether or not *you're*
> >generating the NMI, or if the NMI just happens and causes a panic for
> >you and that's what you're worried about.
> 
> Yes, I'm generating it after the panic to try to get to the debugger :)

Understood, thanks for clarifying.

> >Now to discuss the "jumbled console output":
> ...
> >The default (assuming your kernel configs are based off of GENERIC
> >within the past 4-5 years) is 128.  However, the same developers stated
> >that they have great reservations over increasing this number
> >dramatically (meaning, something like 256 will probably work, but larger
> >"may have repercussions which are unknown at this time").
> 
> I might try that if it will help, but with so many production machines
> to update I'd like to avoid it if possible.

I've used PRINTF_BUFR_SIZE=256 with success on our systems, but since it
doesn't actually *solve* the problem, I just use the default of 128 and
grit my teeth when we experience it.  The concern is over larger values
(e.g. 512 or 1024).
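
If you do want to try the larger buffer, it is a one-line change in a
custom kernel config followed by a kernel rebuild.  A minimal sketch,
assuming a custom config that otherwise follows GENERIC (256 is simply
the value mentioned above, not a recommendation):

```
# Assumed custom kernel config fragment; GENERIC ships this option as 128.
options 	PRINTF_BUFR_SIZE=256
```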

> >In combination with this, we use the following in /etc/rc.conf (the
> >dumpdev line is important, else savecore won't pick up anything):
> >
> >dumpdev="auto"
> 
> I thought this was meant to be the default from back in the 6.x days but
> it didn't seem to work, so I added the gptid device from /etc/fstab

/etc/defaults/rc.conf has dumpdev="NO", which affects two scripts:
/etc/rc.d/dumpon (this script is a little tricky; you really have to
read it slowly and pay close attention to what's going on) and
/etc/rc.d/savecore.

I've always wondered why dumpdev="NO" is the default, not "auto", since
on a system with no swap devices in /etc/fstab dumpdev="auto" should
behave the same.  Possibly the idea of the default is to ensure that
savecore(8) never gets run (e.g. there's no guarantee someone has
/var/crash, or a /var that's big enough to hold a crash dump; possibly
embedded systems or NFS-only systems, for example).
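
To illustrate the "no swap devices" point: as I read it, the "auto" case
boils down to walking /etc/fstab and taking the first swap entry, so
with no swap entries there is nothing to configure.  Below is a rough
sketch of that selection logic only.  first_swap_dev is a hypothetical
helper name of mine, not something from the base system, and the real
/etc/rc.d/dumpon does more than this.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): print the first swap device
# found in an fstab-style file.  Prints nothing and returns 1 if the
# file lists no swap device.
first_swap_dev() {
    fstab="${1:-/etc/fstab}"
    while read -r dev mp type rest; do
        case "$dev" in
        ''|\#*) continue ;;        # skip blank lines and comments
        esac
        if [ "$type" = "swap" ]; then
            echo "$dev"
            return 0
        fi
    done < "$fstab"
    return 1
}
```

Running `first_swap_dev /etc/fstab || echo "no swap device listed"` on a
box shows which device this sketch would pick, if any.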

Touchy subject I guess.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |