Date: Wed, 10 Aug 2011 09:00:19 -0700
From: Jeremy Chadwick
To: Steven Hartland
Cc: freebsd-stable@freebsd.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID: <20110810160018.GA40279@icarus.home.lan>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk> <20110810151256.GA38601@icarus.home.lan>

On Wed, Aug 10, 2011 at 04:46:17PM +0100, Steven Hartland wrote:
> >On Wed, Aug 10, 2011 at 03:22:52PM +0100, Steven Hartland wrote:
> >>The base stack reported is a double fault with no additional
> >>details and CTRL+ALT+ESC fails to break to the debugger as
> >>does an
> >>NMI, even though it at least tries printing the
> >>following many times some quite jumbled:-
> >>NMI ... going to debugger
>
> >If you're generating the NMI yourself (possibly via the KVM, etc.) then
> >okay, that's different. I'm trying to discern whether or not *you're*
> >generating the NMI, or if the NMI just happens and causes a panic for
> >you and that's what you're worried about.
>
> Yer generating it after panic in order to try and get to the debugger :)

Understood, thanks for clarifying.

> >Now to discuss the "jumbled console output":
> ...
> >The default (assuming your kernel configs are based off of GENERIC
> >within the past 4-5 years) is 128. However, the same developers stated
> >that they have great reservations over increasing this number
> >dramatically (meaning, something like 256 will probably work, but larger
> >"may have repercussions which are unknown at this time").
>
> Might try that if it will help but with so many production machines to
> action I'd like to try and avoid if possible.

I've used PRINTF_BUFR_SIZE=256 with success on our systems, but since it
doesn't actually *solve* the problem, I just use the default 128 and
grit my teeth when we experience it. It's the larger values (e.g. 512 or
1024) that there is concern over.

> >In combination with this, we use the following in /etc/rc.conf (the
> >dumpdev line is important, else savecore won't pick up anything):
> >
> >dumpdev="auto"
>
> I thought this was meant to be the default from back in the 6.x days but
> it didn't seem to work, so I added the gptid device from /etc/fstab

/etc/defaults/rc.conf has dumpdev="NO", which affects two things:
/etc/rc.d/dumpon (this script is a little tricky; you really have to
read it slowly and pay close attention to what's going on), and
/etc/rc.d/savecore. I've always wondered why dumpdev="NO" is the
default, not "auto", since on a system with no swap devices in
/etc/fstab dumpdev="auto" should behave the same.
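
To make that reasoning concrete, here is a rough sketch of how the three
kinds of dumpdev value could be resolved. It is not the real
/etc/rc.d/dumpon code; resolve_dumpdev is a made-up name, and the rule
"auto picks the first swap entry in fstab" is my assumption for the
sketch (the real script may do more, such as checking that the device
actually exists). Calling it as `resolve_dumpdev auto /etc/fstab` prints
the device that would be chosen, or nothing if there is no swap entry,
which is the same outcome as "NO".

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): print the device that a given
# dumpdev setting would select, or print nothing if no dump device is set.
#   $1 = dumpdev value ("NO", "auto", or an explicit device path)
#   $2 = fstab file to scan (defaults to /etc/fstab)
resolve_dumpdev() {
    setting=$1
    fstab=${2:-/etc/fstab}

    case "$setting" in
    [Nn][Oo]|'')
        # Disabled: no dump device, so savecore has nothing to collect.
        return 1
        ;;
    [Aa][Uu][Tt][Oo])
        # Assumption for this sketch: use the first fstab entry of type "swap".
        while read -r dev mp type rest; do
            # Skip comment lines and blank lines.
            case "$dev" in '#'*|'') continue ;; esac
            if [ "$type" = "swap" ]; then
                printf '%s\n' "$dev"
                return 0
            fi
        done < "$fstab"
        # No swap entry found: no dump device, same outcome as "NO".
        return 1
        ;;
    *)
        # Anything else is treated as an explicit device path.
        printf '%s\n' "$setting"
        return 0
        ;;
    esac
}
```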
Possibly the idea of the default is to ensure that savecore(8) never
gets run (e.g. there's no guarantee someone has /var/crash, or a /var
that's big enough to hold a crash dump; possibly embedded systems or
NFS-only systems, for example). Touchy subject I guess.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |