From: Jeremy Chadwick
To: Steven Hartland
Cc: Attilio Rao, freebsd-stable@freebsd.org, Andriy Gapon
Date: Thu, 11 Aug 2011 02:28:58 -0700
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID: <20110811092858.GA94514@icarus.home.lan>

On Thu, Aug 11, 2011 at 09:59:36AM +0100, Steven Hartland wrote:
> That's not the issue as its happening across board over 130 machines :(

Agreed, bad
hardware sounds unlikely here. I could believe some strange
incompatibility (e.g. BIOS quirk or the like[1]) that might cause
problems en masse across many servers, but hardware issues are unlikely
in this situation.

[1]: I mention this because we had something similar happen at my
workplace. For months we used a specific model of system from our vendor
which worked reliably, zero issues. Then we got a new shipment of boxes
(same model as prior) which started acting very odd (often AHCI timeout
issues or MCEs which when decoded would usually turn out to be
nonsensical).

It took weeks to determine the cause given how slow the vendor was to
respond: root cause turned out to be that the vendor decided, on a whim,
to start shipping a newer BIOS version which wasn't "as compatible" with
Solaris as previous BIOSes. Downgrading all the systems to the older
BIOS fixed the problem.

In Steve's case this is unlikely to be the situation, but I thought I'd
share the story anyway. "SKU ABCXYZ-1" from August 2009 is not
necessarily the same thing as "SKU ABCXYZ-1" from May 2010. ;-)

This is also why I prefer to buy/build my own systems, since I cannot
trust vendors to not mess about with settings w/out changing SKUs, P/Ns,
or revision numbers.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |