From: Attilio Rao
To: Jeremy Chadwick
Cc: freebsd-stable@freebsd.org, Steven Hartland, Andriy Gapon
Date: Thu, 11 Aug 2011 11:43:25 +0200
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
In-Reply-To: <20110811092858.GA94514@icarus.home.lan>

2011/8/11 Jeremy Chadwick:
> On Thu, Aug 11, 2011 at 09:59:36AM +0100, Steven Hartland wrote:
>> That's not the issue as its happening across board over 130 machines :(
>
> Agreed, bad hardware sounds unlikely here. I could believe some strange
> incompatibility (e.g. BIOS quirk or the like[1]) that might cause problems
> en masse across many servers, but hardware issues are unlikely in this
> situation.
>
> [1]: I mention this because we had something similar happen at my
> workplace. For months we used a specific model of system from our
> vendor which worked reliably, zero issues. Then we got a new shipment
> of boxes (same model as prior) which started acting very odd (often AHCI
> timeout issues or MCEs which when decoded would usually turn out to be
> nonsensical). It took weeks to determine the cause given how slow the
> vendor was to respond: root cause turned out to be that the vendor
> decided, on a whim, to start shipping a newer BIOS version which wasn't
> "as compatible" with Solaris as previous BIOSes. Downgrading all the
> systems to the older BIOS fixed the problem.

That falls into the "hw problem" category for me.

Anyway, we would really need much more information in order to take
proactive action. Would it be possible to get access to one of the
panicking machines? Is it always the same panic, or does it vary (for
example: a page fault one time, a fatal double fault another, a fatal
trap another)? Whatever information you can provide may be valuable
here.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein