From: Markus Gebert
Date: Mon, 12 Jul 2010 14:25:54 +0200
Message-Id: <591666AA-E6CA-4478-9E96-3A2D558BD6B4@hostpoint.ch>
References: <6B57591F-9FA2-45EB-825F-1DB025C0635D@hostpoint.ch> <201007091603.31843.jhb@freebsd.org> <08562D52-02AA-46CF-BFCD-00D0A3C4DC34@hostpoint.ch>
To: alc@freebsd.org
Cc: freebsd-stable@freebsd.org, John Baldwin
Subject: Re: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?

On 10.07.2010, at 19:37, Alan Cox wrote:

> On Fri, Jul 9, 2010 at 6:53 PM, Markus Gebert wrote:
> [snip]
>
> Yes, this hardware comes from Sun directly, but getting Sun (/Oracle) support for this issue is gonna be tough. FreeBSD is unsupported, and in a short test we couldn't reproduce the problem with a Linux kernel.
> While I agree that a hardware issue has always been, and still is, a possibility to be considered, the fact remains that we tested this on two machines, and that 6.x and 7.x do not show this behavior. Another possibility, of course, is that the X4100 is prone to such issues and that 6.x and 7.x somehow have workarounds we're not aware of, or simply do something differently so that the issue does not get triggered.
>
>
> 8.1 is our first release to have the driver for configuring and reporting machine check exceptions enabled by default. Prior to 8.1, you had to explicitly enable the driver at boot time.

I was aware of that, but I don't think it is the cause. Disabling MCA just makes the reporting go away; the MCE and the subsequent fatal trap remain. With default BIOS settings, the OS does not even get a chance to panic: the system forces a reset before the OS can do anything. And, as far as I can tell, that did not happen on previous stable branches.

I don't know, though, whether MCA changes the situation even when it is disabled in loader.conf (hw.mca.enabled=0). I just checked our 7.2 setup, and MCA does not seem to be in a 7.2 kernel, so I guess it was added in 8.0 and enabled by default in 8.1. To be honest, we did not check whether 8.0 shows the same behavior, but I guess running 8.1 with hw.mca.enabled=0 should give pretty much the same situation as far as MCA is concerned.

Is there a way to get rid of MCA completely (as opposed to just "turning it off" via loader.conf)?

Markus