Date:      Mon, 12 Jul 2010 14:41:51 +0200
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   Re: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?
Message-ID:  <FFB367B2-232D-460D-82B8-C3F03F1B53BE@hostpoint.ch>
In-Reply-To: <08562D52-02AA-46CF-BFCD-00D0A3C4DC34@hostpoint.ch>
References:  <6B57591F-9FA2-45EB-825F-1DB025C0635D@hostpoint.ch> <201007091603.31843.jhb@freebsd.org> <08562D52-02AA-46CF-BFCD-00D0A3C4DC34@hostpoint.ch>


On 10.07.2010, at 01:53, Markus Gebert wrote:

>> I'm curious if disabling USB legacy support in the BIOS causes it to still die
>> even with ehci not loaded.  If so, then the SMI# for the ehci controller must
>> somehow prevent the issue, perhaps by triggering frequently enough to slow the
>> rate of I/O requests down?
>
>
> I disabled usb legacy support in the BIOS and booted a kernel with
> usb+ohci+ukbd+ums but without ehci. Unfortunately, I cannot reproduce
> the MCE.


Well, the situation has changed. The machine died over the weekend while
running our test load with the kernel configuration above. It seems that
leaving ehci out of the kernel at boot only makes the MCE much less
likely, but it still occurs. With ehci, I can panic the machine within a
minute; without ehci, it seems to take at least hours. Still, I don't
understand why leaving the ehci driver out of the kernel should have any
effect, especially since nothing is attached to it.

Panic message:

----
MCA: Bank 4, Status 0xb400004000030c2b
MCA: Global Cap 0x0000000000000105, Status 0x0000000000000007
MCA: Vendor "AuthenticAMD", ID 0x40f13, APIC ID 2
MCA: CPU 2 UNCOR BUSLG Observer WR I/O
MCA: Address 0xfd00000000
panic: blockable sleep lock (sleep mutex) 128 @ /usr/src/sys/vm/uma_core.c:1992
cpuid = 2
KDB: enter: panic
[thread pid 12 tid 100039 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x69ccb0(%rip)
----
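
For reference, the status word in the panic output can be decoded by hand. The following is only a minimal sketch in Python, assuming the common architectural x86 MCi_STATUS layout (flag bits 57-63, MCA error code in bits 0-15, bus/interconnect codes of the form 0000 1PPT RRRR IILL). The function name `decode_status` and the table names are made up for this illustration, and the model-specific bits (16-56) are deliberately left undecoded.

```python
# Sketch: decode the architectural fields of a machine-check status word.
# Assumption: common x86 MCi_STATUS layout; model-specific bits are ignored.

STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}

PARTICIPATION = ["Source", "Responder", "Observer", "Generic"]
REQUEST = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
MEM_IO = ["Memory", "Reserved", "I/O", "Other"]
LEVEL = ["L0", "L1", "L2", "LG"]


def decode_status(status):
    """Return the set flags and, for bus errors, the decoded MCA error code."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "mca_code": code}
    # Bus/interconnect error codes have the form 0000 1PPT RRRR IILL.
    if (code & 0xF800) == 0x0800:
        rrrr = (code >> 4) & 0xF
        result["bus"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool(code & 0x100),
            "request": REQUEST[rrrr] if rrrr < len(REQUEST) else "reserved",
            "mem_io": MEM_IO[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    # Status value taken from the panic message above.
    print(decode_status(0xB400004000030C2B))
```

Under these assumptions, the Bank 4 status decodes to the flags VAL, UC, EN and ADDRV and to a bus error with Observer participation, a WR request, an I/O transaction and level LG, which agrees with the "UNCOR BUSLG Observer WR I/O" line the kernel printed.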

I don't know why it's not a fatal trap 28 this time even though an MCE
was detected. I have seen this before, though, also with kernels that
include ehci and with USB legacy support enabled, so the different panic
this time does not seem to be related to how the kernel was configured.
Maybe it's a symptom? Or might it even be useful? If so, what should I
pull out of DDB?

In the meantime, I'll try harder to reproduce the MCE on current...


Markus



