Date:      Wed, 21 Jul 2010 14:25:57 +0200
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        Andriy Gapon <avg@icyb.net.ua>
Cc:        freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject:   Re: 8.1-RC2 MCE caused by some LAPIC/clock changes?
Message-ID:  <5CABE3EC-1EE7-4B6B-85EA-70AA2A107948@hostpoint.ch>
In-Reply-To: <4C46B0C6.4020400@icyb.net.ua>
References:  <6B57591F-9FA2-45EB-825F-1DB025C0635D@hostpoint.ch>	<9DCFE2F6-D7CB-49CB-8EBC-06C1E5EBB727@hostpoint.ch>	<F744F475-3D2B-4BC6-856A-A5D302AA8681@hostpoint.ch>	<201007201559.45081.jhb@freebsd.org> <6781BC8B-51E0-4F8B-9307-9C062DE70C21@hostpoint.ch> <4C46B0C6.4020400@icyb.net.ua>


On 21.07.2010, at 10:33, Andriy Gapon wrote:

> on 21/07/2010 03:57 Markus Gebert said the following:
>> Another thing though: Today I compared verbose boot output from 8-stable and
>> the current box. I saw that the ioapic sets up IRQ routing differently on
>> these two systems although the hardware is the same. This seemed not so
>> interesting at first, but then I noticed that 8-stable sets up two routes (to
>> lapic0 and lapic2, or sometimes lapic3) for IRQ58 (mpt0), while current only
>> uses one route (to lapic0).
>
> My understanding is that it's not "two routes", but re-routing.
> During early boot all interrupts are bound to BSP; later, when APs become
> online, the interrupts are re-distributed among available CPUs.

I guess you're right, misinterpretation on my side. Thanks for clarifying
this.

Now being aware of this, it seems to me that in the
machdep.lapic_allclocks=0 case, there might just be more interrupts to be
assigned/routed due to "more clocks being used". If that's true, maybe it's
just "luck" that in this case the mpt interrupt gets assigned to lapic0/cpu0
and the box runs fine. I'm just guessing though, since I have no clue how
interrupts are assigned to lapics exactly (round-robin? some logic?).
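
To illustrate the guess (purely a sketch: it assumes plain round-robin
assignment, which may not be what the kernel actually does, and apart from
IRQ58 the IRQ numbers and the four-CPU layout are made up), a couple of extra
interrupt sources are enough to change where a later interrupt ends up:

```python
from itertools import cycle


def assign_round_robin(irqs, cpus):
    """Hand each IRQ to the next CPU in turn, wrapping around at the end."""
    next_cpu = cycle(cpus)
    return {irq: next(next_cpu) for irq in irqs}


# Hypothetical CPU list; the real box may differ.
cpus = [0, 1, 2, 3]

# Hypothetical IRQ lists. Only IRQ 58 (mpt0) is taken from this thread.
fewer_sources = [16, 17, 58]
more_sources = [16, 17, 18, 19, 58]

print(assign_round_robin(fewer_sources, cpus)[58])  # CPU picked for IRQ 58
print(assign_round_robin(more_sources, cpus)[58])   # CPU picked for IRQ 58
```

With these invented inputs, IRQ 58 lands on CPU 2 in the first case and on
CPU 0 in the second, which is the kind of shift I mean by "luck".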


>> I used 'cpuset -c -l 0 -x 58' in an attempt to make my 8-stable box behave
>> like the one running current. Indeed, this seems to have changed IRQ58 to be
>> routed to lapic0 only. And the box was running for hours without showing the
>> symptoms.
>>
>> I just checked boot verbose output of my 8-stable box again (booted with
>> machdep.lapic_allclocks=0 as mentioned above). And now it seems to have set
>> up IRQ routes just like the current box (one route for IRQ58 to lapic0).
>
> Not sure how to interpret this properly.
> One possibility is a hardware problem where interrupt message route between
> ioapic2 and CPU to which lapic3 belongs is flaky.
> Perhaps, this might be a FreeBSD problem: it could be that the system somehow
> tells to not set up such routes, but we don't listen.  But this is far fetched.


I'm not sure either. If my "theory" above proves to be true, it would have
been just luck that 6.x and 7.x (and current) run fine on the X4100M2. A
(short) test on Ubuntu didn't trigger the problem, so the Linux kernel is
either lucky too, selecting an interrupt route that is "not flaky", or there
is indeed some way to figure out that some lapics should not be used for some
interrupts. Or we didn't test Linux thoroughly enough.


Markus




