Date: Wed, 21 Jul 2010 13:28:20 -0400
From: John Baldwin <jhb@freebsd.org>
To: Markus Gebert <markus.gebert@hostpoint.ch>
Cc: freebsd-stable@freebsd.org, Andriy Gapon <avg@icyb.net.ua>
Subject: Re: 8.1-RC2 MCE caused by some LAPIC/clock changes?
Message-ID: <201007211328.20708.jhb@freebsd.org>
In-Reply-To: <BB90561D-87E3-4732-BC94-E702C64A1B32@hostpoint.ch>
References: <6B57591F-9FA2-45EB-825F-1DB025C0635D@hostpoint.ch> <4C46E9E5.8000204@icyb.net.ua> <BB90561D-87E3-4732-BC94-E702C64A1B32@hostpoint.ch>

On Wednesday, July 21, 2010 12:44:49 pm Markus Gebert wrote:
>
> On 21.07.2010, at 14:36, Andriy Gapon wrote:
>
> > on 21/07/2010 15:25 Markus Gebert said the following:
> >> On 21.07.2010, at 10:33, Andriy Gapon wrote:
> >>
> >>> on 21/07/2010 03:57 Markus Gebert said the following:
> >>>> Another thing though: Today I compared verbose boot output from 8-stable
> >>>> and the current box. I saw that the ioapic sets up IRQ routing differently
> >>>> on these two systems although the hardware is the same. This seemed not so
> >>>> interesting at first, but then I noticed that 8-stable sets up two routes
> >>>> (to lapic0 and lapic2, or sometimes lapic3) for IRQ58 (mpt0), while current
> >>>> only uses one route (to lapic0).
> >>> My understanding is that it's not "two routes", but re-routing. During early
> >>> boot all interrupts are bound to BSP; later, when APs become online, the
> >>> interrupts are re-distributed among available CPUs.
> >>
> >> I guess you're right, misinterpretation on my side. Thanks for clarifying this.
> >>
> >>
> >> Now being aware of this, it seems to me that in the machdep.lapic_allclocks=0
> >> case, there might just be more interrupts to be assigned/routed due to "more
> >> clocks being used". If that's true, maybe it's just "luck" that in this case
> >> the mpt interrupt gets assigned to lapic0/cpu0 and the box runs fine. I'm just
> >> guessing though, since I have no clue how interrupts are assigned to lapics
> >> exactly (round-robin? some logic?).
> >
> > Yes, round-robin, for interrupts that are not explicitly bound to specific CPUs.
> > The process is deterministic, but hard to predict indeed.
>
> I see.
>
>
> >>>> I used 'cpuset -c -l 0 -x 58' in an attempt to make my 8-stable box behave
> >>>> like the one running current. Indeed, this seems to have changed IRQ58 to
> >>>> be routed to lapic0 only. And the box was running for hours without showing
> >>>> the symptoms.
> >>>>
> >>>> I just checked boot verbose output of my 8-stable box again (booted with
> >>>> machdep.lapic_allclocks=0 as mentioned above). And now it seems to have set
> >>>> up IRQ routes just like the current box (one route for IRQ58 to lapic0).
> >>> Not sure how to interpret this properly. One possibility is a hardware
> >>> problem where interrupt message route between ioapic2 and CPU to which lapic3
> >>> belongs is flaky. Perhaps, this might be a FreeBSD problem: it could be that
> >>> the system somehow tells us not to set up such routes, but we don't listen. But
> >>> this is far-fetched.
> >>
> >>
> >> I'm not sure either. If my "theory" above proved to be true, it would have been
> >> just luck, that 6.x and 7.x (and current) run just fine on the X4100M2. A
> >> (short) test on Ubuntu didn't trigger the problem, so the Linux kernel is
> >> either lucky too by selecting an interrupt route that is "not flaky", or
> >> there's indeed some way to figure out not to use some lapics for some
> >> interrupts. Or we didn't test Linux thoroughly enough.
> >
> > Yep, it would be interesting to see how interrupts were distributed among CPUs on
> > that Linux.
>
>
> Well I can't provide this kind of information about _that_ Ubuntu Linux right
> now, because it was wiped from the second test machine to test current. But we
> have a few production X4100M2 machines running Debian and there it looks like
> this:
>
> ----
> # uname -a
> Linux XX 2.6.26-2-amd64 #1 SMP Tue Mar 9 22:29:32 UTC 2010 x86_64 GNU/Linux
> # cat /proc/interrupts
>             CPU0       CPU1       CPU2       CPU3
>    0:         36          0          0          1   IO-APIC-edge      timer
>    1:          0          0          0          2   IO-APIC-edge      i8042
>    7:          1          0          0          0   IO-APIC-edge
>    8:          0          0          0          1   IO-APIC-edge      rtc0
>    9:          0          0          0          0   IO-APIC-fasteoi   acpi
>   12:          0          0          0          4   IO-APIC-edge      i8042
>   14:          0          0          0         74   IO-APIC-edge      ide0
>   21:          0          0          0          2   IO-APIC-fasteoi   ehci_hcd:usb2
>   22:          0          0          1         31   IO-APIC-fasteoi   ohci_hcd:usb1
>   56:      52836  302759221        129      50868   IO-APIC-fasteoi   eth2
>   57:     288921 1070387307        225      98210   IO-APIC-fasteoi   eth3
> 1271:      92146   45282139          9       4885   PCI-MSI-edge      ioc0
>  NMI:          0          0          0          0   Non-maskable interrupts
>  LOC:  258132347  312890202  166484456  147070084   Local timer interrupts
>  RES:  118623017   84540907  100591028  107693244   Rescheduling interrupts
>  CAL:     108384      89281     110429     104206   function call interrupts
>  TLB:   14719843   24105630   12456528   18955140   TLB shootdowns
>  TRM:          0          0          0          0   Thermal event interrupts
>  THR:          0          0          0          0   Threshold APIC interrupts
>  SPU:          0          0          0          0   Spurious interrupts
>  ERR:          1
> ----
>
> Not sure how to interpret this. At first sight no IRQ58, but I guess they might
> be using MSI for mpt, which might avoid the problem entirely.

Yes, the FreeBSD mpt(4) driver should also use MSI by default unless you have
disabled it for some reason.  Also, Linux will dynamically reshuffle IRQs among
CPUs based on load, so the I/O APIC/MSI -> CPU routing is more dynamic in that
case.

-- 
John Baldwin
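
For anyone wanting to repeat this comparison on other Linux boxes, here is a
rough Python sketch that reads /proc/interrupts-style text and reports which
CPU took most of each interrupt. The names parse_interrupts, busiest_cpu and
SAMPLE are made up for illustration; the sample rows are copied from the Debian
output quoted in this thread, and the parsing assumes the usual layout (one
count per CPU, then a free-form description).

```python
# Sketch only: parse /proc/interrupts-style text and find the busiest CPU per IRQ.
# SAMPLE is a trimmed copy of the Debian output quoted in this thread.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  56:      52836  302759221        129      50868   IO-APIC-fasteoi   eth2
  57:     288921 1070387307        225      98210   IO-APIC-fasteoi   eth3
1271:      92146   45282139          9       4885   PCI-MSI-edge      ioc0
 ERR:          1
"""


def parse_interrupts(text):
    """Return {label: (per_cpu_counts, description)} for rows with one count per CPU."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = fields[:ncpus]
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue  # rows such as "ERR: 1" carry a single total, skip them
        table[label.strip()] = ([int(c) for c in counts], " ".join(fields[ncpus:]))
    return table


def busiest_cpu(counts):
    """Index of the CPU that handled the most interrupts."""
    return max(range(len(counts)), key=counts.__getitem__)


if __name__ == "__main__":
    for label, (counts, desc) in parse_interrupts(SAMPLE).items():
        print(f"IRQ {label} ({desc}): mostly CPU{busiest_cpu(counts)}")
```

On the sample rows all three interrupts were handled mostly by CPU1, including
ioc0, which is listed as PCI-MSI-edge rather than as an I/O APIC pin.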
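
The round-robin point made earlier in the thread can be illustrated with a toy
model. This is not the FreeBSD kernel's code, only a sketch assuming that
unbound interrupts are handed to CPUs strictly in turn. The function
assign_round_robin and every IRQ number except 58 (mpt0) are hypothetical.

```python
from itertools import cycle


def assign_round_robin(irqs, cpus):
    """Toy model: give each IRQ the next CPU in turn, wrapping around."""
    next_cpu = cycle(cpus)
    return {irq: next(next_cpu) for irq in irqs}


cpus = [0, 1, 2, 3]

# Hypothetical lists of interrupts waiting to be distributed.
# Only IRQ 58 (mpt0) comes from the thread; the rest are placeholders.
default_case = [16, 17, 58]
extra_clocks = [0, 8, 16, 17, 58]  # two more sources queued ahead of 58

print(assign_round_robin(default_case, cpus)[58])  # 2
print(assign_round_robin(extra_clocks, cpus)[58])  # 0
```

In this model the result is fully deterministic, yet adding two unrelated
interrupt sources is enough to move IRQ 58 from CPU 2 to CPU 0, which is the
kind of "luck" being speculated about above.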