Date:      Mon, 11 Feb 2008 12:20:28 -0800
From:      Tech Lab Manager <tech@liveoaksf.org>
To:        freebsd-acpi@freebsd.org
Subject:   Re: SMP, ACPI and interrupt storm
Message-ID:  <7D7C052A-86E1-489F-B2F9-541C8522EBF4@liveoaksf.org>
In-Reply-To: <200802040900.54630.jhb@freebsd.org>
References:  <429F40B0-20EE-4F47-847A-A6B1E91BA79F@liveoaksf.org> <47A217FC.1080606@root.org> <8EE3D963-E390-4F45-A1D1-2295C1767B80@liveoaksf.org> <200802040900.54630.jhb@freebsd.org>

On Feb 4, 2008, at 6:00 AM, John Baldwin wrote:

> On Thursday 31 January 2008 02:35:52 pm Tech Lab Manager wrote:
>> On Jan 31, 2008, at 10:48 AM, Nate Lawson wrote:
>>
>>> Tech Lab Manager wrote:
>>>> Sorry for the cross-post from freebsd-smb.
>>>> Building 6.3-RELEASE and 7.0-RC1 on dual Xeon (4 CPU) boxes:
>>>>     options         SMP
>>>>     device          apic
>>>> SMP kernel builds fine, all 4 CPUs launch on reboot.
>>>> But I get a TON of interrupts from acpi0 -- about 67,000 per second
>>>> according to vmstat -i. With system at idle and almost no services
>>>> running, here is output of top -S:
>>>> last pid:   877;  load averages:  1.18,  0.48,  0.19
>>>> 75 processes:  6 running, 54 sleeping, 15 waiting
>>>> CPU states:  0.0% user,  0.0% nice,  0.2% system, 22.4% interrupt,  77.4% idle
>>>> Mem: 31M Active, 12M Inact, 28M Wired, 16K Cache, 15M Buf, 3822M Free
>>>> Swap: 4096M Total, 4096M Free
>>>> PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>>> 10 root         1 171   52     0K     8K RUN    3   1:11 99.18% idle: cpu3
>>>> 13 root         1 171   52     0K     8K CPU0   0   1:10 98.88% idle: cpu0
>>>> 12 root         1 171   52     0K     8K CPU1   1   1:09 98.78% idle: cpu1
>>>> 21 root         1 -52 -171     0K     8K CPU2   2   0:54 87.24% irq9: acpi0
>>>> 11 root         1 171   52     0K     8K RUN    2   0:17 11.19% idle: cpu2
>>>> Notice high load and interrupt % of CPU.
>>>> If I turn off ACPI (e.g. set hint.apic.0.disabled=1 in /boot/loader.conf),
>>>> the interrupt storm ceases, but then I'm only running on one CPU.
>>>
>>> That doesn't turn off acpi, that turns off the APIC (interrupt
>>> controller).  Try:
>>>  hint.acpi.0.disabled=1
>>
>> Sorry, my mistake in writing ACPI above -- I *was* trying to turn off
>> apic, based on a note in the FreeBSD handbook.
>>
>> Disabling ACPI as you suggest above has the same effect as turning
>> off APIC: the interrupt storm is disabled but only one CPU is launched.
>>
>>>
>>>> The BIOS ACPI settings are all Enabled. Hyperthreading is Enabled.
>>>> These machines have been running RedHat Enterprise 5.0 with full
>>>> multiprocessor support.
>>>
>>> This looks like a failure to sleep in C1 (hlt).  Someone else
>>> reported this problem earlier, but all debugging showed the
>>> inexplicable -- the HLT instruction was being executed but just did
>>> not work (returned immediately).
>>>
>>> There will be a new 7.0 build that fixes one interrupt storm
>>> related to level-triggered GPEs.  If you can cvsup your 7.0 branch
>>> (RELENG_7_0) and retry, that might be helpful to see if it also
>>> fixes your problem.
>>
>> okay, I'm on RC1, will switch to RELENG and report back.
>>
>> I'm not sure if this is a red herring, but acpidump -t reports:
>>
>> Type=INT Override
>>          BUS=0
>>          IRQ=0
>>          INTR=2
>>          Flags={Polarity=conforming, Trigger=conforming}
>>
>> which looks wrong on several counts (IRQ, INTR should be 9,
>> Trigger=level). dmesg even says:
>> "MADT: Forcing active-low polarity and level trigger for SCI"
>
> No, this is an entry for something other than the SCI.  You can have multiple
> interrupt override entries, and this entry is typical on all x86 systems with
> APICs (the 8259As are tied into pin 0 as a daisy chain and IRQ0 is tied into
> intpin 2 since IRQ2 isn't usable with 8259As).  Do you have an entry at all
> for IRQ 9?  If not, then the hw.acpi.sci tunables currently won't do anything
> (I can fix it so that they do, however).
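
To answer the "entry for IRQ 9" question mechanically, here is a minimal
Python sketch that scans acpidump -t text for INT Override entries. It
assumes the layout matches the excerpt quoted above (a "Type=" line followed
by indented KEY=VALUE lines, with a blank line ending each entry); the
function names are hypothetical and the sample text is just that excerpt.

```python
# Sample copied from the acpidump -t excerpt quoted in this thread.
SAMPLE = """\
Type=INT Override
         BUS=0
         IRQ=0
         INTR=2
         Flags={Polarity=conforming, Trigger=conforming}
"""

def int_overrides(acpidump_text):
    """Return a list of dicts, one per 'INT Override' entry found."""
    entries, current = [], None
    for raw in acpidump_text.splitlines():
        line = raw.strip()
        if line.startswith("Type="):
            # A "Type=" line opens a new entry (assumed layout).
            current = {"Type": line[len("Type="):]}
            entries.append(current)
        elif not line:
            # A blank line is assumed to close the current entry.
            current = None
        elif current is not None and "=" in line:
            # Split on the first "=" only, so Flags={...} stays intact.
            key, _, value = line.partition("=")
            current[key] = value
    return [e for e in entries if e["Type"] == "INT Override"]

def has_override_for_irq(acpidump_text, irq):
    """True if any INT Override entry names the given source IRQ."""
    return any(e.get("IRQ") == str(irq) for e in int_overrides(acpidump_text))

if __name__ == "__main__":
    print(has_override_for_irq(SAMPLE, 9))
```

Feeding it the full acpidump -t output instead of SAMPLE would show whether
any override names IRQ 9; the excerpt alone only covers the IRQ 0 entry.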

Here's an update on this issue.

I csup'ed my source tree (RELENG_7_0 now at RC2) last Friday and  
rebuilt world. Two things look slightly different now:

1) On reboot, I still see an interrupt storm at acpi0 (irq9) at  
around 75k/sec; however over time the interrupt rate actually drops,  
to around 15k/sec after a few days (perhaps it settles further, time  
will tell).

2) Load average [at idle] is down quite a bit, from a previous
average of ~1.0 to one that seems to vacillate between a low
of 0.10 and a high of 0.35.

$ top -S
last pid:  1038;  load averages:  0.22,  0.18,  0.15
67 processes:  5 running, 46 sleeping, 16 waiting
CPU states:  0.0% user,  0.0% nice,  0.1% system, 21.0% interrupt, 78.9% idle
Mem: 6468K Active, 5232K Inact, 23M Wired, 1540K Cache, 8688K Buf, 3849M Free
Swap: 4096M Total, 4096M Free

   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
    11 root         1 171 ki31     0K     8K CPU3   3  74:15 99.02% idle: cpu3
    12 root         1 171 ki31     0K     8K CPU2   2  74:14 99.02% idle: cpu2
    13 root         1 171 ki31     0K     8K RUN    1  74:10 99.02% idle: cpu1
    24 root         1 -52    -     0K     8K WAIT   0  58:08 83.15% irq9: acpi0
    14 root         1 171 ki31     0K     8K RUN    0  16:05 14.84% idle: cpu0
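
One caveat on the interrupt rates quoted above: as far as I know, the rate
column of vmstat -i is the total count divided by uptime, i.e. an average
since boot, so a figure that drifts downward over several days may not
reflect the current rate. Below is a minimal Python sketch for computing the
current rate from two vmstat -i captures taken a known number of seconds
apart. The function names, the default interrupt name and the sample numbers
are illustrative assumptions, not measurements from this machine.

```python
import re

# Made-up captures in the "name  total  rate" shape that vmstat -i prints.
BEFORE = """\
interrupt                          total       rate
irq9: acpi0                      1000000      67000
"""
AFTER = """\
interrupt                          total       rate
irq9: acpi0                      1150000      66000
"""

def parse_vmstat_i(text):
    """Return {interrupt name: total count} from vmstat -i style text."""
    totals = {}
    for line in text.splitlines():
        # Match "<name> <total> <rate>"; the header line has no numbers
        # and is skipped because it does not match.
        m = re.match(r"^(.*\S)\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            totals[m.group(1)] = int(m.group(2))
    return totals

def current_rate(before, after, seconds, name="irq9: acpi0"):
    """Interrupts per second between two captures taken `seconds` apart."""
    delta = parse_vmstat_i(after)[name] - parse_vmstat_i(before)[name]
    return delta / seconds

if __name__ == "__main__":
    # With the made-up samples: (1150000 - 1000000) / 10 seconds.
    print(current_rate(BEFORE, AFTER, 10))
```

In practice the two captures would come from running vmstat -i twice with a
sleep in between; the delta of the total column avoids the averaging effect.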

Note: for kicks I tried rebuilding the kernel with options  
MPTABLE_FORCE_HTT and IPI_PREEMPTION, though without any apparent  
effect. No device polling, and using SCHED_4BSD for what it's worth.

I don't know what a typical load for a multi-CPU box looks like;
we've only run single-CPU systems here, and even when working, our
server loads are typically pretty close to 0.0. Basically we
inherited a bunch of dual Xeon machines and I'd like to make them
work. Of course I can just run them on one CPU, but that seems kind
of silly. (Unfortunately I'm just a school administrator and not much
of a hardware guy, so I'm a little out of my depth here...;| )

Thanks for any further assistance anyone can provide.

-- 
John Berliner
Live Oak School


