Date:      Sat, 29 Jan 2011 07:52:11 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc:        freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>, Stefan Lambrev <stefan.lambrev@moneybookers.com>
Subject:   Re: Interrupt performance
Message-ID:  <20110129070205.Q7034@besplex.bde.org>
In-Reply-To: <20110128172516.GG18170@zxy.spb.ru>
References:  <20110128143355.GD18170@zxy.spb.ru> <22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com> <20110128161035.GF18170@zxy.spb.ru> <CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com> <4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>

On Fri, 28 Jan 2011, Slawa Olhovchenkov wrote:

> On Fri, Jan 28, 2011 at 09:10:20AM -0800, Julian Elischer wrote:
>
>> On 1/28/11 8:15 AM, Stefan Lambrev wrote:
>>> The overhead comes from badly written software.
>>> This software is optimized for linux and you have to optimize it for freebsd, then you will have the same overhead.
>>> All those *popular* benchmarks like hping, iperf, netperf have some strange optimizations for linux - we call them linuxism.
>>> Just search the archives - I'm pretty sure patches are flying around.
>>
>> He wants to know why the freeBSD driver spends 8 x as much time on
>> each interrupt.

32 x (1/4 as many interrupts reported to be taking 8 x longer altogether).

This must be because the reporting is more broken in Linux :-).
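
For reference, the per-interrupt arithmetic can be sketched as below, using
the figures quoted later in this message (7% at 56K int/s on Linux, 59% at
14K int/s on FreeBSD).  The function name is made up for illustration.  The
exact ratio comes out a little above the rounded 4 x 8 = 32.

```python
def usec_per_interrupt(cpu_percent, ints_per_sec):
    """CPU time charged to each interrupt, in microseconds."""
    return cpu_percent / 100.0 * 1e6 / ints_per_sec


if __name__ == "__main__":
    linux = usec_per_interrupt(7, 56_000)      # figures quoted in the thread
    freebsd = usec_per_interrupt(59, 14_000)
    print("Linux   usec/interrupt:", round(linux, 2))
    print("FreeBSD usec/interrupt:", round(freebsd, 2))
    print("ratio:", round(freebsd / linux, 1))
```

That works out to about 1.25 uSec per interrupt reported on Linux against
about 42 uSec on FreeBSD.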

> Yes!
>
>> there are of course several possible answers, including:
>>
>> 1/ Sometimes BSD and Linux report things differently. Linux may or may not
>> account for the lowest level interrupt time the same as BSD
>
> But I see only 20% idle on FreeBSD and 80% idle on Linux.

The time must be counted somewhere, so when it is not properly accounted
to packet handling, and nothing much else is running, it is accounted to
idle.

To see how much CPU is actually available, run something else and see how
fast it runs.  A simple counting loop works well on UP systems.
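
A minimal sketch of such a counting loop, assuming a UP system where the
loop competes directly with the packet handling.  The names count_for and
cpu_share are hypothetical.  Run it once on an idle machine and once under
the network load, and compare the two counts.

```python
import time


def count_for(seconds):
    """Count loop iterations completed in `seconds` of wall-clock time."""
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n += 1
    return n


def cpu_share(idle_count, loaded_count):
    """Fraction of the CPU the loop still got while the load was running."""
    return loaded_count / idle_count


if __name__ == "__main__":
    # Print the count for a 5 second run; record it idle and under load.
    print(count_for(5.0))
```

If the loaded count is 40% of the idle count, then about 60% of the CPU is
going to the load, whatever the idle figure in top or vmstat says.  On an
SMP system the loop would have to be bound to the CPU that takes the
interrupts for the comparison to mean anything.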

>> 2/ the BSD driver for that chip may be badly written, or may
>> be doing more or different work for some reason
>> 3/ the FreeBSD interrupt code may be misconfigured for that driver.
>>
>> or maybe combinations...

Possibly, but it's a low-end NIC and those normally take a lot of CPU.
128 kpps might take 20% of one 3GHz CPU for even a high-end NIC on FreeBSD.
Linux has generally lower overheads and should be expected to reduce this
a bit, to perhaps as low as 15%, depending on how much of the overhead is
due to the NIC.

>> there are profiling tools that you may decide to run.
>
> What tools I can use on amd64?
>
> I boot kernel configured with 'config -p'.
> Most time in spinlock_exit and acpi_cpu_c1.

Normal profiling works poorly (I see you found my old mail about high
resolution profiling).  Linux might be misreporting the overhead for
exactly the same reasons that normal profiling works poorly:
- the profiling clock frequency of ~1 KHz was adequate for 5 MHz machines
   in 1998, but is now too slow.  Statistics clocks are even slower (128
   Hz in FreeBSD, and possibly 100 Hz (?) jiffies in Linux).
- the statistics clock might be too synchronized with other interrupts.
   The above spinlock_exit and acpi_cpu_c1 times indicate that the
   statistics clock almost always fires on exit from another spinlock
   and/or inside ACPI, for waking up from idle for the latter.  Seeing
   lots of exits from spinlocks may indicate that spinlocks are being
   used too much.
But FreeBSD will report interrupt and system times for non-fast interrupts
to an accuracy of about 1 microsecond, since it doesn't use the
statistics clock much for this.  OTOH, for fast interrupts it is typical
behaviour in FreeBSD and Linux to not see them at all from the statistics
clock interrupt, since they mask all interrupts so they mask the
statistics clock interrupt in particular.  In FreeBSD, lots of time
apparently spent in spinlock_exit is a typical result of this, or at
least similar things, since spinlock_enter masks all interrupts (except
in my version of course).  Linux doesn't have fast interrupts in the
same way that FreeBSD does, but at least in old versions almost all of
its interrupts masked other interrupts a lot.
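
The two failure modes above (synchronization and masking) can be shown with
a toy model.  This is only a sketch under stated assumptions: a handler
that runs for busy_us at the start of every period_us, a sampling clock
that ticks every tick_us, and a tick that arrives while interrupts are
masked being held until the handler finishes and so charged to whatever
runs next.  The 42 uSec and 71 uSec values are illustrative, loosely based
on the 14K int/s case, and the function is hypothetical, not kernel code.

```python
def sampled_busy_fraction(period_us, busy_us, tick_us, n_ticks, masked):
    """Fraction of sampling ticks that get charged to the handler."""
    hits = 0
    for k in range(n_ticks):
        t = k * tick_us
        in_handler = (t % period_us) < busy_us
        if in_handler and masked:
            # The tick stays pending until the handler unmasks interrupts,
            # so the sample lands on the code that runs after the handler.
            in_handler = False
        if in_handler:
            hits += 1
    return hits / n_ticks


if __name__ == "__main__":
    true_fraction = 42 / 71
    unsynced = sampled_busy_fraction(71, 42, 7812, 7100, masked=False)
    synced = sampled_busy_fraction(71, 42, 7810, 7100, masked=False)
    hidden = sampled_busy_fraction(71, 42, 7812, 7100, masked=True)
    print(true_fraction, unsynced, synced, hidden)
```

In this model the unsynchronized, unmasked clock recovers the true busy
fraction, a tick period that is an exact multiple of the interrupt period
sees the handler always (or never, depending on phase), and the masked
handler is never seen at all.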

>>> On Jan 28, 2011, at 6:10 PM, Slawa Olhovchenkov wrote:
>>>
>>>> On Fri, Jan 28, 2011 at 06:03:15PM +0200, Stefan Lambrev wrote:
>>>>
>>>>> Do the test with netblast ;)
>>>>> Most perf tools are written badly and for Linux.
>>>>> In our internal test netblast running on freebsd outperform everything else.
>>>> I don't speak about bad performance.
>>>> I speak about overhead.
>>>>
>>>> Linux: overhead 7% for 56K int/s
>>>> FreeBSD: overhead 59% for 14K int/s
>>>>
>>>> For processing 1/4 interrupts FreeBSD need 8x CPU.

You showed context switches in another reply.  56k interrupts on FreeBSD
would give at least 112k context switches taking several uSec each to do
nothing except switch.  This would give an overhead in the 59% range.
14K is not so bad, but still too high unless you have a spare CPU or 32
to handle it.  Part of the lowness of low-end NICs is that they tend to
generate too many interrupts and don't have much or any way to control
this.  Linux will certainly be able to handle 56K int/s better than
FreeBSD since it doesn't have heavyweight interrupt threads AFAIK.
FreeBSD also has "fast" interrupts, which are much like normal interrupts
used to be in FreeBSD.  I don't know if your NIC driver uses these.  I
guess not, since if it did then it should move the "interrupt" processing
to a task queue, where it would show up under another label and be reduced
insignificantly.
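
The context-switch estimate at the start of the paragraph above can be
checked with a few lines.  The 2 switches per interrupt follow from "56k
interrupts ... at least 112k context switches"; the 5 uSec per switch is an
assumed value standing in for "several uSec", and the function name is
hypothetical.

```python
def switch_overhead(ints_per_sec, switches_per_int=2, usec_per_switch=5.0):
    """Fraction of one CPU spent purely on context switching."""
    return ints_per_sec * switches_per_int * usec_per_switch / 1e6


if __name__ == "__main__":
    for rate in (56_000, 14_000):
        print(rate, "int/s ->", round(100 * switch_overhead(rate)), "% CPU")
```

With those assumptions 56K int/s costs about 56% of a CPU in switching
alone, and 14K int/s about 14%.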

>>>>> P.S. - /usr/src/tools/tools/netrate/netblast - we have tested little more expensive card - em/igb and bce.

netblast should be able to saturate a low-end NIC, but may take 100% of 1
CPU to do so (it has to busy-wait, since there is no way to select() on
the NIC ring being unfull, and timeouts don't work either since their
granularity is too large).  If the NIC activity alone saturates 1 CPU,
then you might see the 100% CPU being shown for Linux too.

>>>>>> re0:<RealTek 8169SC/8110SC Single-chip Gigabit Ethernet>  port 0x4000-0x40ff mem 0xf0100000-0xf01000ff irq 19 at device 4.0 on pci11
>>>>>> re0: Chip rev. 0x18000000
>>>>>> re0: MAC rev. 0x00000000
>>>>>> miibus0:<MII bus>  on re0
>>>>>> rgephy0:<RTL8169S/8110S/8211B media interface>  PHY 1 on miibus0

I don't really know if this is low-end, but guess all RealTeks are :-).

>>>>>> CPU: Intel(R) Celeron(R) CPU          420  @ 1.60GHz (1596.05-MHz K8-class CPU)

This is low end :-).

I mostly use old AthlonXP and Athlon64 2GHz systems for network testing.
These are a bit faster than the above.  A single medium end bge (5701)
on a PCI33 bus takes 100% CPU at about 512 kpps.  A single low end bge
(5705+) on a PCI33 takes 120% CPU at about 240 kpps on a 2-core system.
Linux-2.6.10 saturates well below 512 kpps on the same hardware.

Bruce


