Date:      Mon, 7 Jul 2008 22:30:53 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Andre Oppermann <andre@FreeBSD.org>
Cc:        FreeBSD Net <freebsd-net@FreeBSD.org>, Ingo Flaschberger <if@xip.at>, Paul <paul@gtcomm.net>
Subject:   Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
Message-ID:  <20080707213356.G7572@besplex.bde.org>
In-Reply-To: <4871FB66.1060406@freebsd.org>
References:  <4867420D.7090406@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <ea7b9c170806302050p2a3a5480t29923a4ac2d7c852@mail.gmail.com> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <alpine.LFD.1.10.0807021052041.557@filebunker.xip.at> <486B4F11.6040906@gtcomm.net> <alpine.LFD.1.10.0807021155280.557@filebunker.xip.at> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <alpine.LFD.1.10.0807041106591.19613@filebunker.xip.at> <486DF1A3.9000409@gtcomm.net> <alpine.LFD.1.10.0807041303490.20760@filebunker.xip.at> <486E65E6.3060301@gtcomm.net> <alpine.LFD.1.10.0807052356130.2145@filebunker.xip.at> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org>

On Mon, 7 Jul 2008, Andre Oppermann wrote:

> Bruce Evans wrote:
>> What are the other overheads?  I calculate 1.644Mpps counting the
>> inter-frame gap, with 64-byte packets and (64 - header_size)-byte
>> payloads.  If the 64 bytes is for the payload, then the max is much
>> lower.
>
> The theoretical maximum at 64-byte frames is 1,488,100.  I've looked
> up my notes; the 1.244Mpps number can be adjusted to 1.488Mpps.

Where is the extra?  I still get 1.644736 Mpps (10^9/(8*64+96)).
1.488095 is for 64 bits extra (10^9/(8*64+96+64)).
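
(The 64 extra bits per frame match the 7-byte preamble plus the 1-byte
start-of-frame delimiter that gigabit Ethernet transmits ahead of every
frame; spelled out, assuming that is what the 1.488Mpps figure counts:)

    \[
    \frac{10^9}{8 \cdot 64 + 96} \approx 1.6447\ \text{Mpps},
    \qquad
    \frac{10^9}{8 \cdot 64 + 96 + \underbrace{64}_{\text{preamble+SFD}}}
      = \frac{10^9}{672} \approx 1.4881\ \text{Mpps}.
    \]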

>>>> I hoped to reach 1Mpps with the hardware I mentioned a few mails
>>>> back, but 2Mpps is far, far away.  Currently I get 160kpps via
>>>> 32-bit/33MHz PCI on a 1.2GHz Mobile Pentium.
>>> 
>>> This is more or less expected.  PCI32 is not able to sustain high
>>> packet rates.  The bus setup times kill the speed.  For larger packets
>>> the ratio gets much better and some reasonable throughput can be achieved.
>> 
>> I get about 640 kpps without forwarding (sendto: slightly faster;
>> recvfrom: slightly slower) on a 2.2GHz A64.  Underclocking the memory
>> from 200MHz to 100MHz only reduces the speed by about 10%, while
>> running the CPU 10% slower (i.e., not overclocking it by 10%) reduces
>> the speed by the same 10%, so the system is apparently still mainly
>> CPU-bound.
>
> On PCI32@33MHz?  He's using a 1.2GHz Mobile Pentium on top of that.

Yes.  My example shows that FreeBSD is more CPU-bound than I/O-bound up
to CPUs considerably faster than a 1.2GHz Pentium (though the Pentium M
is fast relative to its clock speed).  The memory interface may matter
more than the CPU clock.
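
For scale, a back-of-envelope check on the bus side (assuming the
nominal 32-bit/33MHz PCI burst rate; these figures are mine, not
measured):

    \[
    32\ \text{bit} \times 33.3\ \text{MHz} \approx 133\ \text{MB/s},
    \qquad
    160\ \text{kpps} \times 64\ \text{B} \approx 10.2\ \text{MB/s},
    \]

i.e. about 8% of the bus's theoretical peak, so per-transaction setup
cost and per-packet CPU cost, not raw bus bandwidth, are what limit
small-packet rates there.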

>>> NetFPGA doesn't have enough TCAM space to be useful for real routing
>>> (as in an Internet-sized routing table).  The trick many embedded
>>> networking CPUs use is cache prefetching that is integrated with the
>>> network controller.  The first 64-128 bytes of every packet are
>>> transferred automatically into the L2 cache by the hardware.  This
>>> allows relatively slow CPUs (the 700MHz Broadcom BCM1250 in the Cisco
>>> NPE-G1 or the 1.67GHz Freescale 7448 in the NPE-G2) to get more than
>>> 1Mpps.  Until something like this is possible on Intel or AMD x86
>>> CPUs we have a ceiling limited by RAM speed.
>> 
>> Does using fa$ter memory (speed and/or latency) help here?  64 bytes
>> is so small that latency may be more of a problem, especially without
>> a prefetch.
>
> Latency.  For IPv4 packet forwarding only one cache line per packet
> is fetched.  More memory speed only helps with the DMA from/to the
> network card.
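
(To illustrate the header-prefetch trick described above: a software
approximation is to prefetch the first cache line of the next packet
from the driver's receive loop before the stack touches it.  The sketch
below is illustrative only; struct rxq, deliver() and the field names
are hypothetical stand-ins, not any real driver's API.)

    /*
     * Hypothetical rx loop: prefetch the first cache line (the
     * Ethernet + IP headers fit in 64-128 bytes) of the *next*
     * packet while the current one is processed, imitating in
     * software what the embedded parts above do in hardware.
     */
    #define NDESC	256

    struct rxq {
    	int   head;
    	void *buf[NDESC];	/* packet buffers, one per descriptor */
    	int   len[NDESC];
    	int   done[NDESC];	/* set on DMA completion */
    };

    void deliver(void *pkt, int len);	/* hand off to the stack */

    static inline void
    hdr_prefetch(const void *p)
    {
    	/* GCC/Clang builtin; only a hint, never faults. */
    	__builtin_prefetch(p, 0, 3);
    }

    void
    rx_loop(struct rxq *q)
    {
    	int i = q->head;

    	while (q->done[i]) {
    		int next = (i + 1) % NDESC;

    		if (q->done[next])		/* prefetch one ahead */
    			hdr_prefetch(q->buf[next]);
    		deliver(q->buf[i], q->len[i]);
    		q->done[i] = 0;
    		i = next;
    	}
    	q->head = i;
    }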

I use low-end memory, but the machine that does 640 kpps somehow has
almost a quarter of the memory latency of new FreeBSD cluster machines
(~42 nsec instead of ~150).  perfmon (fixed for AXP and A64) and hwpmc
report an average of 11 k8-dc-misses per sendto() while sending via
bge at 640 kpps.  11 * 42 accounts for 462 nsec out of the 1562 nsec
available per packet at this rate.  11 * 150 = 1650 would probably make
this rate unachievable despite such a system having 20 times as much
CPU and bus.
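
Spelling out that budget (the same numbers, just rearranged):

    \[
    \frac{1}{640 \times 10^3\ \text{pps}} = 1562.5\ \text{ns/packet},
    \quad
    11 \times 42\ \text{ns} = 462\ \text{ns} \approx 30\%,
    \quad
    11 \times 150\ \text{ns} = 1650\ \text{ns} > 1562.5\ \text{ns},
    \]

so at ~150 ns per miss the data-cache misses alone would exceed the
entire per-packet time budget.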

Bruce


