Date:      Tue, 09 Sep 2008 08:02:36 -0700
From:      Sam Leffler <sam@freebsd.org>
To:        Jacques Fourie <jacques.fourie@gmail.com>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Routing benchmarks
Message-ID:  <48C6900C.8070708@freebsd.org>
In-Reply-To: <be2f52430809090736v4ab9c87bu2a0adced13811801@mail.gmail.com>
References:  <be2f52430809090633o7b80f23y2749a055f61d5cb0@mail.gmail.com>	<20080909175556.07bac5f0.stas@FreeBSD.org> <be2f52430809090736v4ab9c87bu2a0adced13811801@mail.gmail.com>

Jacques Fourie wrote:
> On Tue, Sep 9, 2008 at 3:55 PM, Stanislav Sedov <stas@freebsd.org> wrote:
>   
>> On Tue, 9 Sep 2008 15:33:30 +0200
>> "Jacques Fourie" <jacques.fourie@gmail.com> mentioned:
>>
>>     
>>> Hi,
>>>
>>> I've performed some benchmark tests on my Gumstix Connex 400 (Intel
>>> Xscale PXA 255 CPU clocked at 400MHz) with a netDuo expansion board.
>>> This board has two smc network interfaces. I configure the gumstix as
>>> a router and measure network throughput with netperf running on
>>> separate boxes on either side of the gumstix. My initial tests showed
>>> a TCP throughput of 2Mbit/s. After adapting the smc driver to use DMA
>>> this figure went up to 7Mbit/s. Although this is a significant
>>> improvement, it still seems a bit slow. Does anyone have any tips on
>>> how to figure out where the bottleneck lies?  Initial profiling
>>> showed that a significant amount of time was
>>> spent doing memory to memory copies of data, but after the DMA change
>>> profiling does not show any obvious culprits.
>>>
>>>       
>> Have you tried checking the speed of the interface itself, without
>> routing involved? Could the interfaces themselves be that slow?
>>
>> --
>> Stanislav Sedov
>> ST4096-RIPE
>>
>>     
>
> Running netserver on the gumstix shows a throughput of 2.4Mbit/s. At
> the moment I can't get if_bridge to work - will try to figure out what
> is going on. A bridging benchmark may be more informative.
>   

You said you did profiling, but you didn't provide the data to inspect.
It's possible kernel profiling has never been tried on your platform;
did you sanity-check the results (e.g. run a known test load and verify
that all routines that should execute appear in the profile)?  Also, if
copy overhead shows up as significant, look at why those copies are
being done; it's often possible to avoid a copy.

My experience in working with architectures like this is that cache 
handling can be a significant cost that doesn't always show up on a profile.

You may also find useful information by timestamping mbufs with the h/w
clock at important places along the "fast path", then checking whether
the overhead of each step is reasonable.  I did this for bridged traffic
by forcing the rx dma to go to an mbuf+cluster, then used the free
storage in the mbuf header to store timestamps.  At the end of the
processing path I sorted the data into buckets by sample point and added
a sysctl to dump the histogram to see min/max/avg.
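
To make the bookkeeping concrete, here is a rough userspace sketch of
the per-stage accounting.  It is not the code I used: struct pkt_stamps
merely stands in for the spare bytes in the mbuf header, the four sample
points are hypothetical, the clock value is passed in by the caller
rather than read from a h/w counter, and dump_stats() prints to a FILE
where the kernel version would sit behind a sysctl.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NPOINTS 4	/* hypothetical number of sample points on the fast path */

/* Stand-in for free storage in the mbuf header: one timestamp per point. */
struct pkt_stamps {
	uint32_t ts[NPOINTS];
};

/* Stage i covers the time between sample point i and sample point i+1. */
struct stage_stats {
	uint32_t min;
	uint32_t max;
	uint64_t sum;
	uint64_t count;
};

struct stage_stats stats[NPOINTS - 1];

/* Record the clock value seen at one sample point. */
void
stamp(struct pkt_stamps *p, unsigned point, uint32_t now)
{
	assert(point < NPOINTS);
	p->ts[point] = now;
}

/* Called at the end of the path: fold one packet into the per-stage buckets. */
void
account(const struct pkt_stamps *p)
{
	for (unsigned i = 0; i + 1 < NPOINTS; i++) {
		/* Unsigned subtraction stays correct across one counter wrap. */
		uint32_t delta = p->ts[i + 1] - p->ts[i];
		struct stage_stats *s = &stats[i];

		if (s->count == 0 || delta < s->min)
			s->min = delta;
		if (delta > s->max)
			s->max = delta;
		s->sum += delta;
		s->count++;
	}
}

/* Average ticks spent in a stage, or 0 if no packet has been accounted. */
uint64_t
stage_avg(unsigned stage)
{
	assert(stage < NPOINTS - 1);
	return (stats[stage].count ? stats[stage].sum / stats[stage].count : 0);
}

/* Userspace stand-in for the sysctl dump: one line per stage. */
void
dump_stats(FILE *out)
{
	for (unsigned i = 0; i + 1 < NPOINTS; i++)
		fprintf(out, "stage %u: min %u max %u avg %llu n %llu\n", i,
		    (unsigned)stats[i].min, (unsigned)stats[i].max,
		    (unsigned long long)stage_avg(i),
		    (unsigned long long)stats[i].count);
}
```

Call stamp() at each point of interest, account() once the packet
leaves the path, and dump_stats(stdout) when you want the numbers.  A
stage whose max is far above its avg is usually the first place to look.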

    Sam



