Date:      Mon, 27 Sep 2010 19:35:55 +0400
From:      Artemiev Igor <ai@kliksys.ru>
To:        freebsd-current@freebsd.org
Subject:   Re: netisr software flowid
Message-ID:  <20100927153555.GA9200@two.kliksys.ru>
In-Reply-To: <alpine.BSF.2.00.1009270947110.82524@fledge.watson.org>
References:  <20100926235313.GA4848@two.kliksys.ru> <alpine.BSF.2.00.1009270947110.82524@fledge.watson.org>

On Mon, Sep 27, 2010 at 09:52:21AM +0100, Robert Watson wrote:
> One reason I haven't merged the earlier patch is that many high-performance 
> 10gbps (and even 1gbps) cards now support multiple input queues in hardware, 
> meaning that they have already done the work distribution by the time the 
> packets get to the OS.  This makes the work distribution choice quite a bit 
> harder: has a packet already been adequately balanced, or is further 
> rebalancing required -- and if so, an equal distribution as selected in that 
> patch might not generate well-balanced CPU load.
> 
> Using just the RSS hash to distribute work, and single-queue input, I am able 
> to get doubled end-host TCP performance with highly concurrent connections at 
> 10gbps, which is a useful result.  I have high on my todo list to get the 
> patch you referenced into the mix as well and see how much the software 
> distribution hurts/helps...
Thanks for the explanation.
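A minimal sketch of the software flowid idea discussed above, assuming an IPv4 4-tuple as the flow key and FNV-1a as the hash. Both choices, and the names `struct flow4`, `fnv1a_u32`, `flow4_hash` and `flow4_cpu`, are hypothetical and for illustration only; this is not necessarily what the referenced patch or FreeBSD's netisr does. The CPU list {0, 2, 3} simply mirrors the netisr binding described later in this message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 flow key; field names are illustrative only. */
struct flow4 {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* FNV-1a step over the low nbytes of v, least-significant byte first. */
static uint32_t
fnv1a_u32(uint32_t h, uint32_t v, int nbytes)
{
	for (int i = 0; i < nbytes; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;		/* 32-bit FNV prime */
	}
	return (h);
}

/*
 * Software flow ID: fields are hashed one by one (not as raw struct
 * bytes) so structure padding never leaks into the result.
 */
static uint32_t
flow4_hash(const struct flow4 *f)
{
	uint32_t h = 2166136261u;	/* 32-bit FNV offset basis */

	h = fnv1a_u32(h, f->saddr, 4);
	h = fnv1a_u32(h, f->daddr, 4);
	h = fnv1a_u32(h, f->sport, 2);
	h = fnv1a_u32(h, f->dport, 2);
	return (h);
}

/* Map a flow ID onto one of the CPUs that run a netisr thread. */
static int
flow4_cpu(uint32_t flowid, const int *cpus, size_t ncpus)
{
	return (cpus[flowid % ncpus]);
}
```

Every packet of a given connection hashes to the same CPU, which keeps per-flow packet ordering intact. The modulo spreads flows evenly by count, but as Robert points out, an even split of flows does not guarantee an even split of CPU load when a few flows carry most of the traffic.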

> Since you've done some measurement, what was the throughput on that system 
> without the patch applied, and how many cores?
The server has four cores. Topology:
<groups>
 <group level="1" cache-level="0">
  <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
  <children>
   <group level="3" cache-level="2">
    <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
   </group>
  </children>
 </group>
</groups>

Without the patch, only one netisr thread is utilized, at 100% CPU load, and
~90% of packets are dropped at a maximum of 80-90 Kpps. Throughput oscillated
between 2 MB/s and 30 MB/s.

Cores 0, 2, 3 - netisr threads, bound to those CPUs
Core 1 - irq256 (bge0), bound via cpuset(1)

P.S.: bge(4) is patched for aggressive interrupt moderation. Without this, I
see 11K+ interrupts/sec and ~99% CPU usage spent on interrupt handling alone.


