Date:      Fri, 11 Jul 2014 20:58:21 -0400
From:      John Jasem <jjasen@gmail.com>
To:        Navdeep Parhar <nparhar@gmail.com>, FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: tuning routing using cxgbe and T580-CR cards?
Message-ID:  <53C0882D.5070100@gmail.com>
In-Reply-To: <53C03BB4.2090203@gmail.com>
References:  <53C01EB5.6090701@gmail.com> <53C03BB4.2090203@gmail.com>


On 07/11/2014 03:32 PM, Navdeep Parhar wrote:
> On 07/11/14 10:28, John Jasem wrote:
>> In testing two Chelsio T580-CR dual port cards with FreeBSD 10-STABLE,
>> I've been able to use a collection of clients to generate approximately
>> 1.5-1.6 million TCP packets per second sustained, and routinely hit
>> 10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the
>> quick read, accepting the loss of granularity).
> When forwarding, the pps rate is often more interesting, and almost
> always the limiting factor, as compared to the total amount of data
> being passed around.  10GB at this pps probably means 9000 MTU.  Try
> with 1500 too if possible.

Yes, I am generally more interested in (and concerned about) the pps.
Using 1500-byte packets, I've seen around 2 million pps. I'll have hard
numbers for the list, with netstat and vmstat output, on Monday.
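
For reference, the numbers above come from something along these lines
(vmstat -i is my guess at the most useful vmstat view here, since it
shows how the interrupt load spreads across the card's queues):

  # per-second packet/byte/drop counters across all interfaces
  netstat -d -b -w1 -W
  # same, human-readable, trading away some granularity
  netstat -d -b -w1 -W -h
  # interrupt counts and rates per device/queue
  vmstat -i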

<snip>

>> a) One of the first things I did in prior testing was to turn
>> hyperthreading off. I presume this is still prudent, as HT doesn't help
>> with interrupt handling?
> It is always worthwhile to try your workload with and without
> hyperthreading.

When testing Mellanox cards, I found HT to be severely detrimental.
However, in almost every case so far, Mellanox and Chelsio have led me
to opposite conclusions (cpufreq, net.isr.*, and so on).
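
As an aside, besides the BIOS, HT can also be kept out of the picture
with the machdep.hyperthreading_allowed loader tunable -- assuming that
knob is still honored on 10-STABLE, which I have not re-verified:

  # /boot/loader.conf
  machdep.hyperthreading_allowed="0"   # keep the scheduler off HT logical cores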

>> c) the defaults for the cxgbe driver appear to be 8 rx queues, and N tx
>> queues, with N being the number of CPUs detected. For a system running
>> multiple cards, routing or firewalling, does this make sense, or would
>> balancing tx and rx be more ideal? And would reducing queues per card
>> based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?
> The defaults are nrxq = min(8, ncores) and ntxq = min(16, ncores).  The
> man page mentions this.  The reason for 8 vs. 16 is that tx queues are
> "cheaper" as they don't have to be backed by rx buffers.  It only needs
> some memory for the tx descriptor ring and some hardware resources.
>
> It appears that your system has >= 16 cores.  For forwarding it probably
> makes sense to have nrxq = ntxq.  If you're left with 8 or fewer cores
> after disabling hyperthreading you'll automatically get 8 rx and tx
> queues.  Otherwise you'll have to fiddle with the hw.cxgbe.nrxq10g and
> ntxq10g tunables (documented in the man page).

I promise I did look through the man page before posting. :) This is
actually a 12-core box with HT turned off.
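
If I do end up overriding the defaults so that nrxq = ntxq, my
understanding is it would look something like this (the tunable names
are the ones from the man page; 12 is simply one queue per core on this
box, not a recommendation):

  # /boot/loader.conf
  hw.cxgbe.nrxq10g="12"   # rx queues per port
  hw.cxgbe.ntxq10g="12"   # tx queues per port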

Mining the cxl stat entries in sysctl suggests that the queues per port
are reasonably well balanced, so I may be worrying over nothing.
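
For the archives, that balance check was just eyeballing the per-queue
counters, along the lines of:

  # dump the sysctl subtree for the first port and pick out queue stats
  sysctl dev.cxl.0 | grep -E 'rxq|txq'

The exact OID layout will vary with the number of queues configured.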

<snip>

>> g) Are there other settings I should be looking at, that may squeeze out
>> a few more packets?
> The pps rates that you've observed are within the chip's hardware limits
> by at least an order of magnitude.  Tuning the kernel rather than the
> driver may be the best bang for your buck.

If I am missing obvious kernel tuning configurations in this regard,
it would not be the first time.
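
For the record, the kernel-side knobs I've been poking at so far are the
net.isr ones mentioned above, plus the obvious forwarding switch;
roughly along these lines (values purely illustrative):

  # runtime sysctls
  sysctl net.inet.ip.forwarding=1   # it's a router, so this stays on
  sysctl net.isr.dispatch=direct    # direct vs. deferred/hybrid dispatch

  # /boot/loader.conf (loader tunables, need a reboot)
  net.isr.maxthreads="12"           # one netisr thread per core
  net.isr.bindthreads="1"           # pin netisr threads to cores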

Thanks again!

-- John Jasen (jjasen@gmail.com)



