Date:      Fri, 11 Jul 2014 13:28:21 -0400
From:      John Jasem <jjasen@gmail.com>
To:        FreeBSD Net <freebsd-net@freebsd.org>, Navdeep Parhar <nparhar@gmail.com>
Subject:   tuning routing using cxgbe and T580-CR cards?
Message-ID:  <53C01EB5.6090701@gmail.com>

In testing two Chelsio T580-CR dual port cards with FreeBSD 10-STABLE,
I've been able to use a collection of clients to generate approximately
1.5-1.6 million TCP packets per second sustained, and routinely hit
10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the
quick read, accepting the loss of granularity).
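
For reference, the measurements come from watching output along these
lines (the -h form is the quick, lower-granularity read):

  netstat -d -b -w1 -W
  netstat -d -b -w1 -W -h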

While performance has so far been stellar, and I honestly suspect I
will need more CPU depth and horsepower to get much faster, I'm
curious whether there is any gain to be had from tweaking performance
settings. Under multiple streams, with N targets connecting to N
servers, I'm seeing interrupts on all CPUs peg at 99-100%, and I'm
curious whether tweaking configs will help, or whether it's a free
clue to get more horsepower.
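
For the record, the 99-100% figure comes from eyeballing per-CPU and
per-vector output roughly like this (top flags being system processes,
threads, and per-CPU display, if I have them right):

  # top -SHP
  # vmstat -i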

So far, except for temporarily turning off pflogd and setting the
following sysctl variables, I've not done any performance tuning on
the system.

/etc/sysctl.conf
net.inet.ip.fastforwarding=1
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
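
For completeness, the runtime equivalent of the above, plus the pflogd
change, was roughly:

  # service pflogd stop
  # sysctl net.inet.ip.fastforwarding=1
  # sysctl kern.random.sys.harvest.ethernet=0
  # sysctl kern.random.sys.harvest.point_to_point=0
  # sysctl kern.random.sys.harvest.interrupt=0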

a) One of the first things I did in prior testing was to turn
hyperthreading off. I presume this is still prudent, as HT doesn't help
with interrupt handling?
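
For anyone reproducing this, I believe the loader tunable below turns
HT off as well, though correct me if that's stale:

  /boot/loader.conf
  machdep.hyperthreading_allowed="0"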

b) I briefly experimented with using cpuset(1) to stick interrupts to
physical CPUs, but it offered no performance enhancements, and indeed,
appeared to decrease performance by 10-20%. Has anyone else tried this?
What were your results?
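
What I tried was roughly: list the card's interrupt vectors, then bind
each one to a physical core with cpuset. Vector and core numbers below
are made up for illustration:

  # vmstat -ai | grep t5nex
  # cpuset -l 2 -x 267
  # cpuset -l 4 -x 268
  (and so on for each vector)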

c) The defaults for the cxgbe driver appear to be 8 rx queues and N tx
queues, with N being the number of CPUs detected. For a system running
multiple cards, routing or firewalling, does this make sense, or would
balancing tx and rx queues be better? And would reducing the queues per
card based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?
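
If I'm reading cxgbe(4) right, these are loader tunables, so the sort
of experiment I have in mind looks like the below. The values are just
an illustration, and I'm not certain these particular names are the
ones that apply to the T580's 40G ports:

  /boot/loader.conf
  hw.cxgbe.nrxq10g="4"
  hw.cxgbe.ntxq10g="4"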

d) dev.cxl.$PORT.qsize_rxq: 1024 and dev.cxl.$PORT.qsize_txq: 1024.
These appear not to be writeable while if_cxgbe is loaded, so I
speculate they are either not to be messed with, or are loader.conf
tunables. Is there any benefit to messing with them?
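
My guess is that the corresponding loader tunables are the hw.cxgbe.*
ones from cxgbe(4), i.e. something like the below (values picked
arbitrarily), but I'd appreciate confirmation before chasing it:

  /boot/loader.conf
  hw.cxgbe.qsize_rxq="2048"
  hw.cxgbe.qsize_txq="2048"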

e) dev.t5nex.$CARD.toe.sndbuf: 262144. These are writeable, but messing
with the values did not yield an immediate benefit. Am I barking up the
wrong tree by trying?
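
For context, the fiddling amounted to bumping the value at runtime,
e.g. (card number and value picked arbitrarily):

  # sysctl dev.t5nex.0.toe.sndbuf=524288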

f) Based on prior experiments with other vendors, I tried tweaks to
net.isr.* settings, but did not see any benefits worth discussing. Does
that match others' experience?
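
The sort of thing I tried looked roughly like the below; values are
only illustrative, and if I remember right maxthreads/bindthreads are
loader tunables while dispatch can be flipped at runtime:

  /boot/loader.conf
  net.isr.maxthreads="8"
  net.isr.bindthreads="1"

  # sysctl net.isr.dispatch=deferred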

g) Are there other settings I should be looking at, that may squeeze out
a few more packets?

Thanks in advance!

-- John Jasen (jjasen@gmail.com)