Date:      Thu, 2 Feb 2017 02:01:23 +0100
From:      Olivier Cochard-Labbé <olivier@freebsd.org>
To:        Jordan Caraballo <jordancaraballo87@gmail.com>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: Disappointing packets-per-second performance results on a Dell, PE R530
Message-ID:  <CA%2Bq%2BTcoU%2BiFS_LomYE5_7zakGTPtFGhuW39Vi7moDi61ZtA4KQ@mail.gmail.com>
In-Reply-To: <0cdc69d4-e23f-4beb-c4af-59259529287f@gmail.com>
References:  <8f637e2e-cd59-dc65-8476-30989bea516b@gmail.com> <20170103174627.GW37118@zxy.spb.ru> <ebb04a3e-bcde-6d50-af63-348e8d06fcba@gmail.com> <CA%2Bq%2BTco=ApJ28iQSgMkzieMJUMLxc31x2yJe=GEe3b-ZVk2qYQ@mail.gmail.com> <0cdc69d4-e23f-4beb-c4af-59259529287f@gmail.com>

On Wed, Feb 1, 2017 at 1:00 AM, Jordan Caraballo <
jordancaraballo87@gmail.com> wrote:

> Hi Oliver, my bad, I missed that one. Here is the info:
>
> * Switch with 48x 10G ports and 12x 40G ports was used
> * (48) 10G connected nodes were used.
> * (24) nodes on each side of the firewall
> * Packet per second (PPS) tests were run using 'iperf'
> * Bandwidth tests were run using 'nuttcp'
> * Parallelization was handled by using pdsh
> * Each of the 24 sending nodes ran either:
>   iperf3 -c "<target>" -u -A 5 -l 512 -b 0 -t "<duration>" -J
>   nuttcp -fparse -l 128k -w1m -T "<duration>" "<target>"
>
>
Your current performance (1.5 Mpps) suggests that only one core, or two
at most, is being used.

Can you confirm that your iperf bench runs 24 distinct flows
simultaneously (each with a different source/destination IP pair)?
source-IP-1 -> target-IP-1
source-IP-2 -> target-IP-2
... until source-IP-24 -> target-IP-24
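
To illustrate why distinct flows matter, here is a minimal Python sketch of
how a NIC spreads flows over its RX queues. It is only an illustration: the
CRC32 hash is a stand-in for the adapter's real RSS hash, the 10.0.x.y
addresses and the port numbers are hypothetical, and 8 queues is just an
example value.

```python
import zlib

NUM_RXQ = 8  # example RX queue count (assumption for this sketch)


def rx_queue(src_ip, dst_ip, src_port, dst_port, num_queues=NUM_RXQ):
    """Map a flow to an RX queue index.

    Stand-in hash only: a real NIC uses its own RSS hash, not CRC32.
    The point is that the same flow tuple always lands on the same queue.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


# 24 distinct flows (hypothetical addresses and ports)
flows = [(f"10.0.1.{i}", f"10.0.2.{i}", 40000 + i, 5201) for i in range(1, 25)]
# The same flow repeated 24 times, e.g. every sender hitting one target
# from behind a single address
single = [flows[0]] * 24

distinct_queues = {rx_queue(*f) for f in flows}
single_queues = {rx_queue(*f) for f in single}

print("queues hit by 24 distinct flows:", sorted(distinct_queues))
print("queues hit by 1 repeated flow:  ", sorted(single_queues))
```

A single flow can never use more than one queue, so it is serviced by one
core no matter how many queues the port has.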

On your dual-CPU, 18-core system, the Chelsio driver should create for
each port:
- 8 RX queues (rxq NIC)
- 16 TX queues (txq NIC)

Can you check in /var/run/dmesg.boot that you have something like this:

t5nex0: <Chelsio T540-CR> mem 0xfb780000-0xfb7fffff,0xfa000000-0xfaffffff,0xf9ff0000-0xf9ff1fff irq 40 at device 0.4 numa-domain 0 on pci7
cxl0: <port 0> numa-domain 0 on t5nex0
cxl0: Ethernet address: 00:07:43:2e:e4:70
cxl0: 16 txq, 8 rxq (NIC); 8 txq, 2 rxq (TOE)

=> Notice the "16 txq, 8 rxq (NIC)" line
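
If you prefer to script this check, here is a minimal Python sketch that
pulls the NIC queue counts out of dmesg-style text. The function name and
the regular expression are mine and assume the line format shown above; on
the real machine you would feed it the contents of /var/run/dmesg.boot.

```python
import re

# Matches lines such as: "cxl0: 16 txq, 8 rxq (NIC); 8 txq, 2 rxq (TOE)"
QUEUE_RE = re.compile(r"^(\w+): (\d+) txq, (\d+) rxq \(NIC\)")


def nic_queues(dmesg_text):
    """Return {interface: (txq, rxq)} for every NIC queue line found."""
    result = {}
    for line in dmesg_text.splitlines():
        m = QUEUE_RE.match(line.strip())
        if m:
            result[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return result


sample = """cxl0: <port 0> numa-domain 0 on t5nex0
cxl0: Ethernet address: 00:07:43:2e:e4:70
cxl0: 16 txq, 8 rxq (NIC); 8 txq, 2 rxq (TOE)"""

print(nic_queues(sample))  # {'cxl0': (16, 8)}
```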

Now, as Slawa says, we should see 8 IRQs assigned to these 8 rxq, all
used about equally.
After your bench, the output of "vmstat -ia | grep t5nex0:0a" should
display at least 8 lines with evenly distributed counts, like this example:

[root@hp]~# vmstat -ia | grep t5nex0:0a
irq292: t5nex0:0a0                    37          0
irq293: t5nex0:0a1                288498        629
irq294: t5nex0:0a2                225410        492
irq295: t5nex0:0a3                306227        668
irq296: t5nex0:0a4                282679        617
irq297: t5nex0:0a5                313143        683
irq298: t5nex0:0a6                318727        695
irq299: t5nex0:0a7                308669        673

(my example is not perfect because queue 0 looks under-utilized, but you
get the idea)
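
A rough way to automate the "equally distributed" check is sketched below in
Python. It parses lines in the format shown above and flags any queue whose
interrupt total is far below the average. The function names and the 50%
threshold are arbitrary choices of mine, not anything the driver or vmstat
defines.

```python
def parse_vmstat(text):
    """Parse 'vmstat -ia' style lines into (name, total) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # label, total, rate
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue  # skip headers and anything that is not a counter line
        label, total, _rate = parts
        name = label.split(": ", 1)[-1].strip()  # drop the "irqNNN: " prefix
        rows.append((name, int(total)))
    return rows


def underused(rows, fraction=0.5):
    """Return queue names whose total is below fraction * mean total."""
    if not rows:
        return []
    mean = sum(total for _, total in rows) / len(rows)
    return [name for name, total in rows if total < fraction * mean]


sample = """irq292: t5nex0:0a0                    37          0
irq293: t5nex0:0a1                288498        629
irq294: t5nex0:0a2                225410        492
irq295: t5nex0:0a3                306227        668
irq296: t5nex0:0a4                282679        617
irq297: t5nex0:0a5                313143        683
irq298: t5nex0:0a6                318727        695
irq299: t5nex0:0a7                308669        673"""

rows = parse_vmstat(sample)
print(len(rows), "queues; under-used:", underused(rows))
```

With the numbers above only t5nex0:0a0 is flagged. If most of the queues
show up as under-used on your box, the traffic is not being spread across
the rxq.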


