Date:      Tue, 22 Jul 2014 11:07:07 -0700
From:      Adrian Chadd <adrian@freebsd.org>
To:        John Jasen <jjasen@gmail.com>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: fastforward/routing: a 3 million packet-per-second system?
Message-ID:  <CAJ-VmomWpc=3dtasbDhhrUpGywPio3_9W2b-RTAeJjq3nahhOQ@mail.gmail.com>
In-Reply-To: <53CE80DD.9090109@gmail.com>
References:  <53CE80DD.9090109@gmail.com>

Hi!

Well, what's missing is some dtrace/pmc/lock-debugging investigation
into the system to see where it's currently maxing out.

I wonder if you're seeing contention on the transmit paths as drivers
queue frames from one set of driver threads/queues to another
potentially completely different set of driver transmit
threads/queues.
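
As a starting point, something along these lines would show where the CPU time goes while the test is running. This is only a rough sketch: the function names are made up for illustration, both commands need root, the pmcstat event name "instructions" is an assumption (the available event names depend on the CPU), and hwpmc has to be loaded or compiled into the kernel.

```shell
#!/bin/sh
# Sketch only: run these as root on the FreeBSD router while it is under load.
# Function names are hypothetical helpers, not existing tools.

# Sample on-CPU kernel stacks at 997 Hz for 10 seconds with the DTrace
# profile provider; arg0 is non-zero when the sample landed in the kernel.
profile_kernel_stacks() {
    dtrace -x stackframes=100 \
        -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-10s { exit(0); }'
}

# Top-style view of where samples land, via hwpmc(4).
# The event name "instructions" is an assumption; pick one your CPU supports.
pmc_top() {
    kldload hwpmc 2>/dev/null   # fails harmlessly if already loaded or built in
    pmcstat -TS instructions -w 1
}
```

The hottest stacks in the dtrace output, or the top entries in pmcstat, should make it obvious whether the time is going into lock contention on the transmit side or somewhere else.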




-a


On 22 July 2014 08:18, John Jasen <jjasen@gmail.com> wrote:
> Feedback and/or tips and tricks more than welcome.
>
> Outstanding questions:
>
> Would increasing the number of processor cores help?
>
> Would a system where both processor QPI ports connect to each other
> mitigate QPI bottlenecks?
>
> Are there further performance optimizations I am missing?
>
> Server Description:
>
> The system in question is a Dell PowerEdge R820 with 16GB of RAM and two
> Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz.
>
> Onboard, in a 16x PCIe slot, I have one Chelsio T-580-CR two-port 40GbE
> NIC, and in an 8x slot, another T-580-CR dual port.
>
> I am running FreeBSD 10.0-STABLE.
>
> BIOS tweaks:
>
> Hyperthreading (or Logical Processors) is turned off.
> Memory Node Interleaving is turned off, but did not appear to impact
> performance.
>
> /boot/loader.conf contents:
> #for CARP+PF testing
> carp_load="YES"
> #load cxgbe drivers.
> cxgbe_load="YES"
> #maxthreads appears to not exceed CPU.
> net.isr.maxthreads=12
> #bindthreads may be indicated when using cpuset(1) on interrupts
> net.isr.bindthreads=1
> #random guess based on googling
> net.isr.maxqlimit=60480
> net.link.ifqmaxlen=90000
> #discussions with cxgbe maintainer and list led me to trying this.
> #Allows more interrupts to be fixed to CPUs, which in some cases,
> #improves interrupt balancing.
> hw.cxgbe.ntxq10g=16
> hw.cxgbe.nrxq10g=16
>
> /etc/sysctl.conf contents:
>
> #the following is also enabled by rc.conf gateway_enable.
> net.inet.ip.fastforwarding=1
> #recommendations from BSD router project
> kern.random.sys.harvest.ethernet=0
> kern.random.sys.harvest.point_to_point=0
> kern.random.sys.harvest.interrupt=0
> #probably should be removed, as cxgbe does not seem to affect/be
> #affected by irq storm settings
> hw.intr_storm_threshold=25000000
> #based on Calomel.Org performance suggestions. 4x40GbE, seemed
> #reasonable to use 100GbE settings
> kern.ipc.maxsockbuf=1258291200
> net.inet.tcp.recvbuf_max=1258291200
> net.inet.tcp.sendbuf_max=1258291200
> #attempting to play with ULE scheduler, making it serve packets versus
> #netstat
> kern.sched.slice=1
> kern.sched.interact=1
>
> /etc/rc.conf contains:
>
> hostname="fbge1"
> #should remove, especially given below duplicate entry
> ifconfig_igb0="DHCP"
> sshd_enable="YES"
> #ntpd_enable="YES"
> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> dumpdev="AUTO"
> # OpenBSD PF options to play with later. very bad for raw packet rates.
> #pf_enable="YES"
> #pflog_enable="YES"
> # enable packet forwarding
> # these enable forwarding and fastforwarding sysctls. inet6 does not
> # have fastforward
> gateway_enable="YES"
> ipv6_gateway_enable="YES"
> # enable OpenBSD ftp-proxy
> # should comment out until actively playing with PF
> ftpproxy_enable="YES"
> #left in place, commented out from prior testing
> #ifconfig_mlxen1="inet 172.16.2.1 netmask 255.255.255.0 mtu 9000"
> #ifconfig_mlxen0="inet 172.16.1.1 netmask 255.255.255.0 mtu 9000"
> #ifconfig_mlxen3="inet 172.16.7.1 netmask 255.255.255.0 mtu 9000"
> #ifconfig_mlxen2="inet 172.16.8.1 netmask 255.255.255.0 mtu 9000"
> # -lro and -tso options added per mailing list suggestion from Bjoern A.
> # Zeeb (bzeeb-lists at lists.zabbadoz.net)
> ifconfig_cxl0="inet 172.16.3.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> ifconfig_cxl1="inet 172.16.4.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> ifconfig_cxl2="inet 172.16.5.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> ifconfig_cxl3="inet 172.16.6.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> # aliases instead of reconfiguring test clients. See above commented out
> # entries
> ifconfig_cxl0_alias0="172.16.7.1 netmask 255.255.255.0"
> ifconfig_cxl1_alias0="172.16.8.1 netmask 255.255.255.0"
> ifconfig_cxl2_alias0="172.16.1.1 netmask 255.255.255.0"
> ifconfig_cxl3_alias0="172.16.2.1 netmask 255.255.255.0"
> # for remote monitoring/admin of the test device
> ifconfig_igb0="inet 172.30.60.60 netmask 255.255.0.0"
>
> Additional configurations:
> cpuset-chelsio-6cpu-high:
> #!/usr/local/bin/bash
> # Original provided by Navdeep Parhar <nparhar@gmail.com>
> # takes vmstat -ai output into a list, and assigns interrupts in order to
> # the available CPU cores.
> # Modified: to assign only to the 'high CPUs', ie: on core1.
> # See: http://lists.freebsd.org/pipermail/freebsd-net/2014-July/039317.html
> ncpu=12
> irqlist=$(vmstat -ia | egrep 't4nex|t5nex|cxgbc' | cut -f1 -d: | cut -c4-)
> i=6
> for irq in $irqlist; do
>         cpuset -l $i -x $irq
>         i=$((i+1))
>         [ $i -ge $ncpu ] && i=6
> done
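
To check the wrap-around before touching real interrupts, the same round-robin logic can be run as a dry run that only prints the cpuset commands. This is a sketch under assumptions: plan_irq_pinning is a hypothetical helper name, and the IRQ numbers in the example are made up rather than taken from this system.

```shell
#!/bin/sh
# Dry-run sketch: print the cpuset(1) commands the loop above would run,
# without executing them. plan_irq_pinning is a hypothetical helper name.
# usage: plan_irq_pinning FIRST_CPU NCPU IRQ...
plan_irq_pinning() {
    first=$1
    ncpu=$2
    shift 2
    i=$first
    for irq in "$@"; do
        echo "cpuset -l $i -x $irq"
        i=$((i + 1))
        # wrap back to the first allowed CPU once we run past the last one
        [ "$i" -ge "$ncpu" ] && i=$first
    done
    return 0
}

# Example with made-up IRQ numbers: 8 IRQs spread over CPUs 6..11
plan_irq_pinning 6 12 264 265 266 267 268 269 270 271
```

With six CPUs available (6 through 11), the seventh IRQ lands back on CPU 6, so with 16 queues per port each of the high CPUs ends up servicing several queues.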
>
> Client Description:
>
> Two Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz processors
> 64 GB RAM
> Mellanox Technologies MT27500 Family [ConnectX-3]
> CentOS 6.4 with updates
> iperf3 installed from yum repositories: iperf3-3.0.3-3.el6.x86_64
>
> Test setup:
>
> I've found that about 3 streams between CentOS clients is the best way
> to get the most out of them.
> Above certain points, the -b flag does not change results.
> -N is an artifact from using TCP
> -l is needed, as -M doesn't work for UDP.
>
> I usually use launch scripts similar to the following:
>
>  for i in `seq 41 60`; do ssh loader$i "export TIME=120; export STREAMS=1; export PORT=52$i; export PKT=64; export RATE=2000m; /root/iperf-test-8port-udp" & done
>
> The scripts execute the following on each host.
>
> #!/bin/bash
> PORT1=$PORT
> PORT2=$(($PORT+1000))
> PORT3=$(($PORT+2000))
> iperf3 -c loader41-40gbe -u -b 10000m -i 0  -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
> iperf3 -c loader42-40gbe -u -b 10000m -i 0  -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
> iperf3 -c loader43-40gbe -u -b 10000m -i 0  -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
> ... (through all clients and all three ports) ...
> iperf3 -c loader60-40gbe -u -b 10000m -i 0  -N -l $PKT -t$TIME -P$STREAMS -p$PORT3 &
>
>
> Results:
>
> Summarized, netstat -w 1 -q 240 -bd, run through:
> cat test4-tuning | egrep -v {'packets | input '} | awk '{ipackets+=$1} {idrops+=$3} {opackets+=$5} {odrops+=$9} END {print "input " ipackets/NR, "idrops " idrops/NR, "opackets " opackets/NR, "odrops " odrops/NR}'
>
> input 1.10662e+07 idrops 8.01783e+06 opackets 3.04516e+06 odrops 3152.4
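
To put those averages in proportion, a quick calculation of how much of the offered load is forwarded versus dropped on input. A minimal sketch: pct is a hypothetical helper, and the three numbers are copied from the summary line above.

```shell
#!/bin/sh
# pct is a hypothetical helper: prints PART as a percentage of WHOLE,
# rounded to one decimal place.
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

# Per-second averages copied from the summary line above.
input=1.10662e+07
idrops=8.01783e+06
opackets=3.04516e+06

echo "forwarded: $(pct "$opackets" "$input")% of input"
echo "dropped on input: $(pct "$idrops" "$input")% of input"
```

That works out to roughly 27.5% of received packets being forwarded and roughly 72.5% dropped on input, so the box is receiving far more than it manages to push out.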
>
> Snapshot of raw output:
>
>            input        (Total)           output
>    packets  errs idrops      bytes    packets  errs      bytes colls drops
>   11189148     0 7462453 1230805216    3725006     0  409750710     0   799
>   10527505     0 6746901 1158024978    3779096     0  415700708     0   127
>   10606163     0 6850760 1166676673    3751780     0  412695761     0  1535
>   10749324     0 7132014 1182425799    3617558     0  397930956     0  5972
>   10695667     0 7022717 1176521907    3669342     0  403627236     0  1461
>   10441173     0 6762134 1148528662    3675048     0  404255540     0  6021
>   10683773     0 7005635 1175215014    3676962     0  404465671     0  2606
>   10869859     0 7208696 1195683372    3658432     0  402427698     0   979
>   11948989     0 8310926 1314387881    3633773     0  399714986     0   725
>   12426195     0 8864415 1366877194    3562311     0  391853156     0  2762
>   13006059     0 9432389 1430661751    3570067     0  392706552     0  5158
>   12822243     0 9098871 1410443600    3715177     0  408668500     0  4064
>   13317864     0 9683602 1464961374    3632156     0  399536131     0  3684
>   13701905     0 10182562 1507207982    3523101     0  387540859     0  8690
>   13820227     0 10244870 1520221820    3562038     0  391823322     0  2426
>   14437060     0 10955483 1588073033    3480105     0  382810557     0  2619
>   14518471     0 11119573 1597028105    3397439     0  373717355     0  5691
>   14890287     0 11675003 1637926521    3199812     0  351978304     0 11007
>   14923610     0 11749091 1641594441    3171436     0  348857468     0  7389
>   14738704     0 11609730 1621254991    3117715     0  342948394     0  2597
>   14753975     0 11549735 1622935026    3207393     0  352812846     0  4798
>


