From owner-freebsd-net@FreeBSD.ORG Mon Jul 14 19:03:56 2014
Date: Mon, 14 Jul 2014 12:03:54 -0700
From: Navdeep Parhar
To: John Jasem, FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: tuning routing using cxgbe and T580-CR cards?
Message-ID: <53C4299A.3000900@gmail.com>
In-Reply-To: <53C3EFDC.2030100@gmail.com>
References: <53C01EB5.6090701@gmail.com> <53C03BB4.2090203@gmail.com> <53C3EFDC.2030100@gmail.com>

Use UDP if you want more control over your experiments.

- It's easier to directly control the frame size on the wire: no TSO,
  LRO, or segmentation to worry about.
- UDP has no flow control, so the transmitters will not let up even if a
  frame goes missing; TCP would go into recovery. The lack of
  protocol-level flow control also means the transmitters cannot be
  influenced by the receivers in any way.
- Frames go only in the direction you want them to. With TCP the
  receiver is transmitting all the time too (ACKs).
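For example, a run along these lines keeps the datagram size, and
therefore the frame size on the wire, fixed. The rate, stream count, and
duration below are only placeholders; a 1472-byte payload fills a
standard 1500-byte IP packet.

  # on each receiver
  iperf3 -s

  # on each generator: 4 UDP streams at a fixed datagram size and rate
  iperf3 -c <receiver> -u -l 1472 -b 5G -P 4 -t 60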
Regards,
Navdeep

On 07/14/14 07:57, John Jasem wrote:
> The two physical CPUs are:
> Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz (2400.05-MHz K8-class CPU)
>
> Hyperthreading, at least from initial appearances, seems to offer no
> benefits or drawbacks.
>
> I tested iperf3, using a packet generator on each subnet, each sending
> 4 streams to a server on another subnet.
>
> Maximum segment sizes of 128 and 1460 were used, with little variance
> (iperf3 -M).
>
> A snapshot of netstat -d -b -w1 -W -h is included. Midway through, the
> numbers dropped; this coincides with when I launched 16 more streams
> (4 new clients to 4 new servers on different nets, 4 streams each).
>
>             input        (Total)           output
>    packets  errs idrops      bytes    packets  errs      bytes colls drops
>       1.6M     0    514       254M       1.6M     0       252M     0     5
>       1.6M     0    294       244M       1.6M     0       246M     0     6
>       1.6M     0     95       255M       1.5M     0       236M     0     6
>       1.4M     0      0       216M       1.5M     0       224M     0     3
>       1.5M     0      0       225M       1.4M     0       219M     0     4
>       1.4M     0    389       214M       1.4M     0       216M     0     1
>       1.4M     0    270       207M       1.4M     0       207M     0     1
>       1.4M     0    279       210M       1.4M     0       209M     0     2
>       1.4M     0     12       207M       1.3M     0       204M     0     1
>       1.4M     0    303       206M       1.4M     0       214M     0     2
>       1.3M     0   2.3K       190M       1.4M     0       212M     0     1
>       1.1M     0   1.1K       175M       1.1M     0       176M     0     1
>       1.1M     0   1.6K       176M       1.1M     0       175M     0     1
>       1.1M     0    830       176M       1.1M     0       174M     0     0
>       1.2M     0   1.5K       187M       1.2M     0       187M     0     0
>       1.2M     0   1.1K       183M       1.2M     0       184M     0     1
>       1.2M     0   1.5K       197M       1.2M     0       196M     0     2
>       1.3M     0   2.2K       199M       1.2M     0       196M     0     0
>       1.3M     0   2.8K       200M       1.3M     0       202M     0     4
>       1.3M     0   1.5K       199M       1.2M     0       198M     0     1
>
> vmstat output is also included; you can see similar drops in the
> faults columns.
>
>  procs     memory      page                     disks    faults        cpu
>  r b w     avm    fre   flt  re  pi  po   fr  sr mf0 cd0     in   sy     cs us sy id
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0 188799  224 387419  0 74 26
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0 207447  150 425576  0 72 28
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0 205638  202 421659  0 75 25
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0 200292  150 411257  0 74 26
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0 200338  197 411537  0 77 23
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0 199289  156 409092  0 75 25
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0 200504  200 411992  0 76 24
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0 165042  152 341207  0 78 22
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0 171360  200 353776  0 78 22
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0 197557  150 405937  0 74 26
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0 170696  204 353197  0 78 22
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0 174927  150 361171  0 77 23
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0 153836  200 319227  0 79 21
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0 159056  150 329517  0 78 22
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0 155240  200 321819  0 78 22
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0 166422  156 344184  0 78 22
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0 162065  200 335215  0 79 21
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0 172857  150 356852  0 78 22
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0  81267  197 176539  0 92  8
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0  82151  150 177434  0 91  9
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0  73904  204 160887  0 91  9
>  0 0 0    574M    15G     2   0   0   0    8   6   0   0  73820  150 161201  0 91  9
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0  73926  196 161850  0 92  8
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0  77215  150 166886  0 91  9
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0  77509  198 169650  0 91  9
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0  69993  156 154783  0 90 10
>  0 0 0    574M    15G    82   0   0   0    0   6   0   0  69722  199 153525  0 91  9
>  0 0 0    574M    15G     2   0   0   0    0   6   0   0  66353  150 147027  0 91  9
>  0 0 0    550M    15G   102   0   0   0  101   6   0   0  67906  259 149365  0 90 10
>  0 0 0    550M    15G     0   0   0   0    0   6   0   0  71837  125 157253  0 92  8
>  0 0 0    550M    15G    80   0   0   0    0   6   0   0  73508  179 161498  0 92  8
>  0 0 0    550M    15G     0   0   0   0    0   6   0   0  72673  125 159449  0 92  8
>  0 0 0    550M    15G    80   0   0   0    0   6   0   0  75630  175 164614  0 91  9
>
> On 07/11/2014 03:32 PM, Navdeep Parhar wrote:
>> On 07/11/14 10:28, John Jasem wrote:
>>> In testing two Chelsio T580-CR dual port cards with FreeBSD
>>> 10-STABLE, I've been able to use a collection of clients to generate
>>> approximately 1.5-1.6 million TCP packets per second sustained, and
>>> routinely hit 10GB/s, both measured by netstat -d -b -w1 -W (I
>>> usually use -h for the quick read, accepting the loss of
>>> granularity).
>> When forwarding, the pps rate is often more interesting, and almost
>> always the limiting factor, compared to the total amount of data
>> being passed around. 10GB at this pps probably means a 9000 MTU. Try
>> with 1500 too if possible.
>>
>> "netstat -d 1" and "vmstat 1" for a few seconds while your system is
>> under maximum load would be useful. And what kind of CPU is in this
>> system?
>>
>>> While performance has so far been stellar, and I'm honestly
>>> speculating that I will need more CPU depth and horsepower to get
>>> much faster, I'm curious whether there is any gain from tweaking
>>> performance settings. Under multiple streams, with N targets
>>> connecting to N servers, interrupts on all CPUs peg at 99-100%, and
>>> I'm curious whether tweaking configs will help, or whether it's a
>>> free clue to get more horsepower.
>>>
>>> So far, except for temporarily turning off pflogd and setting the
>>> following sysctl variables, I've not done any performance tuning on
>>> the system.
>>>
>>> /etc/sysctl.conf
>>> net.inet.ip.fastforwarding=1
>>> kern.random.sys.harvest.ethernet=0
>>> kern.random.sys.harvest.point_to_point=0
>>> kern.random.sys.harvest.interrupt=0
>>>
>>> a) One of the first things I did in prior testing was to turn
>>> hyperthreading off. I presume this is still prudent, as HT doesn't
>>> help with interrupt handling?
>> It is always worthwhile to try your workload with and without
>> hyperthreading.
>>
>>> b) I briefly experimented with using cpuset(1) to pin interrupts to
>>> physical CPUs, but it offered no performance enhancement and indeed
>>> appeared to decrease performance by 10-20%. Has anyone else tried
>>> this? What were your results?
>>>
>>> c) The defaults for the cxgbe driver appear to be 8 rx queues and N
>>> tx queues, with N being the number of CPUs detected. For a system
>>> running multiple cards, routing or firewalling, does this make
>>> sense, or would balancing tx and rx be more ideal? And would
>>> reducing the queues per card based on NUMBER-CPUS and
>>> NUM-CHELSIO-PORTS make sense at all?
>> The defaults are nrxq = min(8, ncores) and ntxq = min(16, ncores);
>> the man page mentions this. The reason for 8 vs. 16 is that tx queues
>> are "cheaper", as they don't have to be backed by rx buffers. A tx
>> queue needs only some memory for its descriptor ring and some
>> hardware resources.
>>
>> It appears that your system has >= 16 cores. For forwarding it
>> probably makes sense to have nrxq = ntxq. If you're left with 8 or
>> fewer cores after disabling hyperthreading, you'll automatically get
>> 8 rx and 8 tx queues. Otherwise you'll have to fiddle with the
>> hw.cxgbe.nrxq10g and ntxq10g tunables (documented in the man page).
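For reference, those are boot-time tunables, so they go in
/boot/loader.conf along the lines shown below. The value 16 is only an
example; match it to the cores you actually want the card to use.

  # rx/tx queue counts per 10G port for cxgbe(4)
  hw.cxgbe.nrxq10g="16"
  hw.cxgbe.ntxq10g="16"

They are read when the driver attaches, so set them before if_cxgbe is
loaded.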
>>
>>> d) dev.cxl.$PORT.qsize_rxq: 1024 and dev.cxl.$PORT.qsize_txq: 1024.
>>> These appear not to be writable when if_cxgbe is loaded, so I
>>> speculate that they are not to be messed with, or that they are
>>> loader.conf variables? Is there any benefit to messing with them?
>> They can't be changed after the port has been administratively
>> brought up even once; this is mentioned in the man page. I don't
>> really recommend changing them anyway.
>>
>>> e) dev.t5nex.$CARD.toe.sndbuf: 262144. These are writable, but
>>> messing with the values did not yield an immediate benefit. Am I
>>> barking up the wrong tree by trying?
>> The TOE tunables won't make a difference unless you have enabled TOE,
>> the TCP endpoints lie on the system, and the connections are being
>> handled by the TOE on the chip. This is not the case on your systems.
>> The driver does not enable TOE by default, and the only way to use it
>> is to switch it on explicitly. There is no possibility that you're
>> using it without knowing that you are.
>>
>>> f) Based on prior experiments with other vendors, I tried tweaks to
>>> the net.isr.* settings, but did not see any benefits worth
>>> discussing. Am I correct in this speculation, based on others'
>>> experience?
>>>
>>> g) Are there other settings I should be looking at that might
>>> squeeze out a few more packets?
>> The pps rates that you've observed are within the chip's hardware
>> limits by at least an order of magnitude. Tuning the kernel rather
>> than the driver may be the best bang for your buck.
>>
>> Regards,
>> Navdeep
>>
>>> Thanks in advance!
>>>
>>> -- John Jasen (jjasen@gmail.com)
>
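P.S. For the archives: the usual net.isr.* knobs referred to in (f) are
the ones below. The values are placeholders only and, as noted above,
they made no measurable difference in this setup.

  # /boot/loader.conf (netisr thread count and CPU binding, read at boot)
  net.isr.maxthreads=4
  net.isr.bindthreads=1

  # adjustable at runtime with sysctl(8)
  net.isr.dispatch=deferred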