From: Adrian Chadd <adrian.chadd@gmail.com>
Date: Wed, 12 Aug 2015 08:23:20 -0700
To: Maxim Sobolev
Cc: Luigi Rizzo, "Alexander V. Chernikov", FreeBSD Net, Babak Farrokhi,
    "freebsd@intel.com", Jev Björsell, Olivier Cochard-Labbé
Subject: Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1

Right, and for the ixgbe hardware?


-a
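
The same kind of listing for the 10G ports can be pulled with pciconf(8); a
quick sketch, assuming the ports attach as ix0/ix1 as in the dmesg quoted
further down in this thread (the grep pattern is only illustrative):

    # show the ixgbe-driven adapters the same way as the igb ones below
    pciconf -lv | grep -A4 '^ix[0-9]'
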
On 12 August 2015 at 08:05, Maxim Sobolev wrote:
> igb0@pci0:7:0:0: class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = 'I210 Gigabit Network Connection'
>     class    = network
>     subclass = ethernet
> igb1@pci0:8:0:0: class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = 'I210 Gigabit Network Connection'
>     class    = network
>     subclass = ethernet
>
>
> On Wed, Aug 12, 2015 at 8:03 AM, Maxim Sobolev wrote:
>
>> Ok, so my current settings are:
>>
>> hw.ix.max_interrupt_rate: 20000
>> dev.ix.0.queue0.interrupt_rate: 20000
>> dev.ix.0.queue1.interrupt_rate: 20000
>> dev.ix.0.queue2.interrupt_rate: 20000
>> dev.ix.0.queue3.interrupt_rate: 20000
>> dev.ix.0.queue4.interrupt_rate: 20000
>> dev.ix.0.queue5.interrupt_rate: 20000
>> dev.ix.1.queue0.interrupt_rate: 20000
>> dev.ix.1.queue1.interrupt_rate: 20000
>> dev.ix.1.queue2.interrupt_rate: 20000
>> dev.ix.1.queue3.interrupt_rate: 20000
>> dev.ix.1.queue4.interrupt_rate: 20000
>> dev.ix.1.queue5.interrupt_rate: 20000
>> dev.ix.0.enable_aim: 0
>> dev.ix.1.enable_aim: 0
>> dev.ix.2.enable_aim: 0
>> dev.ix.3.enable_aim: 0
>> hw.ix.num_queues: 6
>>
>> We also happen to have an I210-based system with only 4 hardware queues;
>> it would be interesting to see how it stacks up.
>>
>> On Wed, Aug 12, 2015 at 5:23 AM, Luigi Rizzo wrote:
>>
>>> As I was telling Maxim, you should disable AIM because it only matches
>>> the max interrupt rate to the average packet size, which is the last
>>> thing you want.
>>>
>>> Setting the interrupt rate with sysctl (one per queue) gives you precise
>>> control over the max rate and, hence, the extra latency. 20k interrupts/s
>>> give you 50us of latency, and the 2k slots in the queue are still enough
>>> to absorb a burst of min-sized frames hitting a single queue (the OS will
>>> start dropping long before that level, but that's another story).
>>>
>>> Cheers
>>> Luigi
>>>
>>> On Wednesday, August 12, 2015, Babak Farrokhi wrote:
>>>
>>>> I ran into the same problem with almost the same hardware (Intel X520)
>>>> on 10-STABLE. HT/SMT is disabled and the cards are configured with 8
>>>> queues, with the same sysctl tunings as sobomax@ did. I am not using
>>>> lagg, and no FLOWTABLE.
>>>>
>>>> I experimented with pmcstat (RESOURCE_STALLS) a while ago and here [1]
>>>> [2] you can see the results, including pmc output, callchain, flamegraph
>>>> and gprof output.
>>>>
>>>> I am experiencing a huge number of interrupts with a 200 kpps load:
>>>>
>>>> # sysctl dev.ix | grep interrupt_rate
>>>> dev.ix.1.queue7.interrupt_rate: 125000
>>>> dev.ix.1.queue6.interrupt_rate: 6329
>>>> dev.ix.1.queue5.interrupt_rate: 500000
>>>> dev.ix.1.queue4.interrupt_rate: 100000
>>>> dev.ix.1.queue3.interrupt_rate: 50000
>>>> dev.ix.1.queue2.interrupt_rate: 500000
>>>> dev.ix.1.queue1.interrupt_rate: 500000
>>>> dev.ix.1.queue0.interrupt_rate: 100000
>>>> dev.ix.0.queue7.interrupt_rate: 500000
>>>> dev.ix.0.queue6.interrupt_rate: 6097
>>>> dev.ix.0.queue5.interrupt_rate: 10204
>>>> dev.ix.0.queue4.interrupt_rate: 5208
>>>> dev.ix.0.queue3.interrupt_rate: 5208
>>>> dev.ix.0.queue2.interrupt_rate: 71428
>>>> dev.ix.0.queue1.interrupt_rate: 5494
>>>> dev.ix.0.queue0.interrupt_rate: 6250
>>>>
>>>> [1] http://farrokhi.net/~farrokhi/pmc/6/
>>>> [2] http://farrokhi.net/~farrokhi/pmc/7/
>>>>
>>>> Regards,
>>>> Babak
>>>>
>>>>
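The wildly varying interrupt_rate values above are what AIM produces; Luigi's
advice boils down to switching it off and pinning the per-queue rate. A
minimal sketch using the tunable names and the 20000/s value already quoted
in this thread (not a recommendation), and assuming the hw.ix.* knobs are
also honoured as loader tunables, as hw.* driver knobs usually are:

    # /boot/loader.conf -- takes effect on the next boot
    hw.ix.num_queues=6
    hw.ix.max_interrupt_rate=20000

    # at runtime: disable adaptive interrupt moderation, pin each queue
    sysctl dev.ix.0.enable_aim=0
    for q in 0 1 2 3 4 5; do
            sysctl dev.ix.0.queue${q}.interrupt_rate=20000
    done
    # repeat for dev.ix.1 (and any other active ports)

At a fixed 20000 interrupts/s, the worst-case added latency is 1/20000 s =
50 us, which is the figure Luigi quotes above.
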
>>>> Alexander V. Chernikov wrote:
>>>> > 12.08.2015, 02:28, "Maxim Sobolev":
>>>> >> Olivier, keep in mind that we are not "kernel forwarding" packets,
>>>> >> but "app forwarding", i.e. the packet goes the full way
>>>> >> net->kernel->recvfrom->app->sendto->kernel->net, which is why we have
>>>> >> much lower PPS limits and which is why I think we are actually
>>>> >> benefiting from the extra queues. Single-thread sendto() in a loop is
>>>> >> CPU-bound at about 220K PPS, and while running the test I am observing
>>>> >> that outbound traffic from one thread is mapped into a specific queue
>>>> >> (well, a pair of queues on two separate adaptors, due to the lagg load
>>>> >> balancing action). And the peak performance of that test is at 7
>>>> >> threads, which I believe corresponds to the number of queues. We have
>>>> >> plenty of CPU cores in the box (24) with HTT/SMT disabled, and one CPU
>>>> >> is mapped to each specific queue. This leaves us with at least 8 CPUs
>>>> >> fully capable of running our app. If you look at the CPU utilization,
>>>> >> we are at about 10% when the issue hits.
>>>> >
>>>> > In any case, it would be great if you could provide some profiling
>>>> > info, since there could be plenty of problematic places, starting from
>>>> > TX ring contention to some locks inside udp or even the (in)famous
>>>> > random entropy harvester.
>>>> > E.g. something like "pmcstat -TS instructions -w1" might be sufficient
>>>> > to determine the reason.
>>>> >
>>>> >> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port
>>>> >> 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff irq 40 at
>>>> >> device 0.0 on pci3
>>>> >> ix0: Using MSIX interrupts with 9 vectors
>>>> >> ix0: Bound queue 0 to cpu 0
>>>> >> ix0: Bound queue 1 to cpu 1
>>>> >> ix0: Bound queue 2 to cpu 2
>>>> >> ix0: Bound queue 3 to cpu 3
>>>> >> ix0: Bound queue 4 to cpu 4
>>>> >> ix0: Bound queue 5 to cpu 5
>>>> >> ix0: Bound queue 6 to cpu 6
>>>> >> ix0: Bound queue 7 to cpu 7
>>>> >> ix0: Ethernet address: 0c:c4:7a:5e:be:64
>>>> >> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
>>>> >> 001.000008 [2705] netmap_attach success for ix0 tx 8/4096 rx
>>>> >> 8/4096 queues/slots
>>>> >> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port
>>>> >> 0x6000-0x601f mem 0xc7a00000-0xc7bfffff,0xc7e00000-0xc7e03fff irq 44 at
>>>> >> device 0.1 on pci3
>>>> >> ix1: Using MSIX interrupts with 9 vectors
>>>> >> ix1: Bound queue 0 to cpu 8
>>>> >> ix1: Bound queue 1 to cpu 9
>>>> >> ix1: Bound queue 2 to cpu 10
>>>> >> ix1: Bound queue 3 to cpu 11
>>>> >> ix1: Bound queue 4 to cpu 12
>>>> >> ix1: Bound queue 5 to cpu 13
>>>> >> ix1: Bound queue 6 to cpu 14
>>>> >> ix1: Bound queue 7 to cpu 15
>>>> >> ix1: Ethernet address: 0c:c4:7a:5e:be:65
>>>> >> ix1: PCI Express Bus: Speed 5.0GT/s Width x8
>>>> >> 001.000009 [2705] netmap_attach success for ix1 tx 8/4096 rx
>>>> >> 8/4096 queues/slots
>>>> >>
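Alexander's pmcstat suggestion, spelled out a little (a sketch only; it
assumes the hwpmc(4) module is available, and the file names are arbitrary):

    kldload hwpmc                        # if not loaded already
    # live, top(1)-style view of where instructions are being retired
    pmcstat -TS instructions -w 1
    # or: sample into a log for ~30 s and post-process it into a callgraph
    pmcstat -S instructions -O /tmp/pps.pmc sleep 30
    pmcstat -R /tmp/pps.pmc -G /tmp/pps-callgraph.txt
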
>>>> >> On Tue, Aug 11, 2015 at 4:14 PM, Olivier Cochard-Labbé <olivier@cochard.me> wrote:
>>>> >>
>>>> >>> On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev <sobomax@freebsd.org> wrote:
>>>> >>>
>>>> >>>> Hi folks,
>>>> >>>>
>>>> >>> Hi,
>>>> >>>
>>>> >>>> We've been trying to migrate some of our high-PPS systems to new
>>>> >>>> hardware that has four X540-AT2 10G NICs and observed that interrupt
>>>> >>>> time goes through the roof after we cross around 200K PPS in and
>>>> >>>> 200K out (two ports in LACP). The previous hardware was stable up to
>>>> >>>> about 350K PPS in and 350K out. I believe the old one was equipped
>>>> >>>> with the I350 and had an identical LACP configuration. The new box
>>>> >>>> also has a better CPU with more cores (i.e. 24 cores vs. 16 cores
>>>> >>>> before). The CPU itself is 2 x E5-2690 v3.
>>>> >>>
>>>> >>> 200K PPS, and even 350K PPS, are very low values indeed.
>>>> >>> On an Intel Xeon L5630 (4 cores only) with one X540-AT2 (i.e. two
>>>> >>> 10-Gigabit ports) I've reached about 1.8 Mpps (fastforwarding
>>>> >>> enabled) [1].
>>>> >>> But my setup didn't use lagg(4): can you disable the lagg
>>>> >>> configuration and re-measure your performance without lagg?
>>>> >>>
>>>> >>> Do you let the Intel NIC driver use 8 queues per port too?
>>>> >>> In my use case (forwarding smallest-size UDP packets), I obtained
>>>> >>> better behaviour by limiting the NIC queues to 4 (hw.ix.num_queues or
>>>> >>> hw.ixgbe.num_queues, I don't remember which) even though my system
>>>> >>> had 8 cores. And this with Gigabit Intel [2] or Chelsio NICs [3].
>>>> >>>
>>>> >>> Don't forget to disable TSO and LRO too.
>>>> >>>
>>>> >>> Regards,
>>>> >>>
>>>> >>> Olivier
>>>> >>>
>>>> >>> [1] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs
>>>> >>> [2] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1
>>>> >>> [3] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues
>>>
>>>
>>> --
>>> -----------------------------------------+-------------------------------
>>> Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
>>> http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>>> TEL      +39-050-2217533               . via Diotisalvi 2
>>> Mobile   +39-338-6809875               . 56122 PISA (Italy)
>>> -----------------------------------------+-------------------------------
>>>
>>
>>
>> --
>> Maksym Sobolyev
>> Sippy Software, Inc.
>> Internet Telephony (VoIP) Experts
>> Tel (Canada): +1-778-783-0474
>> Tel (Toll-Free): +1-855-747-7779
>> Fax: +1-866-857-6942
>> Web: http://www.sippysoft.com
>> MSN: sales@sippysoft.com
>> Skype: SippySoft
>>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
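
Olivier's two concrete suggestions map onto something like the following
sketch (the 4-queue value and the ix0/ix1 names are only examples, and
whichever of hw.ix.num_queues / hw.ixgbe.num_queues the driver honours has
to be checked on the system in question):

    # /boot/loader.conf -- cap the number of RX/TX queue pairs per port
    hw.ix.num_queues=4

    # at runtime: disable TSO and LRO on the 10G ports
    ifconfig ix0 -tso -lro
    ifconfig ix1 -tso -lro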