Date:      Tue, 26 May 2015 14:58:38 -0700
From:      Adrian Chadd <adrian@freebsd.org>
To:        Lakshmi Narasimhan Sundararajan <lakshmi.n@msystechnologies.com>
Cc:        "Pokala, Ravi" <rpokala@panasas.com>,  "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, "jfv@freebsd.org" <jfv@freebsd.org>, "erj@freebsd.org" <erj@freebsd.org>,  "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, "Lewis, Fred" <flewis@panasas.com>
Subject:   Re: Performance issues with Intel Fortville (XL710/ixl(4))
Message-ID:  <CAJ-VmomvkxxyG7RZj%2BLg-DxciDaF4dWRTZk_5Naaehg6bWuV8w@mail.gmail.com>
In-Reply-To: <5564b545.0287440a.5871.7050@mx.google.com>
References:  <D1810169.136A5C%rpokala@panasas.com> <5564b545.0287440a.5871.7050@mx.google.com>

hi!

Try enabling the RSS and PCBGROUP kernel options on -HEAD. The ixl driver
should work with them.

(I haven't tested it though; I've had other things going on here.)
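
As a rough illustration of the suggestion above, the sketch below writes a
custom kernel config that layers the two options on top of GENERIC. The
config name RSSTEST and the helper write_rss_kernconf are hypothetical, and
this is untested against the ixl driver.

```shell
#!/bin/sh
# Sketch only: write a custom kernel config enabling RSS and PCBGROUP.
# "RSSTEST" and write_rss_kernconf are hypothetical names for illustration.
write_rss_kernconf() {
    conf="${1:-RSSTEST}"    # output file; defaults to ./RSSTEST
    cat > "$conf" <<'EOF'
include GENERIC
ident   RSSTEST
options RSS        # receive-side scaling
options PCBGROUP   # connection groups for protocol control blocks
EOF
}
```

Assuming an amd64 source tree, the resulting file would go under
sys/amd64/conf and be built with "make buildkernel KERNCONF=RSSTEST".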



-adrian


On 21 May 2015 at 15:20, Lakshmi Narasimhan Sundararajan
<lakshmi.n@msystechnologies.com> wrote:
> Hi FreeBSD Team!
>
> We seem to have found a problem affecting TX performance.
>
> We found that TX handling is spread across all CPUs, which probably causes
> cache thrashing and results in poor performance.
>
> But once we used cpuset to bind the interrupt thread and the iperf process to
> the same CPU, performance was close to line rate. I used the userland cpuset
> command to do this manually. I want this constraint applied in the kernel
> config/code through some tunables, and I am seeking your help/pointers in
> that regard.
>
>
> My followup questions are as follows.
>
> a) How are TX interrupts steered from the NIC to the CPU on the transmit
> path? Would the tx_complete# interrupt for packets transmitted from CPU#x be
> serviced on the same CPU? If not, how can this binding be done?
>
>
> b) I would like to use a pool of CPUs dedicated to servicing NIC interrupts.
> Especially on the transmit path, I would want the TX interrupts to be handled
> on the same CPU on which the request was submitted. How can this be done?
>
>
> I played with the current net.isr settings but did not see any difference in
> how interrupts are scheduled across CPUs. Even with the maximum number of
> threads (net.isr.maxthreads) set to one, the interrupt threads are scheduled
> on any CPU. Setting net.isr.bindthreads to '1' makes no difference to
> interrupt thread scheduling either.
>
>
> root@mau-da-27-4-1:~ # sysctl net.isr
> net.isr.dispatch: direct
> net.isr.maxthreads: 1
> net.isr.bindthreads: 0
> net.isr.maxqlimit: 10240
> net.isr.defaultqlimit: 256
> net.isr.maxprot: 16
> net.isr.numthreads: 1
>
>
> I would sincerely appreciate it if you could provide some pointers on the
> items above.
>
>
>
>
> Thanks
>
> LN
>
>
>
>
>
>
>
> From: Pokala, Ravi
> Sent: Wednesday, May 20, 2015 3:34 AM
> To: freebsd-net@freebsd.org, jfv@freebsd.org, erj@freebsd.org
> Cc: freebsd-hackers@freebsd.org, Lewis, Fred, Sundararajan, Lakshmi
>
>
>
>
>
> Hi folks,
>
> At Panasas, we are working with the Intel XL710 40G NIC (aka Fortville),
> and we're seeing some performance issues w/ 11-CURRENT (r282653).
>
>     Motherboard: Intel S2600KP (aka Kennedy Pass)
>     CPU: E5-2660 v3 @ 2.6GHz (aka Haswell Xeon)
>         (1 socket x 10 physical cores x 2 SMT threads) = 20 logical cores
>     NIC: Intel XL710, 2x40Gbps QSFP, configured in 4x10Gbps mode
>     RAM: 4x 16GB DDR4 DIMMs
>
> What we've seen so far:
>
>   - TX performance is pretty consistently lower than RX performance. All
> numbers below are for unidirectional tests using `iperf':
>         10Gbps links    threads/link    TX Gbps     RX Gbps     TX/RX
>         1               1               9.02        9.85        91.57%
>         1               8               8.49        9.91        85.67%
>         1               16              7.00        9.91        70.63%
>         1               32              6.68        9.92        67.40%
>
>   - With multiple active links, both TX and RX performance suffer greatly;
> the aggregate bandwidth tops out at about a third of the theoretical
> 40Gbps implied by 4x 10Gbps.
>         10Gbps links    threads/link    TX Gbps     RX Gbps     % of 40Gbps
>         4               1               13.39       13.38       33.4%
>
>   - Multi-link bidirectional throughput is absolutely terrible; the
> aggregate is less than a tenth of the theoretical 40Gbps.
>         10Gbps links    threads/link    TX Gbps     RX Gbps     % of 40Gbps
>         4               1               3.83        2.96        9.6% / 7.4%
>
>   - Occasional interrupt storm messages are seen from the IRQs associated
> with the NICs. Since that can impact performance, those runs were not
> included in the data listed above.
>
> Our questions:
>
>   - How stable is ixl(4) in -CURRENT? By that, we mean both how quickly is
> the driver changing, and does the driver cause any system instability?
>
>   - What type of performance have others been getting w/ Fortville? In
> 40Gbps mode? In 4x10Gbps mode?
>
>   - Does anyone have any tuning parameters they can recommend for this
> card?
>
>   - We did our testing w/ 11-CURRENT, but we will initially ship Fortville
> running on 10.1-RELEASE or 10.2-RELEASE. The presence of RSS - even though
> it is disabled by default - makes the driver back-port non-trivial. Is
> there an estimate on when the 11-CURRENT version of the driver (1.4.1)
> will get MFCed to 10-STABLE?
>
> My colleagues Lakshmi and Fred (CCed) are working on this; please make
> sure to include them if you have any comments.
>
> Thanks,
>
> Ravi
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
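
The manual binding described in the quoted message (interrupt thread and
iperf process pinned to the same CPU with cpuset) might look roughly like
the sketch below. The helper names are hypothetical, and the IRQ number, CPU
id and PID in the usage line are placeholders rather than values from this
thread; on a real system the queue IRQs can be looked up with "vmstat -i".
By default the helper only prints the cpuset(1) commands; it runs them when
RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: pin one NIC queue interrupt and one process to the same CPU.
# run_or_echo and pin_irq_and_pid are hypothetical helper names.

# Print the command by default; execute it only when RUN=1 is set.
run_or_echo() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# usage: pin_irq_and_pid <irq> <cpu> <pid>
pin_irq_and_pid() {
    irq="$1"; cpu="$2"; pid="$3"
    run_or_echo cpuset -l "$cpu" -x "$irq"   # bind the interrupt to the CPU
    run_or_echo cpuset -l "$cpu" -p "$pid"   # bind the process to the same CPU
}
```

For example, "pin_irq_and_pid 264 2 1234" prints the two cpuset commands
that would bind IRQ 264 and PID 1234 to CPU 2. This is only the userland
approach and does not address the request for a kernel-side tunable.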


