Date:      Sat, 1 May 2021 00:31:57 +0300
From:      Özkan KIRIK <ozkan.kirik@gmail.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: IPsec performance - netisr hits 100%
Message-ID:  <CAAcX-AG2KyN-7yMm+MpKbCRDKivFQjq6BVR0r50t4P3HpDRx=Q@mail.gmail.com>
In-Reply-To: <CAAcX-AHSk92gXQ3HXw4KYpXQ-jTVCjX0svStu5z49ykH-tk2QQ@mail.gmail.com>
References:  <CAAcX-AF=0s5tueCuanFKkoALNkRnWJ-8QrzfCqSu=ReoWvqMug@mail.gmail.com> <YIxpdL9b6v8+N+Lg@nuc> <CAAcX-AHSk92gXQ3HXw4KYpXQ-jTVCjX0svStu5z49ykH-tk2QQ@mail.gmail.com>

Hello again,

The patch is applied, and netisr is no longer eating CPU, but throughput
drops by around 0.2 Gbps compared to the previous kernel.

I also tried both net.isr.maxthreads=1 and net.isr.maxthreads=4; the
results are the same.
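
For reference, a minimal sketch of how those knobs are set (the values are
just the ones tested here; net.isr.maxthreads is a boot-time loader
tunable, while net.isr.dispatch can be changed at runtime):

    # /boot/loader.conf -- takes effect at next boot
    net.isr.maxthreads="4"            # tested with 1 and 4

    # runtime sysctl -- dispatch policy: direct, hybrid or deferred
    sysctl net.isr.dispatch=deferred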

Results are:

- with CCR - 1.8Gbps
top:
  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   14 root        -16    -     0B    16K CPU5     5   1:38 100.00% [crypto returns 8]
    3 root        -16    -     0B    16K CPU1     1   0:58  77.83% [crypto returns 0]
   11 root        -92    -     0B  1120K WAIT     4   0:46  41.77% [intr{irq295: t6nex0:0a0}]
   11 root        -92    -     0B  1120K WAIT     8   0:37  36.83% [intr{irq297: t6nex0:0a2}]
   11 root        -92    -     0B  1120K WAIT    12   0:19  21.05% [intr{irq307: t6nex0:1a2}]
 5376 root         23    0    23M  4348K CPU10   10   0:00   3.77% iperf -B 172.16.70.10 -c 172.16.68.1 -P 2 -t 20{iperf}
 5376 root         23    0    23M  4348K sbwait   3   0:00   3.72% iperf -B 172.16.70.10 -c 172.16.68.1 -P 2 -t 20{iperf}
 5358 root         22    0    23M  4348K sbwait  12   0:00   3.72% iperf -B 172.16.70.4 -c 172.16.68.1 -P 2 -t 20{iperf}
...

- with aesni - 1.25Gbps
  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   14 root        -16    -     0B    16K CPU7     7   1:59  99.66% [crypto returns 8]
    3 root        -16    -     0B    16K CPU3     3   1:10  46.09% [crypto returns 0]
   11 root        -92    -     0B  1120K CPU14   14   0:45  38.65% [intr{irq297: t6nex0:0a2}]
    0 root        -20    -     0B  2144K -        8   0:02  20.42% [kernel{crypto_12}]
    0 root        -20    -     0B  2144K -        5   0:02  18.54% [kernel{crypto_7}]
    0 root        -20    -     0B  2144K CPU4     4   0:02  18.28% [kernel{crypto_10}]
    0 root        -20    -     0B  2144K -        9   0:01  17.87% [kernel{crypto_5}]
    0 root        -20    -     0B  2144K -       11   0:02  17.66% [kernel{crypto_14}]
    0 root        -20    -     0B  2144K CPU15   15   0:02  16.94% [kernel{crypto_15}]
    0 root        -20    -     0B  2144K CPU2     2   0:00  16.91% [kernel{crypto_4}]
   11 root        -92    -     0B  1120K WAIT    12   0:23  16.41% [intr{irq307: t6nex0:1a2}]
    0 root        -20    -     0B  2144K -        6   0:02  16.28% [kernel{crypto_13}]
    0 root        -20    -     0B  2144K CPU14   14   0:02  14.86% [kernel{crypto_11}]
    0 root        -20    -     0B  2144K -       11   0:01   7.37% [kernel{crypto_9}]
 5761 root         22    0    23M  4348K sbwait  13   0:00   3.96% iperf -B 172.16.70.6 -c 172.16.68.1 -P 2 -t 20{iperf}
 5761 root         22    0    23M  4348K sbwait   5   0:00   3.86% iperf -B 172.16.70.6 -c 172.16.68.1 -P 2 -t 20{iperf}
...

- with QAT - 1.46 Gbps
  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    3 root        -16    -     0B    16K RUN      0   1:23  93.46% [crypto returns 0]
   14 root        -16    -     0B    16K crypto   2   2:10  35.31% [crypto returns 8]
   11 root        -92    -     0B  1664K WAIT     8   0:50  21.83% [intr{irq297: t6nex0:0a2}]
   11 root        -92    -     0B  1664K WAIT    12   0:25   9.72% [intr{irq307: t6nex0:1a2}]
   11 root        -92    -     0B  1664K WAIT     8   0:00   4.12% [intr{irq368: qat1}]
 6103 root         21    0    23M  4348K sbwait  12   0:00   3.88% iperf -B 172.16.70.1 -c 172.16.68.1 -P 2 -t 20{iperf}
 6124 root         22    0    23M  4348K sbwait   4   0:00   3.63% iperf -B 172.16.70.8 -c 172.16.68.1 -P 2 -t 20{iperf}
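
For completeness, the iperf load visible in the top output above follows
this pattern (a sketch reconstructed from the command lines shown; the
loop form and the exact set of source addresses are assumptions):

    # one client per source address, 2 parallel streams each, 20 second runs
    for src in 172.16.70.1 172.16.70.4 172.16.70.6 172.16.70.8 172.16.70.10; do
        iperf -B ${src} -c 172.16.68.1 -P 2 -t 20 &
    done
    wait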

On Fri, Apr 30, 2021 at 11:40 PM Özkan KIRIK <ozkan.kirik@gmail.com> wrote:

> Thank you Mark,
>
> I'm going to recompile the kernel with this patch and share the results.
> After a few days, I can install stable/13 on the same hardware and then
> I'll repeat the same tests.
>
> Thanks again,
> Cheers
>
> On Fri, Apr 30, 2021 at 11:32 PM Mark Johnston <markj@freebsd.org> wrote:
>
>> On Fri, Apr 30, 2021 at 11:11:48PM +0300, Özkan KIRIK wrote:
>> > Hello,
>> >
>> > I'm using FreeBSD stable/12, with world built on 12 April 2021.
>> > My setup is:
>> > [freebsd host cc0] <--------> [cc1 - same freebsd, but jail]
>> >
>> > Without IPsec, I can easily reach 20 Gbps (the test was run with
>> > different source IPs, using multiple iperf instances to scale across
>> > multiple queues). My hardware is a Xeon D-2146NT (8 cores + SoC QAT);
>> > cc0 and cc1 are Chelsio T62100-LP-CR.
>> >
>> > But with IPsec, throughput is limited to 2 Gbps (with ccr) and only one
>> > netisr thread hits 100% CPU:
>> > with aesni, throughput is 1.4 Gbps
>> > with QAT, throughput is 1.6 Gbps (qat0 C62x, qat1 C62x)
>> > with CCR, throughput is 2.0 Gbps (t6nex0)
>> > The bottleneck is always netisr.
>> >
>> > Is there any way to work around this netisr bottleneck?
>> > I tried switching net.isr.dispatch to deferred and hybrid, but
>> > performance drops a bit.
>>
>> I can suggest a couple of things to try.  First, we configure only one
>> netisr thread by default.  You can create more by setting the
>> net.isr.maxthreads loader tunable.  I believe netipsec selects a thread
>> using the SPI so adding more threads might not help much depending on
>> your configuration, but testing with e.g., maxthreads = ncpu could be
>> illuminating.
>>
>> Second, netipsec unconditionally hands rx processing off to netisr
>> threads for some reason; that's why changing the dispatch policy doesn't
>> help.  Maybe it's to help avoid running out of kernel stack space or to
>> somehow avoid packet reordering in some case that is not clear to me.  I
>> tried a patch (see below) which eliminates this and it helped somewhat.
>> If anyone can provide an explanation for the current behaviour I'd
>> appreciate it.
>>
>> Could you try both approaches and report back?  It would also be
>> interesting to know how your results compare with 13.0, if possible.
>>
>> commit 618ab87449d412a74bfee4932d84a6fc17afce6c
>> Author: Mark Johnston <markj@FreeBSD.org>
>> Date:   Thu Jan 7 11:29:14 2021 -0500
>>
>>     netipsec: Avoid deferred dispatch on the input path
>>
>> diff --git a/sys/netipsec/ipsec_input.c b/sys/netipsec/ipsec_input.c
>> index 48acba68a1fe..98d0954c4c53 100644
>> --- a/sys/netipsec/ipsec_input.c
>> +++ b/sys/netipsec/ipsec_input.c
>> @@ -425,7 +425,7 @@ ipsec4_common_input_cb(struct mbuf *m, struct secasvar *sav, int skip,
>>                 error = ipsec_if_input(m, sav, af);
>>         if (error == 0) {
>>                 NET_EPOCH_ENTER(et);
>> -               error = netisr_queue_src(isr_prot, (uintptr_t)sav->spi, m);
>> +               error = netisr_dispatch_src(isr_prot, (uintptr_t)sav->spi, m);
>>                 NET_EPOCH_EXIT(et);
>>                 if (error) {
>>                         IPSEC_ISTAT(sproto, qfull);
>> @@ -624,7 +624,7 @@ ipsec6_common_input_cb(struct mbuf *m, struct secasvar *sav, int skip,
>>                         error = ipsec_if_input(m, sav, af);
>>                 if (error == 0) {
>>                         NET_EPOCH_ENTER(et);
>> -                       error = netisr_queue_src(isr_prot,
>> +                       error = netisr_dispatch_src(isr_prot,
>>                             (uintptr_t)sav->spi, m);
>>                         NET_EPOCH_EXIT(et);
>>                         if (error) {
>>
>
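
For anyone reproducing this, applying the patch and rebuilding follows the
usual stable/12 procedure. A rough sketch, assuming the source tree is in
/usr/src; the patch file name and the MYKERNEL config name are placeholders:

    cd /usr/src
    patch -p1 < /path/to/netipsec-dispatch.diff   # the diff quoted above

    make -j8 buildkernel KERNCONF=MYKERNEL
    make installkernel KERNCONF=MYKERNEL
    shutdown -r now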


