Date:      Sun, 20 Oct 2019 11:59:10 +0200
From:      Michael Tuexen <tuexen@freebsd.org>
To:        Paul <devgs@ukr.net>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   Re: Network anomalies after update from 11.2 STABLE to 12.1 STABLE
Message-ID:  <D77A5604-10E2-412B-89FE-2547ADA9C9A1@freebsd.org>
In-Reply-To: <1571505850.986841000.zen2nmth@frv39.fwdcdn.com>
References:  <1571499556.409350000.a1ewtyar@frv39.fwdcdn.com> <YQBPR0101MB1652CC049B157794AC016378DD6F0@YQBPR0101MB1652.CANPRD01.PROD.OUTLOOK.COM> <1571505850.986841000.zen2nmth@frv39.fwdcdn.com>

> On 19 Oct 2019, at 19:32, Paul <devgs@ukr.net> wrote:
>
> Hi Rick,
>
> RST is only one part of the syndrome. Apart from it, we have a ton of
> other issues. For example: a lot (50+) of ACK and [FIN, ACK]
> retransmissions in cases where they are definitely not needed, as seen
> in tcpdump (unless the packets that we see in the dump are not actually
> processed by the kernel, therefore leading to retransmissions?). It
> definitely has something to do with races, because the issue completely
> disappears when only a single queue is enabled.
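>
> For reference, a quick way to count the retransmitted segments in such
> a capture, assuming tshark is available (trace.pcap is a placeholder
> file name):
>
>     # count segments Wireshark's TCP analysis flags as retransmissions
>     tshark -r trace.pcap -Y "tcp.analysis.retransmission" | wc -l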
>
> In other cases, we have observed that 12.1-STABLE sent a FIN but then,
> when sending the ACK, didn't actually increment the SEQ, as if the two
> packets, FIN and ACK, were sent concurrently, though the ACK was
> dispatched later.
>
> Also, I want to focus on a weird behavior, as I wrote in the original
> post: the issue also disappears if multiple TCP streams each use a
> different DST port. It's as if it has something to do with sharing a
> port.
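>
> For reference, a sketch of that comparison, assuming nginx is also
> listening on ports 81-84 (placeholder port numbers):
>
>     # one stream per destination port instead of ten on port 80
>     for p in 80 81 82 83 84; do
>         wrk -c 2 --header "Connection: close" -d 10 -t 1 http://10.10.10.92:$p/missing &
>     done
>     wait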
Hi Paul,

I understand that you see the NIC-level queue handling as part of what
has to be taken into account. I agree that having problems there might
result in packets sent out not in the expected order, or packets
received not being processed in the expected order.
From a TCP perspective, both cases look like reordering in the network,
and this might impact performance in a negative way (unnecessary
retransmissions, congestion control limiting the transfer more than it
should), but it should not result in TCP connection drops.

Do you have tracefiles (.pcap preferred) from both sides showing the
connection drops?
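Something like the following on each side should do, assuming the
interface and address from your mail (the output file name is a
placeholder):

    # capture full packets of the test traffic, one file per host
    tcpdump -i lagg0 -s 0 -w server.pcap tcp port 80 and host 10.10.10.92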

Best regards
Michael
>
> 19 October 2019, 19:24:43, by "Rick Macklem" <rmacklem@uoguelph.ca>:
>
>> Btw, I once ran into a situation where "smart networking" was
>> injecting RSTs into a TCP stream. The packet captures at the client
>> and server machines were identical except for the RSTs, and the
>> problem went away when I connected the two machines with a cable,
>> bypassing the network. Might be worth a try, if you can do it?
>>
>> Good luck with it, rick
>>
>> ________________________________________
>> From: owner-freebsd-net@freebsd.org <owner-freebsd-net@freebsd.org> on behalf of Paul <devgs@ukr.net>
>> Sent: Saturday, October 19, 2019 12:09 PM
>> To: michael.tuexen@lurchi.franken.de; freebsd-net@freebsd.org; freebsd-stable@freebsd.org
>> Subject: Re[2]: Network anomalies after update from 11.2 STABLE to 12.1 STABLE
>>
>> Hi Michael,
>>
>> Thank you for taking the time!
>>
>> We use physical machines. We do not have any special `pf` rules.
>> Both sides ran `pfctl -d` before testing.
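>>
>> To double-check that pf is really off, something like this should
>> work on both sides:
>>
>>    pfctl -s info    # should report "Status: Disabled"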
>>
>> `nginx` config is primitive, no secrets there:
>>
>> -------------------------------------------------------------------
>> user  www;
>> worker_processes  auto;
>>
>> error_log  /var/log/nginx/error.log warn;
>>
>> events {
>>    worker_connections  81920;
>>    kqueue_changes  4096;
>>    use kqueue;
>> }
>>
>> http {
>>    include                     mime.types;
>>    default_type                application/octet-stream;
>>
>>    sendfile                    off;
>>    keepalive_timeout           65;
>>    tcp_nopush                  on;
>>    tcp_nodelay                 on;
>>
>>    # Logging
>>    log_format  main            '$remote_addr - $remote_user [$time_local] "$request" '
>>                                '$status $request_length $body_bytes_sent "$http_referer" '
>>                                '"$http_user_agent" "$http_x_real_ip" "$realip_remote_addr" "$request_completion" "$request_time" '
>>                                '"$request_body"';
>>
>>    access_log                  /var/log/nginx/access.log  main;
>>
>>    server {
>>        listen                  80 default;
>>
>>        server_name             localhost _;
>>
>>        location / {
>>            return 404;
>>        }
>>    }
>> }
>> -------------------------------------------------------------------
>>
>>
>> `wrk` is compiled with a default configuration. We test like this:
>>
>> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>>
>>
>> Also, it seems that our issue and the one described in this thread
>> are identical:
>>
>>   https://lists.freebsd.org/pipermail/freebsd-net/2019-June/053667.html
>>
>> We both have Intel network cards, BTW. Ours are these:
>>
>> em0 at pci0:10:0:0:        class=0x020000 card=0x000015d9 chip=0x10d38086 rev=0x00 hdr=0x00
>>    vendor     = 'Intel Corporation'
>>    device     = '82574L Gigabit Network Connection'
>>
>> ixl0 at pci0:4:0:0:        class=0x020000 card=0x00078086 chip=0x15728086 rev=0x01 hdr=0x00
>>    vendor     = 'Intel Corporation'
>>    device     = 'Ethernet Controller X710 for 10GbE SFP+'
>>
>>
>> ==============================
>>
>> Additional info:
>>
>> During the tests, we have bonded two interfaces into a lagg:
>>
>> ixl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>        options=c500b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,TXCSUM_IPV6>
>>        ether 3c:fd:fe:aa:60:20
>>        media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>>        status: active
>>        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>> ixl1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>        options=c500b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,TXCSUM_IPV6>
>>        ether 3c:fd:fe:aa:60:20
>>        hwaddr 3c:fd:fe:aa:60:21
>>        media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>>        status: active
>>        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>
>> lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>        options=c500b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,TXCSUM_IPV6>
>>        ether 3c:fd:fe:aa:60:20
>>        inet 10.10.10.92 netmask 0xffff0000 broadcast 10.10.255.255
>>        laggproto failover lagghash l2,l3,l4
>>        laggport: ixl0 flags=5<MASTER,ACTIVE>
>>        laggport: ixl1 flags=0<>
>>        groups: lagg
>>        media: Ethernet autoselect
>>        status: active
>>        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>
>> using this config:
>>
>>    ifconfig_ixl0="up -lro -tso -rxcsum -txcsum"  (we tried different options and got the same outcome)
>>    ifconfig_ixl1="up -lro -tso -rxcsum -txcsum"
>>    ifconfig_lagg0="laggproto failover laggport ixl0 laggport ixl1 10.10.10.92/24"
>>
>>
>> We randomly picked `ixl0` and restricted its number of RX/TX queues to 1:
>>    /boot/loader.conf:
>>    dev.ixl.0.iflib.override_ntxqs=1
>>    dev.ixl.0.iflib.override_nrxqs=1
>>
>> leaving `ixl1` with the default number, matching the number of cores (6).
>>
>>
>>    ixl0: <Intel(R) Ethernet Controller X710 for 10GbE SFP+ - 2.1.0-k> mem 0xf8800000-0xf8ffffff,0xf9808000-0xf980ffff irq 40 at device 0.0 on pci4
>>    ixl0: fw 5.0.40043 api 1.5 nvm 5.05 etid 80002927 oem 1.261.0
>>    ixl0: PF-ID[0]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
>>    ixl0: Using 1024 TX descriptors and 1024 RX descriptors
>>    ixl0: Using 1 RX queues 1 TX queues
>>    ixl0: Using MSI-X interrupts with 2 vectors
>>    ixl0: Ethernet address: 3c:fd:fe:aa:60:20
>>    ixl0: Allocating 1 queues for PF LAN VSI; 1 queues active
>>    ixl0: PCI Express Bus: Speed 8.0GT/s Width x4
>>    ixl0: SR-IOV ready
>>    ixl0: netmap queues/slots: TX 1/1024, RX 1/1024
>>    ixl1: <Intel(R) Ethernet Controller X710 for 10GbE SFP+ - 2.1.0-k> mem 0xf8000000-0xf87fffff,0xf9800000-0xf9807fff irq 40 at device 0.1 on pci4
>>    ixl1: fw 5.0.40043 api 1.5 nvm 5.05 etid 80002927 oem 1.261.0
>>    ixl1: PF-ID[1]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
>>    ixl1: Using 1024 TX descriptors and 1024 RX descriptors
>>    ixl1: Using 6 RX queues 6 TX queues
>>    ixl1: Using MSI-X interrupts with 7 vectors
>>    ixl1: Ethernet address: 3c:fd:fe:aa:60:21
>>    ixl1: Allocating 8 queues for PF LAN VSI; 6 queues active
>>    ixl1: PCI Express Bus: Speed 8.0GT/s Width x4
>>    ixl1: SR-IOV ready
>>    ixl1: netmap queues/slots: TX 6/1024, RX 6/1024
>>
>>
>>
>> This allowed us to switch easily between different configurations
>> without needing to reboot, by simply shutting down one interface or
>> the other:
>>
>>    `ifconfig XXX down`
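>>
>> For example, a sketch of switching to the multi-queue path, using the
>> interface names above:
>>
>>    ifconfig ixl0 down    # failover moves traffic to ixl1 (6 queues)
>>    ifconfig lagg0        # the ACTIVE flag moves to laggport ixl1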
>>
>> When testing `ixl0`, which runs only a single queue:
>>    ixl0: Using 1 RX queues 1 TX queues
>>    ixl0: netmap queues/slots: TX 1/1024, RX 1/1024
>>
>> we got these results:
>>
>> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>> Running 10s test @ http://10.10.10.92:80/missing
>>  1 threads and 10 connections
>>  Thread Stats   Avg      Stdev     Max   +/- Stdev
>>    Latency   281.31us  297.74us  22.66ms   99.70%
>>    Req/Sec    19.91k     2.79k   21.25k    97.59%
>>  Latency Distribution
>>     50%  266.00us
>>     75%  309.00us
>>     90%  374.00us
>>     99%  490.00us
>>  164440 requests in 10.02s, 47.52MB read
>>  Socket errors: read 0, write 0, timeout 0
>>  Non-2xx or 3xx responses: 164440
>> Requests/sec:  16412.09
>> Transfer/sec:      4.74MB
>>
>>
>> When testing `ixl1`, which runs 6 queues:
>>    ixl1: Using 6 RX queues 6 TX queues
>>    ixl1: netmap queues/slots: TX 6/1024, RX 6/1024
>>
>> we got these results:
>>
>> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>> Running 10s test @ http://10.10.10.92:80/missing
>>  1 threads and 10 connections
>>  Thread Stats   Avg      Stdev     Max   +/- Stdev
>>    Latency   216.16us   71.97us 511.00us   47.56%
>>    Req/Sec     4.34k     2.76k   15.44k    83.17%
>>  Latency Distribution
>>     50%  216.00us
>>     75%  276.00us
>>     90%  312.00us
>>     99%  365.00us
>>  43616 requests in 10.10s, 12.60MB read
>>  Socket errors: connect 0, read 24, write 8, timeout 0
>>  Non-2xx or 3xx responses: 43616
>> Requests/sec:   4318.26
>> Transfer/sec:      1.25MB
>>
>> Do note that multiple queues not only cause issues, they also
>> dramatically decrease the performance of the network.
>>
>> Using `sysctl -w net.inet.tcp.ts_offset_per_conn=0` didn't help at all.
>>
>> Best regards,
>> -Paul
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>



