Date:      Wed, 27 Apr 2016 18:10:26 -0300
From:      Zé Claudio Pastore <zclaudio@bsd.com.br>
To:        Ryan Stone <rysto32@gmail.com>
Cc:        freebsd-net <freebsd-net@freebsd.org>
Subject:   Re: Regression? VLAN packet drop after upgrading from r281235
Message-ID:  <CAEGk6G4SxNfb8Ph=Cq0rRATPvFwFqF9jgg%2BsMvMUhc8z554osw@mail.gmail.com>
In-Reply-To: <CAFMmRNyY67RGyb8%2BaS=HCLEpzki3n0JiT5QYXO5xnjz5vyYxMA@mail.gmail.com>
References:  <CAEGk6G4rq=yE14rDcxhJZZ0drstr=fse%2B9aemVYqdt68Gg=bpQ@mail.gmail.com> <CAFMmRNyY67RGyb8%2BaS=HCLEpzki3n0JiT5QYXO5xnjz5vyYxMA@mail.gmail.com>

Hello Ryan,

2016-04-27 17:28 GMT-03:00 Ryan Stone <rysto32@gmail.com>:

> From a quick look at the vlan code, I can identify a few cases that might
> cause that counter to increment:
>
> 1) Error from the underlying ixgbe device.  Does "netstat -dI ix0" show
> that the driver has been dropping packets?
>

No, the drop counters do not increase on the ix port, only on the vlan device.


>
> 2) Link down events on the underlying NIC.  I believe that link flaps will
> be logged to /var/log/messages and dmesg; do you see anything there that
> might correspond to the time of the packet drops?
>

No, dmesg is clean: only a couple of link down/up events from when I actually
disconnected the port, and nothing else in /var/log/messages that grabs
my attention.


>
> 3) If VLAN_HWTAGGING is disabled through ifconfig on the port, then in
> theory a low memory event could cause the packet to be dropped.  Does
> "netstat -m" show that "requests for mbufs denied" increasing?
>

Here is the ifconfig -v output for the vlan6 on the 10.1-STABLE system

vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=303<RXCSUM,TXCSUM,TSO4,TSO6>
ether a0:36:9f:2a:6d:ae
inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
inet6 2804:1054:bad:b1fe::1 prefixlen 64
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
vlan: 3005 parent interface: ix3
groups: vlan

And here it is on the 10.3-STABLE system. I don't know why, but the only
difference is that no options line is printed on the newer system; everything
else is the same.

vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether a0:36:9f:2a:6d:ae
inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
inet6 2804:1054:bad:b1fe::1 prefixlen 64
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
vlan: 3005 parent interface: ix3
groups: vlan
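
The comparison between the two listings can be checked mechanically. The following is a minimal Python sketch, assuming the two ifconfig -v listings are exactly as pasted above; the variable names are illustrative only.

```python
import difflib

# ifconfig -v output for vlan6 on the 10.1-STABLE system, as pasted above.
old = """\
vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=303<RXCSUM,TXCSUM,TSO4,TSO6>
ether a0:36:9f:2a:6d:ae
inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
inet6 2804:1054:bad:b1fe::1 prefixlen 64
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
vlan: 3005 parent interface: ix3
groups: vlan
""".splitlines()

# The same output on the 10.3-STABLE system, as pasted above.
new = """\
vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether a0:36:9f:2a:6d:ae
inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
inet6 2804:1054:bad:b1fe::1 prefixlen 64
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
vlan: 3005 parent interface: ix3
groups: vlan
""".splitlines()

# Keep only the lines that were removed ("- ") or added ("+ ").
changed = [line for line in difflib.ndiff(old, new)
           if line.startswith(("- ", "+ "))]

for line in changed:
    print(line)
```

With these two listings, the only changed line is the capability line (RXCSUM, TXCSUM, TSO4, TSO6) that is present on 10.1-STABLE and absent on 10.3-STABLE.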

This is the netstat -m output while the system has packet loss. The denied
and delayed counters are all zero.

 % netstat -m
12365/21040/33405 mbufs in use (current/cache/total)
12310/14530/26840/505076 mbuf clusters in use (current/cache/total/max)
12310/14508 mbuf+clusters out of packet secondary zone in use (current/cache)
0/225/225/252538 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/74826 9k jumbo clusters in use (current/cache/total/max)
0/0/0/42089 16k jumbo clusters in use (current/cache/total/max)
27711K/35220K/62931K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
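
To watch these counters over time rather than reading them by eye, the denied/delayed lines can be extracted with a small parser. This is a minimal Python sketch, assuming the netstat -m line layout shown above; the function name denied_delayed is hypothetical, and the sample text is copied from the output above.

```python
import re

def denied_delayed(netstat_m_text):
    """Return {(resource, kind): [counts]} for 'requests for ... denied/delayed' lines."""
    counters = {}
    for line in netstat_m_text.splitlines():
        m = re.match(r"\s*([\d/]+) requests for (.+?) (denied|delayed)\b", line)
        if m:
            values = [int(v) for v in m.group(1).split("/")]
            counters[(m.group(2), m.group(3))] = values
    return counters

# Counter lines copied from the netstat -m output above.
sample = """\
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
"""

counters = denied_delayed(sample)
# True only if every denied/delayed counter is zero.
all_zero = all(v == 0 for values in counters.values() for v in values)
print(counters)
print("all zero:", all_zero)
```

Running the same function on successive snapshots and comparing the results would show whether any counter moves while the drops occur.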



>
> On Wed, Apr 27, 2016 at 2:41 PM, Zé Claudio Pastore <zclaudio@bsd.com.br>
> wrote:
>
>> Hello,
>>
>> On a BGP border router I help manage, we run FreeBSD 10.1-STABLE,
>> version r281235, and it has worked fine for several years now.
>>
>> We have around 4Gbit/s and 1.8Mpps routed at peak, while per port interface
>> we peak at 300Kpps.
>>
>> Our quality metrics are measured with:
>>
>> ping -s 1472 -i 0.1 <our-other-ibgp-router>
>>
>> As well as bidirectional iperf.
>>
>> This metric is similar to how the Speedy Test and SIMET tests are done,
>> which our customers use as a reference.
>>
>> Systems working w/o problem:
>> - 10.1-STABLE / r281235
>>
>> Systems tested with drops:
>> - 10.2-STABLE / r292035M
>> - 10.3-STABLE / r298705
>> - 11.0-CURRENT / r295683 (downloaded snapshot from ftp.freebsd.org)
>> - 11.0-CURRENT Melifaro Routing Branch / r297731M
>>
>> While testing, when errors happen I can see output errs on the vlan port
>> in the output from "netstat -w1 -I vlan6":
>>
>>            input          vlan6           output
>>    packets  errs idrops      bytes    packets  errs      bytes colls
>>          1     0     0         66      30557     2   33310968     0
>>          1     0     0        105      31458     3   33912219     0
>>          2     0     0       2954      32001     8   34983986     0
>>          1     0     0       1512      33150     6   35942558     0
>>          1     0     0       1512      33654     4   37311862     0
>>          1     0     0       1512      34825     3   38213793     0
>>          3     0     0       1683      35376     4   39488912     0
>>          5     0     0       7280      32423     3   35551869     0
>>
>> Problems may happen under high load (~200Kpps) or low load (~30Kpps) on a
>> vlan port. The observed frame loss never happens on untagged ports, only
>> on vlan ports. The loss happens with packets sized 900 bytes and above,
>> but the loss rate is noticeably higher with packets close to 1400 (1472
>> is my reference size).
>>
>> Loss rate on all listed systems other than r281235 is 9-19% with
>> ping(1) and iperf, while it's 0% on r281235.
>>
>> At first I believed it to be an Intel driver error on systems newer than
>> 10.1. My reference cards are dual port 82599EB 10-Gigabit SFI/SFP+ Network
>> Connection (2x2 on x8 PCIe bus, total 4x10G). But yesterday I replaced
>> the Intel cards with Chelsio T5 and the problem is still exactly the same,
>> so it's not related to the card vendor.
>>
>> I always test the very same hardware. I have two SSD drives in this
>> router: one for the 10.1 which just runs fine, and the other disk to test
>> the various versions of FreeBSD.
>>
>> Only minor loader and sysctl confs are tweaked:
>>
>> kern.hz=2000
>> net.inet.ip.redirect=1                # do not send IP redirects
>> net.inet.ip.accept_sourceroute=0      # drop source routed packets since they ca
>> net.inet.ip.sourceroute=0             # if source routed packets are accepted th
>> net.inet.tcp.drop_synfin=1            # SYN/FIN packets get dropped on initial c
>> net.inet.udp.blackhole=1              # drop udp packets destined for closed soc
>> net.inet.tcp.blackhole=2              # drop tcp packets destined for closed por
>> security.bsd.see_other_uids=0
>>
>> Can anyone suggest what might be a fix/tuning for this behavior? Was there
>> any relevant change in the vlan code between the revision I run on 10.1
>> and the later ones which would lead to such a big difference?
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>
>
>
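
For reference, the ratio of output errs to output packets in the quoted "netstat -w1 -I vlan6" sample can be tallied as follows. This is a minimal Python sketch; the rows are copied by hand from the quoted sample, and the column order is assumed to match the header shown there.

```python
# Rows copied from the quoted "netstat -w1 -I vlan6" sample.
# Assumed columns: in_packets in_errs idrops in_bytes out_packets out_errs out_bytes colls
sample = """\
1 0 0 66 30557 2 33310968 0
1 0 0 105 31458 3 33912219 0
2 0 0 2954 32001 8 34983986 0
1 0 0 1512 33150 6 35942558 0
1 0 0 1512 33654 4 37311862 0
1 0 0 1512 34825 3 38213793 0
3 0 0 1683 35376 4 39488912 0
5 0 0 7280 32423 3 35551869 0
"""

rows = [[int(field) for field in line.split()] for line in sample.splitlines()]

out_packets = sum(row[4] for row in rows)  # output packets over the 8 samples
out_errs = sum(row[5] for row in rows)     # output errs over the 8 samples

print(f"output errs / output packets: {out_errs}/{out_packets} "
      f"= {out_errs / out_packets:.4%}")
```

This only summarizes the interface counters in that one sample; it says nothing by itself about the loss rate seen by ping(1) or iperf.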


