Date:      Mon, 3 Aug 2020 17:22:37 -0400
From:      Joe Clarke <jclarke@marcuscom.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Traffic "corruption" in 12-stable
Message-ID:  <3F5D4874-C8D6-4D77-AE9F-D5EAB750DDB4@marcuscom.com>
In-Reply-To: <2F974A4E-95B3-4C65-A5F8-6FBBB575B756@marcuscom.com>
References:  <9FAE54DE-F409-4A53-B91E-59AE52A86513@marcuscom.com> <20200727190147.GC59953@raichu> <2F974A4E-95B3-4C65-A5F8-6FBBB575B756@marcuscom.com>

> On Jul 27, 2020, at 15:41, Joe Clarke <jclarke@marcuscom.com> wrote:
>
>> On Jul 27, 2020, at 15:01, Mark Johnston <markj@freebsd.org> wrote:
>>
>> On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
>>> About two weeks ago, I upgraded from the latest 11-stable to the
>>> latest 12-stable.  After that, I periodically see the network
>>> throughput come to a near standstill.  This FreeBSD machine is an
>>> ESXi VM with two interfaces.  It acts as a router.  It uses vmxnet3
>>> interfaces for both LAN and WAN.  It runs ipfw with in-kernel NAT.
>>> The LAN side uses a bridge with vmx0 and a tap0 L2 VPN interface.
>>> My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses the
>>> default 1500.
>>>
>>> Besides seeing massive packet loss and huge latency (~ 200 ms for
>>> on-LAN ping times), I know the problem has occurred because my lldpd
>>> reports:
>>>
>>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on bridge0
>>>
>>> And if I turn on ipfw verbose messages, I see tons of:
>>>
>>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>>>
>>> This leads me to believe packets are being corrupted on ingress.
>>> I’ve applied all the recent iflib changes, but the problem persists.
>>> What causes it, I don’t know.
>>>
>>> The only thing that changed (and yes, it’s a big one) is I upgraded
>>> to 12-stable.  Meaning, the rest of the network infra and topology
>>> has remained the same.  This did not happen at all in 11-stable.
>>>
>>> I’m open to suggestions.
>>
>> There are some fixes for vmx not present in stable/12 (yet).  I did a
>> merge of a number of outstanding revisions.  Would you be able to test
>> the patch?  I haven't observed any problems with it on a host using igb,
>> but I have no ability to test vmx at the moment.
>
> I’m down to test anything.  I did notice quite a few vmxnet3 changes
> around performance that appealed to me.  I tried a few of them on my
> last kernel.  That took much longer to exhibit the problem, but
> eventually did.
>
> I can tell you I don’t have all of these patches in, though.  I’ll
> build with this diff and start running it now.  I’ll let you know how
> it goes.

So it’s been just over a week of runtime with this full patch set.  I
have seen no further issues with ingress packet “truncation”, and
performance has been what I expect.  I’m going to keep running, but I
think this seems like a good set to MFC.

Thanks again for your help.

Joe


---
PGP Key : http://www.marcuscom.com/pgp.asc






