Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 19 Mar 2017 19:46:49 -0500
From:      "Mike Karels" <mike@karels.net>
To:        "Andrey V. Elsukov" <bu7cher@yandex.ru>
Cc:        freebsd-net@FreeBSD.org, "Alexander V. Chernikov" <melifaro@freebsd.org>,  "Eugene Grosbein" <eugen@freebsd.org>, karels@FreeBSD.org, "George Neville-Neil" <gnn@freebsd.org>
Subject:   Re: LLE reference leak in the L2 cache
Message-ID:  <FC62E82D-4019-4FF8-8EAD-87CC4A369755@karels.net>
In-Reply-To: <70D2287B-664C-48E4-9E8B-68B574BE6CE6@karels.net>
References:  <201703140840.v2E8ecH2040827@mail.karels.net> <3a4c5d87-d42e-5615-5d2b-2a8801376600@yandex.ru> <70D2287B-664C-48E4-9E8B-68B574BE6CE6@karels.net>

next in thread | previous in thread | raw e-mail | index | archive | help
The context has gotten messy here, so I=E2=80=99m going to break down and=
=20
top-post.

I started review https://reviews.freebsd.org/D10059 with a fix for the=20
reference-count leak.
It changes the semantics so only routes within an in_pcb automatically=20
do L2 caching.

I=E2=80=99ll put the tcp_output change for V6 in a separate review when t=
his=20
one is done.

Andrey, could you try your iperf test again? Thanks,

		Mike

On 14 Mar 2017, at 21:39, Mike Karels wrote:

> On 14 Mar 2017, at 3:50, Andrey V. Elsukov wrote:
>
>> On 14.03.2017 11:40, Mike Karels wrote:
>>>> Hi All,
>>>
>>>> Eugene has reported about the following assertion in the ARP code:
>>>> 	http://www.grosbein.net/freebsd/crash/arp-kassert.txt
>>>
>>>> After some investigation I found that L2 cache has reference leak,=20
>>>> that
>>>> can lead to integer overflow and this assertion.
>>>> The one of the ways to reproduce this overflow can be demonstrated=20
>>>> with
>>>> simple IP forwarding, when ip_forward() is used (not=20
>>>> ip_tryforward).
>>>
>>>> I asked olivier@ to reproduce this leak and he got this result:
>>>> 	http://slexy.org/view/s21ql7nA0q
>>>
>>>> After further investigation I found similar leak in the IPv6 TCP=20
>>>> path.
>>>> Simple iperf test shows these results:
>>>
>>>> # dtrace -n 'fbt::in6_lltable_dump_entry:entry {printf("%d",
>>>> args[1]->lle_refcnt);}'
>>>> dtrace: description 'fbt::in6_lltable_dump_entry:entry ' matched 1=20
>>>> probe
>>>> CPU     ID                    FUNCTION:NAME
>>>>  51  18589     in6_lltable_dump_entry:entry 55721
>>>>  51  18589     in6_lltable_dump_entry:entry 1
>>>>  51  18589     in6_lltable_dump_entry:entry 1
>>>>  51  18589     in6_lltable_dump_entry:entry 2
>>>>  38  18589     in6_lltable_dump_entry:entry 111417
>>>>  38  18589     in6_lltable_dump_entry:entry 1
>>>>  38  18589     in6_lltable_dump_entry:entry 1
>>>
>>>> --
>>>> WBR, Andrey V. Elsukov
>>>
>>> Thanks!  Could you try the following patch (compiles, but untested):
>>>
>>> Index: netinet/ip_input.c
>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>> --- netinet/ip_input.c	(revision 315160)
>>> +++ netinet/ip_input.c	(working copy)
>>> @@ -60,6 +60,7 @@
>>>  #include <net/if_types.h>
>>>  #include <net/if_var.h>
>>>  #include <net/if_dl.h>
>>> +#include <net/if_llatbl.h>
>>>  #include <net/route.h>
>>>  #include <net/netisr.h>
>>>  #include <net/rss_config.h>
>>> @@ -1066,6 +1067,8 @@
>>>  	if (error =3D=3D EMSGSIZE && ro.ro_rt)
>>>  		mtu =3D ro.ro_rt->rt_mtu;
>>>  	RO_RTFREE(&ro);
>>> +	if (ro.ro_lle)
>>> +		LLE_FREE(ro.ro_lle);
>>>
>>>  	if (error)
>>>  		IPSTAT_INC(ips_cantforward);
>>
>> I think it would be better to set RT_LLE_CACHE flag only for=20
>> protocols
>> that expect presence of L2 cache. I.e. only for the TCP and UDP and=20
>> do
>> it in the corresponding protocol output routine, not in the=20
>> ip[6]_output.
>
> Hmm, let me think about that.  TCP and UDP know nothing about L2=20
> cache,
> they just use the in_pcb cache which handles it.  L3 caching was=20
> removed
> earlier by someone who thought of it as a layering violation, which is=20
> why
> I tried to keep in the IP layer for the most part.  I can probably=20
> find a
> way to encapsulate it.
>
>>
>>> Index: netinet6/ip6_forward.c
>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>> --- netinet6/ip6_forward.c	(revision 315160)
>>> +++ netinet6/ip6_forward.c	(working copy)
>>> @@ -52,6 +52,7 @@
>>>  #include <net/if.h>
>>>  #include <net/if_var.h>
>>>  #include <net/netisr.h>
>>> +#include <net/if_llatbl.h>
>>>  #include <net/route.h>
>>>  #include <net/pfil.h>
>>>
>>> @@ -431,4 +432,6 @@
>>>  out:
>>>  	if (rt !=3D NULL)
>>>  		RTFREE(rt);
>>> +	if (rin6.ro_lle)
>>> +		LLE_FREE(rin6.ro_lle);
>>>  }
>>
>> I don't think this chunk will help. ip6_forward() doesn't use
>> ip6_output(). And IPv6 forwarding is not affected by this problem.=20
>> Look
>> at the tcp_output(), it uses local route variable for IPv6 output.
>
> Ah, right, I obviously didn=E2=80=99t read closely enough earlier.  I=E2=
=80=99ve=20
> attached
> a patch that should fix this, as well as adding route caching for=20
> TCP/IPv6.
>
>> I'm not sure, but probably SCTP also can be affected by this problem.
>
> Probably true.  SCTP could probably benefit from L2 caching, but this=20
> also
> argues for making this more transparent.
>
> 		Mike





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?FC62E82D-4019-4FF8-8EAD-87CC4A369755>