Date:      Mon, 19 Aug 2013 15:28:46 +0400
From:      "Alexander V. Chernikov" <melifaro@ipfw.ru>
To:        Marko Zec <zec@fer.hr>
Cc:        freebsd-net@freebsd.org, Lev Serebryakov <lev@freebsd.org>, Luigi Rizzo <rizzo@iet.unipi.it>, FreeBSD Net <net@freebsd.org>, Lawrence Stewart <lstewart@freebsd.org>
Subject:   Re: route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux))
Message-ID:  <5212016E.8040609@ipfw.ru>
In-Reply-To: <201308141740.28779.zec@fer.hr>
References:  <520A6D07.5080106@freebsd.org> <520B74DD.1060102@ipfw.ru> <20130814124024.GA64548@onelab2.iet.unipi.it> <201308141740.28779.zec@fer.hr>

On 14.08.2013 19:40, Marko Zec wrote:
> On Wednesday 14 August 2013 14:40:24 Luigi Rizzo wrote:
>> On Wed, Aug 14, 2013 at 04:15:25PM +0400, Alexander V. Chernikov wrote:
>>> On 14.08.2013 16:05, Luigi Rizzo wrote:
>>>> On Wed, Aug 14, 2013 at 03:47:13PM +0400, Lev Serebryakov wrote:
>>>>> Hello, Luigi.
>>>>> You wrote on 14 August 2013, 14:21:09:
>>>>>
>>>>> LR> Then the problem remains that we should keep a copy of route and
>>>>> LR> arp information in the socket instead of redoing the lookups on
>>>>> LR> every single transmission, as they consume some 25% of the time
>>>>> LR> of a sendto(), and probably even more when it comes to large tcp
>>>>> LR> segments, sendfile() and the like.
>>>>>     And we should invalidate this info on ARP/route changes, or
>>>>> connection will be lost in such cases, am I right?.. So, on each
>>>>> such event code should look into all sockets and check, if
>>>>> routing/ARP information is still valid for them. Or we should store
>>>>> lists of sockets in routing and ARP tables... I don't know which is
>>>>> worse.
>>>> I think we should start by acknowledging that routing and ARP
>>>> information is inherently stale, and changes infrequently.
>>>> So it is not a disaster if we have incorrect information for some
>>>> short amount of time (milliseconds) because in the end the remote
>>>> party that decides to change it and inform us may take much longer
>>>> than that to distribute the update.
>>> You can save rte&arp; however, doing this gives you a perfect chance
>>> to crash your kernel if the egress interface is destroyed (like vlan
>>> or ng or tun).
>> I hope I learned not to follow a stale ifp pointer :)
>> anyways ARP is really just the mac address so there is no
>> dangling pointer issue.
>>
>> For the ifp associated to the route,
>> i do not see a huge problem in marking the route/ifp as
>> zombie and destroying it when the last reference goes away.
> FWIW, apparently we already have that infrastructure in place - if_rele()
> calls if_free_internal() only when the last reference to the ifnet is
> dropped, so with little care this should be usable for caching ifp pointers
> w/o fears for kernel crashes mentioned above.
There are several different approaches to protecting interface pointers,
such as delayed GC with a refcount.
However, the problem is a bit deeper: think of a virtual interface like
vlan that saves some state in the underlying physical interface. While we
still have a valid if_vlan structure, drivers like ixgbe (or lagg) may
already have destroyed that state, which can lead to crashes. I'm not
sure whether this can happen with the current code (I've observed some
strange crashes on 8.x), but this is definitely something we should keep
in mind.
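
To make the refcount idea concrete, here is a minimal userspace sketch in
C, not kernel code. All names (fake_ifnet, fake_if_alloc, fake_if_ref,
fake_if_rele, fake_if_detach, fake_if_usable) are hypothetical and are
not the real ifnet API. The counter is a plain int, so the sketch is
single-threaded only. It assumes one possible answer to the vlan case
above: the child takes a reference on its parent, so the parent's memory
outlives the child. It does not cover a driver that tears down its own
per-vlan state early, which is the harder part.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interface; NOT the real struct ifnet. */
struct fake_ifnet {
    int refcount;               /* owner reference plus cached references */
    bool dying;                 /* set on detach; cached users must stop using it */
    struct fake_ifnet *parent;  /* e.g. the physical port under a vlan */
};

int fake_freed_count;           /* counts frees, for illustration only */

/* Allocate an interface; a child takes one reference on its parent. */
struct fake_ifnet *fake_if_alloc(struct fake_ifnet *parent)
{
    struct fake_ifnet *ifp = calloc(1, sizeof(*ifp));
    if (ifp == NULL)
        return NULL;
    ifp->refcount = 1;          /* the owner's reference */
    if (parent != NULL) {
        parent->refcount++;
        ifp->parent = parent;
    }
    return ifp;
}

/* Take an extra reference, e.g. for a pointer cached in a socket. */
void fake_if_ref(struct fake_ifnet *ifp)
{
    ifp->refcount++;
}

/* Drop a reference; free on the last one, then release the parent too. */
void fake_if_rele(struct fake_ifnet *ifp)
{
    while (ifp != NULL && --ifp->refcount == 0) {
        struct fake_ifnet *parent = ifp->parent;
        free(ifp);
        fake_freed_count++;
        ifp = parent;           /* drop the reference the child held */
    }
}

/* Mark the interface as a zombie and drop the owner's reference. */
void fake_if_detach(struct fake_ifnet *ifp)
{
    ifp->dying = true;
    fake_if_rele(ifp);
}

/* A cached pointer may only be used for output while this is true. */
bool fake_if_usable(const struct fake_ifnet *ifp)
{
    return !ifp->dying;
}
```

With this shape, a socket that cached the vlan pointer via fake_if_ref()
still points at valid memory after both interfaces are detached; it sees
dying set, drops its reference, and only then are the vlan and its parent
freed.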
>
> Marko
>
>> Not that the current way is any better -- you need to lock/unlock
>> the rte while you do the lookup, and hold a refcount to the ifp
>> until the packet is queued. So how does my suggestion make
>> things worse ?
>>
>> cheers
>> luigi
>>
>>>> Considering that each lookup takes between 100..300ns if you are
>>>> lucky (not many misses, relatively empty table etc.), one could
>>>> reasonably do the lookup at most once per millisecond or so (just
>>>> reading 'ticks', no need for a nanotime() if you have a slow clock),
>>>> or whenever we get an error related to the socket, either in the
>>>> forward path (e.g. ifp points to an interface that is down) or in
>>>> the reverse path (e.g. a dupack because we sent a packet to the
>>>> wrong place).
>>> This sounds like "Hey, the kernel lookup is slow (which is true), let's
>>> make a hack and not bother with lookups".
>>> This approach gives us mtx-locked rte refcounts, which are used
>>> (misused) in many places, making things worse and decreasing our ability
>>> to fix things.
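
For reference, the once-per-tick revalidation proposed in the quoted text
above could look roughly like the following userspace sketch in C. This
is an illustration under assumptions, not the stack's actual code:
route_cache, fake_route, cache_get, cache_invalidate and fake_slow_lookup
are all hypothetical names, the "slow lookup" is a stub that only counts
calls, and now_ticks stands in for reading a coarse clock such as
'ticks'. There is no locking, so it is single-threaded only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result of a route + ARP lookup. */
struct fake_route {
    int ifindex;          /* egress interface index */
    uint8_t dst_mac[6];   /* next-hop link-layer address */
};

/* Hypothetical per-socket cache of the last lookup. */
struct route_cache {
    struct fake_route rt;
    uint32_t dst;         /* destination the entry was resolved for */
    uint32_t stamp;       /* tick count at the last lookup */
    bool valid;
};

typedef bool (*lookup_fn)(uint32_t dst, struct fake_route *out);

int fake_lookup_calls;    /* counts slow lookups, for illustration only */

/* Stub standing in for the expensive routing-table and ARP lookup. */
bool fake_slow_lookup(uint32_t dst, struct fake_route *out)
{
    fake_lookup_calls++;
    memset(out, 0, sizeof(*out));
    out->ifindex = (int)(dst & 0xff);
    return true;
}

/*
 * Return the cached entry, redoing the slow lookup only if the entry is
 * missing, was resolved for another destination, or is at least max_age
 * ticks old. Unsigned subtraction keeps the age check correct when the
 * tick counter wraps.
 */
const struct fake_route *cache_get(struct route_cache *c, uint32_t dst,
                                   uint32_t now_ticks, uint32_t max_age,
                                   lookup_fn slow_lookup)
{
    if (!c->valid || c->dst != dst ||
        (uint32_t)(now_ticks - c->stamp) >= max_age) {
        c->valid = slow_lookup(dst, &c->rt);
        c->dst = dst;
        c->stamp = now_ticks;
    }
    return c->valid ? &c->rt : NULL;
}

/* Force a fresh lookup on the next send, e.g. after a transmit error. */
void cache_invalidate(struct route_cache *c)
{
    c->valid = false;
}
```

A caller would invoke cache_get() on every send and cache_invalidate()
whenever an error related to the socket is seen, so staleness is bounded
by max_age ticks. Note that this only caches the lookup result; it does
nothing by itself about the lifetime of the interface behind ifindex.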
>>>
>>>> cheers
>>>> luigi
>>>> _______________________________________________
>>>> freebsd-net@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>> To unsubscribe, send any mail to
>>>> "freebsd-net-unsubscribe@freebsd.org"
>
>



