Date:      Mon, 19 Aug 2013 15:18:36 +0400
From:      "Alexander V. Chernikov" <melifaro@yandex-team.ru>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        Lawrence Stewart <lstewart@freebsd.org>, Lev Serebryakov <lev@FreeBSD.org>, FreeBSD Net <net@freebsd.org>
Subject:   Re: route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux))
Message-ID:  <5211FF0C.6020104@yandex-team.ru>
In-Reply-To: <20130814140013.GA65049@onelab2.iet.unipi.it>
References:  <520A6D07.5080106@freebsd.org> <520AFBE8.1090109@freebsd.org> <520B24A0.4000706@freebsd.org> <520B3056.1000804@freebsd.org> <20130814102109.GA63246@onelab2.iet.unipi.it> <587579055.20130814154713@serebryakov.spb.ru> <20130814120551.GA64260@onelab2.iet.unipi.it> <520B74DD.1060102@ipfw.ru> <20130814124024.GA64548@onelab2.iet.unipi.it> <520B7F91.2080209@ipfw.ru> <20130814140013.GA65049@onelab2.iet.unipi.it>

On 14.08.2013 18:00, Luigi Rizzo wrote:
> On Wed, Aug 14, 2013 at 05:01:05PM +0400, Alexander V. Chernikov wrote:
>> On 14.08.2013 16:40, Luigi Rizzo wrote:
> ...
>>>> You can save rte & arp; however, doing this
>>>> gives you a perfect chance to crash your kernel if the egress interface
>>>> is destroyed (like vlan or ng or tun).
>>> I hope I learned not to follow a stale ifp pointer :)
>> Well, currently we have no locks (or other means) to ensure all other
>> cores have a "current" pointer to ifp or its fields (or am I wrong?)
> This I don't know -- but in any case, we should fix the race anyway
> (another timescale, but still dangerous).
>
>>> Anyway, ARP is really just the MAC address, so there is no
>>> dangling pointer issue.
>>>
>>> For the ifp associated to the route,
>>> I do not see a huge problem in marking the route/ifp as a
>>> zombie and destroying it when the last reference goes away.
>> Yes, but references require some synchronization primitives. One
> Again, we should protect against ifp destruction anyways.  Surely
> we should try and make the protection mechanism cheap (in my proposal,
> going through the refcount once per millisecond instead of every
Sorry, I still don't follow. Are we talking about egress interface
refcounts?
Where are the refcounts incremented/decremented?

Btw, interface destruction is currently usually synchronous, so waiting
1 ms would effectively reduce the interface creation/destruction rate in
BRAS scenarios (mpd with ng*, the Juniper case...)
> single packet; there might be better ways, and I am all ears on
> that); surely, we cannot dismiss something because "we run without
> seatbelts now so anything else is more expensive".
>
> We had a related discussion regarding races in interfaces between
> the datapath (if_transmit() and *_rxeof() ) and the control path
> (ioctls, watchdog etc.).
>
> The reason I am raising this issue is because I want to fix the
> races that emerged when we moved to SMP, not because I want to "make
> hacks" and cut corners in unsafe ways.
That's great! I want this fixed, too :)
>
> cheers
> luigi
>
>> possible solution is using pcpu counters, but it does not play well on
>> !amd64.
>>> Not that the current way is any better -- you need to lock/unlock
>>> the rte while you do the lookup, and hold a refcount to the ifp
>>> until the packet is queued. So how does my suggestion make
>>> things worse?
>>>
>>> cheers
>>> luigi
>>>
>>>
>>>>> Considering that each lookup takes between 100..300ns if you are
>>>>> lucky (not many misses, relatively empty table etc.), one could
>>>>> reasonably do the lookup at most once per millisecond or so (just
>>>>> reading 'ticks', no need for a nanotime() if you have a slow clock),
>>>>> or whenever we get an error related to the socket, either in the
>>>>> forward path (e.g. ifp points to an interface that is down) or in
>>>>> the reverse path (e.g. a dupack because we sent a packet to the
>>>>> wrong place).
>>>> This sounds like "Hey, the kernel lookup is slow (which is true), let's
>>>> make a hack and not bother with lookups".
>>>> This approach gives us mtx-locked rte refcounts which are used (misused)
>>>> in many places, making things worse and decreasing our ability to fix
>>>> things up.
>>>>> cheers
>>>>> luigi
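
For readers following the thread, the "look up at most once per millisecond, or whenever an error is seen" idea quoted above might be sketched roughly as below. This is a user-space illustration only, not kernel code. Every name in it (struct nexthop, struct nh_cache, slow_lookup(), nh_cache_get(), nh_cache_invalidate()) is hypothetical and not a FreeBSD API. It assumes a per-connection cache with a fixed destination, and a tick counter that advances once per millisecond (hz = 1000). It stores an interface index rather than an ifp pointer, so it says nothing about how to protect against interface destruction, which is the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached result of a route + ARP lookup. */
struct nexthop {
	int	ifindex;	/* egress interface index, not a pointer */
	uint8_t	lladdr[6];	/* link-layer (MAC) address */
};

/* Hypothetical per-connection cache; destination is assumed fixed. */
struct nh_cache {
	struct nexthop	nh;
	int		last_tick;	/* tick value at the last refresh */
	bool		valid;
	unsigned	lookups;	/* slow-path lookups done so far */
};

/* Stub standing in for the expensive route/ARP lookup. */
static int
slow_lookup(uint32_t dst, struct nexthop *out)
{
	out->ifindex = 1;
	for (int i = 0; i < 6; i++)
		out->lladdr[i] = (uint8_t)(dst >> (8 * (i % 4)));
	return (0);
}

/*
 * Return the cached next hop, refreshing it when the cache is invalid
 * or the tick counter has moved since the last refresh.  Comparing for
 * inequality keeps this correct across counter wraparound.
 */
static const struct nexthop *
nh_cache_get(struct nh_cache *c, uint32_t dst, int now_ticks)
{
	assert(c != NULL);
	if (!c->valid || now_ticks != c->last_tick) {
		if (slow_lookup(dst, &c->nh) != 0) {
			c->valid = false;
			return (NULL);
		}
		c->last_tick = now_ticks;
		c->valid = true;
		c->lookups++;
	}
	return (&c->nh);
}

/* Called on an error hint (interface down, dupack, ...): force a refresh. */
static void
nh_cache_invalidate(struct nh_cache *c)
{
	assert(c != NULL);
	c->valid = false;
}
```

With this shape, any number of packets sent within the same tick share one lookup, and an error on either path simply invalidates the cache so the next packet takes the slow path again.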



