From: "Alexander V. Chernikov"
Date: Mon, 19 Aug 2013 15:18:36 +0400
To: Luigi Rizzo
Cc: Lawrence Stewart, Lev Serebryakov, FreeBSD Net
Subject: Re: route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux))
List-Id: Networking and TCP/IP with FreeBSD

On 14.08.2013 18:00, Luigi Rizzo wrote:
> On Wed, Aug 14, 2013 at 05:01:05PM +0400, Alexander V. Chernikov wrote:
>> On 14.08.2013 16:40, Luigi Rizzo wrote:
> ...
>>>> You can save rte&arp, however doing this
>>>> gives you a perfect chance to crash your kernel if the egress interface is
>>>> destroyed (like vlan or ng or tun).
>>> I hope I learned not to follow a stale ifp pointer :)
>> Well, currently we have no locks (or other means) to ensure all other
>> cores have the "current" pointer to ifp or its fields (or am I wrong?)
> This i don't know -- but in case, we should fix the race anyways
> (another timescale, but still dangerous).
>
>>> anyways ARP is really just the mac address so there is no
>>> dangling pointer issue.
>>>
>>> For the ifp associated to the route,
>>> i do not see a huge problem in marking the route/ifp as
>>> zombie and destroying it when the last reference goes away.
>> Yes, but references require some synchronization primitives. One
> Again, we should protect against ifp destruction anyways. Surely
> we should try and make the protection mechanism cheap (in my proposal,
> going through the refcount once per millisecond instead of every

Sorry, I still can't get this. Are we talking about egress interface
refcounts? Where are the refcounts incremented/decremented?

Btw, interface destruction is currently usually synchronous, so a 1ms wait
can effectively reduce the interface creation/destruction rate in BRAS
scenarios (mpd with ng*, Juniper case..)

> single packet; there might be better ways, and i am all ears on
> that); surely, we cannot dismiss something because "we run without
> seatbelts now so anything else is more expensive".
>
> We had a related discussion regarding races in interfaces between
> the datapath (if_transmit() and *_rxeof() ) and the control path
> (ioctls, watchdog etc.).
>
> The reason I am raising this issue is because i want to fix the
> races that emerged when we moved to SMP, not because I want to "make
> hacks" and cut corners in unsafe ways.

That's great! I want this fixed, too :)

>
> cheers
> luigi
>
>> possible solution is using pcpu counters, but it does not play well on
>> !amd64.
>>> Not that the current way is any better -- you need to lock/unlock
>>> the rte while you do the lookup, and hold a refcount to the ifp
>>> until the packet is queued. So how does my suggestion make
>>> things worse?
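To make the "mark it as a zombie and free it when the last reference goes away" idea concrete, here is a minimal user-space sketch using C11 atomics. Everything in it (struct toy_ifp, toy_ifp_ref(), toy_ifp_rele(), toy_ifp_destroy(), toy_ifp_usable()) is hypothetical and is not FreeBSD's struct ifnet API. It also assumes a caller only takes a new reference while already holding a valid pointer (for example under the interface-list lock), which is exactly the synchronization question raised above.

```c
/*
 * Hypothetical sketch: deferred destruction of an interface-like object.
 * NOT the FreeBSD struct ifnet API; all names are illustrative.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_ifp {
    atomic_uint refcnt;   /* one reference owned by the "interface list", plus one per user */
    atomic_bool zombie;   /* set once the interface is being torn down */
};

/* Single-threaded demo counter: how many objects were actually freed. */
static unsigned toy_ifp_frees;

static struct toy_ifp *toy_ifp_create(void)
{
    struct toy_ifp *ifp = malloc(sizeof(*ifp));
    if (ifp == NULL)
        return NULL;
    atomic_init(&ifp->refcnt, 1);        /* the list's reference */
    atomic_init(&ifp->zombie, false);
    return ifp;
}

/* Assumption: caller already holds a valid pointer (e.g. under the list lock). */
static void toy_ifp_ref(struct toy_ifp *ifp)
{
    atomic_fetch_add(&ifp->refcnt, 1);
}

/* Drop a reference; the holder of the last one performs the real free. */
static void toy_ifp_rele(struct toy_ifp *ifp)
{
    unsigned old = atomic_fetch_sub(&ifp->refcnt, 1);
    assert(old > 0);                     /* catches an unbalanced release */
    if (old == 1) {
        free(ifp);
        toy_ifp_frees++;
    }
}

/* Control path: mark as zombie, then drop the list's reference. */
static void toy_ifp_destroy(struct toy_ifp *ifp)
{
    atomic_store(&ifp->zombie, true);
    toy_ifp_rele(ifp);
}

/* Datapath: a cached holder checks the flag and lets go if it is set. */
static bool toy_ifp_usable(struct toy_ifp **cached)
{
    if (*cached != NULL && atomic_load(&(*cached)->zombie)) {
        toy_ifp_rele(*cached);
        *cached = NULL;                  /* caller must redo the lookup */
    }
    return *cached != NULL;
}
```

Destroying the interface only flips the flag and drops the list's reference; the memory goes away when the last holder releases. The reference keeps the memory valid, not the interface usable: a holder can still hand one packet to a dying interface between the flag check and the transmit. The shared atomic counter is the per-packet cost under discussion, which is why touching it once per revalidation rather than once per packet matters.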
>>>
>>> cheers
>>> luigi
>>>
>>>
>>>>> Considering that each lookup takes between 100..300ns if you are
>>>>> lucky (not many misses, relatively empty table etc.), one could
>>>>> reasonably do the lookup at most once per millisecond or so (just
>>>>> reading 'ticks', no need for a nanotime() if you have a slow clock),
>>>>> or whenever we get an error related to the socket, either in the
>>>>> forward path (e.g. ifp points to an interface that is down) or in
>>>>> the reverse path (e.g. a dupack because we sent a packet to the
>>>>> wrong place).
>>>> This sounds like "Hey, the kernel lookup is slow (which is true), let's
>>>> make a hack and don't bother lookups".
>>>> This approach gives us mtx-locked rte refcounts which are used (misused)
>>>> in many places making things worse and decreasing the ability to fix the
>>>> things up..
>>>>> cheers
>>>>> luigi
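The "look up at most once per millisecond, or right after an error" scheme can be sketched as a small per-connection cache. This is an illustration under assumptions, not kernel code: the tick counter is passed in as `now` instead of reading the kernel's `ticks`, REVALIDATE_TICKS = 1 assumes hz = 1000, and toy_fake_lookup() is a hypothetical stand-in for the real route + ARP lookup. The next hop is kept as an interface index plus a copied MAC address, so the cache itself holds no pointer that can dangle; turning that index back into an ifp safely is still the open question in this thread.

```c
/*
 * Hypothetical per-connection next-hop cache, revalidated at most once per
 * REVALIDATE_TICKS or immediately after invalidation. Illustrative names only.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define REVALIDATE_TICKS 1u   /* assumes hz = 1000, i.e. one tick per millisecond */

struct toy_nexthop {
    unsigned ifindex;         /* egress interface by index, not by pointer */
    uint8_t mac[6];           /* ARP result copied by value */
};

typedef bool (*toy_lookup_fn)(uint32_t dst, struct toy_nexthop *out);

struct toy_route_cache {
    uint32_t dst;
    struct toy_nexthop nh;
    uint32_t last_lookup;     /* tick of the last full lookup */
    bool valid;
    unsigned lookups;         /* number of full lookups done (for illustration) */
};

/* Hypothetical stand-in for the expensive route + ARP lookup. */
static bool toy_fake_lookup(uint32_t dst, struct toy_nexthop *out)
{
    static const uint8_t mac[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };
    out->ifindex = 1 + (dst & 0xff) % 4;
    memcpy(out->mac, mac, sizeof(mac));
    return true;
}

/* Force a fresh lookup, e.g. after an output error or a suspicious dupack. */
static void toy_route_cache_invalidate(struct toy_route_cache *rc)
{
    rc->valid = false;
}

/* Return the cached next hop, redoing the full lookup only when needed. */
static bool toy_route_cache_get(struct toy_route_cache *rc, uint32_t now,
    toy_lookup_fn lookup, struct toy_nexthop *out)
{
    assert(rc != NULL && out != NULL);
    /* Unsigned subtraction stays correct across tick-counter wraparound. */
    if (!rc->valid || (uint32_t)(now - rc->last_lookup) >= REVALIDATE_TICKS) {
        rc->lookups++;
        rc->valid = lookup(rc->dst, &rc->nh);
        rc->last_lookup = now;
        if (!rc->valid)
            return false;
    }
    *out = rc->nh;
    return true;
}
```

toy_route_cache_invalidate() is where a caller would react to the error cases listed above (interface down on the forward path, a dupack on the reverse path); everything else reuses the cached copy until the next tick.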