Message-ID: <520B74DD.1060102@ipfw.ru>
Date: Wed, 14 Aug 2013 16:15:25 +0400
From: "Alexander V. Chernikov"
To: Luigi Rizzo
Cc: Lawrence Stewart, Lev Serebryakov, FreeBSD Net
Subject: Re: route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux))
In-Reply-To: <20130814120551.GA64260@onelab2.iet.unipi.it>

On 14.08.2013 16:05, Luigi Rizzo wrote:
> On Wed, Aug 14, 2013 at 03:47:13PM +0400, Lev Serebryakov wrote:
>> Hello, Luigi.
>> You wrote 14 August 2013, 14:21:09:
>>
>> LR> Then the problem remains that we should keep a copy of route and
>> LR> arp information in the socket instead of redoing the lookups on
>> LR> every single transmission, as they consume some 25% of the time of
>> LR> a sendto(), and probably even more when it comes to large tcp
>> LR> segments, sendfile() and the like.
>> And we should invalidate this info on ARP/route changes, or the
>> connection will be lost in such cases, am I right? So on each such
>> event the code should look into all sockets and check whether the
>> routing/ARP information is still valid for them. Or we should store
>> lists of sockets in the routing and ARP tables... I don't know which
>> is worse.
> I think we should start by acknowledging that routing and ARP
> information is inherently stale, and changes infrequently.
> So it is not a disaster if we have incorrect information for some
> short amount of time (milliseconds) because in the end the remote
> party that decides to change it and inform us may take much longer
> than that to distribute the update.

You can save the rte and ARP entries, but doing so gives you a perfect
chance to crash your kernel if the egress interface is destroyed (for
example a vlan, ng or tun interface).

>
> Considering that each lookup takes between 100..300ns if you are
> lucky (not many misses, relatively empty table etc.), one could
> reasonably do the lookup at most once per millisecond or so (just
> reading 'ticks', no need for a nanotime() if you have a slow clock),
> or whenever we get an error related to the socket, either in the
> forward path (e.g. ifp points to an interface that is down) or in
> the reverse path (e.g. a dupack because we sent a packet to the
> wrong place).

This sounds like "Hey, the kernel lookup is slow (which is true), so
let's add a hack and not bother with lookups".
This approach gives us mtx-locked rte refcounts, which are used
(misused) in many places, making things worse and making it harder to
fix things properly.

>
> cheers
> luigi
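
For readers following the thread, here is a minimal userspace sketch of the scheme Luigi describes: cache the lookup result per socket and redo the lookup at most once per tick, or after an error invalidates the cache. Everything in it is hypothetical and for illustration only: `struct nexthop`, `struct nh_cache`, `nh_cache_get()`, `nh_cache_invalidate()` and `demo_lookup()` are not FreeBSD APIs, and the `now` argument merely stands in for the kernel's `ticks` counter (one tick is about a millisecond only if hz is 1000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the combined result of a route + ARP lookup. */
struct nexthop {
	int     ifindex;    /* egress interface index (a copy, not a pointer) */
	uint8_t lladdr[6];  /* link-layer destination address */
};

/* Hypothetical per-socket cache; not a real FreeBSD structure. */
struct nh_cache {
	struct nexthop nh;
	uint32_t       dst;    /* destination the cached entry belongs to */
	int            stamp;  /* tick counter value at the last full lookup */
	bool           valid;  /* false until first lookup or after an error */
};

/* The slow path: a full route + ARP lookup. Returns false on failure. */
typedef bool (*lookup_fn)(uint32_t dst, struct nexthop *out);

/*
 * Return the cached next hop. The full lookup is redone only if the cache
 * is empty or invalidated, the destination changed, or the tick counter
 * has moved since the last lookup (so at most once per tick).
 */
static const struct nexthop *
nh_cache_get(struct nh_cache *c, uint32_t dst, int now, lookup_fn slow_lookup)
{
	assert(c != NULL && slow_lookup != NULL);
	if (!c->valid || c->dst != dst || c->stamp != now) {
		if (!slow_lookup(dst, &c->nh)) {
			c->valid = false;
			return NULL;
		}
		c->dst = dst;
		c->stamp = now;
		c->valid = true;
	}
	return &c->nh;
}

/* Call on a send error or a reverse-path hint such as a dupack. */
static void
nh_cache_invalidate(struct nh_cache *c)
{
	c->valid = false;
}

/* Demo stub for the slow path; it only counts how often it is called. */
static int lookup_calls;

static bool
demo_lookup(uint32_t dst, struct nexthop *out)
{
	(void)dst;
	lookup_calls++;
	out->ifindex = 1;
	return true;
}
```

The sketch copies an interface index by value instead of holding a pointer. A kernel version that cached pointers to route or interface objects would still have to solve the interface-destruction problem raised above, which this sketch does not address.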