From: "Vadim Goncharov"
Organization: AVTF TPU Hostel
To: "Andre Oppermann"
Cc: Qing Li, FreeBSD Net, arch@freebsd.org, Ivo Vachkov, Robert Watson, Julian Elischer, "Bruce M. Simpson"
Date: Wed, 09 Jan 2008 00:29:28 +0600
Subject: Re: resend: multiple routing table roadmap (format fix)
In-Reply-To: <47814AF0.9070509@freebsd.org>
References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <4780E5E7.2070202@FreeBSD.org> <4781197F.1000105@elischer.org> <47814AF0.9070509@freebsd.org>

07.01.08 @ 03:41 Andre Oppermann wrote:

> Vadim Goncharov wrote:
>> 07.01.08 @ 00:10 Julian Elischer wrote:
>>
>>>>> Is multicast and multipath routing the same?
>>>> No. They are currently orthogonal.
>>>> However it makes sense to merge the multicast and unicast forwarding
>>>> code as currently MROUTING is limited to a fan-out of 32 next-hops
>>>> only. In multicast, next-hops are normally just interfaces.
>>>> Also the IETF MANET ad-hoc IP is going to need hooks there;
>>>> multicast in MANET needs to address its next-hops by their unicast
>>>> address, and encapsulate the traffic with a header. This is not true
>>>> link layer multicast -- although it might use link layer multicast to
>>>> leverage the hash filters in 802.11 MACs.
>>>> As regards getting ARP out of forwarding tables, this should have
>>>> happened a long time ago...
>>>
>>> I'm not 100% convinced of this...
>>> I was, but I think there may still be a place for a cached arp pointer
>>> in the next hop route to the arp entry for that next hop.
>>> I DO however think that the arp stuff should not be accessing its
>>> data via the routing table.
>> Surely, routing table should contain a cached pointer to an entry in
>> L2 table (ARP in case of Ethernet), to not do double lookups. But still
>> separate those tables...
>
> Locking hell over again. How do you remove an ARP entry without doing
> a full walk over the entire routing table (some 250K entries for the
> DFZ)? Make it rmlocks and be done with it.

Why a full walk, why such a dumb way?
To remove the ARP entry for host A.B.C.D from an L2 table of the form (A.B.C.D -> 00:01:02:03:04:05), it is enough to do one routing lookup (at the usual speed) for host A.B.C.D and either set one pointer in its rtentry to NULL or remove the rtentry (if it is implemented as a cloned route).

Thus, when a routing lookup is done during regular forwarding (a table read), we already have FAST access, a single pointer dereference, to its L2 table entry, be it ARP or any other L2 type (support for which becomes easy once L2 and L3 are separated). And on every modification of the L2 table, which is RARE, do a lookup at the usual speed to update the cached pointer. Compare this with a scheme where EVERY forwarded packet needs a DOUBLE lookup: after the routing lookup, another one in the L2 table.

The current routing table implementation, for all the disadvantages of combining L2 and L3, gets one HUGE benefit from that same combining: performance. So never, ever, ever, ever try to split L2 from L3 at the cost of that performance. If that is the price, they should stay unsplit despite all the disadvantages, or you'll become an enemy of many, many users. Especially since caching lets us do things reasonably fast.

-- 
WBR, Vadim Goncharov
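
P.S. A minimal sketch of the cached-pointer scheme described above, only to illustrate the idea. Every name here (rt_entry, l2_entry, rt_lookup, l2_add, l2_remove, forward_l2) is hypothetical and is not the real FreeBSD API. A linear scan stands in for the real routing lookup, the cached pointer is kept only in the host route for the address, and there is no locking and no duplicate handling at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RT_MAX    8
#define L2_MAX    8
#define HOST_MASK 0xffffffffu

/* Hypothetical L2 table entry: IPv4 address -> MAC address. */
struct l2_entry { uint32_t ip; uint8_t mac[6]; int in_use; };

/* Hypothetical L3 route entry with a cached pointer into the L2 table. */
struct rt_entry { uint32_t prefix, mask; struct l2_entry *l2; int in_use; };

static struct rt_entry rt_table[RT_MAX];   /* separate L3 table */
static struct l2_entry l2_table[L2_MAX];   /* separate L2 table */

/* Longest-prefix match; a linear scan stands in for the real lookup. */
static struct rt_entry *rt_lookup(uint32_t dst)
{
    struct rt_entry *best = NULL;
    for (size_t i = 0; i < RT_MAX; i++) {
        struct rt_entry *rt = &rt_table[i];
        if (rt->in_use && (dst & rt->mask) == rt->prefix &&
            (best == NULL || rt->mask > best->mask))
            best = rt;
    }
    return best;
}

static struct rt_entry *rt_add(uint32_t prefix, uint32_t mask)
{
    assert((prefix & ~mask) == 0);          /* no bits outside the mask */
    for (size_t i = 0; i < RT_MAX; i++) {
        if (!rt_table[i].in_use) {
            rt_table[i] = (struct rt_entry){ prefix, mask, NULL, 1 };
            return &rt_table[i];
        }
    }
    return NULL;                            /* table full */
}

/* Rare path: add an L2 entry, then one routing lookup to cache the pointer. */
static struct l2_entry *l2_add(uint32_t ip, const uint8_t mac[6])
{
    for (size_t i = 0; i < L2_MAX; i++) {
        struct l2_entry *e = &l2_table[i];
        if (e->in_use)
            continue;
        e->ip = ip;
        memcpy(e->mac, mac, 6);
        e->in_use = 1;
        struct rt_entry *rt = rt_lookup(ip);
        if (rt != NULL && rt->mask == HOST_MASK)
            rt->l2 = e;                     /* cache only in the host route */
        return e;
    }
    return NULL;                            /* table full */
}

/* Rare path: one routing lookup and one pointer reset, no full walk. */
static int l2_remove(uint32_t ip)
{
    struct rt_entry *rt = rt_lookup(ip);
    struct l2_entry *e = NULL;
    if (rt != NULL && rt->l2 != NULL && rt->l2->ip == ip) {
        e = rt->l2;
        rt->l2 = NULL;                      /* drop the cached pointer */
    } else {
        for (size_t i = 0; i < L2_MAX; i++) /* entry was never cached */
            if (l2_table[i].in_use && l2_table[i].ip == ip)
                e = &l2_table[i];
    }
    if (e == NULL)
        return 0;
    e->in_use = 0;
    return 1;
}

/* Hot path: one routing lookup plus one pointer dereference. */
static struct l2_entry *forward_l2(uint32_t dst)
{
    struct rt_entry *rt = rt_lookup(dst);
    return rt != NULL ? rt->l2 : NULL;
}
```

The point of the layout is that forward_l2() never searches l2_table, while l2_add() and l2_remove() each pay for one ordinary routing lookup.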