From: "Bruce M. Simpson"
Date: Fri, 14 Dec 2007 10:51:09 +0000
To: Julian Elischer
Cc: FreeBSD Net
Subject: Re: initial call for review.. initial multi-fib (routing table) support
Message-ID: <4762601D.8020004@FreeBSD.org>
In-Reply-To: <47623B9A.2050603@elischer.org>

Julian,

First of all, thank you very much for starting this work in a much
needed area.

Julian Elischer wrote:
> This is a call for review for a change that is part of a
> longer term project.
>
> This implements multiple routing tables. Eventually the implementation
> will be much cleaner but
> the first implementation is designed to be backported to 6.x
> and thus must be ABI compatible. It need not be particularly 'clean'
> as the version in 8.x will be.. First it needs to be committed to
> -current in its 6.x form so an MFC can occur, then the cleaner version
> can be committed over the top of it to clean it up.

A few comments:

Allocating multiple radix trie heads is one way of doing this, but it
would be nice to be able to clean up the memory management in the radix
trie in general.

I've seen implementations which do this by assigning index numbers or
bit sets to the radix trie entries. That way, you don't need to keep
multiple redundant copies of the same data around -- this IS the kernel
FIB after all, and if you're running a router in the Default Free Zone,
or with a considerable BGP topology, this kind of redundancy in the
forwarding plane is not an acceptable use of memory.

It's been a few months, but I believe this is how OpenBSD does it; ipfw
also does something similar deep in its innards: the rules are tagged
with bitsets to specify which sets they are present in.

[I see similar memory management issues with C++ STL containers, which
irritate me; Boost's multi_index_container is an analogous idiom.]

One of the big strengths of the BSD radix trie, as implemented by Keith
Sklower, was that it could be regression tested independently of the
kernel. I'd very much like to see this capability retained, and perhaps
expanded upon, as this is a sensitive area of work.

I'd encourage you to take a look at the OpenBSD changes. They are much
less invasive than this patch, and whilst they don't provide the
setfib() syscall functionality, that could be easily grafted on top.

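To make the bit-set idea above concrete, here is a minimal userland
sketch in C. It is not code from OpenBSD, ipfw, or the patch under
review; every name in it (sketch_route, sketch_lookup, SKETCH_MAX_FIBS)
is hypothetical, and a linear scan over an array stands in for the
radix trie walk. The only point it illustrates is that a route stored
once can be made visible in several tables by setting bits in a mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FIBS 32	/* hypothetical limit: one bit per table */

/* Hypothetical route entry: stored once, tagged with a table bit set. */
struct sketch_route {
	uint32_t prefix;	/* network prefix, host byte order */
	uint32_t mask;		/* contiguous netmask, host byte order */
	uint32_t nexthop;	/* next-hop address, host byte order */
	uint32_t fibs;		/* bit n set => visible in table n */
};

/* Make an existing entry visible in table 'fib'. */
static void
sketch_route_join(struct sketch_route *rt, unsigned fib)
{
	assert(fib < SKETCH_MAX_FIBS);
	rt->fibs |= UINT32_C(1) << fib;
}

/* Hide an entry from table 'fib' without touching the other tables. */
static void
sketch_route_leave(struct sketch_route *rt, unsigned fib)
{
	assert(fib < SKETCH_MAX_FIBS);
	rt->fibs &= ~(UINT32_C(1) << fib);
}

/*
 * Longest-prefix match restricted to table 'fib'.
 * Assumption: a flat array replaces the trie; a real implementation
 * would apply the same bit test while walking trie nodes.
 */
static const struct sketch_route *
sketch_lookup(const struct sketch_route *tbl, size_t n, uint32_t dst,
    unsigned fib)
{
	const struct sketch_route *best = NULL;
	uint32_t bit;
	size_t i;

	assert(fib < SKETCH_MAX_FIBS);
	bit = UINT32_C(1) << fib;
	for (i = 0; i < n; i++) {
		if ((tbl[i].fibs & bit) == 0)
			continue;	/* entry not in this table */
		if ((dst & tbl[i].mask) != tbl[i].prefix)
			continue;	/* prefix does not cover dst */
		/* For contiguous masks, a larger value is a longer prefix. */
		if (best == NULL || tbl[i].mask > best->mask)
			best = &tbl[i];
	}
	return (best);
}
```

With a 10.0.0.0/8 entry joined to tables 0 and 1 and a 10.1.0.0/16
entry joined to table 1 only, a lookup of 10.1.2.3 returns the /8 in
table 0 and the /16 in table 1, while the /8 itself exists in memory
only once. Because the sketch is plain C with no kernel dependencies,
it can be exercised from an ordinary test program, in the spirit of
the regression-testing point above.
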
I understand your folks' requirements for multiple tables, and I'm sure
there is a possible fit here given the idioms described above.

As I say, it's been months since I last had a chance to look at this,
and I am busy finishing up the first phase of another project, so I
don't have all of these changes to hand -- however -- here's a good
date and starting position:

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/radix.c.diff?r1=1.20&r2=1.21

I know there is an element of Not-Invented-Here which creeps in, but,
when all is said and done, OpenBSD's approach is viable, compact, and
simple, and addresses folks' immediate requirements for multi-path
support. They don't address SMP, multicast, or source address
selection, but those are future development stories.

cheers
BMS