From owner-freebsd-arch@FreeBSD.ORG Thu Jan 3 15:30:37 2008
From: "Vadim Goncharov"
Organization: AVTF TPU Hostel
Date: Thu, 03 Jan 2008 21:12:12 +0600
To: "Julian Elischer", "Ivo Vachkov"
Cc: arch@freebsd.org, FreeBSD Net, Robert Watson, Qing Li
Subject: Re: resend: multiple routing table roadmap (format fix)
In-Reply-To: <477416CC.4090906@elischer.org>
References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org>
List-Id: Discussion related to FreeBSD architecture

28.12.07 @ 03:19 Julian Elischer wrote:

> By the way, I might add that in the 6.x compat. version I may end up
> limiting the feature to 8 tables. This is because I need to store some
> stuff in an efficient way in the mbuf, and in a compatible manner this
> is easiest done by stealing the top 4 bits in the mbuf dlags word
> and defining them as:
>
> #define M_HAVEFIB 0x10000000
> #define M_FIBMASK 0x07
> #define M_FIBNUM 0xe0000000
> #define M_FIBSHIFT 29
> #define m_getfib(_m, _default) ((m->m_flags & M_HAVE_FIBNUM) ?
>         ((m->m_flags >> M_FIBSHIFT) & M_FIBMASK) : _default)
> #M_SETFIB(_m, _fib) do { \
>         _m->m_flags &= ~M_FIBNUM; \
>         _m->m_flags |= (M_HAVEFIB|((_fib & M_FIBMASK) << M_FIBSHIFT));\
> } while (0)
>
> This then becomes very easy to change to use a tag or
> whatever is needed in later versions , and the number can
> be expanded past 8 predefined FIBs at that time..

If you want it to be a tag, why spend bits in m_flags instead of making
it a tag from the start? Or is the 6.x (and possibly 7.x) implementation
meant to be thrown away completely in favor of the right thing in 8.0?

>>> This brings us as to how the correct FIB is selected for an outgoing
>>> IPV4 packet.
>>>
>>> Packets fall into one of a number of classes.
>>> 1/ locally generated packets, coming from a socket/PCB.
>>> Such packets select a FIB from a number associated with the
>>> socket/PCB. This in turn is inherited from the process,
>>> but can be changed by a socket option. The process in turn
>>> inherits it on fork. I have written a utility call setfib
>>> that acts a bit like nice..
>>>
>>> setfib -n 3 ping target.example.com # will use fib 3 for ping.

Pretty cool!

>>> 2/ packets received on an interface for forwarding.
>>> By default these packets would use table 0,
>>> (or possibly a number settable in a sysctl(not yet)).
>>> but prior to routing the firewall can inspect them (see below).
>>>
>>> 3/ packets inspected by a packet classifier, which can arbitrarily
>>> associate a fib with it on a packet by packet basis.
>>> A fib assigned to a packet by a packet classifier
>>> (such as ipfw) would over-ride a fib associated by
>>> a more default source. (such as cases 1 or 2).

Sounds good. I like the idea of making routing decisions in the
firewall, so that kernel code and userspace utilities are not duplicated
the way they are with Linux's iproute2 (which, however, still has a few
parameters of its own and relies on firewall marks for the others).

However, I think there are some cases where it could be done outside the
firewall. For example, an ifconfig option could make a specific FIB the
default for all packets sent from that interface's address. But this
raises another, related question: Linux allows selecting a specific
source IP based on the routing table entry for the destination address
(which brings pf's reply-to/route-to to mind, huh). In this connection I
can also think of multipath routing (different metrics?), addresses from
one subnet on different interfaces (mask wider than /32), and so on.

It is also interesting how multiple FIBs would interact with host-wide
events and state, such as ICMP redirects (which table should be
updated?), stored TCP stack metrics (MTU, etc.), the hostcache, and so
on. How will these and the points above be solved?

(Open points, in short: per-ifconfig defaults (>1 host per subnet), ICMP
redirects, preferred source address, multipath/metrics, interaction with
TCP stack parameters, iproute2.)

>>> Routing messages would be associated with their
>>> process, and thus select one FIB or another.

This is not clear. How should the 'route' command work with different
FIBs if the admin intends them for forwarding rather than for
straightforward per-process use? I think a setfib option for 'route' is
more consistent than running route under the setfib command.
Also, what about routing sockets and routing daemons: should they work
with only one table?

>>> I have not yet added the changes to ipfw.

An action modifier, like 'ipfw add count setfib 3 ip from any to any'?

There were thoughts (I heard them as a hack, from before multiple FIBs)
about adding, say, a 'nexthop' ipfw action, which acts like fwd but does
not accept the packet, so that the packet continues through the firewall
ruleset. That would make it more comfortable to separate routing
(imagine 'nexthop tablearg') from filtering. There are questions about
both fwd and the proposed new option: will fwd survive? Will it change
the output interface, as in a complete rerouting before the pfil(9)
hooks are called, so that *oif is changed and can be matched in the
rules below? And pf's route-to/reply-to is hanging around...

>>> pf has some similar changes already but they seem to rely on
>>> the various FIBs having symbolic names. Which I do not plan to support
>>> in the first version of these changes.

I think this is something the pf team should care about, not us, since
pf lives in ../contrib. They could use something like a sysctl that
translates symbolic names to system FIB numbers, or similar.

>>> Interaction with the ARP layer/ LL layer would need to be
>>> revisited as well. Qing Li has been working on this already.

Oh yes, the L2 interaction is interesting. How should it work given the
planned separation of the routing and ARP tables?

--
WBR, Vadim Goncharov
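
P.S. For reference, the m_flags packing quoted at the top of this message
can be sketched as standalone C. This is only an illustration under
stated assumptions, not the proposed kernel code: struct fake_mbuf and
its unsigned m_flags are hypothetical stand-ins for the real struct mbuf,
the getter is spelled M_GETFIB here, the flag test uses M_HAVEFIB (the
name the quoted #define list actually introduces), and the three helper
functions exist only for this demonstration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mbuf; only the flags word matters here. */
struct fake_mbuf {
    uint32_t m_flags;   /* assumption: unsigned, to keep the shifts well defined */
};

#define M_HAVEFIB   0x10000000u  /* bit 28: a FIB number is stored */
#define M_FIBMASK   0x07u        /* 3 bits, so FIBs 0..7 */
#define M_FIBNUM    0xe0000000u  /* bits 29..31 hold the FIB number */
#define M_FIBSHIFT  29

/* Return the FIB stored in the flags word, or _default if none is stored. */
#define M_GETFIB(_m, _default) \
    (((_m)->m_flags & M_HAVEFIB) ? \
        (((_m)->m_flags >> M_FIBSHIFT) & M_FIBMASK) : (uint32_t)(_default))

/* Clear any old FIB number, then store the new one and mark it present. */
#define M_SETFIB(_m, _fib) do { \
    (_m)->m_flags &= ~M_FIBNUM; \
    (_m)->m_flags |= (M_HAVEFIB | \
        (((uint32_t)(_fib) & M_FIBMASK) << M_FIBSHIFT)); \
} while (0)

/* Demo helper: tag a packet with a FIB and read the number back. */
uint32_t fib_roundtrip(uint32_t fib, uint32_t dflt)
{
    struct fake_mbuf m = { 0x00000002u };  /* an unrelated flag is already set */
    M_SETFIB(&m, fib);
    return M_GETFIB(&m, dflt);
}

/* Demo helper: an untagged packet falls back to the caller's default. */
uint32_t fib_untagged(uint32_t dflt)
{
    struct fake_mbuf m = { 0x00000002u };
    return M_GETFIB(&m, dflt);
}

/* Demo helper: setting a FIB must leave the low 28 flag bits untouched. */
int flags_preserved(uint32_t fib)
{
    struct fake_mbuf m = { 0x00000002u };
    M_SETFIB(&m, fib);
    return (m.m_flags & 0x0fffffffu) == 0x00000002u;
}

/* Exercises the three cases above; call it from a test or from main(). */
void fib_selfcheck(void)
{
    assert(fib_roundtrip(3, 0) == 3);
    assert(fib_untagged(5) == 5);
    assert(flags_preserved(7));
}
```

Note that a FIB number above 7 is silently truncated by M_FIBMASK in this
sketch (9 would be stored as 1), which is the 8-table limit mentioned in
the quoted text.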