Date:      Thu, 03 Jan 2008 10:52:35 -0800
From:      Julian Elischer <julian@elischer.org>
To:        Vadim Goncharov <vadimnuclight@tpu.ru>
Cc:        arch@freebsd.org, Ivo Vachkov <ivo.vachkov@gmail.com>, Robert Watson <rwatson@freebsd.org>, Qing Li <qingli@freebsd.org>, FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: resend: multiple routing table roadmap (format fix)
Message-ID:  <477D2EF3.2060909@elischer.org>
In-Reply-To: <opt4c0imk24fjv08@nuclight.avtf.net>
References:  <4772F123.5030303@elischer.org>	<f85d6aa70712261728h331eadb8p205d350dc7fb7f4c@mail.gmail.com>	<477416CC.4090906@elischer.org> <opt4c0imk24fjv08@nuclight.avtf.net>

Vadim Goncharov wrote:
> 28.12.07 @ 03:19 Julian Elischer wrote:
> 
>> By the way, I might add that in the 6.x compatibility version I may end up
>> limiting the feature to 8 tables. This is because I need to store some
>> stuff in an efficient way in the mbuf, and in a compatible manner this
>> is easiest done by stealing the top 4 bits in the mbuf flags word
>> and defining them as:
>>
>>   #define M_HAVEFIB     0x10000000    /* bit 28: a FIB has been set */
>>   #define M_FIBMASK     0x07          /* 3 bits: FIBs 0..7 */
>>   #define M_FIBNUM      0xe0000000    /* bits 29..31: the FIB number */
>>   #define M_FIBSHIFT    29
>>   #define m_getfib(_m, _default) (((_m)->m_flags & M_HAVEFIB) ? \
>>       (((u_int)(_m)->m_flags >> M_FIBSHIFT) & M_FIBMASK) : (_default))
>>   #define M_SETFIB(_m, _fib) do { \
>>     (_m)->m_flags &= ~M_FIBNUM; \
>>     (_m)->m_flags |= (M_HAVEFIB | (((u_int)(_fib) & M_FIBMASK) << M_FIBSHIFT)); \
>> } while (0)
>>
>> This then becomes very easy to change to use a tag or
>> whatever is needed in later versions, and the number can
>> be expanded past 8 predefined FIBs at that time.
> 
> If you want it to be a tag, why spend bits in m_flags and not just do it
> as a tag at once? Or is it supposed to completely throw away the 6.x
> (and possibly 7.x) implementation in favor of the right thing in 8.0?

Basically, yes.

I'm looking at just doing tags to start with, but haven't done it
yet. I'm looking for a good bit of tag code to copy :-)

> 
>>>> This brings us to how the correct FIB is selected for an outgoing
>>>> IPv4 packet.
>>>>
>>>> Packets fall into one of a number of classes.
>>>> 1/ locally generated packets, coming from a socket/PCB.
>>>>     Such packets select a FIB from a number associated with the
>>>>     socket/PCB. This in turn is inherited from the process,
>>>>     but can be changed by a socket option. The process in turn
>>>>     inherits it on fork. I have written a utility called setfib
>>>>     that acts a bit like nice.
>>>>
>>>>         setfib -n 3 ping target.example.com # will use fib 3 for ping.
> 
> Pretty cool!

Or (and I've done this):

  setfib 3 /bin/sh

Now by default everything you do uses table 3.
Or even:

  setfib 3 jail {blah}

and all the processes in the jail use table 3. You also need to do

  setfib 3 jexec xxx

for extra processes you add to the jail afterwards.

> 
>>>> 2/ packets received on an interface for forwarding.
>>>>     By default these packets would use table 0
>>>>     (or possibly a number settable in a sysctl; not yet done),
>>>>     but prior to routing the firewall can inspect them (see below).
>>>>
>>>> 3/ packets inspected by a packet classifier, which can arbitrarily
>>>>     associate a FIB with them on a packet-by-packet basis.
>>>>     A FIB assigned to a packet by a packet classifier
>>>>     (such as ipfw) would override a FIB associated by
>>>>     a more default source (such as cases 1 or 2).
> 
> Sounds good. I like the idea of making routing decisions in the firewall,
> so as not to duplicate kernel code and userspace utilities as Linux's
> iproute2 does (which, however, still has a few parameters of its own and
> relies on firewall marks for the others). However, there are some cases,
> I think, where it could be done outside the firewall. For example, an
> ifconfig option to use a specific FIB as the default for all packets
> outgoing from this interface's address. But here arises another related
> question: Linux allows selecting a specific src IP based on a routing
> table entry (destination address) (thoughts about pf reply-to/route-to, huh).

That is the default here too, if I understand what you are talking about.
The src address is selected from the routing table's exit interface.
In the code I'm showing in perforce, that address would depend on
which table your process was associated with (or just the socket, if
you have used the socket option on it before doing the bind/connect).

> In relation to this I can recall multipath routing (different metrics?),
> addresses from one subnet on different ifaces (mask wider than /32), and
> so on. Also, it is interesting how multiple FIBs would interact with
> host-wide events, such as ICMP redirects (which table should be updated?),
> storage of TCP stack metrics (MTU, etc.) and the hostcache, and so on.
> How will these and the above be solved?
> 

I'm not really too knowledgeable about multicast..

> per ifconfig (>1 host per subnet)/icmp redirects/src to prefer, 
> multipath/metrics, tcp stack parameters interaction, iproute2

I'm not trying to solve problems that need vimage to solve them..

> 
>>>> Routing messages would be associated with their
>>>> process, and thus select one FIB or another.
> 
> This is not clear. How should the 'route' command work with different
> FIBs if the admin intends them to be used for forwarding rather than
> strictly per-process? I think a setfib option to route would be more
> consistent than running route under the setfib command. Also, what about
> routing sockets and routing daemons: should they work with only one table?

If you do

setfib 3 route get 1.1.1.1

you may get a different result from

setfib 2 route get 1.1.1.1

I will add a fibnum argument to route itself as well, but it's not
needed immediately as long as I have the setfib command.

> 
>>>> I have not yet added the changes to ipfw.
> 
> Action modifier, like 'ipfw add count setfib 3 ip from any to any'?
> There were thoughts (as a hack before multiple FIBs, I heard) about
> making an additional, say, 'nexthop' ipfw action, which acts like fwd
> but does not accept the packet, allowing it to continue through the
> firewall ruleset, thus making it more comfortable to separate routing
> (imagine 'nexthop tablearg') and filtering. There are questions with both
> fwd and the new proposed option: will fwd still survive? Will it change
> the output interface, like a complete rerouting before calling pfil(9)
> hooks, so that *oif will be changed to be matched in rules below? pf
> route-to/reply-to is hanging around...

The 'nexthop' call you suggest is problematic because it needs to
return information immediately, which is why fwd is terminal.

As for the setfib ipfw action, I have now done this in p4.

ipfw add 200 setfib 3 ip from any to any in receive em0

now works.
This lessens the need for associating a FIB with an interface, as the
firewall can do that too.

The setfib rule is not terminal (hmm, need to check I did that right).

You can also do:

ipfw add 200 skipto 300 ip from any to any hasfib
  # to select packets that already have a FIB associated with them.
ipfw add 200 skipto 300 ip from any to any fib 4
  # to select packets that are associated with FIB 4.
ipfw add 200 clrfib ip from any to any
  # to remove a FIB association from the packet.

> 
>>>> pf has some similar changes already, but they seem to rely on
>>>> the various FIBs having symbolic names, which I do not plan to support
>>>> in the first version of these changes.
> 
> I think this is what the pf team should care about, not us, as it lives in
> ../contrib. Though they could use something like a sysctl-based
> symbolic-name-to-system-FIB-number translator or such.
> 
>>>> Interaction with the ARP layer/LL layer would need to be
>>>> revisited as well. Qing Li has been working on this already.
> 
> Oh yes, L2 interaction is interesting. How should it work in the case of
> the planned separation of routing and ARP tables?
> 



