Date:      Thu, 30 Apr 1998 21:06:04 +0100
From:      Chrisy Luke <chrisy@flix.net>
To:        freebsd-hackers@FreeBSD.ORG
Subject:   Beta 3 release of Multipath routing and friends.
Message-ID:  <19980430210604.09329@flix.net>


--akKZr9L6Hm6aLOcr
Content-Type: text/plain; charset=us-ascii
Content-Description: Mail message

ftp://ftp.flirble.org/pub/unix/hacks/FreeBSD/mpath.b3.tgz

README attached.

A few fixes to the Multipath code. The metric stuff and the persistent
route caching will come in b4.

This release mostly adds code to the kernel and to the ipfw interface
to support two things, both built on the same mechanism:

 * Directing INCOMING traffic that matches a rule to a LOCAL TCP port.
   This is intended for transparent proxying without external calls
   to an LKM. It also doesn't touch the packet, so getsockname() works
   and there's no need for a subsequent ioctl to work out what the
   original destination/port was.
   It's freaky seeing random remote IPs listed as "Local addresses"
   in netstat! BSD-router-speed transparent diversion... :-)

 * Modifying the next-hop address of OUTBOUND traffic that matches the
   rule. My intention for this is to direct web traffic from a core
   router to a transparent proxy. David Sharnoff also wanted something
   similar, and the functionality of this thus extends to doing a route
   table lookup on the specified next-hop and using the route to it,
   meaning the next-hop doesn't need to be on a directly reachable
   interface. Remember though, the packet is only ever handed to a
   directly reachable machine (the gateway on that route)! Nothing here
   delivers it to the specified next-hop itself!
   TCP port numbers are ignored if this rule comes into effect.
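
For anyone writing the proxy end of the first case, here is a minimal
sketch in Python of what "getsockname() works" means for the listener.
The function names (accept_and_report, demo) are made up for this
example. On an unpatched machine the call simply returns the listener's
own address; the claim that it returns the client's original
destination under a forward rule is taken from the description above
and is not something this snippet can show by itself.

```python
import socket


def accept_and_report(listener):
    """Accept one connection; return (intended_destination, client_address)."""
    conn, client_addr = listener.accept()
    try:
        # Local end of the accepted socket. With a forward rule in place this
        # is expected (per the text above, an assumption here) to be the
        # address and port the client was originally aiming at.
        intended_dst = conn.getsockname()
    finally:
        conn.close()
    return intended_dst, client_addr


def demo():
    """Loopback-only demonstration; no forward rule is involved."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick one
    listener.listen(1)
    port = listener.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    try:
        return accept_and_report(listener), client.getsockname(), port
    finally:
        client.close()
        listener.close()
```

A real proxy would then open its own connection to intended_dst on the
client's behalf instead of just reporting it.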

The rule-based forwarding mechanism is independent of the Multipath
stuff, but does have multipath code in it if multipath is compiled in.

Currently, rule-based forwarding carries a double-route-table-lookup
penalty on outbound traffic. I'll probably address this in b4 also.

Chris.
-- 
== chris@easynet.net, chrisy@flix.net, chrisy@flirble.org.
== Head of Systems for Easynet Group PLC.

--akKZr9L6Hm6aLOcr
Content-Type: text/plain; charset=us-ascii
Content-Description: Multipath/Rule-based forwarding README file
Content-Disposition: attachment; filename="README.MPATH"

Multipath Routing for FreeBSD (and friends)
-------------------------------------------

Beta 3 release. 30 April 1998. Maintained by chrisy@flix.net.

Patches against FreeBSD-2.2.6-STABLE (last cvsup'ed on 21 April 1998).

The tar from which this came has two sets of code in it.

One is for Multipath routing. One is for Firewall rule-based packet
forwarding, where firewall rules can change the next-hop of packets
that match them.

You can have one, the other, both or neither as you so desire; just
include or leave out the relevant "options" lines as shown below.

This release has only minor fixes to the Multipath code.
This release introduces the rule-based forwarding code.

Installation
------------

Please follow these in order...

In the tar file are a number of diffs, all relative to /usr/src.
One is for the sys/ tree (the kernel source) and the others are
for ifconfig(8), netstat(8) and route(8). I recommend you copy the
source directories for these binaries to directories called
"name.mpath", firstly as a backup and secondly because cvsup overwrites
changed files with their current versions! Patch everything up as
usual. If you don't know how, you shouldn't be playing with this code...
The rule-based forwarding code requires a new ipfw(8) to be compiled. I
recommend copying /usr/src/sbin/ipfw/ to ipfw.fwd and patching/compiling
it in there.

You will need to copy the patched sys/net/route.h to
/usr/include/net/route.h, taking a copy of the original first, of course.
Similarly, for the rule-based forwarding package, you will need to copy
the patched sys/netinet/ip_fw.h to /usr/include/netinet/ip_fw.h.

There is an example kernel config file in sys/i386/conf/QBert-MPath.
The important lines in this file are:

options         EA_MULTIPATH
options         "EA_N_MULTIPATH=4"
options         "MSIZE=256"

options         EA_FWD

The first two enable the patched code to be compiled in (all the code
in the kernel is delimited with #ifdef EA_MULTIPATH - I hope) and the
number of multipath gateways to support for each destination. This has
to be hardcoded, I'm afraid, without a major rewrite of nearly everything.

EA_FWD enables the rule-based forwarding package. Again, all the code is
delimited with EA_FWD. You can also enable EA_FWD_DEBUG for debug messages
that are probably meaningless to anyone but me. :-)

MSIZE is the size of an mbuf. The extra data made routing socket messages
too big for a single mbuf. This shouldn't be too terrible an overhead;
many people have been considering making this the default for a long time.

Compile the kernel, compile the patched binaries. You will also need to
recompile arp(8) (no patches necessary) and anything else that makes
use of the routing socket (or #include's route.h) to make them understand
the new message structure. Particularly braindead code may need hacking
to get it right (see below).

Install it all, reboot, and hope for the best...

...with this code, I can type...

bash-2.01# route add default -gateway 193.131.248.183 -gateway 195.40.1.1

...and get a routing table that looks like...

bash-2.01# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Refs     Use     Netif Expire
default                               UGSc        2      989 
                    193.131.248.183                      494      fxp2
                   *195.40.1.1                           495      fxp0
127.0.0.1          *127.0.0.1         UH          1       14       lo0
193.131.248        *link#3            UC          0        0 
193.131.248.20     *0:0:c0:a0:b1:e3   UHLW        0      357      fxp2    997
[etc]

...(note the gateways and their interfaces, also note the "Use" column)
   and thus get traceroutes like...

bash-2.01# traceroute -n 195.40.6.30
traceroute to 195.40.6.30 (195.40.6.30), 30 hops max, 40 byte packets
 1  195.40.1.1  0.629 ms 193.131.248.183  0.551 ms 195.40.1.1  0.491 ms
 2  193.131.248.1  0.620 ms 195.40.1.13  0.676 ms 193.131.248.1  0.636 ms
 3  195.40.6.30  0.975 ms  0.886 ms  0.915 ms

Note the alternating IPs on *every* response. This is what it's all about.
If the remote machine (195.40.6.30) has multipath code, then all data
between these two hosts would use both available paths.
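
To make the alternation concrete, here is a small Python model of a
route with several gateways and a per-gateway "Use" counter. It assumes
plain per-packet round-robin, which is only a guess consistent with the
traceroute and the near-equal Use counts shown; it is not lifted from
the kernel patch. MultipathRoute and MAX_GATEWAYS are names invented
for the sketch (the limit mirrors EA_N_MULTIPATH=4 from the example
config).

```python
from itertools import cycle

MAX_GATEWAYS = 4  # mirrors "EA_N_MULTIPATH=4" in the example kernel config


class MultipathRoute:
    """Toy model: one destination, several gateways, round-robin selection."""

    def __init__(self, gateways):
        if not 1 <= len(gateways) <= MAX_GATEWAYS:
            raise ValueError("need between 1 and %d gateways" % MAX_GATEWAYS)
        self.use = {gw: 0 for gw in gateways}   # per-gateway packet counter
        self._next = cycle(gateways)            # assumed selection policy

    def pick(self):
        """Return the gateway for the next packet and count it."""
        gw = next(self._next)
        self.use[gw] += 1
        return gw


route = MultipathRoute(["193.131.248.183", "195.40.1.1"])
hops = [route.pick() for _ in range(989)]   # 989 is the total Use shown above
```

After the loop the two counters differ by at most one, which is the
same shape as the 494/495 split in the netstat output.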

The rule-based forwarding package allows me to do this:

bash-2.01# ipfw add 2000 forward to 195.40.1.2,23 tcp from any to any 23

to get:

bash-2.01# ipfw show
01000         16        986 allow ip from any to any via lo0
01010          0          0 deny ip from 127.0.0.0/8 to 127.0.0.0/8
02000        489      32817 forward to 195.40.1.2,23 tcp from any to any 23
65000      29109    3058452 allow ip from any to any
65535          0          0 deny ip from any to any

This traps all packets that come *into* this machine, destined for
*any* address on port 23 (the telnet service), and sends them
to 195.40.1.2, port 23. Because this address is local, the machine
keeps them for itself.

If the address weren't local, the rule would only affect *outbound*
packets. No port number rewriting happens in that case, but the rule
lets you control the next-hop. If that next-hop isn't directly
reachable, the code looks up the route to it and uses that route's
gateway for the packet (note, this second lookup *is* multipath code
also, if multipath is enabled). Because of this there is a small
penalty in having two routing table lookups (something I will address
in the next release of this code). This is only a real issue on
routers with large routing tables.
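
The three cases described for a "forward to" target can be summarised
as a small decision function. This is a rough Python sketch of the
behaviour as described in this README, not the kernel code: the
function name, the tuple it returns and the route_lookup callback are
all invented for illustration, and the /24 prefix length in the usage
note is an assumption.

```python
import ipaddress


def forward_decision(target_ip, target_port, local_addrs, connected_nets,
                     route_lookup):
    """Return (action, address, port) for a packet matching a forward rule.

    local_addrs    -- set of ipaddress objects configured on this machine
    connected_nets -- list of directly attached networks
    route_lookup   -- hypothetical callback: target -> gateway address
    """
    target = ipaddress.ip_address(target_ip)

    # Case 1: the target is one of our own addresses. The packet is kept
    # and handed to the given local TCP port.
    if target in local_addrs:
        return ("deliver_local", str(target), target_port)

    # Case 2: the target is directly reachable. It becomes the next-hop;
    # the port number in the rule is ignored.
    if any(target in net for net in connected_nets):
        return ("send_to", str(target), None)

    # Case 3: not directly reachable. A second routing lookup finds the
    # gateway towards the target, and the packet goes to that gateway.
    gateway = route_lookup(target)
    if gateway is None:
        raise LookupError("no route to %s" % target)
    return ("send_to", str(gateway), None)
```

For example, with local_addrs = {ip_address("195.40.1.2")} and
connected_nets = [ip_network("193.131.248.0/24")], a target of
195.40.1.2 is delivered locally, 193.131.248.20 is used as the next-hop
directly, and anything else goes to whatever route_lookup returns.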

Fixing things (programs) that don't work
----------------------------------------

Also in the tar is share/man/man4/route.mpath.4 which you may want
to install.
gzip -c share/man/man4/route.4 > /usr/share/man/man4/route.4

It has a brief description of the changed routing socket messages.  The
notable difference is that the RTA_GATEWAY data is at the *end* of the
message now instead of the middle, and that it's formed from a bit 
pattern that specifies how many gateways are in the message (which 
follow each other, at the end of said message). Code that doesn't follow
the values of the RTA masks, even for single-gateway destinations, will
have unpredictable results!

Old versions of ipfw(8) won't work with the new ipfw structures in the
kernel, either. I've included a share/man/man4/ipfirewall.4 that
you may want to install:
gzip -c share/man/man4/ipfirewall.4 > /usr/share/man/man4/ipfirewall.4

Other things
------------

I've updated most man pages. I've probably missed a few, but route(8)
is the most important in my mind. Route(8) has relatively good visual
support for multipath routes. I've not yet defined a mechanism in the
routing socket for passing multiple interface information, so it won't
display that there. It works for netstat(8) because it accesses the kernel
memory directly.

I'm working on GateD. I've got GateD3_6Alpha_2 doing something half
sensible, but not quite. When I've done GateD4 I'll submit those changes
back to the Consortium.

I'm not going to touch routed. Someone else can do that, if they want. I'll
include diffs here should I receive them.

I'm going to code in metrics for multipath routes, as well as
persistent-route multipath in the firewall stuff. The latter will mean that
when traffic to a certain destination has been routed, it remembers for
a while where it sent it, so that should the next hop catch said traffic,
there's a good chance that all fragments of a TCP connection go to the
same host... it also improves the effectiveness of multiple transparent
web proxies, by sending traffic that was destined for a certain site to
the same proxy that has already cached it.
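
The persistent-route idea can be sketched as a small cache keyed on
destination. This is a Python illustration of the behaviour described,
not the planned kernel code: the class name, the 30-second lifetime,
the refresh-on-use policy and the round-robin fallback are all
assumptions made for the example.

```python
import time


class StickyNextHop:
    """Remember, for a while, which gateway each destination was sent to."""

    def __init__(self, gateways, ttl=30.0, clock=time.monotonic):
        self.gateways = list(gateways)
        self.ttl = ttl            # seconds an entry stays valid (assumed)
        self.clock = clock        # injectable so the cache can be tested
        self._cache = {}          # destination -> (gateway, expiry time)
        self._rr = 0              # round-robin position for new entries

    def pick(self, dst):
        now = self.clock()
        entry = self._cache.get(dst)
        if entry is not None and entry[1] > now:
            gw = entry[0]         # still fresh: reuse the same gateway
        else:
            # New or expired destination: take the next gateway in turn.
            gw = self.gateways[self._rr % len(self.gateways)]
            self._rr += 1
        # Every use pushes the expiry out again.
        self._cache[dst] = (gw, now + self.ttl)
        return gw
```

With two proxies as gateways, repeated requests for the same site keep
landing on the same proxy until the entry has been idle for longer
than ttl.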

Bugs, comments, flames to chris@easynet.net or chrisy@flix.net.

--akKZr9L6Hm6aLOcr--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message


