Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 22 Apr 2004 00:52:58 -0700
From:      Luigi Rizzo <luigi@FreeBSD.org>
To:        Darren Reed <darrenr@hub.freebsd.org>
Cc:        cvs-src@FreeBSD.org
Subject:   Re: cvs commit: src/sys/net radix.c
Message-ID:  <20040422005258.A84320@xorpc.icir.org>
In-Reply-To: <20040422051953.GA27025@hub.freebsd.org>; from darrenr@hub.freebsd.org on Wed, Apr 21, 2004 at 10:19:53PM -0700
References:  <200404211527.i3LFRabS088245@repoman.freebsd.org> <6.0.1.1.1.20040422005919.03afaaa0@imap.sfu.ca> <20040422002143.GC60368@hub.freebsd.org> <200404220259.45651.max@love2party.net> <20040422051953.GA27025@hub.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
i knew this would be a bit controversial :)

On Wed, Apr 21, 2004 at 10:19:53PM -0700, Darren Reed wrote:
...
> I would suggest that if you (or anyone else) has trouble reading
> the code, then read Stevens.  The variable naming, etc, is not

isn't it even better if you find the documentation in the code itself
(and perhaps up to date) rather than having to resort to external
sources ?

> really the hard part of this code to grasp, it's the structures
> and how they interact and interconnect.  Look in that book, see
> where all the arrows go in the diagrams.  Perhaps I should say

that part is unchanged and i do not have any intention of changing it,
nor interfaces. BTW the hard parts of the code in radix.c (rn_addmask(),
rn_addroute(), rn_delete() and so on) are _not_ documented in Stevens'
book so there is little if any motivation to preserve the lexical format
of the code.

> On the contrary, what you need is to understand the relationship
> between all the data structures and the who/what/where/how/why
> of their use.

these were indeed part of my changes:

+/*
+ * Most of the functions in this code assume that the key/mask arguments
+ * are sockaddr-like structures, where the first byte is an u_char
+ * indicating the size of the entire structure.
+ *
+ * To make the assumption more explicit, we use the LEN() macro to access
+ * this field. It is safe to pass an expression with side effects
+ * to LEN() as the argument is evaluated only once.
+ */
...
+/*
+ * Whenever we add a new leaf to the tree, we also add a parent node,
+ * so we allocate them as an array of two elements: the first one must be
+ * the leaf (see RNTORT() in route.c), the second one is the parent.
+ * This routine initializes the relevant fields of the nodes, so that
+ * the leaf is the left child of the parent node, and both nodes have
+ * (almost) all all fields filled as appropriate.
+ * (XXX some fields are left unset, see the '#if 0' section).
+ * The function returns a pointer to the parent node.
+ */
...
+/*
+ * Convert a 'struct radix_node *' to a 'struct rtentry *'.
+ * The operation can be done safely (in this code) because a
+ * 'struct rtentry' starts with two 'struct radix_node''s, the first
+ * one representing leaf nodes in the routing tree, which is
+ * what the code in radix.c passes us as a 'struct radix_node'.
+ *
+ * But because there are a lot of assumptions in this conversion,
+ * do not cast explicitly, but always use the macro below.
+ */
+#define RNTORT(p)      ((struct rtentry *)(p))
+

> But don't forget that at least the way the radix code "was" had the
> backing of a number of chapters in a book that explain(ed/s) it all
> and so it has an advantage over IPv6/KAME...

as i said, the part that i [plan to] touch is the undocumented one.

Additionally, we need to make progress. The entire routing structure
is incredibly expensive time and spacewise -- e.g each entry in the
routing table consumes 80 bytes + the metrics + the 2 sockaddr,
which 5-10 times the space you need for an optimised implementation
of the routing tree for ipv4. Maybe it was ok when you had 1000
entries in the table, not anymore when you have large BGP tables
to push in the kernel.
We need to be able to provide specialised (and efficient) versions
of the code for selected address families -- this likely will
cause a change in the APIs, with a compatibility layer to
let old code still work.

cheers
luigi



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040422005258.A84320>