Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 13 Aug 2002 21:30:08 +0300
From:      Ruslan Ermilov <ru@FreeBSD.ORG>
To:        Bakul Shah <bakul@bitblocks.com>
Cc:        Luigi Rizzo <rizzo@icir.org>, net@FreeBSD.ORG
Subject:   Re: Consistency of cached routes
Message-ID:  <20020813183008.GA29747@sunbay.com>
In-Reply-To: <200208131722.NAA21497@wellington.cnchost.com>
References:  <20020813021333.A4507@iguana.icir.org> <200208131722.NAA21497@wellington.cnchost.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--J/dobhs11T7y2rNN
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Aug 13, 2002 at 10:22:19AM -0700, Bakul Shah wrote:
> > ip_forward, ipflow and TCP/UDP sockets cache a copy
> > of the result of route lookups, but these entries might be
> > out-of-date when a routing update is performed. Ruslan just MFC'ed
> > a fix to invalidate the (one-entry) cache in ip_forward, but
> > the other two can still be inconsistent.
>=20
> If a host route to ADDR is cloned from a route to ADDR/n and
> a route to ADDR/n+k is added, the cloned route to ADDR must
> be deleted.  This is already taken care of in rt_fixchange()
> -- see the last comment in this function.  The next time a
> pkt to ADDR is delivered, a new ADDR route will be cloned
> from ADDR/n+k.
>=20
Correct.

> TCP/UDP sockets cache a host route.  If it is a cloned route
> it will get the above treatment.  So the issue is really only
> for cached net routes that are less specific than a newly
> added route.  Does ipflow cache those?
>=20
Everything that is allocated using rtalloc_ign(, RTF_PRCLONING)
should be taken care of (or a raw form using rtalloc1()).

> > ok, so i guess it is time to instrument route lookups and see
> > how expensive they are, and sort out what is the best way to
> > solve the tradeoff between caching and potential inconsistencies.
>=20
> Any incosistency needs to be fixed.
>=20
> > The timestamp idea is good because it has constant cost, though
> > on a box with many routing updates it might completely defeat
> > the cache.
>=20
> You can use a long long counter as a virtual timestamp.  Even
> if you allow  a million route changes per second, this number
> won't roll over for almost 300000 years!  But this will cause
> all cached entries to be flushed any time the route table was
> updated + every cache use just got a little more expensive.
> On the other hand now you can afford to keep a bigger cache.
> Caching just one forwarding entry doesn't make a lot of sense
> for a router.
>=20
> > Ideas anyone ? This might be a problem with a known efficient solution.
>=20
> I just thought of a simple way.
>=20
> When a route to ADDR/n is added, for all routes to ADD/n-k,
> (k in 0,1..n) with refcnt > 0, *move* the route entry!  That
> is, allocate a new data structure, do a memcpy from old to
> new and move its sole route table reference from the old
> entry to the new entry.  Mark the original rtentry data
> structure as down.  It still exists but is not accessible
> from the route table; it is accessible only from its cached
> references.  Upon next use of such a ref it will be found to
> be down, so its refcount gets decremented (and the structure
> deleted when refcnt drops to 0) and a new rtalloc will be
> done.
>=20
Hmm, it has a non-obvious impact on rt_gwroute's, but the idea
sounds promising.

> Second, this entirely avoids the need to walk the whole tree
> on addition/deletion/change of any route.
>=20
It was never an intent.

> Third, we can dispense with the silliness of cloned routes
> alltogether and go back to keeping a separate host route
> cache.
>=20
Stevens says, in TCP/IP Ill. Vol 2 in section 18.12:

: The Patricia tree data structure is well suited to routing tables.
: Routing table lookups occur much more frequently than adding or
: deleting routes, so from a performance standpoint using Patricia
: trees for the routing table makes sense.  Patricia trees provide
: fast lookups at the expense of additional work in adding and
: deleting.  Measurements in [Sklower 1991] comparing the radix
: tree approach to the Net/1 hash table show that the radix tree
: method is about two times faster in building a test tree and
: four times faster in searching.


Cheers,
--=20
Ruslan Ermilov		Sysadmin and DBA,
ru@sunbay.com		Sunbay Software AG,
ru@FreeBSD.org		FreeBSD committer,
+380.652.512.251	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age

--J/dobhs11T7y2rNN
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (FreeBSD)

iD8DBQE9WVAwUkv4P6juNwoRAt9CAJ9FVrSbs+lD7yce5lyRYo0f5RzMagCfRaG7
K0v5v0ICyXrzknwSiPHTTpc=
=7Gn+
-----END PGP SIGNATURE-----

--J/dobhs11T7y2rNN--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020813183008.GA29747>