Date:      Tue, 08 Jan 2008 23:05:12 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        "Li, Qing" <qing.li@bluecoat.com>
Cc:        Qing Li <qingli@freebsd.org>, FreeBSD Net <freebsd-net@freebsd.org>, arch@freebsd.org, Ivo Vachkov <ivo.vachkov@gmail.com>, Robert Watson <rwatson@freebsd.org>, Vadim Goncharov <vadimnuclight@tpu.ru>, "Bruce M. Simpson" <bms@freebsd.org>, Julian Elischer <julian@elischer.org>
Subject:   Re: resend: multiple routing table roadmap (format fix)
Message-ID:  <4783F398.801@freebsd.org>
In-Reply-To: <305C539CA2F86249BF51CDCE8996AFF4096E12A7@bcs-mail2.internal.cacheflow.com>
References:  <4772F123.5030303@elischer.org>	<f85d6aa70712261728h331eadb8p205d350dc7fb7f4c@mail.gmail.com>	<477416CC.4090906@elischer.org>	<opt4c0imk24fjv08@nuclight.avtf.net>	<477D2EF3.2060909@elischer.org>	<opt4g4kcis17d6mn@nuclight.avtf.net>	<4780E5E7.2070202@FreeBSD.org><4781197F.1000105@elischer.org><opt4i0rlz317d6mn@nuclight.avtf.net>	<47814AF0.9070509@freebsd.org> <opt4mizese4fjv08@nuclight.avtf.net> <305C539CA2F86249BF51CDCE8996AFF4096E12A7@bcs-mail2.internal.cacheflow.com>

Li, Qing wrote:
>> To remove an ARP entry for host A.B.C.D in an L2 table of the form
>> (A.B.C.D -> 00:01:02:03:04:05), it is enough to do a (usual speed)
>> routing lookup for host A.B.C.D and set one pointer in
>> its rtentry to NULL, or remove the rtentry (if it is
>> implemented as cloned). Thus, when a routing lookup is done on
>> regular forwarding (table read), we already have FAST
>> access - one pointer dereference - to its L2 table entry,
>> be it ARP or any other L2 type (support for which becomes easy
>> once L2 and L3 are separated). And on every modification of the
>> L2 table - which is RARE - do a lookup at the usual speed to
>> modify the cached pointer. Compare this with a scheme where
>> EVERY forwarded packet needs a DOUBLE lookup -
>> after the routing one, another in the L2 table.
>>
> 
>   Is it really a double lookup though?
> 
>   With the current routing table that contains the ARP entries,
>   a search has to proceed past the interface route further down
>   the routing tree, and the depth depends on the number of ARP 
>   entries in the table.
> 
>   With L2/L3 separation, the routing search stops at the interface
>   route, and further search for the exact entry continues
>   in a separate L2 table.
> 
>   From a high level it does seem there could be performance
>   issues such as cache invalidation problems; however, I cannot
>   quantify at this point what that degradation translates into,
>   and what impact it has on the overall scheme of things.
>   I am not sure if anyone can quantify such a performance question
>   at this point.

No.  We have to profile the new implementation together with
the appropriate locking changes.
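
For reference, the two lookup shapes being compared can be sketched
roughly as follows. This is a userspace illustration only: every name in
it (struct route, struct l2_entry, resolve_cached, resolve_split, ...) is
hypothetical, a linear scan stands in for the radix tree, and there is no
locking at all. It is not the kernel's rtentry or ARP code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical L2 (link-layer) entry, e.g. the result of ARP. */
struct l2_entry {
	uint32_t ip;		/* host address, host byte order */
	uint8_t  mac[6];
};

/* Hypothetical L3 route entry; not the real struct rtentry. */
struct route {
	uint32_t prefix;
	uint32_t mask;
	struct l2_entry *l2_cache;	/* combined scheme: cached pointer or NULL */
};

/* Hypothetical separate L2 table: one slot per bucket, for brevity. */
#define L2_BUCKETS 64
struct l2_table {
	struct l2_entry *bucket[L2_BUCKETS];
};

static void
l2_table_init(struct l2_table *t)
{
	memset(t, 0, sizeof(*t));
}

static void
l2_insert(struct l2_table *t, struct l2_entry *e)
{
	t->bucket[e->ip % L2_BUCKETS] = e;
}

static struct l2_entry *
l2_lookup(const struct l2_table *t, uint32_t ip)
{
	struct l2_entry *e = t->bucket[ip % L2_BUCKETS];

	return ((e != NULL && e->ip == ip) ? e : NULL);
}

/* Longest-prefix match by linear scan (stand-in for the radix tree). */
static struct route *
route_lookup(struct route *tbl, size_t n, uint32_t dst)
{
	struct route *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if ((dst & tbl[i].mask) == tbl[i].prefix &&
		    (best == NULL || tbl[i].mask > best->mask))
			best = &tbl[i];
	}
	return (best);
}

/* Combined scheme: one lookup down to the host route, then one dereference. */
static const uint8_t *
resolve_cached(struct route *tbl, size_t n, uint32_t dst)
{
	struct route *rt = route_lookup(tbl, n, dst);

	if (rt == NULL || rt->l2_cache == NULL)
		return (NULL);
	return (rt->l2_cache->mac);
}

/* Split scheme: lookup stops at the interface route, then a second L2 lookup. */
static const uint8_t *
resolve_split(struct route *tbl, size_t n, const struct l2_table *l2,
    uint32_t dst)
{
	struct l2_entry *e;

	if (route_lookup(tbl, n, dst) == NULL)
		return (NULL);
	e = l2_lookup(l2, dst);
	return (e != NULL ? e->mac : NULL);
}
```

In the combined shape the route table carries one host entry per
neighbour, so the first search goes deeper; in the split shape the first
search is shallower but a second structure is touched. Which one wins is
exactly what the profiling has to show.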

>> The current routing table implementation, with all the
>> disadvantages of combining L2 and L3, gets one HUGE benefit
>> from that same combining - performance.
>> And never, ever, ever, ever even try to split L2 from L3 if it
>> means losing that performance - in that case it should never
>> be split, despite all the disadvantages, or you'll become an
>> enemy of many, many users. Especially since caching allows
>> things to be done reasonably fast.
>>
> 
>    No disagreement here.

We have to consider two aspects here:

  1. the locking changes (for example switching to rmlocks which
     are way less expensive than even normal rwlocks or mutexes)
     *may* compensate for the additional table lookup.

  2. architectural benefits from a clear and strict layering that
     help us to easily maintain and develop the code in the future
     *provided* the performance impact is only very small.  Having
     a clean architecture is well worth maybe one to three percent
     performance in the mid and long term IMHO.

People with the ultimate need for speed have to maintain their own
trees anyway (Bluecoat, Juniper, Sandvine, Isilon,...) and can afford
to cut some more corners.  If one is tuning a machine for a
very particular purpose one can tightly glue layers together without
having to take care of the general purpose principles of a generic
operating system such as stock FreeBSD.

I'm all for squeezing out the last bit of performance in stock FreeBSD,
however not at the expense of a clean system architecture.  Almost all
attempts to cut those corners have bitten us badly after only a few
moons when underlying hardware realities change (see P-IV
Netburst assumptions vs. current Core2/AMD64 reality; nobody really
cares about Netburst and its horrible locking overhead anymore).

-- 
Andre




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4783F398.801>