From owner-cvs-src@FreeBSD.ORG  Sat Nov 15 02:35:51 2003
Return-Path: <owner-cvs-src@FreeBSD.ORG>
Delivered-To: cvs-src@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D822F16A4CF
	for <cvs-src@FreeBSD.org>; Sat, 15 Nov 2003 02:35:50 -0800 (PST)
Received: from mailtoaster1.pipeline.ch (mailtoaster1.pipeline.ch
	[62.48.0.70])	by mx1.FreeBSD.org (Postfix) with ESMTP id B0F9843FEC
	for <cvs-src@FreeBSD.org>; Sat, 15 Nov 2003 02:35:47 -0800 (PST)
	(envelope-from oppermann@pipeline.ch)
Received: (qmail 85281 invoked from network); 15 Nov 2003 10:38:42 -0000
Received: from unknown (HELO pipeline.ch) ([62.48.0.54])
          (envelope-sender <oppermann@pipeline.ch>)
          by mailtoaster1.pipeline.ch (qmail-ldap-1.03) with SMTP
          for <rizzo@icir.org>; 15 Nov 2003 10:38:42 -0000
Message-ID: <3FB60181.4256A519@pipeline.ch>
Date: Sat, 15 Nov 2003 11:35:45 +0100
From: Andre Oppermann <oppermann@pipeline.ch>
X-Mailer: Mozilla 4.76 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Luigi Rizzo <rizzo@icir.org>
References: <200311142102.hAEL2Nen073186@repoman.freebsd.org>
	<20031114153145.A54064@xorpc.icir.org> <3FB593F5.1053E7E2@pipeline.ch>
	<20031115002921.B68056@xorpc.icir.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: cvs-src@FreeBSD.org
cc: src-committers@FreeBSD.org
cc: cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/netinet in_var.h ip_fastfwd.c ip_flow.c 
 ip_flow.h ip_input.c ip_output.c src/sys/sys mbuf.h src/sys/conf files 
 src/sys/net if_arcsubr.c if_ef.c if_ethersubr.c if_fddisubr.c 
 if_iso88025subr.c if_ppp.c
X-BeenThere: cvs-src@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: CVS commit messages for the src tree <cvs-src.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-src>
List-Post: <mailto:cvs-src@freebsd.org>
List-Help: <mailto:cvs-src-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Nov 2003 10:35:51 -0000

Luigi Rizzo wrote:
> 
> [mu
> On Sat, Nov 15, 2003 at 03:48:21AM +0100, Andre Oppermann wrote:
> > Luigi Rizzo wrote:
> ...
> > > Given that there are large segments of common code between
> > > ip_fastforward() and ip_input()/ip_output() (i am thinking of the
> > > entire ipfw handling code, for one, and also some basic integrity
> > > checks, the fragmentation code, etc.) I also wonder if it wouldn't
> > > be beneficial to put the optimizations into the standard path rather
> > > than create a new (partial) replica of the same code, with the
> > > potential of introducing bugs, and with some substantial I-cache
> > > pollution which might well destroy the benefits of minor optimizations.
> >
> > I don't see much cache pollution here. Normally you use ip_fastforward
> 
> i said I-cache, not data cache. Even a router does some substantial
> amount of local communication (bgp and routing processes etc.) so

Here on my CORE2 router (4.8-REL) with two full and 130 peering BGP4
feeds I see about three to four route changes per second which make
it to the kernel (route -nv monitor). Out of 303'394'603 packets
44'721'091 were for itself. But that is probably bogus since the
counters are only 32-bit (and the machine has an uptime of 63 days).
So the overall packet count has wrapped at least once (if not more)
and is more likely to be 4'598'361'899 (2^32 more). So about 1 percent of all
packets (as I said it's probably even less because it has wrapped
more than that) are for the machine itself. All other packets use the fast
path. To put this more into perspective wrt counter wrapping, on
my interfaces I have a byte counter wrap every 40 minutes or so.
So the true ratio is probably even far less than one percent and
more in the region of one per mille. The wrapping looks really ugly
on MRTG and RRDtool graphs. Interface counters should be 64-bit or
they become useless with today's traffic levels...
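
A rough sketch of the counter arithmetic above, for anyone who wants to
check it. The function names are made up for illustration, the number of
wraps is a guess (one is only the minimum), and it assumes the local
packet counter has not wrapped itself. With a single wrap the total is
303'394'603 + 2^32, which puts the local share just under one percent;
every further wrap pushes it lower.

```python
WRAP = 2 ** 32  # a 32-bit counter returns to zero after this many events


def true_count(observed, wraps):
    """Reconstruct a total from a 32-bit reading that wrapped `wraps` times."""
    return observed + wraps * WRAP


def local_share(local, observed_total, wraps):
    """Fraction of packets destined for the router itself.

    Assumes `local` has not wrapped; `wraps` is an estimate, not a
    measured value.
    """
    return local / true_count(observed_total, wraps)


def wrap_safe_delta(prev, curr):
    """Difference between two readings, correct across at most one wrap."""
    return (curr - prev) % WRAP


def avg_rate_bits_per_s(wrap_interval_s):
    """Average bit rate implied by a 32-bit byte counter wrapping every
    `wrap_interval_s` seconds."""
    return WRAP * 8 / wrap_interval_s


if __name__ == "__main__":
    for wraps in (0, 1, 10):
        share = local_share(44_721_091, 303_394_603, wraps)
        print(f"wraps={wraps:2d}  local share={share:.4%}")
    # A byte counter wrapping every 40 minutes:
    print(f"{avg_rate_bits_per_s(40 * 60) / 1e6:.1f} Mbit/s average")
```

The modulo trick in wrap_safe_delta() is what a grapher has to do between
two polls, and it only works if the counter wraps at most once per
polling interval. A 64-bit counter removes that constraint for all
practical purposes.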

> i am pretty sure that in any non-trivial case you will end up having
> both the slow path and the fast path conflicting for the instruction
> cache. Merging them might help -- i have seen many cases where
> inlining code as opposed to explicit function calls makes things
> slower for this precise reason.

I will try to measure that with more precision. You did have
code which was able to record and timestamp events several
thousand times per second. Do you still have that code somewhere?

> > > Minor comments on the code:
> > >
> > >   + one of the initial comments in the new code states
> > >
> > >         ... The only part of the packet we touch with the CPU is the
> > >         IP header. ...
> > >
> > >     this is not true if you use ipfw because that code touches many
> > >     places in the packet (and can also do some expensive computation
> > >     like trying to locate the uid/gid of a packet; the fact that we
> > >     only deal with packets not for us does not prevent the existence
> > >     of such firewall rules).
> >
> > Well, as I said, everybody is free to shoot himself with such highly
> > complex firewall rules. I'd say the ipfw code could be optimized with
> > some of the ideas I've specified earlier. I don't think the ipfw code
> > would do a uid/gid lookup if neither the destination nor source is
> 
> i was just saying that the comment is untrue.

Ok, I will modify it to say something like "as long as firewalling is
not going through the whole packet" or so.

> > >   + could you clarify the divert logic ? I am a bit rusty with that
> > >     part of the code, but i am under the impression that in
> > >     ip_fastforward() you are passing along args.divert_rule and
> > >     losing track of divert_info which is instead what you need too.
> >
> > It's not you being rusty, the code is indeed hard to follow. :-/
> > divert_info is used for ip packet reassembly. ip_divert() is then
> > just using it to determine whether the packet was caught on the
> > way into the machine or out of it. It seems to have its largest
> > significance for the ip reassembly. I've tested that too with an
> > earlier version of my code. However I will redo those tests to be
> > sure it is working as expected.
> 
> ok, the specific case where i think it fails is when you divert a
> fragmented packet -- your code seems to store the divert_info
> (the port you divert to) into divert_rule, and lose track of
> the former.

It looks like I need both of them...

-- 
Andre