Date:      Sat, 31 Mar 2007 11:47:12 +0100
From:      Max Laier <max@love2party.net>
To:        freebsd-ipfw@freebsd.org
Cc:        Luigi Rizzo <rizzo@icir.org>, Andre Oppermann <andre@freebsd.org>, Julian Elischer <julian@elischer.org>, FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: IPFW update frequency
Message-ID:  <200703311247.19940.max@love2party.net>
In-Reply-To: <20070331022741.A94927@xorpc.icir.org>
References:  <460D75CE.70804@elischer.org> <460E19EE.3020700@freebsd.org> <20070331022741.A94927@xorpc.icir.org>


On Saturday 31 March 2007 11:27, Luigi Rizzo wrote:
> On Sat, Mar 31, 2007 at 10:21:02AM +0200, Andre Oppermann wrote:
> > Julian Elischer wrote:
> > > Luigi Rizzo wrote:
> > >> On Fri, Mar 30, 2007 at 01:40:46PM -0700, Julian Elischer wrote:
> > >>> I have been looking at the IPFW code recently, especially with
> > >>> respect to locking.
> > >>> There are some things that could be done to improve IPFW's
> > >>> behaviour
>
> ...
>
> > The locking overhead per packet in ipfw is by no means its limiting
>
> i think you and Julian are looking at different issues.
> if i understand julian's comment, the problem is that the list
> is protected by a single lock, so no hope of parallelising

ipfw has been using rwlocks for the static rules for quite some time now.
In contrast to Julian, I don't believe that the claimed lock order reversal
with an rlock() can be the cause of a deadlock (exclusiveness is a
precondition).  Having been involved in the hacks that went in and out of
ipfw and pfil locking over the last few years, and the problems that went
along with them, I'd urge everybody to *not* rush any more hacks into this.
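
For reference, the pattern is roughly the following - a minimal userland
sketch using pthreads, *not* the actual ipfw code, all names made up:

    #include <pthread.h>
    #include <stddef.h>

    /* Hypothetical rule list protected by a reader/writer lock. */
    struct rule {
        struct rule *next;
        int (*match)(const void *pkt);
        int action;
    };

    static pthread_rwlock_t chain_lock = PTHREAD_RWLOCK_INITIALIZER;
    static struct rule *chain_head;

    /* Packet path: any number of forwarding threads may hold the
       read lock at the same time. */
    int
    check_packet(const void *pkt)
    {
        int action = 0;    /* default action, real matching elided */

        pthread_rwlock_rdlock(&chain_lock);
        for (struct rule *r = chain_head; r != NULL; r = r->next) {
            if (r->match(pkt)) {
                action = r->action;
                break;
            }
        }
        pthread_rwlock_unlock(&chain_lock);
        return (action);
    }

    /* Rule changes take the lock exclusively and stall the packet
       path only for the duration of the list manipulation. */
    void
    install_rule(struct rule *r)
    {
        pthread_rwlock_wrlock(&chain_lock);
        r->next = chain_head;
        chain_head = r;
        pthread_rwlock_unlock(&chain_lock);
    }

The point being: the packet path is read-mostly and does not serialize
against itself, only against rule updates.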

> the work, and if one kernel thread is busy processing a packet
> in the filter, others might be blocked for a long time
> (in your case, the set of 30 rules is 765ns for ipfw and 1198ns
> for pf).
>
> Your tests presumably have little if any contention on the lock.

Most likely none at all, since the forwarding path takes care of
serialization.

> Specifically, if you compute the difference of the inverses
> of those pps rates you see the following:
>
> 	+pfil_pass	45.3 ns	per packet
>
> 	+ipfw_allow	+253.4 ns/packet (setup and first rule)
> 	+ipfw_30	+17.67 ns/(packet * extra rule)
>
> 	+pf_pass	+376.9 ns/packet (setup and first rule)
> 	+pf_30		+28.34 ns/(packet * extra rule)
>
>
> the lock acquisition cost is in the 'setup' part but i cannot tell
> how expensive it is.
> Julian's suggested change (and surely the one i described)
> replaces the lock/unlock pair on the rule list with a refcount add/dec
> pair (with uncontested locks the cost should be similar), but
> especially makes the operation non-blocking allowing running the input
> and output paths in parallel.
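
(For those trying to reproduce the arithmetic: it really is just the
difference of the inverse packet rates.  With made-up round numbers: plain
forwarding at 1,000,000 pps is 1/1,000,000 s = 1000 ns per packet;
forwarding plus a one-rule ipfw at 800,000 pps is 1250 ns per packet; so
setup plus first rule adds about 250 ns per packet.  The per-rule slope
falls out the same way from the 30-rule run.)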

See above, ipfw is working in parallel already.  In addition to that,
using a ref-count would be worse!  Instead of two atomic operations you'd
then have to pay for four: lock, ref, unlock ... work ... lock, unref,
unlock - all of which can contend with each other.  This will most likely
cause more serialization than we currently have.  Again, please don't rush
any hacks!
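
To spell out where the four operations come from - again a hypothetical
userland sketch with made-up names, not a patch:

    #include <pthread.h>

    /* Hypothetical refcounted snapshot of the rule chain. */
    struct chain {
        int refcount;
        /* rules ... */
    };

    static struct chain the_chain;          /* stand-in for the real list */
    static pthread_mutex_t chain_mtx = PTHREAD_MUTEX_INITIALIZER;
    static struct chain *active_chain = &the_chain;

    static struct chain *
    chain_hold(void)
    {
        struct chain *c;

        pthread_mutex_lock(&chain_mtx);     /* 1: lock   */
        c = active_chain;
        c->refcount++;                      /*    ref    */
        pthread_mutex_unlock(&chain_mtx);   /* 2: unlock */
        return (c);
    }

    static void
    chain_drop(struct chain *c)
    {
        pthread_mutex_lock(&chain_mtx);     /* 3: lock   */
        c->refcount--;                      /*    unref  */
        pthread_mutex_unlock(&chain_mtx);   /* 4: unlock */
    }

    /* Per packet: chain_hold(); walk the rules; chain_drop() - i.e.
       two lock/unlock round trips on the same cache line, versus a
       single rdlock/runlock pair with the rwlock we have today. */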

> > factor.  Actually it's a very small part and pretty much any work on
> > it is lost love.  It would be much better spent time to optimize the
> > main rule loop of ipfw to speed things up.  I was profiling ipfw
> > early last year with an Agilent packet generator and hwpmc.  In the
> > meantime the packet forwarding path (w/o ipfw) has been improved but
> > relative to each other the number are still correct.
>
> actually your numbers show that at least the rule setup (and the
> processing of simple rules) is significantly faster (50% or so) in
> ipfw2 than in pf.

Note that pf includes a plethora of sanity checks in the default rule
processing.  Also note that pf - due to its stateful design - does
a "check state" lookup first for every packet.  This gives it a big malus
in this particular test.
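
Roughly, the stateful flow is (heavily simplified sketch, not pf's actual
code, stubs only there to make it self-contained):

    #include <stddef.h>

    struct state { int dummy; };    /* an established-connection entry */

    /* Stub: the real thing hashes the flow's addresses and ports. */
    static struct state *
    state_lookup(const void *pkt)
    {
        (void)pkt;
        return (NULL);
    }

    /* Stub: the real thing walks the whole loaded ruleset. */
    static int
    run_ruleset(const void *pkt)
    {
        (void)pkt;
        return (0);
    }

    int
    filter_packet(const void *pkt)
    {
        /* The state lookup is paid on every single packet ... */
        struct state *s = state_lookup(pkt);

        if (s != NULL)
            return (0);             /* existing state: pass */

        /* ... and only on a miss do we do the full rule evaluation,
           which is what a stateless benchmark ends up measuring. */
        return (run_ruleset(pkt));
    }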

> I know that the setup time is expensive, but i am not sure that
> one can save much - in both cases, you need to fetch a lot
> of information, which is scattered in variable locations in
> the mbuf and packet headers.

Agreed.  For the ipfw case it *might* make sense to reach into the upper
layers only if requested - not at all sure about that, however.
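
Something along these lines - a rough sketch, all names invented, and
whether computing the flag at rule-install time pays off is exactly the
part I'm not sure about:

    #include <stdint.h>

    /* Hypothetical flag, recomputed whenever the ruleset changes:
       does any rule actually look at ports or TCP flags? */
    static int ruleset_needs_l4;

    struct pkt_info {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        int have_l4;
    };

    /* Stub for the potentially cache-missing walk into the mbuf. */
    static void
    extract_l4(const void *pkt, struct pkt_info *pi)
    {
        (void)pkt;
        pi->src_port = 0;           /* real header parsing elided */
        pi->dst_port = 0;
        pi->have_l4 = 1;
    }

    void
    classify(const void *pkt, struct pkt_info *pi)
    {
        /* addresses, interface, etc. are always needed (elided) */

        /* Only reach into the upper layers if some rule asked for it. */
        if (ruleset_needs_l4)
            extract_l4(pkt, pi);
        else
            pi->have_l4 = 0;
    }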

-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
