Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 6 Dec 2006 01:29:31 -0800
From:      Luigi Rizzo <rizzo@icir.org>
To:        Max Laier <max@love2party.net>
Cc:        freebsd-ipfw@freebsd.org
Subject:   Re: Better "hash_packet6"
Message-ID:  <20061206012931.A56288@xorpc.icir.org>
In-Reply-To: <200612060451.58473.max@love2party.net>; from max@love2party.net on Wed, Dec 06, 2006 at 04:51:51AM %2B0100
References:  <200612052010.36789.max@love2party.net> <20061205161744.A48319@xorpc.icir.org> <200612060451.58473.max@love2party.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Dec 06, 2006 at 04:51:51AM +0100, Max Laier wrote:
> On Wednesday 06 December 2006 01:17, Luigi Rizzo wrote:
...
> > First, this proposal, with 36 multiplies and one division, the
> > function seems rather expensive for e.g. a low end cpu (arm or
> > soekris) as you might find on network-appliance boxes.
> > Any chance to get performance numbers on that hardware ?
> 
> I tried the reference machines (see hacked up attachment):
> 78x ia64
> 40x amd64
> 60x p3
> 16x p4

i assume the first number is the slowdown between the current and
the proposed method ?
I have slightly modified/extended the program adding the hsieh hash
that i mentioned below, and made it easy to add more methods. the
code is at

	http://info.iet.unipi.it/~luigi/hc.c

as expected, especially the simplest algorithms depend a lot
on cache effects so if you change the number of packets or the number
of loops things vary a lot. In any case, on a soekris (the default
parameters are too high):

	# ./hc 1000 100
	starting algorithm hash_pkt5 1000 loops 100 packets
	took 87862 usec, 0.878620 per cycle
	starting algorithm hash_hsieh 1000 loops 100 packets
	took 1082883 usec, 10.828830 per cycle
	starting algorithm hash_pkt6 1000 loops 100 packets
	took 2697178 usec, 26.971780 per cycle
	# ./hc 100 10000
	starting algorithm hash_pkt5 100 loops 10000 packets
	took 2023610 usec, 2.023610 per cycle
	starting algorithm hash_hsieh 100 loops 10000 packets
	took 11619238 usec, 11.619238 per cycle
	starting algorithm hash_pkt6 100 loops 10000 packets
	took 27739595 usec, 27.739595 per cycle
(here we probably overflow some cache so the simple algorithm
suffers a lot by increasing the number of different packets)

on my new 3 GHz pentium D

	> ./hc 1000 100
	starting algorithm hash_pkt5 1000 loops 100 packets
	took 1258 usec, 0.012580 per cycle
	starting algorithm hash_hsieh 1000 loops 100 packets
	took 16152 usec, 0.161520 per cycle
	starting algorithm hash_pkt6 1000 loops 100 packets
	took 25485 usec, 0.254850 per cycle
	> ./hc 100 10000
	starting algorithm hash_pkt5 100 loops 10000 packets
	took 12870 usec, 0.012870 per cycle
	starting algorithm hash_hsieh 100 loops 10000 packets
	took 162510 usec, 0.162510 per cycle
	starting algorithm hash_pkt6 100 loops 10000 packets
	took 248003 usec, 0.248003 per cycle

Surely we need to experiment a bit more, but the cost
is significant especially on low end boxes.
Maybe we could restrict the hash to just a part of the address ?

	cheers
	luigi



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061206012931.A56288>