Date: Wed, 6 Dec 2006 01:29:31 -0800 From: Luigi Rizzo <rizzo@icir.org> To: Max Laier <max@love2party.net> Cc: freebsd-ipfw@freebsd.org Subject: Re: Better "hash_packet6" Message-ID: <20061206012931.A56288@xorpc.icir.org> In-Reply-To: <200612060451.58473.max@love2party.net>; from max@love2party.net on Wed, Dec 06, 2006 at 04:51:51AM %2B0100 References: <200612052010.36789.max@love2party.net> <20061205161744.A48319@xorpc.icir.org> <200612060451.58473.max@love2party.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Dec 06, 2006 at 04:51:51AM +0100, Max Laier wrote: > On Wednesday 06 December 2006 01:17, Luigi Rizzo wrote: ... > > First, this proposal, with 36 multiplies and one division, the > > function seems rather expensive for e.g. a low end cpu (arm or > > soekris) as you might find on network-appliance boxes. > > Any chance to get performance numbers on that hardware ? > > I tried the reference machines (see hacked up attachment): > 78x ia64 > 40x amd64 > 60x p3 > 16x p4 i assume the first number is the slowdown between the current and the proposed method ? I have slightly modified/extended the program adding the hsieh hash that i mentioned below, and made it easy to add more methods. the code is at http://info.iet.unipi.it/~luigi/hc.c as expected, especially the simplest algorithms depend a lot on cache effects so if you change the number of packets or the number of loops things vary a lot. In any case, on a soekris (the default parameters are too high): # ./hc 1000 100 starting algorithm hash_pkt5 1000 loops 100 packets took 87862 usec, 0.878620 per cycle starting algorithm hash_hsieh 1000 loops 100 packets took 1082883 usec, 10.828830 per cycle starting algorithm hash_pkt6 1000 loops 100 packets took 2697178 usec, 26.971780 per cycle # ./hc 100 10000 starting algorithm hash_pkt5 100 loops 10000 packets took 2023610 usec, 2.023610 per cycle starting algorithm hash_hsieh 100 loops 10000 packets took 11619238 usec, 11.619238 per cycle starting algorithm hash_pkt6 100 loops 10000 packets took 27739595 usec, 27.739595 per cycle (here we probably overflow some cache so the simple algorithm suffers a lot by increasing the number of different packets) on my new 3 GHz pentium D > ./hc 1000 100 starting algorithm hash_pkt5 1000 loops 100 packets took 1258 usec, 0.012580 per cycle starting algorithm hash_hsieh 1000 loops 100 packets took 16152 usec, 0.161520 per cycle starting algorithm hash_pkt6 1000 loops 100 packets took 25485 usec, 0.254850 per cycle > ./hc 100 10000 starting algorithm hash_pkt5 100 loops 10000 packets took 12870 usec, 0.012870 per cycle starting algorithm hash_hsieh 100 loops 10000 packets took 162510 usec, 0.162510 per cycle starting algorithm hash_pkt6 100 loops 10000 packets took 248003 usec, 0.248003 per cycle Surely we need to experiment a bit more, but the cost is significant especially on low end boxes. Maybe we could restrict the hash to just a part of the address ? cheers luigi
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061206012931.A56288>