Date:      Tue, 16 Mar 2004 12:31:42 +0100 (CET)
From:      Harti Brandt <brandt@fokus.fraunhofer.de>
To:        Brooks Davis <brooks@one-eyed-alien.net>
Cc:        Max Laier <max@love2party.net>
Subject:   Re: Byte counters reset at ~4GB
Message-ID:  <20040316122840.E28777@beagle.fokus.fraunhofer.de>
In-Reply-To: <20040316014206.GA12382@Odin.AC.HMC.Edu>
References:  <2650.192.168.0.200.1079393908.squirrel@192.168.0.1> <2662.192.168.0.200.1079396323.squirrel@192.168.0.1> <2697.192.168.0.200.1079398101.squirrel@192.168.0.1> <20040316014206.GA12382@Odin.AC.HMC.Edu>

On Mon, 15 Mar 2004, Brooks Davis wrote:

BD>On Mon, Mar 15, 2004 at 07:48:21PM -0500, Mike Jakubik wrote:
BD>> Max Laier said:
BD>>
BD>> > Sure, you measure it ;) ... no, of course it is more expensive to update
BD>> > a 64bit counter on a 32bit arch, but the key (once again) is decision:
BD>> > While (almost) all of the pf counters are 64bit types, you can configure
BD>> > it not to use the loginterface or anything else. So it's up to you: You
BD>> > need 64bit counters? You shall have them! You need *fast* 64bit
BD>> > counters? AMD sells nice processors (they say)! ... you get the idea.
BD>>
BD>> Got it. I'm just curious though... realistically, how big of an impact on
BD>> performance is this on a modern CPU? Is it not simply the original 32bit
BD>> calculation x 2?
BD>
BD>No, you have to do overflow handling so that adds some to the cost.
BD>
BD>I was curious what the actual overhead was so I ran the following
BD>program with both uint32_t and uint64_t counters.  With 64-bit counters,
BD>it was a bit over four times slower on a dual 2.2GHz Xeon (~2sec vs
BD>~8.4sec).  On a dual Opteron, the 32-bit math had a slight edge, but
BD>not much.  Interestingly, runtime was longer than on the Xeon (~3.1s for
BD>32-bit and ~3.8s for 64-bit.)
BD>
BD>If you do this test, be sure not to use any optimizer flags or the whole
BD>loop gets optimized out.
BD>
BD>-- Brooks
BD>
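BD>#include <inttypes.h>
BD>#include <stdint.h>
BD>#include <stdio.h>
BD>
BD>int
BD>main (int argc, char **argv)
BD>{
BD>	/* Change to uint64_t (and PRIu64 below) for the 64-bit run. */
BD>	uint32_t j = 0;
BD>
BD>	for (j = 0; j < 1000000000; j++) {}
BD>	printf("%" PRIu32 "\n", j);
BD>	return 0;
BD>}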

Isn't the actual problem the required atomicity? While on 32-bit
architectures you can increment a 32-bit value without taking a lock,
you need a lock to increment a 64-bit value.

harti