Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 6 May 2008 09:48:30 +0200
From:      Roman Divacky <rdivacky@FreeBSD.ORG>
To:        hackers@FreeBSD.ORG
Subject:   Re: hashinit versus phashinit
Message-ID:  <20080506074830.GA70848@freebsd.org>
In-Reply-To: <20080506062556.GA68171@zim.MIT.EDU>
References:  <20080505204350.GA45321@freebsd.org> <20080506062556.GA68171@zim.MIT.EDU>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, May 06, 2008 at 02:25:56AM -0400, David Schultz wrote:
> On Mon, May 05, 2008, Roman Divacky wrote:
> > hi
> > 
> > when we want to use a hash table in kernel we call "hashinit" which
> > initializes a hash table with power-of-2 size. There's also "phashinit"
> > that creates hash table of size that is a prime number. This was
> > added in 1995 by davidg@ but it is not used anywhere in the kernel.
> > 
> > phk@ commited rev. 1.30 of vfs_cache.c replacing phashinit with hashinit
> > stating that it's better because it replaces a division with logical and.
> > is this reason still valid today? (the commit was done almost 11 years ago)
> > 
> > is there still any reason why not use the phashinit instead of hashinit?
> > I believe using prime-sized hash table might have positive performance
> > impact...
> 
> There's a tradeoff.
> 
> The argument for using powers of 2 is that division takes many
> times longer than a logical AND.
> 
> The argument for using primes is that if your hash function isn't
> as good as you thought it was, or if the data has some regularity
> you weren't expecting, you can get screwed a lot more easily with
> a power of 2 hash table.  With a prime-sized hash table, you only
> get screwed if lots of your data is congruent modulo the prime,
> which is very rare.
> 
> Most general-purpose hash implementations I've used (e.g., GNU
> libstdc++, Sun JDK, Microsoft .NET) use prime table sizes,
> probably to make it less likely that programmers will shoot
> themselves in the foot with pathological data or bad hash functions.

yes... a division takes roughly 40 cycles on modern i386 hw, while
and is just 1, but does it matter compared to the access times of
memory these days?

the ratio between cpu-speed/mem-speed has changed a lot. I am not
arguing if it's still true that avoiding the division helps the performance
these days...



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080506074830.GA70848>