From owner-freebsd-hackers@FreeBSD.ORG Tue May 6 13:05:04 2008 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 65568106564A for ; Tue, 6 May 2008 13:05:04 +0000 (UTC) (envelope-from max@love2party.net) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.179]) by mx1.freebsd.org (Postfix) with ESMTP id 060788FC21 for ; Tue, 6 May 2008 13:05:03 +0000 (UTC) (envelope-from max@love2party.net) Received: from vampire.homelinux.org (dslb-088-064-180-001.pools.arcor-ip.net [88.64.180.1]) by mrelayeu.kundenserver.de (node=mrelayeu6) with ESMTP (Nemesis) id 0ML29c-1JtMqp30M0-0004vB; Tue, 06 May 2008 15:05:03 +0200 Received: (qmail 57580 invoked from network); 6 May 2008 13:03:30 -0000 Received: from myhost.laiers.local (192.168.4.151) by mx.laiers.local with SMTP; 6 May 2008 13:03:30 -0000 From: Max Laier Organization: FreeBSD To: freebsd-hackers@freebsd.org Date: Tue, 6 May 2008 15:00:33 +0200 User-Agent: KMail/1.9.9 References: <20080505204350.GA45321@freebsd.org> <20080506062556.GA68171@zim.MIT.EDU> <20080506074830.GA70848@freebsd.org> In-Reply-To: <20080506074830.GA70848@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200805061500.33538.max@love2party.net> X-Provags-ID: V01U2FsdGVkX19shv+QfHaHLfQr8R6EIVIGb5WYve/mk179qQZ a3rx0AzmV2Rz/gAaFHmWhjUIDapZWcWXIE6bnAUBkfCDVGNK0/ 3PWkS0JcYOOMsJEUgardg== Cc: Roman Divacky Subject: Re: hashinit versus phashinit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 May 2008 13:05:04 -0000 On Tuesday 06 May 2008 09:48:30 Roman Divacky wrote: > On Tue, May 06, 2008 at 02:25:56AM -0400, David Schultz wrote: > > On Mon, May 05, 2008, Roman Divacky wrote: > > > hi > > > > > > when we want to use a hash table in kernel we call "hashinit" which > > > initializes a hash table with power-of-2 size. There's also > > > "phashinit" that creates hash table of size that is a prime number. > > > This was added in 1995 by davidg@ but it is not used anywhere in > > > the kernel. > > > > > > phk@ commited rev. 1.30 of vfs_cache.c replacing phashinit with > > > hashinit stating that it's better because it replaces a division > > > with logical and. is this reason still valid today? (the commit was > > > done almost 11 years ago) > > > > > > is there still any reason why not use the phashinit instead of > > > hashinit? I believe using prime-sized hash table might have > > > positive performance impact... > > > > There's a tradeoff. > > > > The argument for using powers of 2 is that division takes many > > times longer than a logical AND. > > > > The argument for using primes is that if your hash function isn't > > as good as you thought it was, or if the data has some regularity > > you weren't expecting, you can get screwed a lot more easily with > > a power of 2 hash table. With a prime-sized hash table, you only > > get screwed if lots of your data is congruent modulo the prime, > > which is very rare. > > > > Most general-purpose hash implementations I've used (e.g., GNU > > libstdc++, Sun JDK, Microsoft .NET) use prime table sizes, > > probably to make it less likely that programmers will shoot > > themselves in the foot with pathological data or bad hash functions. > > yes... a division takes roughly 40 cycles on modern i386 hw, while > and is just 1, but does it matter compared to the access times of > memory these days? > > the ratio between cpu-speed/mem-speed has changed a lot. I am not > arguing if it's still true that avoiding the division helps the > performance these days... requests/s * div_overhead - [avoided_]collisions/s * collision_overhead ~= -([avoided_]collisions/requests * collision_overhead) assuming the collision_overhead (requiring memory operations) greatly dominates the div_overhead. So if there is a high collision rate and you can reasonably assert that you will lower that significantly by using a prime sized hash table, the div_overhead doesn't matter. At least that's what I've come up with off the top of my head now. -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News