Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 19 Jan 2009 15:04:02 -0500
From:      David Schultz <das@FreeBSD.ORG>
To:        d@delphij.net
Cc:        freebsd-arch@FreeBSD.ORG
Subject:   Re: RFC: MI strlen()
Message-ID:  <20090119200402.GA26878@zim.MIT.EDU>
In-Reply-To: <4966B5D4.7040709@delphij.net>
References:  <4966B5D4.7040709@delphij.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Jan 08, 2009, Xin LI wrote:
> Here is a new implementation of strlen() which employed the bitmask
> skill in order to achieve better performance on modern hardware.  For
> common case, this would be a 5.2x boost on FreeBSD/amd64.  The code is
> intended for MI use when there is no hand-optimized assembly.

I ran some microbenchmarks on amd64, which show that the version
of strlen() in libc is up to twice as fast as yours for short
strings (< 4 bytes), but your implementation is nearly 5 times as
fast for longer strings.

As Bruce pointed out, gcc will almost use its builtin
strlen(). However, that may change in the future, and nobody has
suggested that your version would actually hurt anything, so I
think you should commit it.

Benchmark results:
	  http://www.freebsd.org/~das/strlen.gif

I ran this on a Wolfdale core using word-aligned ASCII strings and
an adaptive number of iterations. As you can see, the gcc builtin
is always slower than your code, but faster than our current libc
implementation. I can't explain why the builtin is faster for
strings of length 10 than it is for strings of length 1, but the
results are repeatable. Another interesting thing to note is that
your implementation is the only one that gets less throughput when
the string no longer fits in the L2 cache. This suggests that
either the other two are so slow that they can't use the full
memory bandwidth, or they are more effective at triggering the
CPU's prefetch heuristics.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090119200402.GA26878>