Date:      Fri, 01 Oct 2010 14:46:53 -0700
From:      Xin LI <delphij@delphij.net>
To:        Roman Divacky <rdivacky@FreeBSD.ORG>
Cc:        Jilles Tjoelker <jilles@FreeBSD.ORG>, svn-src-head@FreeBSD.ORG, svn-src-all@FreeBSD.ORG, src-committers@FreeBSD.ORG
Subject:   Re: svn commit: r213326 - head/lib/libc/i386/string
Message-ID:  <4CA656CD.40908@delphij.net>
In-Reply-To: <20101001132233.GA83116@freebsd.org>
References:  <201010011310.o91DABUU007534@svn.freebsd.org> <20101001132233.GA83116@freebsd.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 2010/10/01 06:22, Roman Divacky wrote:
> On Fri, Oct 01, 2010 at 01:10:11PM +0000, Jilles Tjoelker wrote:
>> Author: jilles
>> Date: Fri Oct  1 13:10:11 2010
>> New Revision: 213326
>> URL: http://svn.freebsd.org/changeset/base/213326
>>
>> Log:
>>   libc: Remove the i386 assembler version of strlen(3).
>>   
>>   On anything modern, the C version, which processes a word at a time, is much
>>   faster. The Intel optimization manual explicitly warns against using REP
>>   prefixes with SCAS or CMPS, which is exactly what the assembler version
>>   does.
> 
> there's "rep cmps" in bcmp.S and memcmp.S in both amd64/i386
> 
> they both have C counterparts, no idea how fast those are (they
> are going char by char).

Char-by-char will be slower than word-by-word in both the aligned and
unaligned cases.  Other factors, such as inline expansion, also affect
their speed and require careful tuning.

> does this wisdom apply to those too?

I'm not quite sure about the bcmp() and memcmp() cases, especially when
the two pointers are not aligned the same way (say, (p1 &
(sizeof(word)-1)) != (p2 & (sizeof(word)-1))).  Branching on the
different alignment cases MAY give better performance, BUT it can also
hurt due to the added complexity, so if we want to do it in an MI
(machine-independent) way we will need to benchmark.
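
To make the alignment condition concrete, here is a minimal sketch in C,
not taken from the FreeBSD tree.  The names word_t, same_word_offset(),
bytewise_cmp() and sketch_memcmp() are hypothetical, and the choice of
unsigned long as the word type is an assumption.  It takes the word path
only when both pointers share the same offset within a word, and falls
back to a byte loop otherwise; whether the extra branch pays off is the
thing that would need benchmarking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned long word_t;	/* assumed word type, for illustration */

/* Nonzero when p1 and p2 have the same offset within a word. */
static int
same_word_offset(const void *p1, const void *p2)
{
	const uintptr_t mask = sizeof(word_t) - 1;

	return (((uintptr_t)p1 & mask) == ((uintptr_t)p2 & mask));
}

/* Plain char-by-char comparison, used as fallback and for tails. */
static int
bytewise_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (; n > 0; a++, b++, n--)
		if (*a != *b)
			return (*a - *b);
	return (0);
}

int
sketch_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;
	word_t wa, wb;

	if (same_word_offset(a, b)) {
		/* Byte steps until both pointers reach a word boundary. */
		while (n > 0 && ((uintptr_t)a & (sizeof(word_t) - 1)) != 0) {
			if (*a != *b)
				return (*a - *b);
			a++, b++, n--;
		}
		/* Whole-word steps; memcpy sidesteps aliasing issues. */
		while (n >= sizeof(word_t)) {
			memcpy(&wa, a, sizeof(wa));
			memcpy(&wb, b, sizeof(wb));
			if (wa != wb)
				break;	/* byte loop locates the difference */
			a += sizeof(word_t);
			b += sizeof(word_t);
			n -= sizeof(word_t);
		}
	}
	/* Mismatched offsets, a differing word, or the tail. */
	return (bytewise_cmp(a, b, n));
}
```

The sketch deliberately leaves the mismatched-offset case on the slow
path; handling it with shifted word loads is where the added complexity
mentioned above would come in.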

===

By the way, I have a memchr(3) implementation that uses an algorithm
similar to the one strlen(3) uses.  A microbenchmark shows a 2x to 3x
improvement, but it's still in my queue and needs real-world testing.
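
For readers unfamiliar with the technique, the following is a rough
sketch of how a word-at-a-time memchr can work.  It is not the
implementation mentioned above, which is not shown in this message; the
names sketch_memchr, ONES, HIGHS and HASZERO are hypothetical, and
unsigned long as the word type is an assumption.  The idea is to XOR
each word with the target byte repeated in every position, so a match
becomes a zero byte, and then apply the usual "does this word contain a
zero byte" test.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned long word_t;	/* assumed word type, for illustration */

#define ONES	((word_t)-1 / UCHAR_MAX)	/* 0x0101...01 */
#define HIGHS	(ONES * (UCHAR_MAX / 2 + 1))	/* 0x8080...80 */
/* Nonzero if and only if some byte of w is zero. */
#define HASZERO(w)	(((w) - ONES) & ~(w) & HIGHS)

void *
sketch_memchr(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	const unsigned char ch = (unsigned char)c;
	const word_t pattern = ONES * ch;	/* ch in every byte */
	word_t w;

	/* Byte steps until p is word aligned. */
	for (; n > 0 && ((uintptr_t)p & (sizeof(word_t) - 1)) != 0; p++, n--)
		if (*p == ch)
			return ((void *)p);

	/* Word steps: XOR turns matching bytes into zero bytes. */
	for (; n >= sizeof(word_t); p += sizeof(word_t), n -= sizeof(word_t)) {
		memcpy(&w, p, sizeof(w));
		w ^= pattern;
		if (HASZERO(w))
			break;	/* a match is somewhere in this word */
	}

	/* Scan the matching word, or the tail, byte by byte. */
	for (; n > 0; p++, n--)
		if (*p == ch)
			return ((void *)p);
	return (NULL);
}
```

Unlike strlen, memchr has an explicit length, so the word loop never
reads past the n bytes the caller passed in.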

Cheers,
- -- 
Xin LI <delphij@delphij.net>	http://www.delphij.net/
FreeBSD - The Power to Serve!	       Live free or die
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (FreeBSD)

iQEcBAEBCAAGBQJMplbNAAoJEATO+BI/yjfBgWwH/2MSNvH0QNhEcyhKBU/Pzh8C
862myDjcxA4l1+ca2en9igPgWno+ZMUaiH4Td5qCBdX8tsFLlGCgC0o0a0HC51+7
mv8qTfWrYAFcU2NrmX8wsnprLijmS2NH3wBC0uJJXpJhmJUraTHbG9YcctIUe363
Yvy+d7HqraPvCShWEgj54V5q/vPPy5vT6gPFwhMpe0J9/gmSMwwxCF1RctE2K/Br
89TWb/g4vrFJCk3Ks3j8viJJN2Zd9sbBYeF/LBnMLPkVSJNCnw0j1gSs+uFbfgzw
Gv5WMNNpDu338dFMVJDddgxqWa+OW1oMgtHcLUmoxMQI87sir+NJQFBD6+EK22I=
=2+Hh
-----END PGP SIGNATURE-----


