Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 26 May 2013 08:23:28 +0200
From:      =?ISO-8859-1?Q?V=E1clav_Zeman?= <vhaisman@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   Re: Performance improvement to strnlen().
Message-ID:  <51A1AA60.9050209@gmail.com>
In-Reply-To: <afa77dcc2e1cb351cfaded708acbdae0@lthomas.net>
References:  <afa77dcc2e1cb351cfaded708acbdae0@lthomas.net>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig1FEBB2302363A6AC4C0260A6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 05/25/2013 10:27 PM, Lee Thomas wrote:
> Hello FreeBSD devs,
>=20
> I have found a performance improvement to libc's strnlen().
> lib/libc/string/strnlen.c is a trivial byte-by-byte implementation,
> where strlen.c has a smarter word-by-word implementation. I have
> implemented strnlen similarly to strlen. It runs about 4x as fast, at
> the cost of a binary codesize increase from 30 bytes to 221.
>=20
> In writing this I needed a test, and there isn't one in
> tools/regression/lib/libc/string, so I wrote a unit test for strnlen.
> This really only makes sense for a word-by-word implementation, as it
> tests each combination of string and limit length, overread characters,=

> and starting alignment.
>=20
> Could someone please review these patches? I am not very experienced in=

> C, and am even less experienced with FreeBSD's style guidelines, so the=
y
> likely have a bunch of style and idiom issues. Even if one or both of
> them prove not worth committing, it would still be useful for my learni=
ng.
>=20
> If/When these patches prove worthy of submitting, what is the next step=

> after that? Should I submit a PR, or is there some other process? This
> code is quite similar to the existing strlen.c, and doesn't do anything=

> fancy with e.g. endianness, but I haven't tested it for correctness or
> performance on anything other than x86...
>=20
> And finally, there is some other low-hanging fruit in the other strn*
> functions. Would it be worth it for me to give those the same treatment=
?
>=20
> Thanks,
> Lee Thomas
>=20
> Test platform:
>     $uname -a
>         FreeBSD LeeDesktop 9.1-STABLE FreeBSD 9.1-STABLE #15 r250831:
> Mon May 20 17:31:28 EDT 2013
>             lthomas@LeeDesktop:/usr/obj/usr/src/sys/NOSOUND  amd64
>     $dmesg | grep CPU:
>         CPU: Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
> (2666.81-MHz K8-class CPU)
>     $clang --version
>         FreeBSD clang version 3.2 (tags/RELEASE_32/final 170710) 201212=
21
>         Target: x86_64-unknown-freebsd9.1
>         Thread model: posix
>=20
> My timing test was a file of 10000 strings, of uniformly random length,=

> 50% between 0 and 20 bytes long, and 50% between 21 and 1000 bytes long=
=2E
> Each was filled with random generated bytes in the range [1, 255]. The
> test executables run their strlen on each string in the file 1000 times=
,
> and a shell script ran the test programs alternately 100 times. Here ar=
e
> the clang results; gcc results are roughly the same. I will share the
> actual timing code if someone wants it, but it is pretty ugly C++ and
> shell and I don't guarantee it working :-).
>=20
> Results:
>=20
> x ./times_bsd_strnlen.txt
> + ./times_my_strnlen.txt
> +----------------------------------------------------------------------=
----+
>=20
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |+                                                                     =
  x|
> |A                                                                     =
  A|
> +----------------------------------------------------------------------=
----+
>=20
>     N           Min           Max        Median           Avg        St=
ddev
> x 101     1.8685486     1.8693889     1.8687506     1.8687894  0.000148=
4903
> + 101    0.42704683    0.42779486    0.42712813    0.42715597 0.0001068=
1494
> Difference at 95.0% confidence
>     -1.44163 +/- 3.56739e-05
>     -77.1426% +/- 0.00190893%
>     (Student's t, pooled s =3D 0.000129342)
>=20
> Note: I manually shortened the histogram, as it was 100 lines long of
> exactly the same.
>=20
>=20
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.o=
rg"
>=20

> Index: strnlen.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> diff --git a/head/lib/libc/string/strnlen.c b/head/lib/libc/string/strn=
len.c
> --- a/head/lib/libc/string/strnlen.c	(revision 250951)
> +++ b/head/lib/libc/string/strnlen.c	(working copy)
> @@ -1,5 +1,6 @@
>  /*-
> - * Copyright (c) 2009 David Schultz <das@FreeBSD.org>
> + * Copyright (c) 2009, 2010 Xin LI <delphij@FreeBSD.org>
> + * Copyright (c) 2013 Lee Thomas <lee_thomas@AslanTools.com>
>   * All rights reserved.
>   *
>   * Redistribution and use in source and binary forms, with or without
> @@ -27,16 +28,91 @@
>  #include <sys/cdefs.h>
>  __FBSDID("$FreeBSD$");
> =20
> +#include <sys/limits.h>
> +#include <sys/types.h>
>  #include <string.h>
> =20
> +/*
> + * Portable strnlen() for 32-bit and 64-bit systems.
> + *
> + * Rationale: it is generally much more efficient to do word length
> + * operations and avoid branches on modern computer systems, as
> + * compared to byte-length operations with a lot of branches.
> + *
> + * The expression:
> + *
> + *	((x - 0x01....01) & ~x & 0x80....80)
> + *
> + * would evaluate to a non-zero value iff any of the bytes in the
> + * original word is zero.
> + *
> + * On multi-issue processors, we can divide the above expression into:=

> + *	a)  (x - 0x01....01)
> + *	b) (~x & 0x80....80)
> + *	c) a & b
> + *
> + * Where, a) and b) can be partially computed in parallel.
> + *
> + * The algorithm above is found on "Hacker's Delight" by
> + * Henry S. Warren, Jr.
> + */
> +
> +/* Magic numbers for the algorithm */
> +#if LONG_BIT =3D=3D 32
> +static const unsigned long mask01 =3D 0x01010101;
> +static const unsigned long mask80 =3D 0x80808080;
> +#elif LONG_BIT =3D=3D 64
> +static const unsigned long mask01 =3D 0x0101010101010101;
> +static const unsigned long mask80 =3D 0x8080808080808080;
> +#else
> +#error Unsupported word size
> +#endif
> +
> +#define	LONGPTR_MASK (sizeof(long) - 1)
> +
>  size_t
> -strnlen(const char *s, size_t maxlen)
> +strnlen(const char *str, size_t maxlen)
>  {
> -	size_t len;
> +	const char *stop, *short_stop;
> +	const char *p;
> +	const unsigned long *lp;
> +	long va, vb;
> =20
> -	for (len =3D 0; len < maxlen; len++, s++) {
> -		if (!*s)
> -			break;
> +	if (maxlen=3D=3D0) return 0;
> +
> +	stop=3Dstr+maxlen;
> +	/*
> +	 * Before trying the hard (unaligned byte-by-byte access) way
> +	 * to figure out whether there is a nul character, try to see
> +	 * if there is a nul character is within this accessible word
> +	 * first.
> +	 *
> +	 * p and (p & ~LONGPTR_MASK) must be equally accessible since
> +	 * they always fall in the same memory page, as long as page
> +	 * boundaries is integral multiple of word size.
> +	 */
> +	lp =3D (const unsigned long *)((uintptr_t)str & ~LONGPTR_MASK);
> +	va =3D (*lp - mask01);
> +	vb =3D ((~*lp) & mask80);
I do not think that this correct C. This is type punning violating the
rules of the language.

One way you can rewrite this is to use memcpy() into a temporary. Do not
be afraid of the memcpy(), modern GCC, and I hope even Clang, can
optimize it into a single move instruction:

{
  long tmp;
  memcpy(&tmp, lp, sizeof(tmp));
  va =3D (tmp - mask01);
  vb =3D ((~tmp) & mask80);
}

Another way could be using the union trick.

> +	lp++;
> +	if (va & vb) {
> +		/* Check if we have \0 in the first part */
> +		short_stop=3D(const char *)lp;
> +		if (stop<short_stop) short_stop=3Dstop;
> +		for (p=3Dstr; p !=3D short_stop; p++)
> +			if (*p =3D=3D '\0')
> +				return (p-str);
>  	}
> -	return (len);
> +	/* Scan the rest of the string using word sized operation */
> +	for (; (const char *)lp < stop; lp++) {
> +		va =3D (*lp - mask01);
> +		vb =3D ((~*lp) & mask80);
Ditto.

> +		if (va & vb) {
> +			for (p=3D(const char *)lp; p !=3D stop; p++)
> +				if (*p =3D=3D '\0')
> +					break;
> +			return (p-str);
> +		}
> +	}
> +	return (maxlen);
>  }

--=20
VZ



--------------enig1FEBB2302363A6AC4C0260A6
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iF4EAREIAAYFAlGhqmEACgkQ6OWUyaYNY6OgEAD/XZbTU+/A4DiqlCuF31LB2WFz
9Qmcy8o2eALuL9HCkkgBAIxcLCprHbfby4hx84Laz36uFBiF3EnibKW/DuVxzqar
=v93F
-----END PGP SIGNATURE-----

--------------enig1FEBB2302363A6AC4C0260A6--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?51A1AA60.9050209>