Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 24 Jun 2001 18:10:50 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        Mikhail Teterin <mi@aldan.algebra.com>, jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject:   Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinet tcp_subr.c)
Message-ID:  <Pine.BSF.4.21.0106241725360.54646-100000@besplex.bde.org>
In-Reply-To: <200106240513.f5O5DIH75729@earth.backplane.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 23 Jun 2001, Matt Dillon wrote:

> :To use the builtin memset except for parts of it that we don't like,
> :I suggest using code like:
> :
> :#if defined(__GNUC) && defined(_HAVE_GOOD_BUILTIN_MEMSET)
> :#define	bzero(p, n) do {					\
> :	if (__builtin_constant_p(n) && (n) < LARGE_MD_VALUE &&	\
> :	   !__any_other_cases_that_we_dont_like(n))		\
> :		__builtin_memset((p), 0, (n));			\
> :	else							\
> :		(bzero)((p), (n));				\
> :} while (0)
> :#endif
> 
>     Hmm.  I wonder if __builtin_constant_p() works in an inline....
>     holy cow, it does!

That's interesting.  ISTR asking an experienced gcc maintainer (Kenner?)
to implement this back in 1992, and he said something like "if you want
macro semantics, then use a macro".  Perhaps I actually wanted the "i"
constraint to work in asms ... yes, this is still a problem:

	asm("foo %0" : : "i" (n));

doesn't work if `n' is an inline function parameter, despite
__builtin_constant_p(n) being true.

>     Ok, so what if we made bzero() an inline which checked to see if the
>     size was a constant (or too large) and did the memset() magic there,
>     otherwise it calls the real bzero for non-constant or too-large n's
>     (say, anything bigger then 64 bytes).  I think we'd have to use an
>     inline rather then a #define'd macro to make it look as much like a
>     real function as possible.

I benchmarked the following version using lmbench2:

#define	bzero(p, n) ({						\
	if (__builtin_constant_p(n) && (n) <= 16)		\
		__builtin_memset((p), 0, (n));			\
	else							\
		(bzero)((p), (n));				\
})

The results were uninteresting: essentially no change.  lmbench2 is a
micro-benchmark, so it tends to show larger improvements for micro-
optimizations than can be expected in normal use.

>     As a separate issue, I am starting to get real worried about the FP
>     optimization in bzero() interfering with -current... I'm thinking that
>     we should rip it out.

It only costs one indirection for a call through a function pointer,
and is disabled anyway (but we still pay for the indirection).  The
cost is mainly in source code complexity.

One point that I noticed after writing my original reply: the gcc
builtins depend on misaligned accesses not trapping.  This is reasonable
on i386's, although it is broken if alignment checking is enabled
(but other things are broken, e.g., copying of structs essentially
uses the builtin memcpy and does misaligned copies for some structs
(e.g., ones containing just a large array of shorts, inside another
struct so that the i386 ABI forces perfect misalignment of the array).
However, unaligned accesses trap on some machines (including alphas
I think), so the corresponding optimization for memset is not possible.
Your bzerol() could do better by knowing that the pointer is aligned.
However, I think the source code shouldn't be complicated with
optimizations like this.  For alphas, there are the additional
complications that 64-bit copies should be preferred, but I think
more alignment is required for 64-bit copies, so the alignment would
have to be part of the interface for maximal efficiency...

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.21.0106241725360.54646-100000>