Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 29 Feb 2012 11:45:31 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Tijl Coosemans <tijl@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r232266 - in head/sys: amd64/include i386/include pc98/include x86/include
Message-ID:  <20120229105104.K1479@besplex.bde.org>
In-Reply-To: <201202281939.q1SJdtYr084858@svn.freebsd.org>
References:  <201202281939.q1SJdtYr084858@svn.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 28 Feb 2012, Tijl Coosemans wrote:

> Log:
>  Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace
>  amd64/i386/pc98 endian.h with stubs.
>
>  In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been
>  resolved by reimplementing the macro in terms of __bswap32(x). As a side
>  effect __bswap64_var(x) is now implemented using two bswap instructions on
>  i386 and should be much faster. __bswap32_const(x) has been reimplemented
>  in terms of __bswap16(x) for consistency.

The files now have enough in common for me to be happy with centralizing
them.  The trip through <machine> would have to be avoided to complete
the cleanup.  I don't know how to do that.  Maybe install can produce
a direct symlink.  We should try harder to not install both x86/foo.h
and machine/foo.h, so pathnames <x86> and <machine> don't start being
used in applications and then having to be supported for compatibility.

The patch is unreadable due to significant reimplementations, even without
the move.  So I looked at the resulting x86/endian.h:

% ...
% #ifdef __cplusplus
% extern "C" {
% #endif

This ifdef is old style bug:
- it is a hard-coded spelling of __BEGIN_DECLS
- it is placed around the whole file.  Most of the file consists of macro
   definitions.  __BEGIN_DECLS would obviously make no sense around macro
   definitions.
- the rest of the file consists of inline function declarations.  Since
   these aren't extern, 'extern "C"' makes no sense for them, though it
   isn't so clear that __BEGIN_DECLS makes no sense for them.
- the log message for the commit that added it is not good, but at least
   says that C++ support is added.

% #if defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE_BUILTIN_CONSTANT_P)

This ifdef is still misplaced.  Defining the 'const' versions doesn't
require the gcc builtin, and it is further from requiring gcc asm.

% 
% #define	__bswap64_const(_x)			\
% 	(((_x) >> 56) |				\
% 	(((_x) >> 40) & (0xffULL << 8)) |	\
% 	(((_x) >> 24) & (0xffULL << 16)) |	\
% 	(((_x) >> 8) & (0xffULL << 24)) |	\
% 	(((_x) << 8) & (0xffULL << 32)) |	\
% 	(((_x) << 24) & (0xffULL << 40)) |	\
% 	(((_x) << 40) & (0xffULL << 48)) |	\
% 	((_x) << 56))

I think this can be simplified in the same way as the `var' case should
have been simplified, by building it up out of 2 __bswap32_const()s...

% 
% #define	__bswap32_const(_x)			\
% 	(((_x) >> 24) |				\
% 	(((_x) & (0xff << 16)) >> 8) |		\
% 	(((_x) & (0xff << 8)) << 8) |		\
% 	((_x) << 24))

... and this can be built out of 2 __bswap16_const()s', though it is
short enough already.  Building up would be essential for __bswap_2^N
and good for N > 64.

% 
% #define __bswap16_const(_x)	(__uint16_t)((_x) << 8 | (_x) >> 8)
% 
% static __inline __uint64_t
% __bswap64_var(__uint64_t __x)
% {
% 
% 	return __bswap64_const(__x);
% }

This seems to be a pessimization for amd64, and doesn't match your
claim that it now uses 2 bswap instructions on i386.  It is the old
i386 version.  amd64 used to use a single 64-bit bswap in asm here.
Last time I looked (only on i386), gcc was incapable of turning either
__bswap64_const() or __bswap32_const() into 2 or 1 bswap instructions,
but generated pessimial code using shifts and masks.  Only clang
generated the 2 or 1 bswaps.

There is a PR about bswap64 being pessimal on i386.  The following seems
to be optimal for gcc on i386.  It uses the building-up method.  See the
PR for other details.

@ #define	_KERNEL
@ #define	I486_CPU
@ 
@ #include <machine/endian.h>
@ #include <stdint.h>
@ 
@ uint64_t x, y;
@ 
@ int
@ main(void)
@ {
@ 	/*
@ 	 * Follow is a better __bswap64().  It doesn't even need const and
@ 	 * var subcases.
@ 	 */
@ 	y = __bswap32(x) | (uint64_t)__bswap32(x >> 32) << 32;
@ 	return (0);
@ }

Expanding on its comment, we probably don't even need the const subcase
for consts, but can just use the above expression for consts too.
However, this only works right for i386.   For amd64, there are the
following cases:
- const case, compiler either gcc or clang: the above expression should
   work
- variable case, gcc: needs the inline with a 64-bit bswap, unless gcc
   has become smarter
- variable case, clang: the above expression won't work if __bswap32()
   uses an inline asm with a 32-bit bswap like it does now, since clang
   will be forced to generate 2 32-bit bswaps.  To recover to single
   64-bit bswap, either write it in inline asm for clang too, or maybe
   use the general 64-bit 'const' expression (written like it is now,
   not as in my version above).  I think clang translates the big const
   expression into a single 64-bit bswap, not 2 32-bit ones like it does
   on i386.

% 
%

Extra blank line (old style bug duplicated).

% ... 
% #define	__bswap64(_x)					\
% 	(__builtin_constant_p(_x) ?			\
% 	    __bswap64_const((__uint64_t)(_x)) : __bswap64_var(_x))

Can be replaced by a "building-up" expression except for complications
with clang on amd64.

% 
% #define	__bswap32(_x)					\
% 	(__builtin_constant_p(_x) ?			\
% 	    __bswap32_const((__uint32_t)(_x)) : __bswap32_var(_x))

Can be replaced by a "building-up" expression except for complications
for gcc.  (We already killed the asm for __bswap16_var(), so the
optimization problem now is to turn the general expression for
__bswap32() into a single 32-bit bswap instruction.  Only clang does
this automatically.

% 
% #define	__bswap16(_x)					\
% 	(__builtin_constant_p(_x) ?			\
% 	    __bswap16_const((__uint16_t)(_x)) : __bswap16_var(_x))

Need a primitive somewhere -- no building up for this :-).

% #ifdef __cplusplus
% }
% #endif

Brackets the 'extern "C"' style bug. but looks uglier with a brace all
by itself.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120229105104.K1479>