From owner-svn-src-all@FreeBSD.ORG  Wed Feb 29 05:14:00 2012
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 0CD80106566B;
	Wed, 29 Feb 2012 05:14:00 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au
	[211.29.132.189])
	by mx1.freebsd.org (Postfix) with ESMTP id 5F5F48FC0A;
	Wed, 29 Feb 2012 05:13:58 +0000 (UTC)
Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au
	(c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136])
	by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q1T5DtDp018501
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 29 Feb 2012 16:13:56 +1100
Date: Wed, 29 Feb 2012 16:13:55 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Tijl Coosemans <tijl@FreeBSD.org>
In-Reply-To: <201202282222.26343.tijl@freebsd.org>
Message-ID: <20120229160522.W2514@besplex.bde.org>
References: <201202281939.q1SJdtYr084858@svn.freebsd.org>
	<4F4D3F5D.70905@FreeBSD.org> <201202282222.26343.tijl@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org,
	src-committers@FreeBSD.org, Dimitry Andric <dim@FreeBSD.org>
Subject: Re: svn commit: r232266 - in head/sys: amd64/include i386/include
 pc98/include x86/include
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
	user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Feb 2012 05:14:00 -0000

On Tue, 28 Feb 2012, Tijl Coosemans wrote:

> On Tuesday 28 February 2012 21:55:57 Dimitry Andric wrote:
>> On 2012-02-28 20:39, Tijl Coosemans wrote:
>>> Log:
>>>   Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace
>>>   amd64/i386/pc98 endian.h with stubs.
>>>
>>>   In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been
>>>   resolved by reimplementing the macro in terms of __bswap32(x). As a side
>>>   effect __bswap64_var(x) is now implemented using two bswap instructions on
>>>   i386 and should be much faster. __bswap32_const(x) has been reimplemented
>>>   in terms of __bswap16(x) for consistency.
>> ...
>>> +#define	__bswap32_const(_x)		\
>>> +	(((__uint32_t)__bswap16(_x) << 16) | __bswap16((_x) >> 16))
>>> +
>>> +#define	__bswap32(_x)			\
>>> +	(__builtin_constant_p(_x) ?	\
>>> +	    __bswap32_const((__uint32_t)(_x)) : __bswap32_var(_x))
>>> +
>>> +#define	__bswap64_const(_x)		\
>>> +	(((__uint64_t)__bswap32(_x) << 32) | __bswap32((_x) >> 32))
>>
>> Hmm, shouldn't __bswap32_const() be implemented in terms of
>> __bswap16_const(), and similarly for __bswap64_const(), which should be
>> implemented in terms of __bswap32_const()?
>>
>> The whole reason for the difficult dance with __builtin_constant_p is to
>> allow for compile-time resolving of bswap'd constants.  Invoking the
>> regular __bswap() macros doesn't gain you anything here.
>
> __bswap64_const is also used in __bswap64_var, so its argument isn't
> always a compile time constant and then __bswap32 becomes __bswap32_var.
> If it is a constant __bswap32 becomes __bswap32_const (even at -O0).

I cleaned this up a bit according to ideas in my previous mails, and
added a comment about the confusing use of __bswap64_const() (now
named __bswap64_gen()) in __bswap64_var():

% Index: endian.h
% ===================================================================
% RCS file: /home/ncvs/src/sys/x86/include/endian.h,v
% retrieving revision 1.1
% diff -u -2 -r1.1 endian.h
% --- endian.h	28 Feb 2012 19:39:54 -0000	1.1
% +++ endian.h	29 Feb 2012 03:10:33 -0000
% @@ -37,8 +37,4 @@
%  #include <sys/_types.h>
% 
% -#ifdef __cplusplus
% -extern "C" {
% -#endif
% -
%  /*
%   * Define the order of 32-bit words in 64-bit words.
% @@ -68,32 +64,24 @@
%  #endif
% 
% -#if defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE_BUILTIN_CONSTANT_P)
% -
% -#define	__bswap16_const(_x)	(__uint16_t)((_x) << 8 | (_x) >> 8)
% -
% -#define	__bswap16(_x)			\
% -	(__builtin_constant_p(_x) ?	\
% -	    __bswap16_const((__uint16_t)(_x)) : __bswap16_var(_x))
% -
% -#define	__bswap32_const(_x)		\
% -	(((__uint32_t)__bswap16(_x) << 16) | __bswap16((_x) >> 16))
% -
% -#define	__bswap32(_x)			\
% -	(__builtin_constant_p(_x) ?	\
% -	    __bswap32_const((__uint32_t)(_x)) : __bswap32_var(_x))
% +#define	___bswap16(x)	(__uint16_t)((x) << 8 | (x) >> 8)
% +#define	__bswap16(x)	(___bswap16((__uint16_t)(x)))
% +#define	__bswap32_gen(x)		\
% +	(((__uint32_t)__bswap16(x) << 16) | __bswap16((x) >> 16))
% +#define	__bswap64_gen(x)		\
% +	(((__uint64_t)__bswap32(x) << 32) | __bswap32((x) >> 32))
% 
% -#define	__bswap64_const(_x)		\
% -	(((__uint64_t)__bswap32(_x) << 32) | __bswap32((_x) >> 32))
% -
% -#define	__bswap64(_x)			\
% -	(__builtin_constant_p(_x) ?	\
% -	    __bswap64_const((__uint64_t)(_x)) : __bswap64_var(_x))
% +#if defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE_BUILTIN_CONSTANT_P)
% 
% -static __inline __uint16_t
% -__bswap16_var(__uint16_t _x)
% -{
% +/*
% + * The following messes are old optimizations for gcc.  The messes have
% + * been reduced significantly, but I don't know how to implement the
% + * preferred way of defining defaults above and overriding them cleanly
% + * here.  Even clang needs help for the 64-bit case, so to keep the
% + * ifdefs sane we use this for the 32-bit case with clang too.
% + */
% 
% -	return (__bswap16_const(_x));
% -}
% +#define	__bswap32(x)			\
% +	(__builtin_constant_p(x) ?	\
% +	    __bswap32_gen((__uint32_t)(x)) : __bswap32_var(x))
% 
%  static __inline __uint32_t
% @@ -105,34 +93,35 @@
%  }
% 
% +#define	__bswap64(x)			\
% +	(__builtin_constant_p(x) ?	\
% +	    __bswap64_gen((__uint64_t)(x)) : __bswap64_var(x))
% +
%  static __inline __uint64_t
%  __bswap64_var(__uint64_t _x)
%  {
% -#ifdef	_LP64
% +
% +#ifdef _LP64
%  	__asm ("bswap %0" : "+r" (_x));
%  	return (_x);
%  #else
% -	return (__bswap64_const(_x));
% +	/*
% +	 * It is important for the optimizations that the following is not
% +	 * really generic, but expands to 2 __bswap32_var()'s.
% +	 */
% +	return (__bswap64_gen(_x));
%  #endif
%  }
% 
% -#define	__htonl(x)	__bswap32(x)
% -#define	__htons(x)	__bswap16(x)
% -#define	__ntohl(x)	__bswap32(x)
% -#define	__ntohs(x)	__bswap16(x)
% -
% -#else /* !(__GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P) */
% +#else /* !(__GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P */
% 
% -/*
% - * No optimizations are available for this compiler.  Fall back to
% - * non-optimized functions by defining the constant usually used to prevent
% - * redefinition.
% - */
% -#define	_BYTEORDER_FUNC_DEFINED
% +#define	__bswap32(x)	(__bswap32_gen((__uint32_t)(x)))
% +#define	__bswap64(x)	(__bswap64_gen((__uint64_t)(x)))
% 
%  #endif /* __GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P */
% 
% -#ifdef __cplusplus
% -}
% -#endif
% +#define	__htonl(x)	__bswap32(x)
% +#define	__htons(x)	__bswap16(x)
% +#define	__ntohl(x)	__bswap32(x)
% +#define	__ntohs(x)	__bswap16(x)
% 
%  #endif /* !_MACHINE_ENDIAN_H_ */

This passes simple tests, but I had to restore some messes for clang
that I originally left out to get clang to optimize __bswap64()
like it did before for the `var' case.  clang apparently only
understands the general expression for __bswapN() up to N = 32.

Bruce