From owner-svn-src-all@FreeBSD.ORG Wed Feb 29 05:14:00 2012 Return-Path: Delivered-To: svn-src-all@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0CD80106566B; Wed, 29 Feb 2012 05:14:00 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by mx1.freebsd.org (Postfix) with ESMTP id 5F5F48FC0A; Wed, 29 Feb 2012 05:13:58 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q1T5DtDp018501 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Feb 2012 16:13:56 +1100 Date: Wed, 29 Feb 2012 16:13:55 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Tijl Coosemans In-Reply-To: <201202282222.26343.tijl@freebsd.org> Message-ID: <20120229160522.W2514@besplex.bde.org> References: <201202281939.q1SJdtYr084858@svn.freebsd.org> <4F4D3F5D.70905@FreeBSD.org> <201202282222.26343.tijl@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Dimitry Andric Subject: Re: svn commit: r232266 - in head/sys: amd64/include i386/include pc98/include x86/include X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Feb 2012 05:14:00 -0000 On Tue, 28 Feb 2012, Tijl Coosemans wrote: > On Tuesday 28 February 2012 21:55:57 Dimitry Andric wrote: >> On 2012-02-28 20:39, Tijl Coosemans wrote: >>> Log: >>> Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace >>> amd64/i386/pc98 endian.h with stubs. >>> >>> In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been >>> resolved by reimplementing the macro in terms of __bswap32(x). As a side >>> effect __bswap64_var(x) is now implemented using two bswap instructions on >>> i386 and should be much faster. __bswap32_const(x) has been reimplemented >>> in terms of __bswap16(x) for consistency. >> ... >>> +#define __bswap32_const(_x) \ >>> + (((__uint32_t)__bswap16(_x) << 16) | __bswap16((_x) >> 16)) >>> + >>> +#define __bswap32(_x) \ >>> + (__builtin_constant_p(_x) ? \ >>> + __bswap32_const((__uint32_t)(_x)) : __bswap32_var(_x)) >>> + >>> +#define __bswap64_const(_x) \ >>> + (((__uint64_t)__bswap32(_x) << 32) | __bswap32((_x) >> 32)) >> >> Hmm, shouldn't __bswap32_const() be implemented in terms of >> __bswap16_const(), and similarly for __bswap64_const(), which should be >> implemented in terms of __bswap32_const()? >> >> The whole reason for the difficult dance with __builtin_constant_p is to >> allow for compile-time resolving of bswap'd constants. Invoking the >> regular __bswap() macros doesn't gain you anything here. > > __bswap64_const is also used in __bswap64_var, so its argument isn't > always a compile time constant and then __bswap32 becomes __bswap32_var. > If it is a constant __bswap32 becomes __bswap32_const (even at -O0). I cleaned this up a bit according to ideas in my previous mails, and added a comment about the confusing use of __bswap64_const() (now named __bswap64_gen()) in __bswap64_var(): % Index: endian.h % =================================================================== % RCS file: /home/ncvs/src/sys/x86/include/endian.h,v % retrieving revision 1.1 % diff -u -2 -r1.1 endian.h % --- endian.h 28 Feb 2012 19:39:54 -0000 1.1 % +++ endian.h 29 Feb 2012 03:10:33 -0000 % @@ -37,8 +37,4 @@ % #include % % -#ifdef __cplusplus % -extern "C" { % -#endif % - % /* % * Define the order of 32-bit words in 64-bit words. % @@ -68,32 +64,24 @@ % #endif % % -#if defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE_BUILTIN_CONSTANT_P) % - % -#define __bswap16_const(_x) (__uint16_t)((_x) << 8 | (_x) >> 8) % - % -#define __bswap16(_x) \ % - (__builtin_constant_p(_x) ? \ % - __bswap16_const((__uint16_t)(_x)) : __bswap16_var(_x)) % - % -#define __bswap32_const(_x) \ % - (((__uint32_t)__bswap16(_x) << 16) | __bswap16((_x) >> 16)) % - % -#define __bswap32(_x) \ % - (__builtin_constant_p(_x) ? \ % - __bswap32_const((__uint32_t)(_x)) : __bswap32_var(_x)) % +#define ___bswap16(x) (__uint16_t)((x) << 8 | (x) >> 8) % +#define __bswap16(x) (___bswap16((__uint16_t)(x))) % +#define __bswap32_gen(x) \ % + (((__uint32_t)__bswap16(x) << 16) | __bswap16((x) >> 16)) % +#define __bswap64_gen(x) \ % + (((__uint64_t)__bswap32(x) << 32) | __bswap32((x) >> 32)) % % -#define __bswap64_const(_x) \ % - (((__uint64_t)__bswap32(_x) << 32) | __bswap32((_x) >> 32)) % - % -#define __bswap64(_x) \ % - (__builtin_constant_p(_x) ? \ % - __bswap64_const((__uint64_t)(_x)) : __bswap64_var(_x)) % +#if defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE_BUILTIN_CONSTANT_P) % % -static __inline __uint16_t % -__bswap16_var(__uint16_t _x) % -{ % +/* % + * The following messes are old optimizations for gcc. The messes have % + * been reduced significantly, but I don't know how to implement the % + * preferred way of defining defaults above and overriding them cleanly % + * here. Even clang needs help for the 64-bit case, so to keep the % + * ifdefs sane we use this for the 32-bit case with clang too. % + */ % % - return (__bswap16_const(_x)); % -} % +#define __bswap32(x) \ % + (__builtin_constant_p(x) ? \ % + __bswap32_gen((__uint32_t)(x)) : __bswap32_var(x)) % % static __inline __uint32_t % @@ -105,34 +93,35 @@ % } % % +#define __bswap64(x) \ % + (__builtin_constant_p(x) ? \ % + __bswap64_gen((__uint64_t)(x)) : __bswap64_var(x)) % + % static __inline __uint64_t % __bswap64_var(__uint64_t _x) % { % -#ifdef _LP64 % + % +#ifdef _LP64 % __asm ("bswap %0" : "+r" (_x)); % return (_x); % #else % - return (__bswap64_const(_x)); % + /* % + * It is important for the optimizations that the following is not % + * really generic, but expands to 2 __bswap32_var()'s. % + */ % + return (__bswap64_gen(_x)); % #endif % } % % -#define __htonl(x) __bswap32(x) % -#define __htons(x) __bswap16(x) % -#define __ntohl(x) __bswap32(x) % -#define __ntohs(x) __bswap16(x) % - % -#else /* !(__GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P) */ % +#else /* !(__GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P */ % % -/* % - * No optimizations are available for this compiler. Fall back to % - * non-optimized functions by defining the constant usually used to prevent % - * redefinition. % - */ % -#define _BYTEORDER_FUNC_DEFINED % +#define __bswap32(x) (__bswap32_gen((__uint32_t)(x))) % +#define __bswap64(x) (__bswap64_gen((__uint64_t)(x))) % % #endif /* __GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P */ % % -#ifdef __cplusplus % -} % -#endif % +#define __htonl(x) __bswap32(x) % +#define __htons(x) __bswap16(x) % +#define __ntohl(x) __bswap32(x) % +#define __ntohs(x) __bswap16(x) % % #endif /* !_MACHINE_ENDIAN_H_ */ This passes simple tests, but I had to restore some messes for clang that I originally left out to get clang to optimize __bswap64() like it did before for the `var' case. clang apparently only understands the general expression for __bswapN() up to N = 32. Bruce