Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 21 Mar 2015 21:33:22 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r280279 - head/sys/sys
Message-ID:  <20150321205449.H1846@besplex.bde.org>
In-Reply-To: <20150321005456.GC2379@kib.kiev.ua>
References:  <201503201027.t2KAR6Ze053047@svn.freebsd.org> <20150320130216.GS2379@kib.kiev.ua> <20150321085923.U1046@besplex.bde.org> <20150321005456.GC2379@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 21 Mar 2015, Konstantin Belousov wrote:

> On Sat, Mar 21, 2015 at 09:42:51AM +1100, Bruce Evans wrote:
>> On Fri, 20 Mar 2015, Konstantin Belousov wrote:
>>
>>> On Fri, Mar 20, 2015 at 10:27:06AM +0000, John Baldwin wrote:
>>>> Author: jhb
>>>> Date: Fri Mar 20 10:27:06 2015
>>>> New Revision: 280279
>>>> URL: https://svnweb.freebsd.org/changeset/base/280279
>>>>
>>>> Log:
>>>>   Expand the bitcount* API to support 64-bit integers, plain ints and longs
>>>>   and create a "hidden" API that can be used in other system headers without
>>>>   adding namespace pollution.
>>>>   - If the POPCNT instruction is enabled at compile time, use
>>>>     __builtin_popcount*() to implement __bitcount*(), otherwise fall back
>>>>     to software implementations.
>>
>>> Are you aware of the Haswell errata HSD146 ?  I see the described behaviour
>>
>> I wasn't.
>>
>>> on machines back to SandyBridge, but not on Nehalems.
>>> HSD146.   POPCNT Instruction May Take Longer to Execute Than Expected
>>> Problem: POPCNT instruction execution with a 32 or 64 bit operand may be
>>> delayed until previous non-dependent instructions have executed.
>>
>> If it only affects performance, then it is up to the compiler to fix it.
> It affects performance on some cpu models.  It is too wrong for compiler
> to issue cpuid before popcnt.  Always issuing xorl before popcnt, as
> it is currently done by recent gcc, but not clang, is better, but still
> I think it is up to the code author to decide.

How does the unconditional use of popcntq (in inline asm that is called
from amd64 pmap) even work?

jhb's cleanups apparently missed this inline asm, while previous work
should not have added it.  This inline asm should never have existed,
since compilers can generate the same code with less-incorrect
configuration.  Correct configuration would be messy.  jhb's cleanups
depend on a compiler predefine (__POPCNT__) to determine if the
instruction can be used.  But this definition is static and is usually
wrong, since compilers default to plain amd64 which doesn't have the
instruction.  Dynamic configuration would have to use cpuid just to
determine if the instruction exists.

>>> Jilles noted that gcc head and 4.9.2 already provides a workaround by
>>> xoring the dst register.  I have some patch for amd64 pmap, see the end
>>> of the message.

I thought that that couldn't work since it uses the instruction
unconditionally.  But the old code already does that.

>> IIRC, then patch never never uses asm, but intentionally uses the popcount
>> builtin to avoid complications.

popcntq() in amd64/include/cpufunc.h does use asm, but is missing the
complications needed for when it the instruction is not available.
All callers (just 3 in pmap) seem to be broken, since they are also
missing the complications.

There is also a naming problem.  The asm API is named popcntq like the
builtins, but the APIs in the cleanup are named *bitcount*.  We couldn't
decide whether the cleanup should rename them to match the builtins.

>>>>   - Use the existing bitcount16() and bitcount32() from <sys/systm.h> to
>>>>     implement the non-POPCNT __bitcount16() and __bitcount32() in
>>>>     <sys/types.h>.
>>> Why is it in sys/types.h ?
>>
>> To make it easier to use, while minimizing namespace pollution and
>> inefficiencies.  Like the functions used to implement ntohl(), except
>> the implementation is MI so it doesn't need to be in <machine>.
>> (The functions used to implement ntohl() are in machine/endian.h.
>> sys/types.h always includes that, so it makes little difference to
>> pollution and inefficiency that the implementation is not more directly
>> in machine/_types.h.)  bitcount is simpler and not burdened by
>> compatibility, so it doesn't need a separate header.)
>
> Still, it is weird to provide functions from the sys/types.h namespace,
> and even more weird to provide such special-purpose function.

Not in the sys/types.h, but in the implementation namespace :-).  Since
no one should depend on the implementation details, we can move the
details to a better place when one is known.  Better places would be
something like libkern for userland to hold a large collection of
functions, and a Standard place for single functions.  I expect the
Standard place will be broken as designed for compatibility.  Probably
strings.h alongside ffs().

The special (inline/asm) functions are in sys/types.h on amd64 are
currently limited to just 3 bswap functions and some new popcnt functions.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150321205449.H1846>