Date:      Sat, 21 Mar 2015 02:54:56 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject:   Re: svn commit: r280279 - head/sys/sys
Message-ID:  <20150321005456.GC2379@kib.kiev.ua>
In-Reply-To: <20150321085923.U1046@besplex.bde.org>
References:  <201503201027.t2KAR6Ze053047@svn.freebsd.org> <20150320130216.GS2379@kib.kiev.ua> <20150321085923.U1046@besplex.bde.org>

On Sat, Mar 21, 2015 at 09:42:51AM +1100, Bruce Evans wrote:
> On Fri, 20 Mar 2015, Konstantin Belousov wrote:
> 
> > On Fri, Mar 20, 2015 at 10:27:06AM +0000, John Baldwin wrote:
> >> Author: jhb
> >> Date: Fri Mar 20 10:27:06 2015
> >> New Revision: 280279
> >> URL: https://svnweb.freebsd.org/changeset/base/280279
> >>
> >> Log:
> >>   Expand the bitcount* API to support 64-bit integers, plain ints and longs
> >>   and create a "hidden" API that can be used in other system headers without
> >>   adding namespace pollution.
> >>   - If the POPCNT instruction is enabled at compile time, use
> >>     __builtin_popcount*() to implement __bitcount*(), otherwise fall back
> >>     to software implementations.
> 
> > Are you aware of the Haswell erratum HSD146?  I see the described behaviour
> 
> I wasn't.
> 
> > on machines back to SandyBridge, but not on Nehalems.
> > HSD146.   POPCNT Instruction May Take Longer to Execute Than Expected
> > Problem: POPCNT instruction execution with a 32 or 64 bit operand may be
> > delayed until previous non-dependent instructions have executed.
> 
> If it only affects performance, then it is up to the compiler to fix it.
It affects performance on some CPU models.  It would be quite wrong for
the compiler to issue cpuid before popcnt.  Always issuing xorl before
popcnt, as recent gcc does but clang does not, is better, but I still
think it is up to the code author to decide.
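
To make the xorl-before-popcnt idea concrete, here is a minimal sketch
of it for gcc/clang extended asm.  The name popcnt64_xor() is
hypothetical and used only for illustration.  The asm path is compiled
only when POPCNT is enabled at compile time (e.g. -mpopcnt); otherwise
a plain software loop is used.

```c
#include <stdint.h>

/*
 * Hypothetical helper: count the set bits in a 64-bit word.
 * When POPCNT is enabled at compile time, zero the destination
 * register first so that popcnt does not wait on its previous value.
 */
static inline uint64_t
popcnt64_xor(uint64_t x)
{
#if defined(__GNUC__) && defined(__x86_64__) && defined(__POPCNT__)
	uint64_t res;

	/*
	 * "=&r" marks the output as early-clobber, so the compiler never
	 * places the input in the register that xorl clears.
	 */
	__asm__("xorl %k0,%k0; popcntq %1,%0"
	    : "=&r" (res)
	    : "rm" (x)
	    : "cc");
	return (res);
#else
	/* Software fallback: clear the lowest set bit until none remain. */
	uint64_t n = 0;

	while (x != 0) {
		x &= x - 1;
		n++;
	}
	return (n);
#endif
}
```

Writing it out by hand like this leaves the decision with the code
author instead of relying on what a given compiler version emits.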

> 
> > Jilles noted that gcc head and 4.9.2 already provides a workaround by
> > xoring the dst register.  I have some patch for amd64 pmap, see the end
> > of the message.
> 
> IIRC, the patch never uses asm, but intentionally uses the popcount
> builtin to avoid complications.
> 
> >>   - Use the existing bitcount16() and bitcount32() from <sys/systm.h> to
> >>     implement the non-POPCNT __bitcount16() and __bitcount32() in
> >>     <sys/types.h>.
> > Why is it in sys/types.h ?
> 
> To make it easier to use, while minimizing namespace pollution and
> inefficiencies.  Like the functions used to implement ntohl(), except
> the implementation is MI so it doesn't need to be in <machine>.
> (The functions used to implement ntohl() are in machine/endian.h.
> sys/types.h always includes that, so it makes little difference to
> pollution and inefficiency that the implementation is not more directly
> in machine/_types.h.)  bitcount is simpler and not burdened by
> compatibility, so it doesn't need a separate header.

Still, it is weird to provide functions from the sys/types.h namespace,
and even weirder to provide such a special-purpose function.
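
For reference, the pattern under discussion looks roughly like the
sketch below.  This is not the committed code: the names
__xbitcount32() and xbitcount32() are hypothetical stand-ins, chosen
so that they cannot clash with the real __bitcount32() and
bitcount32().  The hidden double-underscore function would sit in the
widely included header, and the public name would be added only by
the header that is meant to export it.

```c
#include <stdint.h>

/*
 * "Hidden" implementation: the leading double underscore keeps the name
 * in the implementation-reserved namespace, so a widely included header
 * can define it without polluting the application namespace.
 */
static inline int
__xbitcount32(uint32_t x)
{
#ifdef __POPCNT__
	/* POPCNT enabled at compile time: let the compiler emit it. */
	return (__builtin_popcount(x));
#else
	/* Software fallback: sum bits in pairs, nibbles, then bytes. */
	x = x - ((x >> 1) & 0x55555555);
	x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
	x = (x + (x >> 4)) & 0x0f0f0f0f;
	return ((int)((x * 0x01010101) >> 24));
#endif
}

/* Public name, defined only where the API is meant to be visible. */
#define	xbitcount32(x)	__xbitcount32((uint32_t)(x))
```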


