Date:      Tue, 10 Feb 2015 10:58:24 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Jung-uk Kim <jkim@freebsd.org>
Subject:   Re: svn commit: r278474 - head/sys/sys
Message-ID:  <12119175.I8M1urv6pf@ralph.baldwin.cx>
In-Reply-To: <20150211014516.N1511@besplex.bde.org>
References:  <201502092103.t19L3OAn013792@svn.freebsd.org> <54D92CE8.1030803@FreeBSD.org> <20150211014516.N1511@besplex.bde.org>

On Wednesday, February 11, 2015 02:37:05 AM Bruce Evans wrote:
> On Mon, 9 Feb 2015, Jung-uk Kim wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> > 
> > On 02/09/2015 16:08, John Baldwin wrote:
> >> On Monday, February 09, 2015 09:03:24 PM John Baldwin wrote:
> >>> Author: jhb Date: Mon Feb  9 21:03:23 2015 New Revision: 278474
> >>> URL: https://svnweb.freebsd.org/changeset/base/278474
> >>> 
> >>> Log: Use __builtin_popcnt() to implement a BIT_COUNT() operation
> >>> for bitsets and use this to implement CPU_COUNT() to count the
> >>> number of CPUs in a cpuset.
> >>> 
> >>> MFC after:	2 weeks
> >> 
> >> Yes, __builtin_popcnt() works with GCC 4.2.  It should also allow
> >> the compiler to DTRT in userland uses of this if -msse4.2 is
> >> enabled.
> > 
> > Back in 2012, when I submitted a similar patch, bde noted
> > __builtin_popcount*() cannot be used with GCC 4.2 for *kernel* because
> > it emits a library call.
> 
> (*) Since generic amd64 and i386 have no popcount instruction in hardware,
> using builtin popcount rarely uses the hardware instruction (it takes
> special -march to get it, and the resulting binaries don't run on generic
> CPUs).  Thus using the builtin works worse than using the old inline
> function in most cases.  Except, the old inline function is only
> implemented in the kernel, and isn't implemented for 64-bit integers.
> 
> gcc-4.8 generates the hardware popcount if the arch supports it.  Only
> its library popcounts are slower than clang's.  gcc-4.2 presumably
> doesn't generate the hardware popcount, since it doesn't have a -march
> for newer CPUs that have it.

I don't really expect CPU_COUNT() to be used in places where performance is of
the utmost importance.  (For example, in igb I use it in attach to enumerate
the set of CPUs to bind queues to, but nowhere else.)  Unless someone has a
better suggestion, I can implement a bitcount64 by using bitcount32 on both
halves, and if we care that strongly about it we can use the bitcount routines
instead of __builtin_popcountl in BIT_COUNT() for GCC.  Alternatively, I'm
happy to implement the libcall for GCC 4.2 in the kernel so that
__builtin_popcountl() works.
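
To make the bitcount64 idea concrete, here is a rough, untested-in-tree sketch.
The names sketch_bitcount32() and sketch_bitcount64() are made up for
illustration, and the 32-bit helper is just the classic parallel bit-count; it
is an assumption and not necessarily what the kernel's bitcount32() looks like.

```c
#include <stdint.h>

/*
 * Illustrative 32-bit population count using the classic parallel
 * (SWAR) reduction.  Hypothetical stand-in for the kernel's bitcount32().
 */
static inline uint32_t
sketch_bitcount32(uint32_t x)
{
	/* Sum adjacent bits into 2-bit fields. */
	x = x - ((x >> 1) & 0x55555555);
	/* Sum adjacent 2-bit fields into 4-bit fields. */
	x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
	/* Sum adjacent 4-bit fields into per-byte counts. */
	x = (x + (x >> 4)) & 0x0f0f0f0f;
	/* Add the four byte counts together; the total lands in the top byte. */
	return ((x * 0x01010101) >> 24);
}

/*
 * 64-bit count built from the 32-bit routine applied to each half,
 * as described above.  Hypothetical name.
 */
static inline uint64_t
sketch_bitcount64(uint64_t x)
{
	return (sketch_bitcount32((uint32_t)(x >> 32)) +
	    sketch_bitcount32((uint32_t)x));
}
```

Something of this shape avoids both the builtin and the libcall, at the cost
of never using a hardware popcount instruction even where one exists.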

-- 
John Baldwin
