Date:      Fri, 15 Feb 2019 13:55:51 +0000
From:      Alexey Dokuchaev <danfe@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r344118 - head/sys/i386/include
Message-ID:  <20190215135551.GA99583@FreeBSD.org>
In-Reply-To: <20190215233444.F2229@besplex.bde.org>
References:  <201902141353.x1EDrB0Z076223@repo.freebsd.org> <20190215071604.GA89653@FreeBSD.org> <20190215103644.GN24863@kib.kiev.ua> <20190215233444.F2229@besplex.bde.org>

On Sat, Feb 16, 2019 at 12:27:16AM +1100, Bruce Evans wrote:
> On Fri, 15 Feb 2019, Konstantin Belousov wrote:
> > On Fri, Feb 15, 2019 at 07:16:04AM +0000, Alexey Dokuchaev wrote:
> >> On Thu, Feb 14, 2019 at 01:53:11PM +0000, Konstantin Belousov wrote:
> >>> New Revision: 344118
> >>> URL: https://svnweb.freebsd.org/changeset/base/344118
> >>>
> >>> Log:
> >>>   Provide userspace versions of do_cpuid() and cpuid_count() on i386.
> >>> ...
> >>> +static __inline void
> >>> +do_cpuid(u_int ax, u_int *p)
> >>> +{
> >>> +	__asm __volatile(
> >>> +	    "pushl\t%%ebx\n\t"
> >>> +	    "cpuid\n\t"
> >>> +	    "movl\t%%ebx,%1\n\t"
> >>> +	    "popl\t%%ebx"
> >>
> >> Is there a reason to prefer pushl+movl+popl instead of movl+xchgl?
> >>
> >>     "movl %%ebx, %1\n\t"
> >>     "cpuid\n\t"
> >>     "xchgl %%ebx, %1"
> >
> > xchgl seems to be slower even in its register-to-register form (where
> > no implicit lock is used).  If you can demonstrate that your fragment is better in
> > some microbenchmark, I can change it.  But also note that its use is not
> > on the critical path.
> 
> They should have the same speed on modern x86.  xchgl %reg1,%reg2 is
> not slow, but it changes 2 visible registers and needs somewhere to
> hold one of the registers while changing it, so on 14-year-old AthlonXP
> where I know the times in cycles better, register xchgl was twice as slow
> as register move (2 cycles latency instead of 1, and throughput ==
> latency (?)).  On 2015 Haswell, register movl in a loop is in parallel
> with the loop overhead (1 cycle), while xchgl and pushl/popl take 0.5
> cycles longer on average.  Latency might be a problem for pushl/popl
> in critical paths.  There aren't many of those.

Thanks Bruce, I guess we can leave it as is then.
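
For the archive, here is a rough sketch of the two variants side by side.
It is not the committed code: the function names (cpuid_xchg, cpuid_push,
cpuid_vendor), the BX_REG macro and the operand constraints are my own
assumptions, since the constraint lines were elided from the quoted diff.
The push/pop form is only built on i386, as in the commit; the movl+xchgl
form is widened to %rbx on amd64 so the sketch can be tried there too.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macro: name of the register that must be preserved. */
#if defined(__x86_64__)
#define	BX_REG	"%%rbx"		/* full width, so the upper half is restored too */
#elif defined(__i386__)
#define	BX_REG	"%%ebx"
#endif

/*
 * mov+xchg variant (illustrative name).  Returns 1 if CPUID was executed,
 * 0 when built for a non-x86 target.
 */
static inline int
cpuid_xchg(unsigned int ax, unsigned int *p)
{
#ifdef BX_REG
	uintptr_t bx;	/* scratch: first the saved %ebx, then CPUID's %ebx */

	__asm__ __volatile__(
	    "mov\t" BX_REG ",%1\n\t"	/* save the caller's %ebx */
	    "cpuid\n\t"
	    "xchg\t" BX_REG ",%1"	/* restore %ebx, keep the result in %1 */
	    : "=a" (p[0]), "=&r" (bx), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax));
	p[1] = (unsigned int)bx;
	return (1);
#else
	(void)ax;
	(void)p;
	return (0);
#endif
}

#if defined(__i386__)
/*
 * pushl+movl+popl variant, shaped like the quoted diff.  The "S" constraint
 * (assumed here) pins %1 to %esi: if the compiler picked %ebx for it, the
 * final popl would overwrite the result.
 */
static inline void
cpuid_push(unsigned int ax, unsigned int *p)
{
	__asm__ __volatile__(
	    "pushl\t%%ebx\n\t"
	    "cpuid\n\t"
	    "movl\t%%ebx,%1\n\t"
	    "popl\t%%ebx"
	    : "=a" (p[0]), "=&S" (p[1]), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax));
}
#endif

/* Leaf 0 returns the vendor string in %ebx, %edx, %ecx (in that order). */
static inline int
cpuid_vendor(char out[13])
{
	unsigned int r[4] = { 0, 0, 0, 0 };

	out[0] = '\0';
	if (!cpuid_xchg(0, r))
		return (0);
	memcpy(out + 0, &r[1], 4);
	memcpy(out + 4, &r[3], 4);
	memcpy(out + 8, &r[2], 4);
	out[12] = '\0';
	return (1);
}
```

Calling cpuid_vendor() with a 13-byte buffer is enough to check that %ebx
survives either sequence.  Neither variant sets %ecx, so like do_cpuid()
this is only meant for leaves that ignore the subleaf.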

./danfe


