Date: Sat, 16 Feb 2019 00:27:16 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Alexey Dokuchaev <danfe@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r344118 - head/sys/i386/include
Message-ID: <20190215233444.F2229@besplex.bde.org>
In-Reply-To: <20190215103644.GN24863@kib.kiev.ua>
References: <201902141353.x1EDrB0Z076223@repo.freebsd.org> <20190215071604.GA89653@FreeBSD.org> <20190215103644.GN24863@kib.kiev.ua>
On Fri, 15 Feb 2019, Konstantin Belousov wrote:

> On Fri, Feb 15, 2019 at 07:16:04AM +0000, Alexey Dokuchaev wrote:
>> On Thu, Feb 14, 2019 at 01:53:11PM +0000, Konstantin Belousov wrote:
>>> New Revision: 344118
>>> URL: https://svnweb.freebsd.org/changeset/base/344118
>>>
>>> Log:
>>>   Provide userspace versions of do_cpuid() and cpuid_count() on i386.
>>>
>>>   Some older compilers, when generating PIC code, cannot handle inline
>>>   asm that clobbers %ebx (because %ebx is used as the GOT offset
>>>   register).  Userspace versions avoid clobbering %ebx by saving it to
>>>   stack before executing the CPUID instruction.
>>>
>>> ...
>>> +static __inline void
>>> +do_cpuid(u_int ax, u_int *p)
>>> +{
>>> +	__asm __volatile(
>>> +	    "pushl\t%%ebx\n\t"
>>> +	    "cpuid\n\t"
>>> +	    "movl\t%%ebx,%1\n\t"
>>> +	    "popl\t%%ebx"
>>
>> Is there a reason to prefer pushl+movl+popl instead of movl+xchgl?
>>
>> 	"movl %%ebx, %1\n\t"
>> 	"cpuid\n\t"
>> 	"xchgl %%ebx, %1"
>
> xchgl seems to be slower even in registers format (where no implicit
> lock is used).  If you can demonstrate that your fragment is better in
> some microbenchmark, I can change it.  But also note that its use is not
> on the critical path.

They should have the same speed on modern x86.  xchgl %reg1,%reg2 is not
slow, but it changes 2 visible registers and needs somewhere to hold one
of the registers while changing it, so on the 14-year-old AthlonXP, where
I know the times in cycles better, register xchgl was twice as slow as
register move (2 cycles latency instead of 1, and throughput == latency
(?)).  On 2015 Haswell, register movl in a loop runs in parallel with the
loop overhead (1 cycle), while xchgl and pushl/popl take 0.5 cycles longer
on average.

Latency might be a problem for pushl/popl in critical paths.  There aren't
many of those.

There is no reason to use the style with strings made unreadable using
soft tabs and newlines (\t and \n escapes).
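To put the pushl/popl versus movl+xchgl question in concrete form, here is a minimal sketch, assuming GCC or Clang extended asm on x86. It is a correctness check, not the microbenchmark asked for above. The names cpuid_plain(), cpuid_xchg() and cpuid_variants_agree() are hypothetical; the "=&r" scratch operand, the zeroed %ecx input, and the amd64 branch that saves all of %rbx are assumptions of this sketch, not taken from r344118. cpuid_plain() is only a reference for current compilers: the old PIC compilers the commit targets reject its "=b" constraint. The pushl/popl form is not shown, because in userspace amd64 code a push inside inline asm can overwrite the red zone below %rsp.

```c
/*
 * Sketch: run CPUID while keeping %ebx (%rbx) intact, using the
 * movl+xchgl sequence from the thread.  cpuid_plain(), cpuid_xchg(),
 * cpuid_variants_agree() and HAVE_CPUID are hypothetical names.
 */
#include <assert.h>
#include <string.h>

typedef unsigned int u_int;

static_assert(sizeof(u_int) == 4, "CPUID returns 32-bit registers");

#if defined(__i386__) || defined(__x86_64__)
#define	HAVE_CPUID	1

/* Reference version: needs a compiler that can allocate %ebx itself. */
static inline void
cpuid_plain(u_int ax, u_int *p)
{
	__asm__ __volatile__("cpuid"
	    : "=a" (p[0]), "=b" (p[1]), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax), "2" (0));
}

/*
 * Copy %ebx into a scratch register, run CPUID, then swap: %ebx gets its
 * old value back and the scratch register holds CPUID's %ebx result.
 * The early-clobber "=&r" keeps the scratch register away from the
 * inputs, because it is written before CPUID reads %eax and %ecx.
 */
static inline void
cpuid_xchg(u_int ax, u_int *p)
{
#ifdef __x86_64__
	/* Assumption: on amd64, save the full %rbx (%q1 = 64-bit name). */
	__asm__ __volatile__(
	    "movq\t%%rbx, %q1\n\t"
	    "cpuid\n\t"
	    "xchgq\t%%rbx, %q1"
	    : "=a" (p[0]), "=&r" (p[1]), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax), "2" (0));
#else
	__asm__ __volatile__(
	    "movl\t%%ebx, %1\n\t"
	    "cpuid\n\t"
	    "xchgl\t%%ebx, %1"
	    : "=a" (p[0]), "=&r" (p[1]), "=c" (p[2]), "=d" (p[3])
	    : "0" (ax), "2" (0));
#endif
}
#else
#define	HAVE_CPUID	0

/* No CPUID on this architecture: return zeros so the sketch still builds. */
static inline void
cpuid_plain(u_int ax, u_int *p)
{
	(void)ax;
	p[0] = p[1] = p[2] = p[3] = 0;
}

static inline void
cpuid_xchg(u_int ax, u_int *p)
{
	cpuid_plain(ax, p);
}
#endif

/* Both sequences must return identical registers for the same leaf. */
static int
cpuid_variants_agree(u_int leaf)
{
	u_int a[4], b[4];

	cpuid_plain(leaf, a);
	cpuid_xchg(leaf, b);
	return (memcmp(a, b, sizeof(a)) == 0);
}
```

The check uses leaf 0 only; leaf 1 is avoided because its %ebx carries the initial APIC ID of whichever CPU the thread happens to be running on, so two calls can legitimately differ.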
gcc supported hard newlines in strings 20-30 years ago, but broke this
because C90 or C99 made hard newlines in strings invalid.  This broke lots
of my asms.  I now use hard tabs and backslash-hard_newlines after soft
newlines:

	__asm __volatile("			\n\
	pushl	%%ebx				\n\
	cpuid					\n\
	movl	%%ebx,%1			\n\
	popl	%%ebx				\n\
	");

The Standard C lossage forces use of \n\ before each hard newline, and
readability forces a hard-to-edit variable number of hard tabs before
\n\, but otherwise the code looks the same as before (opcodes are
outdented to column 8 in large asms, and labels are outdented to column 0,
so that the code looks the same as non-inline asm too).

Bruce
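The translation-phase rule behind this style can be checked in portable C. A minimal sketch, assuming a C11 compiler: asm_soft spells the template from r344118 with escapes and adjacent-literal concatenation, asm_hard uses the backslash-newline layout described in the message, and a hypothetical canon() helper collapses layout whitespace so the two can be compared line by line. All identifiers here are illustrative.

```c
/*
 * Sketch: two spellings of the same asm template.  asm_soft, asm_hard,
 * canon(), same_asm_lines() and count_newlines() are hypothetical names.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Escape sequences plus adjacent-literal concatenation (commit style). */
static const char asm_soft[] =
    "pushl\t%%ebx\n\t"
    "cpuid\n\t"
    "movl\t%%ebx,%1\n\t"
    "popl\t%%ebx";

/*
 * Backslash-newline style: each backslash-newline pair is deleted in
 * translation phase 2, so the layout tabs become part of the string and
 * only the \n escapes produce newline characters.
 */
static const char asm_hard[] = "			\n\
	pushl	%%ebx				\n\
	cpuid					\n\
	movl	%%ebx,%1			\n\
	popl	%%ebx				\n\
	";

/* Collapse blanks and drop empty lines: one instruction per output line. */
static void
canon(const char *s, char *out, size_t outsz)
{
	size_t n = 0;
	bool text = false, blank = false, newline = false;

	assert(outsz > 0);
	for (; *s != '\0' && n + 2 < outsz; s++) {
		if (*s == '\n') {
			newline = newline || text;
			text = blank = false;
		} else if (*s == ' ' || *s == '\t') {
			blank = text;
		} else {
			if (newline)
				out[n++] = '\n';
			else if (blank)
				out[n++] = ' ';
			out[n++] = *s;
			text = true;
			blank = newline = false;
		}
	}
	out[n] = '\0';
}

/* True if both templates contain the same instruction lines. */
static bool
same_asm_lines(const char *a, const char *b)
{
	char ca[256], cb[256];

	canon(a, ca, sizeof(ca));
	canon(b, cb, sizeof(cb));
	return (strcmp(ca, cb) == 0);
}

/* Count the newline characters that actually ended up in a string. */
static size_t
count_newlines(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (*s == '\n')
			n++;
	return (n);
}
```

Only a backslash immediately followed by a newline is spliced in Standard C, so nothing may trail the \n\ on a line; some compilers accept intervening spaces with a warning, but that is not portable.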