Date:      Mon, 16 Nov 2015 03:24:58 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        David Chisnall <theraven@freebsd.org>,  Eric van Gyzen <vangyzen@freebsd.org>, src-committers@freebsd.org,  svn-src-all@freebsd.org, svn-src-stable@freebsd.org,  svn-src-stable-10@freebsd.org
Subject:   Re: svn commit: r290014 - in stable/10: lib/libthr/arch/amd64 lib/libthr/arch/i386 libexec/rtld-elf/amd64 libexec/rtld-elf/i386 share/mk
Message-ID:  <20151116024035.P1071@besplex.bde.org>
In-Reply-To: <20151115153659.GD5854@kib.kiev.ua>
References:  <201510261621.t9QGLuL2028872@repo.freebsd.org> <71109998-711D-4ECA-9B44-5A7B1F8705F3@FreeBSD.org> <20151115153659.GD5854@kib.kiev.ua>

On Sun, 15 Nov 2015, Konstantin Belousov wrote:

> On Sat, Nov 14, 2015 at 06:30:13PM +0000, David Chisnall wrote:
>> On 26 Oct 2015, at 16:21, Eric van Gyzen <vangyzen@FreeBSD.org> wrote:
>>>
>>> One counter-argument to this change is that most applications already
>>>  use SIMD, and the number of applications and amount of SIMD usage
>>>  are only increasing.
>>
>> Note that SSE and SIMD are not the same thing.  The x86-64 ABI uses SSE registers for floating point arguments, so even a purely scalar application that uses floating point will end up faulting in the SSE state.  This is not the case on IA32, where x87 registers are used (though when compiling for i686, SSE is used by default because register allocation for x87 is a huge pain).
>
> Is it ?  If SSE is used on i686 (AKA >= Pentium Pro) by default,
> this is a huge bug.

clang is not as broken as that.  It needs excessive setting of -march to
get SSE instructions and of course a runtime arch that has SSE to execute
them.  I usually see it by forcing -march=core2 or -march=native on a
host arch that has SSE.

Using SSE instead of x87 on i386 is usually a small pessimization except
in large functions where the x87 register set is too small or non-scalar
SSE can be used, since the i386 ABI requires returning results in x87
registers and the conversions between SSE and x87 for this have large
latency.  But clang doesn't understand the x87 very well, so it tends to
be faster using SSE despite this.  Strangely, it appears to understand the
x87 better on amd64 than on i386 -- better than gcc on amd64, but worse
than gcc on i386.  I think this is mostly because to kill SSE for
arithmetic on i386 on arches that support it, all use of SSE must be
killed using -mno-sse or -march=lower.  -mfpmath=387 to give fine control
of this is still broken in clang: -march=i386 -mfpmath=387 works, but
-march=core2 -mfpmath=387 fails with "'387' not supported".  Similarly for
any arch that supports sse, or with -mfpmath=sse on an arch where clang
wants to use x87.

gcc supports -mfpmath=387 even on amd64.  This is just slower, and usually
not more accurate, since the ABI forces conversions on function return.

Bruce


