Date:      Wed, 22 May 2013 11:51:17 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        dim@freebsd.org, numerics@freebsd.org
Subject:   Re: extra-precision bugs in clang on i386 even with __SSE*_MATH__
Message-ID:  <20130522085117.GK3047@kib.kiev.ua>
In-Reply-To: <20130522131618.M1038@besplex.bde.org>
References:  <20130522131618.M1038@besplex.bde.org>


--RrBt8QUUAk8/lIok
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Only replying to some secondary points in the message.

On Wed, May 22, 2013 at 02:55:19PM +1000, Bruce Evans wrote:
> clang with certain march= in CFLAGS uses SSE for double and/or float
> operations even on i386 although the ABI doesn't really allow this,
> and sets __SSE2_MATH__ and/or __SSE_MATH__ to indicate this.  It is
> well known that this breaks the definitions of float_t, double_t and
> FLT_EVAL_METHOD, because FreeBSD headers haven't been updated to support
> clang; in particular they know nothing of __SSE*_MATH__.

Hm, the i386 ABI is silent about the use of XMM registers, which means
that the registers are caller-saved, if available.  And it cannot mandate
the non-use of any processor instructions at all.  ABI-conformant code
may use any supported CPU instruction (and unsupported ones as well, if
SIGILL is the intended outcome).
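
To see what a given compiler and set of headers actually claim, a small
check along the following lines may help.  It is only a sketch: the
function names are made up for illustration, and it reports whatever
<float.h> and <math.h> say, which on an affected system may be exactly
the inconsistency described in the quoted text.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>

/* 1 if the compiler advertises SSE2 for double arithmetic, else 0. */
static int sse2_math_enabled(void)
{
#ifdef __SSE2_MATH__
	return 1;
#else
	return 0;
#endif
}

/* 1 if the compiler advertises SSE for float arithmetic, else 0. */
static int sse_math_enabled(void)
{
#ifdef __SSE_MATH__
	return 1;
#else
	return 0;
#endif
}

/* Print what the compiler and the headers claim (hypothetical helper). */
static void print_fp_config(void)
{
	printf("FLT_EVAL_METHOD  = %d\n", (int)FLT_EVAL_METHOD);
	printf("__SSE_MATH__     = %d\n", sse_math_enabled());
	printf("__SSE2_MATH__    = %d\n", sse2_math_enabled());
	printf("sizeof(float_t)  = %zu\n", sizeof(float_t));
	printf("sizeof(double_t) = %zu\n", sizeof(double_t));

	/* C99 requires float_t and double_t to be at least this wide. */
	assert(sizeof(float_t) >= sizeof(float));
	assert(sizeof(double_t) >= sizeof(double));
}
```

Calling print_fp_config() from main() and building once with and once
without a march= setting that enables SSE should show whether the
macros and the float_t/double_t sizes change together.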

> C11 breaks this area even more.  It specifies that extra precision is
> always clipped on return, at least in C functions.  This breaks
> intentionally returning extra precision for accuracy, and more seriously
> it breaks efficiency by requiring the slow clipping operation on every
> return (on x86, the clipping operation takes about as long as 2
> serially-dependent addition operations and stalls pipelines due to
> its serial dependencies).

SSE conversions like CVTSD2SS are very fast.  According to Agner Fog's
tables, on a Sandy Bridge-class CPU the instruction has a latency of
3 cycles, and a new CVTSD2SS instruction can be started every cycle.
This is comparable to simple integer arithmetic.
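
For illustration, the clipping under discussion is just a narrowing
conversion.  The sketch below uses made-up function names and assumes
the default round-to-nearest mode; when doubles are handled in SSE2
registers, a compiler will typically emit a single CVTSD2SS for the
cast, though that is an expectation about code generation, not
something the C source guarantees.

```c
#include <assert.h>
#include <float.h>

/*
 * Clip a double to float precision.  The explicit cast discards any
 * extra range and precision, as C requires for casts.
 */
static float clip_to_float(double x)
{
	return (float)x;
}

/* Sanity checks on the clipping behaviour (hypothetical helper). */
static void check_clipping(void)
{
	/* Exactly representable values pass through unchanged. */
	assert(clip_to_float(0.5) == 0.5f);

	/* Bits below float precision are rounded away. */
	assert(clip_to_float(1.0 + DBL_EPSILON) == 1.0f);

	/* 1/3 is inexact in both formats, so the clipped value differs. */
	assert((double)clip_to_float(1.0 / 3.0) != 1.0 / 3.0);
}
```

Inspecting the assembly for clip_to_float() (for example with cc -S)
is the simplest way to confirm which instruction a particular compiler
and march= setting actually produce.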

--RrBt8QUUAk8/lIok--


