Date: Sun, 22 Nov 2015 13:01:57 +1100 (EST)
From: Bruce Evans
cc: freebsd-bugs@freebsd.org
Subject: Re: [Bug 204671] clang floating point wrong around Inf (i386)
Message-ID: <20151122112921.P1083@besplex.bde.org>

On Sat, 21 Nov 2015 a bug that suppresses replies in mail wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671
>
> Jilles Tjoelker changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jilles@FreeBSD.org
>
> --- Comment #2 from Jilles Tjoelker ---
> This is related
> to the strangeness that is the x87 FPU. Internally, the x87
> performs calculations in extended precision. Even if the precision control is
> set to double precision, like FreeBSD and Windows do by default but Linux and
> Solaris do not, the x87 registers still have greater range than double
> precision.

Which versions of Windows do it? I only have Windows/DOS compilers from
1995 or earlier, and they do it. I think Visual Studio (?) does it for
compatibility. Does Windows actually require this as an ABI? Then it
should also disallow clang's bug of using SSE on 32-bit systems.

> As a result, the addition 1e308 + 1e308 does not overflow, but produces a
> result of approximately 2e308 in an x87 register. When this result is stored to
> memory in double precision format, overflow or rounding will occur.

For C (C90 and later) compilers, also when this result is assigned or cast
to a variable of type double. This sometimes loses precision and is always
slow (typically 2-4 times slower) and is rarely needed, so it is broken by
default in gcc and clang on i386 with x87. Recent versions of gcc can be
turned into C compilers in this respect using -fexcess-precision=standard.
Standards directives like -std=c99 but not -std=gnu99 also give this
perfectly correct slowness for unsuspecting users that don't want the
slowness but want a C compiler in other respects. clang now knows that
-fexcess-precision exists, but doesn't support it. It also doesn't support
this implicitly for -std=c99.

For C11 compilers, also when this result is returned. This gives further
destruction of precision and slowness and is broken by default. IIRC,
-std=c99 gives this bug even for C99 mode in gcc. clang doesn't support
this even with -std=c11.

> What happens in t1.c is that the conversion from extended to double precision
> happens two times. The conversion for printing the bytes happens directly after
> the calculation and therefore uses the modified rounding mode.
> The conversion
> for printf happens during the inlined fesetround() call, after setting the x87
> rounding mode and before calling a function __test_sse to check whether SSE is
> available. (After that, the value is stored and loaded again a few times.)
> Therefore, the conversion for printf uses an incorrect rounding mode.

Both conversions are done after the fesetround() call in program order.
This is asking for trouble. But since there is an assignment before the
call, there is no problem if the compiler is a C compiler. clang is far
from being a C compiler and does unnatural ordering that gives trouble:

    program order:            runtime order:
    add                       add
    assign                    assign (to memory var) for printing in hex
    restore rounding mode     restore rounding mode
    print as double           assign (to memory var) for printing as double
    print as hex              print as double
                              print as hex

> Global variables force the compiler to store values to memory more often and
> may therefore reduce x87 weirdnesses.

-ffloat-store is often recommended for causing the slow store. Before
-fexcess-precision, there was no similar hack for fixing casts. But it is
an easier and more controllable hack to use a volatile variable. See
STRICT_ASSIGN() in FreeBSD libm. Even minimised use of this gives slowness
and loses precision. So in some functions I have started using double_t to
avoid the slowness (especially if the compiler is a C compiler) and keep
the extra precision intentionally. Some hacks are needed to avoid
destroying the extra precision on return. (Since the extra precision is
intentional, it doesn't take the C11 bug to require destroying it on
return.)

The expression huge*huge is used often in FreeBSD libm to raise the
overflow flag and return +Inf. It doesn't actually work for that. Some
broken compilers invalidly optimize it and similar expressions for raising
underflow to just returning a value; the value is then correct but the
flags are not. But the code is buggy.
With extra precision, it asks for and should get a value larger than
DBL_MAX and no exception. The C11 bug breaks this. This gives a wrong
value for use in expressions, but the use is often to store to a value of
type double; then if the compiler is a C compiler or due to some accident
like storing to memory, the value is sometimes converted to double.

A special case test program for comparing functions does rounding mode
flipping almost exactly the same as t1.c and differs only in care taken
with assignments:

X 	    fpsetprec(RPREF);
X 	    STRICT_ASSIGN(flref_t, vref, FUNCREF(x));
X 	    fpsetprec(RPTEST);
X 	    STRICT_ASSIGN(fl_t, v, FUNCTEST(x));
X 	    fpsetprec(RPDEF);

Here flref_t might be long double and fl_t double. FUNCREF might be expl
and FUNCTEST exp. Oops, this actually modifies the rounding precision.
The rounding mode is the same for the reference function and the test
function. It is still important to get the order right. Old versions of
this use explicit volatile variables. This version uses STRICT_ASSIGN
which uses volatile for double but not for long double. The volatile
variables accidentally ensure the ordering of the fpset* calls. I'm not
sure if fp* and fenv* calls have sufficient ordering. Function calls are
supposed to give sequence points, but compilers can see too far into
inline ones.

> Following the C standard, you would have to use #pragma STDC FENV_ACCESS on
> to make this work reliably.

This shouldn't be needed in practice. Anyway, it is not required to affect
the compiler bug of not reducing to double precision in assignments and
casts.

> However, neither gcc nor clang support this pragma.
> They follow an ad hoc approach to floating point exceptions and modes. In gcc
> you can use -frounding-math to prevent some problematic optimizations but clang
> doesn't even support that. Clang has a bug about the pragma,
> https://llvm.org/bugs/show_bug.cgi?id=8100, which has been open for five years
> with various duplicates but no other significant action.

gcc does support this, even in 10+ year old versions (4.2.1), to the
extent of having documentation about it: from gcc.info:

X    * `The default state for the `FENV_ACCESS' pragma (C99 7.6.1).'
X
X      This pragma is not implemented, but the default is to "off" unless
X      `-frounding-math' is used in which case it is "on".

This gives enough control for a simple test program, and I think the
option that keeps the flag always on gives strict standards conformance
for the pragma.

> You will generally have fewer problems with weirdly changing floating point
> results if you use SSE instead of the x87 FPU, assuming your CPUs are new
> enough. SSE performs calculations in the precision specified by the program
> (single or double), so it does not matter when or if a value is spilled to
> memory. As noted above, GCC and clang are still ignorant about the side effects
> with the floating point exceptions and modes, though.

Spilling of intermediate x87 values is one thing that works right in all
or most versions of clang but not in old versions of gcc.

The test program seems to be looking for bugs, not workarounds. Its
description says to not use high -march since this makes the bug go away.
High -march exposes the bug that clang starts using SSE on i386. FreeBSD
doesn't support this. The non-support includes:
- setjmp()/longjmp() don't support SSE
- double_t is still long double. This seems to give only pessimizations.

Bruce