Date: Sat, 4 Dec 2010 00:45:50 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: das@FreeBSD.org
Cc: freebsd-bugs@FreeBSD.org, abramo.bagnara@gmail.com
Subject: Re: kern/133583: [libm] fma(3) does not respect rounding mode using extended precision
Message-ID: <20101204000948.E2687@besplex.bde.org>
In-Reply-To: <201012030702.oB3724qN017772@freefall.freebsd.org>
References: <201012030702.oB3724qN017772@freefall.freebsd.org>
On Fri, 3 Dec 2010 das@FreeBSD.org wrote:

> Synopsis: [libm] fma(3) does not respect rounding mode using extended precision
>
> Thanks for the report! This limitation is described in the source for
> fma(), and unfortunately, it is unlikely to ever change. There are
> several reasons:
>
> - We are a long way from having the necessary compiler support to make
>   dynamic precision changes work as expected.
> - Dynamic FPU precision changes aren't officially supported, and
>   fpsetprec() has been documented as deprecated for many years.

Not really. See my reply to the commit to the man pages.

> - The only supported architecture that can have this problem due to
>   dynamic precision changes is i386, and even then only for non-SSE2
>   builds.

SSE2 makes little difference to this problem for i386, except that for
clang it makes it worse. The ABI requires using the x87 FPU for at least
returning values, and gcc keeps using the FPU for operations too. OTOH,
clang uses SSE2 for operations. This gives an even larger pessimization
than I expected. (In one example, with a wrong arch (nocona instead of
core2, since gcc doesn't support -march=core2 yet and I used the same
flags for clang as for gcc), clang was 170/45 times slower; with
-march=core2, it was only 139/45 times slower; with -march=i386, it was
only 88/45 times slower. Here -march=i386 works mainly by avoiding even
useful SSE1 instructions. The example was a float function, so it only
needed SSE1. Restoring use of SSE1 using -march=athlon-xp restores the
slowness to 144/45.)

It also makes the precision used more unpredictable than before. It now
depends on $CC and $CFLAGS, but float.h doesn't. Fortunately, i386
float.h covers some cases by defining FLT_EVAL_METHOD = -1, which says
that the FP evaluation method is indeterminate :-). Unfortunately, i386
float.h's definition of float_t as double becomes wrong if floats are
actually evaluated in float precision, as they are with clang's use of
SSE1.
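A minimal sketch of how the mismatch between float_t and the compiler's actual behaviour could be probed at run time, assuming a C99 compiler and the default round-to-nearest mode. The function names are hypothetical illustrations, not part of libm or float.h. The first probe checks whether a float expression is really evaluated wider than float; the second reports what float_t claims:

```c
#include <float.h>
#include <math.h>

/* Returns nonzero if float expressions in this translation unit are
 * evaluated with more precision than float. The volatile operands stop
 * the compiler from folding the expression at compile time.
 * Assumes round-to-nearest-even is in effect. */
int floats_evaluated_wider(void)
{
    volatile float one = 1.0f;
    volatile float tiny = FLT_EPSILON / 2;   /* 2^-24, exact in float */

    /* In pure float arithmetic 1 + 2^-24 is a tie and rounds to 1.0f,
     * so the difference is 0. In double or extended precision the sum
     * is exact and the difference is 2^-24. */
    return (one + tiny) - one != 0;
}

/* Returns nonzero if float_t claims that float expressions are
 * evaluated in a type wider than float. */
int float_t_claims_wider(void)
{
    return sizeof(float_t) > sizeof(float);
}
```

On an SSE-only build with FLT_EVAL_METHOD == 0 (for example x86-64) both probes would be expected to return 0. On i386 with gcc using the x87 FPU both would typically return 1, whereas with clang using SSE1 for float operations the first returns 0 while float_t still claims double, which is the inconsistency described above.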
> - The cost and complexity associated with making every function in
>   libm detect and adapt to dynamic precision changes is prohibitive.

Same as for dynamic rounding direction changes. Actually, the cost and
complexity are much lower than for rounding direction. For rounding
direction, it is actually useful to keep the caller's mode, and
supporting this would require making sure every step of every function
works right in every mode. For rounding precision, we can just switch to
a mode that works for every function that needs it, and most functions
don't need it except in bizarre environments (like forcing single
precision, calling extended precision functions, and expecting them to
return any particular precision).

> I have updated the manpage for fpsetprec() to explain that changing
> the FPU precision isn't supported by the compiler or libraries.

Bruce