Date: Mon, 14 Mar 2016 01:02:20 +0100
From: Dimitry Andric <dim@FreeBSD.org>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-toolchain@freebsd.org
Subject: Re: clang gets numerical underflow wrong, please fix.
Message-ID: <A70D119A-514A-4949-9BCB-CA344650BDB5@FreeBSD.org>
In-Reply-To: <20160313201004.GA26343@troutmask.apl.washington.edu>
References: <20160313182521.GA25361@troutmask.apl.washington.edu> <74970883-FE44-47C0-BDA0-92DB0723398A@FreeBSD.org> <20160313201004.GA26343@troutmask.apl.washington.edu>
On 13 Mar 2016, at 21:10, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
...
>> So it's storing the intermediate result in a double, for some reason.
>> The fnstsw will then result in zero, since there was no underflow at
>> that point.
>>
>> I will submit a bug for this upstream, thanks for the report.

Submitted upstream as: https://llvm.org/bugs/show_bug.cgi?id=26931

> Thanks for the quick reply.  But, it must be using an 80-bit
> extended double instead of a double for storage.  This variation
>
> #include <fenv.h>
> #include <stdio.h>
>
> int
> main(void)
> {
>   int i;
> //  float x = 1.f;
>   double x = 1.;
>   i = 0;
>   feclearexcept(FE_ALL_EXCEPT);
>   do {
>      x /= 2;
>      i++;
>   } while(!fetestexcept(FE_UNDERFLOW));
>   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e after %d iterations\n", x, i);
>
>   return 0;
> }
>
> yields
>
> % cc -O -o z b.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations
>
> It should be 1075 iterations.
>
> Note, there is a similar issue with OVERFLOW.  The upshot is
> that clang on current is probably miscompiling libm.

With this example, I also get different results from gcc (4.8.5),
depending on the optimization level:

$ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 1075 iterations
$ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations

Similar for the overflow case:

$ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 1024 iterations
$ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 16384 iterations

Are we depending on some sort of subtle undefined behavior here?

With -O, the 'main loop' of the overflow case (it doubles x and tests
bit 8, FE_OVERFLOW) becomes:

.L3:
    fld1
    fstpl   24(%esp)
    movl    $0, %ebx
.L8:
    fldl    24(%esp)
    fld     %st(0)
    faddp   %st, %st(1)
    fstpl   24(%esp)
    addl    $1, %ebx
    fnstsw  %ax
    movl    %eax, %esi
    movl    __has_sse, %eax
    testl   %eax, %eax
    je      .L4
    cmpl    $2, %eax
    jne     .L5
    call    __test_sse
    testl   %eax, %eax
    je      .L5
.L4:
    stmxcsr 44(%esp)
    jmp     .L6
.L5:
    movl    $0, 44(%esp)
.L6:
    orl     44(%esp), %esi
    testl   $8, %esi
    je      .L8

With -O2, it becomes:

.L3:
    fld1
    xorl    %ebx, %ebx
.L12:
    fadd    %st(0), %st
    addl    $1, %ebx
    fnstsw  %ax
    testl   %edx, %edx
    movl    %eax, %esi
    je      .L10
    cmpl    $2, %edx
    je      .L27
.L9:
    xorl    %eax, %eax
.L8:
    orl     %eax, %esi
    andl    $8, %esi
    je      .L12

So it switches from using faddp and fstpl (a store to a 64-bit memory
slot on every iteration) to a direct fadd of %st(0) and %st, which
keeps the value in an x87 register.  I assume that uses the internal
80-bit precision?  Gcc also manages to move the __has_sse stuff out to
further down in the function, but it does not really affect the result.

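As a rough cross-check of the iteration counts above, here is a small
sketch of where the numbers could come from.  It is only an
illustration: the function names are made up for this example, and the
16435 figure assumes the x87 precision control is set to a 53-bit
mantissa (which I believe is the FreeBSD/i386 default) while the
exponent range stays that of the 80-bit format.  The two loops use a
volatile double to force a store and reload on every pass, mimicking
the fstpl/fldl pair in the -O code; they watch for the value reaching
zero or infinity rather than reading the exception flags, so they need
no <fenv.h>.

```c
#include <float.h>

/*
 * First halving of 1.0 whose exact result no longer fits the format.
 * With the <float.h> conventions for min_exp (e.g. DBL_MIN_EXP) and
 * mant_bits (e.g. DBL_MANT_DIG), the smallest subnormal is
 * 2^(min_exp - mant_bits); the halving after that one underflows.
 */
int
expected_underflow_iter(int mant_bits, int min_exp)
{
    return mant_bits - min_exp + 1;
}

/*
 * First doubling of 1.0 that overflows: 2^max_exp is just past the
 * largest finite value (max_exp as in DBL_MAX_EXP).
 */
int
expected_overflow_iter(int max_exp)
{
    return max_exp;
}

/*
 * Halve 1.0 until it becomes zero.  The volatile qualifier forces the
 * result to be rounded to a 64-bit double in memory on every pass, so
 * an x87 register cannot carry extra exponent range across iterations.
 * Assumption: the default rounding mode, with no flush-to-zero.
 */
int
halvings_until_zero(void)
{
    volatile double x = 1.0;
    int i = 0;

    do {
        x = x / 2;
        i++;
    } while (x != 0.0);
    return i;
}

/* Double 1.0 until it exceeds DBL_MAX, i.e. becomes infinity. */
int
doublings_until_overflow(void)
{
    volatile double x = 1.0;
    int i = 0;

    do {
        x = x + x;
        i++;
    } while (x <= DBL_MAX);
    return i;
}
```

For a double (53 bits, minimum exponent -1021, maximum exponent 1024)
the formulas give 1075 and 1024.  With the x87 exponent range (-16381,
16384) and a 53-bit mantissa they give 16435 and 16384, matching the
-O2 output; with the full 64-bit mantissa the underflow count would be
16446 instead.  This is not a claim about what either compiler should
emit, only about which storage format each count corresponds to.
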
-Dimitry