From owner-freebsd-i386@FreeBSD.ORG Sat Feb 5 09:40:24 2005 Return-Path: Delivered-To: freebsd-i386@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8F3E216A4CE for ; Sat, 5 Feb 2005 09:40:24 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1EBA443D39 for ; Sat, 5 Feb 2005 09:40:24 +0000 (GMT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.1/8.13.1) with ESMTP id j159eN7h051798 for ; Sat, 5 Feb 2005 09:40:23 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.13.1/8.13.1/Submit) id j159eNft051797; Sat, 5 Feb 2005 09:40:23 GMT (envelope-from gnats) Date: Sat, 5 Feb 2005 09:40:23 GMT Message-Id: <200502050940.j159eNft051797@freefall.freebsd.org> To: freebsd-i386@FreeBSD.org From: Bruce Evans Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs X-BeenThere: freebsd-i386@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Bruce Evans List-Id: I386-specific issues for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 05 Feb 2005 09:40:24 -0000 The following reply was made to PR i386/67469; it has been noted by GNATS. From: Bruce Evans To: David Schultz Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-i386@FreeBSD.org, bde@FreeBSD.org Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs Date: Sat, 5 Feb 2005 20:33:43 +1100 (EST) On Fri, 4 Feb 2005, David Schultz wrote: > On Wed, Jun 02, 2004, Bruce Evans wrote: > > On Tue, 1 Jun 2004, David Schultz wrote: > > > > > >Description: > > > src/lib/msun/i387/s_tan.S returns wildly inaccuate results when > > > its input has a large magnitude (>> 2*pi). For example: > > > > > > input s_tan.S k_tan.c > > > 1.776524190754802e+269 1.773388446261095e+16 -1.367233274980565e+01 > > > 1.182891728897420e+57 -1.9314539773999572e-01 1.0020569035866138e+03 > > > 2.303439778835110e+202 2.8465460220132694e+00 3.5686329695133922e+00 > > Here is a patch to fix the problem for tan(). See caveats below... Seems like a good method... > Index: s_tan.S > =================================================================== > RCS file: /cvs/src/lib/msun/i387/s_tan.S,v > retrieving revision 1.6 > diff -u -r1.6 s_tan.S > --- s_tan.S 28 Aug 1999 00:06:14 -0000 1.6 > +++ s_tan.S 4 Feb 2005 21:43:32 -0000 > @@ -45,14 +45,21 @@ > jnz 1f "Large" probably needs to be determined by comparison with a constant. fptan also gives large relative errors if the result is small, even for small args like M_PI (2^39 ulps for MI_PI!!). These can be reduced to 1 ulp (assuming that fdlibm is accurate to 1 ulp) using comparision of the result with a constant. See below. > fstp %st(0) > ret > -1: fldpi > - fadd %st(0) > - fxch %st(1) > -2: fprem1 > - fstsw %ax > - andw $0x400,%ax > - jnz 2b > - fstp %st(1) > - fptan > - fstp %st(0) > + > +/* Use the fdlibm routines for accuracy with large arguments. */ > +1: pushl %ebp > + movl %esp,%ebp > + subl $32,%esp > + leal 12(%esp),%eax > + movl %eax,8(%esp) > + fstpl (%esp) > + call __ieee754_rem_pio2 > + addl $12,%esp > + andl $1,%eax /* compute (eax & 1) ? -1 : 1 */ > + sall %eax > + subl $1,%eax > + neg %eax > + movl %eax,16(%esp) > + call __kernel_tan Better call the MI tan() to do all this. It won't take significantly longer, and shouldn't be reached in most cases anyway. > + leave > ret > > Unfortunately, I'm still getting the wrong answer for large values > that are *supposed* to be handled by the fptan instruction. The > error seems to increase towards the end of the range of fptan, > (-2^63,2^63). For instance, tan(0x1.3dea2a2c29172p+22) is only > off by the least significant 15 binary digits or so, but > tan(0x1.2c95e550f1635p+62) is off by about 5%. Is fptan simply > inherently inaccurate, or did I screw up somewhere? I would be > interested in results from an AMD processor. I think it is because fptan is inherently inaccurate. In an earlier reply, I said that fldpi cannot give a value that has more than 64 bits of accuracy, and that there is some magic that sometimes gives 66 bits. I now think the magic is just that the hardware trig functions use an internal value of pi with that amount of accuracy. When they fail, we use fldpi which gives even less accuracy. fptan succeeds when its arg is in the range [-2^63,2^63], but to be accurate with 66 bits of precision for the internal divisor, the arg must be in the range of about [-2^13,2^13] for double precision and [-2^2,2^2] for extended precision. Better reduce the ranges more for safety. Testing shows that the inaccuracy is much worse than that. The number of correct digits is more like (66 - 53) than (66 - 13) when the result is very close to 0, even for small args near pi. 13 bits of accuracy means 2^40 ulps of inaccuracy for such results. It's surprising that ucbtest doesn't report this. So it seems that the i387 tan() should do something like the following: /* 11 = 13 less a bit for safety. */ if (fabs(arg) > pow(2, 11)) return __mitan(arg); y = fptan(arg); /* * I think the -6 in the following should be more like -2 * (plus 2 for safety?), but -2 would slow down too many cases. * Even -6 gives a 2-ulp error for * arg = M_PI * pow(2, 11) + (delta = pow(2, -6)) + delta2(delta) * The magic 2 here is the 2 extra bits of precision in the * internal value of pi, or possibly the conversion of that * value via the pow(2, 11) range limit (2 = 66 - 53 - 11). */ if (fabs(y) < pow(2, -6); return __mitan(arg); return (y); Simple test program: %%% #include #include double xtan(double x); main() { double x, y; for (x = 1; x < 1 << 20; x = 2 * x) { printf("%.0a:\n%.13a\n%.13a\n", x, tan(x), xtan(x)); y = M_PI * x; printf("M_PI * %.0a:\n%.13a\n%.13a\n", x, tan(y), xtan(y)); y += 0x1.0p-6; printf("M_PI * %.0a + 0x1p-6:\n%.13a\n%.13a\n", x, tan(y), xtan(y)); } } %%% xtan() is the MI fdlibm tan(). Output: %%% 0x1p+0: 0x1.8eb245cbee3a6p+0 0x1.8eb245cbee3a6p+0 M_PI * 0x1p+0: -0x1.1a60000000000p-53 -0x1.1a62633145c07p-53 M_PI * 0x1p+0 + 0x1p-6: 0x1.0005557778525p-6 0x1.0005557778525p-6 0x1p+1: -0x1.17af62e0950f8p+1 -0x1.17af62e0950f8p+1 M_PI * 0x1p+1: -0x1.1a60000000000p-52 -0x1.1a62633145c07p-52 M_PI * 0x1p+1 + 0x1p-6: 0x1.0005557778502p-6 0x1.0005557778502p-6 0x1p+2: 0x1.2866f9be4de13p+0 0x1.2866f9be4de14p+0 M_PI * 0x1p+2: -0x1.1a60000000000p-51 -0x1.1a62633145c07p-51 M_PI * 0x1p+2 + 0x1p-6: 0x1.00055577784bbp-6 0x1.00055577784bbp-6 0x1p+3: -0x1.b32e78f49a1e4p+2 -0x1.b32e78f49a1e4p+2 M_PI * 0x1p+3: -0x1.1a60000000000p-50 -0x1.1a62633145c07p-50 M_PI * 0x1p+3 + 0x1p-6: 0x1.000555777842ep-6 0x1.000555777842ep-6 0x1p+4: 0x1.33d8f03e769a0p-2 0x1.33d8f03e769a0p-2 M_PI * 0x1p+4: -0x1.1a60000000000p-49 -0x1.1a62633145c07p-49 M_PI * 0x1p+4 + 0x1p-6: 0x1.0005557778314p-6 0x1.0005557778314p-6 0x1p+5: 0x1.526f6245432a4p-1 0x1.526f6245432a4p-1 M_PI * 0x1p+5: -0x1.1a60000000000p-48 -0x1.1a62633145c07p-48 M_PI * 0x1p+5 + 0x1p-6: 0x1.00055577780dfp-6 0x1.00055577780dfp-6 0x1p+6: 0x1.2c86afc5c9119p+1 0x1.2c86afc5c9119p+1 M_PI * 0x1p+6: -0x1.1a60000000000p-47 -0x1.1a62633145c07p-47 M_PI * 0x1p+6 + 0x1p-6: 0x1.0005557777c75p-6 0x1.0005557777c75p-6 0x1p+7: -0x1.0a65bcce6f48cp+0 -0x1.0a65bcce6f48cp+0 M_PI * 0x1p+7: -0x1.1a60000000000p-46 -0x1.1a62633145c07p-46 M_PI * 0x1p+7 + 0x1p-6: 0x1.00055577773a2p-6 0x1.00055577773a1p-6 0x1p+8: 0x1.91c8f293711dbp+4 0x1.91c8f293711dbp+4 M_PI * 0x1p+8: -0x1.1a60000000000p-45 -0x1.1a62633145c07p-45 M_PI * 0x1p+8 + 0x1p-6: 0x1.00055577761fap-6 0x1.00055577761fap-6 0x1p+9: -0x1.46be0f0a73387p-4 -0x1.46be0f0a73388p-4 M_PI * 0x1p+9: -0x1.1a60000000000p-44 -0x1.1a62633145c07p-44 M_PI * 0x1p+9 + 0x1p-6: 0x1.0005557773eacp-6 0x1.0005557773eacp-6 0x1p+10: -0x1.48d5be43ada01p-3 -0x1.48d5be43ada01p-3 M_PI * 0x1p+10: -0x1.1a60000000000p-43 -0x1.1a62633145c07p-43 M_PI * 0x1p+10 + 0x1p-6: 0x1.000555776f810p-6 0x1.000555776f80fp-6 0x1p+11: -0x1.518972221f88ep-2 -0x1.518972221f88ep-2 M_PI * 0x1p+11: -0x1.1a60000000000p-42 -0x1.1a62633145c07p-42 M_PI * 0x1p+11 + 0x1p-6: 0x1.0005557766ad7p-6 0x1.0005557766ad5p-6 0x1p+12: -0x1.7aae915eb67f3p-1 -0x1.7aae915eb67f3p-1 M_PI * 0x1p+12: -0x1.1a60000000000p-41 -0x1.1a62633145c07p-41 M_PI * 0x1p+12 + 0x1p-6: 0x1.0005557755065p-6 0x1.0005557755061p-6 0x1p+13: -0x1.a1ff2171ec9fbp+1 -0x1.a1ff2171ec9fbp+1 M_PI * 0x1p+13: -0x1.1a60000000000p-40 -0x1.1a62633145c07p-40 M_PI * 0x1p+13 + 0x1p-6: 0x1.0005557731b82p-6 0x1.0005557731b79p-6 0x1p+14: 0x1.5a04d6e15c566p-1 0x1.5a04d6e15c565p-1 M_PI * 0x1p+14: -0x1.1a60000000000p-39 -0x1.1a62633145c07p-39 M_PI * 0x1p+14 + 0x1p-6: 0x1.00055576eb1bbp-6 0x1.00055576eb1a8p-6 0x1p+15: 0x1.3e75a49b5b447p+1 0x1.3e75a49b5b446p+1 M_PI * 0x1p+15: -0x1.1a60000000000p-38 -0x1.1a62633145c07p-38 M_PI * 0x1p+15 + 0x1p-6: 0x1.000555765de2ep-6 0x1.000555765de08p-6 0x1p+16: -0x1.eae2708cf5424p-1 -0x1.eae2708cf5425p-1 M_PI * 0x1p+16: -0x1.1a60000000000p-37 -0x1.1a62633145c07p-37 M_PI * 0x1p+16 + 0x1p-6: 0x1.0005557543714p-6 0x1.00055575436c7p-6 0x1p+17: -0x1.7bcb26d5d9adap+4 -0x1.7bcb26d5d9af5p+4 M_PI * 0x1p+17: -0x1.1a60000000000p-36 -0x1.1a62633145c07p-36 M_PI * 0x1p+17 + 0x1p-6: 0x1.000555730e8dfp-6 0x1.000555730e846p-6 0x1p+18: 0x1.59ba3666c9a5cp-4 0x1.59ba3666c9a43p-4 M_PI * 0x1p+18: -0x1.1a60000000000p-35 -0x1.1a62633145c07p-35 M_PI * 0x1p+18 + 0x1p-6: 0x1.0005556ea4c75p-6 0x1.0005556ea4b44p-6 0x1p+19: 0x1.5c354a31a8487p-3 0x1.5c354a31a846ep-3 M_PI * 0x1p+19: -0x1.1a60000000000p-34 -0x1.1a62633145c07p-34 M_PI * 0x1p+19 + 0x1p-6: 0x1.00055565d13a2p-6 0x1.00055565d113fp-6 %%% Notes on special values: % 0x1p+11: % -0x1.518972221f88ep-2 % -0x1.518972221f88ep-2 No difference here. However, there are 1-ulp differences for smaller args. % M_PI * 0x1p+11: % -0x1.1a60000000000p-42 % -0x1.1a62633145c07p-42 There is always this huge difference for the (M_PI * x) args. Only the leading 14 mantissa bits agree. % M_PI * 0x1p+11 + 0x1p-6: % 0x1.0005557766ad7p-6 % 0x1.0005557766ad5p-6 This is the first 2-ulp difference for the (M_PI * integer + delta) args in my limited testing. % 0x1p+17: % -0x1.7bcb26d5d9adap+4 % -0x1.7bcb26d5d9af5p+4 This is the first 2-ulp difference for the integer args in the above output. The difference of 27 ulps is consistent with an internal value of pi that is not much more accurate than 66 bits (17 + 53 - log2(27) = 65.25...). Bruce