Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 5 Feb 2005 20:33:43 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        David Schultz <das@FreeBSD.org>
Cc:        bde@FreeBSD.org
Subject:   Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs
Message-ID:  <20050205181808.J10966@delplex.bde.org>
In-Reply-To: <20050204215913.GA44598@VARK.MIT.EDU>
References:  <200406012251.i51MpkkU024224@VARK.homeunix.com> <20040602172105.T23521@gamplex.bde.org> <20050204215913.GA44598@VARK.MIT.EDU>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 4 Feb 2005, David Schultz wrote:

> On Wed, Jun 02, 2004, Bruce Evans wrote:
> > On Tue, 1 Jun 2004, David Schultz wrote:
> >
> > > >Description:
> > > src/lib/msun/i387/s_tan.S returns wildly inaccuate results when
> > > its input has a large magnitude (>> 2*pi).  For example:
> > >
> > > input			s_tan.S				k_tan.c
> > > 1.776524190754802e+269	1.773388446261095e+16		-1.367233274980565e+01
> > > 1.182891728897420e+57	-1.9314539773999572e-01		1.0020569035866138e+03
> > > 2.303439778835110e+202	2.8465460220132694e+00		3.5686329695133922e+00
>
> Here is a patch to fix the problem for tan().  See caveats below...

Seems like a good method...

> Index: s_tan.S
> ===================================================================
> RCS file: /cvs/src/lib/msun/i387/s_tan.S,v
> retrieving revision 1.6
> diff -u -r1.6 s_tan.S
> --- s_tan.S	28 Aug 1999 00:06:14 -0000	1.6
> +++ s_tan.S	4 Feb 2005 21:43:32 -0000
> @@ -45,14 +45,21 @@
>  	jnz	1f

"Large" probably needs to be determined by comparison with a constant.
fptan also gives large relative errors if the result is small, even
for small args like M_PI (2^39 ulps for MI_PI!!).  These can be reduced
to 1 ulp (assuming that fdlibm is accurate to 1 ulp) using comparision
of the result with a constant.  See below.

>  	fstp	%st(0)
>  	ret
> -1:	fldpi
> -	fadd	%st(0)
> -	fxch	%st(1)
> -2:	fprem1
> -	fstsw	%ax
> -	andw	$0x400,%ax
> -	jnz	2b
> -	fstp	%st(1)
> -	fptan
> -	fstp	%st(0)
> +
> +/* Use the fdlibm routines for accuracy with large arguments. */
> +1:	pushl   %ebp
> +	movl    %esp,%ebp
> +	subl    $32,%esp
> +	leal	12(%esp),%eax
> +	movl	%eax,8(%esp)
> +	fstpl	(%esp)
> +	call	__ieee754_rem_pio2
> +	addl	$12,%esp
> +	andl	$1,%eax			/* compute (eax & 1) ? -1 : 1 */
> +	sall	%eax
> +	subl	$1,%eax
> +	neg	%eax
> +	movl	%eax,16(%esp)
> +	call	__kernel_tan

Better call the MI tan() to do all this.  It won't take significantly
longer, and shouldn't be reached in most cases anyway.

> +	leave
>  	ret
>
> Unfortunately, I'm still getting the wrong answer for large values
> that are *supposed* to be handled by the fptan instruction.  The
> error seems to increase towards the end of the range of fptan,
> (-2^63,2^63).  For instance, tan(0x1.3dea2a2c29172p+22) is only
> off by the least significant 15 binary digits or so, but
> tan(0x1.2c95e550f1635p+62) is off by about 5%.  Is fptan simply
> inherently inaccurate, or did I screw up somewhere?  I would be
> interested in results from an AMD processor.

I think it is because fptan is inherently inaccurate.  In an earlier
reply, I said that fldpi cannot give a value that has more than 64
bits of accuracy, and that there is some magic that sometimes gives
66 bits.  I now think the magic is just that the hardware trig functions
use an internal value of pi with that amount of accuracy.  When they
fail, we use fldpi which gives even less accuracy.

fptan succeeds when its arg is in the range [-2^63,2^63], but to be
accurate with 66 bits of precision for the internal divisor, the arg
must be in the range of about [-2^13,2^13] for double precision and
[-2^2,2^2] for extended precision.  Better reduce the ranges more for
safety.

Testing shows that the inaccuracy is much worse than that.  The number
of correct digits is more like (66 - 53) than (66 - 13) when the result
is very close to 0, even for small args near pi.  13 bits of accuracy
means 2^40 ulps of inaccuracy for such results.  It's surprising that
ucbtest doesn't report this.

So it seems that the i387 tan() should do something like the following:

	/* 11 = 13 less a bit for safety. */
	if (fabs(arg) > pow(2, 11))
		return __mitan(arg);
	y = fptan(arg);
	/*
	 * I think the -6 in the following should be more like -2
	 * (plus 2 for safety?), but -2 would slow down too many cases.
	 * Even -6 gives a 2-ulp error for
	 * arg = M_PI * pow(2, 11) + (delta = pow(2, -6)) + delta2(delta)
	 * The magic 2 here is the 2 extra bits of precision in the
	 * internal value of pi, or possibly the conversion of that
	 * value via the pow(2, 11) range limit (2 = 66 - 53 - 11).
	 */
	if (fabs(y) < pow(2, -6);
		return __mitan(arg);
	return (y);

Simple test program:

%%%
#include <math.h>
#include <stdio.h>

double xtan(double x);

main()
{
	double x, y;

	for (x = 1; x < 1 << 20; x = 2 * x) {
		printf("%.0a:\n%.13a\n%.13a\n",
		    x, tan(x), xtan(x));
		y = M_PI * x;
		printf("M_PI * %.0a:\n%.13a\n%.13a\n",
		    x, tan(y), xtan(y));
		y += 0x1.0p-6;
		printf("M_PI * %.0a + 0x1p-6:\n%.13a\n%.13a\n",
		    x, tan(y), xtan(y));
	}
}
%%%

xtan() is the MI fdlibm tan().

Output:

%%%
0x1p+0:
0x1.8eb245cbee3a6p+0
0x1.8eb245cbee3a6p+0
M_PI * 0x1p+0:
-0x1.1a60000000000p-53
-0x1.1a62633145c07p-53
M_PI * 0x1p+0 + 0x1p-6:
0x1.0005557778525p-6
0x1.0005557778525p-6
0x1p+1:
-0x1.17af62e0950f8p+1
-0x1.17af62e0950f8p+1
M_PI * 0x1p+1:
-0x1.1a60000000000p-52
-0x1.1a62633145c07p-52
M_PI * 0x1p+1 + 0x1p-6:
0x1.0005557778502p-6
0x1.0005557778502p-6
0x1p+2:
0x1.2866f9be4de13p+0
0x1.2866f9be4de14p+0
M_PI * 0x1p+2:
-0x1.1a60000000000p-51
-0x1.1a62633145c07p-51
M_PI * 0x1p+2 + 0x1p-6:
0x1.00055577784bbp-6
0x1.00055577784bbp-6
0x1p+3:
-0x1.b32e78f49a1e4p+2
-0x1.b32e78f49a1e4p+2
M_PI * 0x1p+3:
-0x1.1a60000000000p-50
-0x1.1a62633145c07p-50
M_PI * 0x1p+3 + 0x1p-6:
0x1.000555777842ep-6
0x1.000555777842ep-6
0x1p+4:
0x1.33d8f03e769a0p-2
0x1.33d8f03e769a0p-2
M_PI * 0x1p+4:
-0x1.1a60000000000p-49
-0x1.1a62633145c07p-49
M_PI * 0x1p+4 + 0x1p-6:
0x1.0005557778314p-6
0x1.0005557778314p-6
0x1p+5:
0x1.526f6245432a4p-1
0x1.526f6245432a4p-1
M_PI * 0x1p+5:
-0x1.1a60000000000p-48
-0x1.1a62633145c07p-48
M_PI * 0x1p+5 + 0x1p-6:
0x1.00055577780dfp-6
0x1.00055577780dfp-6
0x1p+6:
0x1.2c86afc5c9119p+1
0x1.2c86afc5c9119p+1
M_PI * 0x1p+6:
-0x1.1a60000000000p-47
-0x1.1a62633145c07p-47
M_PI * 0x1p+6 + 0x1p-6:
0x1.0005557777c75p-6
0x1.0005557777c75p-6
0x1p+7:
-0x1.0a65bcce6f48cp+0
-0x1.0a65bcce6f48cp+0
M_PI * 0x1p+7:
-0x1.1a60000000000p-46
-0x1.1a62633145c07p-46
M_PI * 0x1p+7 + 0x1p-6:
0x1.00055577773a2p-6
0x1.00055577773a1p-6
0x1p+8:
0x1.91c8f293711dbp+4
0x1.91c8f293711dbp+4
M_PI * 0x1p+8:
-0x1.1a60000000000p-45
-0x1.1a62633145c07p-45
M_PI * 0x1p+8 + 0x1p-6:
0x1.00055577761fap-6
0x1.00055577761fap-6
0x1p+9:
-0x1.46be0f0a73387p-4
-0x1.46be0f0a73388p-4
M_PI * 0x1p+9:
-0x1.1a60000000000p-44
-0x1.1a62633145c07p-44
M_PI * 0x1p+9 + 0x1p-6:
0x1.0005557773eacp-6
0x1.0005557773eacp-6
0x1p+10:
-0x1.48d5be43ada01p-3
-0x1.48d5be43ada01p-3
M_PI * 0x1p+10:
-0x1.1a60000000000p-43
-0x1.1a62633145c07p-43
M_PI * 0x1p+10 + 0x1p-6:
0x1.000555776f810p-6
0x1.000555776f80fp-6
0x1p+11:
-0x1.518972221f88ep-2
-0x1.518972221f88ep-2
M_PI * 0x1p+11:
-0x1.1a60000000000p-42
-0x1.1a62633145c07p-42
M_PI * 0x1p+11 + 0x1p-6:
0x1.0005557766ad7p-6
0x1.0005557766ad5p-6
0x1p+12:
-0x1.7aae915eb67f3p-1
-0x1.7aae915eb67f3p-1
M_PI * 0x1p+12:
-0x1.1a60000000000p-41
-0x1.1a62633145c07p-41
M_PI * 0x1p+12 + 0x1p-6:
0x1.0005557755065p-6
0x1.0005557755061p-6
0x1p+13:
-0x1.a1ff2171ec9fbp+1
-0x1.a1ff2171ec9fbp+1
M_PI * 0x1p+13:
-0x1.1a60000000000p-40
-0x1.1a62633145c07p-40
M_PI * 0x1p+13 + 0x1p-6:
0x1.0005557731b82p-6
0x1.0005557731b79p-6
0x1p+14:
0x1.5a04d6e15c566p-1
0x1.5a04d6e15c565p-1
M_PI * 0x1p+14:
-0x1.1a60000000000p-39
-0x1.1a62633145c07p-39
M_PI * 0x1p+14 + 0x1p-6:
0x1.00055576eb1bbp-6
0x1.00055576eb1a8p-6
0x1p+15:
0x1.3e75a49b5b447p+1
0x1.3e75a49b5b446p+1
M_PI * 0x1p+15:
-0x1.1a60000000000p-38
-0x1.1a62633145c07p-38
M_PI * 0x1p+15 + 0x1p-6:
0x1.000555765de2ep-6
0x1.000555765de08p-6
0x1p+16:
-0x1.eae2708cf5424p-1
-0x1.eae2708cf5425p-1
M_PI * 0x1p+16:
-0x1.1a60000000000p-37
-0x1.1a62633145c07p-37
M_PI * 0x1p+16 + 0x1p-6:
0x1.0005557543714p-6
0x1.00055575436c7p-6
0x1p+17:
-0x1.7bcb26d5d9adap+4
-0x1.7bcb26d5d9af5p+4
M_PI * 0x1p+17:
-0x1.1a60000000000p-36
-0x1.1a62633145c07p-36
M_PI * 0x1p+17 + 0x1p-6:
0x1.000555730e8dfp-6
0x1.000555730e846p-6
0x1p+18:
0x1.59ba3666c9a5cp-4
0x1.59ba3666c9a43p-4
M_PI * 0x1p+18:
-0x1.1a60000000000p-35
-0x1.1a62633145c07p-35
M_PI * 0x1p+18 + 0x1p-6:
0x1.0005556ea4c75p-6
0x1.0005556ea4b44p-6
0x1p+19:
0x1.5c354a31a8487p-3
0x1.5c354a31a846ep-3
M_PI * 0x1p+19:
-0x1.1a60000000000p-34
-0x1.1a62633145c07p-34
M_PI * 0x1p+19 + 0x1p-6:
0x1.00055565d13a2p-6
0x1.00055565d113fp-6
%%%

Notes on special values:

% 0x1p+11:
% -0x1.518972221f88ep-2
% -0x1.518972221f88ep-2

No difference here.  However, there are 1-ulp differences for smaller
args.

% M_PI * 0x1p+11:
% -0x1.1a60000000000p-42
% -0x1.1a62633145c07p-42

There is always this huge difference for the (M_PI * x) args.  Only the
leading 14 mantissa bits agree.

% M_PI * 0x1p+11 + 0x1p-6:
% 0x1.0005557766ad7p-6
% 0x1.0005557766ad5p-6

This is the first 2-ulp difference for the (M_PI * integer + delta) args
in my limited testing.

% 0x1p+17:
% -0x1.7bcb26d5d9adap+4
% -0x1.7bcb26d5d9af5p+4

This is the first 2-ulp difference for the integer args in the above output.
The difference of 27 ulps is consistent with an internal value of pi that
is not much more accurate than 66 bits (17 + 53 - log2(27) = 65.25...).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20050205181808.J10966>