From owner-freebsd-i386@FreeBSD.ORG  Sat Feb  5 09:40:24 2005
Return-Path: <owner-freebsd-i386@FreeBSD.ORG>
Delivered-To: freebsd-i386@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8F3E216A4CE
	for <freebsd-i386@hub.freebsd.org>;
	Sat,  5 Feb 2005 09:40:24 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1EBA443D39
	for <freebsd-i386@hub.freebsd.org>;
	Sat,  5 Feb 2005 09:40:24 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.1/8.13.1) with ESMTP id j159eN7h051798
	for <freebsd-i386@freefall.freebsd.org>; Sat, 5 Feb 2005 09:40:23 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.13.1/8.13.1/Submit) id j159eNft051797;
	Sat, 5 Feb 2005 09:40:23 GMT
	(envelope-from gnats)
Date: Sat, 5 Feb 2005 09:40:23 GMT
Message-Id: <200502050940.j159eNft051797@freefall.freebsd.org>
To: freebsd-i386@FreeBSD.org
From: Bruce Evans <bde@zeta.org.au>
Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results
 for large inputs
X-BeenThere: freebsd-i386@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Bruce Evans <bde@zeta.org.au>
List-Id: I386-specific issues for FreeBSD <freebsd-i386.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i386>,
	<mailto:freebsd-i386-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-i386>
List-Post: <mailto:freebsd-i386@freebsd.org>
List-Help: <mailto:freebsd-i386-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i386>,
	<mailto:freebsd-i386-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 05 Feb 2005 09:40:24 -0000

The following reply was made to PR i386/67469; it has been noted by GNATS.

From: Bruce Evans <bde@zeta.org.au>
To: David Schultz <das@FreeBSD.org>
Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-i386@FreeBSD.org,
	bde@FreeBSD.org
Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results
 for large inputs
Date: Sat, 5 Feb 2005 20:33:43 +1100 (EST)

 On Fri, 4 Feb 2005, David Schultz wrote:
 
 > On Wed, Jun 02, 2004, Bruce Evans wrote:
 > > On Tue, 1 Jun 2004, David Schultz wrote:
 > >
 > > > >Description:
 > > > src/lib/msun/i387/s_tan.S returns wildly inaccuate results when
 > > > its input has a large magnitude (>> 2*pi).  For example:
 > > >
 > > > input			s_tan.S				k_tan.c
 > > > 1.776524190754802e+269	1.773388446261095e+16		-1.367233274980565e+01
 > > > 1.182891728897420e+57	-1.9314539773999572e-01		1.0020569035866138e+03
 > > > 2.303439778835110e+202	2.8465460220132694e+00		3.5686329695133922e+00
 >
 > Here is a patch to fix the problem for tan().  See caveats below...
 
 Seems like a good method...
 
 > Index: s_tan.S
 > ===================================================================
 > RCS file: /cvs/src/lib/msun/i387/s_tan.S,v
 > retrieving revision 1.6
 > diff -u -r1.6 s_tan.S
 > --- s_tan.S	28 Aug 1999 00:06:14 -0000	1.6
 > +++ s_tan.S	4 Feb 2005 21:43:32 -0000
 > @@ -45,14 +45,21 @@
 >  	jnz	1f
 
 "Large" probably needs to be determined by comparison with a constant.
 fptan also gives large relative errors if the result is small, even
 for small args like M_PI (2^39 ulps for MI_PI!!).  These can be reduced
 to 1 ulp (assuming that fdlibm is accurate to 1 ulp) using comparision
 of the result with a constant.  See below.
 
 >  	fstp	%st(0)
 >  	ret
 > -1:	fldpi
 > -	fadd	%st(0)
 > -	fxch	%st(1)
 > -2:	fprem1
 > -	fstsw	%ax
 > -	andw	$0x400,%ax
 > -	jnz	2b
 > -	fstp	%st(1)
 > -	fptan
 > -	fstp	%st(0)
 > +
 > +/* Use the fdlibm routines for accuracy with large arguments. */
 > +1:	pushl   %ebp
 > +	movl    %esp,%ebp
 > +	subl    $32,%esp
 > +	leal	12(%esp),%eax
 > +	movl	%eax,8(%esp)
 > +	fstpl	(%esp)
 > +	call	__ieee754_rem_pio2
 > +	addl	$12,%esp
 > +	andl	$1,%eax			/* compute (eax & 1) ? -1 : 1 */
 > +	sall	%eax
 > +	subl	$1,%eax
 > +	neg	%eax
 > +	movl	%eax,16(%esp)
 > +	call	__kernel_tan
 
 Better call the MI tan() to do all this.  It won't take significantly
 longer, and shouldn't be reached in most cases anyway.
 
 > +	leave
 >  	ret
 >
 > Unfortunately, I'm still getting the wrong answer for large values
 > that are *supposed* to be handled by the fptan instruction.  The
 > error seems to increase towards the end of the range of fptan,
 > (-2^63,2^63).  For instance, tan(0x1.3dea2a2c29172p+22) is only
 > off by the least significant 15 binary digits or so, but
 > tan(0x1.2c95e550f1635p+62) is off by about 5%.  Is fptan simply
 > inherently inaccurate, or did I screw up somewhere?  I would be
 > interested in results from an AMD processor.
 
 I think it is because fptan is inherently inaccurate.  In an earlier
 reply, I said that fldpi cannot give a value that has more than 64
 bits of accuracy, and that there is some magic that sometimes gives
 66 bits.  I now think the magic is just that the hardware trig functions
 use an internal value of pi with that amount of accuracy.  When they
 fail, we use fldpi which gives even less accuracy.
 
 fptan succeeds when its arg is in the range [-2^63,2^63], but to be
 accurate with 66 bits of precision for the internal divisor, the arg
 must be in the range of about [-2^13,2^13] for double precision and
 [-2^2,2^2] for extended precision.  Better reduce the ranges more for
 safety.
 
 Testing shows that the inaccuracy is much worse than that.  The number
 of correct digits is more like (66 - 53) than (66 - 13) when the result
 is very close to 0, even for small args near pi.  13 bits of accuracy
 means 2^40 ulps of inaccuracy for such results.  It's surprising that
 ucbtest doesn't report this.
 
 So it seems that the i387 tan() should do something like the following:
 
 	/* 11 = 13 less a bit for safety. */
 	if (fabs(arg) > pow(2, 11))
 		return __mitan(arg);
 	y = fptan(arg);
 	/*
 	 * I think the -6 in the following should be more like -2
 	 * (plus 2 for safety?), but -2 would slow down too many cases.
 	 * Even -6 gives a 2-ulp error for
 	 * arg = M_PI * pow(2, 11) + (delta = pow(2, -6)) + delta2(delta)
 	 * The magic 2 here is the 2 extra bits of precision in the
 	 * internal value of pi, or possibly the conversion of that
 	 * value via the pow(2, 11) range limit (2 = 66 - 53 - 11).
 	 */
 	if (fabs(y) < pow(2, -6);
 		return __mitan(arg);
 	return (y);
 
 Simple test program:
 
 %%%
 #include <math.h>
 #include <stdio.h>
 
 double xtan(double x);
 
 main()
 {
 	double x, y;
 
 	for (x = 1; x < 1 << 20; x = 2 * x) {
 		printf("%.0a:\n%.13a\n%.13a\n",
 		    x, tan(x), xtan(x));
 		y = M_PI * x;
 		printf("M_PI * %.0a:\n%.13a\n%.13a\n",
 		    x, tan(y), xtan(y));
 		y += 0x1.0p-6;
 		printf("M_PI * %.0a + 0x1p-6:\n%.13a\n%.13a\n",
 		    x, tan(y), xtan(y));
 	}
 }
 %%%
 
 xtan() is the MI fdlibm tan().
 
 Output:
 
 %%%
 0x1p+0:
 0x1.8eb245cbee3a6p+0
 0x1.8eb245cbee3a6p+0
 M_PI * 0x1p+0:
 -0x1.1a60000000000p-53
 -0x1.1a62633145c07p-53
 M_PI * 0x1p+0 + 0x1p-6:
 0x1.0005557778525p-6
 0x1.0005557778525p-6
 0x1p+1:
 -0x1.17af62e0950f8p+1
 -0x1.17af62e0950f8p+1
 M_PI * 0x1p+1:
 -0x1.1a60000000000p-52
 -0x1.1a62633145c07p-52
 M_PI * 0x1p+1 + 0x1p-6:
 0x1.0005557778502p-6
 0x1.0005557778502p-6
 0x1p+2:
 0x1.2866f9be4de13p+0
 0x1.2866f9be4de14p+0
 M_PI * 0x1p+2:
 -0x1.1a60000000000p-51
 -0x1.1a62633145c07p-51
 M_PI * 0x1p+2 + 0x1p-6:
 0x1.00055577784bbp-6
 0x1.00055577784bbp-6
 0x1p+3:
 -0x1.b32e78f49a1e4p+2
 -0x1.b32e78f49a1e4p+2
 M_PI * 0x1p+3:
 -0x1.1a60000000000p-50
 -0x1.1a62633145c07p-50
 M_PI * 0x1p+3 + 0x1p-6:
 0x1.000555777842ep-6
 0x1.000555777842ep-6
 0x1p+4:
 0x1.33d8f03e769a0p-2
 0x1.33d8f03e769a0p-2
 M_PI * 0x1p+4:
 -0x1.1a60000000000p-49
 -0x1.1a62633145c07p-49
 M_PI * 0x1p+4 + 0x1p-6:
 0x1.0005557778314p-6
 0x1.0005557778314p-6
 0x1p+5:
 0x1.526f6245432a4p-1
 0x1.526f6245432a4p-1
 M_PI * 0x1p+5:
 -0x1.1a60000000000p-48
 -0x1.1a62633145c07p-48
 M_PI * 0x1p+5 + 0x1p-6:
 0x1.00055577780dfp-6
 0x1.00055577780dfp-6
 0x1p+6:
 0x1.2c86afc5c9119p+1
 0x1.2c86afc5c9119p+1
 M_PI * 0x1p+6:
 -0x1.1a60000000000p-47
 -0x1.1a62633145c07p-47
 M_PI * 0x1p+6 + 0x1p-6:
 0x1.0005557777c75p-6
 0x1.0005557777c75p-6
 0x1p+7:
 -0x1.0a65bcce6f48cp+0
 -0x1.0a65bcce6f48cp+0
 M_PI * 0x1p+7:
 -0x1.1a60000000000p-46
 -0x1.1a62633145c07p-46
 M_PI * 0x1p+7 + 0x1p-6:
 0x1.00055577773a2p-6
 0x1.00055577773a1p-6
 0x1p+8:
 0x1.91c8f293711dbp+4
 0x1.91c8f293711dbp+4
 M_PI * 0x1p+8:
 -0x1.1a60000000000p-45
 -0x1.1a62633145c07p-45
 M_PI * 0x1p+8 + 0x1p-6:
 0x1.00055577761fap-6
 0x1.00055577761fap-6
 0x1p+9:
 -0x1.46be0f0a73387p-4
 -0x1.46be0f0a73388p-4
 M_PI * 0x1p+9:
 -0x1.1a60000000000p-44
 -0x1.1a62633145c07p-44
 M_PI * 0x1p+9 + 0x1p-6:
 0x1.0005557773eacp-6
 0x1.0005557773eacp-6
 0x1p+10:
 -0x1.48d5be43ada01p-3
 -0x1.48d5be43ada01p-3
 M_PI * 0x1p+10:
 -0x1.1a60000000000p-43
 -0x1.1a62633145c07p-43
 M_PI * 0x1p+10 + 0x1p-6:
 0x1.000555776f810p-6
 0x1.000555776f80fp-6
 0x1p+11:
 -0x1.518972221f88ep-2
 -0x1.518972221f88ep-2
 M_PI * 0x1p+11:
 -0x1.1a60000000000p-42
 -0x1.1a62633145c07p-42
 M_PI * 0x1p+11 + 0x1p-6:
 0x1.0005557766ad7p-6
 0x1.0005557766ad5p-6
 0x1p+12:
 -0x1.7aae915eb67f3p-1
 -0x1.7aae915eb67f3p-1
 M_PI * 0x1p+12:
 -0x1.1a60000000000p-41
 -0x1.1a62633145c07p-41
 M_PI * 0x1p+12 + 0x1p-6:
 0x1.0005557755065p-6
 0x1.0005557755061p-6
 0x1p+13:
 -0x1.a1ff2171ec9fbp+1
 -0x1.a1ff2171ec9fbp+1
 M_PI * 0x1p+13:
 -0x1.1a60000000000p-40
 -0x1.1a62633145c07p-40
 M_PI * 0x1p+13 + 0x1p-6:
 0x1.0005557731b82p-6
 0x1.0005557731b79p-6
 0x1p+14:
 0x1.5a04d6e15c566p-1
 0x1.5a04d6e15c565p-1
 M_PI * 0x1p+14:
 -0x1.1a60000000000p-39
 -0x1.1a62633145c07p-39
 M_PI * 0x1p+14 + 0x1p-6:
 0x1.00055576eb1bbp-6
 0x1.00055576eb1a8p-6
 0x1p+15:
 0x1.3e75a49b5b447p+1
 0x1.3e75a49b5b446p+1
 M_PI * 0x1p+15:
 -0x1.1a60000000000p-38
 -0x1.1a62633145c07p-38
 M_PI * 0x1p+15 + 0x1p-6:
 0x1.000555765de2ep-6
 0x1.000555765de08p-6
 0x1p+16:
 -0x1.eae2708cf5424p-1
 -0x1.eae2708cf5425p-1
 M_PI * 0x1p+16:
 -0x1.1a60000000000p-37
 -0x1.1a62633145c07p-37
 M_PI * 0x1p+16 + 0x1p-6:
 0x1.0005557543714p-6
 0x1.00055575436c7p-6
 0x1p+17:
 -0x1.7bcb26d5d9adap+4
 -0x1.7bcb26d5d9af5p+4
 M_PI * 0x1p+17:
 -0x1.1a60000000000p-36
 -0x1.1a62633145c07p-36
 M_PI * 0x1p+17 + 0x1p-6:
 0x1.000555730e8dfp-6
 0x1.000555730e846p-6
 0x1p+18:
 0x1.59ba3666c9a5cp-4
 0x1.59ba3666c9a43p-4
 M_PI * 0x1p+18:
 -0x1.1a60000000000p-35
 -0x1.1a62633145c07p-35
 M_PI * 0x1p+18 + 0x1p-6:
 0x1.0005556ea4c75p-6
 0x1.0005556ea4b44p-6
 0x1p+19:
 0x1.5c354a31a8487p-3
 0x1.5c354a31a846ep-3
 M_PI * 0x1p+19:
 -0x1.1a60000000000p-34
 -0x1.1a62633145c07p-34
 M_PI * 0x1p+19 + 0x1p-6:
 0x1.00055565d13a2p-6
 0x1.00055565d113fp-6
 %%%
 
 Notes on special values:
 
 % 0x1p+11:
 % -0x1.518972221f88ep-2
 % -0x1.518972221f88ep-2
 
 No difference here.  However, there are 1-ulp differences for smaller
 args.
 
 % M_PI * 0x1p+11:
 % -0x1.1a60000000000p-42
 % -0x1.1a62633145c07p-42
 
 There is always this huge difference for the (M_PI * x) args.  Only the
 leading 14 mantissa bits agree.
 
 % M_PI * 0x1p+11 + 0x1p-6:
 % 0x1.0005557766ad7p-6
 % 0x1.0005557766ad5p-6
 
 This is the first 2-ulp difference for the (M_PI * integer + delta) args
 in my limited testing.
 
 % 0x1p+17:
 % -0x1.7bcb26d5d9adap+4
 % -0x1.7bcb26d5d9af5p+4
 
 This is the first 2-ulp difference for the integer args in the above output.
 The difference of 27 ulps is consistent with an internal value of pi that
 is not much more accurate than 66 bits (17 + 53 - log2(27) = 65.25...).
 
 Bruce