Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 16 Nov 2007 08:51:06 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        freebsd-standards@FreeBSD.org
Subject:   Re: long double broken on i386?
Message-ID:  <20071116080356.S10107@besplex.bde.org>
In-Reply-To: <20071115201625.GA18739@troutmask.apl.washington.edu>
References:  <20070928152227.GA39233@troutmask.apl.washington.edu> <20071001173736.U1985@besplex.bde.org> <20071002001154.GA3782@troutmask.apl.washington.edu> <20071002172317.GA95181@VARK.MIT.EDU> <20071002173237.GA12586@troutmask.apl.washington.edu> <20071003103519.X14175@delplex.bde.org> <20071010204249.GA7446@troutmask.apl.washington.edu> <20071103144151.U671@besplex.bde.org> <20071115201625.GA18739@troutmask.apl.washington.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 15 Nov 2007, Steve Kargl wrote:

> On Sat, Nov 03, 2007 at 03:53:19PM +1100, Bruce Evans wrote:

>> Are 396 hex digits enough?  It can't be right for all of the float, double
>> and long double cases.  I hope it is enough for 128-bit long doubles, and

> ...
> So, I poked around and found
>
> http://docs.sun.com/app/docs/doc/801-7639/6i1ucu1uh?a=view
>
> which contains the statement:
>
>   libm and libsunmath trigonometric functions use an "infinitely" precise
>   pi for argument reduction.  The value 2/pi is computed to 916 hexadecimal
>   digits and stored in a lookup table to use during argument reduction.

That is a lot.  Try looking at www.validgh.com for more references.  There
is a program for calculating the number of digits needed somewhere there
or pointed to there.

> static const long double xxx[10] = {
> 	-0x8.000000000000000p-4L, /* The - here, changes the sign of hz. */

Why put the minus here? ...

> long double
> __kernel_cosl(long double x, long double y)
> {
> 	long double hz, r, z, w;
>
> #if 1
> 	/*
> 	 * XXX - Although y supposely carries extra precision to allow
> 	 * better accuracy, the following lines are needed to get <= 1 ULP.
> 	 * Note, very simply testing with arg in [-2000:2000] shows ULPs
> 	 * that exceed 38 without this.
> 	 */
> 	x += y;
> 	y = 0;
> #endif

y should adjust x by at most 1 ULP (maybe < 0.5 ULPs) so this change
should have little (maybe no) effect.  The bug is probably in the
caller, in adding up the doubles to form a long double.  For double
precision, y[0] gives the double x directly, and it is up to
__ieee754_rem_pio2() to keep y[1] (y here) small.  For amd and i386
long double precision, x here must be calculated from y[0] and y[1]:

 	x = (double)(y[0] + y[1]);

and y here is the residue (somthing like (double)(x - y[0]) + y[1]).
Here it is essential to avoid the gcc bug of casts and assignments
not discarding extra precision on i386's.  With this calculation of
x and y, x += y always has no effect, since the sum has already been
calculated as accurately as possible in x.

> 	z = x * x;
>
> 	r = z * (C[1] + z * (C[2] + z * (C[3] + z * (C[4] + z * (C[5] +
> 	    z * (C[6] + z * (C[7] + z * (C[8] + z * C[9]))))))));
>
> 	hz = (float)C[0] * z;

Oops, this float cast is a copy of the one in k_cos.c, which is only there
due to mismerging the k_cosf.c to k_cos.c.  This should make no difference.

> 	w = 1 + hz;
>
> 	return (w + (((1 - w) + hz) + (z * r - x * y)));

... hz means half of z.  Why change it to mean minus half of z?

C[1] etc vs C1 etc in k_cos.c is a regression.  C1 is easier to type
in debuggers.  My program for determining the best polynomials prints
the coefficients in a standard form, and I decided to not use the array
form and have had second thoughts about the hex values (I now prefer
decimal values in code and hex values in comments, as in fdlibm, partly
because the C99 hex values are too different from the raw MD hex values
to be useful for debugging).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071116080356.S10107>