From owner-freebsd-numerics@FreeBSD.ORG Fri Oct 5 21:32:28 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 738D8106566B for ; Fri, 5 Oct 2012 21:32:28 +0000 (UTC) (envelope-from peter@vps.rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id B40E18FC0A for ; Fri, 5 Oct 2012 21:32:26 +0000 (UTC) Received: from vps.rulingia.com (localhost [127.0.0.1]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q95LWIRD090469 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Sat, 6 Oct 2012 07:32:18 +1000 (EST) (envelope-from peter@vps.rulingia.com) Received: (from peter@localhost) by vps.rulingia.com (8.14.5/8.14.5/Submit) id q95LWHc3090468 for freebsd-numerics@freebsd.org; Sat, 6 Oct 2012 07:32:17 +1000 (EST) (envelope-from peter) Date: Sat, 6 Oct 2012 07:32:17 +1000 From: Peter Jeremy To: freebsd-numerics@freebsd.org Message-ID: <20121005213217.GA90440@vps.rulingia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Implementing cpow(3) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 21:32:28 -0000 I have been meditating on cpow(3) for some time without achieving enlightenment. The following is somewhat of a braindump in the hope that someone will advise whether I am on the right track or have gone off the rails. A naive solution to cpow(3) looks like: double complex cpow(double complex z, double complex w) { return (cexp(w * clog(z)); } but this suffers from both spurious exceptions and precision loss. A somewhat expanded approach makes this more obvious: double complex cpow(double complex z, double complex w) { double rw, iw; double rad, lrad, th; double rrad, rth; rw = creal(w); iw = cimag(w); rad = cabs(z); lrad = log(rad); th = carg(z); rrad = exp(rw * lrad - iw * th); rth = rw * th + iw * lrad; return (cpack(rrad * cos(rth), rrad * sin(rth))); } This approach has the advantage of not requiring any complex arithmetic and it's also easier to explicitly handle exception cases. Working through the spurious exceptions and precision loss issues: - cabs(z) can overflow when z has large but finite real & imaginary parts. - log(x) pushes exponent bits into the fraction, thus a double precision log(x) loses about 10 bits of precision. - The '-' & '+' expressions can both suffer catastrophic cancellation. - Because exp(x) pushes fraction bits into the exponent, achieving 53 bits of precision in the result requires about 63 bits of input fraction. - Since both sin() and cos() have a range of [-1,1], rrad can potentially exceed DBL_MAX even though the return value is finite. Note that from here on, the code is intended to demonstrate the general approach, rather than as committable code, and assumes that 0, Inf and NaN are special cased externally. Note that denormals need special handling to do ilogb() and scalbn() via bit-manipulation. The precision loss in log(rad) can be avoided by separately handling the exponent bits - ie split rad into int nrad and double frad such that: rad = 2**nrad * frad, 1 <= frad < 2 Then: lrad = log(rad) = log(2**nrad * frad) = log(2) * nrad + log(frad) The limited range of frad means that log(frad) should not lose any precision and the two sides of the addition can be stored separately. For later use, assume: double lrad0 = log(2.) * (double)nrad; double lrad1 = log(frad); Note that for x << 1, log(1 + x) =~ x. Therefore lrad1 is either 0 or 1p-53 ≤ lrad1 < log(2) ~= 0.693. Moving back a line, cabs(z) can avoid spurious overflow & provide a split result via: struct split { int n; double f; } cabs(double complex z) { double rz, iz; int nrz, niz; double frz, fiz; double y; int ny; rz = creal(z); nrz = ilogb(rz); frz = scalbn(rz, -nrz); iz = cimag(z); niz = ilogb(iz); /* * sqrt(a*a + b*b) = abs(a) * sqrt(1 + (b/a)**2) * For x << 1, sqrt(1 + x*x) is approximately (1 + x*x/2) * therefore the "sqrt(1 + (b/a)**2)" term can be ignored * if b is more than half the size of the mantissa smaller * than a. */ if (nrz > niz + DBL_MANT_DIG/2) return ({nrz, fabs(frz)}); if (niz > nrz + DBL_MANT_DIG/2) return ({niz, fabs(scalbn(iz, -niz))}); fiz = scalbn(iz, -nrz); /* No underflow/overflow is possible due to restricted range */ y = sqrt(frz*frz + fiz*fiz); ny = ilogb(y); return ({nrz + ny, scalbn(y, -ny)}); } A potential enhancement would be to merge log(cabs(z)), noting that log(sqrt(x)) == log(x)/2 and log(sqrt(x*x + y*y)) == log(abs(x) * sqrt(1 + (y*y)/(x*x))) == log(abs(x)) + log(1 + (y*y)/(x*x))/2 == log(abs(x)) + log1p((y*y)/(x*x))/2 (Note that the use of the log1p() approach would imply an additional lrad2 term that is not handled in the following) Looking at "rth = rw * th + iw * lrad;": This is used solely as an argument for sin() or cos() - both of which have a period of 2π, therefore rth can be evaluated modulo 2π, as can each term in the expression. Assuming lrad is split as (lrad0 + lrad1) and mod_2pi() performs a high-precision modulo 2π: rth = mod_2pi(rw) * th + mod_2pi(iw) * (mod_2pi(lrad0) + mod_2pi(lrad1)); This should minimise the catastrophic cancellation. This leaves "rrad = exp(rw * lrad - iw * th);" or, given the lrad split: x = rw * log(2) * nrad + rw * lrad1 - iw * th; rrad = exp(x); The ranges of the various terms are: rw, iw: Any double -π ≤ th ≤ π -1074 ≤ nrad ≤ 1024 (allowing for denormals & z = DBL_MAX * (1 + i)) lrad1 = 0 or 1p-53 ≤ lrad1 < log(2) ~= 0.693. Since the final result is return (rrad * cis(rth)); and the range of cis(t) is |cis(t)| ≤ 1 or, alternatively ignoring 0 values: -1074 ≤ log₂(|cis(t)|) ≤ 0 this implies rrad needs to have a range -1074 ≤ log₂(|rrad|) ≤ 2097 (the latter value allows for rrad * DBL_MIN = DBL_MAX) and provide 53 bits of precision over -1023 < log₂(|rrad|) < 2047 In order to achieve the required range/precision, x needs to be expressed as: x = k * log(2) + r, |r| ≤ log(2) (ideally ≤ log(2)/2) Giving rrad = 2**k * exp(r) or double t = exp(r); return (cpack(scalbn(t * cos(rth), k), scalbn(t * sin(rth), k))); Unfortunately, it's not immediately obvious to me how to get from x = rw * log(2) * nrad + rw * lrad1 - iw * th; to x = k * log(2) + r, |r| ≤ log(2) (ideally ≤ log(2)/2) with the range/precision requirements. In particular, all multiplications need to be able to return values exceeding DBL_MAX and have sufficient precision to provide a 53-bit final result in the face of catastrophic cancellation. Does anyone have any suggestions as to how to proceed? -- Peter Jeremy From owner-freebsd-numerics@FreeBSD.ORG Fri Oct 5 21:48:54 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BF89A106566B for ; Fri, 5 Oct 2012 21:48:54 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 894998FC0C for ; Fri, 5 Oct 2012 21:48:54 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q95LmkMj027265 for ; Fri, 5 Oct 2012 16:48:47 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <506F55BE.7000304@missouri.edu> Date: Fri, 05 Oct 2012 16:48:46 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120912 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <20121005213217.GA90440@vps.rulingia.com> In-Reply-To: <20121005213217.GA90440@vps.rulingia.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: Implementing cpow(3) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 21:48:54 -0000 On 10/05/2012 04:32 PM, Peter Jeremy wrote: > Looking at "rth = rw * th + iw * lrad;": This is used solely as an > argument for sin() or cos() - both of which have a period of 2π, > therefore rth can be evaluated modulo 2π, as can each term in the > expression. Assuming lrad is split as (lrad0 + lrad1) and mod_2pi() > performs a high-precision modulo 2π: > rth = mod_2pi(rw) * th + mod_2pi(iw) * (mod_2pi(lrad0) + mod_2pi(lrad1)); > This should minimise the catastrophic cancellation. I just wanted to comment that you seem to be using that mod_2pi(a*b) = mod_2pi(a) * mod_2pi(b) + 2*PI*n, for some integer n. But this is generally only true if a and b are integers. If I were writing cpow(complex z, complex w), my code would look at the special case when w is a real integer, and handle it separately from the other cases. When z and w are general complex numbers (or even when z is complex and w is merely restricted to be real), I don't see how to avoid catastrophic cancellation. From owner-freebsd-numerics@FreeBSD.ORG Sat Oct 6 01:47:33 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1667F106566B for ; Sat, 6 Oct 2012 01:47:33 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id 5F54D8FC12 for ; Sat, 6 Oct 2012 01:47:32 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q961lMhd002428 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 6 Oct 2012 11:47:23 +1000 Date: Sat, 6 Oct 2012 11:47:22 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Peter Jeremy In-Reply-To: <20121005213217.GA90440@vps.rulingia.com> Message-ID: <20121006091156.D1385@besplex.bde.org> References: <20121005213217.GA90440@vps.rulingia.com> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="0-961849994-1349488042=:1385" Cc: freebsd-numerics@FreeBSD.org Subject: Re: Implementing cpow(3) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Oct 2012 01:47:33 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --0-961849994-1349488042=:1385 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Sat, 6 Oct 2012, Peter Jeremy wrote: > I have been meditating on cpow(3) for some time without achieving > enlightenment. The following is somewhat of a braindump in the > hope that someone will advise whether I am on the right track or > have gone off the rails. > ... > - Because exp(x) pushes fraction bits into the exponent, achieving 53 bit= s > of precision in the result requires about 63 bits of input fraction. Look at the real pow() to see how it gets enough (?) extra precision, and also for how it handles exceptional and other special cases. Look at both the fdlibm version and the old BSD libm version (in the Attic, unless svn lost it). I've barely looked at them, but noticed that they do the following for extra precision: - fdlibm duplicates most of exp() and log() manually inline in exp(), so as to get at their internals. It is hard to see how this provides enough extra precision, since the internals are only accurate to half an ulp. The power series are accurate to 1/32 ulps, but it is hard to evaluate them that accurately. Well, maybe pow() does the usual extra-precision things for the highest term of the power series only. - old BSD libm uses the extra-precision functions __exp__D() and __log__D(). These have a chance of providing an extra 25 bits of precision. It is much easier to understand, so I will include parts of it here: % * Required kernel functions: % *=09exp__D(a,c)=09=09=09exp(a + c) for |a| << |c| % *=09struct d_double dlog(x)=09=09r.a + r.b, |r.b| < |r.a| % ... % double pow(x,y)=20 % double x,y; % { % =09double t; % =09if (y=3D=3Dzero) % =09=09return (one); % =09else if (y=3D=3Done || (_IEEE && x !=3D x)) % =09=09return (x);=09=09/* if x is NaN or y=3D1 */ % =09else if (_IEEE && y!=3Dy)=09=09/* if y is NaN */ % =09=09return (y); % =09else if (!finite(y))=09=09/* if y is INF */ % =09=09if ((t=3Dfabs(x))=3D=3Done)=09/* +-1 ** +-INF is NaN */ % =09=09=09return (y - y); % =09=09else if (t>one) % =09=09=09return ((y<0)? zero : ((x0)? zero : ((x<0)? y-y : -y)); % =09else if (y=3D=3Dtwo) % =09=09return (x*x); % =09else if (y=3D=3Dnegone) % =09=09return (one/x); % /* x > 0, x =3D=3D +0 */ % =09else if (copysign(one, x) =3D=3D one) % =09=09return (pow_P(x, y)); %=20 % /* sign(x)=3D -1 */ % =09/* if y is an even integer */ % =09else if ( (t=3Ddrem(y,two)) =3D=3D zero) % =09=09return (pow_P(-x, y)); %=20 % =09/* if y is an odd integer */ % =09else if (copysign(t,one) =3D=3D one) % =09=09return (-pow_P(-x, y)); Apart from exceptional cases, it only has these special cases for integers. This is not much different from fdlibm pow(). %=20 % =09/* Henceforth y is not an integer */ % =09else if (x=3D=3Dzero)=09/* x is -0 */ % =09=09return ((y>zero)? -x : one/(-x)); % =09else if (_IEEE) % =09=09return (zero/zero); % =09else % =09=09return (infnan(EDOM)); % } % /* kernel function for x >=3D 0 */ % static double % #ifdef _ANSI_SOURCE % pow_P(double x, double y) % #else % pow_P(x, y) double x, y; % #endif % { % =09struct Double s, t, __log__D(); % =09double __exp__D(), huge =3D 1e300, tiny =3D 1e-300; %=20 % =09if (x =3D=3D zero) % =09=09if (y > zero) % =09=09=09return (zero); % =09=09else if (_IEEE) % =09=09=09return (huge*huge); % =09=09else % =09=09=09return (infnan(ERANGE)); % =09if (x =3D=3D one) % =09=09return (one); % =09if (!finite(x)) % =09=09if (y < zero) % =09=09=09return (zero); % =09=09else if (_IEEE) % =09=09=09return (huge*huge); % =09=09else % =09=09=09return (infnan(ERANGE)); % =09if (y >=3D 7e18)=09=09/* infinity */ % =09=09if (x < 1) % =09=09=09return(tiny*tiny); % =09=09else if (_IEEE) % =09=09=09return (huge*huge); % =09=09else % =09=09=09return (infnan(ERANGE)); Don't know why it handles some exceptional cases here instead of all in the main function. %=20 % =09/* Return exp(y*log(x)), using simulated extended */ % =09/* precision for the log and the multiply.=09 */ %=20 % =09s =3D __log__D(x); % =09t.a =3D y; % =09TRUNC(t.a); % =09t.b =3D y - t.a; % =09t.b =3D s.b*y + t.b*s.a; % =09t.a *=3D s.a; % =09s.a =3D t.a + t.b; % =09s.b =3D (t.a - s.a) + t.b; % =09return (__exp__D(s.a, s.b)); % } It handles all non-special cases here, using the formula with extra precision. Note that __log__D() starts with standard precision and produces extra precision, while __exp__D() starts with extra precision and produces standard precision. This is exactly what is needed here. These functions are still in libm, since I brought them back to use with old BSD tgamma(). But they are not very good. My log*() works similarly to __log__D(). It currently only provides a static inline interface for exporting the extra precision that it produces. It produces only about 8 extra bits. __log__D() has a chance of producing 26 extra, but I expect it barely produces 8. It is fairly easy to change tables and polynomial degrees to produce 26 if required. Current expl() has similar extra precision internally to __log__D(), but it produces it instead of starting with it, so it would be not so easy to convert it to __exp__D()'s interface. However, some versions of exp2*() are a good base for such a conversion, since exp2(x) =3D=3D pow(2, x) so it has similar requirements. It would be generally useful to have versions of exp() and log() that both take and return extra precision. My log1pl() almost does this internally. > - Since both sin() and cos() have a range of [-1,1], rrad can potentially > exceed DBL_MAX even though the return value is finite. This is a solved problem. See how cexp() and ccosh() handle it. > ... > Moving back a line, cabs(z) can avoid spurious overflow & provide a > split result via: > > struct split { > =09int n; > =09double f; > } cabs(double complex z) > { > =09double rz, iz; > =09int nrz, niz; > =09double frz, fiz; > =09double y; > =09int ny; > > =09rz =3D creal(z); > =09nrz =3D ilogb(rz); > =09frz =3D scalbn(rz, -nrz); > =09iz =3D cimag(z); > =09niz =3D ilogb(iz); > =09/* > =09 * sqrt(a*a + b*b) =3D abs(a) * sqrt(1 + (b/a)**2) > =09 * For x << 1, sqrt(1 + x*x) is approximately (1 + x*x/2) > =09 * therefore the "sqrt(1 + (b/a)**2)" term can be ignored > =09 * if b is more than half the size of the mantissa smaller > =09 * than a. > =09 */ > =09if (nrz > niz + DBL_MANT_DIG/2) > =09=09return ({nrz, fabs(frz)}); > =09if (niz > nrz + DBL_MANT_DIG/2) > =09=09return ({niz, fabs(scalbn(iz, -niz))}); > > =09fiz =3D scalbn(iz, -nrz); > =09/* No underflow/overflow is possible due to restricted range */ > =09y =3D sqrt(frz*frz + fiz*fiz); > =09ny =3D ilogb(y); > =09return ({nrz + ny, scalbn(y, -ny)}); > } Here you are reinventing a bit too much of hypot(), or rather clog(). Scaling is certainly necessary to avoid spurious overflow, so we can't just use these functions. There are also large problems with accuracy, which the above doesn't solve and clog() probably doesn't solve well enough for use here. I snipped too much above so I quote the pseudo-code again: > double complex cpow(double complex z, double complex w) > { > =09double rw, iw; > =09double rad, lrad, th; > =09double rrad, rth; >=20 > =09rw =3D creal(w); > =09iw =3D cimag(w); > =09rad =3D cabs(z); > =09lrad =3D log(rad); log(rad) cancels up to ~156 bits (3 * DBL_MAX less 3) when |z| is near 1, so this won't work without tripled or quadrupled double precision. clog() has to work hard to produce 52 bits of precision using only doubled double precision. > =09th =3D carg(z); >=20 > =09rrad =3D exp(rw * lrad - iw * th); There will be additional cancelations here. It seems to hard to avoid them completely, but it would be good to start with at least full precision in lrad and th. I know how to get full precision for lrad: in clog(), where it evaluates log1p(hi + lo) in the critical case, use an extra-precision log1p() on (hi, lo). This would also give an extra- precision result if hi+lo has extra precision. The doubled double precision caclulations usually give lots of extra precision in hi+lo, but I forget if they give much extra when |z| is near 1. > =09rth =3D rw * th + iw * lrad; Can cancel too. >=20 > =09return (cpack(rrad * cos(rth), rrad * sin(rth))); > } No cancelations here. This is "easy", since it is just like the solved problem for ccosh(). We just have to scale the exp() factor for both. > A potential enhancement would be to merge log(cabs(z)), noting that > log(sqrt(x)) =3D=3D log(x)/2 > and > log(sqrt(x*x + y*y)) =3D=3D log(abs(x) * sqrt(1 + (y*y)/(x*x))) > =3D=3D log(abs(x)) + log(1 + (y*y)/(x*x))/2 > =3D=3D log(abs(x)) + log1p((y*y)/(x*x))/2 > (Note that the use of the log1p() approach would imply an additional > lrad2 term that is not handled in the following) clog() does stuff like this, especially in discarded versions. |y/x| must be fairly small, else this just gives an extra ulp or 2 of error. If |y/x| is very small, then the above can be evaluated efficiently using log1p(ratio) ~=3D ratio. I tried this, but it didn't improve the average speed, although the test emphasized unusual cases with small or large ratios. So I only use this when |y/x| is so small that it would underflow, so the log1p() term goes away as a side effect of avoiding underflow. cpow() could only do better than clog() by inlining clog() and adding complications. Seems too hard. I now see that the overflow avoidance in clog() is enough for calculating lrad for use here, but then the expressions for rrad and rth may overflow. These expressions just give ordinary complex multiplication, which may overflow, and one reason we are writing them as real expressions is to possibly avoid overflow. It can be avoided by scaling |lrad| and |th| to <=3D 0.5, with complications when one of them would underflow. > Looking at "rth =3D rw * th + iw * lrad;": This is used solely as an > argument for sin() or cos() - both of which have a period of 2=CF=80, > therefore rth can be evaluated modulo 2=CF=80, as can each term in the > expression. Assuming lrad is split as (lrad0 + lrad1) and mod_2pi() > performs a high-precision modulo 2=CF=80: > rth =3D mod_2pi(rw) * th + mod_2pi(iw) * (mod_2pi(lrad0) + mod_2pi(lrad1= )); > This should minimise the catastrophic cancellation. I think Stephen said that this only works for special args. The mod_2pi operation is difficult, but is supported by __kernel_rem_pio2(). But this point shows that the cancelation problem is larger than usual -- it is not just at 0, but at all multiples of 2*Pi (or Pi/2). But I think it is much larger at 0, since other multiples are hard to get close to. Easier with 2 variables than 1 though. > This leaves "rrad =3D exp(rw * lrad - iw * th);" or, given the lrad split= : > x =3D rw * log(2) * nrad + rw * lrad1 - iw * th; > rrad =3D exp(x); I snipped the part about the rad split. I think it wouldn't work too well. You would have to evaluate the above long expression for x mostly in exitra precision. It's easier to have an extra-precision log(). In the above, the full extra precision expression is: x =3D rw * (ln2_hi + ln2_lo) * nrad + rw * (lrad1_hi + lrad1_lo - iw * (th_hi + th_lo);=09/* not literal; also need hi+lo for rw... */ We can let the extra-precision log() do some of the additions, and "only" have to evaluate: x =3D rw * (lrad_hi + lrad_lo) - iw * (th_hi + th_lo);=09/* not literal.= =2E. */ > The ranges of the various terms are: > rw, iw: Any double > [... unreadable part with binary characters] > ... > In order to achieve the required range/precision, x needs to be expressed= as: > x =3D k * log(2) + r, |r| =E2=89=A4 log(2) (ideally =E2=89=A4 log(2)/2) > Giving > rrad =3D 2**k * exp(r) > or > double t =3D exp(r); > return (cpack(scalbn(t * cos(rth), k), scalbn(t * sin(rth), k))); > > Unfortunately, it's not immediately obvious to me how to get from > x =3D rw * log(2) * nrad + rw * lrad1 - iw * th; > to > x =3D k * log(2) + r, |r| =E2=89=A4 log(2) (ideally =E2=89=A4 log(2)/2) > with the range/precision requirements. In particular, all > multiplications need to be able to return values exceeding DBL_MAX and > have sufficient precision to provide a 53-bit final result in the face > of catastrophic cancellation. See cexp(). It calls __ldexp_cexp() to handle all the scaling details. Oops, __ldexp_cexp() cannot handle extra precision that we may have in x, we need significant extra precision in x since exp() will expand the error. So we need an __ldexp_cexp() corresponding to __exp__D(). But that is easi= er than getting enough precision in x. I would try evaluating it all in merel= y doubled double precision. clog() shows how to do this for the simpler expression ax*ax + ay*ay - 1. Bruce --0-961849994-1349488042=:1385--