Date:      Sat, 28 Jul 2012 14:28:13 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Stephen Montgomery-Smith <stephen@missouri.edu>
Cc:        freebsd-bugs@FreeBSD.org, FreeBSD-gnats-submit@FreeBSD.org, Stephen Montgomery-Smith <stephen@FreeBSD.org>, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: bin/170206: complex arcsinh, log, etc.
Message-ID:  <20120728120632.M909@besplex.bde.org>
In-Reply-To: <5012FF06.4030501@missouri.edu>
References:  <201207270247.q6R2lkeR021134@wilberforce.math.missouri.edu> <20120727233939.A7820@besplex.bde.org> <5012FF06.4030501@missouri.edu>

On Fri, 27 Jul 2012, Stephen Montgomery-Smith wrote:

> On 07/27/2012 09:26 AM, Bruce Evans wrote:
>
>> > For clog, the worst case that I've found so far has x^2+y^2-1 ~= 1e-47:
>> >
>> >      x = 0.999999999999999555910790149937383830547332763671875000000000
>> >      y = 0.0000000298023223876953091912775497878893005143652317201485857367516
>> >        (need high precision decimal or these rounded to 53 bits binary)
>> >      x^2+y^2-1 = 1.0947644252537633366591637369e-47
>>
>> That is exactly 2^(-156).  So maybe triple quad precision really is
>> enough.
>
> Furthermore, if you use the computation (x-1)*(x+1)+y*y (assuming as you do 
> x>y>0), only double precision is necessary.  This is proved in the paper 
> "Implementing Complex Elementary Functions Using Exception Handling" by Hull, 
> Fairgrieve, Tang, ACM Transactions on Mathematical Software, Vol 20, No 2, 
> 1994.  They give a bound on the error, which I think can be interpreted as 
> being around 3.9 ULP.

I'm using x*x-1+y*y in doubled precision, which I believe is better.

I'm now thinking of the following refinement: suppose x is close to 1
and y is close to 0 (other cases are easier to get right accidentally,
but harder to analyze).  Then u = x-1 in non-doubled precision is exact
and cancels most low bits.  So u*u is exact in non-doubled precision.
Thus x*x-1 can be evaluated in mostly-non-doubled precision as u*u+2*u.
2*u is exact, and u*u is a tiny correction term (less than about half
an ulp relative to 2*u).  If we just wanted to pass this result to
log1p(), it would round to 2*u and no doubled precision would be
necessary.  But we need to retain it to add to y*y.  The necessary
extra precision is much easier for addition than for multiplication.
If I'm right that u*u is less than half an ulp, then (2*u, u*u) is
already in my normal form for doubled precision.
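
As a sketch in C (hypothetical names, not the committed code), the
special case looks like this, with the exactness conditions as
comments:

	struct dd { double hi, lo; };	/* doubled precision: hi + lo */

	static struct dd
	xx_minus_one_near_one(double x)
	{
		double u = x - 1;	/* exact for 0.5 <= x <= 2 (Sterbenz) */
		struct dd r;

		r.hi = 2 * u;		/* exact (scaling by a power of 2) */
		r.lo = u * u;		/* exact while u has <= 26 significant
					   bits, i.e., x within ~2**-26 of 1 */
		return (r);
	}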

Oops, this is essentially the same as (x-1)*(x+1).  x-1 is u, and
x+1 is u+2, so the product is u*u+2*u grouped in a numerically bad
way (it either needs extra precision for x+1 and then for the
multiplication, or loses accuracy starting with x+1).  Did you
mean doubled precision, not double precision?
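
A concrete case of the bad grouping: take x = 1 + 2**-52, one ulp
above 1.  Then x+1 = 2 + 2**-52 rounds to 2 (ties to even), so
(x-1)*(x+1) evaluates to 2**-51, while the exact x*x-1 is
2**-51 + 2**-104.  The grouping u*u+2*u keeps both terms, as the
doubled-precision pair (2**-51, 2**-104).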

This also avoids the following complication: double precision has an
odd number of bits, so it can't be split in half for calculating x*x
and y*y.  The usual 26+27 split would give an error of half an ulp in
doubled double precision.  The above avoids this for x*x in some
critical cases.  I hope in all critical cases.
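
For reference, one form of the usual split (a textbook-style sketch,
not the code under discussion) masks off the low 27 bits; the comments
mark where the half ulp is lost:

	#include <stdint.h>

	static void
	split26(double x, double *hi, double *lo)
	{
		union { double d; uint64_t u; } t;

		t.d = x;
		t.u &= 0xFFFFFFFFF8000000ULL;	/* clear low 27 bits */
		*hi = t.d;			/* 26-bit high part */
		*lo = x - *hi;			/* 27-bit low part (exact) */
	}

	static void
	square26(double x, double *hi, double *lo)	/* x*x as hi+lo */
	{
		double xh, xl;

		split26(x, &xh, &xl);
		*hi = x * x;
		/* xl*xl may need 54 bits, hence the half-ulp error: */
		*lo = ((xh * xh - *hi) + 2 * xh * xl) + xl * xl;
	}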

Cases where x*x and y*y are both nearly 0.5 have other interesting
points.  If x*x >= 0.5, then x*x-1 is exact in doubled precision.  When
x*x < 0.5, x*x-1 is not necessarily exact in doubled precision.  I
handle these cases by writing the expression as (x*x-0.5)+(y*y-0.5).
When x*x >= 0.5 and y*y >= 0.25, both methods give exact subtractions
and I think they give the same result.
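
In code, the rearrangement is just a regrouping (plain doubles shown;
the doubled-precision low terms are omitted):

	/*
	 * Sterbenz: a - b is exact when b/2 <= a <= 2*b, so both
	 * subtractions are exact when 0.25 <= x*x <= 1 and
	 * 0.25 <= y*y <= 1.
	 */
	static double
	xxyy_minus_one(double xx, double yy)	/* xx = x*x, yy = y*y */
	{
		return ((xx - 0.5) + (yy - 0.5));
	}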

> And I think you will see that your example does not contradict their theorem. 
> Because in your example, x-1 will be rather small.
>
> So to get reasonable ULP (reasonable meaning 4 rather than 1), double 
> precision is all you need.

I think you mean doubled precision.

4 ulps in undoubled precision is mediocre, while 4 in doubled
precision is many more than needed; I hope to get 0.5 + epsilon.  The
result
starting with double precision would then be accurate with 53-4 or
(54-0.5-epsilon) extra bits if the log1p() of it were taken with
infinite precision.  But log1p() has finite precision, and I'm seeing
the effects of slightly more than half a double precision bit being
lost on conversion of the doubled double precision x*x+y*y-1 when
passed to log1p(), and then another slightly more than half [...] lost
by imperfect rounding of log1p().  So one of my tests is to remove the
log1p() source of inaccuracy by replacing it with log1pl().  In float
precision, exhaustive testing is possible, though mine is not yet
complete; all cases tested with |z| as close as possible to 1 were
perfectly rounded.
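
The log1pl() substitution amounts to the following (a sketch of the
test, not the actual harness; the real part of clog() is
0.5*log1p(s) for s = x*x+y*y-1):

	#include <math.h>

	static double
	re_clog_test(double s)
	{
		/* Take the final log in extra precision so that only
		   the error in computing s remains visible. */
		return ((double)(0.5L * log1pl((long double)s)));
	}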

Bruce


