Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 9 Mar 2019 07:53:24 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        freebsd-numerics@freebsd.org
Subject:   Re: cexpl() (was: Re: Update ENTERI() macro)
Message-ID:  <20190309070413.G2539@besplex.bde.org>
In-Reply-To: <20190308191150.GA32980@troutmask.apl.washington.edu>
References:  <20190306184829.GA44023@troutmask.apl.washington.edu> <20190307061214.R4911@besplex.bde.org> <20190306214233.GA23159@troutmask.apl.washington.edu> <20190307144315.N932@besplex.bde.org> <20190307044447.GA16298@troutmask.apl.washington.edu> <20190307163958.V1293@besplex.bde.org> <20190307195436.GA20856@troutmask.apl.washington.edu> <20190308225807.D6410@besplex.bde.org> <20190308162432.GA31392@troutmask.apl.washington.edu> <20190309033820.J1443@besplex.bde.org> <20190308191150.GA32980@troutmask.apl.washington.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 8 Mar 2019, Steve Kargl wrote:

> On Sat, Mar 09, 2019 at 05:02:57AM +1100, Bruce Evans wrote:
>> On Fri, 8 Mar 2019, Steve Kargl wrote:
>>> On Sat, Mar 09, 2019 at 12:51:07AM +1100, Bruce Evans wrote:
>>>> On Thu, 7 Mar 2019, Steve Kargl wrote:
>>>> ...
>>>> IIRC, that was done by extracting into a 32-bit lx.  This could be written
>>>> as (lx >> 32) > 0xb17217f7 (or a fuzzier value), and it might be better
>>>> to do that anyway.
>>>
>>> This is what I do in my WIP ccosh.  There are 3 thresholds
>>> of ~22.9, ~11356, and ~22755.  I'm casting the result to
>>> uint32_t, so something like (uint32_t)(lx >> 32) > 0xb17000000.
>>> Do I need the cast and/or do you prefer it?
>>
>> I think you can even round to a power of 2 and not look at any mantissa
>> bits to classify this, as for the 64 in coshl().
>
> You used your k_hexpl() and take advantage of the hi+lo splitting
> to use x < 64.

Or I designed my k_hexpl() to support this splitting up to x < 64.  Long
double precision is easier because it has (committed) support functions
like k_hexpl().

>> 22.9 here seems wrong.  It is approximately the threshold where cosh()
>> decays to exp().  The threshold for coshl() delay is much higher.
>> coshl() uses 64, which is high enough for both ld80 and ld128.  More
>> like 32 would work for ld80.
>
> I actually round up to 23 for ld80, and haven't thought too much
> about ld128.  A value of 22 is used for s_ccosh.c, which isn't
> quite large enough for long double. (Yes, I looked at bits)
>
> % ./gen 22
> coshl(22)    = 1.79245642306579578097e+09
> expl(22) / 2 = 1.79245642306579578086e+09
> sinhl(22)    = 1.79245642306579578074e+09
>
> % ./gen 23
> coshl(23)    = 4.87240172312445130013e+09
> expl(23) / 2 = 4.87240172312445130013e+09
> sinhl(23)    = 4.87240172312445130013e+09

Look at more bits :-).  For r + 1/r, ignoring the 1/r term gives an error
of about 1 ulp in ld80 precision when 1/r/r is about 2**-64.  This gives
a threshold of about log(2**(64/2)) = 22.18 for accuracy of about 1 ulp.
But we want more accuracy than that.  For 0.5 ulps, log(2**(65/2) = 22.52+.
We want more than that too.  expl() gives almost 10 extra bits of accuracy,
and the hyperbolic functions use the expl kernel to get all these extra
bits.  expl(x) itself uses a threshold of 2**-75 instead of 2**-65 for 
x being so tiny that the x**2/2 term and higher terms are negligible
compared with the x term.  So we should try for 11 extra bits here.  This
gives a threshold of log(2**(75/2)) = 25.99++.

ld128 expl() is missing the change corresponding to changing 65 to 75 for
the x vs x**2/2 threshold.  It still uses 114.  After fixing this to 124,
the corresponding threshold here is log(2**124/2) = 42.97+.

> in s_ccoshl.c I then have
>
> 	if (ix < 0x4003 ||      /* |x| < ~23: normal case */
> 	    (ix == 0x4003 && (uint32_t)(lx >> 32) < 0xb8000000))
> 		RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));

k_hexpl() can't handle this case.  (But if you want efficiency or accuracy,
then you should implement sinhcoshl() or just use the kernel to evaluate
exp(x) only once.)

I think this formula works up to 32 or 64 or even up to the overflow
threshold.  coshl(x) and sinhl() decay to expl(x) starting a bit above
23, but, but these functions can optimize that almost as well as here
(they actually don't bother to until x reaches 64).

You can use this to eliminate the check of lx.

>
> 	/* |x| >= 23, so cosh(x) ~= exp(|x|) / 2 */
> 	if (ix < 0x400c ||
> 	    (ix == 0x400c && (uint32_t)(lx >> 32) <= 0xb1700000)) {
> 		/* x < 11356: exp(|x|) won't overflow */
> 		h = expl(fabsl(x)) / 2;
> 		RETURNI(CMPLXL(h * c, copysignl(h, x) * s));

Use a power of 2 for the upper threshold too if possible.

> With |x| < 23, we have 2 function calls for coshl() and sinhl()
> as opposed to the 1 function call for expl() in 23 <= x <= 32 range.
> I can write the first conditional as
>
> 	if (ix < 0x4004)     /* |x| < 32: normal case */
> 		RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));

The general formula is indeed unecessarily slow in that range.  Also
for the more important range [1, 23].  coshl() in the lower range is
lo + 0.25/(hi + lo) + hi after h_expl() to get hi and lo.  Similarly
for sinhl().  These formulas are short enough to repeat in the above,
but I think this optimization is excessive.  It is better for accuracy
after some rearrangement.

> This then raises the question on whether we should change the
> limit to 32 in the double complex ccosh()?

Do you mean from 64 to 32 the non-complex cosh(), or from your current
limit to the above?

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190309070413.G2539>