Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 10 Mar 2017 17:46:34 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        freebsd-numerics@freebsd.org
Subject:   Re: Bit twiddling question
Message-ID:  <20170310165638.D1143@besplex.bde.org>
In-Reply-To: <20170309195134.GA36213@troutmask.apl.washington.edu>
References:  <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <20170309152307.GA32781@troutmask.apl.washington.edu> <20170310025417.U3723@besplex.bde.org> <20170309195134.GA36213@troutmask.apl.washington.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 9 Mar 2017, Steve Kargl wrote:

> On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote:
>> On Thu, 9 Mar 2017, Steve Kargl wrote:
>>
>>> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote:
>>>> To give a hint at what I have been working on, I have implementations
>>>> for sinpif(x), sinpi(x), cospif(x), and cospi(x).  For sinpif(x) and
>>>> cospif(x) I have kernels that give correctly rounded results for
>>>> FLT_MIN <= x < 0x1p23 (at least on x86_64).   I also have slightly
>>
>> I didn't see the original mail.
>
> Odd.  Maybe munched by a spam filter.

It was delivered out of order.

>> I'd prefer you to finish the long double versions of cosl(), sinl() and
>> tanl() (maybe das is responsible for the latter).  These are still too
>> slow and minor bugs for at least sinl() on small x, partly by not copying
>> the lower-precision versions enough.
>
> sinl, cosl, and tanl are good enough for my purposes at the moment.
> tgammal and powl are higher on the list of things to fix.  But,

Only powl is very important, but everyone avoids it because it is harder.

>*  ...
>>> which can be written (off the top-of-head and not tested)
>>>        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
>>>                double volatile vx;
>>>                uint32_t n;
>>>                vx = ax + 0x1p52;
>>>                vx = vx - 0x1p52;
>>>                if (vx == ax) return(0);
>>>                if (vx > UINT32_MAX) {   /* ax = m + n + r with m + n */
>>>                    vx -= UINT32_MAX;    /* m = UINT32_MAX.  n is in range */
>>>                    ax -= UINT32_MAX;
>>> 				}
>>>                n = (uint32_t)vx;
>>>                ax = __compute_sinpi(ax - n);
>>>                if (n & 1) ax = -ax;
>>>                return ((hx & 0x80000000) ? -ax : ax);
>>>        }
>>>
>>> Something similar can be applied to ld128, but reduction may take
>>> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).
>>
>> I don't like that much.  The 0x1p52 magic is tricky and probably buggy.
>> s_rint.c uses the same magic but needs many complications to avoid
>> double rounding.
>
> Yep. Seems to have some issue.  From my working copy of s_sinpif(x),
> I tried
> ...
> My max ULP went from 0.5 to 0.50398505 for exhaustive testing
> in the interval [1,100].  If I grab the worse case x and use the
> '#if 1' code, I see
>
>> Similar-looking magic for trig reduction in radians
>> uses floating the magic value of 0x1.8p52 to bias rounding errors in
>> a safe way, and arranges so that rounding errors don't matter then.
>> ....
>>  	y = x * 0.5 - floor(x * 0.5);	/* 4 quads in [2n, 2n+2) -> [0, 1) */
>>  	y = 4 * y;			/* -> [0, 4) */
>>  	int n = y;			/* quadrant */
>>  	r = (y - n) / 4;		/* offset in quadrant */
>>
>> This is a bit slow, and too much like lgamma's method.
>
> I did look over what you did in lgamma[f], but forgot the logic
> behind the reduction into octants.  I suppose that I'm more
> concern with 'make it work', 'make it correct', and then I
> will/might worry about 'make it fast'.  But, I don't want it to be
> too slow, so I'm trying to avoid function calls and reclassification
> of x as +-inf or nan.  Heck, r = remquof(ax, vx, &n) should work,
> but there's that speed issue.

In fact, we already did the work to optimize this using the 0x1p52
magic for lgamma, and even committed it.

fdlibm lgamma used a messy combination of multiplications by 0.5 and
2.0, 2 floor()s, 1 half of the 0x1p52 magic and bit fiddling to do
the other half of the 0x1p52 magic.  You simplified this, and we
eventually optimized it to use no floors (but the equivalent of
2 rint()s on |x| and 4*|x| implemented using full 0x1p52 magic) and
different bit fiddling.  Now I fear that the 0x1p52 doesn't work
in double rounding cases.  Hopefully it works like it does for trig
functions -- at worst there is an off-by 1 error in the quadrant
but here is a compensating tiny error in the offset.  E.g., the
closest value to pi/4 might be represented as octant 0 with offset
pi/4 + epsilon or octant 1 with offset epsilon (the lowest level
must reduce to octants, not quadrants).  Then the poly much just
approximate a little above pi/4 in case epsilon > 0.  Higher levels
also see the off-by-1 error but everything is consistent.

This explains your larger error -- the accuracy of the poly breaks
down not far above pi/4, and even epsilon above pi/4 there may be
a near-halfway case that is relatively inaccurate.

I think the reduction for lgamma can be further improved by reducing
directly to octants.

sin_pi() in lgamma differs from a perfect general sin_pi() only in
the following respects:
- the general function has to handle large args, most are even integers
   so sin_pi() on them is 0.  Some are odd integers.  The 0x1p52 magic
   breaks down at about the same point as where all args are integers.
- the perfect general function has to use special polys and special
   poly evaluation methods since pi*x is too inaccurate without extra
   precision.

Reduction for sind() is interesting.  sind(x) = sin_pi(x/180) in
infinite precision, but the division is hard to do accurately and
efficiently.  So sind(x) is only easy for small x.

C11 doesn't have sincos*() or sind*() to make us more incomplete.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170310165638.D1143>