From owner-freebsd-numerics@FreeBSD.ORG  Sun May 10 10:44:12 2015
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 0BF6533D
 for <freebsd-numerics@freebsd.org>; Sun, 10 May 2015 10:44:12 +0000 (UTC)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id C48F81E11
 for <freebsd-numerics@freebsd.org>; Sun, 10 May 2015 10:44:11 +0000 (UTC)
Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au
 (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 7D29D3C50AE;
 Sun, 10 May 2015 20:16:15 +1000 (AEST)
Date: Sun, 10 May 2015 20:16:14 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: small cleanup patch for e_pow.c
In-Reply-To: <20150510061458.GA82518@troutmask.apl.washington.edu>
Message-ID: <20150510172810.E1812@besplex.bde.org>
References: <20150510002910.GA82261@troutmask.apl.washington.edu>
 <20150510113454.O841@besplex.bde.org>
 <20150510061458.GA82518@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=A5NVYcmG c=1 sm=1 tr=0
 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10
 a=JzwRw_2MAAAA:8 a=1_pA0lLU-LACsFVV59MA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 10 May 2015 10:44:12 -0000

On Sat, 9 May 2015, Steve Kargl wrote:

> On Sun, May 10, 2015 at 11:59:55AM +1000, Bruce Evans wrote:
>> On Sat, 9 May 2015, Steve Kargl wrote:
>>
>>> In reading, e_pow.c I found a small piece of code that
>>> can be remove.  Anyone object?
>>>
>>> Index: src/e_pow.c
>>> ===================================================================
>>> --- src/e_pow.c	(revision 1603)
>>> +++ src/e_pow.c	(working copy)
>>> @@ -187,10 +187,6 @@ __ieee754_pow(double x, double y)
>>>
>>>     /* |y| is huge */
>>> 	if(iy>0x41e00000) { /* if |y| > 2**31 */
>>> -	    if(iy>0x43f00000){	/* if |y| > 2**64, must o/uflow */
>>> -		if(ix<=0x3fefffff) return (hy<0)? huge*huge:tiny*tiny;
>>> -		if(ix>=0x3ff00000) return (hy>0)? huge*huge:tiny*tiny;
>>> -	    }
>>> 	/* over/underflow if x is not close to one */
>>> 	    if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny;
>>> 	    if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny;
>>
>> It seems to be just an optimization.  It is a large optimization for
>> the huge args, but those are not common, and is at most a tiny
>> pessimization for non-huge args (just an extra branch which can be
>> predicted perfectly if non-huge args are never used).
>
> I don't see how this can be an optimization.  The code has the form
>
>  if (|y| > 2**31) {
>     if (|y| > 2**64) {
>         if (a) return
>         if (b) return
>     }
>     if (a') return
>     if (b') return
>     ...
>   }
>
> The difference between a and a' is <= instead of <, and similar for
> b and b' with >= and >.  If either a or b would have return, so will
> a' and b'.  If neither a nor b return, then neither a' nor b' will
> return.  The a' and b' lines were added by das in r141296, where the
> commit message says the change is for fdlibm 5.3 compatibility.

Hmm, I read the comment and not the code.  According to the comment,
there are no tests a and b.  But these are apparently needed.
Obviously, we must return 1 when x is exactly 1, and it is apparently
possible for overflow/underflow to occur for values near 1 even when
y is huge.

> I'll also note that block like 'if (|y|>2**64) { }' is not present
> in e_powf.c

Apparently just another translation error.

The tests for the double precision case are probably fuzzy just for
efficiency -- we want to look at only the high word.  In float
precision, we can look at all the bits.  Since even 1 bit away from 1
is relatively large in float precision, overflow/underflow is more
likely to happen automatically.  Indeed, (1+2**-23)**(2**30) is a large
multiple of FLT_MAX although (1+2**-23)**(2**29) is a small fraction
of FLT_MAX; N >= 2**30 and above also gives underflow for (1-2**-24)**N;
N >= 2**62 gives overflow in double precision for (1+2**-52)**N, ...

So 1 is the only numbers near 1 that doesn't give overflow.

Here is the code again, with explanations:

X 	if(iy>0x41e00000) { /* if |y| > 2**31 */
X 	    if(iy>0x43f00000){	/* if |y| > 2**64, must o/uflow */

The correct fuzzy (power of 2) threshold seems to be >= 2**62.

X 		if(ix<=0x3fefffff) return (hy<0)? huge*huge:tiny*tiny;
X 		if(ix>=0x3ff00000) return (hy>0)? huge*huge:tiny*tiny;

Then we must determine on which side of 1 |x| lies. ix <= 0x3fefffff
clearly imples that |x| < 1.  Otherwise, ix >= 0x3ff00000 and |x| >= 1.
The last inequality is strict since |x| = 1 has already been handled.
The check that ix >= 0x3ff00000 is redundant and should be omitted.
The first check should be written more naturally as ix < 0x3ff00000.

In float precision, the first check can be written even more easily as
ix < 0x3f800000.

X 	    }
X 	/* over/underflow if x is not close to one */
X 	    if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny;
X 	    if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny;

Removing the equal signs from the inequalities makes a signficant
difference.  There are 2**32 values with ix == 0x3fefffff and another
2**32 values with ix == 0x3ff000000.  All of these are close to 1.
Since |iy| is not so large now, |x| has to be signficantly different
from 1 for overflow/underflow to occur.  It probably occurs for many
of the 2 * 2**32 values not found by these tests; the tests are just
quick fuzzy ones using ix.

In float precision, we can find the exact thresholds for overflow
and underflow.  Perhaps the Cygnus thresholds are exact.

das didn't actually add a' and b', but just changed the spelling of sn
in them:

X @@ -192,9 +191,9 @@
X  	    }
X  	/* over/underflow if x is not close to one */
X -	    if(ix<0x3fefffff) return (hy<0)? sn*huge*huge:sn*tiny*tiny;
X -	    if(ix>0x3ff00000) return (hy>0)? sn*huge*huge:sn*tiny*tiny;
X -	/* now |1-x| is tiny <= 2**-20, suffice to compute
X +	    if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny;
X +	    if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny;
X +	/* now |1-x| is tiny <= 2**-20, suffice to compute 
X  	   log(x) by x-x^2/2+x^3/3-x^4/4 */
X -	    t = ax-1;		/* t has 20 trailing zeros */
X +	    t = ax-one;		/* t has 20 trailing zeros */
X  	    w = (t*t)*(0.5-t*(0.3333333333333333333333-t*0.25));
X  	    u = ivln2_h*t;	/* ivln2_h has 21 sig. bits */

Bruce