From owner-freebsd-numerics@FreeBSD.ORG Sun May 10 10:44:12 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0BF6533D for ; Sun, 10 May 2015 10:44:12 +0000 (UTC) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id C48F81E11 for ; Sun, 10 May 2015 10:44:11 +0000 (UTC) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 7D29D3C50AE; Sun, 10 May 2015 20:16:15 +1000 (AEST) Date: Sun, 10 May 2015 20:16:14 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: small cleanup patch for e_pow.c In-Reply-To: <20150510061458.GA82518@troutmask.apl.washington.edu> Message-ID: <20150510172810.E1812@besplex.bde.org> References: <20150510002910.GA82261@troutmask.apl.washington.edu> <20150510113454.O841@besplex.bde.org> <20150510061458.GA82518@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=A5NVYcmG c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=1_pA0lLU-LACsFVV59MA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 May 2015 10:44:12 -0000 On Sat, 9 May 2015, Steve Kargl wrote: > On Sun, May 10, 2015 at 11:59:55AM +1000, Bruce Evans wrote: >> On Sat, 9 May 2015, Steve Kargl wrote: >> >>> In reading, e_pow.c I found a small piece of code that >>> can be remove. Anyone object? >>> >>> Index: src/e_pow.c >>> =================================================================== >>> --- src/e_pow.c (revision 1603) >>> +++ src/e_pow.c (working copy) >>> @@ -187,10 +187,6 @@ __ieee754_pow(double x, double y) >>> >>> /* |y| is huge */ >>> if(iy>0x41e00000) { /* if |y| > 2**31 */ >>> - if(iy>0x43f00000){ /* if |y| > 2**64, must o/uflow */ >>> - if(ix<=0x3fefffff) return (hy<0)? huge*huge:tiny*tiny; >>> - if(ix>=0x3ff00000) return (hy>0)? huge*huge:tiny*tiny; >>> - } >>> /* over/underflow if x is not close to one */ >>> if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny; >>> if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny; >> >> It seems to be just an optimization. It is a large optimization for >> the huge args, but those are not common, and is at most a tiny >> pessimization for non-huge args (just an extra branch which can be >> predicted perfectly if non-huge args are never used). > > I don't see how this can be an optimization. The code has the form > > if (|y| > 2**31) { > if (|y| > 2**64) { > if (a) return > if (b) return > } > if (a') return > if (b') return > ... > } > > The difference between a and a' is <= instead of <, and similar for > b and b' with >= and >. If either a or b would have return, so will > a' and b'. If neither a nor b return, then neither a' nor b' will > return. The a' and b' lines were added by das in r141296, where the > commit message says the change is for fdlibm 5.3 compatibility. Hmm, I read the comment and not the code. According to the comment, there are no tests a and b. But these are apparently needed. Obviously, we must return 1 when x is exactly 1, and it is apparently possible for overflow/underflow to occur for values near 1 even when y is huge. > I'll also note that block like 'if (|y|>2**64) { }' is not present > in e_powf.c Apparently just another translation error. The tests for the double precision case are probably fuzzy just for efficiency -- we want to look at only the high word. In float precision, we can look at all the bits. Since even 1 bit away from 1 is relatively large in float precision, overflow/underflow is more likely to happen automatically. Indeed, (1+2**-23)**(2**30) is a large multiple of FLT_MAX although (1+2**-23)**(2**29) is a small fraction of FLT_MAX; N >= 2**30 and above also gives underflow for (1-2**-24)**N; N >= 2**62 gives overflow in double precision for (1+2**-52)**N, ... So 1 is the only numbers near 1 that doesn't give overflow. Here is the code again, with explanations: X if(iy>0x41e00000) { /* if |y| > 2**31 */ X if(iy>0x43f00000){ /* if |y| > 2**64, must o/uflow */ The correct fuzzy (power of 2) threshold seems to be >= 2**62. X if(ix<=0x3fefffff) return (hy<0)? huge*huge:tiny*tiny; X if(ix>=0x3ff00000) return (hy>0)? huge*huge:tiny*tiny; Then we must determine on which side of 1 |x| lies. ix <= 0x3fefffff clearly imples that |x| < 1. Otherwise, ix >= 0x3ff00000 and |x| >= 1. The last inequality is strict since |x| = 1 has already been handled. The check that ix >= 0x3ff00000 is redundant and should be omitted. The first check should be written more naturally as ix < 0x3ff00000. In float precision, the first check can be written even more easily as ix < 0x3f800000. X } X /* over/underflow if x is not close to one */ X if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny; X if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny; Removing the equal signs from the inequalities makes a signficant difference. There are 2**32 values with ix == 0x3fefffff and another 2**32 values with ix == 0x3ff000000. All of these are close to 1. Since |iy| is not so large now, |x| has to be signficantly different from 1 for overflow/underflow to occur. It probably occurs for many of the 2 * 2**32 values not found by these tests; the tests are just quick fuzzy ones using ix. In float precision, we can find the exact thresholds for overflow and underflow. Perhaps the Cygnus thresholds are exact. das didn't actually add a' and b', but just changed the spelling of sn in them: X @@ -192,9 +191,9 @@ X } X /* over/underflow if x is not close to one */ X - if(ix<0x3fefffff) return (hy<0)? sn*huge*huge:sn*tiny*tiny; X - if(ix>0x3ff00000) return (hy>0)? sn*huge*huge:sn*tiny*tiny; X - /* now |1-x| is tiny <= 2**-20, suffice to compute X + if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny; X + if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny; X + /* now |1-x| is tiny <= 2**-20, suffice to compute X log(x) by x-x^2/2+x^3/3-x^4/4 */ X - t = ax-1; /* t has 20 trailing zeros */ X + t = ax-one; /* t has 20 trailing zeros */ X w = (t*t)*(0.5-t*(0.3333333333333333333333-t*0.25)); X u = ivln2_h*t; /* ivln2_h has 21 sig. bits */ Bruce