From owner-freebsd-numerics@FreeBSD.ORG Sun Dec 15 16:24:27 2013
Date: Mon, 16 Dec 2013 03:00:42 +1100 (EST)
From: Bruce Evans
To: Steve Kargl
Cc: freebsd-numerics@FreeBSD.org, Filipe Maia
Subject: Re: dead code in lgamma_r[f]?
Message-ID: <20131216010148.L4928@besplex.bde.org>
In-Reply-To: <20131210073840.GA94002@troutmask.apl.washington.edu>
References: <20131205195225.GA65732@troutmask.apl.washington.edu>
 <20131206102724.W1033@besplex.bde.org>
 <20131206114426.I1329@besplex.bde.org>
 <20131207064602.GA76042@troutmask.apl.washington.edu>
 <20131208012939.GA80145@troutmask.apl.washington.edu>
 <20131208141627.F1181@besplex.bde.org>
 <20131208183339.GA83010@troutmask.apl.washington.edu>
 <20131208185941.GA83484@troutmask.apl.washington.edu>
 <20131209223720.GA91828@troutmask.apl.washington.edu>
 <20131210155635.J1022@besplex.bde.org>
 <20131210073840.GA94002@troutmask.apl.washington.edu>
List-Id: "Discussions of high quality implementation of libm functions."

On Mon, 9 Dec 2013, Steve Kargl wrote:

> On Tue, Dec 10, 2013 at 04:26:12PM +1100, Bruce Evans wrote:
>> On Mon, 9 Dec 2013, Steve Kargl wrote:
>>> . In the domain [0,2), move the three minimax approximations embedded
>>> within __ieee754_lgamma_r() into three 'static inline double' functions.
>>
>> This is too invasive.  It is only a style change, to a slightly worse style.
>
> It's only a style change to a much better style in that I now
> have some idea of what to do to get working ld80 and ld128 versions.

"slightly worse" != "much better" :-).  The disadvantages are that the
combinations of expressions are a little harder to see, and more than a
little harder to debug (since debugging of inline functions is broken).

>> Like whitespace fixes.
>>
>>> 	r = -__ieee754_log(x);
>>> -	if(ix>=0x3FE76944) {y = one-x; i= 0;}
>>> -	else if(ix>=0x3FCDA661) {y= x-(tc-one); i=1;}
>>> -	else {y = x; i=2;}
>>> +	if (ix >= 0x3FE76944)
>>> +		r += func0(1 - x);
>>> +	else if (ix >= 0x3FCDA661)
>>> +		r += func1(x - (tc - 1));
>>> +	else
>>> +		r += func2(x);
>>
>> The simplification from using func*() seems to be mainly avoiding a case
>> statement and the case selector variable i.
>
> It also allows one to clearly see that here func[0-2] are
> evaluations of lgamma(1+x), where subdomains of [0,2) are
> being mapped into other domains.

I missed that the [0-2] in the names are more than function numbers.
I still disagree with this change.

>>> ...
>>> 	switch(i) {
>>> -	case 7: z *= (y+6.0);	/* FALLTHRU */
>>> -	case 6: z *= (y+5.0);	/* FALLTHRU */
>>> -	case 5: z *= (y+4.0);	/* FALLTHRU */
>>> -	case 4: z *= (y+3.0);	/* FALLTHRU */
>>> -	case 3: z *= (y+2.0);	/* FALLTHRU */
>>> +	case 7: z *= (y+6);	/* FALLTHRU */
>>> +	case 6: z *= (y+5);	/* FALLTHRU */
>>> +	case 5: z *= (y+4);	/* FALLTHRU */
>>> +	case 4: z *= (y+3);	/* FALLTHRU */
>>> +	case 3: z *= (y+2);	/* FALLTHRU */
>>
>> The comment indentation would be wrong now, except this file already uses
>> totally inconsistent comment indentation.  The float version blindly moved
>> the indentation in the other direction by adding float casts.
>
> The change reduces the diff between float and double versions.

That's the removal of the types in the constants.  The indentation
shouldn't be changed by blind substitution.

This block of code is bad for the float version anyway.  Testing another
version showed that the asymptotic formula gives enough precision for
floats (29 bits when evaluated with double coefficients and with
insignificant rounding errors) starting at x = 4.  So 4 of the above 5
cases shouldn't have been there for float precision, and with only 1 case
there shouldn't be a case statement.  Reducing the number of terms in the
asymptotic formula might eliminate the 1 remaining case.

However, I couldn't get this to work (to remove the 1 case), and you
removed 4 terms from the asymptotic formula instead.  I think you couldn't
get this to work either :-) -- my calculations show that for gamma(4),
this reduced the accuracy of the formula from 29 bits to 18.  That is not
a problem in itself, since the threshold is still 8.  However, the
accuracy at 8 is now only 24+ bits (down from 49+).  Someone has an
off-by-1 error -- 49+ bits is not enough for the threshold of 8 which is
used for double precision, but a threshold of 9 gives 57+ bits, which is
good.
The fdlibm coefficients work much better than untuned Bernoulli
coefficients -- 4 or 5 more terms were needed with untuned Bernoulli
coeffs to get the same accuracy at x = 9.

I now see why your reduction in the number of terms has little chance of
working properly.  You just removed the higher terms.  Cygnus already
blindly rounded the coefficients; that would de-tune them, and could
easily make them worse than untuned Bernoulli coefficients, and/or make
the higher terms worse than useless.  Thus removing the higher terms
might even have increased the accuracy.  However, it has little chance of
working as well as tuned minimax coefficients.  As mentioned above, the
fdlibm ones give a surprising amount of accuracy.  It isn't clear that so
much accuracy is available in float precision without using relatively
more terms.

My testing of this was sloppy -- I took the decimal coeffs and didn't
round them to double precision, and for w1 I used a high-precision pari
expression.

For accuracy, it seems best to use more terms in the asymptotic formula
so that it covers more cases without needing shifts up or down.  In the
asymptotic formula, the rounding error is dominated by the leading terms
but is quite large (several ulps).  The approximations for x near 1 and 2
are more accurate, but the shifting down above loses accuracy.

My calculations involved trying to get my old shifting-up code to work
without explicit (slow) extra precision.  This showed that the method is
not good for efficiency, even without the extra precision.  The sloppy
version of it runs barely faster than the less-sloppy fdlibm version.  So
the method is useful at most for complex args, since I don't know of
another method for them (except that args with a negative real part
should start by using the reflection formula, since shifting up the real
part from large negative is too inefficient and inaccurate).

Bruce