From owner-freebsd-numerics@FreeBSD.ORG Sun Dec 8 01:29:41 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4B08DE6D for ; Sun, 8 Dec 2013 01:29:41 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2458B16FF for ; Sun, 8 Dec 2013 01:29:41 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB81TeuQ080225; Sat, 7 Dec 2013 17:29:40 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB81Te0h080224; Sat, 7 Dec 2013 17:29:40 -0800 (PST) (envelope-from sgk) Date: Sat, 7 Dec 2013 17:29:39 -0800 From: Steve Kargl To: Bruce Evans Subject: Re: dead code in lgamma_r[f]? Message-ID: <20131208012939.GA80145@troutmask.apl.washington.edu> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> <20131207064602.GA76042@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131207064602.GA76042@troutmask.apl.washington.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Dec 2013 01:29:41 -0000 On Fri, Dec 06, 2013 at 10:46:02PM -0800, Steve Kargl wrote: > On Fri, Dec 06, 2013 at 12:55:16PM +1100, Bruce Evans wrote: > > On Fri, 6 Dec 2013, Bruce Evans wrote: > > > > > On Thu, 5 Dec 2013, Steve Kargl wrote: > > > ... > > >> If we again look at the code from __ieee754_lgamma_r(), we see > > >> that sin_pi() is called if ix < 0x43300000, so by the time we > > >> arrive at the 'if(ix<0x43300000)' statement we already know that > > >> the condition is true. > > > > > > No, only for negative integers. hx<0 classifies negative values, and > > > then ix>=0x43300000 classifies numbers that are so large negative that > > > they must be integers, and the result is a sort of pole error. We > > > are just filtering out this case, perhaps as an optimization. > > > > Oops, sin_pi() is only called for negative integers, so your change > > seems to be correct. Just add a comment about the limited domain > > of sin_pi() (it already has one saying that "x is assumed negative". > > > > I wish to retract my earlier statement that after 2 additional > years of reading fdlibm code that it was easier to work with. > I spent the better part of Friday giving myself a headache trying > to understand the algorithm for lgamma_r(). The code for x in > the interval (0,2) does not match any comment in lgamma_r(). > > I also think, but can't prove yet, that like erff() the > polynomial and rational approximations in lgammaf_r() have > too many terms. > I really need to stop looking at fdlibm code. The threshold(s) in e_lgammaf.c could be chosen better (and actually documented). In the float code /* 8.0 <= x < 2**58 */ } else if (ix < 0x4e800000) { // (ix < 0x5c800000) { t = __ieee754_logf(x); z = one/x; y = z*z; w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6))))); r = (x-half)*(t-one)+w; } else /* 2**58 <= x <= inf */ r = x*(__ieee754_logf(x)-one); the 2**58 is much too large. Asymptotically, we have lgamma(x)~(x-0.5)log(x)-x = x*(log(x)-1) - 0.5*log(x) where the 0.5*log(x) is insignificant when compared to 2**58. I suspect that 2**58 is a sloppy threshold for double, which can be reduced to 2**30 (or smaller) for float. I suppose I could look for a tighter bound, but 2**(p+slop) with p, the precision, probably is sufficient. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Sun Dec 8 04:21:15 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9AA5E67F for ; Sun, 8 Dec 2013 04:21:15 +0000 (UTC) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 484051027 for ; Sun, 8 Dec 2013 04:21:14 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 8D33442032F; Sun, 8 Dec 2013 15:21:06 +1100 (EST) Date: Sun, 8 Dec 2013 15:21:04 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: dead code in lgamma_r[f]? In-Reply-To: <20131208012939.GA80145@troutmask.apl.washington.edu> Message-ID: <20131208141627.F1181@besplex.bde.org> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> <20131207064602.GA76042@troutmask.apl.washington.edu> <20131208012939.GA80145@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=YYGEuWhf c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=m8NKqXhiB1EA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=ElNGO9NIuDsA:10 a=YmPyuLAzrPIqYOFI3MIA:9 a=7i44fyHmEgJOJrlA:21 a=KHVXADVR9NTKJXXW:21 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@FreeBSD.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Dec 2013 04:21:15 -0000 On Sat, 7 Dec 2013, Steve Kargl wrote: > I really need to stop looking at fdlibm code. The threshold(s) in > e_lgammaf.c could be chosen better (and actually documented). In > the float code > > /* 8.0 <= x < 2**58 */ > } else if (ix < 0x4e800000) { // (ix < 0x5c800000) { > t = __ieee754_logf(x); > z = one/x; > y = z*z; > w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6))))); > r = (x-half)*(t-one)+w; > } else > /* 2**58 <= x <= inf */ > r = x*(__ieee754_logf(x)-one); > > the 2**58 is much too large. Asymptotically, we have That's Cygnus code, not fdlibm :-). > lgamma(x)~(x-0.5)log(x)-x = x*(log(x)-1) - 0.5*log(x) > > where the 0.5*log(x) is insignificant when compared to 2**58. > I suspect that 2**58 is a sloppy threshold for double, which > can be reduced to 2**30 (or smaller) for float. I suppose I > could look for a tighter bound, but 2**(p+slop) with p, the > precision, probably is sufficient. The Cygnus translation to float didn't look closely enough at the thresholds. gamma() was the first part of libm that I looked closely at (even before trig functions in 2004/5). I didn't understand it at first and tried to write my own. My first idea was to use the asymptotic expansion as a kernel for all cases, especially for complex args. To use the asymptotic expansion, the real part of the arg must be pushed up to about 16 or greater using repeated multiplications. It is difficult to do these multiplications accurately enough, but the float case should be very easy using double precision multiplication. I tried extra precision in software, but this was too slow. This method also works poorly for args with negative real part, especially for lgamma() near its zeros. Even using doubled double precision didn't get near 1 ulp of accuracy in float precision for lgamma() near its zeros, and the only fix that I know of is to use a special approximation near each zero down to about real part -70 (70 cases). fdlibm is also bad for lgamma() near its zeros. Here is a pari version of this (the C version is harder to understand due to its technical details and less suitable language; pari syntax is worse than C syntax for most things (including things that pari should be able to do well) but slightly better for this use: % g(z) = % { % local(r); % % r = 1; % while (abs(z) < M, % r = r * z; % z = z + 1; % ); % return (exp(lgb(z)) / r); % } This is cgamma(). The whole thing, modulo not handling exceptional args and depending on subroutines. It first pushes up Real(z). It uses abs(z) instead of Real(z) for some reason. Apparently it only works for real z and is buggy for large negative z (small negative z get pushed up towards +Inf). Then it calls the local routine lgb() to do the asymptotic expansion and the pari builtin exp() to convert from clgamma() to cgamma(). % % lg(z) = % { % local(bol, rnew); % % if (abs(z - 1) < 1.e-1, return (lngamma(z));); % if (abs(z - 2) < 1.e-1, return (lngamma(z));); Near 1 and 2, I planned to use a poly approximation, but that is not so easy to do in pari so I call the pari builtin lgamma() to fake it. % bol = 0; % r = 1; % while (abs(z) < M, % rnew = r * z; % if (imag(r) > 0 && imag(rnew) < 0, bol += 1); % if (imag(r) < 0 && imag(rnew) > 0, bol -= 1); % r = rnew; % z = z + 1; % ); This pushes up z, almost as above. % bol = 0; % return (lgb(z) - log(r) - 2 * Pi * bol); Then use the asymptotic expansion. % } % % lgb(z) = (z - 0.5) * log(z) - z + 0.5 * log(2 * Pi) + \ % sum(n = 1, N, b[n] / z^(2 * n - 1)) This is the asymptotic expansion. Pari syntax is only perfect for numeric things like this (and probably for some number theory things which I don't know well). b[n] is a table of coefficients. I had forgotten the details of even the first term in this. You mentioned using an approximation of just the first term or two above a huge threshold. It might be better to go much lower using another term or two (but not as many as medium-sized args). % % setb(N) = % { % b = vector(N); % for (n = 1, N, % b[n] = roundn(bernreal(2 * n) / (2 * n * (2 * n - 1)), P); % ); % b = precision(b, 100); % } This initializes b[] using the builtin bernreal() which gives Bernoulli numbers. roundn() is a local function (not shown) that rounds to 64 bits in IEEE round-to-nearest mode. IEEE arithmetic is too hard to fake everywhere here. it would need a rounding step after every operation. I only adding these steps while developing clog(), mainly to understand the limitations of tripled double precision. % % pg() = plot(x = A, B, abs(g(x + C * I) / gamma(x + C * I) - 1)) % pq() = plot(x = A, B, imag(g(x + C * I)) / imag(gamma(x + C * I)) - 1) % xgamma(z) = % { % if ((truncate(z) - z) < 1e-20, return (100000);); % if (abs(gamma(z) - 1) < 1.e-40, return (1 + 1.e-40);); % return (gamma(z)); % } % pr() = plot(x = A, B, abs(1 / (xgamma(x) - 1))) % % plg() = plot(x = A, B, abs(lg(x + C * I) / lngamma(x + C * I) - 1)) % % A = 1; % B = 100; % C = 0; % M = 17; % N = 8; % P = 64; % setb(100); % gamma(10) % g(10) Testing stuff. Only M, N and P are of interest. M gives the threshold for the asymptotic expansion. N gives the number of terms in it that are used, not counting the furst couple. P gives the precision. Results: 362880.0000000000000000000000 362879.9999999999999999714634 Pari gamma(10) gave the exact result (9 factorial). g(10) got 22 digits corect, which is more than can be hoped for (there would be more rounding errors with finite precision throughout). g() is sloppy for complex args, but after changing 10 to 10+I it works well: -217255.6436196325446121126363 + 267132.0341468011581535241915*I (gamma) -217255.6436196325446121611110 + 267132.0341468011581534939537*I (g) Cephes has a very complicated lgamma() or gamma(), with many special poly approximations for 0 < x < 8 or similar. glibc adopted this. It seems too hard to scale this for complex args. You just can't write a special approximation that even converges in a strip 1 < Re(z) < 8 or similar. But pushing up the real part until the asymptotic expansion works is a very general method. For complex long double args, maybe we should just use this method and not worry about the losses of accuracy from the multiplications. 16 multiplications would lose about 32 ulps. This would give imprecise.c ;-(. A few more than 16 would be needed for more than float precision. Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun Dec 8 18:33:46 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EF2624B3 for ; Sun, 8 Dec 2013 18:33:45 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C10CB15B5 for ; Sun, 8 Dec 2013 18:33:45 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB8IXdep083385; Sun, 8 Dec 2013 10:33:39 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB8IXdXb083384; Sun, 8 Dec 2013 10:33:39 -0800 (PST) (envelope-from sgk) Date: Sun, 8 Dec 2013 10:33:39 -0800 From: Steve Kargl To: Bruce Evans Subject: Re: dead code in lgamma_r[f]? Message-ID: <20131208183339.GA83010@troutmask.apl.washington.edu> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> <20131207064602.GA76042@troutmask.apl.washington.edu> <20131208012939.GA80145@troutmask.apl.washington.edu> <20131208141627.F1181@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131208141627.F1181@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Dec 2013 18:33:46 -0000 On Sun, Dec 08, 2013 at 03:21:04PM +1100, Bruce Evans wrote: > On Sat, 7 Dec 2013, Steve Kargl wrote: > > > I really need to stop looking at fdlibm code. The threshold(s) in > > e_lgammaf.c could be chosen better (and actually documented). In > > the float code > > > > /* 8.0 <= x < 2**58 */ > > } else if (ix < 0x4e800000) { // (ix < 0x5c800000) { > > t = __ieee754_logf(x); > > z = one/x; > > y = z*z; > > w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6))))); > > r = (x-half)*(t-one)+w; > > } else > > /* 2**58 <= x <= inf */ > > r = x*(__ieee754_logf(x)-one); > > > > the 2**58 is much too large. Asymptotically, we have > > That's Cygnus code, not fdlibm :-). > > > lgamma(x)~(x-0.5)log(x)-x = x*(log(x)-1) - 0.5*log(x) > > > > where the 0.5*log(x) is insignificant when compared to 2**58. > > I suspect that 2**58 is a sloppy threshold for double, which > > can be reduced to 2**30 (or smaller) for float. I suppose I > > could look for a tighter bound, but 2**(p+slop) with p, the > > precision, probably is sufficient. > > The Cygnus translation to float didn't look closely enough at > the thresholds. It also did not recompute minimax coefficients as there are too many terms in the approximation. Looks like Cygnus simply rounded the double coefficients to float. > gamma() was the first part of libm that I looked closely at (even > before trig functions in 2004/5). I didn't understand it at first and > tried to write my own. My first idea was to use the asymptotic > expansion as a kernel for all cases, especially for complex args. > To use the asymptotic expansion, the real part of the arg must be > pushed up to about 16 or greater using repeated multiplications. > It is difficult to do these multiplications accurately enough, > but the float case should be very easy using double precision > multiplication. I tried extra precision in software, but this > was too slow. This method also works poorly for args with negative > real part, especially for lgamma() near its zeros. Even using > doubled double precision didn't get near 1 ulp of accuracy in float > precision for lgamma() near its zeros, and the only fix that I > know of is to use a special approximation near each zero down > to about real part -70 (70 cases). fdlibm is also bad for lgamma() > near its zeros. Interesting idea. For x in [2,8), fdlibm code (as you probably know) uses recursion to reduce the evaluation to a log of a product and lgamma(2+s) with s in [0,1). One can then show that lgamma(2+s) = s * (1 - E) + s**2 * P(s) where E is Euler's constant and P(s) is a polynomial in s. But, instead of using a minimax polynomial approximation fdlibm code uses a rational approximation lgamma(2+s) = 0.5*s + p(s) / q(s). This is the origin of the s0, ..., s6, r1, ..., r6 coefficients. > Here is a pari version of this (the C version is harder to understand > due to its technical details and less suitable language; pari syntax > is worse than C syntax for most things (including things that pari > should be able to do well) but slightly better for this use: I suppose one of these days, I'll learn how to use pari. A quick scan of n1256.pdf shows that C99 does not require lgamma of a complex argument, but it does reserve the names clgamma[fl]. > % lg(z) = > % { > % local(bol, rnew); > % > % if (abs(z - 1) < 1.e-1, return (lngamma(z));); > % if (abs(z - 2) < 1.e-1, return (lngamma(z));); > > Near 1 and 2, I planned to use a poly approximation, but that is not > so easy to do in pari so I call the pari builtin lgamma() to fake it. fdlibm's lgammaf_r does not appear to have an issue near 1 or 2. Starting at the lower bound of each interval and using nextafterf to scan up to the upper bound, I'm seeing % make testf && ./testf Interval: Time ULP Value [4.7683716e-07, 2.0000000e+00): 0.04590 0.98491 1.231645e+00 [2.0000000e+00, 8.0000000e+00): 0.08754 0.82825 4.012439e+00 [8.0000000e+00, 2.8823038e+17): 0.05841 1.43789 8.942273e+06 [2.8823038e+17, 2.6584560e+36): 0.05095 0.50000 1.491249e+23 where the reference value is from lgamma_r. The different intervals test specific branches in lgammaf_r. The upper bound of 2.6584560e36 is 0x1p121. Time is the average value for all calls in the interval in usec/call. > % bol = 0; > % r = 1; > % while (abs(z) < M, > % rnew = r * z; > % if (imag(r) > 0 && imag(rnew) < 0, bol += 1); > % if (imag(r) < 0 && imag(rnew) > 0, bol -= 1); > % r = rnew; > % z = z + 1; > % ); > > This pushes up z, almost as above. > > % bol = 0; > % return (lgb(z) - log(r) - 2 * Pi * bol); > > Then use the asymptotic expansion. > > % } > % > % lgb(z) = (z - 0.5) * log(z) - z + 0.5 * log(2 * Pi) + \ > % sum(n = 1, N, b[n] / z^(2 * n - 1)) > > This is the asymptotic expansion. Pari syntax is only perfect for > numeric things like this (and probably for some number theory things > which I don't know well). b[n] is a table of coefficients. > > I had forgotten the details of even the first term in this. You > mentioned using an approximation of just the first term or two above > a huge threshold. It might be better to go much lower using another > term or two (but not as many as medium-sized args). Well, I was only interested in the poorly chosen 2**58 threshold. In the [8,2**58) interval the asymptotic expansion is used, but its written in such a way that the >% + 0.5 * log(2 * Pi) + sum(n = 1, N, b[n] / z^(2 * n - 1)) portion of your above code is approximated by a minimax polynomial. /* 8.0 <= x < 2**58 */ } else if (ix < 0x5c800000) { t = __ieee754_logf(x); z = one/x; y = z*z; w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6))))); r = (x-half)*(t-one)+w; } else fdlibm use only the first 2 terms for the x >= 2**58. > Cephes has a very complicated lgamma() or gamma(), with many special > poly approximations for 0 < x < 8 or similar. fdlibm's approximations in the [0,2) range are mostly black magic to me. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Sun Dec 8 18:59:42 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 01CC7D00 for ; Sun, 8 Dec 2013 18:59:42 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B47DF16E9 for ; Sun, 8 Dec 2013 18:59:41 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB8Ixfn4083492; Sun, 8 Dec 2013 10:59:41 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB8IxfuJ083491; Sun, 8 Dec 2013 10:59:41 -0800 (PST) (envelope-from sgk) Date: Sun, 8 Dec 2013 10:59:41 -0800 From: Steve Kargl To: Bruce Evans Subject: Re: dead code in lgamma_r[f]? Message-ID: <20131208185941.GA83484@troutmask.apl.washington.edu> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> <20131207064602.GA76042@troutmask.apl.washington.edu> <20131208012939.GA80145@troutmask.apl.washington.edu> <20131208141627.F1181@besplex.bde.org> <20131208183339.GA83010@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131208183339.GA83010@troutmask.apl.washington.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Dec 2013 18:59:42 -0000 On Sun, Dec 08, 2013 at 10:33:39AM -0800, Steve Kargl wrote: > > fdlibm's lgammaf_r does not appear to have an issue near 1 or 2. > Starting at the lower bound of each interval and using nextafterf > to scan up to the upper bound, I'm seeing > > % make testf && ./testf > Interval: Time ULP Value > [4.7683716e-07, 2.0000000e+00): 0.04590 0.98491 1.231645e+00 > [2.0000000e+00, 8.0000000e+00): 0.08754 0.82825 4.012439e+00 > [8.0000000e+00, 2.8823038e+17): 0.05841 1.43789 8.942273e+06 > [2.8823038e+17, 2.6584560e+36): 0.05095 0.50000 1.491249e+23 > > where the reference value is from lgamma_r. The different > intervals test specific branches in lgammaf_r. The upper > bound of 2.6584560e36 is 0x1p121. Time is the average value > for all calls in the interval in usec/call. Correction. The timing is for 1 million calls uniformly distributed over the interval. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Mon Dec 9 11:06:50 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CEB5220E for ; Mon, 9 Dec 2013 11:06:50 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B9EE41E81 for ; Mon, 9 Dec 2013 11:06:50 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id rB9B6ogF071087 for ; Mon, 9 Dec 2013 11:06:50 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id rB9B6oYS071085 for freebsd-numerics@FreeBSD.org; Mon, 9 Dec 2013 11:06:50 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 9 Dec 2013 11:06:50 GMT Message-Id: <201312091106.rB9B6oYS071085@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-numerics@FreeBSD.org Subject: Current problem reports assigned to freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Dec 2013 11:06:50 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o stand/175811 numerics libstdc++ needs complex support in order use C99 o bin/170206 numerics [msun] [patch] complex arcsinh, log, etc. o stand/82654 numerics C99 long double math functions are missing 3 problems total. From owner-freebsd-numerics@FreeBSD.ORG Mon Dec 9 22:37:28 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 90CA923C for ; Mon, 9 Dec 2013 22:37:28 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 4776216BA for ; Mon, 9 Dec 2013 22:37:28 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB9MbL31091886; Mon, 9 Dec 2013 14:37:21 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB9MbK2X091885; Mon, 9 Dec 2013 14:37:20 -0800 (PST) (envelope-from sgk) Date: Mon, 9 Dec 2013 14:37:20 -0800 From: Steve Kargl To: Bruce Evans Subject: Re: dead code in lgamma_r[f]? Message-ID: <20131209223720.GA91828@troutmask.apl.washington.edu> References: <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> <20131207064602.GA76042@troutmask.apl.washington.edu> <20131208012939.GA80145@troutmask.apl.washington.edu> <20131208141627.F1181@besplex.bde.org> <20131208183339.GA83010@troutmask.apl.washington.edu> <20131208185941.GA83484@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131208185941.GA83484@troutmask.apl.washington.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Dec 2013 22:37:28 -0000 On Sun, Dec 08, 2013 at 10:59:41AM -0800, Steve Kargl wrote: > On Sun, Dec 08, 2013 at 10:33:39AM -0800, Steve Kargl wrote: > > > > fdlibm's lgammaf_r does not appear to have an issue near 1 or 2. > > Starting at the lower bound of each interval and using nextafterf > > to scan up to the upper bound, I'm seeing > > > > % make testf && ./testf > > Interval: Time ULP Value > > [4.7683716e-07, 2.0000000e+00): 0.04590 0.98491 1.231645e+00 > > [2.0000000e+00, 8.0000000e+00): 0.08754 0.82825 4.012439e+00 > > [8.0000000e+00, 2.8823038e+17): 0.05841 1.43789 8.942273e+06 > > [2.8823038e+17, 2.6584560e+36): 0.05095 0.50000 1.491249e+23 > > > > where the reference value is from lgamma_r. The different > > intervals test specific branches in lgammaf_r. The upper > > bound of 2.6584560e36 is 0x1p121. Time is the average value > > for all calls in the interval in usec/call. > > Correction. The timing is for 1 million calls uniformly > distributed over the interval. > I would like to commit the attached patch. I will do it in 3 passes: 1) whitespace fixes, 2) literal constants changes (checked by md5), 3) the code re-organization, threshold change, and dead code removed. * lib/msun/src/e_lgamma_r.c . Remove trailing space and trailing blank line in Copyright. . Fix prototype for sin_pi to agree with the intended commit r97413 done some 11 years 6 month ago. . Remove dead code in sin_pi and remove 'if(ix<0x43300000)' as it is always true. . In the domain [0,2), move the three minimax approximations embedded within __ieee75_lgamma_r() into three 'static inline double' functions. . Remove the now unused variables p1, p2, and p3. . Use integer literal constants instead of double literal constants where possible (checked with md5). . Remove explicit cast of the int 'i' to double (checked with md5). * lib/msun/src/e_lgammaf_r.c: . The minimax polynomials have more terms than required for the precision. Remove a8-a11, t10-t14, and w3-w6. . Fix prototype for sin_pif to agree with the intended commit r97413 done some 11 years 6 month ago. . Remove dead code in sin_pif and remove 'if(ix<0x4b000000)' as it is always true. . In the domain [0,2), move the three minimax approximations embedded within __ieee75_lgamma_r() into three 'static inline float' functions. . Remove the now unused variables p1, p2, and p3. . Use integer literal constants instead of double literal constants where possible (checked with md5). . Remove explicit cast of the int 'i' to float (checked with md5). . Reduce the 2**58 threshold copied from e_lgamma_r.c to 2**30. As before and after comparison, I offer lgammaf_r() without patch Interval: Time ULP Value [4.7683716e-07, 2.0000000e+00): 0.05381 2.40423 7.169912e-01 [2.0000000e+00, 8.0000000e+00): 0.07758 1.64252 2.763559e+00 [8.0000000e+00, 1.0737418e+09): 0.06948 2.56132 1.122045e+01 [1.0737418e+09, 2.1990233e+12): 0.06967 1.29037 1.693146e+09 [2.1990233e+12, 3.9876840e+36): 0.05704 1.50689 1.424986e+14 lgammaf_r() with patch Interval: Time ULP Value [4.7683716e-07, 2.0000000e+00): 0.04888 2.40423 7.169912e-01 [2.0000000e+00, 8.0000000e+00): 0.07471 1.64252 2.763559e+00 [8.0000000e+00, 1.0737418e+09): 0.05819 2.56132 1.122045e+01 [1.0737418e+09, 2.1990233e+12): 0.05338 1.29037 1.693146e+09 [2.1990233e+12, 3.9876840e+36): 0.07397 1.50689 1.424986e+14 lgamma_r() without patch Interval: Time ULP Value [8.4703295e-22, 2.0000000e+00): 0.05961 2.09187 7.3247074760047326e-01 [2.0000000e+00, 8.0000000e+00): 0.08758 1.46347 3.9813700222966872e+00 lgamma_r() with patch Interval: Time ULP Value [8.4703295e-22, 2.0000000e+00): 0.05761 2.09187 7.3247074760047326e-01 [2.0000000e+00, 8.0000000e+00): 0.08659 1.46347 3.9813700222966872e+00 Timing is in usec/call for 1 million values uniformly distributed in the interval. -- Steve --- /usr/src/lib/msun/src/e_lgamma_r.c 2013-12-06 07:58:39.000000000 -0800 +++ src/e_lgamma_r.c 2013-12-09 13:05:22.000000000 -0800 @@ -6,10 +6,9 @@ * * Developed at SunSoft, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this - * software is freely granted, provided that this notice + * software is freely granted, provided that this notice * is preserved. * ==================================================== - * */ #include @@ -156,7 +155,8 @@ static const double zero= 0.00000000000000000000e+00; - static double sin_pi(double x) +static double +sin_pi(double x) { double y,z; int n,ix; @@ -177,15 +177,11 @@ y = 2*(y - floor(y)); /* y = |x| mod 2.0 */ n = (int)(y*4); } else { - if(ix>=0x43400000) { - y = zero; n = 0; /* y must be even */ - } else { - if(ix<0x43300000) z = y+two52; /* exact */ - GET_LOW_WORD(n,z); - n &= 1; - y = n; - n<<= 2; - } + z = y+two52; /* exact */ + GET_LOW_WORD(n,z); + n &= 1; + y = n; + n<<= 2; } switch (n) { case 0: y = __kernel_sin(pi*y,zero,0); break; @@ -201,12 +197,46 @@ } +static inline double +func0(double y) +{ + double p1, p2, z; + + z = y * y; + p1 = a0 + z * (a2 + z * (a4 + z * (a6 + z * (a8 + z * a10)))); + p2 = z * (a1 + z * (a3 + z * (a5 + z * (a7 + z * (a9 + z * a11))))); + return (y * p1 + p2 - y / 2); +} + +static inline double +func1(double y) +{ + double p1, p2, p3, w, z; + + z = y * y; + w = z * y; + p1 = t0 + w * (t3 + w * (t6 + w * (t9 + w * t12))); + p2 = t1 + w * (t4 + w * (t7 + w * (t10 + w * t13))); + p3 = t2 + w * (t5 + w * (t8 + w * (t11 + w * t14))); + return (z * p1 - (tt - w * (p2 + y * p3)) + tf); +} + +static inline double +func2(double y) +{ + double p1, p2; + + p1 = y * (u0 + y * (u1 + y * (u2 + y * (u3 + y * (u4 + y * u5))))); + p2 = one + y * (v1 + y * (v2 + y * (v3 + y * (v4 + y * v5)))); + return (p1 / p2 - y / 2); +} + double __ieee754_lgamma_r(double x, int *signgamp) { - double t,y,z,nadj,p,p1,p2,p3,q,r,w; + double t,y,z,nadj,p,q,r,w; int32_t hx; - int i,lx,ix; + int i,ix,lx; EXTRACT_WORDS(hx,lx,x); @@ -235,51 +265,36 @@ if((((ix-0x3ff00000)|lx)==0)||(((ix-0x40000000)|lx)==0)) r = 0; /* for x < 2.0 */ else if(ix<0x40000000) { - if(ix<=0x3feccccc) { /* lgamma(x) = lgamma(x+1)-log(x) */ + if (ix <= 0X3FECCCCC) { /* lgamma(x) = lgamma(x+1)-log(x) */ r = -__ieee754_log(x); - if(ix>=0x3FE76944) {y = one-x; i= 0;} - else if(ix>=0x3FCDA661) {y= x-(tc-one); i=1;} - else {y = x; i=2;} + if (ix >= 0x3FE76944) + r += func0(1 - x); + else if (ix >= 0x3FCDA661) + r += func1(x - (tc - 1)); + else + r += func2(x); } else { - r = zero; - if(ix>=0x3FFBB4C3) {y=2.0-x;i=0;} /* [1.7316,2] */ - else if(ix>=0x3FF3B4C4) {y=x-tc;i=1;} /* [1.23,1.73] */ - else {y=x-one;i=2;} - } - switch(i) { - case 0: - z = y*y; - p1 = a0+z*(a2+z*(a4+z*(a6+z*(a8+z*a10)))); - p2 = z*(a1+z*(a3+z*(a5+z*(a7+z*(a9+z*a11))))); - p = y*p1+p2; - r += (p-0.5*y); break; - case 1: - z = y*y; - w = z*y; - p1 = t0+w*(t3+w*(t6+w*(t9 +w*t12))); /* parallel comp */ - p2 = t1+w*(t4+w*(t7+w*(t10+w*t13))); - p3 = t2+w*(t5+w*(t8+w*(t11+w*t14))); - p = z*p1-(tt-w*(p2+y*p3)); - r += (tf + p); break; - case 2: - p1 = y*(u0+y*(u1+y*(u2+y*(u3+y*(u4+y*u5))))); - p2 = one+y*(v1+y*(v2+y*(v3+y*(v4+y*v5)))); - r += (-0.5*y + p1/p2); + if (ix >= 0x3FFBB4C3) /* [1.7316,2] */ + r = func0(2 - x); + else if (ix >= 0x3FF3B4C4) /* [1.23,1.73] */ + r = func1(x - tc); + else + r = func2(x - 1); } } else if(ix<0x40200000) { /* x < 8.0 */ i = (int)x; - y = x-(double)i; + y = x-i; p = y*(s0+y*(s1+y*(s2+y*(s3+y*(s4+y*(s5+y*s6)))))); q = one+y*(r1+y*(r2+y*(r3+y*(r4+y*(r5+y*r6))))); - r = half*y+p/q; + r = y/2+p/q; z = one; /* lgamma(1+s) = log(s) + lgamma(s) */ switch(i) { - case 7: z *= (y+6.0); /* FALLTHRU */ - case 6: z *= (y+5.0); /* FALLTHRU */ - case 5: z *= (y+4.0); /* FALLTHRU */ - case 4: z *= (y+3.0); /* FALLTHRU */ - case 3: z *= (y+2.0); /* FALLTHRU */ + case 7: z *= (y+6); /* FALLTHRU */ + case 6: z *= (y+5); /* FALLTHRU */ + case 5: z *= (y+4); /* FALLTHRU */ + case 4: z *= (y+3); /* FALLTHRU */ + case 3: z *= (y+2); /* FALLTHRU */ r += __ieee754_log(z); break; } /* 8.0 <= x < 2**58 */ --- /usr/src/lib/msun/src/e_lgammaf_r.c 2013-12-06 07:57:42.000000000 -0800 +++ src/e_lgammaf_r.c 2013-12-09 13:04:12.000000000 -0800 @@ -32,10 +32,6 @@ a5 = 2.8905137442e-03, /* 0x3b3d6ec6 */ a6 = 1.1927076848e-03, /* 0x3a9c54a1 */ a7 = 5.1006977446e-04, /* 0x3a05b634 */ -a8 = 2.2086278477e-04, /* 0x39679767 */ -a9 = 1.0801156895e-04, /* 0x38e28445 */ -a10 = 2.5214456400e-05, /* 0x37d383a2 */ -a11 = 4.4864096708e-05, /* 0x383c2c75 */ tc = 1.4616321325e+00, /* 0x3fbb16c3 */ tf = -1.2148628384e-01, /* 0xbdf8cdcd */ /* tt = -(tail of tf) */ @@ -50,11 +46,6 @@ t7 = -3.6845202558e-03, /* 0xbb7177fe */ t8 = 2.2596477065e-03, /* 0x3b141699 */ t9 = -1.4034647029e-03, /* 0xbab7f476 */ -t10 = 8.8108185446e-04, /* 0x3a66f867 */ -t11 = -5.3859531181e-04, /* 0xba0d3085 */ -t12 = 3.1563205994e-04, /* 0x39a57b6b */ -t13 = -3.1275415677e-04, /* 0xb9a3f927 */ -t14 = 3.3552918467e-04, /* 0x39afe9f7 */ u0 = -7.7215664089e-02, /* 0xbd9e233f */ u1 = 6.3282704353e-01, /* 0x3f2200f4 */ u2 = 1.4549225569e+00, /* 0x3fba3ae7 */ @@ -81,15 +72,12 @@ r6 = 7.3266842264e-06, /* 0x36f5d7bd */ w0 = 4.1893854737e-01, /* 0x3ed67f1d */ w1 = 8.3333335817e-02, /* 0x3daaaaab */ -w2 = -2.7777778450e-03, /* 0xbb360b61 */ -w3 = 7.9365057172e-04, /* 0x3a500cfd */ -w4 = -5.9518753551e-04, /* 0xba1c065c */ -w5 = 8.3633989561e-04, /* 0x3a5b3dd2 */ -w6 = -1.6309292987e-03; /* 0xbad5c4e8 */ +w2 = -2.7777778450e-03; /* 0xbb360b61 */ static const float zero= 0.0000000000e+00; - static float sin_pif(float x) +static float +sin_pif(float x) { float y,z; int n,ix; @@ -110,15 +98,11 @@ y = 2*(y - floorf(y)); /* y = |x| mod 2.0 */ n = (int)(y*4); } else { - if(ix>=0x4b800000) { - y = zero; n = 0; /* y must be even */ - } else { - if(ix<0x4b000000) z = y+two23; /* exact */ - GET_FLOAT_WORD(n,z); - n &= 1; - y = n; - n<<= 2; - } + z = y+two23; /* exact */ + GET_FLOAT_WORD(n,z); + n &= 1; + y = n; + n<<= 2; } switch (n) { case 0: y = __kernel_sindf(pi*y); break; @@ -133,11 +117,44 @@ return -y; } +static inline float +func0(float y) +{ + float p1, p2, z; + + z = y * y; + p1 = a0 + z * (a2 + z * (a4 + z * a6)); + p2 = z * (a1 + z * (a3 + z * (a5 + z * a7))); + return (y * p1 + p2 - y / 2); +} + +static inline float +func1(float y) +{ + float p1, p2, p3, w, z; + + z = y * y; + w = z * y; + p1 = t0 + w * (t3 + w * (t6 + w * t9)); + p2 = t1 + w * (t4 + w * t7); + p3 = t2 + w * (t5 + w * t8); + return (z * p1 - (tt - w * (p2 + y * p3)) + tf); +} + +static inline float +func2(float y) +{ + float p1, p2; + + p1 = y * (u0 + y * (u1 + y * (u2 + y * (u3 + y * (u4 + y * u5))))); + p2 = one + y * (v1 + y * (v2 + y * (v3 + y * (v4 + y * v5)))); + return (p1 / p2 - y / 2); +} float __ieee754_lgammaf_r(float x, int *signgamp) { - float t,y,z,nadj,p,p1,p2,p3,q,r,w; + float t,y,z,nadj,p,q,r,w; int32_t hx; int i,ix; @@ -168,62 +185,47 @@ if (ix==0x3f800000||ix==0x40000000) r = 0; /* for x < 2.0 */ else if(ix<0x40000000) { - if(ix<=0x3f666666) { /* lgamma(x) = lgamma(x+1)-log(x) */ + if (ix <= 0x3f666666) { /* lgamma(x) = lgamma(x+1)-log(x) */ r = -__ieee754_logf(x); - if(ix>=0x3f3b4a20) {y = one-x; i= 0;} - else if(ix>=0x3e6d3308) {y= x-(tc-one); i=1;} - else {y = x; i=2;} + if (ix >= 0x3f3b4a20) + r += func0(1 - x); + else if (ix >= 0x3e6d3308) + r += func1(x - (tc - 1)); + else + r += func2(x); } else { - r = zero; - if(ix>=0x3fdda618) {y=(float)2.0-x;i=0;} /* [1.7316,2] */ - else if(ix>=0x3F9da620) {y=x-tc;i=1;} /* [1.23,1.73] */ - else {y=x-one;i=2;} - } - switch(i) { - case 0: - z = y*y; - p1 = a0+z*(a2+z*(a4+z*(a6+z*(a8+z*a10)))); - p2 = z*(a1+z*(a3+z*(a5+z*(a7+z*(a9+z*a11))))); - p = y*p1+p2; - r += (p-(float)0.5*y); break; - case 1: - z = y*y; - w = z*y; - p1 = t0+w*(t3+w*(t6+w*(t9 +w*t12))); /* parallel comp */ - p2 = t1+w*(t4+w*(t7+w*(t10+w*t13))); - p3 = t2+w*(t5+w*(t8+w*(t11+w*t14))); - p = z*p1-(tt-w*(p2+y*p3)); - r += (tf + p); break; - case 2: - p1 = y*(u0+y*(u1+y*(u2+y*(u3+y*(u4+y*u5))))); - p2 = one+y*(v1+y*(v2+y*(v3+y*(v4+y*v5)))); - r += (-(float)0.5*y + p1/p2); + if (ix >= 0x3fdda618) /* [1.7316,2] */ + r = func0(2 - x); + else if (ix >= 0x3F9da620) /* [1.23,1.73] */ + r = func1(x - tc); + else + r = func2(x - 1); } } else if(ix<0x41000000) { /* x < 8.0 */ i = (int)x; - y = x-(float)i; + y = x-i; p = y*(s0+y*(s1+y*(s2+y*(s3+y*(s4+y*(s5+y*s6)))))); q = one+y*(r1+y*(r2+y*(r3+y*(r4+y*(r5+y*r6))))); - r = half*y+p/q; + r = y/2+p/q; z = one; /* lgamma(1+s) = log(s) + lgamma(s) */ switch(i) { - case 7: z *= (y+(float)6.0); /* FALLTHRU */ - case 6: z *= (y+(float)5.0); /* FALLTHRU */ - case 5: z *= (y+(float)4.0); /* FALLTHRU */ - case 4: z *= (y+(float)3.0); /* FALLTHRU */ - case 3: z *= (y+(float)2.0); /* FALLTHRU */ + case 7: z *= (y+6); /* FALLTHRU */ + case 6: z *= (y+5); /* FALLTHRU */ + case 5: z *= (y+4); /* FALLTHRU */ + case 4: z *= (y+3); /* FALLTHRU */ + case 3: z *= (y+2); /* FALLTHRU */ r += __ieee754_logf(z); break; } - /* 8.0 <= x < 2**58 */ - } else if (ix < 0x5c800000) { + /* 8.0 <= x < 2**30 */ + } else if (ix < 0x4e800000) { t = __ieee754_logf(x); z = one/x; y = z*z; - w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6))))); + w = w0+z*(w1+y*w2); r = (x-half)*(t-one)+w; } else - /* 2**58 <= x <= inf */ + /* 2**30 <= x <= inf */ r = x*(__ieee754_logf(x)-one); if(hx<0) r = nadj - r; return r; From owner-freebsd-numerics@FreeBSD.ORG Tue Dec 10 05:26:27 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2328CFA0 for ; Tue, 10 Dec 2013 05:26:27 +0000 (UTC) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id A9CE7138D for ; Tue, 10 Dec 2013 05:26:26 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 22DB442E040; Tue, 10 Dec 2013 16:26:16 +1100 (EST) Date: Tue, 10 Dec 2013 16:26:12 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: dead code in lgamma_r[f]? In-Reply-To: <20131209223720.GA91828@troutmask.apl.washington.edu> Message-ID: <20131210155635.J1022@besplex.bde.org> References: <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> <20131207064602.GA76042@troutmask.apl.washington.edu> <20131208012939.GA80145@troutmask.apl.washington.edu> <20131208141627.F1181@besplex.bde.org> <20131208183339.GA83010@troutmask.apl.washington.edu> <20131208185941.GA83484@troutmask.apl.washington.edu> <20131209223720.GA91828@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=Kejr72oD c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=m8NKqXhiB1EA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=ElNGO9NIuDsA:10 a=XGf6sHeGM9skIQnvz40A:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@freebsd.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Dec 2013 05:26:27 -0000 On Mon, 9 Dec 2013, Steve Kargl wrote: > I would like to commit the attached patch. I will do it in > 3 passes: > 1) whitespace fixes, > 2) literal constants changes (checked by md5), > 3) the code re-organization, threshold change, and dead code removed. > > * lib/msun/src/e_lgamma_r.c > . Remove trailing space and trailing blank line in Copyright. This undoes rev.1.8, in which previous trailing whitespace removal in 1.2 was backed out to reduce diffs against the vendor version. > . Fix prototype for sin_pi to agree with the intended commit > r97413 done some 11 years 6 month ago. > . Remove dead code in sin_pi and remove 'if(ix<0x43300000)' as it is > always true. > . In the domain [0,2), move the three minimax approximations embedded > within __ieee75_lgamma_r() into three 'static inline double' functions. This is too invasive. It is only a style change to a slighly worse style. The other (unrelated) changes seem reasonable. > --- /usr/src/lib/msun/src/e_lgamma_r.c 2013-12-06 07:58:39.000000000 -0800 > +++ src/e_lgamma_r.c 2013-12-09 13:05:22.000000000 -0800 > ... > double > __ieee754_lgamma_r(double x, int *signgamp) > { > - double t,y,z,nadj,p,p1,p2,p3,q,r,w; > + double t,y,z,nadj,p,q,r,w; > int32_t hx; > - int i,lx,ix; > + int i,ix,lx; Also avoid fixing sorting errors when not changing much. It would be a more substantive change to fix the types of these variables (they should be uint32_t or int32_t). I should have done more or less in rev.1.9 where I split up this declaration to change only hx from int to int32_t. > > EXTRACT_WORDS(hx,lx,x); > > @@ -235,51 +265,36 @@ > if((((ix-0x3ff00000)|lx)==0)||(((ix-0x40000000)|lx)==0)) r = 0; > /* for x < 2.0 */ > else if(ix<0x40000000) { > - if(ix<=0x3feccccc) { /* lgamma(x) = lgamma(x+1)-log(x) */ > + if (ix <= 0X3FECCCCC) { /* lgamma(x) = lgamma(x+1)-log(x) */ Like whitespace fixes. > r = -__ieee754_log(x); > - if(ix>=0x3FE76944) {y = one-x; i= 0;} > - else if(ix>=0x3FCDA661) {y= x-(tc-one); i=1;} > - else {y = x; i=2;} > + if (ix >= 0x3FE76944) > + r += func0(1 - x); > + else if (ix >= 0x3FCDA661) > + r += func1(x - (tc - 1)); > + else > + r += func2(x); The simplification from using func*() seems to be mainly avoiding a case statement and the case selector variable i. > ... > switch(i) { > - case 7: z *= (y+6.0); /* FALLTHRU */ > - case 6: z *= (y+5.0); /* FALLTHRU */ > - case 5: z *= (y+4.0); /* FALLTHRU */ > - case 4: z *= (y+3.0); /* FALLTHRU */ > - case 3: z *= (y+2.0); /* FALLTHRU */ > + case 7: z *= (y+6); /* FALLTHRU */ > + case 6: z *= (y+5); /* FALLTHRU */ > + case 5: z *= (y+4); /* FALLTHRU */ > + case 4: z *= (y+3); /* FALLTHRU */ > + case 3: z *= (y+2); /* FALLTHRU */ The comment indentation would be wrong now, except this file already uses totally inconsistent comment indentation. The float version blindly moved the indentation in the other direction by adding float casts. Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue Dec 10 05:45:18 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D0EE835B for ; Tue, 10 Dec 2013 05:45:18 +0000 (UTC) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 56B47150C for ; Tue, 10 Dec 2013 05:45:17 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 7F258D447C0; Tue, 10 Dec 2013 16:45:15 +1100 (EST) Date: Tue, 10 Dec 2013 16:45:14 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: dead code in lgamma_r[f]? In-Reply-To: <20131208183339.GA83010@troutmask.apl.washington.edu> Message-ID: <20131210162729.A1022@besplex.bde.org> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> <20131207064602.GA76042@troutmask.apl.washington.edu> <20131208012939.GA80145@troutmask.apl.washington.edu> <20131208141627.F1181@besplex.bde.org> <20131208183339.GA83010@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=Kejr72oD c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=m8NKqXhiB1EA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=ElNGO9NIuDsA:10 a=UvcvclIWdY0omR4vDwoA:9 a=mfOIlMnV6EL5W_DY:21 a=H_VbkGgs1e0FD1jH:21 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@FreeBSD.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Dec 2013 05:45:19 -0000 On Sun, 8 Dec 2013, Steve Kargl wrote: > On Sun, Dec 08, 2013 at 03:21:04PM +1100, Bruce Evans wrote: >> ... >> gamma() was the first part of libm that I looked closely at (even >> before trig functions in 2004/5). I didn't understand it at first and >> tried to write my own. My first idea was to use the asymptotic >> expansion as a kernel for all cases, especially for complex args. >> ... > Interesting idea. For x in [2,8), fdlibm code (as you probably > know) uses recursion to reduce the evaluation to a log of a > product and lgamma(2+s) with s in [0,1). One can then show that I remember a bit more about why I tried to use the asymptotic expansion. The fdlibm method in [0,8] only works for real args. > lgamma(2+s) = s * (1 - E) + s**2 * P(s) where E is Euler's constant > and P(s) is a polynomial in s. But, instead of using a minimax This works for complex s, but only for s near 0. > polynomial approximation fdlibm code uses a rational approximation > lgamma(2+s) = 0.5*s + p(s) / q(s). This is the origin of the > s0, ..., s6, r1, ..., r6 coefficients. It is indeed hard to see why the code matches the comment. The 0.5*s expansion is presumably to calculate the leading term exactly. (All ration approximations should do this, since p(s) / q(s) has an error of a couple of ulps.) s * (1 - E) would need extra precision. > A quick scan of n1256.pdf shows that C99 does not require > lgamma of a complex argument, but it does reserve the > names clgamma[fl]. Same in C11 (n1570.pdf). >> % lg(z) = >> % { >> % local(bol, rnew); >> % >> % if (abs(z - 1) < 1.e-1, return (lngamma(z));); >> % if (abs(z - 2) < 1.e-1, return (lngamma(z));); >> >> Near 1 and 2, I planned to use a poly approximation, but that is not >> so easy to do in pari so I call the pari builtin lgamma() to fake it. > > fdlibm's lgammaf_r does not appear to have an issue near 1 or 2. > Starting at the lower bound of each interval and using nextafterf > to scan up to the upper bound, I'm seeing I mean that these cases are easy (even for complex z) so I didn't bother testing various implementations of them for accuracy. >> % lgb(z) = (z - 0.5) * log(z) - z + 0.5 * log(2 * Pi) + \ >> % sum(n = 1, N, b[n] / z^(2 * n - 1)) >> >> This is the asymptotic expansion. Pari syntax is only perfect for >> numeric things like this (and probably for some number theory things >> which I don't know well). b[n] is a table of coefficients. > ... > portion of your above code is approximated by a minimax polynomial. > > /* 8.0 <= x < 2**58 */ > } else if (ix < 0x5c800000) { > t = __ieee754_logf(x); > z = one/x; > y = z*z; > w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6))))); > r = (x-half)*(t-one)+w; > } else This is essentially the same -- the approximation is mostly a minimax poly in 1/z. Another problem for complex args is that it would be surpising if minimax polys worked for them. The polys use delicate cancelations that probably only work for real args. Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue Dec 10 07:38:41 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B4443905 for ; Tue, 10 Dec 2013 07:38:41 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 885B61CF1 for ; Tue, 10 Dec 2013 07:38:41 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rBA7ceQZ094134; Mon, 9 Dec 2013 23:38:40 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rBA7ceaZ094133; Mon, 9 Dec 2013 23:38:40 -0800 (PST) (envelope-from sgk) Date: Mon, 9 Dec 2013 23:38:40 -0800 From: Steve Kargl To: Bruce Evans Subject: Re: dead code in lgamma_r[f]? Message-ID: <20131210073840.GA94002@troutmask.apl.washington.edu> References: <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> <20131207064602.GA76042@troutmask.apl.washington.edu> <20131208012939.GA80145@troutmask.apl.washington.edu> <20131208141627.F1181@besplex.bde.org> <20131208183339.GA83010@troutmask.apl.washington.edu> <20131208185941.GA83484@troutmask.apl.washington.edu> <20131209223720.GA91828@troutmask.apl.washington.edu> <20131210155635.J1022@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131210155635.J1022@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@freebsd.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Dec 2013 07:38:41 -0000 On Tue, Dec 10, 2013 at 04:26:12PM +1100, Bruce Evans wrote: > On Mon, 9 Dec 2013, Steve Kargl wrote: > > > I would like to commit the attached patch. I will do it in > > 3 passes: > > 1) whitespace fixes, > > 2) literal constants changes (checked by md5), > > 3) the code re-organization, threshold change, and dead code removed. > > > > * lib/msun/src/e_lgamma_r.c > > . Remove trailing space and trailing blank line in Copyright. > > This undoes rev.1.8, in which previous trailing whitespace removal in > 1.2 was backed out to reduce diffs against the vendor version. It reduces the diff between e_lgamma_r.c and e_lgammaf.c. In addition, I think that at this point in time, the original vendor is irrelevant. >From what I can glean from OpenBSD, NetBSD, Openlibm, and dragonflybsd, FreeBSD is the vendor. > > . Fix prototype for sin_pi to agree with the intended commit > > r97413 done some 11 years 6 month ago. > > . Remove dead code in sin_pi and remove 'if(ix<0x43300000)' as it is > > always true. > > . In the domain [0,2), move the three minimax approximations embedded > > within __ieee75_lgamma_r() into three 'static inline double' functions. > > This is too invasive. It is only a style change to a slighly worse style. It's only a style change to a much better style in that I now have some idea of what to do get working ld80 and ld128 versions. > > The other (unrelated) changes seem reasonable. > > > --- /usr/src/lib/msun/src/e_lgamma_r.c 2013-12-06 07:58:39.000000000 -0800 > > +++ src/e_lgamma_r.c 2013-12-09 13:05:22.000000000 -0800 > > ... > > double > > __ieee754_lgamma_r(double x, int *signgamp) > > { > > - double t,y,z,nadj,p,p1,p2,p3,q,r,w; > > + double t,y,z,nadj,p,q,r,w; > > int32_t hx; > > - int i,lx,ix; > > + int i,ix,lx; > > Also avoid fixing sorting errors when not changing much. It would be a > more substantive change to fix the types of these variables (they should > be uint32_t or int32_t). I should have done more or less in rev.1.9 > where I split up this declaration to change only hx from int to int32_t. > > > > > EXTRACT_WORDS(hx,lx,x); > > > > @@ -235,51 +265,36 @@ > > if((((ix-0x3ff00000)|lx)==0)||(((ix-0x40000000)|lx)==0)) r = 0; > > /* for x < 2.0 */ > > else if(ix<0x40000000) { > > - if(ix<=0x3feccccc) { /* lgamma(x) = lgamma(x+1)-log(x) */ > > + if (ix <= 0X3FECCCCC) { /* lgamma(x) = lgamma(x+1)-log(x) */ > > Like whitespace fixes. > > > r = -__ieee754_log(x); > > - if(ix>=0x3FE76944) {y = one-x; i= 0;} > > - else if(ix>=0x3FCDA661) {y= x-(tc-one); i=1;} > > - else {y = x; i=2;} > > + if (ix >= 0x3FE76944) > > + r += func0(1 - x); > > + else if (ix >= 0x3FCDA661) > > + r += func1(x - (tc - 1)); > > + else > > + r += func2(x); > > The simplification from using func*() seems to be mainly avoiding a case > statement and the case selector variable i. It also allows one to clearly see that here func[0-2] are evaluations of lgamma(1+x) where subdomains in [0,2) are being mapped into other domains. > > ... > > switch(i) { > > - case 7: z *= (y+6.0); /* FALLTHRU */ > > - case 6: z *= (y+5.0); /* FALLTHRU */ > > - case 5: z *= (y+4.0); /* FALLTHRU */ > > - case 4: z *= (y+3.0); /* FALLTHRU */ > > - case 3: z *= (y+2.0); /* FALLTHRU */ > > + case 7: z *= (y+6); /* FALLTHRU */ > > + case 6: z *= (y+5); /* FALLTHRU */ > > + case 5: z *= (y+4); /* FALLTHRU */ > > + case 4: z *= (y+3); /* FALLTHRU */ > > + case 3: z *= (y+2); /* FALLTHRU */ > > The comment indentation would be wrong now, except this file already uses > totally inconsistent comment indentation. The float version blindly moved > the indentation in the other direction by adding float casts. The change reduces the diff between float and double versions. -- Steve