From owner-freebsd-numerics@FreeBSD.ORG Tue Dec 3 00:57:01 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 60257475 for ; Tue, 3 Dec 2013 00:57:01 +0000 (UTC) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id C3676187C for ; Tue, 3 Dec 2013 00:57:00 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 4B2AD1043FFD; Tue, 3 Dec 2013 11:56:51 +1100 (EST) Date: Tue, 3 Dec 2013 11:56:49 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: erff, erfl, and erfcl patch In-Reply-To: <20131202214240.GA44209@troutmask.apl.washington.edu> Message-ID: <20131203095936.G1230@besplex.bde.org> References: <20131202214240.GA44209@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=bpB1Wiqi c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=pwry36zkC_oA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=Y1DDuN7oUWAA:10 a=qWm97dYoU6-eoFHfUM0A:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Dec 2013 00:57:01 -0000 On Mon, 2 Dec 2013, Steve Kargl wrote: > I intend to commit the patch that follows by sig in the next day or > two. In particular, I want to get the ld80 and ld128 versions of > erfl and erfcl into tree as the current hack provided by imprecise.c > is ugly. Looks good, and a lot of work. I don' have time to look at it closely for a week or 2. Did you find translating the fdlibm code, including keeping its style bugs and algorithms to a fault, easier than using a new method like you did for expl()? > ... > * msun/src/s_erf.c: > . Fix whitespace issues. > . Add weak references for erfl and erfcl on targets where long double > has 53 bits of precisions. > > * msun/src/s_erff.c: > . Consistently use lower case in hex constants. > . Update the rational approximations. > . Fix descriptions of the the domain and range. > . Remove leading zeros in exponents, e.g., 1.28379166e-01 becomes > 1.28379166e-1. Some style is still inconsistent. Better not change it without making it fully consistent. > Index: ld128/s_erfl.c > =================================================================== > --- ld128/s_erfl.c (revision 0) > +++ ld128/s_erfl.c (working copy) > @@ -0,0 +1,347 @@ > ... > +/* > + * Domain [0, 0.84375], range ~[-2.076e-38, 2.074e-38]: > + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-126 > + */ > +static const long double > +efx = 1.28379167095512573896158903121545167e-1L, /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06fc3f */ > +efx8= 1.02703333676410059116927122497236133L, /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06ff3f */ > +pp0 = 1.28379167095512573896158903121545167e-1L, /* 0x3ffc06eb, 0xa8214db6, 0x88d71d48, 0xa7f6bfec */ There are really 2 subranges, one near 0 covered by a linear approximation using efx and efx8, and the expanded comment doesn't really apply to this range (though the comment might be true, p(x)/q(x) is not actually used for this subrange). Elsewhere, we often write similar comments, but we don't put the coefficients for the approximation near 0 under the scope of the comment. This approximation near 0 is especially interesting so a separate comment for it would be good. > +pp1 = -3.14930453779220199897729762882897733e-1L, /* 0xbffd427d, 0x20fdfc19, 0x5395ba59, 0x26b231dc */ > ... > +qa12= -1.02073578608782110388686097137679971e-6L; /* 0xbfeb1200, 0x6dd9de6c, 0x647bcf5b, 0xec577463 */ > ... s_erf.c uses more spaces, so that there is always at least 1 space before '=' (except for efx8, which is misformatted). In fact, there are always at least 2. There, the coeff numbers don't go above 9, so there is more space than necessary (except for lining up with the efx8 line if that were not misformatted). Here, using the same column for '=' as there is just enough to leave 1 space after the wider coeffs. > +long double > +erfl(long double x) > +{ > + int32_t i; > + long double ax,R,S,P,Q,s,y,z,r; > + uint64_t lx, llx; > + uint16_t hx; > + > + EXTRACT_LDBL128_WORDS(hx, lx, llx, x); > + > + /* erf(+-0) = +-0 */ > + if ((hx == 0 || hx == 0x8000) && (lx | llx) == 0) > + return (x); Why a special case for +-0? s_erf.c doesn't have one. I think it is correct too. erf(-0) is evaluated as 0.125*(8*-0+efx8*-0) = 0.125*(-0+-0) = -0, hopefully in all rounding modes. > + > + /* x is INF or NaN */ > + if (hx >= 0x7fff) { > + if (hx == 0xffff) > + return (-1); > + if ((lx | llx) == 0) > + return (1); > + return (x * x); > + } s_erf.c has better comments. x is _not_ INF or NaN before it is classified as such... This seems to be quite broken. hx is unsigned, so (hx >= 0x7fff) is true for all negative numbers, as well as for Infs and NaNs. This gives the bug that erfl(negative) = square(negative). (hx == 0xffff) is true for negative NaNs as well as for -Inf. This gives the bug that erfl(negative NaN) = -1. s_erf.c avoids these bugs using the following methods: - first mask out the sign bit: (ix = hx & MASK). - be more careful testing for NaNs. In the above, the (lx | llx) test would probably work if if it were not broken by being after the (hx == 0xffff) test; the latter is apparently to detect -Inf in a not so good way after not masking out the sign bit initially. Then it uses a more magic expression than x*x so as to avoid the 3 cases for +-1 and x*x. In expl.c, we eventually found an even more magic expression that avoids the (lx | llx) test. This test tends to be slow (although it is in a rarely executed case, just looking at llx sometimes confuses the compiler to load llx for all cases). The test is even more complicated for ld80, since you have to handle the normalization bit and should detected unsupported formats. So it is good to avoid testing this in bits. For the magic expression, the one in s_erf.c is already good. We don't need to look at either lx or llx. hx already consists of just the exponent (and sign) bits. This corresponds to masking off the mantissa bits in s_erf.c. (hx & 0x7fff) == 0x7fff classifies Infs and NaNs. Divide 1 by x to turn +-Inf into +-0 and not change NaNs except to quieten them. This result is not used for +-Inf, but it keeps NaNs unchanged except for quietening. Then add the integer +-1 to get the result +-1 for +-Inf and keep NaNs again unchanged except for quietening. s_erf.c calculates the +-1 as an integer. I don't know of any better way to turn +-Inf into +-1. Expressions like x/Inf don't quite work. Similarly elsewhere. > + > + ax = fabsl(x); > + if(ax < 0.84375) { I agree with not using the bit tests here, at least if you can avoid loading lx and llx at all. (The way the above is written, lx and llx are always loaded and there is no fast path for the usual case of finite x where they are not used.) > + if(ax < 0x1.p-34L) { I like to omit the "point" if possible, and there is no need to use an L suffix for small powers of 2. > + if (ax < LIM1) { Not consistently misformatted as 'if(foo)'. Why #defined constants for this and not for the others? This one varies between ld80 and ld128, so using it makes the diffs between those easier to read, but it is a literal constant for f32 and d64, so using it makes the diffs from ld* to the reference d64 harder to read. > + s = one/(ax*ax); > + if(ax < LIM0) { /* |x| < 2.85715 */ Repeating magic numbers in comments defeats the point of using #define to make them less magic. LIM0 seems to be the same for all precisions, and s_erf.c spells it less magically as 1/0.35. > Index: ld80/s_erfl.c > =================================================================== > --- ld80/s_erfl.c (revision 0) > +++ ld80/s_erfl.c (working copy) > @@ -0,0 +1,353 @@ > ... > +long double > +erfl(long double x) > ... > + /* x is INF or NaN */ > + if (hx >= 0x7fff) { > + if (hx == 0xffff) > + return (-1); > + if (lx == 0x8000000000000000) > + return (1); > + return (x * x); > + } After changing the first hx test to ((hx & 0x7fff) == 0x7fff) and removing the second hx test as above, this seems to be a correct but slow way to classify Infs and NaNs. Oops, we still need a second hx test to classify -Inf. This must be done after testing the normalization bit: /* Wrong comment intentionally left out. */ if ((hx & 0x7fff) == 0x7fff) { if (lx == 0x8000000000000000ULL) return (hx & 0x8000 ? -1 : 1); /* erf(+-Inf) = +-1 */ return (x + x); /* erf(NaN/pseudo{Inf,NaN}) = qNaN */ } I don't like the ULL suffix, but it is needed to fix warnings for non-C99 compilers, and I used it in s_logl.c. The above may be slow, but it is close to what I used in s_logl.c. There it is impossible to avoid loading lx. s_logl.c is a little more complicated, both to handle denormals and to classify unnormals as NaNs so that they get handled by the x+x return. It is better to avoid these complications by using the method in s_erf.c. "Finite" unnormals are not classified by the hx test. I think they are hanled correctly by falling though, as in expl(), since there will by an operation involving them. > Index: src/s_erf.c > =================================================================== > --- src/s_erf.c (revision 258855) > +++ src/s_erf.c (working copy) > ... > @@ -121,8 +121,8 @@ > /* > * Coefficients for approximation to erf on [0,0.84375] > */ > -efx = 1.28379167095512586316e-01, /* 0x3FC06EBA, 0x8214DB69 */ > -efx8= 1.02703333676410069053e+00, /* 0x3FF06EBA, 0x8214DB69 */ > +efx = 1.28379167095512586316e-01, /* 0x3FC06EBA, 0x8214DB69 */ > +efx8= 1.02703333676410069053e+00, /* 0x3FF06EBA, 0x8214DB69 */ > pp0 = 1.28379167095512558561e-01, /* 0x3FC06EBA, 0x8214DB68 */ > pp1 = -3.25042107247001499370e-01, /* 0xBFD4CD7D, 0x691CB913 */ > pp2 = -2.84817495755985104766e-02, /* 0xBF9D2A51, 0xDBD7194F */ This lines up the comments, but the misformatting was not lining up either the '=' or the first digit. > Index: src/s_erff.c > =================================================================== > --- src/s_erff.c (revision 258855) > +++ src/s_erff.c (working copy) > @@ -21,62 +21,54 @@ > > static const float > tiny = 1e-30, > -half= 5.0000000000e-01, /* 0x3F000000 */ > -one = 1.0000000000e+00, /* 0x3F800000 */ > -two = 2.0000000000e+00, /* 0x40000000 */ > +half= 5.0000000000e-1, /* 0x3f000000 */ > +one = 1.0000000000e+0, /* 0x3f800000 */ > +two = 2.0000000000e+0, /* 0x40000000 */ If changing these, fix the '=' and first digit to line up with later sections. > /* > - * Coefficients for approximation to erf on [0,0.84375] > + * Domain [0, 0.84375], range ~[-5.4419e-10,5.5179e-10]: > + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31 > */ > -efx = 1.2837916613e-01, /* 0x3e0375d4 */ > -efx8= 1.0270333290e+00, /* 0x3f8375d4 */ > +efx = 1.28379166e-1, /* 0x3e0375d4 */ > +efx8 = 1.02703333, /* 0x3f8375d4 */ > +pp0 = 1.28379166e-1, /* 0x3e0375d4 */ This does line up the '=' and first digit. So the formatting seems consistent in this file except for half/one/two, but may be too different from s_erfc.c > /* > - * Domain [0.84375, 1.25], range ~[-1.953e-11,1.940e-11]: > - * |(erf(x) - erx) - p(x)/q(x)| < 2**-36. > + * Domain [1.25,1/0.35], range ~[-4.821e-9,4.927e-9]: > + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-28 > */ > -erx = 8.42697144e-01F, /* 0x1.af7600p-1. erf(1) rounded to 16 bits. */ > -pa0 = 3.64939137e-06F, /* 0x1.e9d022p-19 */ > -pa1 = 4.15109694e-01F, /* 0x1.a91284p-2 */ > -pa2 = -1.65179938e-01F, /* -0x1.5249dcp-3 */ > -pa3 = 1.10914491e-01F, /* 0x1.c64e46p-4 */ > -qa1 = 6.02074385e-01F, /* 0x1.344318p-1 */ > -qa2 = 5.35934687e-01F, /* 0x1.126608p-1 */ > -qa3 = 1.68576106e-01F, /* 0x1.593e6ep-3 */ > -qa4 = 5.62181212e-02F, /* 0x1.cc89f2p-5 */ > +ra0 = -9.88156721e-3, /* 0xbc21e64c */ > +ra1 = -5.43658376e-1, /* 0xbf0b2d32 */ Nah, this becomes inconsistent again by removing the space bnefore the '=' that is needed to like up with the longer variable name efx8 and a space after that before the '='. > @@ -113,12 +105,12 @@ > } > x = fabsf(x); > s = one/(x*x); > - if(ix< 0x4036DB6E) { /* |x| < 1/0.35 */ > + if(ix< 0x4036db8c) { /* |x| < 2.85715 ~ 1/0.35 */ Don't give more detail than s_erf.c. We are only keeping these comments (that should be identical) because removing them would make the diffs harder to read). This also changes the style. s_erf.c still uses an upper case hex constant here. Bruce