From owner-freebsd-numerics@freebsd.org Wed Mar 6 12:56:43 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 026CE1514503 for ; Wed, 6 Mar 2019 12:56:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id B21698C948 for ; Wed, 6 Mar 2019 12:56:35 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 78D16105B9C9; Wed, 6 Mar 2019 23:56:24 +1100 (AEDT) Date: Wed, 6 Mar 2019 23:56:23 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro In-Reply-To: <20190306055201.GA40298@troutmask.apl.washington.edu> Message-ID: <20190306225811.P2731@besplex.bde.org> References: <20190226191825.GA68479@troutmask.apl.washington.edu> <20190227145002.P907@besplex.bde.org> <20190227074811.GA75972@troutmask.apl.washington.edu> <20190227201214.V1823@besplex.bde.org> <20190227161906.GA77785@troutmask.apl.washington.edu> <20190228060920.R4413@besplex.bde.org> <20190304212159.GA12587@troutmask.apl.washington.edu> <20190305153243.Y1349@besplex.bde.org> <20190306055201.GA40298@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=fpzoFui-lhB3HaOsU0AA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: B21698C948 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.249 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.25 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[249.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[extmail.optusnet.com.au]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.92)[-0.916,0]; IP_SCORE(-3.03)[ip: (-7.94), ipnet: 211.28.0.0/14(-3.98), asn: 4804(-3.17), country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 12:56:43 -0000 On Tue, 5 Mar 2019, Steve Kargl wrote: > ... > I think I have the s_cexpl.c file fixed to use LDBL_EXTRACT_WORDS > instead of the new macro I introduced. I however cannot figure > out what David Das did to arrive at k_exp.c, so I cannot write Davvid A S. > a similar k_cexpl.c. Yes, I added the 'c' in the name to avoid > confusion in ld80/. In particular, I have no idea how he found > his scaling value 'k'. Any insights? bde already wrote __ldexp_cexpl() in ld*/k_expl.h, and kargl committed it in r260066. Does it not work? :-). Well, it hasn't been tested, and it indeed cannot work since it spells cosl(y) as cos(y). It is written in a more portable way than the double and float versions using '1' instead of bit fiddling to start the construction of 2**k. This makes it identical in ld80 and ld128 (since the exponent range is the same and the accessor macros hide enough details of the bit fiddling for the exponent). The duplication is noted in a comment, but the comment also says that using scalbnl() ond ld128 is probably best after all (it is slow, but multiplication is also slow). I have rewritten the double and float versions in work related to fixing the accuracy of the double and float versions of hyperbolic functions. I fixed these before writing the long double hyperbolic functions using identical methods. You committed the latter, but the double and float versions still use the old innaccurate slower methods. The rewrite improves the comments, and it is the improved comments that the reference to k_exp.c in k_expl.h is supposed to be for. XX Index: k_exp.c XX =================================================================== XX RCS file: /home/ncvs/src/lib/msun/src/k_exp.c,v XX retrieving revision 1.2 XX diff -u -2 -r1.2 k_exp.c XX --- k_exp.c 17 Nov 2012 01:50:07 -0000 1.2 XX +++ k_exp.c 30 Jun 2013 01:31:48 -0000 XX @@ -32,75 +32,46 @@ XX #include "math.h" XX #include "math_private.h" XX - XX -static const uint32_t k = 1799; /* constant for reduction */ XX -static const double kln2 = 1246.97177782734161156; /* k * ln2 */ I never liked this magic by das. I forget how it worked. I avoid using general methods. XX - XX -/* XX - * Compute exp(x), scaled to avoid spurious overflow. An exponent is XX - * returned separately in 'expt'. XX - * XX - * Input: ln(DBL_MAX) <= x < ln(2 * DBL_MAX / DBL_MIN_DENORM) ~= 1454.91 XX - * Output: 2**1023 <= y < 2**1024 XX - */ XX -static double XX -__frexp_exp(double x, int *expt) XX -{ XX - double exp_x; XX - uint32_t hx; XX - XX - /* XX - * We use exp(x) = exp(x - kln2) * 2**k, carefully chosen to XX - * minimize |exp(kln2) - 2**k|. We also scale the exponent of XX - * exp_x to MAX_EXP so that the result can be multiplied by XX - * a tiny number without losing accuracy due to denormalization. XX - */ XX - exp_x = exp(x - kln2); XX - GET_HIGH_WORD(hx, exp_x); XX - *expt = (hx >> 20) - (0x3ff + 1023) + k; XX - SET_HIGH_WORD(exp_x, (hx & 0xfffff) | ((0x3ff + 1023) << 20)); XX - return (exp_x); XX -} __frexp_exp() was only used in __ldexp_cexp(). It is not needed using the general methods. __ldexp_cexpl() already uses the general methods and has no trace of __frexp_expl(). XX +#include "k_exp.h" XX XX /* XX - * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt. XX - * They are intended for large arguments (real part >= ln(DBL_MAX)) XX - * where care is needed to avoid overflow. XX + * _ldexp_cexp(x + I*y, expt) computes cexp(x + I*y) * 2**expt. It is XX + * intended for large arguments where care is needed to avoid overflow. XX * XX - * The present implementation is narrowly tailored for our hyperbolic and XX - * exponential functions. We assume expt is small (0 or -1), and the caller XX - * has filtered out very large x, for which overflow would be inevitable. XX + * The present implementation is tailored (not very narrowly) for our XX + * complex hyperbolic and exponential functions. The caller must filter XX + * out all x not supported by k_exp(), most negative x, and very large x, XX + * and not use very large |expt| (we only need expt = 0 or -1). The XX + * caller should filter out all x for which overflow is either impossible XX + * (x < ~ln(DBL_MAX)) or inevitable (x > ~2*ln(DBL_MAX/DBL_TRUE_MIN)). XX */ XX - XX -double XX -__ldexp_exp(double x, int expt) XX -{ XX - double exp_x, scale; XX - int ex_expt; XX - XX - exp_x = __frexp_exp(x, &ex_expt); XX - expt += ex_expt; XX - INSERT_WORDS(scale, (0x3ff + expt) << 20, 0); XX - return (exp_x * scale); XX -} XX - __ldexp_exp() was only used in cosh() and sinh(). coshl() and sinhl() always used better methods, so there is no trace of __ldexp_expl() in k_expl.h. XX double complex XX __ldexp_cexp(double complex z, int expt) XX { XX - double x, y, exp_x, scale1, scale2; XX - int ex_expt, half_expt; XX + double_t exp_x, hi, lo; XX + double x, y, scale1, scale2; XX + int half_expt, k; Also start fixing i386 pessimizations using s/double/double_t. XX XX x = creal(z); XX y = cimag(z); XX - exp_x = __frexp_exp(x, &ex_expt); XX - expt += ex_expt; XX + k_exp(x, &hi, &lo, &k); k_expl.h already has a kernel k_expl() much like this (you wrote the non-kernel expl() and I turned it into the kernel. The rest of this mail is mostly diffs and files for turning exp() into a kernel. The kernel is used a lot in hyperbolic functions, just like in the long double case). XX + XX + /* XX + * 3 scale factors are needed in general. Put one in exp_x now. XX + * exp_x must be larger than 2**DBL_MANT_DIG to avoid spurious XX + * underflows We satisfy this and maximize the range handled XX + * by putting as much as possible in exp_x. XX + */ XX + exp_x = (lo + hi) * 0x1p1022; XX + expt += k - 1022; XX XX /* XX - * Arrange so that scale1 * scale2 == 2**expt. We use this to XX - * compensate for scalbn being horrendously slow. XX + * Arrange so that scale1 * scale2 == 2**expt. As usual, scalbn() XX + * is too slow to actually use for the things it does. XX */ XX + scale1 = 1; XX half_expt = expt / 2; XX - INSERT_WORDS(scale1, (0x3ff + half_expt) << 20, 0); XX - half_expt = expt - half_expt; XX - INSERT_WORDS(scale2, (0x3ff + half_expt) << 20, 0); XX + SET_HIGH_WORD(scale1, (0x3ff + half_expt) << 20); XX + scale2 = 1; XX + SET_HIGH_WORD(scale2, (0x3ff + expt - half_expt) << 20); XX XX return (cpack(cos(y) * exp_x * scale1 * scale2, The scaling is the same as in k_expl.h, with minor differences for the bit fiddling. This started by constructing 1 without using bits too. The scaling is to avoid overlow and underflow, and it must be careful to not have its own underflow. The basic general method is that the kernel creates the value in the form r * 2**k where r is near 1. Since r is near 1, we can multiply almost any value by it without overflow or underflow. Values near +Inf might need to be divided by 2 to prepare. Values near DBL_MIN might need to be multiplied by 2 to prepare. Denormal values should be multiplied by a larger power of 2 to prepare. Adjust k to match. The rest is not a diff, but is the contents of k_exp.h. XX /* See e_exp.c for more comments. */ XX XX static const double XX ln2hi = 6.93147180369123816490e-01, /* 0x3fe62e42, 0xfee00000 */ XX ln2lo = 1.90821492927058770002e-10, /* 0x3dea39ef, 0x35793c76 */ XX invln2 = 1.44269504088896338700e+00, /* 0x3ff71547, 0x652b82fe */ XX /* XX * Domain [-0.34568, 0.34568], range ~[-1.6340e-18, 2.0047e-18]: XX * |x*(exp(x)+1)/(exp(x)-1) - p(x)| < 2**-58.7 XX */ XX P1 = 1.66666666666666019037e-01, /* 0x3FC55555, 0x5555553E */ XX P2 = -2.77777777770155933842e-03, /* 0xBF66C16C, 0x16BEBD93 */ XX P3 = 6.61375632143793436117e-05, /* 0x3F11566A, 0xAF25DE2C */ XX P4 = -1.65339022054652515390e-06, /* 0xBEBBBD41, 0xC5D26BF1 */ XX P5 = 4.13813679705723846039e-08; /* 0x3E663769, 0x72BEA4D0 */ XX XX /* XX * Kernel for exp(x). x must be finite and not tiny or huge. XX * "tiny" is anything that would make us underflow (|P4*x^8| < ~DBL_T_MIN). XX * "huge" is anything that would make t*ln2hi inexact (|x| > ~2**21*ln2). XX */ XX static inline void XX k_exp(double x, double_t *hip, double_t *lop, int *kp) XX { XX double_t c, hi, lo, rhi, rlo, t, xred; XX int32_t k; XX XX /* argument reduction */ XX #if 0 XX t = rnint((double_t)x * invln2); XX #else XX t = (double)((double_t)x * invln2 + 0x1.8p52) - 0x1.8p52; XX #endif XX k = irint(t); XX /* XX * ln2hi is rounded to 32 bits to more than cover the normal XX * range -1075 <= k <= 1024. It covers the wider range XX * |k| <= 2**21. We now need it cover the range XX * -1075 <= k <= 1024+1024+53. XX */ XX hi = x - t*ln2hi; /* t*ln2hi is exact here */ XX lo = t*ln2lo; XX xred = hi - lo; XX XX /* xred is now in primary range */ XX t = xred*xred; XX c = xred - t*P1 - t*t*(P2 + t*P3) - t*t*(t*t)*(P4 + t*P5); XX t = xred*c/(2-c); XX rhi = 1; XX rlo = hi; XX _2sumF(rhi, rlo); XX rlo += t-lo; XX *hip = rhi; XX *lop = rlo; XX *kp = k; XX } XX XX /* XX * Extra-precision exp(x)/2. x must be between the underflow threshold XX * and the overflow threshold (in float precision due to our optimization XX * of evaluating 2**(k-1) in float precision)), and not very close to XX * either, and not tiny. We only use this for x >= 1 up to the threshold XX * where cosh(x) decays to exp(x)/2. XX */ XX static inline void XX k_hexp(double x, double_t *hip, double_t *lop) XX { XX float twopkm1; XX int k; XX XX k_exp(x, hip, lop, &k); XX SET_FLOAT_WORD(twopkm1, 0x3f800000 + ((k - 1) << 23)); XX *hip *= twopkm1; XX *lop *= twopkm1; XX } XX XX /* XX * exp(x)/2 without spurious overflow. x must be well above the underflow XX * threshold and at most a little above the overflow threshold, and not XX * tiny. We only use this for x positive and above the threshold where XX * cosh(x) decays to exp(x)/2 up to a little above the overflow threshold. XX */ XX static inline double_t XX hexp(double x) XX { XX double_t hi, lo; XX double twopkm2; XX int k; XX XX twopkm2 = 1; XX k_exp(x, &hi, &lo, &k); XX SET_HIGH_WORD(twopkm2, 0x3ff00000 + ((k - 2) << 20)); XX return (lo + hi) * 2 * twopkm2; XX } k_expl.h also points to the above uncommitted comments for k_hexp() and hexp(). __ldexp_cexpl() doesn't belong in k_expl.h. It belongs in a new file k_expl.c corresponding to k_exp.c. hexp[fl]() isn't quite a kernel, but it has to be in a header file since it is inline. Using static inline functions also reduces namespace problems. We already use this to name some kernels simply kfoo() instead of uglier names like __kernel_foo() used for older methods where the kernels aren't always inline. Bruce