From owner-freebsd-numerics@freebsd.org  Wed Mar  6 12:56:43 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 026CE1514503
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed,  6 Mar 2019 12:56:43 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249])
 by mx1.freebsd.org (Postfix) with ESMTP id B21698C948
 for <freebsd-numerics@freebsd.org>; Wed,  6 Mar 2019 12:56:35 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 78D16105B9C9;
 Wed,  6 Mar 2019 23:56:24 +1100 (AEDT)
Date: Wed, 6 Mar 2019 23:56:23 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro
In-Reply-To: <20190306055201.GA40298@troutmask.apl.washington.edu>
Message-ID: <20190306225811.P2731@besplex.bde.org>
References: <20190226191825.GA68479@troutmask.apl.washington.edu>
 <20190227145002.P907@besplex.bde.org>
 <20190227074811.GA75972@troutmask.apl.washington.edu>
 <20190227201214.V1823@besplex.bde.org>
 <20190227161906.GA77785@troutmask.apl.washington.edu>
 <20190228060920.R4413@besplex.bde.org>
 <20190304212159.GA12587@troutmask.apl.washington.edu>
 <20190305153243.Y1349@besplex.bde.org>
 <20190306055201.GA40298@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=fpzoFui-lhB3HaOsU0AA:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: B21698C948
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.249 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-6.25 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 RCVD_IN_DNSWL_LOW(-0.10)[249.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 FREEMAIL_FROM(0.00)[optusnet.com.au];
 MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 MX_GOOD(-0.01)[extmail.optusnet.com.au];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.92)[-0.916,0];
 IP_SCORE(-3.03)[ip: (-7.94), ipnet: 211.28.0.0/14(-3.98), asn: 4804(-3.17),
 country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Mar 2019 12:56:43 -0000

On Tue, 5 Mar 2019, Steve Kargl wrote:

> ...
> I think I have the s_cexpl.c file fixed to use LDBL_EXTRACT_WORDS
> instead of the new macro I introduced.  I however cannot figure
> out what David Das did to arrive at k_exp.c, so I cannot write

Davvid A S.

> a similar k_cexpl.c.  Yes, I added the 'c' in the name to avoid
> confusion in ld80/.  In particular, I have no idea how he found
> his scaling value 'k'.  Any insights?

bde already wrote __ldexp_cexpl() in ld*/k_expl.h, and kargl committed
it in r260066.  Does it not work? :-).

Well, it hasn't been tested, and it indeed cannot work since it spells
cosl(y) as cos(y).

It is written in a more portable way than the double and float versions
using '1' instead of bit fiddling to start the construction of 2**k.
This makes it identical in ld80 and ld128 (since the exponent range
is the same and the accessor macros hide enough details of the bit
fiddling for the exponent).  The duplication is noted in a comment,
but the comment also says that using scalbnl() ond ld128 is probably
best after all (it is slow, but multiplication is also slow).

I have rewritten the double and float versions in work related to
fixing the accuracy of the double and float versions of hyperbolic
functions.  I fixed these before writing the long double hyperbolic
functions using identical methods.  You committed the latter, but
the double and float versions still use the old innaccurate slower
methods.

The rewrite improves the comments, and it is the improved comments
that the reference to k_exp.c in k_expl.h is supposed to be for.

XX Index: k_exp.c
XX ===================================================================
XX RCS file: /home/ncvs/src/lib/msun/src/k_exp.c,v
XX retrieving revision 1.2
XX diff -u -2 -r1.2 k_exp.c
XX --- k_exp.c	17 Nov 2012 01:50:07 -0000	1.2
XX +++ k_exp.c	30 Jun 2013 01:31:48 -0000
XX @@ -32,75 +32,46 @@
XX  #include "math.h"
XX  #include "math_private.h"
XX -
XX -static const uint32_t k = 1799;		/* constant for reduction */
XX -static const double kln2 =  1246.97177782734161156;	/* k * ln2 */

I never liked this magic by das.  I forget how it worked.  I avoid using
general methods.

XX -
XX -/*
XX - * Compute exp(x), scaled to avoid spurious overflow.  An exponent is
XX - * returned separately in 'expt'.
XX - *
XX - * Input:  ln(DBL_MAX) <= x < ln(2 * DBL_MAX / DBL_MIN_DENORM) ~= 1454.91
XX - * Output: 2**1023 <= y < 2**1024
XX - */
XX -static double
XX -__frexp_exp(double x, int *expt)
XX -{
XX -	double exp_x;
XX -	uint32_t hx;
XX -
XX -	/*
XX -	 * We use exp(x) = exp(x - kln2) * 2**k, carefully chosen to
XX -	 * minimize |exp(kln2) - 2**k|.  We also scale the exponent of
XX -	 * exp_x to MAX_EXP so that the result can be multiplied by
XX -	 * a tiny number without losing accuracy due to denormalization.
XX -	 */
XX -	exp_x = exp(x - kln2);
XX -	GET_HIGH_WORD(hx, exp_x);
XX -	*expt = (hx >> 20) - (0x3ff + 1023) + k;
XX -	SET_HIGH_WORD(exp_x, (hx & 0xfffff) | ((0x3ff + 1023) << 20));
XX -	return (exp_x);
XX -}

__frexp_exp() was only used in __ldexp_cexp().  It is not needed using the
general methods.  __ldexp_cexpl() already uses the general methods and has
no trace of __frexp_expl().

XX +#include "k_exp.h"
XX 
XX  /*
XX - * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt.
XX - * They are intended for large arguments (real part >= ln(DBL_MAX))
XX - * where care is needed to avoid overflow.
XX + * _ldexp_cexp(x + I*y, expt) computes cexp(x + I*y) * 2**expt.  It is
XX + * intended for large arguments where care is needed to avoid overflow.
XX   *
XX - * The present implementation is narrowly tailored for our hyperbolic and
XX - * exponential functions.  We assume expt is small (0 or -1), and the caller
XX - * has filtered out very large x, for which overflow would be inevitable.
XX + * The present implementation is tailored (not very narrowly) for our
XX + * complex hyperbolic and exponential functions.  The caller must filter
XX + * out all x not supported by k_exp(), most negative x, and very large x,
XX + * and not use very large |expt| (we only need expt = 0 or -1).  The
XX + * caller should filter out all x for which overflow is either impossible
XX + * (x < ~ln(DBL_MAX)) or inevitable (x > ~2*ln(DBL_MAX/DBL_TRUE_MIN)).
XX   */
XX -
XX -double
XX -__ldexp_exp(double x, int expt)
XX -{
XX -	double exp_x, scale;
XX -	int ex_expt;
XX -
XX -	exp_x = __frexp_exp(x, &ex_expt);
XX -	expt += ex_expt;
XX -	INSERT_WORDS(scale, (0x3ff + expt) << 20, 0);
XX -	return (exp_x * scale);
XX -}
XX -

__ldexp_exp() was only used in cosh() and sinh().  coshl() and sinhl()
always used better methods, so there is no trace of __ldexp_expl() in
k_expl.h.

XX  double complex
XX  __ldexp_cexp(double complex z, int expt)
XX  {
XX -	double x, y, exp_x, scale1, scale2;
XX -	int ex_expt, half_expt;
XX +	double_t exp_x, hi, lo;
XX +	double x, y, scale1, scale2;
XX +	int half_expt, k;

Also start fixing i386 pessimizations using s/double/double_t.

XX 
XX  	x = creal(z);
XX  	y = cimag(z);
XX -	exp_x = __frexp_exp(x, &ex_expt);
XX -	expt += ex_expt;
XX +	k_exp(x, &hi, &lo, &k);

k_expl.h already has a kernel k_expl() much like this (you wrote the
non-kernel expl() and I turned it into the kernel.  The rest of this
mail is mostly diffs and files for turning exp() into a kernel.  The
kernel is used a lot in hyperbolic functions, just like in the long
double case).

XX +
XX +	/*
XX +	 * 3 scale factors are needed in general.  Put one in exp_x now.
XX +	 * exp_x must be larger than 2**DBL_MANT_DIG to avoid spurious
XX +	 * underflows  We satisfy this and maximize the range handled
XX +	 * by putting as much as possible in exp_x.
XX +	 */
XX +	exp_x = (lo + hi) * 0x1p1022;
XX +	expt += k - 1022;
XX 
XX  	/*
XX -	 * Arrange so that scale1 * scale2 == 2**expt.  We use this to
XX -	 * compensate for scalbn being horrendously slow.
XX +	 * Arrange so that scale1 * scale2 == 2**expt.  As usual, scalbn()
XX +	 * is too slow to actually use for the things it does.
XX  	 */
XX +	scale1 = 1;
XX  	half_expt = expt / 2;
XX -	INSERT_WORDS(scale1, (0x3ff + half_expt) << 20, 0);
XX -	half_expt = expt - half_expt;
XX -	INSERT_WORDS(scale2, (0x3ff + half_expt) << 20, 0);
XX +	SET_HIGH_WORD(scale1, (0x3ff + half_expt) << 20);
XX +	scale2 = 1;
XX +	SET_HIGH_WORD(scale2, (0x3ff + expt - half_expt) << 20);
XX 
XX  	return (cpack(cos(y) * exp_x * scale1 * scale2,

The scaling is the same as in k_expl.h, with minor differences for the
bit fiddling.  This started by constructing 1 without using bits too.

The scaling is to avoid overlow and underflow, and it must be careful
to not have its own underflow.  The basic general method is that the
kernel creates the value in the form r * 2**k where r is near 1.  Since
r is near 1, we can multiply almost any value by it without overflow
or underflow.  Values near +Inf might need to be divided by 2 to prepare.
Values near DBL_MIN might need to be multiplied by 2 to prepare.  Denormal
values should be multiplied by a larger power of 2 to prepare.  Adjust k
to match.

The rest is not a diff, but is the contents of k_exp.h.

XX /* See e_exp.c for more comments. */
XX 
XX static const double
XX ln2hi      =  6.93147180369123816490e-01,  /* 0x3fe62e42, 0xfee00000 */
XX ln2lo      =  1.90821492927058770002e-10,  /* 0x3dea39ef, 0x35793c76 */
XX invln2     =  1.44269504088896338700e+00,  /* 0x3ff71547, 0x652b82fe */
XX /*
XX  * Domain [-0.34568, 0.34568], range ~[-1.6340e-18, 2.0047e-18]:
XX  * |x*(exp(x)+1)/(exp(x)-1) - p(x)| < 2**-58.7
XX  */
XX P1   =  1.66666666666666019037e-01, /* 0x3FC55555, 0x5555553E */
XX P2   = -2.77777777770155933842e-03, /* 0xBF66C16C, 0x16BEBD93 */
XX P3   =  6.61375632143793436117e-05, /* 0x3F11566A, 0xAF25DE2C */
XX P4   = -1.65339022054652515390e-06, /* 0xBEBBBD41, 0xC5D26BF1 */
XX P5   =  4.13813679705723846039e-08; /* 0x3E663769, 0x72BEA4D0 */
XX 
XX /*
XX  * Kernel for exp(x).  x must be finite and not tiny or huge.
XX  * "tiny" is anything that would make us underflow (|P4*x^8| < ~DBL_T_MIN).
XX  * "huge" is anything that would make t*ln2hi inexact (|x| > ~2**21*ln2).
XX  */
XX static inline void
XX k_exp(double x, double_t *hip, double_t *lop, int *kp)
XX {
XX 	double_t c, hi, lo, rhi, rlo, t, xred;
XX 	int32_t k;
XX 
XX     /* argument reduction */
XX #if 0
XX 	t = rnint((double_t)x * invln2);
XX #else
XX 	t = (double)((double_t)x * invln2 + 0x1.8p52) - 0x1.8p52;
XX #endif
XX 	k = irint(t);
XX 	/*
XX 	 * ln2hi is rounded to 32 bits to more than cover the normal
XX 	 * range -1075 <= k <= 1024.  It covers the wider range
XX 	 * |k| <= 2**21.  We now need it cover the range
XX 	 * -1075 <= k <= 1024+1024+53.
XX 	 */
XX 	hi = x - t*ln2hi;	/* t*ln2hi is exact here */
XX 	lo = t*ln2lo;
XX 	xred = hi - lo;
XX 
XX     /* xred is now in primary range */
XX 	t  = xred*xred;
XX 	c  = xred - t*P1 - t*t*(P2 + t*P3) - t*t*(t*t)*(P4 + t*P5);
XX 	t = xred*c/(2-c);
XX 	rhi = 1;
XX 	rlo = hi;
XX 	_2sumF(rhi, rlo);
XX 	rlo += t-lo;
XX 	*hip = rhi;
XX 	*lop = rlo;
XX 	*kp = k;
XX }
XX 
XX /*
XX  * Extra-precision exp(x)/2.  x must be between the underflow threshold
XX  * and the overflow threshold (in float precision due to our optimization
XX  * of evaluating 2**(k-1) in float precision)), and not very close to
XX  * either, and not tiny.  We only use this for x >= 1 up to the threshold
XX  * where cosh(x) decays to exp(x)/2.
XX  */
XX static inline void
XX k_hexp(double x, double_t *hip, double_t *lop)
XX {
XX 	float twopkm1;
XX 	int k;
XX 
XX 	k_exp(x, hip, lop, &k);
XX 	SET_FLOAT_WORD(twopkm1, 0x3f800000 + ((k - 1) << 23));
XX 	*hip *= twopkm1;
XX 	*lop *= twopkm1;
XX }
XX 
XX /*
XX  * exp(x)/2 without spurious overflow.  x must be well above the underflow
XX  * threshold and at most a little above the overflow threshold, and not
XX  * tiny.  We only use this for x positive and above the threshold where
XX  * cosh(x) decays to exp(x)/2 up to a little above the overflow threshold.
XX  */
XX static inline double_t
XX hexp(double x)
XX {
XX 	double_t hi, lo;
XX 	double twopkm2;
XX 	int k;
XX 
XX 	twopkm2 = 1;
XX 	k_exp(x, &hi, &lo, &k);
XX 	SET_HIGH_WORD(twopkm2, 0x3ff00000 + ((k - 2) << 20));
XX 	return (lo + hi) * 2 * twopkm2;
XX }

k_expl.h also points to the above uncommitted comments for k_hexp() and
hexp().

__ldexp_cexpl() doesn't belong in k_expl.h.  It belongs in a new file
k_expl.c corresponding to k_exp.c.  hexp[fl]() isn't quite a kernel, but
it has to be in a header file since it is inline.

Using static inline functions also reduces namespace problems.  We already
use this to name some kernels simply kfoo() instead of uglier names like
__kernel_foo() used for older methods where the kernels aren't always
inline.

Bruce