From owner-freebsd-numerics@FreeBSD.ORG  Mon Jun 10 00:36:51 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 743F4800
 for <freebsd-numerics@freebsd.org>; Mon, 10 Jun 2013 00:36:51 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 4426C1149
 for <freebsd-numerics@freebsd.org>; Mon, 10 Jun 2013 00:36:51 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r5A0ajD6016493
 for <freebsd-numerics@freebsd.org>; Sun, 9 Jun 2013 17:36:45 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r5A0ajeK016492
 for freebsd-numerics@freebsd.org; Sun, 9 Jun 2013 17:36:45 -0700 (PDT)
 (envelope-from sgk)
Date: Sun, 9 Jun 2013 17:36:45 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: Implementation for coshl.
Message-ID: <20130610003645.GA16444@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Jun 2013 00:36:51 -0000

I suspect that there will be some nits with the implementation.
Anyway, testing gives

    Arch   |     Interval        | #calls | Time (s) | Max ULP | Compiler | Value
-----------+---------------------+--------+----------+---------+----------+-------
  i386 [1] | [    0.00:    0.35] |  100M  | 15.0198  | 0.58583 |  gcc     |  1
  i386 [1] | [    0.35:   24.00] |  100M  | 15.1858  | 1.01504 |  gcc     |  2
  i386 [1] | [   24.00:11356.52] |  100M  | 12.9591  | 0.51198 |  gcc     |  3
  i386 [1] | [11356.52:11357.22] |  100M  | 13.3328  | 1.90988 |  gcc     |  4
-----------+---------------------+--------+----------+---------+----------+-------
  amd64 [2]| [    0.00:    0.35] |  100M  | 11.7811  | 0.63075 |  clang   |  5
  amd64 [2]| [    0.35:   24.00] |  100M  | 11.0662  | 1.01535 |  clang   |  6
  amd64 [2]| [   24.00:11356.52] |  100M  | 9.97704  | 0.50852 |  clang   |  7
  amd64 [2]| [11356.52:11357.22] |  100M  | 10.8221  | 1.90931 |  clang   |  8
-----------+---------------------+--------+----------+---------+----------+-------
  amd64 [2]| [    0.00:    0.35] |  100M  | 10.5930  | 0.63075 |  gcc     |  9
  amd64 [2]| [    0.35:   24.00] |  100M  | 10.0606  | 1.01535 |  gcc     | 10
  amd64 [2]| [   24.00:11356.52] |  100M  | 8.76182  | 0.50852 |  gcc     | 11
  amd64 [2]| [11356.52:11357.22] |  100M  | 9.23632  | 1.90931 |  gcc     | 12
-----------+---------------------+--------+----------+---------+----------+-------
sparc64 [3]| [    0.00:    0.35] |    1M  | 62.7154  | 0.57638 |  gcc     | 13
sparc64 [3]| [    0.35:   41.00] |    1M  | 43.2052  | 1.00709 |  gcc     | 14
sparc64 [3]| [   41.00:11356.52] |    1M  | 40.5336  | 0.50587 |  gcc     | 15
sparc64 [3]| [11356.52:11357.22] |    1M  | 44.7029  | 1.90400 |  gcc     | 16

 1  3.3932642386627052e-01, 0x1.5b7862d4b272b9c6p-2
 2  1.2802453843180306e+00, 0x1.47be2958803a846cp+0
 3  2.4087260438954509e+01, 0x1.81656b33b8b51fcp+4
 4  1.1357216248489914e+04, 0x1.62e9bae07cffe94ap+13
 5  3.4657359027997265e-01, 0x1.62e42fefa39ef35ap-2
 6  1.2651166512735005e+00, 0x1.43deaf52d83fa6c4p+0
 7  2.4019265291717229e+01, 0x1.804ee91f5df97166p+4
 8  1.1357215893598522e+04, 0x1.62e9ba266c488adcp+13
 9  3.4657359027997265e-01, 0x1.62e42fefa39ef35ap-2
10  1.2651166512735005e+00, 0x1.43deaf52d83fa6c4p+0
11  2.4019265291717229e+01, 0x1.804ee91f5df97166p+4
12  1.1357215893598522e+04, 0x1.62e9ba266c488adcp+13
13  3.35979864752159202758685843670565218e-01, 0x1.580b1b0ce66d750dad9e5773bc5bp-2
14  1.28794527359513292307645171506249973e+00, 0x1.49b6c80d20fd82749b4ffd5d2b24p+0
15  1.03131066340595824072037883331972584e+04, 0x1.4248da62f5345e862cf6d17602dcp+13
16  1.13572063870748402222198225060228912e+04, 0x1.62e9a6ae44460bffbacfe7a589c2p+13

[1] Intel(R) Core(TM)2 Duo CPU T7250@2.00GHz (1995.04-MHz 686-class CPU)
[2] AMD Opteron(tm) Processor 248 (2191.60-MHz K8-class CPU)
[3] Sun Microsystems UltraSparc-IIe Processor (648.00 MHz CPU)

Two observations:

1. The max ulp for the intervals [0.35:24] and [0.35:41] of 1.xxx is
due to the division in the expression half*exp(x) + half/exp(x).
Bruce and I exchanged emails a long time ago about possible ways
to reduce the ulp in this range by either computer exp(x) with
extra precision or using a table with cosh(x) = cosh(x_i) * cosh(d)
+ sinh(x_i) * sinh(d) with d = x - x_i.  I tried the latter with
disappointing results.  The former would require a refactoring of
s_expl.c into a kernel __kernel_expl(x, hi, lo).  I have no plans on
pursuing this at the this time.

2. gcc produces faster code than clang on amd64.

Comments welcomed.

-- 
Steve

/*
 * ====================================================
 * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
 *
 * Developed at SunSoft, a Sun Microsystems, Inc. business.
 * Permission to use, copy, modify, and distribute this
 * software is freely granted, provided that this notice 
 * is preserved.
 * ====================================================
 *
 * Converted to long double by Steven G. Kargl
 */

#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

/*
 * See comments in src/e_cosh.c for a description of the algorithm. 
 */
#include <float.h>
#ifdef __i386__
#include <ieeefp.h>
#endif
#include "fpmath.h"
#include "math.h"
#include "math_private.h"

#if LDBL_MANT_DIG == 64
static const union IEEEl2bits
#define	EXP_TINY	-32
#define	s_threshold	24
/* log(2) / 2 */
log2o2u = LD80C(0xb17217f7d1cf79ac, -2, 0.346573590279972654714L),
#define	log2o2	(log2o2u.e)
/* x = log(LDBL_MAX - 0.5) */
o_threshold1u = LD80C(0xb17217f7d1cf79ac, 13, 11356.5234062941439497L),
#define	o_threshold1	(o_threshold1u.e)
/* log(LDBL_MAX - 0.5) + log(2) */
o_threshold2u = LD80C(0xb174ddc031aec0ea, 13, 11357.2165534747038951L);
#define	o_threshold2	(o_threshold2u.e)
#elif LDBL_MANT_DIG == 113
#define	EXP_TINY	-56
#define	s_threshold	41
static long double
log2o2 = 0.346573590279972654708616060729088288L,
o_threshold1 = 11356.5234062941439494919310779707650L,
o_threshold2 = 11357.2165534747038948013483100922230L;
#else
#error "Unsupported long double format"
#endif

static const long double huge = 0x1.p10000L;
static const double half = 0.5;

#define BIAS	(LDBL_MAX_EXP - 1)

long double
coshl(long double x)
{
	long double t, w;
	uint16_t hx, ix;

	ENTERI();

	GET_LDBL_EXPSIGN(hx, x);
	ix = hx & 0x7fff;
	SET_LDBL_EXPSIGN(x, ix);

	/* x is +-Inf or NaN. */
	if (ix == BIAS + LDBL_MAX_EXP)
		RETURNI(x * x);

	if (x < log2o2) {
		if (ix < BIAS + EXP_TINY) {	/* |x| < 0x1pEXP_TINY */
			/* cosh(x) = 1 exactly iff x = +-0. */
			if ((int)x == 0)
				RETURNI(1.0L);
		}
		t = expm1l(x);
		w = 1 + t;
		RETURNI(1 + t * t / (w + w));
	}

	if (x < s_threshold) {
		t = expl(x);
		RETURNI(half * t + half / t);
	}

	if (x < o_threshold1)
		RETURNI(half * expl(x));

	if (x < o_threshold2) {
		t = expl(half * x);
		RETURNI(half * t * t);
	}

	RETURNI(huge * huge);
}

From owner-freebsd-numerics@FreeBSD.ORG  Mon Jun 10 02:32:17 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 66A48826
 for <freebsd-numerics@FreeBSD.org>; Mon, 10 Jun 2013 02:32:17 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au
 [211.29.133.51]) by mx1.freebsd.org (Postfix) with ESMTP id 92BF5166C
 for <freebsd-numerics@FreeBSD.org>; Mon, 10 Jun 2013 02:32:16 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r5A2W54q007158
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Mon, 10 Jun 2013 12:32:07 +1000
Date: Mon, 10 Jun 2013 12:32:05 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: Implementation for coshl.
In-Reply-To: <20130610003645.GA16444@troutmask.apl.washington.edu>
Message-ID: <20130610110740.V24058@besplex.bde.org>
References: <20130610003645.GA16444@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=eqSHVfVX c=1 sm=1 a=LM0AswAWfpYA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=MtpAjnDH-vAA:10
 a=uVhegijvKTqswUFXLp4A:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Jun 2013 02:32:17 -0000

On Sun, 9 Jun 2013, Steve Kargl wrote:

> I suspect that there will be some nits with the implementation.

Quite a few nats :-).

> Anyway, testing gives
>
>    Arch   |     Interval        | #calls | Time (s) | Max ULP | Compiler | Value
> -----------+---------------------+--------+----------+---------+----------+-------
>  i386 [1] | [    0.00:    0.35] |  100M  | 15.0198  | 0.58583 |  gcc     |  1
>  i386 [1] | [    0.35:   24.00] |  100M  | 15.1858  | 1.01504 |  gcc     |  2
>  i386 [1] | [   24.00:11356.52] |  100M  | 12.9591  | 0.51198 |  gcc     |  3
>  i386 [1] | [11356.52:11357.22] |  100M  | 13.3328  | 1.90988 |  gcc     |  4
> -----------+---------------------+--------+----------+---------+----------+-------

Quite large errors, unfortunately much the same as in double precision.

>  amd64 [2]| [    0.00:    0.35] |  100M  | 11.7811  | 0.63075 |  clang   |  5
>  amd64 [2]| [    0.35:   24.00] |  100M  | 11.0662  | 1.01535 |  clang   |  6
>  amd64 [2]| [   24.00:11356.52] |  100M  | 9.97704  | 0.50852 |  clang   |  7
>  amd64 [2]| [11356.52:11357.22] |  100M  | 10.8221  | 1.90931 |  clang   |  8
> -----------+---------------------+--------+----------+---------+----------+-------

Likely a bug for the ULPs for the first domain to be so differnt.  I suspect
even much smaller differences are due to bugs.

> ...
> 1. The max ulp for the intervals [0.35:24] and [0.35:41] of 1.xxx is
> due to the division in the expression half*exp(x) + half/exp(x).

That's the one with the large MD difference.

> Bruce and I exchanged emails a long time ago about possible ways
> to reduce the ulp in this range by either computer exp(x) with
> extra precision or using a table with cosh(x) = cosh(x_i) * cosh(d)
> + sinh(x_i) * sinh(d) with d = x - x_i.  I tried the latter with
> disappointing results.

The latter may be good for trig functions, but it is bad for hyperbolic
functions.  It is technically difficult to splice the functions.

> The former would require a refactoring of
> s_expl.c into a kernel __kernel_expl(x, hi, lo).  I have no plans on
> pursuing this at the this time.

But you need this soon for __ldexp_exp() and __ldexp_cexp(), which are
needed for hyperbolic functions (the large args already fixed for float
and double precision) and for cexp() and trig and hyperbolic complex
functions.  It is much easier to implement these using a kernel.  I do
this only for float precision.

The hyperbolic functions are also much easier with a kernel.  Most of
the thresholds become unnecessary.  Above about |x| < 0.5, all cases
are handled uniformly using code like:

 	k_expl(x, &hi1, &lo1, &k1);
 	k_expl(-x, &hi2, &lo2, &k2);	/* KISS slow initially  */
 	/*
 	 * Bah, that's too uniform.  I don't want to deal with k.  So
 	 * use a threshold for large |x| (case handled by __ldexp_expl().
 	 * Now for 0.5 <= |x| < thresh:
 	 */
 	k_expl(x, &hi1, &lo1);
 	k_expl(-x, &hi2, &lo2);		/* diferent API includes 2**k */
 	_2sumF(hi1, lo1);		/* a bit sloppy */
 	return (0.5 * (lo2 + lo1 + hi1 + hi2));

Error analysis: since |x| >= 0.5, the ratio exp(x)/exp(-x) is >= exp(1).
Already if we we add exp(x) + exp(-x), the error is at most ~1 ulps
(certainly less than 2).  But our hi_lo approximations give 6-10 extra
bits, so the ~1 ulp error is scaled by 2**-6 or better until the final
addition.

Note that this doesn't need the complexities of expm1l(x) or a kernel
for that.  Adding exp(-x) to exp(x) without increasing the error
significantly is similar to adding -1 to exp(x) (subtracting exp(-x)
for sinh() is even more similar).  The addiitonal complications in
expm1l() are because:
- it wants to give 6-10 bits and not lose any relative to expl()
- |x| >= 0.5 so large cancelations cannot occur.

For |x| <= 0.5, use a poly approx.  0.5 can be reduced significantly
if necessary to get less terms in the poly.  This is less needed than
for expm1l() since the power series about 0 converges much faster
for coshl().

Similarly for sinhl().  I don't know of a similar method for tanhl().
A division seems to be necessary, and hi+lo decompositions only work
well for additions.


> /*
> * ====================================================
> * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
> *
> * Developed at SunSoft, a Sun Microsystems, Inc. business.
> * Permission to use, copy, modify, and distribute this
> * software is freely granted, provided that this notice
> * is preserved.
> * ====================================================
> *
> * Converted to long double by Steven G. Kargl
> */

It changes the style completely, so diffs with the double version are
unreadable.

> #if LDBL_MANT_DIG == 64
> ...
> #else
> #error "Unsupported long double format"
> #endif

The complications for ld80/128 and i386 seem reasonble.

> long double
> coshl(long double x)
> {
> 	long double t, w;
> 	uint16_t hx, ix;
>
> 	ENTERI();
>
> 	GET_LDBL_EXPSIGN(hx, x);
> 	ix = hx & 0x7fff;
> 	SET_LDBL_EXPSIGN(x, ix);

This sign frobbing is pessimal, and is not done by the double version.
Signs are better cleared using fabs*() unless you are sure that clearing
them in bits is more optimal.  But don't optimize before getting it right.
Any optimizations should be made to the double version first.

> 	/* x is +-Inf or NaN. */
> 	if (ix == BIAS + LDBL_MAX_EXP)
> 		RETURNI(x * x);

The sign frobbing also clobbers the result here.

>
> 	if (x < log2o2) {

The threshold comparisons are painful and probably inefficient to do in
bits.

Hoever you have most of the pain of using bits by using long doubles
and LD80C() to declare precise thresholds.  Most or all of the thresholds
are fuzzy and don't need more than float precision (or maybe just a the
exponent).  The double version uses a fuzzy threshold here.  It only
tests the upper 21 bits of the mantissa, so it uses less than float
precision.

> 		if (ix < BIAS + EXP_TINY) {	/* |x| < 0x1pEXP_TINY */
> 			/* cosh(x) = 1 exactly iff x = +-0. */
> 			if ((int)x == 0)
> 				RETURNI(1.0L);
> 		}

Unnecessary algorithm change and micro-optimization.  The double version
uses the general case doing 1+t to set inexact here.

> 		t = expm1l(x);
> 		w = 1 + t;
> 		RETURNI(1 + t * t / (w + w));
> 	}
> ...
> 	if (x < o_threshold2) {
> 		t = expl(half * x);
> 		RETURNI(half * t * t);
> 	}

This is missing use of __ldexp_expl().

Going back to the painful threshold declarations:

> #if LDBL_MANT_DIG == 64
> static const union IEEEl2bits
> #define	EXP_TINY	-32

Strange placement of macro in the middle of a declaration.

This should be simply LDBL_MANT_DIG / 2.

> #define	s_threshold	24

I don't understand the magic for this now.  It is not quite LDBL_MANT_DIG / 3.

> /* log(2) / 2 */
> log2o2u = LD80C(0xb17217f7d1cf79ac, -2, 0.346573590279972654714L),
> #define	log2o2	(log2o2u.e)
> /* x = log(LDBL_MAX - 0.5) */
> o_threshold1u = LD80C(0xb17217f7d1cf79ac, 13, 11356.5234062941439497L),
> #define	o_threshold1	(o_threshold1u.e)
> /* log(LDBL_MAX - 0.5) + log(2) */
> o_threshold2u = LD80C(0xb174ddc031aec0ea, 13, 11357.2165534747038951L);
> #define	o_threshold2	(o_threshold2u.e)
> #elif LDBL_MANT_DIG == 113
> #define	EXP_TINY	-56
> #define	s_threshold	41
> static long double
> log2o2 = 0.346573590279972654708616060729088288L,
> o_threshold1 = 11356.5234062941439494919310779707650L,
> o_threshold2 = 11357.2165534747038948013483100922230L;
> #else
> #error "Unsupported long double format"
> #endif

No need for any long doubles or LD80C()'s.  The double version uses only
21 bits for all thresholds.  We had to be more careful about the thresholds
for technical reasons in expl().  IIRC, we needed precise thresholds to
do a final filtering step after a fuzzy initial classification.  This
isn't needed here, since we use calculations that can't give spurious
overflow.

Going back a bit more:
- 'huge' is missing a volatile qualifier to work around clang bugs.
   expl() is already out of date relative to exp2l() since it is missing
   the recent fix to add this qualifier.  This fix is missing in the
   double and float versions of all the hyperbolic functions too.
- 'half' doesn't need to be long double
- the double version has a variable 'one'.  You changed that to '1', but
   didn't change 'half' to 0.5.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Mon Jun 10 05:58:41 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id B17D16F7
 for <freebsd-numerics@FreeBSD.org>; Mon, 10 Jun 2013 05:58:41 +0000 (UTC)
 (envelope-from das@FreeBSD.ORG)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net
 [50.196.151.174])
 by mx1.freebsd.org (Postfix) with ESMTP id 982191C88
 for <freebsd-numerics@FreeBSD.org>; Mon, 10 Jun 2013 05:58:41 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
 by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r5A5wYKw068908;
 Sun, 9 Jun 2013 22:58:34 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
 by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r5A5wYIm068907;
 Sun, 9 Jun 2013 22:58:34 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Date: Sun, 9 Jun 2013 22:58:34 -0700
From: David Schultz <das@FreeBSD.ORG>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: Implementation for coshl.
Message-ID: <20130610055834.GA68643@zim.MIT.EDU>
References: <20130610003645.GA16444@troutmask.apl.washington.edu>
 <20130610110740.V24058@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130610110740.V24058@besplex.bde.org>
Cc: freebsd-numerics@FreeBSD.org,
 Steve Kargl <sgk@troutmask.apl.washington.edu>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Jun 2013 05:58:41 -0000

Awesome, thanks for working on this.

On Mon, Jun 10, 2013, Bruce Evans wrote:
> On Sun, 9 Jun 2013, Steve Kargl wrote:
> >    Arch   |     Interval        | #calls | Time (s) | Max ULP | Compiler | Value
> > -----------+---------------------+--------+----------+---------+----------+-------
> >  i386 [1] | [    0.00:    0.35] |  100M  | 15.0198  | 0.58583 |  gcc     |  1
> >  i386 [1] | [    0.35:   24.00] |  100M  | 15.1858  | 1.01504 |  gcc     |  2
> >  i386 [1] | [   24.00:11356.52] |  100M  | 12.9591  | 0.51198 |  gcc     |  3
> >  i386 [1] | [11356.52:11357.22] |  100M  | 13.3328  | 1.90988 |  gcc     |  4
> > -----------+---------------------+--------+----------+---------+----------+-------
> 
> Quite large errors, unfortunately much the same as in double precision.
[...]
> > Bruce and I exchanged emails a long time ago about possible ways
> > to reduce the ulp in this range by either computer exp(x) with
> > extra precision or using a table with cosh(x) = cosh(x_i) * cosh(d)
> > + sinh(x_i) * sinh(d) with d = x - x_i.  I tried the latter with
> > disappointing results.

This is fine for now.  We might eventually want to improve the
accuracy of the float, double, and long double implementations,
but that ought to come later.  It's much easier to start with
float and double when making improvements in the algorithm.

> > The former would require a refactoring of
> > s_expl.c into a kernel __kernel_expl(x, hi, lo).  I have no plans on
> > pursuing this at the this time.
> 
> But you need this soon for __ldexp_exp() and __ldexp_cexp(), which are
> needed for hyperbolic functions (the large args already fixed for float
> and double precision) and for cexp() and trig and hyperbolic complex
> functions.  It is much easier to implement these using a kernel.  I do
> this only for float precision.

That's fine for now.  We need a better k_exp, too.  I believe
Bruce was working on one.

I have a few comments on the code... mostly a subset of Bruce's
comments:

- You can clear the sign bit on x after the Inf/NaN check.

- I think the way you handle the tiny-x case is an improvement
  over the double version.  However, float-to-int conversion is
  very slow on x86, so you might instead try something like
  RETURNI(1.0L + tiny), where tiny is volatile and e.g., 0x1p-1000.
  That will also return the correct result in all rounding modes.

- Please check whether volatile hacks are needed to induce the
  compiler to do the right thing.  I'm increasingly finding that
  these are needed with clang.

- I agree with Bruce about the thresholds not needing to be exact.
  In fact, if you try to use exact thresholds, the threshold might
  round the wrong way and give a subtle off-by-one-ulp error.
  Often just looking at the exponent is sufficient.

From owner-freebsd-numerics@FreeBSD.ORG  Mon Jun 10 11:06:53 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 57FF4FFF
 for <freebsd-numerics@FreeBSD.org>; Mon, 10 Jun 2013 11:06:53 +0000 (UTC)
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 by mx1.freebsd.org (Postfix) with ESMTP id 23CBF1C90
 for <freebsd-numerics@FreeBSD.org>; Mon, 10 Jun 2013 11:06:53 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r5AB6qaI097048
 for <freebsd-numerics@FreeBSD.org>; Mon, 10 Jun 2013 11:06:52 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
 by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r5AB6qQH097046
 for freebsd-numerics@FreeBSD.org; Mon, 10 Jun 2013 11:06:52 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 10 Jun 2013 11:06:52 GMT
Message-Id: <201306101106.r5AB6qQH097046@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
 owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@freebsd.org>
To: freebsd-numerics@FreeBSD.org
Subject: Current problem reports assigned to freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Jun 2013 11:06:53 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o stand/175811 numerics   libstdc++ needs complex support in order use C99
o bin/170206   numerics   [msun] [patch] complex arcsinh, log, etc.

2 problems total.


From owner-freebsd-numerics@FreeBSD.ORG  Mon Jun 10 15:43:28 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 5D5C4C9B;
 Mon, 10 Jun 2013 15:43:28 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail36.syd.optusnet.com.au (mail36.syd.optusnet.com.au
 [211.29.133.76])
 by mx1.freebsd.org (Postfix) with ESMTP id D7DFC1F74;
 Mon, 10 Jun 2013 15:43:26 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail36.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r5AFhHfe009078
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Tue, 11 Jun 2013 01:43:18 +1000
Date: Tue, 11 Jun 2013 01:43:17 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: David Schultz <das@FreeBSD.org>
Subject: Re: Implementation for coshl.
In-Reply-To: <20130610055834.GA68643@zim.MIT.EDU>
Message-ID: <20130610235038.D790@besplex.bde.org>
References: <20130610003645.GA16444@troutmask.apl.washington.edu>
 <20130610110740.V24058@besplex.bde.org> <20130610055834.GA68643@zim.MIT.EDU>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=K8x6hFqI c=1 sm=1 a=LM0AswAWfpYA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=MtpAjnDH-vAA:10
 a=o_Yc0WgG-dlwfkC-MugA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: freebsd-numerics@FreeBSD.org,
 Steve Kargl <sgk@troutmask.apl.washington.edu>,
 Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Jun 2013 15:43:28 -0000

On Sun, 9 Jun 2013, David Schultz wrote:

> On Mon, Jun 10, 2013, Bruce Evans wrote:
>> On Sun, 9 Jun 2013, Steve Kargl wrote:
>
> This is fine for now.  We might eventually want to improve the
> accuracy of the float, double, and long double implementations,
> but that ought to come later.  It's much easier to start with
> float and double when making improvements in the algorithm.

Yes, it would have been better to improve them before cloning their
bugs.

>>> The former would require a refactoring of
>>> s_expl.c into a kernel __kernel_expl(x, hi, lo).  I have no plans on
>>> pursuing this at the this time.
>>
>> But you need this soon for __ldexp_exp() and __ldexp_cexp(), which are
>> needed for hyperbolic functions (the large args already fixed for float
>> and double precision) and for cexp() and trig and hyperbolic complex
>> functions.  It is much easier to implement these using a kernel.  I do
>> this only for float precision.
>
> That's fine for now.  We need a better k_exp, too.  I believe
> Bruce was working on one.

This is especially easy for long double precision, since expl() only
has 1 return that needs to be changed:

@ 	/* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */
@ 	z = r * r;
@ 	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
@ 	t = (long double)tbl[n2].lo + tbl[n2].hi;
@ 	t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;

Return the hi and low parts tbl[n2].hi and tbl[n2].lo + t * (q + r1) here
and add them up in callers.

@ 
@ 	/* Scale by 2**k. */
@ 	if (k >= LDBL_MIN_EXP) {
@ 		if (k == LDBL_MAX_EXP)
@ 			RETURNI(t * 2 * 0x1p16383L);
@ 		RETURNI(t * twopk);
@ 	} else {
@ 		RETURNI(t * twopkp10000 * twom10000);
@ 	}

Return k and/or twopk here and multiply by them in callers.

Remove all other returns (don't support special values in the kernel).

The only difficulty is that twopk doesn't give the full power for cases
near of after overflow and underflow, and those cases are precisely the
critical ones for __ldexp_[c]exp*().

I have this sort of working for sinhf() in float precision, for the
range [1, 9+].  There is no problem with twopk in this range, and I
multiply by it in the kernel and don't return it.  A simplified
version of adding up the values in the caller is:

 	_k_exp(fabs(x), &hi1, &lo1);
 	_k_exp(-fabs(x), &hi2, &lo2);
 	return h * (lo2 + lo1 + hi2 + hi1);

This summation is sloppy and loses about hi2/hi1 ulps unnecessarily.
hi2/hi1 <= exp(-2*|x|) < exp(-2) so this loss is never absolutely
large, and is only relatively large if exp() it very accurate.
This error is in addition to the error in exp(|x|) ~= lo1 + h1.

> I have a few comments on the code... mostly a subset of Bruce's
> comments:
> ...
> - I think the way you handle the tiny-x case is an improvement
>  over the double version.  However, float-to-int conversion is
>  very slow on x86, so you might instead try something like
>  RETURNI(1.0L + tiny), where tiny is volatile and e.g., 0x1p-1000.
>  That will also return the correct result in all rounding modes.

No, it is an unimprovement.  It micro-optimizes for an unusual case,
and breaks the rounding in that case.  We were careful about this in
expl() and expm1l().  expl(x) returns 1+x, and expm1l(x) returns a
very complicated expression to get the rounding right for x + x*x/2.
This is quite difficult for expm1l() since the signs alternate.  In
the complicated expression, we needed an explicit test for x == 0,
since the rest of the complicated expression would mess up the signed
for +-0 in some rounding modes.

This particular float-to-int conversion isn't very slow on i386.  In
previous lectures, I explained why it is very efficient even on (out
of order execution) i386's: no operand depend on its result, so it can
be and is done in parallel.  It is magic complicated expressions that
are inefficient, since even a simple expression that uses the arg gives
1 dependency and complicated ones give more.

I tried to get the rounding for values of +-0 and +-pi/2 right in
catrig*.c.  But fdlibm generally isn't very careful about this.  I
noticed the following types of errors:
- expressions using pi/2 in such a way that it doesn't work right in
   all rounding modes.  Returning a const pi/2 can never work in all
   rounding modes, so exprssions like pi/2 + tiny should be used
- often, tiny isn't volatile, so compiler bugs prevent pi/2 + tiny
   working
- sometimes, optimization broke pi/2 + tiny by replacing it by pi/2
- often, pi/2 is written as pio2_hi + pio2_lo.  This (and pi for pi/2)
   should be rounded down so that adding tiny to it works right.  IIRC,
   rounding to nearest gives the same result as rounding down in double
   precision, so everyone gets the same result, but in float precision
   rounding to nearest rounds up.  This delicacy was not understood when
   the functions were converted to float precision, so there are some
   misrounded spittings of pi/2.  Now I'm not sure if any fixed splitting
   can work in all rounding modes.  The complications from double rounding
   are espcially large since compile-time rounding of the constant is mixed
   with runtime rounding of expressions using it.

   Typical code using this from e_asin.c:

 	    if(((ix-0x3ff00000)|lx)==0)
 		    /* asin(1)=+-pi/2 with inexact */
 		return x*pio2_hi+x*pio2_lo;

   The comment is bad and should say asin(-1) (for asin(1), we wouldn't
   use x in the expression).  The result of this is pio2_hi + pio2_lo
   for x = 1 and -pio2_hi - pio2_lo for x = -1.  This sets inexact but
   seems to be quite broken for non-default rounding modes.  It seems
   too hard to round to pio/2 correctly in all cases (unless delicate
   rounding in the splitting does it), but the aboves breaks even asin()
   being an odd function when the rounding mode is towards +-infinity.
   Most odd functions avoid this problem by calculating func(|x|) and
   adjusting the sign at the end.

   The bugs in this were moved by dubious optimizations in e_asinf.c:

 	    if(ix==0x3f800000)		/* |x| == 1 */
 		return x*pio2;		/* asin(+-1) = +-pi/2 with inexact */

   Now asinf() is odd at +-1 although not for general args; there is no
   longer any chance of correct rounding at +-1 in all rounding modes;
   inexact is no longer set in the code, but is still claimed to be set
   in the comment.  I implemented most of these bugs :-(.  Fortunately
   they were not copied to the long double version.

- constant folding of pio2_hi + pio2_lo may cause problems even without
   compiler bugs.  The above expression with x's instead of +-1's hopefully
   prevents compilers reducing to constant expressions and then folding.
   If we wrote the classification as (fabs(x) == 1), then the optimizer
   wouldn't have to be very smart to notice that x is 1 or -1 and doing
   the folding.  That would only be an invalid optimization with FENV on.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Mon Jun 10 16:29:28 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id F329F215;
 Mon, 10 Jun 2013 16:29:27 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id DDAC51265;
 Mon, 10 Jun 2013 16:29:27 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r5AGTRBn020971; 
 Mon, 10 Jun 2013 09:29:27 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r5AGTRwK020970;
 Mon, 10 Jun 2013 09:29:27 -0700 (PDT) (envelope-from sgk)
Date: Mon, 10 Jun 2013 09:29:27 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: David Schultz <das@FreeBSD.ORG>
Subject: Re: Implementation for coshl.
Message-ID: <20130610162927.GA20856@troutmask.apl.washington.edu>
References: <20130610003645.GA16444@troutmask.apl.washington.edu>
 <20130610110740.V24058@besplex.bde.org>
 <20130610055834.GA68643@zim.MIT.EDU>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130610055834.GA68643@zim.MIT.EDU>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Jun 2013 16:29:28 -0000

On Sun, Jun 09, 2013 at 10:58:34PM -0700, David Schultz wrote:
> Awesome, thanks for working on this.
> 
> On Mon, Jun 10, 2013, Bruce Evans wrote:
> > On Sun, 9 Jun 2013, Steve Kargl wrote:
> > >    Arch   |     Interval        | #calls | Time (s) | Max ULP | Compiler | Value
> > > -----------+---------------------+--------+----------+---------+----------+-------
> > >  i386 [1] | [    0.00:    0.35] |  100M  | 15.0198  | 0.58583 |  gcc     |  1
> > >  i386 [1] | [    0.35:   24.00] |  100M  | 15.1858  | 1.01504 |  gcc     |  2
> > >  i386 [1] | [   24.00:11356.52] |  100M  | 12.9591  | 0.51198 |  gcc     |  3
> > >  i386 [1] | [11356.52:11357.22] |  100M  | 13.3328  | 1.90988 |  gcc     |  4
> > > -----------+---------------------+--------+----------+---------+----------+-------
> > 
> > Quite large errors, unfortunately much the same as in double precision.
> [...]
> > > Bruce and I exchanged emails a long time ago about possible ways
> > > to reduce the ulp in this range by either computer exp(x) with
> > > extra precision or using a table with cosh(x) = cosh(x_i) * cosh(d)
> > > + sinh(x_i) * sinh(d) with d = x - x_i.  I tried the latter with
> > > disappointing results.
> 
> This is fine for now.  We might eventually want to improve the
> accuracy of the float, double, and long double implementations,
> but that ought to come later.  It's much easier to start with
> float and double when making improvements in the algorithm.

I do not have working test programs for float and double, which
will take a morning (or late night) to fix.  So, I do not have
a method for testing the either the current cosh[f] or any changes
that I might suggest.

> > > The former would require a refactoring of
> > > s_expl.c into a kernel __kernel_expl(x, hi, lo).  I have no plans on
> > > pursuing this at the this time.
> > 
> > But you need this soon for __ldexp_exp() and __ldexp_cexp(), which are
> > needed for hyperbolic functions (the large args already fixed for float
> > and double precision) and for cexp() and trig and hyperbolic complex
> > functions.  It is much easier to implement these using a kernel.  I do
> > this only for float precision.

I spent a couple of hours last night pulling __kernel_expl out
of expl().  The results were less then stellar, but Bruce's
email suggested the error in my method.  

> That's fine for now.  We need a better k_exp, too.  I believe
> Bruce was working on one.
> 
> I have a few comments on the code... mostly a subset of Bruce's
> comments:
> 
> - You can clear the sign bit on x after the Inf/NaN check.

OK.

> - I think the way you handle the tiny-x case is an improvement
>   over the double version.  However, float-to-int conversion is
>   very slow on x86, so you might instead try something like
>   RETURNI(1.0L + tiny), where tiny is volatile and e.g., 0x1p-1000.
>   That will also return the correct result in all rounding modes.

I originally had

                if (ix < BIAS - 33) {   /* |x| < 0x1p-33 */
                        /* cosh(+-0) = 1. */
                        if (x == 0)
                                return (one);
                        /* cosh(tiny) = one */
                        return (one + tiny * tiny);
                }

because C99 requires cosh(+-0) to be exact.  For |x| != the
return value will raise inexact.

> - Please check whether volatile hacks are needed to induce the
>   compiler to do the right thing.  I'm increasingly finding that
>   these are needed with clang.

I suppose I need to either look at the generated assembly or
add testing for signals to my test programs.  How do you go
about looking at this?

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Mon Jun 10 16:37:09 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id CD82F321;
 Mon, 10 Jun 2013 16:37:09 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 97AB212B1;
 Mon, 10 Jun 2013 16:37:09 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r5AGb90A021017; 
 Mon, 10 Jun 2013 09:37:09 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r5AGb9Ve021016;
 Mon, 10 Jun 2013 09:37:09 -0700 (PDT) (envelope-from sgk)
Date: Mon, 10 Jun 2013 09:37:09 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: Implementation for coshl.
Message-ID: <20130610163709.GB20856@troutmask.apl.washington.edu>
References: <20130610003645.GA16444@troutmask.apl.washington.edu>
 <20130610110740.V24058@besplex.bde.org>
 <20130610055834.GA68643@zim.MIT.EDU>
 <20130610235038.D790@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130610235038.D790@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: David Schultz <das@FreeBSD.org>, freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Jun 2013 16:37:09 -0000

On Tue, Jun 11, 2013 at 01:43:17AM +1000, Bruce Evans wrote:
> On Sun, 9 Jun 2013, David Schultz wrote:
>> On Mon, Jun 10, 2013, Bruce Evans wrote:
>>> On Sun, 9 Jun 2013, Steve Kargl wrote:
>>>> The former would require a refactoring of
>>>> s_expl.c into a kernel __kernel_expl(x, hi, lo).  I have no plans on
>>>> pursuing this at the this time.
>>>
>>> But you need this soon for __ldexp_exp() and __ldexp_cexp(), which are
>>> needed for hyperbolic functions (the large args already fixed for float
>>> and double precision) and for cexp() and trig and hyperbolic complex
>>> functions.  It is much easier to implement these using a kernel.  I do
>>> this only for float precision.
>>
>> That's fine for now.  We need a better k_exp, too.  I believe
>> Bruce was working on one.
> 
> This is especially easy for long double precision, since expl() only
> has 1 return that needs to be changed:
> 
> @ 	/* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */
> @ 	z = r * r;
> @ 	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
> @ 	t = (long double)tbl[n2].lo + tbl[n2].hi;
> @ 	t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
> 
> Return the hi and low parts tbl[n2].hi and tbl[n2].lo + t * (q + r1) here
> and add them up in callers.
> 
> @ 
> @ 	/* Scale by 2**k. */
> @ 	if (k >= LDBL_MIN_EXP) {
> @ 		if (k == LDBL_MAX_EXP)
> @ 			RETURNI(t * 2 * 0x1p16383L);
> @ 		RETURNI(t * twopk);
> @ 	} else {
> @ 		RETURNI(t * twopkp10000 * twom10000);
> @ 	}
> 
> Return k and/or twopk here and multiply by them in callers.

Ah, so that's the answer to my unasked question.  I spent a
few hours last night working on this, but I was trying to
apply the scaling to hi and lo before adding them in the
caller.  The result was that the max observed ULP in my
tests went from 0.51 to 0.73, and I believe, but did not
test, that the overflow threshold was effected.

> > I have a few comments on the code... mostly a subset of Bruce's
> > comments:
> > ...
> > - I think the way you handle the tiny-x case is an improvement
> >  over the double version.  However, float-to-int conversion is
> >  very slow on x86, so you might instead try something like
> >  RETURNI(1.0L + tiny), where tiny is volatile and e.g., 0x1p-1000.
> >  That will also return the correct result in all rounding modes.
> 
> No, it is an unimprovement.  It micro-optimizes for an unusual case,
> and breaks the rounding in that case.

OK.  I'll revisit this section of code.

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Mon Jun 10 17:06:05 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 3B5FA2ED
 for <freebsd-numerics@FreeBSD.org>; Mon, 10 Jun 2013 17:06:05 +0000 (UTC)
 (envelope-from das@FreeBSD.ORG)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net
 [50.196.151.174])
 by mx1.freebsd.org (Postfix) with ESMTP id 12ED6162A
 for <freebsd-numerics@FreeBSD.org>; Mon, 10 Jun 2013 17:06:04 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
 by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r5AH63B1072698;
 Mon, 10 Jun 2013 10:06:03 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
 by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r5AH63He072697;
 Mon, 10 Jun 2013 10:06:03 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Date: Mon, 10 Jun 2013 10:06:03 -0700
From: David Schultz <das@FreeBSD.ORG>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: Implementation for coshl.
Message-ID: <20130610170603.GA72597@zim.MIT.EDU>
References: <20130610003645.GA16444@troutmask.apl.washington.edu>
 <20130610110740.V24058@besplex.bde.org>
 <20130610055834.GA68643@zim.MIT.EDU>
 <20130610162927.GA20856@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130610162927.GA20856@troutmask.apl.washington.edu>
Cc: freebsd-numerics@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Jun 2013 17:06:05 -0000

On Mon, Jun 10, 2013, Steve Kargl wrote:
> On Sun, Jun 09, 2013 at 10:58:34PM -0700, David Schultz wrote:
> > Awesome, thanks for working on this.
> > 
> > On Mon, Jun 10, 2013, Bruce Evans wrote:
> > > On Sun, 9 Jun 2013, Steve Kargl wrote:
> > > >    Arch   |     Interval        | #calls | Time (s) | Max ULP | Compiler | Value
> > > > -----------+---------------------+--------+----------+---------+----------+-------
> > > >  i386 [1] | [    0.00:    0.35] |  100M  | 15.0198  | 0.58583 |  gcc     |  1
> > > >  i386 [1] | [    0.35:   24.00] |  100M  | 15.1858  | 1.01504 |  gcc     |  2
> > > >  i386 [1] | [   24.00:11356.52] |  100M  | 12.9591  | 0.51198 |  gcc     |  3
> > > >  i386 [1] | [11356.52:11357.22] |  100M  | 13.3328  | 1.90988 |  gcc     |  4
> > > > -----------+---------------------+--------+----------+---------+----------+-------
> > > 
> > > Quite large errors, unfortunately much the same as in double precision.
> > [...]
> > > > Bruce and I exchanged emails a long time ago about possible ways
> > > > to reduce the ulp in this range by either computer exp(x) with
> > > > extra precision or using a table with cosh(x) = cosh(x_i) * cosh(d)
> > > > + sinh(x_i) * sinh(d) with d = x - x_i.  I tried the latter with
> > > > disappointing results.
> > 
> > This is fine for now.  We might eventually want to improve the
> > accuracy of the float, double, and long double implementations,
> > but that ought to come later.  It's much easier to start with
> > float and double when making improvements in the algorithm.
> 
> I do not have working test programs for float and double, which
> will take a morning (or late night) to fix.  So, I do not have
> a method for testing the either the current cosh[f] or any changes
> that I might suggest.

I think you might have misinterpreted my suggestion.  I meant that
you should not spend too much time worrying about improvements in
the algorithm at the same time as you're porting it from double to
long double.  Improvements in the algorithm can come separately,
and when we do tackle them, it is easier to develop them in float
precision first.

> > > > The former would require a refactoring of
> > > > s_expl.c into a kernel __kernel_expl(x, hi, lo).  I have no plans on
> > > > pursuing this at the this time.
> > > 
> > > But you need this soon for __ldexp_exp() and __ldexp_cexp(), which are
> > > needed for hyperbolic functions (the large args already fixed for float
> > > and double precision) and for cexp() and trig and hyperbolic complex
> > > functions.  It is much easier to implement these using a kernel.  I do
> > > this only for float precision.
> 
> I spent a couple of hours last night pulling __kernel_expl out
> of expl().  The results were less then stellar, but Bruce's
> email suggested the error in my method.  

My recollection of working on this in double precision is that
Bruce was unhappy with most proposals.  Either it made exp slower
by having exp call the kernel, or it resulted in lots of code
duplication.  I think it's a very useful thing to have, but don't
get stuck in a rathole for a month over it.

> > - I think the way you handle the tiny-x case is an improvement
> >   over the double version.  However, float-to-int conversion is
> >   very slow on x86, so you might instead try something like
> >   RETURNI(1.0L + tiny), where tiny is volatile and e.g., 0x1p-1000.
> >   That will also return the correct result in all rounding modes.
> 
> I originally had
> 
>                 if (ix < BIAS - 33) {   /* |x| < 0x1p-33 */
>                         /* cosh(+-0) = 1. */
>                         if (x == 0)
>                                 return (one);
>                         /* cosh(tiny) = one */
>                         return (one + tiny * tiny);
>                 }
> 
> because C99 requires cosh(+-0) to be exact.  For |x| != the
> return value will raise inexact.

Aah, right, what I proposed won't handle 0 correctly.  In that
case, I think what you have is fine; it should handle everything
except for the rounding mode issue, which we don't really support
anyway.  Alternatively, you could just do what fdlibm does, which
is hard to argue with given that it has been acceptable for
several decades.

> > - Please check whether volatile hacks are needed to induce the
> >   compiler to do the right thing.  I'm increasingly finding that
> >   these are needed with clang.
> 
> I suppose I need to either look at the generated assembly or
> add testing for signals to my test programs.  How do you go
> about looking at this?

#include <fenv.h>
volatile long double x = ..., y;
feclearexcept(FE_ALL_EXCEPT);
y = coshl(x);
printf("%x\n", fetestexcept(FE_ALL_EXCEPT));

The main cases to check are:

- cosh(+-0) = 0, no exceptions
- cosh(large) = inf, overflow exception
  (On some architectures, this also raises inexact, which is allowed.)
- cosh(inf or qnan) raises no exceptions
- cosh(anything else) raises an inexact exception

From owner-freebsd-numerics@FreeBSD.ORG  Mon Jun 10 19:14:54 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 9285C585;
 Mon, 10 Jun 2013 19:14:54 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 41E0A1BF2;
 Mon, 10 Jun 2013 19:14:53 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id B7BDA3C0EA5;
 Tue, 11 Jun 2013 05:14:51 +1000 (EST)
Date: Tue, 11 Jun 2013 05:14:46 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: David Schultz <das@freebsd.org>
Subject: Re: Implementation for coshl.
In-Reply-To: <20130610170603.GA72597@zim.MIT.EDU>
Message-ID: <20130611040059.A17928@besplex.bde.org>
References: <20130610003645.GA16444@troutmask.apl.washington.edu>
 <20130610110740.V24058@besplex.bde.org> <20130610055834.GA68643@zim.MIT.EDU>
 <20130610162927.GA20856@troutmask.apl.washington.edu>
 <20130610170603.GA72597@zim.MIT.EDU>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=Q6eKePKa c=1 sm=1 a=LM0AswAWfpYA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=MtpAjnDH-vAA:10
 a=HMF5KdcwFu_V6rpZWRkA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: freebsd-numerics@freebsd.org,
 Steve Kargl <sgk@troutmask.apl.washington.edu>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Jun 2013 19:14:54 -0000

On Mon, 10 Jun 2013, David Schultz wrote:

> On Mon, Jun 10, 2013, Steve Kargl wrote:
>> On Sun, Jun 09, 2013 at 10:58:34PM -0700, David Schultz wrote:
> ...
>>> - I think the way you handle the tiny-x case is an improvement
>>>   over the double version.  However, float-to-int conversion is
>>>   very slow on x86, so you might instead try something like
>>>   RETURNI(1.0L + tiny), where tiny is volatile and e.g., 0x1p-1000.
>>>   That will also return the correct result in all rounding modes.
>>
>> I originally had
>>
>>                 if (ix < BIAS - 33) {   /* |x| < 0x1p-33 */
>>                         /* cosh(+-0) = 1. */
>>                         if (x == 0)
>>                                 return (one);
>>                         /* cosh(tiny) = one */
>>                         return (one + tiny * tiny);
>>                 }
>>
>> because C99 requires cosh(+-0) to be exact.  For |x| != the
>> return value will raise inexact.
>
> Aah, right, what I proposed won't handle 0 correctly.  In that
> case, I think what you have is fine; it should handle everything
> except for the rounding mode issue, which we don't really support
> anyway.  Alternatively, you could just do what fdlibm does, which
> is hard to argue with given that it has been acceptable for
> several decades.

No, the above is quite broken.  tiny * tiny gives underflow, and
when the underflow is to 0, adding 1 to it doesn't give inexact.
Also, the 'tiny' in the comment is confusing, since it refers to
a different variable than the 'tiny' in the code.

Just copy the known-good fdlibm code.  It uses one+t, where t is
expm1(fabs(x)).  This has many subtleties:
- it avoids pessimizing the usual case
- it moves all the complications for setting inexact to expm1().  These
   include:
   - the expression 1+x doesn't work for tiny x since it rounds incorrectly
     in some modes.  It also requires x >= 0 for correct rounding, so
     we may have to caclulate |x| earlier than we want.
   - the expression 1+x*x doesn't work for tiny x since it may underflow
     (just like 1+tiny*tiny)
   - the expression x doesn't work for tiny x in expm1() since it doesn't
     set inexact.
   - the expression x+x*x doesn't work for tiny x in expm1() since it may
     underflow.

fdlibm expm1() avoids the last 2 problems using the strange expressions
't = huge+x; return x - (t-(huge+x));'.  That is unlikely to be best,
but it does avoid a branch for testing x == 0.  We may have looked at
this before.  If so, we didn't like it, and use a different rather
strange expression in expm1l(), and we still need a branch.  fdlibm
itself uses a different method for the same problem in log1p().  It
uses 'if (two54+x>zero && ax<3c90000)...'.  I didn't like that, and
used the more common ((int)x == 0) method to set inexact after classifying
x as being tiny using a test like (ax<3c90000).  Benchmarks showed that
the ((int)x = 0) works well.
   I just thought of another possible reason for this: on i386 the cast
   requires large code with an fenv switch.  Perhaps the compiler moves
   the code out of the way, the same as if we used a __predict_false().
   This is good since the tiny-arg case is special.

   The ENTERI() and RETURNI() macros generate similar large code, but
   much larger and uglier because of excessive carew taken to avoid
   trapping in fpsetprec().  gcc generates almost 100 bytes per macro
   invocation, so average small functions explode in object code if
   they have many returns in them.  But compilers seem to do a good
   job of moving this code out of the way.

>>> - Please check whether volatile hacks are needed to induce the
>>>   compiler to do the right thing.  I'm increasingly finding that
>>>   these are needed with clang.
>>
>> I suppose I need to either look at the generated assembly or
>> add testing for signals to my test programs.  How do you go
>> about looking at this?
>
> #include <fenv.h>
> volatile long double x = ..., y;
> feclearexcept(FE_ALL_EXCEPT);
> y = coshl(x);
> printf("%x\n", fetestexcept(FE_ALL_EXCEPT));
>
> The main cases to check are:
>
> - cosh(+-0) = 0, no exceptions
> - cosh(large) = inf, overflow exception
>  (On some architectures, this also raises inexact, which is allowed.)
> - cosh(inf or qnan) raises no exceptions
> - cosh(anything else) raises an inexact exception

Or in a debugger, put breakpoints at interesting points and look at
the exception mask.  This is easier on i386, or with long doubles.
"info float" decodes the i387 exception mask.  "p $mxcsr" displays
the SSE mask.  Later versions of gdb decode it too.  I don't know any
easy way to write to the i387 exception mask in gdb.  $mxcsr can be
assigned to.

I now have the kernel expf working for sinhf like I expected.  It worked
better when didn't corrupt hi-lo to hi_lo, doh.

To emulate the eventual high-precision version, I use this:

% void
% k_expf(float x, float *hip, float *lop)
% {
%     long double r;
% 
%     r = expl(x);
%     *hip = r;
%     *lop = r - *hip;
% }

This gives 24+24 bits of accuracy, and sinhf() ends up being perfectly
rounded in the range where it uses this.  A better enulation would
destroy many low bits to get only about 24+6 bits of accuracy.

The version created from expf()'s internals has about 24+3 bits.

Then in sinhf(x), for 1 <= x <= 12:

% 	    float hi1, hi2, lo1, lo2;
% 	    extern void k_expf(float x, float *hip, float *lop);
% 
% 	    t = fabsf(x);
% 	    k_expf(t, &hi1, &lo1);

Returning h * (lo1 + 1 / (hi1 + lo1) + hi1) as this point works OK.
Then the final error is under 1 ulp.  Without this, sophisticated
expressions were needed to keep it under 2 ulps.

% 	    k_expf(-t, &hi2, &lo2);
% 	    hi2 = -hi2;
% 	    lo2 = -lo2;
% 	    _2sumF(hi1, hi2);
% 	    return h * (lo1 + hi2 + lo2 + hi1);

Adding up the terms like this retains almost all the accuracy provided
by the kernel.  Calculating expf(-x) as 1/expf(x) gives an extra
inaccuracy of about 1/exp(2) = 0.135 ulps.

Efficiency is currently too low when there are 2 calls to non-inline
kernels, so this method is slightly slower than the old method
(except using the faster expl() kernel on non-i386.  On i386 it and
also i387 exp are too slow due to non-null precision switches).  The
correct fix is unclear.  Hopefully just speed up the kernels.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue Jun 11 01:22:19 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 5299B413;
 Tue, 11 Jun 2013 01:22:19 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au
 [211.29.132.59])
 by mx1.freebsd.org (Postfix) with ESMTP id DCA5C1ED8;
 Tue, 11 Jun 2013 01:22:18 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id B175D1A06F7;
 Tue, 11 Jun 2013 11:22:10 +1000 (EST)
Date: Tue, 11 Jun 2013 11:22:09 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: Implementation for coshl.
In-Reply-To: <20130611040059.A17928@besplex.bde.org>
Message-ID: <20130611104252.G50308@besplex.bde.org>
References: <20130610003645.GA16444@troutmask.apl.washington.edu>
 <20130610110740.V24058@besplex.bde.org> <20130610055834.GA68643@zim.MIT.EDU>
 <20130610162927.GA20856@troutmask.apl.washington.edu>
 <20130610170603.GA72597@zim.MIT.EDU>
 <20130611040059.A17928@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=K8x6hFqI c=1 sm=1 a=LM0AswAWfpYA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=MtpAjnDH-vAA:10
 a=Udj031w0ckm0zOnfpgkA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: David Schultz <das@freebsd.org>, freebsd-numerics@freebsd.org,
 Steve Kargl <sgk@troutmask.apl.washington.edu>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Jun 2013 01:22:19 -0000

On Tue, 11 Jun 2013, Bruce Evans wrote:

> On Mon, 10 Jun 2013, David Schultz wrote:
>
>> On Mon, Jun 10, 2013, Steve Kargl wrote:
>...
>>> I originally had
>>> 
>>>                 if (ix < BIAS - 33) {   /* |x| < 0x1p-33 */
>>>                         /* cosh(+-0) = 1. */
>>>                         if (x == 0)
>>>                                 return (one);
>>>                         /* cosh(tiny) = one */
>>>                         return (one + tiny * tiny);
>>>                 }
>>> 
>>> because C99 requires cosh(+-0) to be exact.  For |x| != the
>>> return value will raise inexact.
> ...
> No, the above is quite broken.  tiny * tiny gives underflow, and
> when the underflow is to 0, adding 1 to it doesn't give inexact.
> Also, the 'tiny' in the comment is confusing, since it refers to
> a different variable than the 'tiny' in the code.
>
> Just copy the known-good fdlibm code.  It uses one+t, where t is
> expm1(fabs(x)).  This has many subtleties:
> - it avoids pessimizing the usual case
> - it moves all the complications for setting inexact to expm1().  These
>  include:
>  - the expression 1+x doesn't work for tiny x since it rounds incorrectly
>    in some modes.  It also requires x >= 0 for correct rounding, so
>    we may have to caclulate |x| earlier than we want.
>  - the expression 1+x*x doesn't work for tiny x since it may underflow
>    (just like 1+tiny*tiny)
>  - the expression x doesn't work for tiny x in expm1() since it doesn't
>    set inexact.
>  - the expression x+x*x doesn't work for tiny x in expm1() since it may
>    underflow.

Bah, this is more magic and broken than first appeared.  The fdlibm code
is now known-bad :-).  I stumbled across this when implementing the use of
k_expf() in ccoshf().

o One subtlety in the fdlibm cosh() is that it uses an artificially small
   threshold of 2**-55 to "ensure" that 1+this gives 1 with inexact
   (actually 1+eps in some rounding modes).  A threshold of about 2**26.5
   is sufficient for cosh() to decay to 1, but fdlibm wants to abuse the
   tiny value given by each arg below the threshold for setting inexact
   as a side effect of evaluating the expression 'one + expm1f(x)'.  This
   obviously requires a threshold of <= 2**-53 or 2**-54.  The value of
   2**-55 actually used makes no sense.
o I broke this in coshf() in 2005 when fixing its threshold.  The
   threshold of 2**-55 was blindly copied.  It should have been reduced
   to 2**-26 for a direct translation.  I changed it to the natural
   threshold of 2**-12.  This also gives a micro-optimization.  This
   made extra-precision bugs more apparent, and I fixed them by
   returning 'one' instead of 'one + expm1f(x)' for tiny x.  Returning
   'one' gives another micro-optimization.  But this weakens the support
   for non-default rounding modes.
o fdlibm cosh()'s large threshold is not large enough to actually work.
   'one + 2**-55' may be evaluated and returned in extra precision, and
   it only takes 2  or 3 bits extra for it to evaluate to itself instead
   of 1.  coshf()'s blindly copied threshold accidentally protected
   against using '1.0F + 2**-26F' which always evaluates to itself on
   i386, and is then normally returned unchanged.  Assertions like
   assert(cosh(tiny) == 1) should then fail, but cosh() may return
   extra precision so is also a bug in the assertions -- essentially
   the same one that I found recently for exp() non-overlow.  This
   is still a bug in cosh() which would be found by a more careful
   assertion -- cosh(x) may be evaluated in extra precision, but then
   its value must be ~1+x*x/2, not ~1+|x|.

FreeBSD's default rounding precision protects against these bugs in both
float and double precision provided the theshold is <= 2**-53 for both.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Thu Jun 13 11:12:07 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 3C72BCD6;
 Thu, 13 Jun 2013 11:12:07 +0000 (UTC)
 (envelope-from gavin@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 by mx1.freebsd.org (Postfix) with ESMTP id 16CCF16E8;
 Thu, 13 Jun 2013 11:12:07 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r5DBC69l078803;
 Thu, 13 Jun 2013 11:12:06 GMT
 (envelope-from gavin@freefall.freebsd.org)
Received: (from gavin@localhost)
 by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r5DBC66F078802;
 Thu, 13 Jun 2013 11:12:06 GMT (envelope-from gavin)
Date: Thu, 13 Jun 2013 11:12:06 GMT
Message-Id: <201306131112.r5DBC66F078802@freefall.freebsd.org>
To: gavin@FreeBSD.org, freebsd-standards@FreeBSD.org,
 freebsd-numerics@FreeBSD.org
From: gavin@FreeBSD.org
Subject: Re: standards/82654: C99 long double math functions are missing
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Jun 2013 11:12:07 -0000

Synopsis: C99 long double math functions are missing

Responsible-Changed-From-To: freebsd-standards->freebsd-numerics
Responsible-Changed-By: gavin
Responsible-Changed-When: Thu Jun 13 11:09:50 UTC 2013
Responsible-Changed-Why: 
Over to -numerics.  It looks like at least half of this has
already made it into releases, and the other half is in head, so
it may be that this PR can be put into "patched" state now.

http://www.freebsd.org/cgi/query-pr.cgi?pr=82654

From owner-freebsd-numerics@FreeBSD.ORG  Fri Jun 14 06:53:58 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 9392566C;
 Fri, 14 Jun 2013 06:53:58 +0000 (UTC) (envelope-from das@FreeBSD.ORG)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net
 [50.196.151.174])
 by mx1.freebsd.org (Postfix) with ESMTP id 79B81167E;
 Fri, 14 Jun 2013 06:53:57 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
 by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r5E6rvBI084983;
 Thu, 13 Jun 2013 23:53:57 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
 by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r5E6rvQ5084982;
 Thu, 13 Jun 2013 23:53:57 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Date: Thu, 13 Jun 2013 23:53:57 -0700
From: David Schultz <das@FreeBSD.ORG>
To: gavin@FreeBSD.org
Subject: Re: standards/82654: C99 long double math functions are missing
Message-ID: <20130614065357.GA84881@zim.MIT.EDU>
References: <201306131112.r5DBC66F078802@freefall.freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201306131112.r5DBC66F078802@freefall.freebsd.org>
Cc: freebsd-numerics@FreeBSD.org, freebsd-standards@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Jun 2013 06:53:58 -0000

On Thu, Jun 13, 2013, gavin@FreeBSD.org wrote:
> Synopsis: C99 long double math functions are missing
> 
> Responsible-Changed-From-To: freebsd-standards->freebsd-numerics
> Responsible-Changed-By: gavin
> Responsible-Changed-When: Thu Jun 13 11:09:50 UTC 2013
> Responsible-Changed-Why: 
> Over to -numerics.  It looks like at least half of this has
> already made it into releases, and the other half is in head, so
> it may be that this PR can be put into "patched" state now.
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=82654

I think someone mentioned wanting to keep this open as a tracking
bug, although I now have a wiki page for tracking the status of
this work, so it's probably fine to close.  kargl@ should do the
honors!

From owner-freebsd-numerics@FreeBSD.ORG  Sat Jun 15 17:26:15 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 933605E2
 for <freebsd-numerics@freebsd.org>; Sat, 15 Jun 2013 17:26:15 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 5FC4810FD
 for <freebsd-numerics@freebsd.org>; Sat, 15 Jun 2013 17:26:15 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r5FHQEfs048084
 for <freebsd-numerics@freebsd.org>; Sat, 15 Jun 2013 10:26:14 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r5FHQEaG048083
 for freebsd-numerics@freebsd.org; Sat, 15 Jun 2013 10:26:14 -0700 (PDT)
 (envelope-from sgk)
Date: Sat, 15 Jun 2013 10:26:14 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: FYI openlibm
Message-ID: <20130615172614.GA48063@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Jun 2013 17:26:15 -0000

Just FYI,

It seems that the openlibm project, which is a part of the
Julia language project, has grabbed msun for their OS 
independent libm.

https://github.com/JuliaLang/openlibm

A scan of the commit log suggests that the openlibm/Julia
developers are not making any improvements to the accuracy
or efficiency of libm.

-- 
Steve