From owner-freebsd-numerics@FreeBSD.ORG  Mon May 27 11:06:51 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 50EC4371
 for <freebsd-numerics@FreeBSD.org>; Mon, 27 May 2013 11:06:51 +0000 (UTC)
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 by mx1.freebsd.org (Postfix) with ESMTP id 28A566D0
 for <freebsd-numerics@FreeBSD.org>; Mon, 27 May 2013 11:06:51 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r4RB6pSi016110
 for <freebsd-numerics@FreeBSD.org>; Mon, 27 May 2013 11:06:51 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
 by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r4RB6oWc016108
 for freebsd-numerics@FreeBSD.org; Mon, 27 May 2013 11:06:50 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 27 May 2013 11:06:50 GMT
Message-Id: <201305271106.r4RB6oWc016108@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
 owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@freebsd.org>
To: freebsd-numerics@FreeBSD.org
Subject: Current problem reports assigned to freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 May 2013 11:06:51 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o stand/175811 numerics   libstdc++ needs complex support in order use C99
o bin/170206   numerics   [msun] [patch] complex arcsinh, log, etc.

2 problems total.


From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 04:32:24 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 016CACE2;
 Tue, 28 May 2013 04:32:24 +0000 (UTC) (envelope-from das@freebsd.org)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net
 [50.196.151.174])
 by mx1.freebsd.org (Postfix) with ESMTP id D776875B;
 Tue, 28 May 2013 04:32:23 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
 by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4S4W8sJ012895;
 Mon, 27 May 2013 21:32:08 -0700 (PDT) (envelope-from das@freebsd.org)
Received: (from das@localhost)
 by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4S4W5tG012894;
 Mon, 27 May 2013 21:32:05 -0700 (PDT) (envelope-from das@freebsd.org)
Date: Mon, 27 May 2013 21:32:05 -0700
From: David Schultz <das@freebsd.org>
To: Stephen Montgomery-Smith <stephen@missouri.edu>
Subject: Re: Use of C99 extra long double math functions after r236148
Message-ID: <20130528043205.GA3282@zim.MIT.EDU>
References: <500DAD41.5030104@missouri.edu>
 <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu>
 <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com>
 <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com>
 <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com>
 <5015BB9F.90807@missouri.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5015BB9F.90807@missouri.edu>
X-Mailman-Approved-At: Tue, 28 May 2013 11:20:14 +0000
Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>,
 John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>,
 freebsd-numerics@freebsd.org, Bruce Evans <bde@freebsd.org>,
 Steve Kargl <sgk@troutmask.apl.washington.edu>,
 Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
X-List-Received-Date: Tue, 28 May 2013 04:32:24 -0000

On Sun, Jul 29, 2012, Stephen Montgomery-Smith wrote:
> Also I forgot that the real part of casinh(0+I*x) isn't always 0.  If 
> |x|>1, it is something non-zero.  And so you need to check that 
> creal(casinh(0+I*x)) and creal(casinh(-0+I*x)) have opposite signs in 
> this case.
> 
> > I'm less sure of the next logical
> > step, which is to check things like
> >    casinh(x + I*0) = asinh(x) + I*0
> 
> Does C99 mandate this?  My programs probably won't satisfy this, because 
> I realized that the computation works in these cases anyway.  Of course, 
> it would be easy to make it happen.

Hi Stephen,

I wrote some tests to cover the corner cases for the complex
inverse trig functions. They don't find any nontrivial bugs in
your implementations. :-) Now that you have a commit bit, would
you like to commit your code, or shall I?

Below is a diff of all the changes needed to integrate it. I have
a short list of style fixes, but otherwise I think what you have
is good:
  - wrap lines to 80 chars, please
  - spaces between operators
  - "static inline", not "inline static"
  - don't use "inline" on large functions

Index: lib/msun/Makefile
===================================================================
--- lib/msun/Makefile	(revision 251024)
+++ lib/msun/Makefile	(working copy)
@@ -105,7 +105,8 @@
 .endif
 
 # C99 complex functions
-COMMON_SRCS+=	s_ccosh.c s_ccoshf.c s_cexp.c s_cexpf.c \
+COMMON_SRCS+=	catrig.c catrigf.c \
+	s_ccosh.c s_ccoshf.c s_cexp.c s_cexpf.c \
 	s_cimag.c s_cimagf.c s_cimagl.c \
 	s_conj.c s_conjf.c s_conjl.c \
 	s_cproj.c s_cprojf.c s_creal.c s_crealf.c s_creall.c \
@@ -126,7 +127,7 @@
 INCS+=	fenv.h math.h
 
 MAN=	acos.3 acosh.3 asin.3 asinh.3 atan.3 atan2.3 atanh.3 \
-	ceil.3 ccos.3 ccosh.3 cexp.3 \
+	ceil.3 cacos.3 ccos.3 ccosh.3 cexp.3 \
 	cimag.3 copysign.3 cos.3 cosh.3 csqrt.3 erf.3 exp.3 fabs.3 fdim.3 \
 	feclearexcept.3 feenableexcept.3 fegetenv.3 \
 	fegetround.3 fenv.3 floor.3 \
@@ -144,6 +145,9 @@
 MLINKS+=atanh.3 atanhf.3
 MLINKS+=atan2.3 atan2f.3 atan2.3 atan2l.3 \
 	atan2.3 carg.3 atan2.3 cargf.3 atan2.3 cargl.3
+MLINKS+=cacos.3 cacosf.3 cacos.3 cacosh.3 cacos.3 cacoshf.3 \
+	cacos.3 casin.3 cacos.3 casinf.3 cacos.3 casinh.3 cacos.3 casinhf.3 \
+	cacos.3 catan.3 cacos.3 catanf.3 cacos.3 catanh.3 cacos.3 catanhf.3
 MLINKS+=ccos.3 ccosf.3 ccos.3 csin.3 ccos.3 csinf.3 ccos.3 ctan.3 ccos.3 ctanf.3
 MLINKS+=ccosh.3 ccoshf.3 ccosh.3 csinh.3 ccosh.3 csinhf.3 \
 	ccosh.3 ctanh.3 ccosh.3 ctanhf.3
Index: lib/msun/Symbol.map
===================================================================
--- lib/msun/Symbol.map	(revision 251024)
+++ lib/msun/Symbol.map	(working copy)
@@ -237,6 +237,18 @@
 	fegetround;
 	fesetround;
 	fesetenv;
+	cacos;
+	cacosf;
+	cacosh;
+	cacoshf;
+	casin;
+	casinf;
+	casinh;
+	casinhf;
+	catan;
+	catanf;
+	catanh;
+	catanhf;
 	csin;
 	csinf;
 	csinh;
Index: lib/msun/man/cacos.3
===================================================================
--- lib/msun/man/cacos.3	(revision 0)
+++ lib/msun/man/cacos.3	(working copy)
@@ -0,0 +1,128 @@
+.\" Copyright (c) 2013 David Schultz <das@FreeBSD.org>
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd May 27, 2013
+.Dt CACOS 3
+.Os
+.Sh NAME
+.Nm cacos ,
+.Nm cacosf ,
+.Nm cacosh ,
+.Nm cacoshf ,
+.Nm casin ,
+.Nm casinf ,
+.Nm casinh ,
+.Nm casinhf ,
+.Nm catan ,
+.Nm catanf ,
+.Nm catanh ,
+.Nm catanhf
+.Nd complex arc trigonometric and hyperbolic functions
+.Sh LIBRARY
+.Lb libm
+.Sh SYNOPSIS
+.In complex.h
+.Ft double complex
+.Fn cacos "double complex z"
+.Ft float complex
+.Fn cacosf "float complex z"
+.Ft double complex
+.Fn cacosh "double complex z"
+.Ft float complex
+.Fn cacoshf "float complex z"
+.Ft double complex
+.Fn casin "double complex z"
+.Ft float complex
+.Fn casinf "float complex z"
+.Ft double complex
+.Fn casinh "double complex z"
+.Ft float complex
+.Fn casinhf "float complex z"
+.Ft double complex
+.Fn catan "double complex z"
+.Ft float complex
+.Fn catanf "float complex z"
+.Ft double complex
+.Fn catanh "double complex z"
+.Ft float complex
+.Fn catanhf "float complex z"
+.Sh DESCRIPTION
+The
+.Fn cacos ,
+.Fn casin ,
+and
+.Fn catan
+functions compute the principal value of the inverse cosine, sine,
+and tangent of the complex number
+.Fa z ,
+respectively.
+The
+.Fn cacosh ,
+.Fn casinh ,
+and
+.Fn catanh
+functions compute the principal value of the inverse hyperbolic
+cosine, sine, and tangent, respectively.
+The
+.Fn cacosf ,
+.Fn casinf ,
+.Fn catanf ,
+.Fn cacoshf ,
+.Fn casinhf ,
+and
+.Fn catanhf
+functions perform the same operations in
+.Fa float
+precision.
+.Pp
+.ie '\*[.T]'utf8'
+.  ds Un \[cu]
+.el
+.  ds Un U
+.
+There is no universal convention for defining the principal values of
+these functions. The following table gives the branch cuts, and the
+corresponding ranges for the return values, adopted by the C language.
+.Bl -column ".Sy Function" ".Sy (-\*(If*I, -I) \*(Un (I, \*(If*I)" ".Sy [-\*(Pi/2*I, \*(Pi/2*I]"
+.It Sy Function Ta Sy Branch Cut(s) Ta Sy Range
+.It cacos Ta (-\*(If, -1) \*(Un (1, \*(If) Ta [0, \*(Pi]
+.It casin Ta (-\*(If, -1) \*(Un (1, \*(If) Ta [-\*(Pi/2, \*(Pi/2]
+.It catan Ta (-\*(If*I, -I) \*(Un (I, \*(If*I) Ta [-\*(Pi/2, \*(Pi/2]
+.It cacosh Ta (-\*(If, 1) Ta [-\*(Pi*I, \*(Pi*I]
+.It casinh Ta (-\*(If*I, -I) \*(Un (I, \*(If*I) Ta [-\*(Pi/2*I, \*(Pi/2*I]
+.It catanh Ta (-\*(If, -1) \*(Un (1, \*(If) Ta [-\*(Pi/2*I, \*(Pi/2*I]
+.El
+.Sh SEE ALSO
+.Xr ccos 3 ,
+.Xr ccosh 3 ,
+.Xr complex 3 ,
+.Xr cos 3 ,
+.Xr math 3 ,
+.Xr sin 3 ,
+.Xr tan 3
+.Sh STANDARDS
+These functions conform to
+.St -isoC-99 .

Property changes on: lib/msun/man/cacos.3
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Index: lib/msun/man/ccos.3
===================================================================
--- lib/msun/man/ccos.3	(revision 251024)
+++ lib/msun/man/ccos.3	(working copy)
@@ -69,6 +69,7 @@
 .Fa float
 precision.
 .Sh SEE ALSO
+.Xr cacos 3 ,
 .Xr ccosh 3 ,
 .Xr complex 3 ,
 .Xr cos 3 ,
Index: lib/msun/man/ccosh.3
===================================================================
--- lib/msun/man/ccosh.3	(revision 251024)
+++ lib/msun/man/ccosh.3	(working copy)
@@ -69,6 +69,7 @@
 .Fa float
 precision.
 .Sh SEE ALSO
+.Xr cacosh 3 ,
 .Xr ccos 3 ,
 .Xr complex 3 ,
 .Xr cosh 3 ,
Index: lib/msun/man/complex.3
===================================================================
--- lib/msun/man/complex.3	(revision 251024)
+++ lib/msun/man/complex.3	(working copy)
@@ -89,6 +89,12 @@
 .\" Section 7.3.5-6 of ISO C99 standard
 .Ss Trigonometric and Hyperbolic Functions
 .Cl
+cacos	arc cosine
+cacosh	arc hyperbolic cosine
+casin	arc sine
+casinh	arc hyperbolic sine
+catan	arc tangent
+catanh	arc hyperbolic tangent
 ccos	cosine
 ccosh	hyperbolic cosine
 csin	sine
@@ -111,20 +117,8 @@
 functions described here conform to
 .St -isoC-99 .
 .Sh BUGS
-The inverse trigonometric and hyperbolic functions
-.Fn cacos ,
-.Fn cacosh ,
-.Fn casin ,
-.Fn casinh ,
-.Fn catan ,
-and
-.Fn catanh
-are not implemented.
-.Pp
 The logarithmic functions
 .Fn clog
-are not implemented.
-.Pp
-The power functions
+and the power functions
 .Fn cpow
 are not implemented.
Index: tools/regression/lib/msun/Makefile
===================================================================
--- tools/regression/lib/msun/Makefile	(revision 251024)
+++ tools/regression/lib/msun/Makefile	(working copy)
@@ -2,7 +2,8 @@
 
 TESTS=	test-cexp test-conj test-csqrt test-ctrig \
 	test-exponential test-fenv test-fma \
-	test-fmaxmin test-ilogb test-invtrig test-logarithm test-lrint \
+	test-fmaxmin test-ilogb test-invtrig test-invctrig \
+	test-logarithm test-lrint \
 	test-lround test-nan test-nearbyint test-next test-rem test-trig
 CFLAGS+= -O0 -lm
 
Index: tools/regression/lib/msun/test-invctrig.c
===================================================================
--- tools/regression/lib/msun/test-invctrig.c	(revision 0)
+++ tools/regression/lib/msun/test-invctrig.c	(working copy)
@@ -0,0 +1,467 @@
+/*-
+ * Copyright (c) 2008-2013 David Schultz <das@FreeBSD.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+/*
+ * Tests for casin[h](), cacos[h](), and catan[h]().
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <assert.h>
+#include <complex.h>
+#include <fenv.h>
+#include <float.h>
+#include <math.h>
+#include <stdio.h>
+
+#define	ALL_STD_EXCEPT	(FE_DIVBYZERO | FE_INEXACT | FE_INVALID | \
+			 FE_OVERFLOW | FE_UNDERFLOW)
+#define	OPT_INVALID	(ALL_STD_EXCEPT & ~FE_INVALID)
+#define	OPT_INEXACT	(ALL_STD_EXCEPT & ~FE_INEXACT)
+#define	FLT_ULP()	ldexpl(1.0, 1 - FLT_MANT_DIG)
+#define	DBL_ULP()	ldexpl(1.0, 1 - DBL_MANT_DIG)
+#define	LDBL_ULP()	ldexpl(1.0, 1 - LDBL_MANT_DIG)
+
+#pragma STDC FENV_ACCESS	ON
+#pragma	STDC CX_LIMITED_RANGE	OFF
+
+/*
+ * XXX gcc implements complex multiplication incorrectly. In
+ * particular, it implements it as if the CX_LIMITED_RANGE pragma
+ * were ON. Consequently, we need this function to form numbers
+ * such as x + INFINITY * I, since gcc evaluates INFINITY * I as
+ * NaN + INFINITY * I.
+ */
+static inline long double complex
+cpackl(long double x, long double y)
+{
+	long double complex z;
+
+	__real__ z = x;
+	__imag__ z = y;
+	return (z);
+}
+
+/* Flags that determine whether to check the signs of the result. */
+#define	CS_REAL	1
+#define	CS_IMAG	2
+#define	CS_BOTH	(CS_REAL | CS_IMAG)
+
+#ifdef	DEBUG
+#define	debug(...)	printf(__VA_ARGS__)
+#else
+#define	debug(...)	(void)0
+#endif
+
+/*
+ * Test that a function returns the correct value and sets the
+ * exception flags correctly. The exceptmask specifies which
+ * exceptions we should check. We need to be lenient for several
+ * reasons, but mainly because on some architectures it's impossible
+ * to raise FE_OVERFLOW without raising FE_INEXACT.
+ *
+ * These are macros instead of functions so that assert provides more
+ * meaningful error messages.
+ *
+ * XXX The volatile here is to avoid gcc's bogus constant folding and work
+ *     around the lack of support for the FENV_ACCESS pragma.
+ */
+#define	test_p(func, z, result, exceptmask, excepts, checksign)	do {	\
+	volatile long double complex _d = z;				\
+	debug("  testing %s(%Lg + %Lg I) == %Lg + %Lg I\n", #func,	\
+	    creall(_d), cimagl(_d), creall(result), cimagl(result));	\
+	assert(feclearexcept(FE_ALL_EXCEPT) == 0);			\
+	assert(cfpequal((func)(_d), (result), (checksign)));		\
+	assert(((func), fetestexcept(exceptmask) == (excepts)));	\
+} while (0)
+
+/*
+ * Test within a given tolerance.  The tolerance indicates relative error
+ * in ulps.
+ */
+#define	test_p_tol(func, z, result, tol)			do {	\
+	volatile long double complex _d = z;				\
+	debug("  testing %s(%Lg + %Lg I) ~= %Lg + %Lg I\n", #func,	\
+	    creall(_d), cimagl(_d), creall(result), cimagl(result));	\
+	assert(cfpequal_tol((func)(_d), (result), (tol)));		\
+} while (0)
+
+/* These wrappers apply the identity f(conj(z)) = conj(f(z)). */
+#define	test(func, z, result, exceptmask, excepts, checksign)	do {	\
+	test_p(func, z, result, exceptmask, excepts, checksign);	\
+	test_p(func, conjl(z), conjl(result), exceptmask, excepts, checksign); \
+} while (0)
+#define	test_tol(func, z, result, tol)				do {	\
+	test_p_tol(func, z, result, tol);				\
+	test_p_tol(func, conjl(z), conjl(result), tol);			\
+} while (0)
+
+/* Test the given function in all precisions. */
+#define	testall(func, x, result, exceptmask, excepts, checksign) do {	\
+	test(func, x, result, exceptmask, excepts, checksign);		\
+	test(func##f, x, result, exceptmask, excepts, checksign);	\
+} while (0)
+#define	testall_odd(func, x, result, exceptmask, excepts, checksign) do { \
+	testall(func, x, result, exceptmask, excepts, checksign);	\
+	testall(func, -(x), -(result), exceptmask, excepts, checksign);	\
+} while (0)
+#define	testall_even(func, x, result, exceptmask, excepts, checksign) do { \
+	testall(func, x, result, exceptmask, excepts, checksign);	\
+	testall(func, -(x), result, exceptmask, excepts, checksign);	\
+} while (0)
+
+/*
+ * Test the given function in all precisions, within a given tolerance.
+ * The tolerance is specified in ulps.
+ */
+#define	testall_tol(func, x, result, tol)	       		   do { \
+	test_tol(func, x, result, (tol) * DBL_ULP());			\
+	test_tol(func##f, x, result, (tol) * FLT_ULP());		\
+} while (0)
+#define	testall_odd_tol(func, x, result, tol)	       		   do { \
+	testall_tol(func, x, result, tol);				\
+	testall_tol(func, -(x), -(result), tol);			\
+} while (0)
+#define	testall_even_tol(func, x, result, tol)	       		   do { \
+	testall_tol(func, x, result, tol);				\
+	testall_tol(func, -(x), result, tol);				\
+} while (0)
+
+static const long double
+pi = 3.14159265358979323846264338327950280L,
+c3pi = 9.42477796076937971538793014983850839L;
+
+/*
+ * Determine whether x and y are equal, with two special rules:
+ *	+0.0 != -0.0
+ *	 NaN == NaN
+ * If checksign is 0, we compare the absolute values instead.
+ */
+static int
+fpequal(long double x, long double y, int checksign)
+{
+	if (isnan(x) && isnan(y))
+		return (1);
+	if (checksign)
+		return (x == y && !signbit(x) == !signbit(y));
+	else
+		return (fabsl(x) == fabsl(y));
+}
+
+static int
+fpequal_tol(long double x, long double y, long double tol)
+{
+	fenv_t env;
+	int ret;
+
+	if (isnan(x) && isnan(y))
+		return (1);
+	if (!signbit(x) != !signbit(y))
+		return (0);
+	if (x == y)
+		return (1);
+	if (tol == 0 || y == 0.0)
+		return (0);
+
+	/* Hard case: need to check the tolerance. */
+	feholdexcept(&env);
+	ret = fabsl(x - y) <= fabsl(y * tol);
+	fesetenv(&env);
+	return (ret);
+}
+
+static int
+cfpequal(long double complex x, long double complex y, int checksign)
+{
+	return (fpequal(creall(x), creall(y), checksign & CS_REAL)
+		&& fpequal(cimagl(x), cimagl(y), checksign & CS_IMAG));
+}
+
+static int
+cfpequal_tol(long double complex x, long double complex y, long double tol)
+{
+	return (fpequal_tol(creall(x), creall(y), tol)
+		&& fpequal_tol(cimagl(x), cimagl(y), tol));
+}
+
+
+/* Tests for 0 */
+void
+test_zero(void)
+{
+	long double complex zero = cpackl(0.0, 0.0);
+
+	testall_tol(cacosh, zero, cpackl(0.0, pi / 2), 1);
+	testall_tol(cacosh, -zero, cpackl(0.0, -pi / 2), 1);
+	testall_tol(cacos, zero, cpackl(pi / 2, -0.0), 1);
+	testall_tol(cacos, -zero, cpackl(pi / 2, 0.0), 1);
+
+	testall_odd(casinh, zero, zero, ALL_STD_EXCEPT, 0, CS_BOTH);
+	testall_odd(casin, zero, zero, ALL_STD_EXCEPT, 0, CS_BOTH);
+
+	testall_odd(catanh, zero, zero, ALL_STD_EXCEPT, 0, CS_BOTH);
+	testall_odd(catan, zero, zero, ALL_STD_EXCEPT, 0, CS_BOTH);
+}
+
+/*
+ * Tests for NaN inputs.
+ */
+void
+test_nan(void)
+{
+	long double complex nan_nan = cpackl(NAN, NAN);
+	long double complex z;
+
+	/*
+	 * IN		CACOSH	    CACOS	CASINH	    CATANH
+	 * NaN,NaN	NaN,NaN	    NaN,NaN	NaN,NaN	    NaN,NaN
+	 * finite,NaN	NaN,NaN*    NaN,NaN*	NaN,NaN*    NaN,NaN*
+	 * NaN,finite   NaN,NaN*    NaN,NaN*	NaN,NaN*    NaN,NaN*
+	 * NaN,Inf	Inf,NaN     NaN,-Inf	?Inf,NaN    ?0,pi/2
+	 * +-Inf,NaN	Inf,NaN     NaN,?Inf	+-Inf,NaN   +-0,NaN
+	 * +-0,NaN	NaN,NaN*    pi/2,NaN	NaN,NaN*    +-0,NaN
+	 * NaN,0	NaN,NaN*    NaN,NaN*	NaN,0	    NaN,NaN*
+	 *
+	 *  * = raise invalid
+	 */
+	z = nan_nan;
+	testall(cacosh, z, nan_nan, ALL_STD_EXCEPT, 0, 0);
+	testall(cacos, z, nan_nan, ALL_STD_EXCEPT, 0, 0);
+	testall(casinh, z, nan_nan, ALL_STD_EXCEPT, 0, 0);
+	testall(casin, z, nan_nan, ALL_STD_EXCEPT, 0, 0);
+	testall(catanh, z, nan_nan, ALL_STD_EXCEPT, 0, 0);
+	testall(catan, z, nan_nan, ALL_STD_EXCEPT, 0, 0);
+
+	z = cpackl(0.5, NAN);
+	testall(cacosh, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(cacos, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(casinh, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(casin, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(catanh, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(catan, z, nan_nan, OPT_INVALID, 0, 0);
+
+	z = cpackl(NAN, 0.5);
+	testall(cacosh, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(cacos, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(casinh, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(casin, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(catanh, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(catan, z, nan_nan, OPT_INVALID, 0, 0);
+
+	z = cpackl(NAN, INFINITY);
+	testall(cacosh, z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0, CS_REAL);
+	testall(cacosh, -z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0, CS_REAL);
+	testall(cacos, z, cpackl(NAN, -INFINITY), ALL_STD_EXCEPT, 0, CS_IMAG);
+	testall(casinh, z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0, 0);
+	testall(casin, z, cpackl(NAN, INFINITY), ALL_STD_EXCEPT, 0, CS_IMAG);
+	testall_tol(catanh, z, cpackl(0.0, pi / 2), 1);
+	testall(catan, z, cpackl(NAN, 0.0), ALL_STD_EXCEPT, 0, CS_IMAG);
+
+	z = cpackl(INFINITY, NAN);
+	testall_even(cacosh, z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0,
+		     CS_REAL);
+	testall_even(cacos, z, cpackl(NAN, INFINITY), ALL_STD_EXCEPT, 0, 0);
+	testall_odd(casinh, z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0,
+		    CS_REAL);
+	testall_odd(casin, z, cpackl(NAN, INFINITY), ALL_STD_EXCEPT, 0, 0);
+	testall_odd(catanh, z, cpackl(0.0, NAN), ALL_STD_EXCEPT, 0, CS_REAL);
+	testall_odd_tol(catan, z, cpackl(pi / 2, 0.0), 1);
+
+	z = cpackl(0.0, NAN);
+	/* XXX We allow a spurious inexact exception here. */
+	testall_even(cacosh, z, nan_nan, OPT_INVALID & ~FE_INEXACT, 0, 0);
+	testall_even_tol(cacos, z, cpackl(pi / 2, NAN), 1);
+	testall_odd(casinh, z, nan_nan, OPT_INVALID, 0, 0);
+	testall_odd(casin, z, cpackl(0.0, NAN), ALL_STD_EXCEPT, 0, CS_REAL);
+	testall_odd(catanh, z, cpackl(0.0, NAN), OPT_INVALID, 0, CS_REAL);
+	testall_odd(catan, z, nan_nan, OPT_INVALID, 0, 0);
+
+	z = cpackl(NAN, 0.0);
+	testall(cacosh, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(cacos, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(casinh, z, cpackl(NAN, 0), ALL_STD_EXCEPT, 0, CS_IMAG);
+	testall(casin, z, nan_nan, OPT_INVALID, 0, 0);
+	testall(catanh, z, nan_nan, OPT_INVALID, 0, CS_IMAG);
+	testall(catan, z, cpackl(NAN, 0.0), ALL_STD_EXCEPT, 0, 0);
+}
+
+void
+test_inf(void)
+{
+	long double complex z;
+
+	/*
+	 * IN		CACOSH	    CACOS	CASINH	    CATANH
+	 * Inf,Inf	Inf,pi/4    pi/4,-Inf	Inf,pi/4    0,pi/2
+	 * -Inf,Inf	Inf,3pi/4   3pi/4,-Inf	---	    ---
+	 * Inf,finite	Inf,0	    0,-Inf	Inf,0	    0,pi/2
+	 * -Inf,finite	Inf,pi      pi,-Inf	---	    ---
+	 * finite,Inf	Inf,pi/2    pi/2,-Inf	Inf,pi/2    0,pi/2
+	 */
+	z = cpackl(INFINITY, INFINITY);
+	testall_tol(cacosh, z, cpackl(INFINITY, pi / 4), 1);
+	testall_tol(cacosh, -z, cpackl(INFINITY, -c3pi / 4), 1);
+	testall_tol(cacos, z, cpackl(pi / 4, -INFINITY), 1);
+	testall_tol(cacos, -z, cpackl(c3pi / 4, INFINITY), 1);
+	testall_odd_tol(casinh, z, cpackl(INFINITY, pi / 4), 1);
+	testall_odd_tol(casin, z, cpackl(pi / 4, INFINITY), 1);
+	testall_odd_tol(catanh, z, cpackl(0, pi / 2), 1);
+	testall_odd_tol(catan, z, cpackl(pi / 2, 0), 1);
+
+	z = cpackl(INFINITY, 0.5);
+	/* XXX We allow a spurious inexact exception here. */
+	testall(cacosh, z, cpackl(INFINITY, 0), OPT_INEXACT, 0, CS_BOTH);
+	testall_tol(cacosh, -z, cpackl(INFINITY, -pi), 1);
+	testall(cacos, z, cpackl(0, -INFINITY), OPT_INEXACT, 0, CS_BOTH);
+	testall_tol(cacos, -z, cpackl(pi, INFINITY), 1);
+	testall_odd(casinh, z, cpackl(INFINITY, 0), OPT_INEXACT, 0, CS_BOTH);
+	testall_odd_tol(casin, z, cpackl(pi / 2, INFINITY), 1);
+	testall_odd_tol(catanh, z, cpackl(0, pi / 2), 1);
+	testall_odd_tol(catan, z, cpackl(pi / 2, 0), 1);
+
+	z = cpackl(0.5, INFINITY);
+	testall_tol(cacosh, z, cpackl(INFINITY, pi / 2), 1);
+	testall_tol(cacosh, -z, cpackl(INFINITY, -pi / 2), 1);
+	testall_tol(cacos, z, cpackl(pi / 2, -INFINITY), 1);
+	testall_tol(cacos, -z, cpackl(pi / 2, INFINITY), 1);
+	testall_odd_tol(casinh, z, cpackl(INFINITY, pi / 2), 1);
+	/* XXX We allow a spurious inexact exception here. */
+	testall_odd(casin, z, cpackl(0.0, INFINITY), OPT_INEXACT, 0, CS_BOTH);
+	testall_odd_tol(catanh, z, cpackl(0, pi / 2), 1);
+	testall_odd_tol(catan, z, cpackl(pi / 2, 0), 1);
+}
+
+/* Tests along the real and imaginary axes. */
+void
+test_axes(void)
+{
+	static const long double nums[] = {
+		-2, -1, -0.5, 0.5, 1, 2
+	};
+	long double complex z;
+	int i;
+
+	for (i = 0; i < sizeof(nums) / sizeof(nums[0]); i++) {
+		/* Real axis */
+		z = cpackl(nums[i], 0.0);
+		if (fabsl(nums[i]) <= 1) {
+			testall_tol(cacosh, z, cpackl(0.0, acosl(nums[i])), 1);
+			testall_tol(cacos, z, cpackl(acosl(nums[i]), -0.0), 1);
+			testall_tol(casin, z, cpackl(asinl(nums[i]), 0.0), 1);
+			testall_tol(catanh, z, cpackl(atanh(nums[i]), 0.0), 1);
+		} else {
+			testall_tol(cacosh, z,
+				    cpackl(acosh(fabs(nums[i])),
+					   (nums[i] < 0) ? pi : 0), 1);
+			testall_tol(cacos, z,
+				    cpackl((nums[i] < 0) ? pi : 0,
+					   -acosh(fabs(nums[i]))), 1);
+			testall_tol(casin, z,
+				    cpackl(copysign(pi / 2, nums[i]),
+					   acosh(fabs(nums[i]))), 1);
+			testall_tol(catanh, z,
+				    cpackl(atanh(1 / nums[i]), pi / 2), 1);
+		}
+		testall_tol(casinh, z, cpackl(asinh(nums[i]), 0.0), 1);
+		testall_tol(catan, z, cpackl(atan(nums[i]), 0), 1);
+
+		/* TODO: Test the imaginary axis. */
+	}
+}
+
+void
+test_small(void)
+{
+	/*
+	 * z =  0.75 + i 0.25
+	 *     acos(z) = Pi/4 - i ln(2)/2
+	 *     asin(z) = Pi/4 + i ln(2)/2
+	 *     atan(z) = atan(4)/2 + i ln(17/9)/4
+	 */
+	static const struct {
+		long double a, b;
+		long double acos_a, acos_b;
+		long double asin_a, asin_b;
+		long double atan_a, atan_b;
+	} tests[] = {
+		{  0.75L,
+		   0.25L,
+		   pi / 4,
+		   -0.34657359027997265470861606072908828L,
+		   pi / 4,
+		   0.34657359027997265470861606072908828L,
+		   0.66290883183401623252961960521423782L,
+		   0.15899719167999917436476103600701878L },
+	};
+	long double complex z;
+	int i;
+
+	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+		z = cpackl(tests[i].a, tests[i].b);
+		testall_tol(cacos, z,
+		    cpackl(tests[i].acos_a, tests[i].acos_b), 2);
+		testall_odd_tol(casin, z,
+		    cpackl(tests[i].asin_a, tests[i].asin_b), 2);
+		testall_odd_tol(catan, z,
+		    cpackl(tests[i].atan_a, tests[i].atan_b), 2);
+	}
+}
+
+/* Test inputs that might cause overflow in a sloppy implementation. */
+void
+test_large(void)
+{
+
+	/* TODO: Write these tests */
+}
+
+int
+main(int argc, char *argv[])
+{
+
+	printf("1..6\n");
+
+	test_zero();
+	printf("ok 1 - invctrig zero\n");
+
+	test_nan();
+	printf("ok 2 - invctrig nan\n");
+
+	test_inf();
+	printf("ok 3 - invctrig inf\n");
+
+	test_axes();
+	printf("ok 4 - invctrig axes\n");
+
+	test_small();
+	printf("ok 5 - invctrig small\n");
+
+	test_large();
+	printf("ok 6 - invctrig large\n");
+
+	return (0);
+}

Property changes on: tools/regression/lib/msun/test-invctrig.c
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property

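(A standalone sketch of the ulp-relative comparison that fpequal_tol()
in the patch above performs, reduced to double precision. The name
close_in_ulps() is hypothetical. Unlike the original, this version
does not save and restore the floating-point environment around the
subtraction, so it may raise exception flags of its own.)

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/*
 * Return 1 if got is within `ulps` units of relative error of want,
 * where one unit is DBL_EPSILON (2^(1 - DBL_MANT_DIG) for radix 2).
 * As in the test code: NaN matches NaN, +0.0 does not match -0.0,
 * and an expected value of zero only matches exactly.
 */
static int
close_in_ulps(double got, double want, double ulps)
{
	assert(ulps >= 0.0);

	if (isnan(got) && isnan(want))
		return (1);
	if (!signbit(got) != !signbit(want))
		return (0);
	if (got == want)
		return (1);
	if (ulps == 0.0 || want == 0.0)
		return (0);

	/* Hard case: compare the error against the scaled tolerance. */
	return (fabs(got - want) <= fabs(want) * ulps * DBL_EPSILON);
}
```

For example, close_in_ulps(acos(0.0), 1.5707963267948966, 1) compares
a computed pi/2 against a literal with a one-ulp allowance.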

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 05:57:36 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 799CCC38;
 Tue, 28 May 2013 05:57:36 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
 [211.29.132.185])
 by mx1.freebsd.org (Postfix) with ESMTP id EB3E8A5A;
 Tue, 28 May 2013 05:57:35 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4S5v7R0015276
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Tue, 28 May 2013 15:57:08 +1000
Date: Tue, 28 May 2013 15:57:07 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: David Schultz <das@freebsd.org>
Subject: Re: Use of C99 extra long double math functions after r236148
In-Reply-To: <20130528043205.GA3282@zim.MIT.EDU>
Message-ID: <20130528150808.F1298@besplex.bde.org>
References: <500DAD41.5030104@missouri.edu>
 <20120724113214.G934@besplex.bde.org>
 <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu>
 <20130528043205.GA3282@zim.MIT.EDU>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e4Ne0tV/ c=1 sm=1 a=O6A2dy7pM2IA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10
 a=3K6gk9kpRNNbVDm7pYwA:9 a=CjuIK1q_8ugA:10 a=Wy-Xl9HimQZDeEWb:21
 a=nZJVZjyGvH_yZe87:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
X-Mailman-Approved-At: Tue, 28 May 2013 11:40:18 +0000
Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>,
 John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>,
 Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@freebsd.org,
 Bruce Evans <bde@freebsd.org>, Steve Kargl <sgk@troutmask.apl.washington.edu>,
 Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 05:57:36 -0000

On Mon, 27 May 2013, David Schultz wrote:

> I wrote some tests to cover the corner cases for the complex
> inverse trig functions. They don't find any nontrivial bugs in
> your implementations. :-) Now that you have a commit bit, would
> you like to commit your code, or shall I?
>
> Below is a diff of all the changes needed to integrate it. I have
> a short list of style fixes, but otherwise I think what you have
> is good:
>  - wrap lines to 80 chars, please
>  - spaces between operators
>  - "static inline", not "inline static"
>  - don't use "inline" on large functions

indent(1) fixes the spaces between operators fairly well, without finding
many other problems or adding many.  It didn't find [m]any long lines
(but it doesn't understand its own line length limit).

Here are my local patches.  Just a few that were not integrated by
Stephen after we stopped working on it last October.

@ diff -u2 catrigf.c~ catrigf.c
@ --- catrigf.c~	2012-09-22 21:13:50.000000000 +0000
@ +++ catrigf.c	2012-09-22 21:35:51.287614000 +0000
@ @@ -353,12 +353,7 @@
@  	}
@ 
@ -	if (ax == 1 && ay < FLT_EPSILON) {
@ -#if 0
@ -		if (ay > 2*FLT_MIN)
@ -			rx = - logf(ay/2) / 2;
@ -		else
@ -#endif
@ -			rx = - (logf(ay) - m_ln2) / 2;
@ -	} else
@ +	if (ax == 1 && ay < FLT_EPSILON)
@ +		rx = - (logf(ay) - m_ln2) / 2;
@ +	else
@  		rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4;
@

This change is already in catrig.c, but catrigf.c wasn't regenerated
from catrig.c, and the generation scripts and their support file are no
longer in Stephen's public_html directory.

@ diff -u2 catrigl.c~ catrigl.c
@ --- catrigl.c~	2012-09-22 21:14:24.000000000 +0000
@ +++ catrigl.c	2013-05-26 08:46:10.423187000 +0000
@ @@ -50,4 +50,6 @@
@  #define signbit(x)	(__builtin_signbitl(x)) 
@ 
@ +long double atanhl(long double);
@ +
@  static const long double
@  A_crossover =		10,

catrigl.c depends on atanhl(), logl() and log1pl() existing.  Stephen
has a not-very-dummy version of s_atanhl.c in his public_html
directory.  This needs a more direct conversion from the fdlibm
e_atanhl.c to be of commit quality.  I recently started testing with
it, and use my own logl().  Previously this patch had to change the
atanhl() call to atanh() for catrigl.c to be usable.  I haven't
tested the long double complex functions for anything except efficiency
and consistency with the plain double complex functions yet, so my
tests don't show any difference from switching to atanhl().  They
just show that atanhl() is consistent in its limited use in catrigl.c.
I also haven't tested atanhl() as a real function.

Strangely, catrigl.c gives complex acoshl() and asinhl() without needing
real acoshl() and asinhl().  The real inverse hyperbolic trig functions
seem to be just as easy as the real inverse trig functions, but you
only converted the latter from the fdlibm versions to create the long
double versions.  Hopefully they are all as easy to translate as
e_atanhl.c.

@ @@ -60,6 +62,6 @@
@  #if LDBL_MANT_DIG == 64
@  static const union IEEEl2bits
@ -um_e =		LD80C(0xadf85458a2bb4a9b,  1, 0, 2.71828182845904523536e0L),
@ -um_ln2 =	LD80C(0xb17217f7d1cf79ac, -1, 0, 6.93147180559945309417e-1L);
@ +um_e =		LD80C(0xadf85458a2bb4a9b,  1, 2.71828182845904523536e+0L),
@ +um_ln2 =	LD80C(0xb17217f7d1cf79ac, -1, 6.93147180559945309417e-1L);
@  #define		m_e	um_e.e
@  #define		m_ln2	um_ln2.e

Keep up with API changes.

@ @@ -348,5 +350,5 @@
@ 
@  	if (y == 0 && ax <= 1)
@ -		return (cpackl(atanhl(x), y)); 	/* XXX need atanhl() */
@ +		return (cpackl(atanh(x), y)); 	/* XXX need atanhl() */
@ 
@  	if (x == 0)

The comment doesn't apply if this file is actually usable.  Don't forget
to remove it before committing.

@ @@ -369,12 +371,7 @@
@  	}
@ 
@ -	if (ax == 1 && ay < LDBL_EPSILON) {
@ -#if 0
@ -		if (ay > 2*LDBL_MIN)
@ -			rx = - logl(ay/2) / 2;
@ -		else
@ -#endif
@ -			rx = - (logl(ay) - m_ln2) / 2;
@ -	} else
@ +	if (ax == 1 && ay < LDBL_EPSILON)
@ +		rx = - (logl(ay) - m_ln2) / 2;
@ +	else
@  		rx = log1pl(4*ax / sum_squares(ax-1, ay)) / 4;
@

Should be obtained by regeneration, as for catrigf.c.

Back to your changes...  They mostly look good, as usual...

% Index: tools/regression/lib/msun/test-invctrig.c
% ===================================================================
% --- tools/regression/lib/msun/test-invctrig.c	(revision 0)
% +++ tools/regression/lib/msun/test-invctrig.c	(working copy)
% @@ -0,0 +1,467 @@
% ....
% +#pragma STDC FENV_ACCESS	ON
% +#pragma	STDC CX_LIMITED_RANGE	OFF

Heheh, style rules for #pragma.  I like the old rule which says that
it should be indented 6 feet under.  It is still almost useless, since
we don't even have any C99 compilers that implement the fenv pragmas
yet.

% +/*
% + * XXX gcc implements complex multiplication incorrectly. In
% + * particular, it implements it as if the CX_LIMITED_RANGE pragma
% + * were ON. Consequently, we need this function to form numbers
% + * such as x + INFINITY * I, since gcc evalutes INFINITY * I as
% + * NaN + INFINITY * I.
% + */
% +static inline long double complex
% +cpackl(long double x, long double y)
% +{
% +	long double complex z;
% +
% +	__real__ z = x;
% +	__imag__ z = y;
% +	return (z);
% +}

Why duplicate this?  I guess it is because math_private.h is hard to
include.  I use complicated conditionals (mostly switches on
$(uname -p) and $(hostname) in shell scripts) to locate it when
compiling from external directories.
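
A minimal sketch of the same idea in standard C99, for compilers without
the GNU __real__/__imag__ extension: C99 gives a complex type the same
representation as an array of two reals, so a union can set the parts
directly without any multiplication by I.  The names cparts,
cpack_union, real_of and imag_of are hypothetical and are not taken from
math_private.h or the test file.

```c
#include <complex.h>
#include <math.h>

/*
 * C99 6.2.5: a complex type has the representation of an array of two
 * elements of the corresponding real type (real part first).
 */
union cparts {
	double complex z;
	double part[2];		/* part[0] = real, part[1] = imaginary */
};

/* Hypothetical stand-in for cpack(): no complex arithmetic is involved. */
static double complex
cpack_union(double x, double y)
{
	union cparts u;

	u.part[0] = x;
	u.part[1] = y;
	return (u.z);
}

/* Read the real part back through the same union. */
static double
real_of(double complex z)
{
	union cparts u;

	u.z = z;
	return (u.part[0]);
}

/* Read the imaginary part back through the same union. */
static double
imag_of(double complex z)
{
	union cparts u;

	u.z = z;
	return (u.part[1]);
}
```

With this, cpack_union(1.0, INFINITY) keeps a real part of exactly 1.0,
however the compiler chooses to evaluate INFINITY * I.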

The tests seem to be compiled with -O0.  That tests a different
environment than the usual runtime one, and in particular misses seeing
most precision bugs.  I mostly test with -O (-O2 with gcc is slower
and even harder to debug, while with clang it makes little difference),
but switch to -O0 to debug.  -g -O is now almost unusable because -O
optimizes away dead variables and -g is broken in many cases (sometimes
it can't even show live variables).

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 06:14:47 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id CEB00104;
 Tue, 28 May 2013 06:14:47 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from fallbackmx08.syd.optusnet.com.au
 (fallbackmx08.syd.optusnet.com.au [211.29.132.10])
 by mx1.freebsd.org (Postfix) with ESMTP id 63285AF7;
 Tue, 28 May 2013 06:14:46 +0000 (UTC)
Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au
 [211.29.132.191])
 by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
 r4S6EWp4011570; Tue, 28 May 2013 16:14:32 +1000
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4S6EDZX013018
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Tue, 28 May 2013 16:14:14 +1000
Date: Tue, 28 May 2013 16:14:13 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: David Schultz <das@freebsd.org>
Subject: Re: Use of C99 extra long double math functions after r236148
In-Reply-To: <20130528043205.GA3282@zim.MIT.EDU>
Message-ID: <20130528155933.V1298@besplex.bde.org>
References: <500DAD41.5030104@missouri.edu>
 <20120724113214.G934@besplex.bde.org>
 <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu>
 <20130528043205.GA3282@zim.MIT.EDU>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=O6A2dy7pM2IA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10
 a=MGBSo3QMWewO758DKTcA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
X-Mailman-Approved-At: Tue, 28 May 2013 11:40:29 +0000
Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>,
 John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>,
 Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@freebsd.org,
 Bruce Evans <bde@freebsd.org>, Steve Kargl <sgk@troutmask.apl.washington.edu>,
 Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 06:14:47 -0000

On Mon, 27 May 2013, David Schultz wrote:

> ...
> Below is a diff of all the changes needed to integrate it. I have
> a short list of style fixes, but otherwise I think what you have
> is good:
>  - wrap lines to 80 chars, please
>  - spaces between operators
>  - "static inline", not "inline static"
>  - don't use "inline" on large functions

Another reply.

I think I tested "inline" on the large functions (just 2) and found
it useful for efficiency.  This is like inline on large trig support
functions being useful.  The inline parts are duplicated once per
C99-API function, and often the caller only uses one C99-API function.
Actually, the large inlines are not duplicated that much.  cacosh()
and casinh() are just wrappers that call cacos() and casin(),
respectively.  There is no inlining for the last 2 (even larger)
functions.  The overhead for the wrappers is noticeable, but more
inlining didn't seem to reduce it much.

More investigation of the extent of the style bugs:
- only 1 line is longer than 80 columns now and easy to fix.  Other long
   lines are for declarations where I prefer to keep the long comments
   on the same line
- spaces between operations will expand a few lines beyond 80 columns if
   done blindly.  Only a few.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 08:12:33 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 15B90B3F;
 Tue, 28 May 2013 08:12:33 +0000 (UTC) (envelope-from das@freebsd.org)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net
 [50.196.151.174])
 by mx1.freebsd.org (Postfix) with ESMTP id DA5751AB;
 Tue, 28 May 2013 08:12:32 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
 by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4S8CECE013842;
 Tue, 28 May 2013 01:12:14 -0700 (PDT) (envelope-from das@freebsd.org)
Received: (from das@localhost)
 by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4S8CCcU013841;
 Tue, 28 May 2013 01:12:12 -0700 (PDT) (envelope-from das@freebsd.org)
Date: Tue, 28 May 2013 01:12:12 -0700
From: David Schultz <das@freebsd.org>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: Use of C99 extra long double math functions after r236148
Message-ID: <20130528081212.GA13594@zim.MIT.EDU>
References: <501204AD.30605@missouri.edu>
 <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com>
 <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com>
 <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com>
 <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU>
 <20130528150808.F1298@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130528150808.F1298@besplex.bde.org>
X-Mailman-Approved-At: Tue, 28 May 2013 11:40:43 +0000
Cc: Diane Bruce <db@db.net>, John Baldwin <jhb@freebsd.org>,
 David Chisnall <theraven@freebsd.org>,
 Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@freebsd.org,
 Bruce Evans <bde@freebsd.org>, Steve Kargl <sgk@troutmask.apl.washington.edu>,
 Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 08:12:33 -0000

On Tue, May 28, 2013, Bruce Evans wrote:
> @ diff -u2 catrigl.c~ catrigl.c
> @ --- catrigl.c~	2012-09-22 21:14:24.000000000 +0000
> @ +++ catrigl.c	2013-05-26 08:46:10.423187000 +0000
> @ @@ -50,4 +50,6 @@
> @  #define signbit(x)	(__builtin_signbitl(x)) 
> @ 
> @ +long double atanhl(long double);
> @ +
> @  static const long double
> @  A_crossover =		10,
> 
> catrigl.c depends on atanhl(), logl() and log1pl() existing.

Yep, I'm ignoring the complex long double functions until the real
long double functions are done. I'm hoping that won't be too long!

> % Index: tools/regression/lib/msun/test-invctrig.c
> % ===================================================================
> % --- tools/regression/lib/msun/test-invctrig.c	(revision 0)
> % +++ tools/regression/lib/msun/test-invctrig.c	(working copy)
> % @@ -0,0 +1,467 @@
> % ....
> % +#pragma STDC FENV_ACCESS	ON
> % +#pragma	STDC CX_LIMITED_RANGE	OFF
> 
> Heheh, style rules for #pragma.  I like the old rule which says that
> it should be indented 6 feet under.  It is still almost useless, since
> we don't even have any C99 compilers that implement the fenv pragmas
> yet.

They are mostly just there to document the fact that this code is
expecting FENV_ACCESS to work. Clang, adding insult to injury,
generates a warning about these. I don't think they're going to
implement the missing C99 features soon. Many bugs have been filed
about the issue, but I haven't heard of any progress. When I asked
years ago, I was basically told that the LLVM IR can't support the
feature without substantial modifications.

> % + * XXX gcc implements complex multiplication incorrectly. In
> % + * particular, it implements it as if the CX_LIMITED_RANGE pragma
> % + * were ON. Consequently, we need this function to form numbers
> % + * such as x + INFINITY * I, since gcc evalutes INFINITY * I as
> % + * NaN + INFINITY * I.
> % + */
> % +static inline long double complex
> % +cpackl(long double x, long double y)
> % +{
> % +	long double complex z;
> % +
> % +	__real__ z = x;
> % +	__imag__ z = y;
> % +	return (z);
> % +}
> 
> Why duplicate this?  I guess it is because math_private.h is hard to
> include.  I use complicated conditionals (mostly switches on
> $(uname -p) and $(hostname) in shell scripts) to locate it when
> compiling from external directories.

I will change to CMPLXL, now that CMPLXL has been committed.
Thanks for reminding me. The ability to use complex numbers in
initializers is nice (ignore whitespace munging due to cut/paste):

        static const struct {
                complex long double z;
                complex long double acos_z;
                complex long double asin_z;
                complex long double atan_z;
        } tests[] = {
                { CMPLXL(0.75L, 0.25L),
                  CMPLXL(pi / 4, -0.34657359027997265470861606072908828L),
                  CMPLXL(pi / 4, 0.34657359027997265470861606072908828L),
                  CMPLXL(0.66290883183401623252961960521423782L,
                         0.15899719167999917436476103600701878L) },
        };
        int i;

        for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
                testall_tol(cacos, tests[i].z, tests[i].acos_z, 2);
                testall_odd_tol(casin, tests[i].z, tests[i].asin_z, 2);
                testall_odd_tol(catan, tests[i].z, tests[i].atan_z, 2);
        }

A few more tests would be good (e.g., large inputs, parts of the
range that are close to an axis or discontinuity), but I ran out
of time.

> The tests seem to be compiled with -O0.  That tests a different
> environment than the usual runtime one, and in particular misses seeing
> most precision bugs.  I mostly test with -O (-O2 with gcc is slower
> and even harder to debug, while with clang it makes little difference),
> but switch to -O0 to debug.  -g -O is now almost unusable because -O
> optimizes away dead variables and -g is broken in many cases (sometimes
> it can't even show live variables).

I want the tests to come as close as possible to testing the
behavior that real programs will see. Unfortunately, any test that
exercises different rounding modes or looks at floating-point
exceptions is pretty much doomed to fail with gcc and clang, so I
gave up. (Sometimes I wonder if there's any point in having a free
library that supports them if you need a commercial compiler to
take advantage.) However, the tests do sometimes uncover compiler
bugs that get fixed. They caught a few bugs in gcc builtins, and
an arithmetic bug in clang's constant-folding code, all of which
were fixed.

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 08:19:35 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 58AC3BD0;
 Tue, 28 May 2013 08:19:35 +0000 (UTC) (envelope-from das@freebsd.org)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net
 [50.196.151.174])
 by mx1.freebsd.org (Postfix) with ESMTP id 1F3DE1EC;
 Tue, 28 May 2013 08:19:34 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
 by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4S8JMeg013858;
 Tue, 28 May 2013 01:19:23 -0700 (PDT) (envelope-from das@freebsd.org)
Received: (from das@localhost)
 by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4S8JLAo013857;
 Tue, 28 May 2013 01:19:21 -0700 (PDT) (envelope-from das@freebsd.org)
Date: Tue, 28 May 2013 01:19:21 -0700
From: David Schultz <das@freebsd.org>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: Use of C99 extra long double math functions after r236148
Message-ID: <20130528081921.GB13594@zim.MIT.EDU>
References: <501204AD.30605@missouri.edu>
 <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com>
 <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com>
 <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com>
 <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU>
 <20130528155933.V1298@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130528155933.V1298@besplex.bde.org>
X-Mailman-Approved-At: Tue, 28 May 2013 11:40:53 +0000
Cc: Diane Bruce <db@db.net>, John Baldwin <jhb@freebsd.org>,
 David Chisnall <theraven@freebsd.org>,
 Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@freebsd.org,
 Bruce Evans <bde@freebsd.org>, Steve Kargl <sgk@troutmask.apl.washington.edu>,
 Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 08:19:35 -0000

On Tue, May 28, 2013, Bruce Evans wrote:
> On Mon, 27 May 2013, David Schultz wrote:
> 
> > ...
> > Below is a diff of all the changes needed to integrate it. I have
> > a short list of style fixes, but otherwise I think what you have
> > is good:
> >  - wrap lines to 80 chars, please
> >  - spaces between operators
> >  - "static inline", not "inline static"
> >  - don't use "inline" on large functions
> 
> Another reply.
> 
> I think I tested "inline" on the large functions (just 2) and found
> it useful for efficiency.  This is like inline on large trig support
> functions being useful.  The inline parts are duplicated once per
> C99-API function, and often the caller only uses one C99-API function.
> Actually, the large inlines are not duplicated that much.  cacosh()
> and casinh() are just wrappers that call cacos() and casin(),
> respectively.  There is no inlining for the last 2 (even larger)
> functions.  The overhead for the wrappers is noticeable, but more
> inlining didn't seem to reduce it much.
> 
> More investigation of the extent of the style bugs:
> - only 1 line is longer than 80 columns now and easy to fix.  Other long
>    lines are for declarations where I prefer to keep the long comments
>    on the same line
> - spaces between operations will expand a few lines beyond 80 columns if
>    done blindly.  Only a few.

If you did benchmarks to show that using inline is worthwhile
despite the cache pressure, then it's fine with me. I had assumed
that it was added without much thought.

Also, people have been asking for someone to commit this for a
long time, so I'm not going to split hairs over the spacing.

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 10:48:11 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 5845182D;
 Tue, 28 May 2013 10:48:11 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
 [211.29.132.185])
 by mx1.freebsd.org (Postfix) with ESMTP id B88B6D27;
 Tue, 28 May 2013 10:48:10 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4SAlj87005958
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Tue, 28 May 2013 20:47:56 +1000
Date: Tue, 28 May 2013 20:47:45 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: David Schultz <das@FreeBSD.org>
Subject: Re: Use of C99 extra long double math functions after r236148
In-Reply-To: <20130528081212.GA13594@zim.MIT.EDU>
Message-ID: <20130528195733.Q2294@besplex.bde.org>
References: <501204AD.30605@missouri.edu>
 <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu>
 <20130528043205.GA3282@zim.MIT.EDU> <20130528150808.F1298@besplex.bde.org>
 <20130528081212.GA13594@zim.MIT.EDU>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=BPvrNysG c=1 sm=1 a=O6A2dy7pM2IA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10
 a=hG9Faytz-pyrK-G3USYA:9 a=CjuIK1q_8ugA:10 a=QPu_LqNFptFJU9lF:21
 a=io-CCv-q2jcpwO-C:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
X-Mailman-Approved-At: Tue, 28 May 2013 11:41:08 +0000
Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>,
 John Baldwin <jhb@FreeBSD.org>, David Chisnall <theraven@FreeBSD.org>,
 Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@FreeBSD.org,
 Steve Kargl <sgk@troutmask.apl.washington.edu>,
 Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 10:48:11 -0000

On Tue, 28 May 2013, David Schultz wrote:

> On Tue, May 28, 2013, Bruce Evans wrote:
>> @ diff -u2 catrigl.c~ catrigl.c
>> @ --- catrigl.c~	2012-09-22 21:14:24.000000000 +0000
>> @ +++ catrigl.c	2013-05-26 08:46:10.423187000 +0000
>> @ @@ -50,4 +50,6 @@
>> @  #define signbit(x)	(__builtin_signbitl(x))
>> @
>> @ +long double atanhl(long double);
>> @ +
>> @  static const long double
>> @  A_crossover =		10,
>>
>> catrigl.c depends on atanhl(), logl() and log1pl() existing.
>
> Yep, I'm ignoring the complex long double functions until the real
> long double functions are done. I'm hoping that won't be too long!

As usual, you can find my current versions in
~bde/msun/src/zztest/s_log*.c, ~bde/msun/src/zztest/ld128/s_logl.c,
and ~bde/msun/src/zztest/cplex.c (clog*).  Lots of macros in
~bde/msun/src/zztest/math_private.h are also needed.  The header needs
more cleaning than the C files, but you can easily extract the parts
needed.

>> % Index: tools/regression/lib/msun/test-invctrig.c
>> % ===================================================================
>> % --- tools/regression/lib/msun/test-invctrig.c	(revision 0)
>> % +++ tools/regression/lib/msun/test-invctrig.c	(working copy)
>> % @@ -0,0 +1,467 @@
>> % ....
>
>> % + * XXX gcc implements complex multiplication incorrectly. In
>> % + * particular, it implements it as if the CX_LIMITED_RANGE pragma
>> % + * were ON. Consequently, we need this function to form numbers
>> % + * such as x + INFINITY * I, since gcc evalutes INFINITY * I as
>> % + * NaN + INFINITY * I.
>> % + */
>> % +static inline long double complex
>> % +cpackl(long double x, long double y)
>> % +{
>> % +	long double complex z;
>> % +
>> % +	__real__ z = x;
>> % +	__imag__ z = y;
>> % +	return (z);
>> % +}
>>
>> Why duplicate this?  I guess it is because math_private.h is hard to
>> include.  I use complicated conditionals (mostly switches on
>> $(uname -p) and $(hostname) in shell scripts) to locate it when
>> compiling from external directories.
>
> I will change to CMPLXL, now that CMPLXL has been committed.
> Thanks for reminding me.

That won't be very portable.  I already need ifdefs and extra code in
math_private.h to restore the old version that works with old versions
of gcc.

> The ability to use complex numbers in
> initializers is nice (ignore whitespace munging due to cut/paste):
>
>        static const struct {
>                complex long double z;
>                complex long double acos_z;
>                complex long double asin_z;
>                complex long double atan_z;
>        } tests[] = {
>                { CMPLXL(0.75L, 0.25L),
>                  CMPLXL(pi / 4, -0.34657359027997265470861606072908828L),
>                  CMPLXL(pi / 4, 0.34657359027997265470861606072908828L),
>                  CMPLXL(0.66290883183401623252961960521423782L,
>                         0.15899719167999917436476103600701878L) },
>        };

I think you mean "nasty" :-).  Simply x + I * y seems to work correctly
with the following compilers on amd64: gcc-2.95.4, gcc-3.3.3, gcc-3.4.6,
gcc-4.2.1, clang 3.3.
    But you cannot use either x + I * y or CMPLXL() with literals for
    long doubles, since on i386 most versions of gcc will round the
    long doubles to 53 bits, so you must use LD80C() for most long double
    constants, and LD80C() won't work inside either x + I * y or CMPLXL().

I didn't test this with exactly the above.  Untested conversion of it:

                { 0.75L +  I * 0.25L,
                  pi / 4 + I * -0.34657359027997265470861606072908828L,
                  pi / 4 + I *  0.34657359027997265470861606072908828L,
                  0.66290883183401623252961960521423782L +
                           I * 0.15899719167999917436476103600701878L, },
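
The conversion above as a compilable sketch, assuming pi is a macro that
expands to a literal (PI_L here is a hypothetical stand-in) so that every
initializer is a constant expression.  sketch_tests and sketch_ntests
are likewise invented names, and only finite values are involved, so the
INFINITY * I problem does not arise.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical literal for pi; the real test may define it differently. */
#define	PI_L	3.14159265358979323846264338327950288L

/* Static table built with plain x + I * y instead of CMPLXL(). */
static const struct {
	long double complex z;
	long double complex acos_z;
	long double complex asin_z;
	long double complex atan_z;
} sketch_tests[] = {
	{ 0.75L + I * 0.25L,
	  PI_L / 4 + I * -0.34657359027997265470861606072908828L,
	  PI_L / 4 + I *  0.34657359027997265470861606072908828L,
	  0.66290883183401623252961960521423782L +
	      I * 0.15899719167999917436476103600701878L },
};

/* Number of entries in the table. */
static size_t
sketch_ntests(void)
{

	return (sizeof(sketch_tests) / sizeof(sketch_tests[0]));
}
```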

Is pi a variable, and/or does CMPLXL() work with variables in static
initializers?  Non-static initializers and CMPLXL() can be used on
variables constructed using LD80C().  Now gcc-3.3.3 generates horrible
code for a runtime evaluation and probably causes overflow bugs for
exceptional args (the ones that we invented cpack*() to avoid).  gcc-4.2.1
generates good code.  The freebsd cluster seems to have crashed while I
was writing this, so I don't have access to the other compilers.
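
To illustrate why such constants are built from bits rather than from
decimal literals, here is a sketch that assembles a long double from a
64-bit mantissa and an unbiased exponent, in the spirit of the first two
arguments of LD80C() in the catrigl.c diff.  ld_from_bits is a
hypothetical helper, not the LD80C() macro; the x87 layout in the first
branch and the restriction to positive values are assumptions.

```c
#include <float.h>
#include <stdint.h>

/* Build mant * 2**(expo - 63) as a long double (positive values only). */
static long double
ld_from_bits(uint64_t mant, int expo)
{
#if LDBL_MANT_DIG == 64 && (defined(__i386__) || defined(__x86_64__))
	/* Assumed x87 extended layout: 64-bit mantissa, then sign+exponent. */
	union {
		long double e;
		struct {
			uint64_t man;
			uint16_t expsign;
		} bits;
	} u;

	u.e = 0;					/* clear unused bytes */
	u.bits.man = mant;
	u.bits.expsign = (uint16_t)(expo + 16383);	/* bias; sign bit 0 */
	return (u.e);
#else
	/* Fallback: scale by exact powers of two; the result may be rounded. */
	long double scale = 1.0L;
	int n = expo - 63;

	for (; n < 0; n++)
		scale *= 0.5L;
	for (; n > 0; n--)
		scale *= 2.0L;
	return ((long double)mant * scale);
#endif
}
```

For example, ld_from_bits(0xadf85458a2bb4a9bULL, 1) reproduces the m_e
constant from the diff without the compiler ever parsing a long double
literal.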

>> The tests seem to be compiled with -O0.  That tests a different
>> environment than the usual runtime one, and in particular misses seeing
>> most precision bugs.  I mostly test with -O (-O2 with gcc is slower
>> and even harder to debug, while with clang it makes little difference),
>> but switch to -O0 to debug.  -g -O is now almost unusable because -O
>> optimizes away dead variables and -g is broken in many cases (sometimes
>> it can't even show live variables).
>
> I want the tests to come as close as possible to testing the
> behavior that real programs will see. Unfortunately, any test that
> exercises different rounding modes or looks at floating-point
> exceptions is pretty much doomed to fail with gcc and clang, so I
> gave up. (Sometimes I wonder if there's any point in having a free
> library that supports them if you need a commercial compiler to
> take advantage.) However, the tests do sometimes uncover compiler
> bugs that get fixed. They caught a few bugs in gcc builtins, and
> an arithmetic bug in clang's constant-folding code, all of which
> were fixed.

But doesn't using -O0 give the opposite of that?  The library is closer
to working than tests and real programs since it is relatively careful
and the compiler problems usually don't have much effect (since wrong
rounding by the compiler tends to show up as errors of >= 1 ulp and
gets fixed).

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 11:12:12 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 01C31D5B;
 Tue, 28 May 2013 11:12:12 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au
 [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 7D5DAE92;
 Tue, 28 May 2013 11:12:11 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 5F0D1D405D6;
 Tue, 28 May 2013 21:12:00 +1000 (EST)
Date: Tue, 28 May 2013 21:12:00 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: David Schultz <das@FreeBSD.org>
Subject: Re: Use of C99 extra long double math functions after r236148
In-Reply-To: <20130528081921.GB13594@zim.MIT.EDU>
Message-ID: <20130528205441.U2294@besplex.bde.org>
References: <501204AD.30605@missouri.edu>
 <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu>
 <20130528043205.GA3282@zim.MIT.EDU> <20130528155933.V1298@besplex.bde.org>
 <20130528081921.GB13594@zim.MIT.EDU>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=BPvrNysG c=1 sm=1 a=O6A2dy7pM2IA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10
 a=SKYg3Y9sK9o-Tfi3u28A:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
X-Mailman-Approved-At: Tue, 28 May 2013 11:41:17 +0000
Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>,
 John Baldwin <jhb@FreeBSD.org>, David Chisnall <theraven@FreeBSD.org>,
 Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@FreeBSD.org,
 Steve Kargl <sgk@troutmask.apl.washington.edu>,
 Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 11:12:12 -0000

On Tue, 28 May 2013, David Schultz wrote:

> On Tue, May 28, 2013, Bruce Evans wrote:
>>
>> I think I tested "inline" on the large functions (just 2) and found
>> it useful for efficiency.  This is like inline on large trig support
>> functions being useful.  The inline parts are duplicated once per
>> C99-API function, and often the caller only uses one C99-API function.
>> Actually, the large inlines are not duplicated that much.  cacosh()
>> and casinh() are just wrappers that call cacos() and casin(),
>> respectively.  There is no inlining for the last 2 (even larger)
>> functions.  The overhead for the wrappers is noticeable, but more
>> inlining didn't seem to reduce it much.
>
> If you did benchmarks to show that using inline is worthwhile
> despite the cache pressure, then it's fine with me. I had assumed
> that it was added without much thought.

I retested.  Inlining the big function do_hard_work() helps for gcc on
amd64 (about 5% faster), but makes no significant difference for clang.
The previous testing was mostly with gcc.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 11:55:44 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id C3D43919;
 Tue, 28 May 2013 11:55:44 +0000 (UTC)
 (envelope-from theraven@FreeBSD.org)
Received: from theravensnest.org (theraven.freebsd.your.org [216.14.102.27])
 by mx1.freebsd.org (Postfix) with ESMTP id 9438F238;
 Tue, 28 May 2013 11:55:44 +0000 (UTC)
Received: from c120.sec.cl.cam.ac.uk (c120.sec.cl.cam.ac.uk [128.232.18.120])
 (authenticated bits=0)
 by theravensnest.org (8.14.5/8.14.5) with ESMTP id r4SBtccK042149
 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
 Tue, 28 May 2013 11:55:39 GMT (envelope-from theraven@FreeBSD.org)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\))
Subject: Re: Use of C99 extra long double math functions after r236148
From: David Chisnall <theraven@FreeBSD.org>
In-Reply-To: <20130528205441.U2294@besplex.bde.org>
Date: Tue, 28 May 2013 12:55:34 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <C7367F2D-0A97-422E-97B1-4AF4BFEDD526@FreeBSD.org>
References: <501204AD.30605@missouri.edu>
 <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu>
 <20130528043205.GA3282@zim.MIT.EDU> <20130528155933.V1298@besplex.bde.org>
 <20130528081921.GB13594@zim.MIT.EDU> <20130528205441.U2294@besplex.bde.org>
To: Bruce Evans <brde@optusnet.com.au>
X-Mailer: Apple Mail (2.1503)
Cc: Diane Bruce <db@db.net>, John Baldwin <jhb@FreeBSD.org>,
 Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@FreeBSD.org,
 Steve Kargl <sgk@troutmask.apl.washington.edu>,
 David Schultz <das@FreeBSD.org>, Peter Jeremy <peter@rulingia.com>,
 Warner Losh <imp@bsdimp.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 11:55:44 -0000

On 28 May 2013, at 12:12, Bruce Evans <brde@optusnet.com.au> wrote:

> Inlining the big function do_hard_work() helps for gcc on
> amd64 (about 5% faster), but makes no significant difference for clang.
> The previous testing was mostly with gcc.

How are you inlining?  With the C99 inline keyword, which changes the
linkage type but only provides an advisory hint to the compiler with
regard to inlining (which, in a modern compiler, is largely ignored), or
with the always_inline attribute, which forces the compiler to inline
the function?

David


From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 12:03:19 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id F1802DBA
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 12:03:19 +0000 (UTC)
 (envelope-from s.montgomerysmith@gmail.com)
Received: from mail-ie0-x22a.google.com (mail-ie0-x22a.google.com
 [IPv6:2607:f8b0:4001:c03::22a])
 by mx1.freebsd.org (Postfix) with ESMTP id C1F0C2E4
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 12:03:19 +0000 (UTC)
Received: by mail-ie0-f170.google.com with SMTP id e14so2506268iej.1
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 05:03:19 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:subject
 :references:in-reply-to:x-enigmail-version:content-type
 :content-transfer-encoding;
 bh=1GPUFncYViNScC5uzVOcidJV1nmflw0qt/mpmKv7Z5A=;
 b=V6eD6lOiPlvkcvoNF32C4EVSufDHqQq/h/78G91MOhkZOlXJnHLVrZIS2VdftYiwmU
 3bvqIxoQQ7idgdEcV0rS5B/6dheL2/fDZvTmWvQAshlvMdCigCqLWn8dqihPUDdszDmt
 Bol2qynaW8S4ff6dK/UjxsMoe/5HMiV/oZ2wHeO/2vTvsb0ta45zABoYYX5G82Sqz+Uo
 WWinhQDnQ1eEUIIrS4N+Y2451ww87dybHGkh/jXci6v1dzfc8cqIzK5Z0ykobi2cgkgz
 oQ0UHP1J9qfQZP5G6ffbk1O/JqBeVwc9RGpwf+thPM1UlAj4EInC5+zf39DZXOIQUU+0
 g7Zg==
X-Received: by 10.42.196.138 with SMTP id eg10mr19096254icb.5.1369742599562;
 Tue, 28 May 2013 05:03:19 -0700 (PDT)
Received: from [192.168.0.11] (50-82-246-58.client.mchsi.com. [50.82.246.58])
 by mx.google.com with ESMTPSA id
 gz1sm5147957igb.5.2013.05.28.05.03.17
 for <freebsd-numerics@freebsd.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Tue, 28 May 2013 05:03:18 -0700 (PDT)
Sender: Stephen Montgomery-Smith <s.montgomerysmith@gmail.com>
Message-ID: <51A49D04.5050409@missouri.edu>
Date: Tue, 28 May 2013 07:03:16 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
 rv:17.0) Gecko/20130510 Thunderbird/17.0.6
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
Subject: Re: Use of C99 extra long double math functions after r236148
References: <500DAD41.5030104@missouri.edu>
 <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu>
 <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu>
 <20130528043205.GA3282@zim.MIT.EDU> <51A49A40.3040505@missouri.edu>
In-Reply-To: <51A49A40.3040505@missouri.edu>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 12:03:20 -0000

On 05/28/2013 06:51 AM, Stephen Montgomery-Smith wrote:
> On 05/27/2013 11:32 PM, David Schultz wrote:
> 
>> Hi Stephen,
>>
>> I wrote some tests to cover the corner cases for the complex
>> inverse trig functions. They don't find any nontrivial bugs in
>> your implementations. :-) Now that you have a commit bit, would
>> you like to commit your code, or shall I?
> 
> I think I only have a commit bit for ports, not src.
> 
> In any case, I would much prefer that you commit it.  I have a lot on my
> plate right now.
> 
> Thank you for doing this.  It would be great to see this in FreeBSD.
> 

Also, if I can brag a little, I think the only other implementations of
the complex arc-trig functions that are as accurate are the most recent
boost library implementations, and then only because I submitted bug
fixes to them.

I also found a bug in the Hull, Fairgrieve, and Tang algorithm for
cacos/cacosh, which was faulty in certain extreme cases.  This bug is
documented here:
https://svn.boost.org/trac/boost/ticket/7290

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 11:51:34 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 175068B3;
 Tue, 28 May 2013 11:51:34 +0000 (UTC)
 (envelope-from s.montgomerysmith@gmail.com)
Received: from mail-ie0-x22a.google.com (mail-ie0-x22a.google.com
 [IPv6:2607:f8b0:4001:c03::22a])
 by mx1.freebsd.org (Postfix) with ESMTP id B26AB211;
 Tue, 28 May 2013 11:51:33 +0000 (UTC)
Received: by mail-ie0-f170.google.com with SMTP id e14so2475756iej.1
 for <multiple recipients>; Tue, 28 May 2013 04:51:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:x-enigmail-version:content-type
 :content-transfer-encoding;
 bh=H+/7e2OzqUy//muIYaUjn6p9QEFLHwI+uyPNvRXjU2g=;
 b=aZElcC2MFEmydK12ja/lnqL5/5UvzHTAIu06iAlJT8xhH7dvs28S8OxIuvica4m0wD
 tk4vb/7fzNf8IEKIAZQ82E9a6AkEnEDB6IpMrvGyc2yDm2Pgn5ojAFZXjPokMgn1nX81
 nn0ic/3IbghHnqGcruZhAUw0jmbV8/KSnzkCouhmBtcoMSO9F49SuLGQotBbxntiTKER
 HTVpwJUjrLELfmF5QIhmmqwOeU3qx85GOBt6o++BHy4x7Qyes6fvOutx6szwDzCz19sT
 DTl74Ln/P4ZCxkAPm/sMe42eF6SuButH555s/x5yR7U8UUl29eKfFUwcTprkQrulMJEp
 4OqQ==
X-Received: by 10.42.196.138 with SMTP id eg10mr19076648icb.5.1369741892821;
 Tue, 28 May 2013 04:51:32 -0700 (PDT)
Received: from [192.168.0.11] (50-82-246-58.client.mchsi.com. [50.82.246.58])
 by mx.google.com with ESMTPSA id 9sm17646992igy.7.2013.05.28.04.51.29
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Tue, 28 May 2013 04:51:31 -0700 (PDT)
Sender: Stephen Montgomery-Smith <s.montgomerysmith@gmail.com>
Message-ID: <51A49A40.3040505@missouri.edu>
Date: Tue, 28 May 2013 06:51:28 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
 rv:17.0) Gecko/20130510 Thunderbird/17.0.6
MIME-Version: 1.0
To: David Schultz <das@freebsd.org>
Subject: Re: Use of C99 extra long double math functions after r236148
References: <500DAD41.5030104@missouri.edu>
 <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu>
 <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu>
 <20130528043205.GA3282@zim.MIT.EDU>
In-Reply-To: <20130528043205.GA3282@zim.MIT.EDU>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Mailman-Approved-At: Tue, 28 May 2013 12:07:52 +0000
Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>,
 John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>,
 freebsd-numerics@freebsd.org, Bruce Evans <bde@freebsd.org>,
 Steve Kargl <sgk@troutmask.apl.washington.edu>,
 Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 11:51:34 -0000

On 05/27/2013 11:32 PM, David Schultz wrote:

> Hi Stephen,
> 
> I wrote some tests to cover the corner cases for the complex
> inverse trig functions. They don't find any nontrivial bugs in
> your implementations. :-) Now that you have a commit bit, would
> you like to commit your code, or shall I?

I think I only have a commit bit for ports, not src.

In any case, I would much prefer that you commit it.  I have a lot on my
plate right now.

Thank you for doing this.  It would be great to see this in FreeBSD.

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 13:03:13 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id A5D60ADE;
 Tue, 28 May 2013 13:03:13 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249])
 by mx1.freebsd.org (Postfix) with ESMTP id 65C2E8A2;
 Tue, 28 May 2013 13:03:13 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7D79C104139F;
 Tue, 28 May 2013 22:44:57 +1000 (EST)
Date: Tue, 28 May 2013 22:44:22 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: David Chisnall <theraven@freebsd.org>
Subject: Re: Use of C99 extra long double math functions after r236148
In-Reply-To: <C7367F2D-0A97-422E-97B1-4AF4BFEDD526@FreeBSD.org>
Message-ID: <20130528222541.N2926@besplex.bde.org>
References: <501204AD.30605@missouri.edu>
 <20120727032611.GB25690@server.rulingia.com>
 <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu>
 <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu>
 <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu>
 <20130528043205.GA3282@zim.MIT.EDU> <20130528155933.V1298@besplex.bde.org>
 <20130528081921.GB13594@zim.MIT.EDU> <20130528205441.U2294@besplex.bde.org>
 <C7367F2D-0A97-422E-97B1-4AF4BFEDD526@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=O6A2dy7pM2IA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10
 a=jpvodMJeLT64p2G4esEA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: Diane Bruce <db@db.net>, John Baldwin <jhb@freebsd.org>,
 Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@freebsd.org,
 Steve Kargl <sgk@troutmask.apl.washington.edu>,
 David Schultz <das@freebsd.org>, Peter Jeremy <peter@rulingia.com>,
 Warner Losh <imp@bsdimp.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 13:03:13 -0000

On Tue, 28 May 2013, David Chisnall wrote:

> On 28 May 2013, at 12:12, Bruce Evans <brde@optusnet.com.au> wrote:
>
>> Inlining the big function do_hard_work() helps for gcc on
>> amd64 (about 5% faster), but makes no significant difference for clang.
>> The previous testing was mostly with gcc.
>
> How are you inlining?  With the C99 inline keyword, which changes the
> linkage type but only provides an advisory hint to the compiler with
> regard to inlining (which, in a modern compiler, is largely ignored),
> or with the always_inline attribute, which forces the compiler to
> inline the function?

Only static inlining in catrig*.c.  All compilers follow its hints there.
libm sometimes uses static __always_inline instead of static inline
elsewhere (but mostly not).
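
[Archive note: a minimal sketch of the two forms being compared, not
the catrig*.c source.  It assumes GCC/Clang attribute syntax;
FORCE_INLINE is a local stand-in for the __always_inline spelling
mentioned above, and the function names are made up.]

```c
#include <assert.h>

/*
 * Assumption: GCC/Clang attribute syntax.  FORCE_INLINE is a
 * hypothetical local macro standing in for __always_inline.
 */
#if defined(__GNUC__)
#define	FORCE_INLINE	inline __attribute__((__always_inline__))
#else
#define	FORCE_INLINE	inline
#endif

/* Hint only: the compiler may still emit one shared copy and call it. */
static inline double
hinted_work(double x)
{
	return (x * x + 1);
}

/* Forced (with GCC and Clang): the body is expanded into each caller. */
static FORCE_INLINE double
forced_work(double x)
{
	return (x * x + 1);
}

/* Hypothetical public entry points that share the static helpers. */
double
api_hinted(double x)
{
	return (hinted_work(x));
}

double
api_forced(double x)
{
	return (forced_work(x));
}

/* Hypothetical self-check: inlining must not change the result. */
void
check_inline(void)
{
	assert(api_hinted(3.0) == api_forced(3.0));
}
```

Either form gives the same results; the only difference is whether the
compiler is allowed to decline, which is why the timings depend on the
compiler.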

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 17:22:43 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 27148BB4
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 17:22:43 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id E6E2BA07
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 17:22:42 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4SHMgDf051541
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 10:22:42 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4SHMghZ051540
 for freebsd-numerics@freebsd.org; Tue, 28 May 2013 10:22:42 -0700 (PDT)
 (envelope-from sgk)
Date: Tue, 28 May 2013 10:22:42 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: Patches for s_expl.c
Message-ID: <20130528172242.GA51485@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 17:22:43 -0000

Here are two patches for ld80/s_expl.c and ld128/s_expl.c.
Instead of committing the one large patch that I have spent
hours testing, I have split it into two.  One patch fixes/updates
expl().  The other patch is the implementation of expm1l().  

My commit messages will be:

Patch 1:

   ld80/s_expl.c:

   * Use the LOG2_INTERVALS macro instead of hardcoding 7.
   * Use LD80C to set overflow and underflow thresholds, and then use
     #defines to access the .e component to reduce diffs with ld128 version.
   * Rename polynomial coefficients P# to A#, which is used in Tang.
   * Remove the use of intermediate results t23 and t45.
   * Micro-optimization: remove access to u.xbits.man.
   * Fix an off-by-one in the underflow case.
   * Replace the long double constant factor 2.0L with the integer 2.  Let
     the compiler do the conversion.

   ld128/s_expl.c:

   * Adjust Copyright years to reflect when bits of the code were actually
     written.
   * Reduce diff between the ld80 and ld128 versions.

Patch 2:

   ld80/s_expl.c:

   * Compute expm1l(x) for Intel 80-bit format.

   ld128/s_expl.c:

   * Compute expm1l(x) for IEEE 754 128-bit format.

   These are based on:

   PTP Tang, "Table-driven implementation of the Expm1 function
   in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
   211-222 (1992).
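
   A note on the "remove access to u.xbits.man" item in Patch 1: once
   the exponent shows that x is Inf or NaN, the sign bit alone can pick
   the result, because -1 / -Inf is +0 and arithmetic on a NaN gives a
   NaN.  A minimal double-precision sketch of that idea follows.  It is
   an illustration only: exp_inf_nan() is a made-up name, and the x87
   "unsupported" encodings have no double analogue.

```c
#include <assert.h>
#include <math.h>

/*
 * Hypothetical double-precision analogue of the Inf/NaN branch.
 * Precondition: x is +-Inf or NaN.
 */
double
exp_inf_nan(double x)
{
	assert(isinf(x) || isnan(x));

	if (signbit(x))
		return (-1 / x);	/* -Inf -> +0, -NaN -> NaN */
	return (x + x);			/* +Inf -> +Inf, +NaN -> NaN */
}
```

   No comparison against the mantissa is needed to tell -Inf from a
   negative NaN; the division sorts that out.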

These commit logs may be too terse for some, but quite frankly after
2 or 3 years of submitting and resubmitting diffs, I've forgotten
why some changes have or have not been made.

expm1l() resides in s_expl.c because it shares the same table,
polynomial coefficients, and some numerical constants with expl().

-- 
Steve

Patch 1:

Index: ld80/s_expl.c
===================================================================
--- ld80/s_expl.c	(revision 251062)
+++ ld80/s_expl.c	(working copy)
@@ -50,6 +50,7 @@
 #include "math_private.h"
 
 #define	INTERVALS	128
+#define	LOG2_INTERVALS	7
 #define	BIAS	(LDBL_MAX_EXP - 1)
 
 static const long double
@@ -60,9 +61,12 @@
 
 static const union IEEEl2bits
 /* log(2**16384 - 0.5) rounded towards zero: */
-o_threshold = LD80C(0xb17217f7d1cf79ab, 13,  11356.5234062941439488L),
+/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
+o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13,  11356.5234062941439488L),
+#define o_threshold	 (o_thresholdu.e)
 /* log(2**(-16381-64-1)) rounded towards zero: */
-u_threshold = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+u_thresholdu = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+#define u_threshold	 (u_thresholdu.e)
 
 static const double
 /*
@@ -78,11 +82,11 @@
  * |exp(x) - p(x)| < 2**-77.2
  * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
  */
-P2 =  0.5,
-P3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
-P4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
-P5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
-P6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
+A2  = 0.5,
+A3  = 1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
+A4  = 4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
+A5  = 8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
+A6  = 1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
 
 /*
  * 2^(i/INTERVALS) for i in [0,INTERVALS] is represented by two values where
@@ -232,7 +236,8 @@
 expl(long double x)
 {
 	union IEEEl2bits u, v;
-	long double fn, q, r, r1, r2, t, t23, t45, twopk, twopkp10000, z;
+	long double fn, q, r, r1, r2, t, twopk, twopkp10000;
+	long double z;
 	int k, n, n2;
 	uint16_t hx, ix;
 
@@ -242,23 +247,21 @@
 	ix = hx & 0x7fff;
 	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
 		if (ix == BIAS + LDBL_MAX_EXP) {
-			if (hx & 0x8000 && u.xbits.man == 1ULL << 63)
-				return (0.0L);	/* x is -Inf */
-			return (x + x); /* x is +Inf, NaN or unsupported */
+			if (hx & 0x8000)  /* x is -Inf, -NaN or unsupported */
+				return (-1 / x);
+ 			return (x + x);	/* x is +Inf, +NaN or unsupported */
 		}
-		if (x > o_threshold.e)
+		if (x > o_threshold)
 			return (huge * huge);
-		if (x < u_threshold.e)
+		if (x < u_threshold)
 			return (tiny * tiny);
-	} else if (ix < BIAS - 66) {	/* |x| < 0x1p-66 */
-					/* includes pseudo-denormals */
-		if (huge + x > 1.0L)	/* trigger inexact iff x != 0 */
-			return (1.0L + x);
+	} else if (ix < BIAS - 65) {	/* |x| < 0x1p-65 (includes pseudos) */
+		return (1 + x);		/* 1 with inexact iff x != 0 */
 	}
 
 	ENTERI();
 
-	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
 	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
 	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
 	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */
@@ -270,12 +273,12 @@
 	n  = (int)fn;
 #endif
 	n2 = (unsigned)n % INTERVALS;
-	k = (n - n2) / INTERVALS;
+	k = n >> LOG2_INTERVALS;
 	r1 = x - fn * L1;
-	r2 = -fn * L2;
+	r2 = fn * -L2;
 
 	/* Prepare scale factors. */
-	v.xbits.man = 1ULL << 63;
+	v.e = 1;
 	if (k >= LDBL_MIN_EXP) {
 		v.xbits.expsign = BIAS + k;
 		twopk = v.e;
@@ -284,19 +287,16 @@
 		twopkp10000 = v.e;
 	}
 
-	/* Evaluate expl(midpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */
-	/* Here q = q(r), not q(r1), since r1 is lopped like L1. */
-	t45 = r * P5 + P4;
+	/* Evaluate expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */
 	z = r * r;
-	t23 = r * P3 + P2;
-	q = r2 + z * t23 + z * z * t45 + z * z * z * P6;
+	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
 	t = (long double)s[n2].lo + s[n2].hi;
 	t = s[n2].lo + t * (q + r1) + s[n2].hi;
 
 	/* Scale by 2**k. */
 	if (k >= LDBL_MIN_EXP) {
 		if (k == LDBL_MAX_EXP)
-			RETURNI(t * 2.0L * 0x1p16383L);
+			RETURNI(t * 2 * 0x1p16383L);
 		RETURNI(t * twopk);
 	} else {
 		RETURNI(t * twopkp10000 * twom10000);
Index: ld128/s_expl.c
===================================================================
--- ld128/s_expl.c	(revision 251062)
+++ ld128/s_expl.c	(working copy)
@@ -1,5 +1,5 @@
 /*-
- * Copyright (c) 2012 Steven G. Kargl
+ * Copyright (c) 2009-2012 Steven G. Kargl
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -22,6 +22,8 @@
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Optimized by Bruce D. Evans.
  */
 
 #include <sys/cdefs.h>
@@ -38,34 +40,56 @@
 #include "math_private.h"
 
 #define	INTERVALS	128
+#define	LOG2_INTERVALS	7
 #define	BIAS	(LDBL_MAX_EXP - 1)
 
+static const long double
+huge = 0x1p10000L,
+twom10000 = 0x1p-10000L;
+/* XXX Prevent gcc from erroneously constant folding this: */
 static volatile const long double tiny = 0x1p-10000L;
 
 static const long double
-INV_L = 1.84664965233787316142070359168242182e+02L,
-L1 = 5.41521234812457272982212595914567508e-03L,
-L2 = -1.02536706388947310094527932552595546e-29L,
-huge = 0x1p10000L,
+/* log(2**16384 - 0.5) rounded towards zero: */
+/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
 o_threshold =  11356.523406294143949491931077970763428L,
-twom10000 = 0x1p-10000L,
+/* log(2**(-16381-113-1)) rounded towards zero: */
 u_threshold = -11433.462743336297878837243843452621503L;
 
+/*
+ * ln2/INTERVALS = L1+L2 (hi+lo decomposition for multiplication).  L1 must
+ * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest
+ * bits zero so that multiplication of it by n is exact.
+ */
+static const double
+INV_L = 1.8466496523378731e+2,		/*  0x171547652b82fe.0p-45 */
+L2 = -1.0253670638894731e-29;		/* -0x1.9ff0342542fc3p-97 */
 static const long double
-P2 = 5.00000000000000000000000000000000000e-1L,
-P3 = 1.66666666666666666666666666666666972e-1L,
-P4 = 4.16666666666666666666666666653708268e-2L,
-P5 = 8.33333333333333333333333315069867254e-3L,
-P6 = 1.38888888888888888888996596213795377e-3L,
-P7 = 1.98412698412698412718821436278644414e-4L,
-P8 = 2.48015873015869681884882576649543128e-5L,
-P9 = 2.75573192240103867817876199544468806e-6L,
-P10 = 2.75573236172670046201884000197885520e-7L,
-P11 = 2.50517544183909126492878226167697856e-8L;
+/* 0x1.62e42fefa39ef35793c768000000p-8 */
+L1 =  5.41521234812457272982212595914567508e-03L;
 
+static const long double
+/*
+ * Domain [-0.002708, 0.002708], range ~[-2.4011e-38, 2.4244e-38]:
+ * |exp(x) - p(x)| < 2**-124.9
+ * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
+ */
+A2  = 0.5,
+A3  = 1.66666666666666666666666666651085500e-01L,
+A4  = 4.16666666666666666666666666425885320e-02L,
+A5  = 8.33333333333333333334522877160175842e-03L,
+A6  = 1.38888888888888888889971139751596836e-03L;
+
+static const double
+A7  = 1.9841269841269471e-04,
+A8  = 2.4801587301585284e-05,
+A9  = 2.7557324277411234e-06,
+A10 = 2.7557333722375072e-07;
+
 static const struct {
 	long double	hi;
 	long double	lo;
+/* XXX should rename 's'. */
 } s[INTERVALS] = {
 	0x1p0L, 0x0p0L,
 	0x1.0163da9fb33356d84a66aep0L, 0x3.36dcdfa4003ec04c360be2404078p-92L,
@@ -201,9 +225,10 @@
 expl(long double x)
 {
 	union IEEEl2bits u, v;
-	long double fn, r, r1, r2, q, t, twopk, twopkp10000;
+	long double q, r, r1, t, twopk, twopkp10000;
+	double dr, fn, r2;
 	int k, n, n2;
-	uint32_t hx, ix;
+	uint16_t hx, ix;
 
 	/* Filter out exceptional cases. */
 	u.e = x;
@@ -211,31 +236,38 @@
 	ix = hx & 0x7fff;
 	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
 		if (ix == BIAS + LDBL_MAX_EXP) {
-			if (hx & 0x8000 && u.xbits.manh == 0 &&
-			    u.xbits.manl == 0)
-				return (0.0L);	/* x is -Inf */
-			return (x + x);	/* x is +Inf or NaN */
+			if (hx & 0x8000)  /* x is -Inf or -NaN */
+				return (-1 / x);
+			return (x + x);	/* x is +Inf or +NaN */
 		}
 		if (x > o_threshold)
 			return (huge * huge);
 		if (x < u_threshold)
 			return (tiny * tiny);
-	} else if (ix < BIAS - 115) {	/* |x| < 0x1p-115 */
-	    	if (huge + x > 1.0L)	/* trigger inexact iff x != 0 */
-			return (1.0L + x);
+	} else if (ix < BIAS - 114) {	/* |x| < 0x1p-114 */
+		return (1 + x);		/* 1 with inexact iff x != 0 */
 	}
 
-	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
-	fn = x * INV_L + 0x1.8p112 - 0x1.8p112;
+	ENTERI();
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	/* XXX assume no extra precision for the additions, as for trig fns. */
+	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
+#if defined(HAVE_EFFICIENT_IRINT)
+	n  = irint(fn);
+#else
 	n  = (int)fn;
+#endif
 	n2 = (unsigned)n % INTERVALS;
 	k = (n - n2) / INTERVALS;
 	r1 = x - fn * L1;
-	r2 = -fn * L2;
+	r2 = fn * -L2;
+	r = r1 + r2;
 
 	/* Prepare scale factors. */
-	v.xbits.manh = 0;
-	v.xbits.manl = 0;
+	/* XXX sparc64 multiplication is so slow that scalbnl() is faster. */
+	v.e = 1;
 	if (k >= LDBL_MIN_EXP) {
 		v.xbits.expsign = BIAS + k;
 		twopk = v.e;
@@ -244,18 +276,19 @@
 		twopkp10000 = v.e;
 	}
 
-	r = r1 + r2;
-	q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 +
-	    r * (P8 + r * (P9 + r * (P10 + r * P11)))))))));
+	/* Evaluate expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */
+	dr = r;
+	q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
+	    dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));
 	t = s[n2].lo + s[n2].hi;
-	t = s[n2].hi + (s[n2].lo + t * (r2 + q + r1));
+	t = s[n2].lo + t * (q + r1) + s[n2].hi;
 
 	/* Scale by 2**k. */
 	if (k >= LDBL_MIN_EXP) {
 		if (k == LDBL_MAX_EXP)
-			return (t * 2.0L * 0x1p16383L);
-		return (t * twopk);
+			RETURNI(t * 2 * 0x1p16383L);
+		RETURNI(t * twopk);
 	} else {
-		return (t * twopkp10000 * twom10000);
+		RETURNI(t * twopkp10000 * twom10000);
 	}
 }


Patch 2:

--- ld80/s_expl.c	2013-05-28 09:36:27.000000000 -0700
+++ ld80/s_expl.c.all	2013-05-28 09:34:41.000000000 -0700
@@ -302,3 +302,166 @@
 		RETURNI(t * twopkp10000 * twom10000);
 	}
 }
+
+/**
+ * Compute expm1l(x) for Intel 80-bit format.  This is based on:
+ *
+ *   PTP Tang, "Table-driven implementation of the Expm1 function
+ *   in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
+ *   211-222 (1992).
+ */
+
+/*
+ * Our T1 and T2 are chosen to be approximately the points where method
+ * A and method B have the same accuracy.  Tang's T1 and T2 are the
+ * points where method A's accuracy changes by a full bit.  For Tang,
+ * this drop in accuracy makes method A immediately less accurate than
+ * method B, but our larger INTERVALS makes method A 2 bits more
+ * accurate so it remains the most accurate method significantly
+ * closer to the origin despite losing the full bit in our extended
+ * range for it.
+ */
+static const double
+T1 = -0.1659,				/* ~-30.625/128 * log(2) */
+T2 =  0.1659;				/* ~30.625/128 * log(2) */
+
+/*
+ * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]:
+ * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2
+ */
+static const union IEEEl2bits
+B3  = LD80C(0xaaaaaaaaaaaaaaab, -3,  1.66666666666666666671e-01L),
+B4  = LD80C(0xaaaaaaaaaaaaaaac, -5,  4.16666666666666666712e-02L);
+
+static const double
+B5  = 8.3333333333333245e-03,		/* 0x1.111111111110cp-7 */
+B6  = 1.3888888888888861e-03,		/* 0x1.6c16c16c16c0ap-10 */
+B7  = 1.9841269841532042e-04,		/* 0x1.a01a01a0319f9p-13 */
+B8  = 2.4801587302069236e-05,		/* 0x1.a01a01a03cbbcp-16 */
+B9  = 2.7557316558468562e-06,		/* 0x1.71de37fd33d67p-19 */
+B10 = 2.7557315829785151e-07,		/* 0x1.27e4f91418144p-22 */
+B11 = 2.5063168199779829e-08,		/* 0x1.ae94fabdc6b27p-26 */
+B12 = 2.0887164654459567e-09;		/* 0x1.1f122d6413fe1p-29 */
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+	long double fn, hx2_hi, hx2_lo, q, r, r1, r2, t, twomk, twopk, x_hi;
+	long double x_lo, x2, z;
+	long double x4;
+	int k, n, n2;
+	uint16_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 6) {		/* |x| >= 64 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000)  /* x is -Inf, -NaN or unsupported */
+				return (-1 / x - 1);
+			return (x + x);	/* x is +Inf, +NaN or unsupported */
+		}
+		if (x > o_threshold)
+			return (huge * huge);
+		/*
+		 * expm1l() never underflows, but it must avoid
+		 * unrepresentable large negative exponents.  We used a
+		 * much smaller threshold for large |x| above than in
+		 * expl() so as to handle not so large negative exponents
+		 * in the same way as large ones here.
+		 */
+		if (hx & 0x8000)	/* x <= -64 */
+			return (tiny - 1);	/* good for x < -65ln2 - eps */
+	}
+
+	ENTERI();
+
+	if (T1 < x && x < T2) {
+		if (ix < BIAS - 64) {	/* |x| < 0x1p-64 (includes pseudos) */
+			/* x (rounded) with inexact if x != 0: */
+			RETURNI(x == 0 ? x :
+			    (0x1p100 * x + fabsl(x)) * 0x1p-100);
+		}
+
+		x2 = x * x;
+		x4 = x2 * x2;
+		q = x4 * (x2 * (x4 *
+		    (x2 *            B12  + (x * B11 + B10)) +
+		    (x2 * (x * B9 +  B8) +  (x * B7 +  B6))) +
+			  (x * B5 +  B4.e)) + x2 * x * B3.e;
+
+		x_hi = (float)x;
+		x_lo = x - x_hi;
+		hx2_hi = x_hi * x_hi / 2;
+		hx2_lo = x_lo * (x + x_hi) / 2;
+		if (ix >= BIAS - 7)
+			RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi));
+		else
+			RETURNI(hx2_lo + q + hx2_hi + x);
+	}
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
+#if defined(HAVE_EFFICIENT_IRINTL)
+	n  = irintl(fn);
+#elif defined(HAVE_EFFICIENT_IRINT)
+	n  = irint(fn);
+#else
+	n  = (int)fn;
+#endif
+	n2 = (unsigned)n % INTERVALS;
+	k = n >> LOG2_INTERVALS;
+	r1 = x - fn * L1;
+	r2 = fn * -L2;
+	r = r1 + r2;
+
+	/* Prepare scale factor. */
+	v.e = 1;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	/*
+	 * Evaluate lower terms of
+	 * expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2).
+	 */
+	z = r * r;
+	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
+
+	t = (long double)s[n2].lo + s[n2].hi;
+
+	if (k == 0) {
+		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
+		    (s[n2].hi - 1);
+		RETURNI(t);
+	}
+
+	if (k == -1) {
+		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + 
+		    (s[n2].hi - 2);
+		RETURNI(t / 2);
+	}
+
+	if (k < -7) {
+		t = s[n2].lo + t * (q + r1) + s[n2].hi;
+		RETURNI(t * twopk - 1);
+	}
+
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t = s[n2].lo + t * (q + r1) + s[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			RETURNI(t * 2 * 0x1p16383L - 1);
+		RETURNI(t * twopk - 1);
+	}
+
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+
+	if (k > LDBL_MANT_DIG - 1)
+		t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi;
+	else
+		t = s[n2].lo + t * (q + r1)  + (s[n2].hi - twomk);
+	RETURNI(t * twopk);
+}
--- ld128/s_expl.c	2013-05-28 09:36:11.000000000 -0700
+++ ld128/s_expl.c.all	2013-05-28 09:34:52.000000000 -0700
@@ -292,3 +292,214 @@
 		RETURNI(t * twopkp10000 * twom10000);
 	}
 }
+
+/*
+ * Our T1 and T2 are chosen to be approximately the points where method
+ * A and method B have the same accuracy.  Tang's T1 and T2 are the
+ * points where method A's accuracy changes by a full bit.  For Tang,
+ * this drop in accuracy makes method A immediately less accurate than
+ * method B, but our larger INTERVALS makes method A 2 bits more
+ * accurate so it remains the most accurate method significantly
+ * closer to the origin despite losing the full bit in our extended
+ * range for it.
+ */
+static const double
+T1 = -0.1659,				/* ~-30.625/128 * log(2) */
+T2 =  0.1659;				/* ~30.625/128 * log(2) */
+
+/*
+ * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2].
+ * Setting T3 to 0 would require the |x| < 0x1p-113  condition to appear
+ * in both subintervals, so set T3 = 2**-5, which places the condition
+ * into the [T1:T3] interval.
+ */
+static const double
+T3 = 0.03125;
+
+/*
+ * XXX Estimated range is for absolute error.
+ * Domain [-0.1659, 0.03125], range ~[-1.8933e-38, 1.8943e-38]:
+ * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-125.3
+ */
+static const long double
+C3  = 1.66666666666666666666666666666666667e-01L,
+C4  = 4.16666666666666666666666666666666645e-02L,
+C5  = 8.33333333333333333333333333333371638e-03L,
+C6  = 1.38888888888888888888888888891188658e-03L,
+C7  = 1.98412698412698412698412697235950394e-04L,
+C8  = 2.48015873015873015873015112487849040e-05L,
+C9  = 2.75573192239858906525606685484412005e-06L,
+C10 = 2.75573192239858906612966093057020362e-07L,
+C11 = 2.50521083854417203619031960151253944e-08L,
+C12 = 2.08767569878679576457272282566520649e-09L,
+C13 = 1.60590438367252471783548748824255707e-10L;
+
+static const double
+C14 = 1.1470745580491932e-11,		/* 0x1.93974a81dae3p-37 */
+C15 = 7.6471620181090468e-13,		/* 0x1.ae7f3820adab1p-41 */
+C16 = 4.7793721460260450e-14,		/* 0x1.ae7cd18a18eacp-45 */
+C17 = 2.8074757356658877e-15,		/* 0x1.949992a1937d9p-49 */
+C18 = 1.4760610323699476e-16;		/* 0x1.545b43aabfbcdp-53 */
+
+
+/*
+ * XXX Estimated range is for absolute error.
+ * Domain [0.03125, 0.1659], range ~[-2.7597e-38, 2.7602e-38]:
+ * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-124.8
+ */
+static const long double
+D3  = 1.66666666666666666666666666666682245e-01L,
+D4  = 4.16666666666666666666666666634228324e-02L,
+D5  = 8.33333333333333333333333364022244481e-03L,
+D6  = 1.38888888888888888888887138722762072e-03L,
+D7  = 1.98412698412698412699085805424661471e-04L,
+D8  = 2.48015873015873015687993712101479612e-05L,
+D9  = 2.75573192239858944101036288338208042e-06L,
+D10 = 2.75573192239853161148064676533754048e-07L,
+D11 = 2.50521083855084570046480450935267433e-08L,
+D12 = 2.08767569819738524488686318024854942e-09L,
+D13 = 1.60590442297008495301927448122499313e-10L;
+
+static const double
+D14 = 1.1470726176204336e-11,		/* 0x1.93971dc395d9ep-37 */
+D15 = 7.6478532249581686e-13,		/* 0x1.ae892e3D16fcep-41 */
+D16 = 4.7628892832607741e-14,		/* 0x1.ad00Dfe41feccp-45 */
+D17 = 3.0524857220358650e-15;		/* 0x1.D7e8d886Df921p-49 */
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+	long double hx2_hi, hx2_lo, q, r, r1, t, twomk, twopk, x_hi;
+	long double x_lo, x2;
+	double dr, dx, fn, r2;
+	int k, n, n2;
+	uint16_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 7) {		/* |x| >= 128 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000)  /* x is -Inf or -NaN */
+				return (-1 / x - 1);
+			return (x + x);	/* x is +Inf or +NaN */
+		}
+		if (x > o_threshold)
+			return (huge * huge);
+		/*
+		 * expm1l() never underflows, but it must avoid
+		 * unrepresentable large negative exponents.  We used a
+		 * much smaller threshold for large |x| above than in
+		 * expl() so as to handle not so large negative exponents
+		 * in the same way as large ones here.
+		 */
+		if (hx & 0x8000)	/* x <= -128 */
+			return (tiny - 1);	/* good for x < -114ln2 - eps */
+	}
+
+	ENTERI();
+
+	if (T1 < x && x < T2) {
+		x2 = x * x;
+		dx = x;
+
+		if (x < T3) {
+			if (ix < BIAS - 113) {	/* |x| < 0x1p-113 */
+				/* x (rounded) with inexact if x != 0: */
+				RETURNI(x == 0 ? x :
+				    (0x1p200 * x + fabsl(x)) * 0x1p-200);
+			}
+			q = x * x2 * C3 + x2 * x2 * (C4 + x * (C5 + x * (C6 +
+			    x * (C7 + x * (C8 + x * (C9 + x * (C10 +
+			    x * (C11 + x * (C12 + x * (C13 +
+			    dx * (C14 + dx * (C15 + dx * (C16 +
+			    dx * (C17 + dx * C18))))))))))))));
+		} else {
+			q = x * x2 * D3 + x2 * x2 * (D4 + x * (D5 + x * (D6 +
+			    x * (D7 + x * (D8 + x * (D9 + x * (D10 +
+			    x * (D11 + x * (D12 + x * (D13 +
+			    dx * (D14 + dx * (D15 + dx * (D16 +
+			    dx * D17)))))))))))));
+		}
+
+		x_hi = (float)x;
+		x_lo = x - x_hi;
+		hx2_hi = x_hi * x_hi / 2;
+		hx2_lo = x_lo * (x + x_hi) / 2;
+		if (ix >= BIAS - 7)
+			RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi));
+		else
+			RETURNI(hx2_lo + q + hx2_hi + x);
+	}
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	/* XXX assume no extra precision for the additions, as for trig fns. */
+	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
+#if defined(HAVE_EFFICIENT_IRINT)
+	n  = irint(fn);
+#else
+	n  = (int)fn;
+#endif
+	n2 = (unsigned)n % INTERVALS;
+	k = n >> LOG2_INTERVALS;
+	r1 = x - fn * L1;
+	r2 = fn * -L2;
+	r = r1 + r2;
+
+	/* Prepare scale factor. */
+	/* XXX sparc64 multiplication is so slow that scalbnl() is faster. */
+	v.e = 1;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	/*
+	 * Evaluate lower terms of
+	 * expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2).
+	 */
+	dr = r;
+	q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
+	    dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));
+
+	t = s[n2].lo + s[n2].hi;
+
+	if (k == 0) {
+		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
+		    (s[n2].hi - 1);
+		RETURNI(t);
+	}
+
+	if (k == -1) {
+		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + 
+		    (s[n2].hi - 2);
+		RETURNI(t / 2);
+	}
+
+
+	if (k < -7) {
+		t = s[n2].lo + t * (q + r1) + s[n2].hi;
+		RETURNI(t * twopk - 1);
+	}
+
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t = s[n2].lo + t * (q + r1) + s[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			RETURNI(t * 2 * 0x1p16383L - 1);
+		RETURNI(t * twopk - 1);
+	}
+
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+
+	if (k > LDBL_MANT_DIG - 1)
+		t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi;
+	else if (k < 1)
+		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
+		   (s[n2].hi - twomk);
+	else
+		t = s[n2].lo * (q + r1 + 1) + s[n2].hi * (q + r1) +
+		    (s[n2].hi - twomk);
+	RETURNI(t * twopk);
+}

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 17:37:10 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 0208ED9D
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 17:37:10 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id DF6F0A95
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 17:37:09 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4SHb9mm051666
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 10:37:09 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4SHb9LG051665
 for freebsd-numerics@freebsd.org; Tue, 28 May 2013 10:37:09 -0700 (PDT)
 (envelope-from sgk)
Date: Tue, 28 May 2013 10:37:09 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: Re: Patches for s_expl.c
Message-ID: <20130528173709.GA51603@troutmask.apl.washington.edu>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130528172242.GA51485@troutmask.apl.washington.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-List-Received-Date: Tue, 28 May 2013 17:37:10 -0000

On Tue, May 28, 2013 at 10:22:42AM -0700, Steve Kargl wrote:
> Here are two patches for ld80/s_expl.c and ld128/s_expl.c.
> Instead of committing the one large patch that I have spent
> hours testing, I have split it into two.  One patch fixes/updates
> expl().  The other patch is the implementation of expm1l().  

I forgot to send the 3rd patch, which updates the documentation,
deals with 53-bit long double targets, and updates math.h.  Yes, there
is some cruft in the diff, which I'll disentangle when I do
the commit.

-- 
steve

Index: Symbol.map
===================================================================
--- Symbol.map	(revision 251062)
+++ Symbol.map	(working copy)
@@ -250,4 +250,7 @@
 	ctanh;
 	ctanhf;
 	expl;
+	expm1l;
+	logl;
+	sincos;
 };
Index: man/exp.3
===================================================================
--- man/exp.3	(revision 251062)
+++ man/exp.3	(working copy)
@@ -41,6 +41,7 @@
 .Nm exp2l ,
 .Nm expm1 ,
 .Nm expm1f ,
+.Nm expm1l ,
 .Nm pow ,
 .Nm powf
 .Nd exponential and power functions
@@ -64,6 +65,8 @@
 .Fn expm1 "double x"
 .Ft float
 .Fn expm1f "float x"
+.Ft long double
+.Fn expm1l "long double x"
 .Ft double
 .Fn pow "double x" "double y"
 .Ft float
@@ -88,9 +91,10 @@
 .Fa x .
 .Pp
 The
-.Fn expm1
-and the
-.Fn expm1f
+.Fn expm1 ,
+.Fn expm1f ,
+and
+.Fn expm1l
 functions compute the value exp(x)\-1 accurately even for tiny argument
 .Fa x .
 .Pp
Index: src/math.h
===================================================================
--- src/math.h	(revision 251062)
+++ src/math.h	(working copy)
@@ -405,6 +405,7 @@
 long double	cosl(long double);
 long double	exp2l(long double);
 long double	expl(long double);
+long double	expm1l(long double);
 long double	fabsl(long double) __pure2;
 long double	fdiml(long double, long double);
 long double	floorl(long double);
@@ -419,6 +420,7 @@
 long long	llrintl(long double);
 long long	llroundl(long double);
 long double	logbl(long double);
+long double	logl(long double);
 long		lrintl(long double);
 long		lroundl(long double);
 long double	modfl(long double, long double *); /* fundamentally !__pure2 */
@@ -440,6 +442,11 @@
 long double	truncl(long double);
 
 #endif /* __ISO_C_VISIBLE >= 1999 */
+
+#if __BSD_VISIBLE
+void	sincos(double, double *, double *);
+#endif	/* __BSD_VISIBLE */
+
 __END_DECLS
 
 #endif /* !_MATH_H_ */
@@ -462,12 +469,10 @@
 long double	coshl(long double);
 long double	erfcl(long double);
 long double	erfl(long double);
-long double	expm1l(long double);
 long double	lgammal(long double);
 long double	log10l(long double);
 long double	log1pl(long double);
 long double	log2l(long double);
-long double	logl(long double);
 long double	powl(long double, long double);
 long double	sinhl(long double);
 long double	tanhl(long double);
Index: src/s_expm1.c
===================================================================
--- src/s_expm1.c	(revision 251062)
+++ src/s_expm1.c	(working copy)
@@ -216,3 +216,7 @@
 	}
 	return y;
 }
+
+#if (LDBL_MANT_DIG == 53)
+__weak_reference(expm1, expm1l);
+#endif

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 21:58:38 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 11750D17
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 21:58:38 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au
 [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 8FE9FD30
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 21:58:37 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 8A74B122EDD;
 Wed, 29 May 2013 07:39:12 +1000 (EST)
Date: Wed, 29 May 2013 07:39:04 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: Patches for s_expl.c
In-Reply-To: <20130528172242.GA51485@troutmask.apl.washington.edu>
Message-ID: <20130529062437.V4648@besplex.bde.org>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=kj9zAlcOel0A:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10
 a=enA2T3gqEfefmBwEoGAA:9 a=CjuIK1q_8ugA:10 a=tJtbpcaLiRwA:10
 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: freebsd-numerics@freebsd.org
X-List-Received-Date: Tue, 28 May 2013 21:58:38 -0000

On Tue, 28 May 2013, Steve Kargl wrote:

> Here are two patches for ld80/s_expl.c and ld128/s_expl.c.
> Instead of committing the one large patch that I have spent
> hours testing, I have split it into two.  One patch fixes/updates
> expl().  The other patch is the implementation of expm1l().
>
> My commit messages will be:
>
> Patch 1:
>
>   ld80/s_expl.c:
>
>   * Use the LOG2_INTERVALS macro instead of hardcoding 7.

The use of LOG2_INTERVALS isn't merged into the ld128 version.  Patch 2
merges its use for expm1l() only.

>   * Use LD80C to set overflow and underflow thresholds, and then use
>     #defines to access the .e component to reduce diffs with ld128 version.
>   * Rename polynomial coefficients P# to A#, which is used in Tang.

Almost all the declarations of polynomial coefficients are still formatted
in a nonstandard way, but differently than in previous development
versions.  I keep sending you patches for this.

>   * Remove the use of intermediate results t23 and t45.
>   * Micro-optimization: remove access to u.xbits.man.

On the same line(s) that LOG2_INTERVALS is used, there is a more
important micro-optimization than this one.

>   * Fix an off-by-one in the underflow case.
>   * Replace a factor the long double constant 2.0L by the integer 2.  Let
>     the compiler to the conversion.
>
>   ld128/s_expl.c:
>
>   * Adjust Copyright years to reflect when bits of the code were actually
>     written.
>   * Reduce diff between the ld80 and ld128 versions.
>
> Patch 2:
>
>   ld80/s_expl.c:
>
>   * Compute expm1l(x) for Intel 80-bit format.
>
>   ld128/s_expl.c:
>
>   * Compute expm1l(x) for IEEE 754 128-bit format.

There is a fairly large bug in this, from only merging half of the
most recent micro-optimization in the development version of the ld80
version.  This might only be an efficiency bug, but I haven't tested
the ld128 version with either the full merge or the half merge.

The ld128 version still has excessive optimizations for |x| near 0.
It uses a slightly different high-degree polynomial on each side of
0.  The ld80 version uses the same poly on each side.  Most of the
style bugs in the 4 exp[!2]l functions are in the coeffs for the
polys on each side.  I haven't tried so hard to get you to fix them
since I want to remove them.

>
>   These are based on:
>
>   PTP Tang, "Table-driven implementation of the Expm1 function
>   in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
>   211-222 (1992).
>
> These commit logs may be too terse for some, but quite frankly after
> 2 or 3 years of submitting and resubmitting diffs, I've forgotten
> why some changes have or have not been made.
>
> expm1l() resides in s_expl.c because she shares the same table,
> polynomial coefficients, and some numerical constants with expl().

There are some minor style regressions relative to previous development
versions outside of poly coeffs.  Patches later.

> Index: ld80/s_expl.c
> ===================================================================
> --- ld80/s_expl.c	(revision 251062)
> +++ ld80/s_expl.c	(working copy)
> ...
> @@ -78,11 +82,11 @@
>  * |exp(x) - p(x)| < 2**-77.2
>  * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
>  */
> -P2 =  0.5,
> -P3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
> -P4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
> -P5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
> -P6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
> +A2  = 0.5,
> +A3  = 1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
> +A4  = 4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
> +A5  = 8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
> +A6  = 1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */

Example of a formatting regression.  The extra space that was before the
values is for a possible minus sign.  This space is still there for the
hex values.  The extra space before the equals sign is used for fancy
formatting to line up the values when the variable names reach A10.  Since
the variable names only reach A6, this is not needed.

> ...
> @@ -242,23 +247,21 @@
> 	ix = hx & 0x7fff;
> 	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
> 		if (ix == BIAS + LDBL_MAX_EXP) {
> -			if (hx & 0x8000 && u.xbits.man == 1ULL << 63)
> -				return (0.0L);	/* x is -Inf */
> -			return (x + x); /* x is +Inf, NaN or unsupported */
> +			if (hx & 0x8000)  /* x is -Inf, -NaN or unsupported */
> +				return (-1 / x);

Micro-optimization here.
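
To make the micro-optimization above concrete, here is a minimal sketch of why
a single division can replace the mantissa test.  The helper name is
hypothetical and is not part of the patch; the sketch only assumes IEEE 754
arithmetic, where -1 / -Inf is +0 and -1 / NaN is a NaN.

```c
#include <assert.h>
#include <math.h>	/* INFINITY, NAN, isnan(), signbit() for callers */

/*
 * Hypothetical helper, not from the patch: the value expl() should return
 * for a negative x whose exponent field is all ones (-Inf or -NaN).
 * One division covers both cases, so the mantissa need not be examined.
 */
static long double
exp_of_negative_special(long double x)
{
	/* -1 / -Inf is +0; -1 / NaN propagates the NaN. */
	return (-1 / x);
}
```

Calling the helper with -INFINITY should give +0, and calling it with a
negative NaN should give a NaN, which matches the two return paths of the code
that was removed.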

> ...
> @@ -270,12 +273,12 @@
> 	n  = (int)fn;
> #endif
> 	n2 = (unsigned)n % INTERVALS;
> -	k = (n - n2) / INTERVALS;
> +	k = n >> LOG2_INTERVALS;
> 	r1 = x - fn * L1;
> -	r2 = -fn * L2;
> +	r2 = fn * -L2;

2 micro-optimizations.

> Index: ld128/s_expl.c
> ===================================================================
> --- ld128/s_expl.c	(revision 251062)
> +++ ld128/s_expl.c	(working copy)
> ...
> @@ -38,34 +40,56 @@
> #include "math_private.h"
>
> #define	INTERVALS	128
> +#define	LOG2_INTERVALS	7

Not used.

> ...
> 	n2 = (unsigned)n % INTERVALS;
> 	k = (n - n2) / INTERVALS;
> 	r1 = x - fn * L1;
> -	r2 = -fn * L2;
> +	r2 = fn * -L2;
> +	r = r1 + r2;

1 micro-optimization (that uses LOG2_INTERVALS) not merged here.

> @@ -244,18 +276,19 @@
> 		twopkp10000 = v.e;
> 	}
>
> -	r = r1 + r2;
> -	q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 +
> -	    r * (P8 + r * (P9 + r * (P10 + r * P11)))))))));
> +	/* Evaluate expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */
> +	dr = r;
> +	q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
> +	    dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));

Macro-optimizations here.  Quite different from the ld80 ones.  The grouping
of terms was already quite different.  This merges a macro-optimization
technique from das's old work on the ld128 logl -- evaluate terms in double
precision if possible, since long double precision is so slow on sparc64
(about 1000 times slower than long double precision on x86, and hundreds
of times slower than double precision on sparc64).
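
The mixed-precision technique described above can be sketched as follows.
This is only an illustration: the coefficients are plain Taylor values 1/n!,
not the minimax A# coefficients from the patch, and the function name is
hypothetical.  The point is that the high-order terms, which contribute only
to the low bits of the result, are formed entirely in double.

```c
#include <assert.h>

/* Illustrative Taylor coefficients 1/n!, NOT the A# values from the patch. */
static const long double C2 = 0.5L, C3 = 1.0L / 6, C4 = 1.0L / 24;
static const double C5 = 1.0 / 120, C6 = 1.0 / 720, C7 = 1.0 / 5040;

/* Approximates exp(r) - 1 - r for small |r|. */
static long double
exp_tail(long double r)
{
	double dr = r;	/* rounded copy, used only for the high-order terms */

	/*
	 * dr * (C5 + ...) has only double operands, so it is a double
	 * expression; it is widened only when added to C4.
	 */
	return (r * r * (C2 + r * (C3 + r * (C4 +
	    dr * (C5 + dr * (C6 + dr * C7))))));
}
```

For r = 0.002 the result should be close to r**2/2 + r**3/6 + r**4/24, about
2.001334e-6.  On targets that evaluate double expressions in extended
precision the saving disappears, but the result is still valid.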

> Patch 2:
>
> --- ld80/s_expl.c	2013-05-28 09:36:27.000000000 -0700
> +++ ld80/s_expl.c.all	2013-05-28 09:34:41.000000000 -0700
> @@ -302,3 +302,166 @@
> ...
> +	if (k == 0) {
> +		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> +		    (s[n2].hi - 1);
> +		RETURNI(t);
> +	}
> +
> +	if (k == -1) {
> +		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> +		    (s[n2].hi - 2);
> +		RETURNI(t / 2);
> +	}

Some cases are optimized here.

> ...
> +	if (k > LDBL_MANT_DIG - 1)
> +		t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi;
> +	else
> +		t = s[n2].lo + t * (q + r1)  + (s[n2].hi - twomk);

The last statement isn't accurate enough for k = 0 and k = -1, so
handling of those cases was moved earlier so that this statement
could be optimized to what it is now.  The ld128 version is missing
this.

> ...
> --- ld128/s_expl.c	2013-05-28 09:36:11.000000000 -0700
> +++ ld128/s_expl.c.all	2013-05-28 09:34:52.000000000 -0700
> ...
> +	if (k == 0) {
> +		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> +		    (s[n2].hi - 1);
> +		RETURNI(t);
> +	}
> +
> +	if (k == -1) {
> +		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> +		    (s[n2].hi - 2);
> +		RETURNI(t / 2);
> +	}
> +
> +

Same as for ld80, except for 2 style bugs instead of 1 (1 more extra
blank line).

> +	if (k > LDBL_MANT_DIG - 1)
> +		t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi;
> +	else if (k < 1)
> +		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> +		   (s[n2].hi - twomk);
> +	else
> +		t = s[n2].lo * (q + r1 + 1) + s[n2].hi * (q + r1) +
> +		    (s[n2].hi - twomk);

Not the same as for ld80.  Still has the old slower code, so it probably
still works, but even more slowly than before except for k == 0 and k == -1,
since there are extra branches to filter out those values.

Some patches relative to my version now instead of later:

@ --- z22/s_expl.c	Wed May 29 04:48:10 2013
@ +++ ./s_expl.c	Wed May 29 06:16:29 2013
@ @@ -30,5 +30,5 @@
@  __FBSDID("$FreeBSD: src/lib/msun/ld80/s_expl.c,v 1.10 2012/10/13 19:53:11 kargl Exp $");
@ 
@ -/*-
@ +/**
@   * Compute the exponential of x for Intel 80-bit format.  This is based on:
@   *

This ugliness is now required by style(9) :-(.  You only made this change in
some places.

The indent protection '/*-' was subverted to mean a copyright markup.  Its
previously-KNF use for non-copyrights was purged in some places but not
all.  It is still used extensively for non-copyrights in kern/kern_prot.c.

@ @@ -83,9 +83,9 @@
@   * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
@   */
@ -A2  = 0.5,
@ -A3  = 1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
@ -A4  = 4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
@ -A5  = 8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
@ -A6  = 1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
@ +A2 =  0.5,
@ +A3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
@ +A4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
@ +A5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
@ +A6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
@ 
@  /*

Fix regressions relative to a previous development version.

@ @@ -267,11 +275,12 @@
@  	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */
@  #if defined(HAVE_EFFICIENT_IRINTL)
@ -	n  = irintl(fn);
@ +	n = irintl(fn);
@  #elif defined(HAVE_EFFICIENT_IRINT)
@ -	n  = irint(fn);
@ +	n = irint(fn);
@  #else
@ -	n  = (int)fn;
@ +	n = (int)fn;

Fix more regressions.

@  #endif
@  	n2 = (unsigned)n % INTERVALS;
@ +	/* Depend on the sign bit being propagated: */
@  	k = n >> LOG2_INTERVALS;
@  	r1 = x - fn * L1;

I think a comment is needed.  This micro-optimization was merged from
s_exp2*.c, where it is commented on more prominently for the long
double versions only.
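
A minimal sketch of what the comment warns about, assuming INTERVALS is 128
and LOG2_INTERVALS is 7 as in the patch; the helper names are hypothetical
and not part of s_expl.c.  C leaves the result of right-shifting a negative
signed value implementation-defined, so the shift form agrees with the
exact-division form only when the compiler propagates the sign bit (gcc
documents that it does):

```c
#define INTERVALS       128
#define LOG2_INTERVALS  7

/* Portable form: n2 is n mod INTERVALS in [0, 127], so n - n2 divides exactly. */
static int
k_by_division(int n)
{
	int n2 = (unsigned)n % INTERVALS;

	return ((n - n2) / INTERVALS);
}

/* Fast form: correct for negative n only if >> propagates the sign bit. */
static int
k_by_shift(int n)
{
	return (n >> LOG2_INTERVALS);
}

/* Returns nonzero when this compiler's >> on a negative int is arithmetic. */
static int
shift_propagates_sign(void)
{
	return ((-1 >> 1) == -1);
}
```

With a sign-propagating shift, both forms give k = -1 for n = -1 and k = -2
for n = -129; with a logical shift the fast form would return a large
positive k instead.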

@ @@ -327,6 +336,15 @@
@ 
@  /*
@ - * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]:
@ - * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2
@ + * Domain [-0.1659, 0.1659], range ~[-2.6155e-22, 2.5507e-23]:
@ + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.6

The coeffs were improved a little, but the comment wasn't updated to match.

@ + *
@ + * XXX the coeffs aren't very carefully rounded, and I get 4.5 more bits,
@ + * but unlike for ld128 we can't drop any terms.
@ + *
@ + * XXX this still isn't in standard format:
@ + * - extra digits in exponents for decimal values
@ + * - no space for a (not present) minus sign in either the decimal or hex
@ + *   values
@ + * - perhaps they are impossible for double values
@   */
@  static const union IEEEl2bits

The coeffs have lots of style bugs, though not as many as for ld128.

I'm not sure where the latest set of B coeffs came from.  Looks like
you improved your generation of them.  You still seem to minimize the
absolute error.  This gives larger than necessary relative errors,
especially near the endpoints.  I think I wrote the new and old
versions of the comment about the domain and range.  I take a proposed
set of coeffs and plot the relative error of the function given by them,
then copy the results to the comment.

@ @@ -389,4 +409,9 @@
@  		x4 = x2 * x2;
@  		q = x4 * (x2 * (x4 *
@ +		    /*
@ +		     * XXX the number of terms is no longer good for
@ +		     * pairwise grouping of all except B3, and the
@ +		     * grouping is no longer from highest down.
@ +		     */
@  		    (x2 *            B12  + (x * B11 + B10)) +
@  		    (x2 * (x * B9 +  B8) +  (x * B7 +  B6))) +
@ @@ -407,9 +432,9 @@
@  	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
@  #if defined(HAVE_EFFICIENT_IRINTL)
@ -	n  = irintl(fn);
@ +	n = irintl(fn);
@  #elif defined(HAVE_EFFICIENT_IRINT)
@ -	n  = irint(fn);
@ +	n = irint(fn);
@  #else
@ -	n  = (int)fn;
@ +	n = (int)fn;
@  #endif
@  	n2 = (unsigned)n % INTERVALS;
@ @@ -434,22 +459,21 @@
@ 
@  	if (k == 0) {
@ -		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
@ -		    (s[n2].hi - 1);
@ +		t = SUM2P(s[n2].hi - 1, s[n2].lo * (r1 + 1) + t * q +
@ +		    s[n2].hi * r1);
@  		RETURNI(t);
@  	}
@ -

Style bug (extra blank line between related statements).

@  	if (k == -1) {
@ -		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + 
@ -		    (s[n2].hi - 2);
@ +		t = SUM2P(s[n2].hi - 2, s[n2].lo * (r1 + 1) + t * q +
@ +		    s[n2].hi * r1);
@  		RETURNI(t / 2);
@  	}
@

This blank line is correct since the statements are unrelated -- the
evaluation method changes significantly.  For k = 0 and k = -1, the
evaluation is the same but we repeat it all to avoid using a variable
for (k - 1) for the 2 values of k.

@  	if (k < -7) {
@ -		t = s[n2].lo + t * (q + r1) + s[n2].hi;
@ +		t = SUM2P(s[n2].hi, s[n2].lo + t * (q + r1));
@  		RETURNI(t * twopk - 1);
@  	}
@ 
@  	if (k > 2 * LDBL_MANT_DIG - 1) {
@ -		t = s[n2].lo + t * (q + r1) + s[n2].hi;
@ +		t = SUM2P(s[n2].hi, s[n2].lo + t * (q + r1));
@  		if (k == LDBL_MAX_EXP)
@  			RETURNI(t * 2 * 0x1p16383L - 1);

Ignore all the other changes in this hunk.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 22:53:11 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id D1FBAAE8
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 22:53:11 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id B6264FA2
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 22:53:11 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4SMrAbL053382; 
 Tue, 28 May 2013 15:53:10 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4SMrAkA053381;
 Tue, 28 May 2013 15:53:10 -0700 (PDT) (envelope-from sgk)
Date: Tue, 28 May 2013 15:53:10 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: Patches for s_expl.c
Message-ID: <20130528225310.GA53144@troutmask.apl.washington.edu>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130529062437.V4648@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 22:53:11 -0000

On Wed, May 29, 2013 at 07:39:04AM +1000, Bruce Evans wrote:
> On Tue, 28 May 2013, Steve Kargl wrote:
> 
> > Here are two patches for ld80/s_expl.c and ld128/s_expl.c.
> > Instead of committing the one large patch that I have spent
> > hours testing, I have split it into two.  One patch fixes/updates
> > expl().  The other patch is the implementation of expm1l().
> >
> > My commit messages will be:
> >
> > Patch 1:
> >
> >   ld80/s_expl.c:
> >
> >   * Use the LOG2_INTERVALS macro instead of hardcoding 7.
> 
> The use of LOG2_INTERVALS isn't merged into the ld128 version.  Patch 2
> merges its use for expm1l() only.
> 
> >   * Use LD80C to set overflow and underflow thresholds, and then use
> >     #defines to access the .e component to reduce diffs with ld128 version.
> >   * Rename polynomial coefficients P# to A#, which is used in Tang.
> 
> Almost all the declarations of polynomial coefficients are still formatted
> in a nonstandard way, but differently than in previous development
> versions.  I keep sending you patches for this.

Given that I've merged, unmerged, remerged, disremerged, and
undisremerged numerous diffs over the last 2+ years, I am not
surprised that there are issues with the patches.  I'm neither
an expert in floating arithmetic nor style(9).  If I understand
half of what you write when you annotate one of your diffs, I 
feel lucky.

(Un)fortunately, I only have a few hours this week to work on
expl/expm1l, and then I'll disappear again for a month or two
(due to work and life).  (Un)fortunately, theraven (under the
pretense of core) has threatened to completely render libm into
a crippled useless mess by mapping all unimplemented long double
functions to their double cousins.  When/if it comes to pass
that I have to untangle whatever theraven does, I'll likely
just walk away from libm hacking.

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Tue May 28 23:17:49 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id D3AA390
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 23:17:49 +0000 (UTC)
 (envelope-from s.montgomerysmith@gmail.com)
Received: from mail-ie0-x229.google.com (mail-ie0-x229.google.com
 [IPv6:2607:f8b0:4001:c03::229])
 by mx1.freebsd.org (Postfix) with ESMTP id A72511CE
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 23:17:49 +0000 (UTC)
Received: by mail-ie0-f169.google.com with SMTP id u16so23556194iet.0
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 16:17:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:subject
 :references:in-reply-to:x-enigmail-version:content-type
 :content-transfer-encoding;
 bh=SvFDEBzW9er/cY4ta5wvKGwTqj1R7Vi6usIXpyhzSro=;
 b=BBHPsF/Me+J2MOPl5ruP1iBf5gNUXvBIK4xOv1cXSuPxefFOuwhqqRPzcO84PQx22b
 vNEeQmZatx0EL2tCz+PkUKoXnkx2uAfhMa0gzV21sD51RWLtyPjeOyR/JqZ2tTH9WJMl
 6D/lc2qNNQ24qJGBUiEymmWTzHtUFgOjHGOPfsZVlwuWj3JEYFFnm97S86HcenaXIGly
 8gDDXl1UEcNKzSMhDaJxAPPwq39ws0vwxe0WVJF+jLUQTCY+jAwGB8nquxovbrvdysrE
 DmYv8/WbN6YPArTgHirgrTNjKIzNZGac6HcbU8mvyO/JV3/WVfpemb1qfuXjVNqnGvoy
 M8nA==
X-Received: by 10.50.8.65 with SMTP id p1mr54053iga.19.1369783069366;
 Tue, 28 May 2013 16:17:49 -0700 (PDT)
Received: from [192.168.0.11] (50-82-246-58.client.mchsi.com. [50.82.246.58])
 by mx.google.com with ESMTPSA id
 ct8sm20129230igb.7.2013.05.28.16.17.47
 for <freebsd-numerics@freebsd.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Tue, 28 May 2013 16:17:48 -0700 (PDT)
Sender: Stephen Montgomery-Smith <s.montgomerysmith@gmail.com>
Message-ID: <51A53B1A.9040607@missouri.edu>
Date: Tue, 28 May 2013 18:17:46 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
 rv:17.0) Gecko/20130510 Thunderbird/17.0.6
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
Subject: Re: Patches for s_expl.c
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
 <20130528225310.GA53144@troutmask.apl.washington.edu>
In-Reply-To: <20130528225310.GA53144@troutmask.apl.washington.edu>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 May 2013 23:17:49 -0000

On 05/28/2013 05:53 PM, Steve Kargl wrote:

> Given that I've merged, unmerged, remerged, disremerged, and
> undisremerged numerous diffs over the last 2+ years, I am not
> surprised that there are issues with the patches.  I'm neither
> an expert in floating arithmetic nor style(9).  If I understand
> half of what you write when you annotate one of your diffs, I 
> feel lucky.
> 
> (Un)fortunately, I only have a few hours this week to work on
> expl/expm1l, and then I'll disappear again for a month or two
> (due to work and life).  (Un)fortunately, theraven (under the
> pretense of core) has threatened to completely render libm into
> a crippled useless mess by mapping all unimplemented long double
> functions to their double cousins.  When/if it comes to pass
> that I have to untangle whatever theraven does, I'll likely
> just walk away from libm hacking.

I think it is better to commit "as is" if you cannot make all the changes.

As for me, I don't really understand the need to be so consistent with
style, nor to get every last drop of optimization.  In particular,
regarding style, I think it is like people talking different languages.
You could insist that everyone speak a common language, but it is far
better for the intellectual commons if people learn other people's
languages.

Anyway, I think it is better for Steve to commit, and then for Bruce to
make changes later on.


From owner-freebsd-numerics@FreeBSD.ORG  Wed May 29 00:06:23 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 1E0A0AA8
 for <freebsd-numerics@freebsd.org>; Wed, 29 May 2013 00:06:23 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id DCFC66B6
 for <freebsd-numerics@freebsd.org>; Wed, 29 May 2013 00:06:22 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4T06MqY053909; 
 Tue, 28 May 2013 17:06:22 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4T06MI4053908;
 Tue, 28 May 2013 17:06:22 -0700 (PDT) (envelope-from sgk)
Date: Tue, 28 May 2013 17:06:22 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Stephen Montgomery-Smith <stephen@missouri.edu>
Subject: Re: Patches for s_expl.c
Message-ID: <20130529000622.GA53899@troutmask.apl.washington.edu>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
 <20130528225310.GA53144@troutmask.apl.washington.edu>
 <51A53B1A.9040607@missouri.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <51A53B1A.9040607@missouri.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 May 2013 00:06:23 -0000

On Tue, May 28, 2013 at 06:17:46PM -0500, Stephen Montgomery-Smith wrote:
> On 05/28/2013 05:53 PM, Steve Kargl wrote:
> 
> > Given that I've merged, unmerged, remerged, disremerged, and
> > undisremerged numerous diffs over the last 2+ years, I am not
> > surprised that there are issues with the patches.  I'm neither
> > an expert in floating arithmetic nor style(9).  If I understand
> > half of what you write when you annotate one of your diffs, I 
> > feel lucky.
> > 
> > (Un)fortunately, I only have a few hours this week to work on
> > expl/expm1l, and then I'll disappear again for a month or two
> > (due to work and life).  (Un)fortunately, theraven (under the
> > pretense of core) has threatened to completely render libm into
> > a crippled useless mess by mapping all unimplemented long double
> > functions to their double cousins.  When/if it comes to pass
> > that I have to untangle whatever theraven does, I'll likely
> > just walk away from libm hacking.
> 
> I think it is better to commit "as is" if you cannot make all the changes.
> 
> As for me, I don't really understand the need to be so consistent with
> style, nor to get every last drop of optimization.  In particular,
> regarding style, I think it is like people talking different languages.
> You could insist that everyone speak a common language, but it is far
> better for the intellectual commons if people learn other people's
> languages.
> 
> Anyway, I think it is better for Steve to commit, and then for Bruce to
> make changes later on.
> 

It's too late.  Some change that I made since the last time I tested
has introduced a massive regression in the computation of expm1l.

laptop-kargl:kargl[204] ./testl -n 5 -b
prec: 64
For x in [-64.0000:-0.1659], 5M expm1l calls in 2.176513 seconds.
For x in [-0.1659:0.1659], 5M expm1l calls in 0.415051 seconds.
For x in [0.1659:11356.0000], 5M expm1l calls in 0.550342 seconds.

Notice, the first interval is now 4 to 5 times slower than the
other intervals.  This was not the case with an older version
of the code.

:(


-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Wed May 29 01:21:19 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 2E775467
 for <freebsd-numerics@freebsd.org>; Wed, 29 May 2013 01:21:19 +0000 (UTC)
 (envelope-from s.montgomerysmith@gmail.com)
Received: from mail-ie0-x232.google.com (mail-ie0-x232.google.com
 [IPv6:2607:f8b0:4001:c03::232])
 by mx1.freebsd.org (Postfix) with ESMTP id F2BE7A79
 for <freebsd-numerics@freebsd.org>; Wed, 29 May 2013 01:21:18 +0000 (UTC)
Received: by mail-ie0-f178.google.com with SMTP id f4so7240082iea.37
 for <freebsd-numerics@freebsd.org>; Tue, 28 May 2013 18:21:18 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:subject
 :references:in-reply-to:x-enigmail-version:content-type
 :content-transfer-encoding;
 bh=vZGvRyAYMcXbSQd+Pef+ozY2WNuHDSVSkodRKHfc5CE=;
 b=L+ejsyWiYDzrzqDLg7R9uQ+Bk32Dl+DULmlvMZsdCPyO5lhO+FeTkquPr5KtBAr1tv
 AqD91GrLc59FPmMIgnp4/ZaGY7SmE7qrz7MRTkDAlAMgPeaieWt5GoImICPp96T2F2kt
 +31Tx+X2k/V2nLt3U5n6+Eyfkg8/sGAbCOjCFg7rT8TYptZzKXELM2f45RH3ZGopiQuL
 sJ0IFeJa3bV0Fq3o5HfWkXkZcGJORreUKVWz+LSwWHH1qczA89eNxYLRecPJ4MIIqXlN
 Dnkf8mjkhwdjODCMbHJ/VtPQFKo1QCGuTa4b0kWV3lFPADTk4/bV060mVyQ3yIPhaQAP
 GMkQ==
X-Received: by 10.42.84.73 with SMTP id k9mr140386icl.50.1369790478756;
 Tue, 28 May 2013 18:21:18 -0700 (PDT)
Received: from [192.168.0.11] (50-82-246-58.client.mchsi.com. [50.82.246.58])
 by mx.google.com with ESMTPSA id
 o10sm20679318igh.2.2013.05.28.18.21.16
 for <freebsd-numerics@freebsd.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Tue, 28 May 2013 18:21:17 -0700 (PDT)
Sender: Stephen Montgomery-Smith <s.montgomerysmith@gmail.com>
Message-ID: <51A5580C.9000607@missouri.edu>
Date: Tue, 28 May 2013 20:21:16 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
 rv:17.0) Gecko/20130510 Thunderbird/17.0.6
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
Subject: Re: Patches for s_expl.c
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
 <20130528225310.GA53144@troutmask.apl.washington.edu>
 <51A53B1A.9040607@missouri.edu>
 <20130529000622.GA53899@troutmask.apl.washington.edu>
In-Reply-To: <20130529000622.GA53899@troutmask.apl.washington.edu>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 May 2013 01:21:19 -0000

On 05/28/2013 07:06 PM, Steve Kargl wrote:
> On Tue, May 28, 2013 at 06:17:46PM -0500, Stephen Montgomery-Smith wrote:
>> On 05/28/2013 05:53 PM, Steve Kargl wrote:
>>
>>> Given that I've merged, unmerged, remerged, disremerged, and
>>> undisremerged numerous diffs over the last 2+ years, I am not
>>> surprised that there are issues with the patches.  I'm neither
>>> an expert in floating arithmetic nor style(9).  If I understand
>>> half of what you write when you annotate one of your diffs, I 
>>> feel lucky.
>>>
>>> (Un)fortunately, I only have a few hours this week to work on
>>> expl/expm1l, and then I'll disappear again for a month or two
>>> (due to work and life).  (Un)fortunately, theraven (under the
>>> pretense of core) has threatened to completely render libm into
>>> a crippled useless mess by mapping all unimplemented long double
>>> functions to their double cousins.  When/if it comes to pass
>>> that I have to untangle whatever theraven does, I'll likely
>>> just walk away from libm hacking.
>>
>> I think it is better to commit "as is" if you cannot make all the changes.
>>
>> As for me, I don't really understand the need to be so consistent with
>> style, nor to get every last drop of optimization.  In particular,
>> regarding style, I think it is like people talking different languages.
>> You could insist that everyone speak a common language, but it is far
>> better for the intellectual commons if people learn other people's
>> languages.
>>
>> Anyway, I think it is better for Steve to commit, and then for Bruce to
>> make changes later on.
>>
> 
> It's too late.  Some change that I made since the last time I tested
> has introduced a massive regression in the computation of expm1l.
> 
> laptop-kargl:kargl[204] ./testl -n 5 -b
> prec: 64
> For x in [-64.0000:-0.1659], 5M expm1l calls in 2.176513 seconds.
> For x in [-0.1659:0.1659], 5M expm1l calls in 0.415051 seconds.
> For x in [0.1659:11356.0000], 5M expm1l calls in 0.550342 seconds.
> 
> Notice, the first interval is now 4 to 5 times slower than the
> other intervals.  This was not the case with an older version
> of the code.
> 
> :(

I think it is still better to commit.  Then figure out where the
regression was later, when you have time.


From owner-freebsd-numerics@FreeBSD.ORG  Wed May 29 11:04:54 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id D32E175A
 for <freebsd-numerics@FreeBSD.org>; Wed, 29 May 2013 11:04:54 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au
 [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 7C5C6F33
 for <freebsd-numerics@FreeBSD.org>; Wed, 29 May 2013 11:04:54 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 8C5461217FE;
 Wed, 29 May 2013 21:04:51 +1000 (EST)
Date: Wed, 29 May 2013 21:04:50 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
Subject: Re: Patches for s_expl.c
In-Reply-To: <51A5580C.9000607@missouri.edu>
Message-ID: <20130529203350.V1268@besplex.bde.org>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
 <20130528225310.GA53144@troutmask.apl.washington.edu>
 <51A53B1A.9040607@missouri.edu>
 <20130529000622.GA53899@troutmask.apl.washington.edu>
 <51A5580C.9000607@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e4Ne0tV/ c=1 sm=1 a=kj9zAlcOel0A:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10
 a=hyAGcHVSu_I8guswM6YA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 May 2013 11:04:54 -0000

On Tue, 28 May 2013, Stephen Montgomery-Smith wrote:

> On 05/28/2013 07:06 PM, Steve Kargl wrote:
>> On Tue, May 28, 2013 at 06:17:46PM -0500, Stephen Montgomery-Smith wrote:
>>> On 05/28/2013 05:53 PM, Steve Kargl wrote:
>>>
>>>> Given that I've merged, unmerged, remerged, disremerged, and
>>>> undisremerged numerous diffs over the last 2+ years, I am not
>>>> surprised that there are issues with the patches.  I'm neither
>>>> an expert in floating arithmetic nor style(9).  If I understand
>>>> half of what you write when you annotate one of your diffs, I
>>>> feel lucky.

Mail is not a very suitable medium for exchanging patches (but is
better than a vcs that is not shared, or a URL).

>>>> (Un)fortunately, I only have a few hours this week to work on
>>>> expl/expm1l, and then I'll disappear again for a month or two
>>>> (due to work and life).  (Un)fortunately, theraven (under the
>>>> ...

It can take a long time to merge patches, especially when the
turnaround time is months.  I take more than a few hours a week on
this when I'm working on it.

>>> ...
>>> Anyway, I think it is better for Steve to commit, and then for Bruce to
>>> make changes later on.
>>
>> It's too late.  Some change that I made since the last time I tested
>> has introduced a massive regression in the computation of expm1l.
>>
>> laptop-kargl:kargl[204] ./testl -n 5 -b
>> prec: 64
>> For x in [-64.0000:-0.1659], 5M expm1l calls in 2.176513 seconds.
>> For x in [-0.1659:0.1659], 5M expm1l calls in 0.415051 seconds.
>> For x in [0.1659:11356.0000], 5M expm1l calls in 0.550342 seconds.
>>
>> Notice, the first interval is now 4 to 5 times slower than the
>> other intervals.  This was not the case with an older version
>> of the code.

I don't see this (only checked on i386 so far).  expm1l on
[-64.0000:-0.1659] takes about 55-59 cycles (22 nsec; 5M calls in 0.11
seconds) on freefall (Xeon i7(?)) when compiled by gcc.  Other
intervals are only a couple of cycles faster, except that when compiled
by clang, expm1l takes only 44-45 cycles on [-0.1659:0.1659].
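
The two sets of timings can be compared per call.  A small sketch of the
arithmetic, with hypothetical helper names that are not from any test
program in this thread: 5M calls in 0.11 seconds is 22 nsec per call, while
the reported 2.176513 seconds for 5M calls on the same interval is about
435 nsec per call.  The clock rate is only backed out of the cycle counts
and the 22 nsec figure, so treat it as an estimate rather than a measured
value.

```c
/* Average time per call in nanoseconds, from total seconds and call count. */
static double
nsec_per_call(double seconds, double calls)
{
	return (seconds / calls * 1e9);
}

/* Clock rate in GHz implied by a cycle count and a per-call time in nsec. */
static double
implied_ghz(double cycles, double nsec)
{
	return (cycles / nsec);
}
```

For example, implied_ghz(55, 22) is 2.5 and implied_ghz(59, 22) is about
2.7, so the quoted cycle counts correspond to a clock of roughly 2.5-2.7 GHz.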

Large slowdowns may be caused by exceptions, but I tested the above
range with overflow and underflow traps and didn't get any.

> I think it is still better to commit.  Then figure out where the
> regression was later, when you have time.

This is OK for transient efficiency regressions, not for accuracy ones.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Wed May 29 16:24:48 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id C591798D
 for <freebsd-numerics@freebsd.org>; Wed, 29 May 2013 16:24:48 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id A57C7801
 for <freebsd-numerics@freebsd.org>; Wed, 29 May 2013 16:24:48 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4TGOfSa058882; 
 Wed, 29 May 2013 09:24:41 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4TGOf7Y058881;
 Wed, 29 May 2013 09:24:41 -0700 (PDT) (envelope-from sgk)
Date: Wed, 29 May 2013 09:24:41 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: Patches for s_expl.c
Message-ID: <20130529162441.GA58773@troutmask.apl.washington.edu>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130529062437.V4648@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 May 2013 16:24:48 -0000

On Wed, May 29, 2013 at 07:39:04AM +1000, Bruce Evans wrote:
> On Tue, 28 May 2013, Steve Kargl wrote:
> 
> > Here are two patches for ld80/s_expl.c and ld128/s_expl.c.
> > Instead of committing the one large patch that I have spent
> > hours testing, I have split it into two.  One patch fixes/updates
> > expl().  The other patch is the implementation of expm1l().
> >
> > My commit messages will be:
> >
> > Patch 1:
> >
> >   ld80/s_expl.c:
> >
> >   * Use the LOG2_INTERVALS macro instead of hardcoding 7.
> 
> The use of LOG2_INTERVALS isn't merged into the ld128 version.  Patch 2
> merges its use for expm1l() only.

Hopefully, fixed.

> >   * Use LD80C to set overflow and underflow thresholds, and then use
> >     #defines to access the .e component to reduce diffs with ld128 version.
> >   * Rename polynomial coefficients P# to A#, which is used in Tang.
> 
> Almost all the declarations of polynomial coefficients are still formatted
> in a nonstandard way, but differently than in previous development
> versions.  I keep sending you patches for this.

Hopefully, fixed.  All fancy whitespace has been removed including
in comments with hex values.

> >   * Compute expm1l(x) for IEEE 754 128-bit format.
> 
> There is a fairly large bug in this, from only merging half of the
> most recent micro-optimization in the development version of the ld80
> version.  This might only be an efficiency bug, but I haven't tested
> the ld128 version with either the full merge or the half merge.
> 
> The ld128 version still has excessive optimizations for |x| near 0.
> It uses a slightly different high-degree polynomial on each side of
> 0.  The ld80 version uses the same poly on each side.  Most of the
> style bugs in the 4 exp[!2]l functions are in the coeffs for the
> polys on each side.  I haven't tried so hard to get you to fix them
> since I want to remove them.

Hopefully, fixed to the extent that I opened ld80/s_expl.c in one
nedit window and ld128/s_expl.c in another.  I copied everything
from ld80 to ld128 except of course literal constants and 
polynomials that must be different.

> >
> >   These are based on:
> >
> >   PTP Tang, "Table-driven implementation of the Expm1 function
> >   in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
> >   211-222 (1992).
> >
> > These commit logs may be too terse for some, but quite frankly after
> > 2 or 3 years of submitting and resubmitting diffs, I've forgotten
> > why some changes have or have not been made.
> >
> > expm1l() resides in s_expl.c because she shares the same table,
> > polynomial coefficients, and some numerical constants with expl().
> 
> There are some minor style regressions relative to previous development
> versions outside of poly coeffs.  Patches later.

I'm sure you're going to hate the new patch at the end.

> > Index: ld80/s_expl.c
> > ===================================================================
> > --- ld80/s_expl.c	(revision 251062)
> > +++ ld80/s_expl.c	(working copy)
> > ...
> > @@ -78,11 +82,11 @@
> >  * |exp(x) - p(x)| < 2**-77.2
> >  * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
> >  */
> > -P2 =  0.5,
> > -P3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
> > -P4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
> > -P5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
> > -P6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
> > +A2  = 0.5,
> > +A3  = 1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
> > +A4  = 4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
> > +A5  = 8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
> > +A6  = 1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
> 
> Example of a formatting regression.  The extra space that was before the
> values is for a possible minus sign.  This space is still there for the
> hex values.  The extra space before the equals sign is used for fancy
> formatting to line up the values when the variable names reach A10.  Since
> the variable names only reach A6, this is not needed.

All coefficients are now formatted with the form:

A6 = 1.3888891738560272e-3;       /* 0x16c16c651633ae.0p-62 */

i.e., 1 space before and 1 space after =.  The space in the comments
for the implicit + sign has been removed.
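
The hex value in each comment is meant to be the same double as the
decimal literal.  A minimal sketch of checking one such pair with C99
hex floating constants follows; the names A6_dec, A6_hex and
coeff_mismatch() are made up for the illustration.

```c
#include <assert.h>

/* Coefficient as quoted above: decimal value, hex value from its comment. */
static const double A6_dec = 1.3888891738560272e-3;
static const double A6_hex = 0x16c16c651633ae.0p-62;

/* Absolute difference between the two spellings; 0 means they agree. */
static double
coeff_mismatch(void)
{
	double d = A6_dec - A6_hex;

	return (d < 0 ? -d : d);
}
```

A nonzero result would point at a comment that drifted away from the
value it documents.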

> > Index: ld128/s_expl.c
> > ===================================================================
> > --- ld128/s_expl.c	(revision 251062)
> > +++ ld128/s_expl.c	(working copy)
> > ...
> > @@ -38,34 +40,56 @@
> > #include "math_private.h"
> >
> > #define	INTERVALS	128
> > +#define	LOG2_INTERVALS	7
> 
> Not used.

Hopefully, fixed.

> > 	n2 = (unsigned)n % INTERVALS;
> > 	k = (n - n2) / INTERVALS;
> > 	r1 = x - fn * L1;
> > -	r2 = -fn * L2;
> > +	r2 = fn * -L2;
> > +	r = r1 + r2;
> 
> 1 micro-optimization (that uses LOG2_INTERVALS) not merged here.
> 

Hopefully, fixed.
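
The hunk quoted above forms r1 = x - fn * L1 and r2 = fn * -L2 from a
hi+lo split of ln2/INTERVALS.  The ld80 comment in the patch says L1
has enough low zero bits for its product with n to be exact.  Here is
a rough sketch of checking that, assuming long double is wider than
double (where it is not, the check passes vacuously).  The helper
names are hypothetical.

```c
#include <assert.h>

/* ld80 constants, taken from the hex comments in the patch. */
static const double L1 = 0x162e42ff000000.0p-60;	/* hi part of ln2/128 */
static const double L2 = -0x1718432a1b0e26.0p-94;	/* lo part of ln2/128 */

/* The split recombined; close to ln2/128 = 0.0054152... */
static double
ln2_over_intervals(void)
{
	return (L1 + L2);
}

/*
 * Return 1 if the double product fn * L1 equals the same product
 * computed in long double, i.e. if double rounded no bits away.
 */
static int
product_is_exact(int n)
{
	double fn = n;
	double p = fn * L1;
	long double wide = (long double)fn * L1;

	return ((long double)p == wide);
}
```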

> > ...
> > +	if (k > LDBL_MANT_DIG - 1)
> > +		t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi;
> > +	else
> > +		t = s[n2].lo + t * (q + r1)  + (s[n2].hi - twomk);
> 
> The last statement isn't accurate enough for k = 0 and k = -1, so
> handling of those cases was moved earlier so that this statement
> could be optimized to what it is now.  The ld128 version is missing
> this.

ld80 code merged into ld128.
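
A toy illustration of why k = 0 needs its own evaluation order: for
the simplest table entry (hi = 1, lo = 0) the k == 0 formula in the
patch collapses to q + r1, so 1 + r is never formed.  The sketch below
uses double and a one-term polynomial tail as simplifying assumptions;
it is not the real expm1l() arithmetic.

```c
#include <assert.h>

/* Toy stand-in for the polynomial tail: exp(r) - 1 - r ~ r*r/2. */
static double
tail(double r)
{
	return (r * r / 2);
}

/* Build exp(r) first, then subtract 1: low bits of r are rounded away. */
static double
expm1_naive(double r)
{
	double e = 1 + r + tail(r);

	return (e - 1);
}

/* Never form 1 + r: with hi = 1 and lo = 0 the k == 0 case is q + r. */
static double
expm1_careful(double r)
{
	return (tail(r) + r);
}
```

For r = 1e-10 the two results differ far above the rounding level of
the careful sum, because adding r to 1 discards most of r's low bits
before the subtraction.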

> > --- ld128/s_expl.c	2013-05-28 09:36:11.000000000 -0700
> > +++ ld128/s_expl.c.all	2013-05-28 09:34:52.000000000 -0700
> > ...
> > +	if (k == 0) {
> > +		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> > +		    (s[n2].hi - 1);
> > +		RETURNI(t);
> > +	}
> > +
> > +	if (k == -1) {
> > +		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> > +		    (s[n2].hi - 2);
> > +		RETURNI(t / 2);
> > +	}
> > +
> > +
> 
> Same as for ld80, except for 2 style bugs instead of 1 (1 more extra
> blank line).

Hopefully, fixed.

> > +	if (k > LDBL_MANT_DIG - 1)
> > +		t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi;
> > +	else if (k < 1)
> > +		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> > +		   (s[n2].hi - twomk);
> > +	else
> > +		t = s[n2].lo * (q + r1 + 1) + s[n2].hi * (q + r1) +
> > +		    (s[n2].hi - twomk);
> 
> Not the same as for ld80.  Still has the old slower code, so it probably
> still works, but even more slowly than before except for k == 0 and k == -1,
> since there are extra branches to filter out those values.

ld80 and ld128 now use identical code.

> 
> Some patches relative to my version now instead of later:
> 
> @ --- z22/s_expl.c	Wed May 29 04:48:10 2013
> @ +++ ./s_expl.c	Wed May 29 06:16:29 2013
> @ @@ -30,5 +30,5 @@
> @  __FBSDID("$FreeBSD: src/lib/msun/ld80/s_expl.c,v 1.10 2012/10/13 19:53:11 kargl Exp $");
> @ 
> @ -/*-
> @ +/**
> @   * Compute the exponential of x for Intel 80-bit format.  This is based on:
> @   *
> 
> This ugliness is now required by style(9) :-(.  You only made this change in
> some places.

Hopefully, fixed.

> @ @@ -83,9 +83,9 @@
> @   * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
> @   */
> @ -A2  = 0.5,
> @ -A3  = 1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
> @ -A4  = 4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
> @ -A5  = 8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
> @ -A6  = 1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
> @ +A2 =  0.5,
> @ +A3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
> @ +A4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
> @ +A5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
> @ +A6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
> @ 
> @  /*
> 
> Fix regressions relative to a previous development version.

I made this conform to style(9).

> @ @@ -267,11 +275,12 @@
> @  	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */
> @  #if defined(HAVE_EFFICIENT_IRINTL)
> @ -	n  = irintl(fn);
> @ +	n = irintl(fn);
> @  #elif defined(HAVE_EFFICIENT_IRINT)
> @ -	n  = irint(fn);
> @ +	n = irint(fn);
> @  #else
> @ -	n  = (int)fn;
> @ +	n = (int)fn;
> 
> Fix more regressions.

Hopefully, fixed.

> @  #endif
> @  	n2 = (unsigned)n % INTERVALS;
> @ +	/* Depend on the sign bit being propagated: */
> @  	k = n >> LOG2_INTERVALS;
> @  	r1 = x - fn * L1;
> 
> I think a comment is needed.  This micro-optimization was merged from
> s_exp2*.c, where it is commented on more prominently for the long
> double versions only.

I did not add a comment.
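
For readers who do not know the micro-optimization, here is a small
sketch of the two ways of getting k from n.  The function names are
made up.  The shift form agrees with the division form for negative n
only when >> on a signed int propagates the sign bit, which the C
standard leaves implementation-defined.

```c
#include <assert.h>

#define	INTERVALS	128
#define	LOG2_INTERVALS	7

/* Old form: subtract the table index, then divide (always exact). */
static int
k_by_division(int n)
{
	int n2 = (unsigned)n % INTERVALS;

	return ((n - n2) / INTERVALS);
}

/*
 * New form: shift.  For negative n this relies on >> propagating the
 * sign bit, which C leaves implementation-defined.
 */
static int
k_by_shift(int n)
{
	return (n >> LOG2_INTERVALS);
}
```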

> 
> The coeffs have lots of style bugs, though not as many as for ld128.
> 

Hopefully, fixed.

> @ @@ -389,4 +409,9 @@
> @  		x4 = x2 * x2;
> @  		q = x4 * (x2 * (x4 *
> @ +		    /*
> @ +		     * XXX the number of terms is no longer good for
> @ +		     * pairwise grouping of all except B3, and the
> @ +		     * grouping is no longer from highest down.
> @ +		     */
> @  		    (x2 *            B12  + (x * B11 + B10)) +
> @  		    (x2 * (x * B9 +  B8) +  (x * B7 +  B6))) +

I left the whitespace as-is here and did not add the comment.
This should be the only place where there is a substantial
deviation from style(9).
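
As a sketch of what "pairwise grouping" means, the shorter A2..A6
polynomial from expl() is easier to show than the B coefficients.  The
version below uses double throughout, which is a simplification of the
ld80 code, and the function names are hypothetical.

```c
#include <assert.h>

/* ld80 coefficients from the patch in this message. */
static const double
A2 = 0.5,
A3 = 1.6666666666666119e-1,
A4 = 4.1666666666665887e-2,
A5 = 8.3333354987869413e-3,
A6 = 1.3888891738560272e-3;

/* Horner: one long chain of dependent multiplies and adds. */
static double
poly_horner(double r)
{
	return (r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * A6)))));
}

/* Pairwise grouping as in expl(): the parenthesized pairs are independent. */
static double
poly_pairwise(double r)
{
	double z = r * r;

	return (z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6);
}
```

Both forms evaluate the same polynomial up to rounding.  The grouped
form exposes independent subexpressions that a pipelined FPU can
overlap, which is why the layout of the pairs matters.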

> @ @@ -407,9 +432,9 @@
> @  	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
> @  #if defined(HAVE_EFFICIENT_IRINTL)
> @ -	n  = irintl(fn);
> @ +	n = irintl(fn);
> @  #elif defined(HAVE_EFFICIENT_IRINT)
> @ -	n  = irint(fn);
> @ +	n = irint(fn);
> @  #else
> @ -	n  = (int)fn;
> @ +	n = (int)fn;
> @  #endif

Hopefully, fixed.
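
The "fn = x * INV_L + 0x1.8p63 - 0x1.8p63" line in the hunk above is
the specialized rint().  A minimal sketch of the same trick for
double, using the 0x1.8p52 constant that the ld128 part of the patch
uses, is below.  rint_by_magic() is a hypothetical name, and the
sketch assumes round-to-nearest mode and |x| < 0x1p51.

```c
#include <assert.h>

/*
 * Round x to the nearest integer (ties to even) by pushing it up to a
 * magnitude where doubles have no fraction bits, then pulling it back.
 * Assumes round-to-nearest mode and |x| < 0x1p51.  The volatile store
 * forces the sum to be rounded to double before the subtraction.
 */
static double
rint_by_magic(double x)
{
	volatile double shifted = x + 0x1.8p52;

	return (shifted - 0x1.8p52);
}
```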

> @  	n2 = (unsigned)n % INTERVALS;
> @ @@ -434,22 +459,21 @@
> @ 
> @  	if (k == 0) {
> @ -		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> @ -		    (s[n2].hi - 1);
> @ +		t = SUM2P(s[n2].hi - 1, s[n2].lo * (r1 + 1) + t * q +
> @ +		    s[n2].hi * r1);
> @  		RETURNI(t);
> @  	}
> @ -
> 
> Style bug (extra blank line between related statements).

Hopefully, fixed.

> 
> @  	if (k == -1) {
> @ -		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + 
> @ -		    (s[n2].hi - 2);
> @ +		t = SUM2P(s[n2].hi - 2, s[n2].lo * (r1 + 1) + t * q +
> @ +		    s[n2].hi * r1);
> @  		RETURNI(t / 2);
> @  	}
> @
> 
> This blank line is correct since the statements are unrelated -- the
> evaluation method changes significantly.  For k = 0 and k = -1, the
> evaluation is the same but we repeat it all to avoid using a variable
> for (k - 1) for the 2 values of k.
> 
> @  	if (k < -7) {
> @ -		t = s[n2].lo + t * (q + r1) + s[n2].hi;
> @ +		t = SUM2P(s[n2].hi, s[n2].lo + t * (q + r1));
> @  		RETURNI(t * twopk - 1);
> @  	}
> @ 
> @  	if (k > 2 * LDBL_MANT_DIG - 1) {
> @ -		t = s[n2].lo + t * (q + r1) + s[n2].hi;
> @ +		t = SUM2P(s[n2].hi, s[n2].lo + t * (q + r1));
> @  		if (k == LDBL_MAX_EXP)
> @  			RETURNI(t * 2 * 0x1p16383L - 1);
> 
> Ignore all the other changes in this hunk.

After making the changes, current unscientific testing gives
(best viewed in a 95 column window):

expl

Timing:
                               1M        2M       10M       100M
i386    [-11355.0:11356.0]   0.088302           0.867567   8.64871
amd64   [-11355.0:11356.0]   0.062994           0.631960   6.30295
sparc64 [-11355.0:11356.0]  39.5309    79.1927

Accuracy:
                             M    Max ULP      x at Max ULP
i386    [-11355.0:11356.0]   1   0.50465  -3.5510383760383760e+03 -0x1.bbe13a6062b8cdd4p+11
i386    [-11355.0:11356.0]  10   0.50556  -9.6479456830945683e+03 -0x1.2d7f90c24c5c686p+13
i386    [-11355.0:11356.0] 100   0.50654  -7.9982712426427124e+03 -0x1.f3e45702867bb01p+12
amd64   [-11355.0:11356.0]   1   0.50465  -3.5510383760383760e+03 -0x1.bbe13a6062b8cdd4p+11
amd64   [-11355.0:11356.0]  10   0.50556  -9.6479456830945683e+03 -0x1.2d7f90c24c5c686p+13
amd64   [-11355.0:11356.0] 100   0.50654  -7.9982712426427124e+03 -0x1.f3e45702867bb01p+12
sparc64 [-11355.0:11356.0]   1   0.50619  1.79779355979355979355979355979355983e+03
sparc64 [-11355.0:11356.0]   2   0.50541  1.11496704618352309176154588077294027e+04


expm1l
 
Timing:
                             1M          10M        100M
i386    [-64.0000:-0.1659]   0.435783   4.342621  43.41397
i386    [ -0.1659: 0.1659]   0.082880   0.829142   8.28948
i386    [  0.1659:11356.0]   0.110590   1.096098  10.96253
amd64   [-64.0000:-0.1659]   0.066751   0.648734   6.46649
amd64   [ -0.1659: 0.1659]   0.061531   0.614824   6.14377
amd64   [  0.1659:11356.0]   0.071677   0.716927   7.16819
sparc64 [-113.000:-0.1659]  37.84224
sparc64 [ -0.1659: 0.1659]  66.28533
sparc64 [  0.1659:11356.0]  41.20714
 
Accuracy:
                            M   Max ULP      x at Max ULP
i386    [-64.0000:-0.1659]   1   0.50824  -1.7579429539429599e-01 -0x1.6806d6ec55bd2cp-3
i386    [ -0.1659: 0.1659]   1   0.50807   1.5765476175476175e-01  0x1.42e07fee5cecaa04p-3
i386    [  0.1659:11356.0]   1   0.50533   4.6558240641420642e+03  0x1.22fd2f5de1bf8cb2p+12
i386    [-64.0000:-0.1659]  10   0.51163  -1.8666523480652408e-01 -0x1.7e4a57b65a7cp-3
i386    [ -0.1659: 0.1659]  10   0.51031  -1.6139564864956486e-01 -0x1.4a89cd45552be4a8p-3
i386    [  0.1659:11356.0]  10   0.50597   7.2029609713952472e+03  0x1.c22f60238aafa618p+12
i386    [-64.0000:-0.1659] 100   0.51520  -1.8119337383093434e-01 -0x1.731582f6d89b72p-3
i386    [ -0.1659: 0.1659] 100   0.51161   1.6120475455904754e-01  0x1.4a25b7e6539760ecp-3
i386    [  0.1659:11356.0] 100   0.50645   1.5581592136564341e+03  0x1.858a308e79dd8494p+10

amd64   [-64.0000:-0.1659]   1   0.50502  -1.8115636515636515e-01 -0x1.73021bbe7877ccp-3
amd64   [ -0.1659: 0.1659]   1   0.50807   1.5765476175476175e-01  0x1.42e07fee5cecaa04p-3
amd64   [  0.1659:11356.0]   1   0.50522   5.3732636683514684e+03  0x1.4fd437fc4e28bfb6p+12
amd64   [-64.0000:-0.1659]  10   0.51363  -1.7086629347662934e-01 -0x1.5def25b3c452dap-3
amd64   [ -0.1659: 0.1659]  10   0.51031  -1.6139564864956486e-01 -0x1.4a89cd45552be4a8p-3
amd64   [  0.1659:11356.0]  10   0.50595   2.2495034322503431e-01  0x1.ccb2c3fb0104dbe4p-3
amd64   [-64.0000:-0.1659] 100   0.51376  -2.7335577165055771e-01 -0x1.17ea934da5e086p-2
amd64   [ -0.1659: 0.1659] 100   0.51161   1.6120475455904754e-01  0x1.4a25b7e6539760ecp-3
amd64   [  0.1659:11356.0] 100   0.50662   3.9436528827225188e+02  0x1.8a5d83883eef2676p+8

sparc64 [-113.000:-0.1659]   1   0.50339  -4.89331501511501510727132103685011835e+00
sparc64 [ -0.1659: 0.1659]   1   0.50837  -1.28120218820218813976976441251060453e-01
sparc64 [  0.1659:11356.0]   1   0.50514   6.45515777662077662077313264157127259e+03
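
This message does not show the program that produced the Max ULP
columns.  For readers unfamiliar with the unit, here is a minimal
sketch of one common way to express an error in ULPs.  It works on
double rather than long double, and it assumes a positive, normal
reference value is available in a wider type.  Both function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Gap between a positive, finite, normal double and the next larger one. */
static double
ulp_of(double x)
{
	uint64_t bits;
	double next;

	memcpy(&bits, &x, sizeof(bits));
	bits++;				/* next representable value */
	memcpy(&next, &bits, sizeof(next));
	return (next - x);
}

/* Error of computed relative to reference, in ULPs of the reference. */
static double
ulp_error(double computed, long double reference)
{
	long double diff = computed - reference;

	if (diff < 0)
		diff = -diff;
	return ((double)(diff / ulp_of((double)reference)));
}
```

On this scale a correctly rounded result has an error of at most 0.5,
so values slightly above 0.5 such as those in the tables correspond to
results that are occasionally off from correct rounding by a small
margin.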

Testing on flame is excruciatingly slow, especially because rdivacky
is building clang.  Yes, the following is one massive patch.

-- 
Steve

Index: ld80/s_expl.c
===================================================================
--- ld80/s_expl.c	(revision 251067)
+++ ld80/s_expl.c	(working copy)
@@ -29,7 +29,7 @@
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
-/*-
+/**
  * Compute the exponential of x for Intel 80-bit format.  This is based on:
  *
  *   PTP Tang, "Table-driven implementation of the exponential function
@@ -50,6 +50,7 @@
 #include "math_private.h"
 
 #define	INTERVALS	128
+#define	LOG2_INTERVALS	7
 #define	BIAS	(LDBL_MAX_EXP - 1)
 
 static const long double
@@ -60,9 +61,12 @@
 
 static const union IEEEl2bits
 /* log(2**16384 - 0.5) rounded towards zero: */
-o_threshold = LD80C(0xb17217f7d1cf79ab, 13,  11356.5234062941439488L),
+/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
+o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L),
+#define o_threshold	 (o_thresholdu.e)
 /* log(2**(-16381-64-1)) rounded towards zero: */
-u_threshold = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+u_thresholdu = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+#define u_threshold	 (u_thresholdu.e)
 
 static const double
 /*
@@ -70,19 +74,19 @@
  * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest
  * bits zero so that multiplication of it by n is exact.
  */
-INV_L = 1.8466496523378731e+2,		/*  0x171547652b82fe.0p-45 */
-L1 =  5.4152123484527692e-3,		/*  0x162e42ff000000.0p-60 */
+INV_L = 1.8466496523378731e+2,		/* 0x171547652b82fe.0p-45 */
+L1 = 5.4152123484527692e-3,		/* 0x162e42ff000000.0p-60 */
 L2 = -3.2819649005320973e-13,		/* -0x1718432a1b0e26.0p-94 */
 /*
  * Domain [-0.002708, 0.002708], range ~[-5.7136e-24, 5.7110e-24]:
  * |exp(x) - p(x)| < 2**-77.2
  * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
  */
-P2 =  0.5,
-P3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
-P4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
-P5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
-P6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
+A2 = 0.5,
+A3 = 1.6666666666666119e-1,		/* 0x15555555555490.0p-55 */
+A4 = 4.1666666666665887e-2,		/* 0x155555555554e5.0p-57 */
+A5 = 8.3333354987869413e-3,		/* 0x1111115b789919.0p-59 */
+A6 = 1.3888891738560272e-3;		/* 0x16c16c651633ae.0p-62 */
 
 /*
  * 2^(i/INTERVALS) for i in [0,INTERVALS] is represented by two values where
@@ -96,8 +100,7 @@
 static const struct {
 	double	hi;
 	double	lo;
-/* XXX should rename 's'. */
-} s[INTERVALS] = {
+} tbl[INTERVALS] = {
 	0x1p+0, 0x0p+0,
 	0x1.0163da9fb3335p+0, 0x1.b61299ab8cdb7p-54,
 	0x1.02c9a3e778060p+0, 0x1.dcdef95949ef4p-53,
@@ -232,7 +235,8 @@
 expl(long double x)
 {
 	union IEEEl2bits u, v;
-	long double fn, q, r, r1, r2, t, t23, t45, twopk, twopkp10000, z;
+	long double fn, q, r, r1, r2, t, twopk, twopkp10000;
+	long double z;
 	int k, n, n2;
 	uint16_t hx, ix;
 
@@ -242,40 +246,38 @@
 	ix = hx & 0x7fff;
 	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
 		if (ix == BIAS + LDBL_MAX_EXP) {
-			if (hx & 0x8000 && u.xbits.man == 1ULL << 63)
-				return (0.0L);	/* x is -Inf */
-			return (x + x); /* x is +Inf, NaN or unsupported */
+			if (hx & 0x8000)  /* x is -Inf, -NaN or unsupported */
+				return (-1 / x);
+			return (x + x);	/* x is +Inf, +NaN or unsupported */
 		}
-		if (x > o_threshold.e)
+		if (x > o_threshold)
 			return (huge * huge);
-		if (x < u_threshold.e)
+		if (x < u_threshold)
 			return (tiny * tiny);
-	} else if (ix < BIAS - 66) {	/* |x| < 0x1p-66 */
-					/* includes pseudo-denormals */
-		if (huge + x > 1.0L)	/* trigger inexact iff x != 0 */
-			return (1.0L + x);
+	} else if (ix < BIAS - 65) {	/* |x| < 0x1p-65 (includes pseudos) */
+		return (1 + x);		/* 1 with inexact iff x != 0 */
 	}
 
 	ENTERI();
 
-	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
 	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
 	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
 	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */
 #if defined(HAVE_EFFICIENT_IRINTL)
-	n  = irintl(fn);
+	n = irintl(fn);
 #elif defined(HAVE_EFFICIENT_IRINT)
-	n  = irint(fn);
+	n = irint(fn);
 #else
-	n  = (int)fn;
+	n = (int)fn;
 #endif
 	n2 = (unsigned)n % INTERVALS;
-	k = (n - n2) / INTERVALS;
+	k = n >> LOG2_INTERVALS;
 	r1 = x - fn * L1;
-	r2 = -fn * L2;
+	r2 = fn * -L2;
 
 	/* Prepare scale factors. */
-	v.xbits.man = 1ULL << 63;
+	v.e = 1;
 	if (k >= LDBL_MIN_EXP) {
 		v.xbits.expsign = BIAS + k;
 		twopk = v.e;
@@ -284,21 +286,181 @@
 		twopkp10000 = v.e;
 	}
 
-	/* Evaluate expl(midpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */
-	/* Here q = q(r), not q(r1), since r1 is lopped like L1. */
-	t45 = r * P5 + P4;
+	/* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */
 	z = r * r;
-	t23 = r * P3 + P2;
-	q = r2 + z * t23 + z * z * t45 + z * z * z * P6;
-	t = (long double)s[n2].lo + s[n2].hi;
-	t = s[n2].lo + t * (q + r1) + s[n2].hi;
+	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
+	t = (long double)tbl[n2].lo + tbl[n2].hi;
+	t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
 
 	/* Scale by 2**k. */
 	if (k >= LDBL_MIN_EXP) {
 		if (k == LDBL_MAX_EXP)
-			RETURNI(t * 2.0L * 0x1p16383L);
+			RETURNI(t * 2 * 0x1p16383L);
 		RETURNI(t * twopk);
 	} else {
 		RETURNI(t * twopkp10000 * twom10000);
 	}
 }
+
+/**
+ * Compute expm1l(x) for Intel 80-bit format.  This is based on:
+ *
+ *   PTP Tang, "Table-driven implementation of the Expm1 function
+ *   in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
+ *   211-222 (1992).
+ */
+
+/*
+ * Our T1 and T2 are chosen to be approximately the points where method
+ * A and method B have the same accuracy.  Tang's T1 and T2 are the
+ * points where method A's accuracy changes by a full bit.  For Tang,
+ * this drop in accuracy makes method A immediately less accurate than
+ * method B, but our larger INTERVALS makes method A 2 bits more
+ * accurate so it remains the most accurate method significantly
+ * closer to the origin despite losing the full bit in our extended
+ * range for it.
+ */
+static const double
+T1 = -0.1659,				/* ~-30.625/128 * log(2) */
+T2 = 0.1659;				/* ~30.625/128 * log(2) */
+
+/*
+ * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]:
+ * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2
+ */
+static const union IEEEl2bits
+B3 = LD80C(0xaaaaaaaaaaaaaaab, -3, 1.66666666666666666671e-01L),
+B4 = LD80C(0xaaaaaaaaaaaaaaac, -5, 4.16666666666666666712e-02L);
+
+static const double
+B5 = 8.3333333333333245e-03,		/* 0x1.111111111110cp-7 */
+B6 = 1.3888888888888861e-03,		/* 0x1.6c16c16c16c0ap-10 */
+B7 = 1.9841269841532042e-04,		/* 0x1.a01a01a0319f9p-13 */
+B8 = 2.4801587302069236e-05,		/* 0x1.a01a01a03cbbcp-16 */
+B9 = 2.7557316558468562e-06,		/* 0x1.71de37fd33d67p-19 */
+B10 = 2.7557315829785151e-07,		/* 0x1.27e4f91418144p-22 */
+B11 = 2.5063168199779829e-08,		/* 0x1.ae94fabdc6b27p-26 */
+B12 = 2.0887164654459567e-09;		/* 0x1.1f122d6413fe1p-29 */
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+	long double fn, hx2_hi, hx2_lo, q, r, r1, r2, t, twomk, twopk, x_hi;
+	long double x_lo, x2, z;
+	long double x4;
+	int k, n, n2;
+	uint16_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 6) {		/* |x| >= 64 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000)  /* x is -Inf, -NaN or unsupported */
+				return (-1 / x - 1);
+			return (x + x);	/* x is +Inf, +NaN or unsupported */
+		}
+		if (x > o_threshold)
+			return (huge * huge);
+		/*
+		 * expm1l() never underflows, but it must avoid
+		 * unrepresentable large negative exponents.  We used a
+		 * much smaller threshold for large |x| above than in
+		 * expl() so as to handle not so large negative exponents
+		 * in the same way as large ones here.
+		 */
+		if (hx & 0x8000)	/* x <= -64 */
+			return (tiny - 1);	/* good for x < -65ln2 - eps */
+	}
+
+	ENTERI();
+
+	if (T1 < x && x < T2) {
+		if (ix < BIAS - 64) {	/* |x| < 0x1p-64 (includes pseudos) */
+			/* x (rounded) with inexact if x != 0: */
+			RETURNI(x == 0 ? x :
+			    (0x1p100 * x + fabsl(x)) * 0x1p-100);
+		}
+
+		x2 = x * x;
+		x4 = x2 * x2;
+
+		q = x4 * (x2 * (x4 *
+		    (x2 *            B12  + (x * B11 + B10)) +
+		    (x2 * (x * B9 +  B8) +  (x * B7 +  B6))) +
+			  (x * B5 +  B4.e)) + x2 * x * B3.e;
+
+		x_hi = (float)x;
+		x_lo = x - x_hi;
+		hx2_hi = x_hi * x_hi / 2;
+		hx2_lo = x_lo * (x + x_hi) / 2;
+		if (ix >= BIAS - 7)
+			RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi));
+		else
+			RETURNI(hx2_lo + q + hx2_hi + x);
+	}
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
+#if defined(HAVE_EFFICIENT_IRINTL)
+	n = irintl(fn);
+#elif defined(HAVE_EFFICIENT_IRINT)
+	n = irint(fn);
+#else
+	n = (int)fn;
+#endif
+	n2 = (unsigned)n % INTERVALS;
+	k = n >> LOG2_INTERVALS;
+	r1 = x - fn * L1;
+	r2 = fn * -L2;
+	r = r1 + r2;
+
+	/* Prepare scale factor. */
+	v.e = 1;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	/*
+	 * Evaluate lower terms of
+	 * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2).
+	 */
+	z = r * r;
+	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
+
+	t = (long double)tbl[n2].lo + tbl[n2].hi;
+
+	if (k == 0) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
+		    (tbl[n2].hi - 1);
+		RETURNI(t);
+	}
+
+	if (k == -1) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
+		    (tbl[n2].hi - 2);
+		RETURNI(t / 2);
+	}
+
+	if (k < -7) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		RETURNI(t * twopk - 1);
+	}
+
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			RETURNI(t * 2 * 0x1p16383L - 1);
+		RETURNI(t * twopk - 1);
+	}
+
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+	if (k > LDBL_MANT_DIG - 1)
+		t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi;
+	else
+		t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk);
+	RETURNI(t * twopk);
+}
Index: ld128/s_expl.c
===================================================================
--- ld128/s_expl.c	(revision 251067)
+++ ld128/s_expl.c	(working copy)
@@ -1,5 +1,5 @@
 /*-
- * Copyright (c) 2012 Steven G. Kargl
+ * Copyright (c) 2009-2012 Steven G. Kargl
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -22,6 +22,8 @@
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Optimized by Bruce D. Evans.
  */
 
 #include <sys/cdefs.h>
@@ -38,35 +40,56 @@
 #include "math_private.h"
 
 #define	INTERVALS	128
+#define	LOG2_INTERVALS	7
 #define	BIAS	(LDBL_MAX_EXP - 1)
 
+static const long double
+huge = 0x1p10000L,
+twom10000 = 0x1p-10000L;
+/* XXX Prevent gcc from erroneously constant folding this: */
 static volatile const long double tiny = 0x1p-10000L;
 
 static const long double
-INV_L = 1.84664965233787316142070359168242182e+02L,
-L1 = 5.41521234812457272982212595914567508e-03L,
-L2 = -1.02536706388947310094527932552595546e-29L,
-huge = 0x1p10000L,
-o_threshold =  11356.523406294143949491931077970763428L,
-twom10000 = 0x1p-10000L,
+/* log(2**16384 - 0.5) rounded towards zero: */
+/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
+o_threshold = 11356.523406294143949491931077970763428L,
+/* log(2**(-16381-64-1)) rounded towards zero: */
 u_threshold = -11433.462743336297878837243843452621503L;
 
+static const double
+/*
+ * ln2/INTERVALS = L1+L2 (hi+lo decomposition for multiplication).  L1 must
+ * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest
+ * bits zero so that multiplication of it by n is exact.
+ */
+INV_L = 1.8466496523378731e+2,		/* 0x171547652b82fe.0p-45 */
+L2 = -1.0253670638894731e-29;		/* -0x1.9ff0342542fc3p-97 */
 static const long double
-P2 = 5.00000000000000000000000000000000000e-1L,
-P3 = 1.66666666666666666666666666666666972e-1L,
-P4 = 4.16666666666666666666666666653708268e-2L,
-P5 = 8.33333333333333333333333315069867254e-3L,
-P6 = 1.38888888888888888888996596213795377e-3L,
-P7 = 1.98412698412698412718821436278644414e-4L,
-P8 = 2.48015873015869681884882576649543128e-5L,
-P9 = 2.75573192240103867817876199544468806e-6L,
-P10 = 2.75573236172670046201884000197885520e-7L,
-P11 = 2.50517544183909126492878226167697856e-8L;
+/* 0x1.62e42fefa39ef35793c768000000p-8 */
+L1 = 5.41521234812457272982212595914567508e-03L;
 
+static const long double
+/*
+ * Domain [-0.002708, 0.002708], range ~[-2.4011e-38, 2.4244e-38]:
+ * |exp(x) - p(x)| < 2**-124.9
+ * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
+ */
+A2 = 0.5,
+A3 = 1.66666666666666666666666666651085500e-01L,
+A4 = 4.16666666666666666666666666425885320e-02L,
+A5 = 8.33333333333333333334522877160175842e-03L,
+A6 = 1.38888888888888888889971139751596836e-03L;
+
+static const double
+A7 = 1.9841269841269471e-04,
+A8 = 2.4801587301585284e-05,
+A9 = 2.7557324277411234e-06,
+A10 = 2.7557333722375072e-07;
+
 static const struct {
 	long double	hi;
 	long double	lo;
-} s[INTERVALS] = {
+} tbl[INTERVALS] = {
 	0x1p0L, 0x0p0L,
 	0x1.0163da9fb33356d84a66aep0L, 0x3.36dcdfa4003ec04c360be2404078p-92L,
 	0x1.02c9a3e778060ee6f7cacap0L, 0x4.f7a29bde93d70a2cabc5cb89ba10p-92L,
@@ -201,9 +224,10 @@
 expl(long double x)
 {
 	union IEEEl2bits u, v;
-	long double fn, r, r1, r2, q, t, twopk, twopkp10000;
+	long double q, r, r1, t, twopk, twopkp10000;
+	double dr, fn, r2;
 	int k, n, n2;
-	uint32_t hx, ix;
+	uint16_t hx, ix;
 
 	/* Filter out exceptional cases. */
 	u.e = x;
@@ -211,31 +235,36 @@
 	ix = hx & 0x7fff;
 	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
 		if (ix == BIAS + LDBL_MAX_EXP) {
-			if (hx & 0x8000 && u.xbits.manh == 0 &&
-			    u.xbits.manl == 0)
-				return (0.0L);	/* x is -Inf */
-			return (x + x);	/* x is +Inf or NaN */
+			if (hx & 0x8000)  /* x is -Inf or -NaN */
+				return (-1 / x);
+			return (x + x);	/* x is +Inf or +NaN */
 		}
 		if (x > o_threshold)
 			return (huge * huge);
 		if (x < u_threshold)
 			return (tiny * tiny);
-	} else if (ix < BIAS - 115) {	/* |x| < 0x1p-115 */
-	    	if (huge + x > 1.0L)	/* trigger inexact iff x != 0 */
-			return (1.0L + x);
+	} else if (ix < BIAS - 114) {	/* |x| < 0x1p-114 */
+		return (1 + x);		/* 1 with inexact iff x != 0 */
 	}
 
-	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
-	fn = x * INV_L + 0x1.8p112 - 0x1.8p112;
-	n  = (int)fn;
+	ENTERI();
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
+	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */
+#if defined(HAVE_EFFICIENT_IRINT)
+	n = irint(fn);
+#else
+	n = (int)fn;
+#endif
 	n2 = (unsigned)n % INTERVALS;
-	k = (n - n2) / INTERVALS;
+	k = n >> LOG2_INTERVALS;
 	r1 = x - fn * L1;
-	r2 = -fn * L2;
+	r2 = fn * -L2;
 
 	/* Prepare scale factors. */
-	v.xbits.manh = 0;
-	v.xbits.manl = 0;
+	v.e = 1;
 	if (k >= LDBL_MIN_EXP) {
 		v.xbits.expsign = BIAS + k;
 		twopk = v.e;
@@ -244,18 +273,223 @@
 		twopkp10000 = v.e;
 	}
 
-	r = r1 + r2;
-	q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 +
-	    r * (P8 + r * (P9 + r * (P10 + r * P11)))))))));
-	t = s[n2].lo + s[n2].hi;
-	t = s[n2].hi + (s[n2].lo + t * (r2 + q + r1));
+	/* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */
+	dr = r;
+	q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
+	    dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));
+	t = tbl[n2].lo + tbl[n2].hi;
+	t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
 
 	/* Scale by 2**k. */
 	if (k >= LDBL_MIN_EXP) {
 		if (k == LDBL_MAX_EXP)
-			return (t * 2.0L * 0x1p16383L);
-		return (t * twopk);
+			RETURNI(t * 2 * 0x1p16383L);
+		RETURNI(t * twopk);
 	} else {
-		return (t * twopkp10000 * twom10000);
+		RETURNI(t * twopkp10000 * twom10000);
 	}
 }
+
+/*
+ * Our T1 and T2 are chosen to be approximately the points where method
+ * A and method B have the same accuracy.  Tang's T1 and T2 are the
+ * points where method A's accuracy changes by a full bit.  For Tang,
+ * this drop in accuracy makes method A immediately less accurate than
+ * method B, but our larger INTERVALS makes method A 2 bits more
+ * accurate so it remains the most accurate method significantly
+ * closer to the origin despite losing the full bit in our extended
+ * range for it.
+ */
+static const double
+T1 = -0.1659,				/* ~-30.625/128 * log(2) */
+T2 = 0.1659;				/* ~30.625/128 * log(2) */
+
+/*
+ * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2].
+ * Setting T3 to 0 would require the |x| < 0x1p-113  condition to appear
+ * in both subintervals, so set T3 = 2**-5, which places the condition
+ * into the [T1:T3] interval.
+ */
+static const double
+T3 = 0.03125;
+
+/*
+ * XXX Estimated range is for absolute error.
+ * Domain [-0.1659, 0.03125], range ~[-1.8933e-38, 1.8943e-38]:
+ * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-125.3
+ */
+static const long double
+C3 = 1.66666666666666666666666666666666667e-01L,
+C4 = 4.16666666666666666666666666666666645e-02L,
+C5 = 8.33333333333333333333333333333371638e-03L,
+C6 = 1.38888888888888888888888888891188658e-03L,
+C7 = 1.98412698412698412698412697235950394e-04L,
+C8 = 2.48015873015873015873015112487849040e-05L,
+C9 = 2.75573192239858906525606685484412005e-06L,
+C10 = 2.75573192239858906612966093057020362e-07L,
+C11 = 2.50521083854417203619031960151253944e-08L,
+C12 = 2.08767569878679576457272282566520649e-09L,
+C13 = 1.60590438367252471783548748824255707e-10L;
+
+static const double
+C14 = 1.1470745580491932e-11,		/* 0x1.93974a81dae3p-37 */
+C15 = 7.6471620181090468e-13,		/* 0x1.ae7f3820adab1p-41 */
+C16 = 4.7793721460260450e-14,		/* 0x1.ae7cd18a18eacp-45 */
+C17 = 2.8074757356658877e-15,		/* 0x1.949992a1937d9p-49 */
+C18 = 1.4760610323699476e-16;		/* 0x1.545b43aabfbcdp-53 */
+
+/*
+ * XXX Estimated range is for absolute error.
+ * Domain [0.03125, 0.1659], range ~[-2.7597e-38, 2.7602e-38]:
+ * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-124.8
+ */
+static const long double
+D3 = 1.66666666666666666666666666666682245e-01L,
+D4 = 4.16666666666666666666666666634228324e-02L,
+D5 = 8.33333333333333333333333364022244481e-03L,
+D6 = 1.38888888888888888888887138722762072e-03L,
+D7 = 1.98412698412698412699085805424661471e-04L,
+D8 = 2.48015873015873015687993712101479612e-05L,
+D9 = 2.75573192239858944101036288338208042e-06L,
+D10 = 2.75573192239853161148064676533754048e-07L,
+D11 = 2.50521083855084570046480450935267433e-08L,
+D12 = 2.08767569819738524488686318024854942e-09L,
+D13 = 1.60590442297008495301927448122499313e-10L;
+
+static const double
+D14 = 1.1470726176204336e-11,		/* 0x1.93971dc395d9ep-37 */
+D15 = 7.6478532249581686e-13,		/* 0x1.ae892e3d16fcep-41 */
+D16 = 4.7628892832607741e-14,		/* 0x1.ad00dfe41feccp-45 */
+D17 = 3.0524857220358650e-15;		/* 0x1.d7e8d886df921p-49 */
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+	long double hx2_hi, hx2_lo, q, r, r1, t, twomk, twopk, x_hi;
+	long double x_lo, x2;
+	double dr, dx, fn, r2;
+	int k, n, n2;
+	uint16_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 7) {		/* |x| >= 128 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000)  /* x is -Inf or -NaN */
+				return (-1 / x - 1);
+			return (x + x);	/* x is +Inf or +NaN */
+		}
+		if (x > o_threshold)
+			return (huge * huge);
+		/*
+		 * expm1l() never underflows, but it must avoid
+		 * unrepresentable large negative exponents.  We used a
+		 * much smaller threshold for large |x| above than in
+		 * expl() so as to handle not so large negative exponents
+		 * in the same way as large ones here.
+		 */
+		if (hx & 0x8000)	/* x <= -128 */
+			return (tiny - 1);	/* good for x < -114ln2 - eps */
+	}
+
+	ENTERI();
+
+	if (T1 < x && x < T2) {
+		if (ix < BIAS - 113) {	/* |x| < 0x1p-113 */
+			/* x (rounded) with inexact if x != 0: */
+			RETURNI(x == 0 ? x :
+			    (0x1p200 * x + fabsl(x)) * 0x1p-200);
+		}
+
+		x2 = x * x;
+		dx = x;
+
+		if (x < T3) {
+			q = x * x2 * C3 + x2 * x2 * (C4 + x * (C5 + x * (C6 +
+			    x * (C7 + x * (C8 + x * (C9 + x * (C10 +
+			    x * (C11 + x * (C12 + x * (C13 +
+			    dx * (C14 + dx * (C15 + dx * (C16 +
+			    dx * (C17 + dx * C18))))))))))))));
+		} else {
+			q = x * x2 * D3 + x2 * x2 * (D4 + x * (D5 + x * (D6 +
+			    x * (D7 + x * (D8 + x * (D9 + x * (D10 +
+			    x * (D11 + x * (D12 + x * (D13 +
+			    dx * (D14 + dx * (D15 + dx * (D16 +
+			    dx * D17)))))))))))));
+		}
+
+		x_hi = (float)x;
+		x_lo = x - x_hi;
+		hx2_hi = x_hi * x_hi / 2;
+		hx2_lo = x_lo * (x + x_hi) / 2;
+		if (ix >= BIAS - 7)
+			RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi));
+		else
+			RETURNI(hx2_lo + q + hx2_hi + x);
+	}
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
+#if defined(HAVE_EFFICIENT_IRINT)
+	n = irint(fn);
+#else
+	n = (int)fn;
+#endif
+	n2 = (unsigned)n % INTERVALS;
+	k = n >> LOG2_INTERVALS;
+	r1 = x - fn * L1;
+	r2 = fn * -L2;
+	r = r1 + r2;
+
+	/* Prepare scale factor. */
+	v.e = 1;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	/*
+	 * Evaluate lower terms of
+	 * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2).
+	 */
+	dr = r;
+	q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
+	    dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));
+
+	t = tbl[n2].lo + tbl[n2].hi;
+
+	if (k == 0) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
+		    (tbl[n2].hi - 1);
+		RETURNI(t);
+	}
+
+	if (k == -1) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + 
+		    (tbl[n2].hi - 2);
+		RETURNI(t / 2);
+	}
+
+	if (k < -7) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		RETURNI(t * twopk - 1);
+	}
+
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			RETURNI(t * 2 * 0x1p16383L - 1);
+		RETURNI(t * twopk - 1);
+	}
+
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+
+	if (k > LDBL_MANT_DIG - 1)
+		t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi;
+	else
+		t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk);
+	RETURNI(t * twopk);
+}

From owner-freebsd-numerics@FreeBSD.ORG  Wed May 29 20:25:41 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id A12315BB
 for <freebsd-numerics@freebsd.org>; Wed, 29 May 2013 20:25:41 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 293A478D
 for <freebsd-numerics@freebsd.org>; Wed, 29 May 2013 20:25:41 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id C1B963C187A;
 Thu, 30 May 2013 06:25:32 +1000 (EST)
Date: Thu, 30 May 2013 06:25:31 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: Patches for s_expl.c
In-Reply-To: <20130529162441.GA58773@troutmask.apl.washington.edu>
Message-ID: <20130530045951.Y4776@besplex.bde.org>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
 <20130529162441.GA58773@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e4Ne0tV/ c=1 sm=1 a=kj9zAlcOel0A:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10
 a=L3_B_2Seth8ID6XF1HAA:9 a=CjuIK1q_8ugA:10 a=qSAIOg-s5ZBGxsML:21
 a=vBbP7Dv9lfjiZ7nx:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: freebsd-numerics@freebsd.org, Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 May 2013 20:25:41 -0000

On Wed, 29 May 2013, Steve Kargl wrote:

> On Wed, May 29, 2013 at 07:39:04AM +1000, Bruce Evans wrote:
>> On Tue, 28 May 2013, Steve Kargl wrote:
>>
>>> Here are two patches for ld80/s_expl.c and ld128/s_expl.c.
>>> Instead of committing the one large patch that I have spent
>>> hours testing, I have split it into two.  One patch fixes/updates
>>> expl().  The other patch is the implementation of expm1l().
> ...
>>>   * Rename polynomial coefficients P# to A#, which is used in Tang.
>>
>> Almost all the declarations of polynomial coefficients are still formatted
>> in a nonstandard way, but differently than in previous development
>> versions.  I keep sending you patches for this.
>
> Hopefully, fixed.  All fancy whitespace has been removed including
> in comments with hex values.

Er, I asked for them to be formatted in a standard way.  This has
whitespace for minus signs, since that lines up things better and it's
too hard to avoid it when using printf() to format tables.  Removing
it gives much larger diffs than before (although I merged a few of the
regressions, I didn't merge them when the formatting was already
standard).

>>>   * Compute expm1l(x) for IEEE 754 128-bit format.
>>
>> There is a fairly large bug in this, from only merging half of the
>> most recent micro-optimization in the development version of the ld80
>> version.  This might only be an efficiency bug, but I haven't tested
>> the ld128 version with either the full merge or the half merge.
>>
>> The ld128 version still has excessive optimizations for |x| near 0.
>> It uses a slightly different high-degree polynomial on each side of
>> 0.  The ld80 version uses the same poly on each side.  Most of the
>> style bugs in the 4 exp[!2]l functions are in the coeffs for the
>> polys on each side.  I haven't tried so hard to get you to fix them
>> since I want to remove them.
>
> Hopefully, fixed to the extent that opened ld80/s_expl.c in one
> nedit window and ld128/s_expl.c in another.  I copied everything
> from ld80 to ld128 except of course literal constants and
> polynomials that must be different.

Seems to be fixed (matches my version).  I have barely started testing
my version of it on sparc64.

>> There are some minor style regressions relative to previous development
>> versions outside of poly coeffs.  Patches later.
>
> I'm sure you're going to hate the new patch at the end.

Mainly more whitespace regressions :-).  Several non-style regressions
for ld128.

> All coefficients are now formatted with the form:
>
> A6 = 1.3888891738560272e-3;       /* 0x16c16c651633ae.0p-62 */
>
> i.e., 1 space before and 1 space after =.  The space in the comments
> for the implicit + sign has been removed.

As I mentioned above, this is nonstandard and requires manual editing
to mess up automatically formatted tables.  I used to print the tables
not very carefully and had to do lots of editing to match the style in
the source.  I got tired of this and changed the printing routines to
prettyprint in a standard format with all the necessary C syntax so that
I could copy whole tables to the source file.  Signs may or may not
be required and it is easiest to always leave space for them in the
standard format and never edit this to add or remove spaces for them.
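
A minimal sketch of that kind of printing routine, not the actual
generator: format_coeff() is a hypothetical name, and plain printf()
conversions are assumed, so the decimal exponent keeps printf()'s own
width and the hex comment comes from %a rather than the 0x...p-NN layout
used in the source.  The only point is the ' ' flag, which reserves a
column for the sign so that positive and negative entries line up
without hand editing.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format one coefficient as a C initializer line.
 * The ' ' flag in "% .16e" and "% a" prints a blank where a minus sign
 * would go, so the digits start in the same column for either sign.
 */
static int
format_coeff(char *buf, size_t len, const char *name, double val, int last)
{
	return (snprintf(buf, len, "%s = % .16e%c\t\t/* % a */",
	    name, val, last ? ';' : ',', val));
}
```

Calling it for L1 and then for L2 (with last set) gives two lines whose
first digit sits in the same column, with the blank or the minus sign in
the column before it.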

>> Some patches relative to my version now instead of later:
> ...
>> @ @@ -83,9 +83,9 @@
>> @   * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
>> @   */
>> @ -A2  = 0.5,
>> @ -A3  = 1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
>> @ -A4  = 4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
>> @ -A5  = 8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
>> @ -A6  = 1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
>> @ +A2 =  0.5,
>> @ +A3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
>> @ +A4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
>> @ +A5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
>> @ +A6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
>> @
>> @  /*
>>
>> Fix regressions relative to a previous development version.
>
> I made this conform to style(9).

style(9) only says not to use fancy formatting for assignments implicitly.
indent(1) cannot preserve fancy formatting for assignments or even be
directed how to format assignments.  However, we were intentionally using
fancy formatting, and the above was one of the few places in s_expl.c
where it was done consistently (after backing out regressions).  You got
the standard fancy formatting by copying one of my automatically generated
sets of coeffs when the coeff names were P[2-6].

>> @  #endif
>> @  	n2 = (unsigned)n % INTERVALS;
>> @ +	/* Depend on the sign bit being propagated: */
>> @  	k = n >> LOG2_INTERVALS;
>> @  	r1 = x - fn * L1;
>>
>> I think a comment is needed.  This micro-optimization was merged from
>> s_exp2*.c, where it is commented on more prominently for the long
>> double versions only.
>
> Ignored adding a comment.

It will be in future diffs.
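
For reference, a small sketch of what the comment is warning about.  It
assumes INTERVALS is 128 (LOG2_INTERVALS is 7), as the ln2/(2*INTERVALS)
comment implies; the function names are hypothetical.  The shift only
gives the floor of n / INTERVALS for negative n if >> propagates the
sign bit, which C leaves implementation-defined.  The second function
spells out the same split without shifting a negative value.

```c
#define LOG2_INTERVALS	7
#define INTERVALS	(1 << LOG2_INTERVALS)

/*
 * Split n into a table index n2 in [0, INTERVALS) and an exponent k
 * with n == k * INTERVALS + n2.  For n < 0 this depends on >>
 * propagating the sign bit (an arithmetic shift).
 */
static void
split_index(int n, int *kp, int *n2p)
{
	*n2p = (unsigned)n % INTERVALS;
	*kp = n >> LOG2_INTERVALS;
}

/* Same split using floor division, with no shift of a negative value. */
static void
split_index_portable(int n, int *kp, int *n2p)
{
	int k = n / INTERVALS, n2 = n % INTERVALS;

	if (n2 < 0) {
		n2 += INTERVALS;
		k--;
	}
	*kp = k;
	*n2p = n2;
}
```

With an arithmetic shift the two agree for every n, e.g. n = -1 gives
k = -1 and n2 = 127.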

>> @ @@ -389,4 +409,9 @@
>> @  		x4 = x2 * x2;
>> @  		q = x4 * (x2 * (x4 *
>> @ +		    /*
>> @ +		     * XXX the number of terms is no longer good for
>> @ +		     * pairwise grouping of all except B3, and the
>> @ +		     * grouping is no longer from highest down.
>> @ +		     */
>> @  		    (x2 *            B12  + (x * B11 + B10)) +
>> @  		    (x2 * (x * B9 +  B8) +  (x * B7 +  B6))) +
>
> I left this as-is with whitespace and did not add the comment.
> This should be the only place where there is a substantial
> deviation from style(9).

The comment is a reminder to fix the grouping of terms.

> After making the changes, current unscientific testing gives
> (best viewed in a 95 column window):
> ...
> expm1l
>
> Timing:
>                             1M          10M        100M
> i386    [-64.0000:-0.1659]   0.435783   4.342621  43.41397

Hmm, only slow on i386.  It's still fast for me.  Now tested on
Athlon64 and core2.

> i386    [ -0.1659: 0.1659]   0.082880   0.829142   8.28948
> i386    [  0.1659:11356.0]   0.110590   1.096098  10.96253
> amd64   [-64.0000:-0.1659]   0.066751   0.648734   6.46649
> amd64   [ -0.1659: 0.1659]   0.061531   0.614824   6.14377
> amd64   [  0.1659:11356.0]   0.071677   0.716927   7.16819
> sparc64 [-113.000:-0.1659]  37.84224
> sparc64 [ -0.1659: 0.1659]  66.28533
> sparc64 [  0.1659:11356.0]  41.20714
> ...
> Testing on flame is excruciatingly slow especially because rdivacky
> is building clang.

I handle the normal slowness by reducing the number of tests by a
factor of 100 for long double precision on sparc64.  rdivacky only
gave another factor of 3 slowness :-).

> Yes, the following is one massive patch.

Easier to apply that way.

> Index: ld80/s_expl.c
> ...
> Index: ld128/s_expl.c
> ...

Too hard to see or describe regressions in these because they are relative
to an old version.

Here are my current diffs for ld80:

@ --- z22/s_expl.c	Thu May 30 03:56:37 2013
@ +++ ./s_expl.c	Thu May 30 04:15:33 2013
@ @@ -63,5 +63,5 @@
@  /* log(2**16384 - 0.5) rounded towards zero: */
@  /* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
@ -o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L),
@ +o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13,  11356.5234062941439488L),
@  #define o_threshold	 (o_thresholdu.e)
@  /* log(2**(-16381-64-1)) rounded towards zero: */
@ @@ -75,6 +75,6 @@
@   * bits zero so that multiplication of it by n is exact.
@   */
@ -INV_L = 1.8466496523378731e+2,		/* 0x171547652b82fe.0p-45 */
@ -L1 = 5.4152123484527692e-3,		/* 0x162e42ff000000.0p-60 */
@ +INV_L = 1.8466496523378731e+2,		/*  0x171547652b82fe.0p-45 */
@ +L1 =  5.4152123484527692e-3,		/*  0x162e42ff000000.0p-60 */
@  L2 = -3.2819649005320973e-13,		/* -0x1718432a1b0e26.0p-94 */
@  /*
@ @@ -83,9 +83,9 @@
@   * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
@   */
@ -A2 = 0.5,
@ -A3 = 1.6666666666666119e-1,		/* 0x15555555555490.0p-55 */
@ -A4 = 4.1666666666665887e-2,		/* 0x155555555554e5.0p-57 */
@ -A5 = 8.3333354987869413e-3,		/* 0x1111115b789919.0p-59 */
@ -A6 = 1.3888891738560272e-3;		/* 0x16c16c651633ae.0p-62 */
@ +A2 =  0.5,
@ +A3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
@ +A4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
@ +A5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
@ +A6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
@ 
@  /*

As in previous version, with the diff larger since I didn't merge
whitespace changes that are regressions unless the formatting was
already mostly wrong.

@ @@ -273,4 +281,5 @@
@  #endif
@  	n2 = (unsigned)n % INTERVALS;
@ +	/* Depend on the sign bit being propagated: */
@  	k = n >> LOG2_INTERVALS;
@  	r1 = x - fn * L1;

As in previous version.

@ @@ -323,9 +332,19 @@
@  static const double
@  T1 = -0.1659,				/* ~-30.625/128 * log(2) */
@ -T2 = 0.1659;				/* ~30.625/128 * log(2) */
@ +T2 =  0.1659;				/* ~30.625/128 * log(2) */
@ 
@  /*
@ - * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]:
@ - * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2
@ + * Domain [-0.1659, 0.1659], range ~[-2.6155e-22, 2.5507e-23]:
@ + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.6
@ + *
@ + * XXX the coeffs aren't very carefully rounded, and I get 4.5 more bits,
@ + * but unlike for ld128 we can't drop any terms.
@ + *
@ + * XXX this still isn't in standard format:
@ + * - extra digits in exponents for decimal values
@ + * - no spaces to line up equals signs (a new regression)
@ + * - no space for a (not present) minus sign in either the decimal or hex
@ + *   values (a new regression for the LD80C hex values)
@ + * - perhaps they are impossible for double values
@   */
@  static const union IEEEl2bits

Mostly as in previous version.  I merged a lot of whitespace regressions
here and only added comments saying that there is more to fix now.

@ @@ -387,6 +408,10 @@
@  		x2 = x * x;
@  		x4 = x2 * x2;
@ -

I didn't merge a new whitespace regression.

@  		q = x4 * (x2 * (x4 *
@ +		    /*
@ +		     * XXX the number of terms is no longer good for
@ +		     * pairwise grouping of all except B3, and the
@ +		     * grouping is no longer from highest down.
@ +		     */
@  		    (x2 *            B12  + (x * B11 + B10)) +
@  		    (x2 * (x * B9 +  B8) +  (x * B7 +  B6))) +

As in previous version.

@ @@ -434,22 +459,21 @@
@ 
@  	if (k == 0) {
@ -		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
@ -		    (tbl[n2].hi - 1);
@ +		t = SUM2P(tbl[n2].hi - 1, tbl[n2].lo * (r1 + 1) + t * q +
@ +		    tbl[n2].hi * r1);
@  		RETURNI(t);
@  	}
@ -

You don't want most of this, but there is still an extra blank line here,
as in previous version.

@  	if (k == -1) {
@ -		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + 
@ -		    (tbl[n2].hi - 2);
@ +		t = SUM2P(tbl[n2].hi - 2, tbl[n2].lo * (r1 + 1) + t * q +
@ +		    tbl[n2].hi * r1);
@  		RETURNI(t / 2);
@  	}
@ 
@  	if (k < -7) {
@ -		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
@ +		t = SUM2P(tbl[n2].hi, tbl[n2].lo + t * (q + r1));
@  		RETURNI(t * twopk - 1);
@  	}
@ 
@  	if (k > 2 * LDBL_MANT_DIG - 1) {
@ -		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
@ +		t = SUM2P(tbl[n2].hi, tbl[n2].lo + t * (q + r1));
@  		if (k == LDBL_MAX_EXP)
@  			RETURNI(t * 2 * 0x1p16383L - 1);
@ @@ -459,8 +483,9 @@
@  	v.xbits.expsign = BIAS - k;
@  	twomk = v.e;
@ +

You don't want most of this, but there is now a missing blank line here.
Apparently the extra blank line above was removed here.  (The initialization
of twomk was intentionally separated from its use since the initialization
is somewhat special although it is not commented on like the initialization
of twopk.)

@  	if (k > LDBL_MANT_DIG - 1)
@ -		t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi;
@ +		t = SUM2P(tbl[n2].hi, tbl[n2].lo - twomk + t * (q + r1));
@  	else
@ -		t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk);
@ +		t = SUM2P(tbl[n2].hi - twomk, tbl[n2].lo + t * (q + r1));
@  	RETURNI(t * twopk);
@  }

Summary of my current diffs for ld128 (the full diffs are hard to untangle).
There are a couple of more serious regressions which these patches reverse.
No comments on formatting.  No patches for things done last year.

% --- z22/s_expl.c	Thu May 30 04:21:49 2013
% +++ ./s_expl.c	Thu May 30 04:59:06 2013
% ...
% @@ -252,6 +289,7 @@
%  	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
%  	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
% +	/* XXX assume no extra precision for the additions, as for trig fns. */
% +	/* XXX this set of comments is now quadruplicated. */
%  	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
% -	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */

This undoes a regression to the ld80 version.  Initializing r using extra
operations here is an optimization for the ld80 version (really for x86
and/or OOE CPUs with fast pipelines).  It is a huge pessimization to do
2 extra long double multiplications on sparc64, so it was not done.

%  #if defined(HAVE_EFFICIENT_IRINT)
%  	n = irint(fn);
% @@ -263,6 +301,8 @@
%  	r1 = x - fn * L1;
%  	r2 = fn * -L2;
% +	r = r1 + r2;

Finish undoing the regression.
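
A sketch of the reduction step in isolation, using the ld80 constants
from the diff above.  reduce() is a hypothetical name, and the code
assumes round-to-nearest and no extra precision in the double additions
(the same assumption the XXX comment records).  It computes r1 and r2
once and forms r from them, instead of redoing the two products.

```c
/* Constants as in the ld80 diff: INV_L ~ 128/ln2, L1 + L2 ~ ln2/128. */
static const double
INV_L = 1.8466496523378731e+2,
L1 =  5.4152123484527692e-3,
L2 = -3.2819649005320973e-13;

/*
 * Reduce x to n * (ln2/128) + r and return r; *np gets n.
 * Adding and subtracting 0x1.8p52 rounds the product to an integer,
 * provided the additions are done in plain double precision.
 */
static long double
reduce(long double x, int *np)
{
	long double r1;
	double fn, r2;

	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
	*np = (int)fn;
	r1 = x - fn * L1;	/* fn * L1 is exact: L1 has low bits zero */
	r2 = fn * -L2;		/* small correction term */
	return (r1 + r2);	/* one addition, no repeated products */
}
```

For x = 1 this gives n = 185 and an r of about -0.0018, inside the
+-0.002708 bound mentioned in the comment on the A coefficients.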

% 
%  	/* Prepare scale factors. */
% +	/* XXX sparc64 multiplication is so slow that scalbnl() is faster. */
%  	v.e = 1;
%  	if (k >= LDBL_MIN_EXP) {

Undo the regression of losing an important optimization hint.  The x86ish
optimization of using a multiplication to scale is not as bad on sparc64
as the one above, but it is still so bad that scalbnl() is better.  The
old fdlibm scaling method should be used instead of either of these (it
is a specialized scalbnl() manually inlined).
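
To illustrate the idea only: the sketch below is in double precision,
not long double, assumes IEEE 754 binary64, and uses hypothetical
function names.  It scales by adding k to the biased exponent field,
which is valid only while both the argument and the result are normal
(no zero, Inf, NaN, overflow or underflow handling).  The second
function builds 2**k the same way and multiplies, for comparison.

```c
#include <stdint.h>
#include <string.h>

/*
 * Scale a normal y by 2**k by adding k to its exponent field.
 * Unsigned arithmetic makes a negative k wrap to the right value.
 */
static double
scale_exponent(double y, int k)
{
	uint64_t bits;

	memcpy(&bits, &y, sizeof(bits));
	bits += (uint64_t)(int64_t)k << 52;
	memcpy(&y, &bits, sizeof(y));
	return (y);
}

/* Prepare the scale factor 2**k, then scale with a multiplication. */
static double
scale_multiply(double y, int k)
{
	return (y * scale_exponent(1.0, k));
}
```

Both give the same result in the normal range; the first avoids the
multiplication, which is what matters when multiplication is slow.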

% @@ -303,11 +343,20 @@
%  static const double
%  T1 = -0.1659,				/* ~-30.625/128 * log(2) */
% -T2 = 0.1659;				/* ~30.625/128 * log(2) */
% +T2 =  0.1659;				/* ~30.625/128 * log(2) */
% 
%  /*
%   * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2].
% - * Setting T3 to 0 would require the |x| < 0x1p-113  condition to appear
% + * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear
%   * in both subintervals, so set T3 = 2**-5, which places the condition
%   * into the [T1:T3] interval.
% + *
% + * XXX the above comment has rotted.  The condition is now tested for
% + * both subintervals (although with T3 nonzero it is only satisfied for
% + * [T1:T3]).  However, it is now even more critical for other reasons
% + * that T3 not be in the middle.  We now do this so that the polys
% + * for each side can have almost the same degree.  It may be slightly
% + * misplaced, since the C poly has ended up 1 degree higher.
% + *
% + * XXX these micro-optimizations are excessive.
%   */
%  static const double

I'm not sure if the change to test the condition for both intervals
is good, but it makes the first paragraph of the comment completely
wrong.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 06:46:42 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 2382A99;
 Thu, 30 May 2013 06:46:42 +0000 (UTC) (envelope-from das@FreeBSD.ORG)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net
 [50.196.151.174])
 by mx1.freebsd.org (Postfix) with ESMTP id E36AE298;
 Thu, 30 May 2013 06:46:41 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
 by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4U6kZEU091680;
 Wed, 29 May 2013 23:46:35 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
 by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4U6kZIJ091679;
 Wed, 29 May 2013 23:46:35 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Date: Wed, 29 May 2013 23:46:35 -0700
From: David Schultz <das@FreeBSD.ORG>
To: David Chisnall <theraven@freebsd.org>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
Message-ID: <20130530064635.GA91597@zim.MIT.EDU>
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>, pfg@freebsd.org,
 freebsd-standards@freebsd.org, freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 May 2013 06:46:42 -0000

On Fri, Feb 22, 2013, David Chisnall wrote:
> On 4 Feb 2013, at 03:52, Stephen Montgomery-Smith <stephen@missouri.edu> wrote:
> 
> > We do really seem to have a lot of working code right now.  And the main
> > barrier to commitment seems to be style issues.
> > 
> > For example, I have code at http://people.freebsd.org/~stephen/ for the
> > complex arctrig functions.  And Bruce has clog available.  And
> > presumably he has logl and atanl also available.
> > 
> > The last I heard about my code is Bruce asking for some style changes.
> > However I really don't think I will have time to work on it until at
> > least the summer.  And to be honest, style just isn't my thing.
> > 
> > I propose (a) that someone else takes over my code (and maybe Bruce's
> > code) and make the style changes, or (b) that we get a little less fussy
> > about getting it all just so right and start committing stuff.
> > 
> > Let me add that the code we have is already far superior than anything
> > in Linux or NetBSD, who clearly didn't worry about huge numerical errors
> > in many edge cases.  Come on guys, let's start strutting our stuff.
> > 
> > Let's commit what we have, even if it isn't perfect.
> 
> Yes, please can this happen?  We are currently on 31 test
> failures in the libc++ test suite on -HEAD, of which at least 18
> are due to linker failures trying to find missing libm
> functions.  We are very close to having a complete C++11
> implementation, yet we are held up by the lack of C99 support,
> and we are held up there by style nits?
>
> On behalf of core, please can we commit the existing code and
> worry about the style later? Given the expertise required to
> work on the libm functions, most of the people who are able to
> hack on the code have already read it and so concerns about
> consistency readability are somewhat misplaced.

I didn't see this thread until now, but coincidentally, I just
wrote tests and manpages for and committed Stephen's
implementations of most of the missing double/float complex
functions. I don't know the status of clog() or cpow(), but
murray@ has a patch to port the NetBSD versions, which I'm also
willing to commit given the unacceptable delays in producing
something better.

I was wondering if you could explain a bit about what your goal is
here, though.  Is there some kind of certification you are trying
to achieve?  Why can't you just comment out the few missing
functions?  You've been adamant about this issue ever since
joining the Project, even suggesting that we commit bogus
implementations just for the sake of having the symbols.  I
completely agree with you that the lack of progress is
unacceptable, and I'm sorry I haven't had more time to work on
this stuff myself, but I also don't understand the source of your
urgency.

The reason I'm asking is that I'm pushing to get a lot of stuff
into the tree quickly, but realistically, in the short term we're
only going to get 95% of the way there.  I doubt good
implementations of complicated functions that nobody uses, such as
erfcl() and tgammal(), are going to appear overnight.  Thus, I
would like to know whether the last 5% is needed quickly, and if
so, why.

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 13:56:27 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 6565448F
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 13:56:27 +0000 (UTC)
 (envelope-from imp@bsdimp.com)
Received: from mail-ie0-x231.google.com (mail-ie0-x231.google.com
 [IPv6:2607:f8b0:4001:c03::231])
 by mx1.freebsd.org (Postfix) with ESMTP id 349DC370
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 13:56:27 +0000 (UTC)
Received: by mail-ie0-f177.google.com with SMTP id 9so652573iec.36
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 06:56:27 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer
 :x-gm-message-state;
 bh=IORAOblq2wOwGWSdSoX5BpqP8LFSxOX+toid05IcPvk=;
 b=lYqWIwnDHYoJ4UuTJ68N4QI47ZpFs0t+xQ7C/COlUSjZGPAX/FVhq5lV3xttGZiHW/
 NdiqICfRTev1G2Bxbd8udgPfvCRi1zZ/eB1/4Ea4t6hKmVzI/tq4ns8ZkTy0qaGybLh1
 wC2sgUS/dtZymM7+KHXimOcgprkBfr6qXMbxIB85untn6EtprFQK9avoPsySLwFiSu9e
 G3pzZ3ZSMso97W4X7Yc+7vKbtBq2t8hjuoev73Uw+ZslXNmM7ByY1vheiwh5qy/k0nBF
 Kovs+SLKupQXfx1QdOpbk2zwTW/rFLx3PggVkLqJpEBCx5CQEzWnp6eVihR5dFzGO2Bf
 eQ9g==
X-Received: by 10.50.25.4 with SMTP id y4mr3701347igf.111.1369922186925;
 Thu, 30 May 2013 06:56:26 -0700 (PDT)
Received: from 53.imp.bsdimp.com
 (50-78-194-198-static.hfc.comcastbusiness.net. [50.78.194.198])
 by mx.google.com with ESMTPSA id ik6sm7054727igb.3.2013.05.30.06.56.25
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Thu, 30 May 2013 06:56:26 -0700 (PDT)
Sender: Warner Losh <wlosh@bsdimp.com>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
Mime-Version: 1.0 (Apple Message framework v1085)
Content-Type: text/plain; charset=us-ascii
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <20130530064635.GA91597@zim.MIT.EDU>
Date: Thu, 30 May 2013 07:56:24 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <A3633CF7-B0D3-4E09-88FC-1D40197C652C@bsdimp.com>
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU>
To: David Schultz <das@FreeBSD.ORG>
X-Mailer: Apple Mail (2.1085)
X-Gm-Message-State: ALoCoQlajA6OIWWt/6Iphg9osVLVA0f4PDMjoH3Dp5asn+Mrifzw9mqvNu8CbPnhhGQwmguTAN2s
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>,
 freebsd-standards@freebsd.org, pfg@freebsd.org,
 David Chisnall <theraven@freebsd.org>, freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 May 2013 13:56:27 -0000


On May 30, 2013, at 12:46 AM, David Schultz wrote:
> On Fri, Feb 22, 2013, David Chisnall wrote:
>> On 4 Feb 2013, at 03:52, Stephen Montgomery-Smith <stephen@missouri.edu> wrote:
>>
>>> We do really seem to have a lot of working code right now.  And the main
>>> barrier to commitment seems to be style issues.
>>>
>>> For example, I have code at http://people.freebsd.org/~stephen/ for the
>>> complex arctrig functions.  And Bruce has clog available.  And
>>> presumably he has logl and atanl also available.
>>>
>>> The last I heard about my code is Bruce asking for some style changes.
>>> However I really don't think I will have time to work on it until at
>>> least the summer.  And to be honest, style just isn't my thing.
>>>
>>> I propose (a) that someone else takes over my code (and maybe Bruce's
>>> code) and make the style changes, or (b) that we get a little less fussy
>>> about getting it all just so right and start committing stuff.
>>>
>>> Let me add that the code we have is already far superior than anything
>>> in Linux or NetBSD, who clearly didn't worry about huge numerical errors
>>> in many edge cases.  Come on guys, let's start strutting our stuff.
>>>
>>> Let's commit what we have, even if it isn't perfect.
>>
>> Yes, please can this happen?  We are currently on 31 test
>> failures in the libc++ test suite on -HEAD, of which at least 18
>> are due to linker failures trying to find missing libm
>> functions.  We are very close to having a complete C++11
>> implementation, yet we are held up by the lack of C99 support,
>> and we are held up there by style nits?
>>
>> On behalf of core, please can we commit the existing code and
>> worry about the style later? Given the expertise required to
>> work on the libm functions, most of the people who are able to
>> hack on the code have already read it and so concerns about
>> consistency readability are somewhat misplaced.
>
> I didn't see this thread until now, but coincidentally, I just
> wrote tests and manpages for and committed Stephen's
> implementations of most of the missing double/float complex
> functions. I don't know the status of clog() or cpow(), but
> murray@ has a patch to port the NetBSD versions, which I'm also
> willing to commit given the unacceptable delays in producing
> something better.

I'm all for better progress... Thank you for your efforts.

> I was wondering if you could explain a bit about what your goal is
> here, though.  Is there some kind of certification you are trying
> to achieve?  Why can't you just comment out the few missing
> functions?  You've been adamant about this issue ever since
> joining the Project, even suggesting that we commit bogus
> implementations just for the sake of having the symbols.  I
> completely agree with you that the lack of progress is
> unacceptable, and I'm sorry I haven't had more time to work on
> this stuff myself, but I also don't understand the source of your
> urgency.

More and more projects are refusing to work around our gridlock. We have
to report R each new release because they have taken out the checks for
the missing symbols. It is really an embarrassment to the project. We've
let the perfect be the enemy of the good. There are R scripts that run
elsewhere and not on FreeBSD. R is the one I know most about since I've
been using R a lot to crunch numbers for work, but there are others.

The urgency is we'd like to have this stuff done for 10, if at all
possible. And if not done, then a lot closer to done than where we are
today.

> The reason I'm asking is that I'm pushing to get a lot of stuff
> into the tree quickly, but realistically, in the short term we're
> only going to get 95% of the way there.  I doubt good
> implementations of complicated functions that nobody uses, such as
> erfcl() and tgammal(), are going to appear overnight.  Thus, I
> would like to know whether the last 5% is needed quickly, and if
> so, why.

I'm all for getting everything we can into the tree that produces an
answer that's close, even if not perfect. What error would the following
naive implementation generate?

long double tgammal(long double f) { return tgamma(f); }

But even assuming that, for some reason, it produces errors larger than
the difference in precision between double and long double, due to the
extreme non-linearity of these functions, having only a couple of
stragglers is a far better position to be in than we are today.

Warner

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 15:41:34 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 80CE87E5
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 15:41:34 +0000 (UTC)
 (envelope-from pfg@FreeBSD.org)
Received: from nm1-vm1.bullet.mail.bf1.yahoo.com
 (nm1-vm1.bullet.mail.bf1.yahoo.com [98.139.213.163])
 by mx1.freebsd.org (Postfix) with ESMTP id 32889FC3
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 15:41:34 +0000 (UTC)
Received: from [98.139.212.148] by nm1.bullet.mail.bf1.yahoo.com with NNFMP;
 30 May 2013 15:41:26 -0000
Received: from [98.139.213.1] by tm5.bullet.mail.bf1.yahoo.com with NNFMP;
 30 May 2013 15:41:26 -0000
Received: from [127.0.0.1] by smtp101.mail.bf1.yahoo.com with NNFMP;
 30 May 2013 15:41:26 -0000
Message-ID: <51A77324.2070702@FreeBSD.org>
Date: Thu, 30 May 2013 10:41:24 -0500
From: Pedro Giffuni <pfg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:17.0) Gecko/20130407 Thunderbird/17.0.5
MIME-Version: 1.0
To: David Schultz <das@FreeBSD.ORG>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU>
In-Reply-To: <20130530064635.GA91597@zim.MIT.EDU>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>,
 freebsd-standards@freebsd.org, freebsd-numerics@freebsd.org,
 David Chisnall <theraven@freebsd.org>
X-List-Received-Date: Thu, 30 May 2013 15:41:34 -0000

On 30.05.2013 01:46, David Schultz wrote:
> On Fri, Feb 22, 2013, David Chisnall wrote:
>> On 4 Feb 2013, at 03:52, Stephen Montgomery-Smith <stephen@missouri.edu> wrote:
>>
>>> We do really seem to have a lot of working code right now.  And the main
>>> barrier to commitment seems to be style issues.
>>>
>>> For example, I have code at http://people.freebsd.org/~stephen/ for the
>>> complex arctrig functions.  And Bruce has clog available.  And
>>> presumably he has logl and atanl also available.
>>>
>>> The last I heard about my code is Bruce asking for some style changes.
>>> However I really don't think I will have time to work on it until at
>>> least the summer.  And to be honest, style just isn't my thing.
>>>
>>> I propose (a) that someone else takes over my code (and maybe Bruce's
>>> code) and make the style changes, or (b) that we get a little less fussy
>>> about getting it all just so right and start committing stuff.
>>>
>>> Let me add that the code we have is already far superior to anything
>>> in Linux or NetBSD, who clearly didn't worry about huge numerical errors
>>> in many edge cases.  Come on guys, let's start strutting our stuff.
>>>
>>> Let's commit what we have, even if it isn't perfect.
>> Yes, please can this happen?  We are currently on 31 test
>> failures in the libc++ test suite on -HEAD, of which at least 18
>> are due to linker failures trying to find missing libm
>> functions.  We are very close to having a complete C++11
>> implementation, yet we are held up by the lack of C99 support,
>> and we are held up there by style nits?
>>
>> On behalf of core, please can we commit the existing code and
>> worry about the style later? Given the expertise required to
>> work on the libm functions, most of the people who are able to
>> hack on the code have already read it and so concerns about
>> consistency and readability are somewhat misplaced.
> I didn't see this thread until now, but coincidentally, I just
> wrote tests and manpages for and committed Stephen's
> implementations of most of the missing double/float complex
> functions. I don't know the status of clog() or cpow(), but
> murray@ has a patch to port the NetBSD versions, which I'm also
> willing to commit given the unacceptable delays in producing
> something better.

Thank you !!

> I was wondering if you could explain a bit about what your goal is
> here, though.  Is there some kind of certification you are trying
> to achieve?  Why can't you just comment out the few missing
> functions?  You've been adamant about this issue ever since
> joining the Project, even suggesting that we commit bogus
> implementations just for the sake of having the symbols.  I
> completely agree with you that the lack of progress is
> unacceptable, and I'm sorry I haven't had more time to work on
> this stuff myself, but I also don't understand the source of your
> urgency.

What I am finding rather disappointing is that our libstdc++
lacks so many features compared with what developers used
to Linux expect.

I think it's reasonable to think that libc++ will require the same
features as modern libstdc++ to support a quality port.

In addition to R, the current situation also has undesirable
effects in boost, where we don't support long double
(never mind the bogus patch in our ports tree).

If we can just get our local libstdc++ to use C99, that
would be an advance. The target at this time would be resolving
standards/175811, and it would also be interesting to see what
the upstream gcc/libstdc++ requires.

> The reason I'm asking is that I'm pushing to get a lot of stuff
> into the tree quickly, but realistically, in the short term we're
> only going to get 95% of the way there.  I doubt good
> implementations of complicated functions that nobody uses, such as
> erfcl() and tgammal(), are going to appear overnight.  Thus, I
> would like to know whether the last 5% is needed quickly, and if
> so, why.

I may be wrong, but with long double support, people who
need erfcl() and tgamma() can get them from boost.
The problem is therefore not implementing everything, but
getting enough to turn on the features supported by
libstdc++ and boost.

Pedro.

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 16:27:29 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 7A2D0153
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 16:27:29 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 469F935B
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 16:27:29 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4UGRNVo067068; 
 Thu, 30 May 2013 09:27:23 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4UGRNTQ067067;
 Thu, 30 May 2013 09:27:23 -0700 (PDT) (envelope-from sgk)
Date: Thu, 30 May 2013 09:27:23 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: Patches for s_expl.c
Message-ID: <20130530162723.GB66755@troutmask.apl.washington.edu>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
 <20130529162441.GA58773@troutmask.apl.washington.edu>
 <20130530045951.Y4776@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130530045951.Y4776@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@freebsd.org
X-List-Received-Date: Thu, 30 May 2013 16:27:29 -0000

On Thu, May 30, 2013 at 06:25:31AM +1000, Bruce Evans wrote:
> On Wed, 29 May 2013, Steve Kargl wrote:
> 
> > Yes, the following is one massive patch.
> 
> Easier to apply that way.
> 

OK, I've restored whitespace to hopefully match your expectations.
Removed excess digits in exponents (e.g., 1.234e08 --> 1.234e8).
Restored XXX comments.
Removed (unnecessary?) blank lines.
Restored the order of computing r = r1 + r2 in ld128.
Moved the |x| < 0x1p-113 if-block back into the [T1:T3] interval.
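
For readers following the patch below, one change common to both files
is how n is split into a table index and a power of two: the division
k = (n - n2) / INTERVALS becomes the shift k = n >> LOG2_INTERVALS.
Here is a minimal sketch of why the two agree, assuming a compiler whose
right shift of a negative int is arithmetic (implementation-defined in
C, which is what the "Depend on the sign bit being propagated" comment
refers to). The function names are illustrative only and do not appear
in the patch.

```c
#define	INTERVALS	128
#define	LOG2_INTERVALS	7

/* Table index in [0, INTERVALS), also for negative n. */
int
split_n2(int n)
{
	return ((unsigned)n % INTERVALS);
}

/* Old form: exact division once the index has been subtracted. */
int
split_k_div(int n)
{
	return ((n - split_n2(n)) / INTERVALS);
}

/* New form: floor(n / INTERVALS), assuming an arithmetic right shift. */
int
split_k_shift(int n)
{
	return (n >> LOG2_INTERVALS);
}
```

For n = -1, both forms give k = -1 with table index 127, so
n == k * INTERVALS + n2 still holds.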

Final questions.  What is your preference for committing expm1l?
Should it be included in s_expl.c or should I use 'svn cp' to
copy s_expl.c to s_expm1l.c and add the implementation of
expm1l to the copied version?

-- 
Steve

Index: ld80/s_expl.c
===================================================================
--- ld80/s_expl.c	(revision 251067)
+++ ld80/s_expl.c	(working copy)
@@ -29,7 +29,7 @@
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
-/*-
+/**
  * Compute the exponential of x for Intel 80-bit format.  This is based on:
  *
  *   PTP Tang, "Table-driven implementation of the exponential function
@@ -50,6 +50,7 @@
 #include "math_private.h"
 
 #define	INTERVALS	128
+#define	LOG2_INTERVALS	7
 #define	BIAS	(LDBL_MAX_EXP - 1)
 
 static const long double
@@ -60,9 +61,12 @@
 
 static const union IEEEl2bits
 /* log(2**16384 - 0.5) rounded towards zero: */
-o_threshold = LD80C(0xb17217f7d1cf79ab, 13,  11356.5234062941439488L),
+/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
+o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13,  11356.5234062941439488L),
+#define o_threshold	 (o_thresholdu.e)
 /* log(2**(-16381-64-1)) rounded towards zero: */
-u_threshold = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+u_thresholdu = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+#define u_threshold	 (u_thresholdu.e)
 
 static const double
 /*
@@ -78,11 +82,11 @@
  * |exp(x) - p(x)| < 2**-77.2
  * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
  */
-P2 =  0.5,
-P3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
-P4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
-P5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
-P6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
+A2 =  0.5,
+A3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
+A4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
+A5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
+A6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
 
 /*
  * 2^(i/INTERVALS) for i in [0,INTERVALS] is represented by two values where
@@ -96,8 +100,7 @@
 static const struct {
 	double	hi;
 	double	lo;
-/* XXX should rename 's'. */
-} s[INTERVALS] = {
+} tbl[INTERVALS] = {
 	0x1p+0, 0x0p+0,
 	0x1.0163da9fb3335p+0, 0x1.b61299ab8cdb7p-54,
 	0x1.02c9a3e778060p+0, 0x1.dcdef95949ef4p-53,
@@ -232,7 +235,8 @@
 expl(long double x)
 {
 	union IEEEl2bits u, v;
-	long double fn, q, r, r1, r2, t, t23, t45, twopk, twopkp10000, z;
+	long double fn, q, r, r1, r2, t, twopk, twopkp10000;
+	long double z;
 	int k, n, n2;
 	uint16_t hx, ix;
 
@@ -242,40 +246,39 @@
 	ix = hx & 0x7fff;
 	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
 		if (ix == BIAS + LDBL_MAX_EXP) {
-			if (hx & 0x8000 && u.xbits.man == 1ULL << 63)
-				return (0.0L);	/* x is -Inf */
-			return (x + x); /* x is +Inf, NaN or unsupported */
+			if (hx & 0x8000)  /* x is -Inf, -NaN or unsupported */
+				return (-1 / x);
+			return (x + x);	/* x is +Inf, +NaN or unsupported */
 		}
-		if (x > o_threshold.e)
+		if (x > o_threshold)
 			return (huge * huge);
-		if (x < u_threshold.e)
+		if (x < u_threshold)
 			return (tiny * tiny);
-	} else if (ix < BIAS - 66) {	/* |x| < 0x1p-66 */
-					/* includes pseudo-denormals */
-		if (huge + x > 1.0L)	/* trigger inexact iff x != 0 */
-			return (1.0L + x);
+	} else if (ix < BIAS - 65) {	/* |x| < 0x1p-65 (includes pseudos) */
+		return (1 + x);		/* 1 with inexact iff x != 0 */
 	}
 
 	ENTERI();
 
-	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
 	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
 	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
 	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */
 #if defined(HAVE_EFFICIENT_IRINTL)
-	n  = irintl(fn);
+	n = irintl(fn);
 #elif defined(HAVE_EFFICIENT_IRINT)
-	n  = irint(fn);
+	n = irint(fn);
 #else
-	n  = (int)fn;
+	n = (int)fn;
 #endif
 	n2 = (unsigned)n % INTERVALS;
-	k = (n - n2) / INTERVALS;
+	/* Depend on the sign bit being propagated: */
+	k = n >> LOG2_INTERVALS;
 	r1 = x - fn * L1;
-	r2 = -fn * L2;
+	r2 = fn * -L2;
 
 	/* Prepare scale factors. */
-	v.xbits.man = 1ULL << 63;
+	v.e = 1;
 	if (k >= LDBL_MIN_EXP) {
 		v.xbits.expsign = BIAS + k;
 		twopk = v.e;
@@ -284,21 +287,183 @@
 		twopkp10000 = v.e;
 	}
 
-	/* Evaluate expl(midpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */
-	/* Here q = q(r), not q(r1), since r1 is lopped like L1. */
-	t45 = r * P5 + P4;
+	/* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */
 	z = r * r;
-	t23 = r * P3 + P2;
-	q = r2 + z * t23 + z * z * t45 + z * z * z * P6;
-	t = (long double)s[n2].lo + s[n2].hi;
-	t = s[n2].lo + t * (q + r1) + s[n2].hi;
+	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
+	t = (long double)tbl[n2].lo + tbl[n2].hi;
+	t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
 
 	/* Scale by 2**k. */
 	if (k >= LDBL_MIN_EXP) {
 		if (k == LDBL_MAX_EXP)
-			RETURNI(t * 2.0L * 0x1p16383L);
+			RETURNI(t * 2 * 0x1p16383L);
 		RETURNI(t * twopk);
 	} else {
 		RETURNI(t * twopkp10000 * twom10000);
 	}
 }
+
+/**
+ * Compute expm1l(x) for Intel 80-bit format.  This is based on:
+ *
+ *   PTP Tang, "Table-driven implementation of the Expm1 function
+ *   in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
+ *   211-222 (1992).
+ */
+
+/*
+ * Our T1 and T2 are chosen to be approximately the points where method
+ * A and method B have the same accuracy.  Tang's T1 and T2 are the
+ * points where method A's accuracy changes by a full bit.  For Tang,
+ * this drop in accuracy makes method A immediately less accurate than
+ * method B, but our larger INTERVALS makes method A 2 bits more
+ * accurate so it remains the most accurate method significantly
+ * closer to the origin despite losing the full bit in our extended
+ * range for it.
+ */
+static const double
+T1 = -0.1659,				/* ~-30.625/128 * log(2) */
+T2 =  0.1659;				/* ~30.625/128 * log(2) */
+
+/*
+ * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]:
+ * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2
+ */
+static const union IEEEl2bits
+B3 = LD80C(0xaaaaaaaaaaaaaaab, -3,  1.66666666666666666671e-1L),
+B4 = LD80C(0xaaaaaaaaaaaaaaac, -5,  4.16666666666666666712e-2L);
+
+static const double
+B5  =  8.3333333333333245e-3,		/*  0x1.111111111110cp-7 */
+B6  =  1.3888888888888861e-3,		/*  0x1.6c16c16c16c0ap-10 */
+B7  =  1.9841269841532042e-4,		/*  0x1.a01a01a0319f9p-13 */
+B8  =  2.4801587302069236e-5,		/*  0x1.a01a01a03cbbcp-16 */
+B9  =  2.7557316558468562e-6,		/*  0x1.71de37fd33d67p-19 */
+B10 =  2.7557315829785151e-7,		/*  0x1.27e4f91418144p-22 */
+B11 =  2.5063168199779829e-8,		/*  0x1.ae94fabdc6b27p-26 */
+B12 =  2.0887164654459567e-9;		/*  0x1.1f122d6413fe1p-29 */
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+	long double fn, hx2_hi, hx2_lo, q, r, r1, r2, t, twomk, twopk, x_hi;
+	long double x_lo, x2, z;
+	long double x4;
+	int k, n, n2;
+	uint16_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 6) {		/* |x| >= 64 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000)  /* x is -Inf, -NaN or unsupported */
+				return (-1 / x - 1);
+			return (x + x);	/* x is +Inf, +NaN or unsupported */
+		}
+		if (x > o_threshold)
+			return (huge * huge);
+		/*
+		 * expm1l() never underflows, but it must avoid
+		 * unrepresentable large negative exponents.  We used a
+		 * much smaller threshold for large |x| above than in
+		 * expl() so as to handle not so large negative exponents
+		 * in the same way as large ones here.
+		 */
+		if (hx & 0x8000)	/* x <= -64 */
+			return (tiny - 1);	/* good for x < -65ln2 - eps */
+	}
+
+	ENTERI();
+
+	if (T1 < x && x < T2) {
+		if (ix < BIAS - 64) {	/* |x| < 0x1p-64 (includes pseudos) */
+			/* x (rounded) with inexact if x != 0: */
+			RETURNI(x == 0 ? x :
+			    (0x1p100 * x + fabsl(x)) * 0x1p-100);
+		}
+
+		x2 = x * x;
+		x4 = x2 * x2;
+		q = x4 * (x2 * (x4 *
+		    /*
+		     * XXX the number of terms is no longer good for
+		     * pairwise grouping of all except B3, and the
+		     * grouping is no longer from highest down.
+		     */
+		    (x2 *            B12  + (x * B11 + B10)) +
+		    (x2 * (x * B9 +  B8) +  (x * B7 +  B6))) +
+			  (x * B5 +  B4.e)) + x2 * x * B3.e;
+
+		x_hi = (float)x;
+		x_lo = x - x_hi;
+		hx2_hi = x_hi * x_hi / 2;
+		hx2_lo = x_lo * (x + x_hi) / 2;
+		if (ix >= BIAS - 7)
+			RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi));
+		else
+			RETURNI(hx2_lo + q + hx2_hi + x);
+	}
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
+#if defined(HAVE_EFFICIENT_IRINTL)
+	n = irintl(fn);
+#elif defined(HAVE_EFFICIENT_IRINT)
+	n = irint(fn);
+#else
+	n = (int)fn;
+#endif
+	n2 = (unsigned)n % INTERVALS;
+	k = n >> LOG2_INTERVALS;
+	r1 = x - fn * L1;
+	r2 = fn * -L2;
+	r = r1 + r2;
+
+	/* Prepare scale factor. */
+	v.e = 1;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	/*
+	 * Evaluate lower terms of
+	 * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2).
+	 */
+	z = r * r;
+	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
+
+	t = (long double)tbl[n2].lo + tbl[n2].hi;
+
+	if (k == 0) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
+		    (tbl[n2].hi - 1);
+		RETURNI(t);
+	}
+	if (k == -1) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + 
+		    (tbl[n2].hi - 2);
+		RETURNI(t / 2);
+	}
+	if (k < -7) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		RETURNI(t * twopk - 1);
+	}
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			RETURNI(t * 2 * 0x1p16383L - 1);
+		RETURNI(t * twopk - 1);
+	}
+
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+
+	if (k > LDBL_MANT_DIG - 1)
+		t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi;
+	else
+		t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk);
+	RETURNI(t * twopk);
+}
Index: ld128/s_expl.c
===================================================================
--- ld128/s_expl.c	(revision 251067)
+++ ld128/s_expl.c	(working copy)
@@ -1,5 +1,5 @@
 /*-
- * Copyright (c) 2012 Steven G. Kargl
+ * Copyright (c) 2009-2012 Steven G. Kargl
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -22,6 +22,8 @@
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Optimized by Bruce D. Evans.
  */
 
 #include <sys/cdefs.h>
@@ -38,35 +40,56 @@
 #include "math_private.h"
 
 #define	INTERVALS	128
+#define	LOG2_INTERVALS	7
 #define	BIAS	(LDBL_MAX_EXP - 1)
 
+static const long double
+huge = 0x1p10000L,
+twom10000 = 0x1p-10000L;
+/* XXX Prevent gcc from erroneously constant folding this: */
 static volatile const long double tiny = 0x1p-10000L;
 
 static const long double
-INV_L = 1.84664965233787316142070359168242182e+02L,
-L1 = 5.41521234812457272982212595914567508e-03L,
-L2 = -1.02536706388947310094527932552595546e-29L,
-huge = 0x1p10000L,
+/* log(2**16384 - 0.5) rounded towards zero: */
+/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
 o_threshold =  11356.523406294143949491931077970763428L,
-twom10000 = 0x1p-10000L,
+/* log(2**(-16381-64-1)) rounded towards zero: */
 u_threshold = -11433.462743336297878837243843452621503L;
 
+static const double
+/*
+ * ln2/INTERVALS = L1+L2 (hi+lo decomposition for multiplication).  L1 must
+ * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest
+ * bits zero so that multiplication of it by n is exact.
+ */
+INV_L = 1.8466496523378731e+2,		/*  0x171547652b82fe.0p-45 */
+L2 = -1.0253670638894731e-29;		/* -0x1.9ff0342542fc3p-97 */
 static const long double
-P2 = 5.00000000000000000000000000000000000e-1L,
-P3 = 1.66666666666666666666666666666666972e-1L,
-P4 = 4.16666666666666666666666666653708268e-2L,
-P5 = 8.33333333333333333333333315069867254e-3L,
-P6 = 1.38888888888888888888996596213795377e-3L,
-P7 = 1.98412698412698412718821436278644414e-4L,
-P8 = 2.48015873015869681884882576649543128e-5L,
-P9 = 2.75573192240103867817876199544468806e-6L,
-P10 = 2.75573236172670046201884000197885520e-7L,
-P11 = 2.50517544183909126492878226167697856e-8L;
+/* 0x1.62e42fefa39ef35793c768000000p-8 */
+L1 =  5.41521234812457272982212595914567508e-3L;
 
+static const long double
+/*
+ * Domain [-0.002708, 0.002708], range ~[-2.4011e-38, 2.4244e-38]:
+ * |exp(x) - p(x)| < 2**-124.9
+ * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
+ */
+A2  =  0.5,
+A3  =  1.66666666666666666666666666651085500e-1L,
+A4  =  4.16666666666666666666666666425885320e-2L,
+A5  =  8.33333333333333333334522877160175842e-3L,
+A6  =  1.38888888888888888889971139751596836e-3L;
+
+static const double
+A7  =  1.9841269841269471e-4,
+A8  =  2.4801587301585284e-5,
+A9  =  2.7557324277411234e-6,
+A10 =  2.7557333722375072e-7;
+
 static const struct {
 	long double	hi;
 	long double	lo;
-} s[INTERVALS] = {
+} tbl[INTERVALS] = {
 	0x1p0L, 0x0p0L,
 	0x1.0163da9fb33356d84a66aep0L, 0x3.36dcdfa4003ec04c360be2404078p-92L,
 	0x1.02c9a3e778060ee6f7cacap0L, 0x4.f7a29bde93d70a2cabc5cb89ba10p-92L,
@@ -201,9 +224,10 @@
 expl(long double x)
 {
 	union IEEEl2bits u, v;
-	long double fn, r, r1, r2, q, t, twopk, twopkp10000;
+	long double q, r, r1, t, twopk, twopkp10000;
+	double dr, fn, r2;
 	int k, n, n2;
-	uint32_t hx, ix;
+	uint16_t hx, ix;
 
 	/* Filter out exceptional cases. */
 	u.e = x;
@@ -211,31 +235,39 @@
 	ix = hx & 0x7fff;
 	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
 		if (ix == BIAS + LDBL_MAX_EXP) {
-			if (hx & 0x8000 && u.xbits.manh == 0 &&
-			    u.xbits.manl == 0)
-				return (0.0L);	/* x is -Inf */
-			return (x + x);	/* x is +Inf or NaN */
+			if (hx & 0x8000)  /* x is -Inf or -NaN */
+				return (-1 / x);
+			return (x + x);	/* x is +Inf or +NaN */
 		}
 		if (x > o_threshold)
 			return (huge * huge);
 		if (x < u_threshold)
 			return (tiny * tiny);
-	} else if (ix < BIAS - 115) {	/* |x| < 0x1p-115 */
-	    	if (huge + x > 1.0L)	/* trigger inexact iff x != 0 */
-			return (1.0L + x);
+	} else if (ix < BIAS - 114) {	/* |x| < 0x1p-114 */
+		return (1 + x);		/* 1 with inexact iff x != 0 */
 	}
 
-	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
-	fn = x * INV_L + 0x1.8p112 - 0x1.8p112;
-	n  = (int)fn;
+	ENTERI();
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	/* XXX assume no extra precision for the additions, as for trig fns. */
+	/* XXX this set of comments is now quadruplicated. */
+	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
+#if defined(HAVE_EFFICIENT_IRINT)
+	n = irint(fn);
+#else
+	n = (int)fn;
+#endif
 	n2 = (unsigned)n % INTERVALS;
-	k = (n - n2) / INTERVALS;
+	k = n >> LOG2_INTERVALS;
 	r1 = x - fn * L1;
-	r2 = -fn * L2;
+	r2 = fn * -L2;
+	r = r1 + r2;
 
 	/* Prepare scale factors. */
-	v.xbits.manh = 0;
-	v.xbits.manl = 0;
+	/* XXX sparc64 multiplication is so slow that scalbnl() is faster. */
+	v.e = 1;
 	if (k >= LDBL_MIN_EXP) {
 		v.xbits.expsign = BIAS + k;
 		twopk = v.e;
@@ -244,18 +276,220 @@
 		twopkp10000 = v.e;
 	}
 
-	r = r1 + r2;
-	q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 +
-	    r * (P8 + r * (P9 + r * (P10 + r * P11)))))))));
-	t = s[n2].lo + s[n2].hi;
-	t = s[n2].hi + (s[n2].lo + t * (r2 + q + r1));
+	/* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */
+	dr = r;
+	q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
+	    dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));
+	t = tbl[n2].lo + tbl[n2].hi;
+	t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
 
 	/* Scale by 2**k. */
 	if (k >= LDBL_MIN_EXP) {
 		if (k == LDBL_MAX_EXP)
-			return (t * 2.0L * 0x1p16383L);
-		return (t * twopk);
+			RETURNI(t * 2 * 0x1p16383L);
+		RETURNI(t * twopk);
 	} else {
-		return (t * twopkp10000 * twom10000);
+		RETURNI(t * twopkp10000 * twom10000);
 	}
 }
+
+/*
+ * Our T1 and T2 are chosen to be approximately the points where method
+ * A and method B have the same accuracy.  Tang's T1 and T2 are the
+ * points where method A's accuracy changes by a full bit.  For Tang,
+ * this drop in accuracy makes method A immediately less accurate than
+ * method B, but our larger INTERVALS makes method A 2 bits more
+ * accurate so it remains the most accurate method significantly
+ * closer to the origin despite losing the full bit in our extended
+ * range for it.
+ */
+static const double
+T1 = -0.1659,				/* ~-30.625/128 * log(2) */
+T2 =  0.1659;				/* ~30.625/128 * log(2) */
+
+/*
+ * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2].
+ * Setting T3 to 0 would require the |x| < 0x1p-113  condition to appear
+ * in both subintervals, so set T3 = 2**-5, which places the condition
+ * into the [T1:T3] interval.
+ */
+static const double
+T3 =  0.03125;
+
+/*
+ * XXX Estimated range is for absolute error.
+ * Domain [-0.1659, 0.03125], range ~[-1.8933e-38, 1.8943e-38]:
+ * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-125.3
+ */
+static const long double
+C3  =  1.66666666666666666666666666666666667e-1L,
+C4  =  4.16666666666666666666666666666666645e-2L,
+C5  =  8.33333333333333333333333333333371638e-3L,
+C6  =  1.38888888888888888888888888891188658e-3L,
+C7  =  1.98412698412698412698412697235950394e-4L,
+C8  =  2.48015873015873015873015112487849040e-5L,
+C9  =  2.75573192239858906525606685484412005e-6L,
+C10 =  2.75573192239858906612966093057020362e-7L,
+C11 =  2.50521083854417203619031960151253944e-8L,
+C12 =  2.08767569878679576457272282566520649e-9L,
+C13 =  1.60590438367252471783548748824255707e-10L;
+
+static const double
+C14 =  1.1470745580491932e-11,		/*  0x1.93974a81dae3p-37 */
+C15 =  7.6471620181090468e-13,		/*  0x1.ae7f3820adab1p-41 */
+C16 =  4.7793721460260450e-14,		/*  0x1.ae7cd18a18eacp-45 */
+C17 =  2.8074757356658877e-15,		/*  0x1.949992a1937d9p-49 */
+C18 =  1.4760610323699476e-16;		/*  0x1.545b43aabfbcdp-53 */
+
+/*
+ * XXX Estimated range is for absolute error.
+ * Domain [0.03125, 0.1659], range ~[-2.7597e-38, 2.7602e-38]:
+ * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-124.8
+ */
+static const long double
+D3  =  1.66666666666666666666666666666682245e-1L,
+D4  =  4.16666666666666666666666666634228324e-2L,
+D5  =  8.33333333333333333333333364022244481e-3L,
+D6  =  1.38888888888888888888887138722762072e-3L,
+D7  =  1.98412698412698412699085805424661471e-4L,
+D8  =  2.48015873015873015687993712101479612e-5L,
+D9  =  2.75573192239858944101036288338208042e-6L,
+D10 =  2.75573192239853161148064676533754048e-7L,
+D11 =  2.50521083855084570046480450935267433e-8L,
+D12 =  2.08767569819738524488686318024854942e-9L,
+D13 =  1.60590442297008495301927448122499313e-10L;
+
+static const double
+D14 =  1.1470726176204336e-11,		/*  0x1.93971dc395d9ep-37 */
+D15 =  7.6478532249581686e-13,		/*  0x1.ae892e3d16fcep-41 */
+D16 =  4.7628892832607741e-14,		/*  0x1.ad00dfe41feccp-45 */
+D17 =  3.0524857220358650e-15;		/*  0x1.d7e8d886df921p-49 */
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+	long double hx2_hi, hx2_lo, q, r, r1, t, twomk, twopk, x_hi;
+	long double x_lo, x2;
+	double dr, dx, fn, r2;
+	int k, n, n2;
+	uint16_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 7) {		/* |x| >= 128 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000)  /* x is -Inf or -NaN */
+				return (-1 / x - 1);
+			return (x + x);	/* x is +Inf or +NaN */
+		}
+		if (x > o_threshold)
+			return (huge * huge);
+		/*
+		 * expm1l() never underflows, but it must avoid
+		 * unrepresentable large negative exponents.  We used a
+		 * much smaller threshold for large |x| above than in
+		 * expl() so as to handle not so large negative exponents
+		 * in the same way as large ones here.
+		 */
+		if (hx & 0x8000)	/* x <= -128 */
+			return (tiny - 1);	/* good for x < -114ln2 - eps */
+	}
+
+	ENTERI();
+
+	if (T1 < x && x < T2) {
+
+		x2 = x * x;
+		dx = x;
+
+		if (x < T3) {
+			if (ix < BIAS - 113) {	/* |x| < 0x1p-113 */
+				/* x (rounded) with inexact if x != 0: */
+				RETURNI(x == 0 ? x :
+				    (0x1p200 * x + fabsl(x)) * 0x1p-200);
+			}
+			q = x * x2 * C3 + x2 * x2 * (C4 + x * (C5 + x * (C6 +
+			    x * (C7 + x * (C8 + x * (C9 + x * (C10 +
+			    x * (C11 + x * (C12 + x * (C13 +
+			    dx * (C14 + dx * (C15 + dx * (C16 +
+			    dx * (C17 + dx * C18))))))))))))));
+		} else {
+			q = x * x2 * D3 + x2 * x2 * (D4 + x * (D5 + x * (D6 +
+			    x * (D7 + x * (D8 + x * (D9 + x * (D10 +
+			    x * (D11 + x * (D12 + x * (D13 +
+			    dx * (D14 + dx * (D15 + dx * (D16 +
+			    dx * D17)))))))))))));
+		}
+
+		x_hi = (float)x;
+		x_lo = x - x_hi;
+		hx2_hi = x_hi * x_hi / 2;
+		hx2_lo = x_lo * (x + x_hi) / 2;
+		if (ix >= BIAS - 7)
+			RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi));
+		else
+			RETURNI(hx2_lo + q + hx2_hi + x);
+	}
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
+#if defined(HAVE_EFFICIENT_IRINT)
+	n = irint(fn);
+#else
+	n = (int)fn;
+#endif
+	n2 = (unsigned)n % INTERVALS;
+	k = n >> LOG2_INTERVALS;
+	r1 = x - fn * L1;
+	r2 = fn * -L2;
+	r = r1 + r2;
+
+	/* Prepare scale factor. */
+	v.e = 1;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	/*
+	 * Evaluate lower terms of
+	 * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2).
+	 */
+	dr = r;
+	q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
+	    dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));
+
+	t = tbl[n2].lo + tbl[n2].hi;
+
+	if (k == 0) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
+		    (tbl[n2].hi - 1);
+		RETURNI(t);
+	}
+	if (k == -1) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + 
+		    (tbl[n2].hi - 2);
+		RETURNI(t / 2);
+	}
+	if (k < -7) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		RETURNI(t * twopk - 1);
+	}
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			RETURNI(t * 2 * 0x1p16383L - 1);
+		RETURNI(t * twopk - 1);
+	}
+
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+
+	if (k > LDBL_MANT_DIG - 1)
+		t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi;
+	else
+		t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk);
+	RETURNI(t * twopk);
+}
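
For readers following the argument reduction in the patch above, here
is a minimal stand-alone sketch of the two integer tricks it relies on:
rounding to an integer by adding and subtracting 0x1.8p52, and splitting
n into a table index n2 and an exponent k.  INTERVALS = 128 and
LOG2_INTERVALS = 7 are assumed values (they agree with
ln2/(2*INTERVALS) ~= 0.002708), and the helper names rint_magic() and
split_n() are made up for illustration.  This is not the committed code.

```c
#define	LOG2_INTERVALS	7		/* assumed, not taken from the patch */
#define	INTERVALS	(1 << LOG2_INTERVALS)

/*
 * Round to the nearest integer (ties to even) in the default rounding
 * mode.  Only valid for |x| < 0x1p51.  The volatile forces the sum to
 * be rounded to double even if expressions are evaluated in extra
 * precision.
 */
static double
rint_magic(double x)
{
	volatile double t = x + 0x1.8p52;

	return (t - 0x1.8p52);
}

/*
 * Split n = k * INTERVALS + n2 with 0 <= n2 < INTERVALS, also for n < 0.
 * Assumes that >> on a negative int is an arithmetic shift.
 */
static void
split_n(int n, int *k, int *n2)
{
	*n2 = (unsigned)n % INTERVALS;
	*k = n >> LOG2_INTERVALS;
}
```

The unsigned remainder keeps n2 non-negative for negative n, so the
table index is always in range and k absorbs the sign.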

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 16:52:38 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 9B52D882;
 Thu, 30 May 2013 16:52:38 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from fallbackmx08.syd.optusnet.com.au
 (fallbackmx08.syd.optusnet.com.au [211.29.132.10])
 by mx1.freebsd.org (Postfix) with ESMTP id B6C2C669;
 Thu, 30 May 2013 16:52:37 +0000 (UTC)
Received: from mail28.syd.optusnet.com.au (mail28.syd.optusnet.com.au
 [211.29.133.169])
 by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
 r4UGqRRB032572; Fri, 31 May 2013 02:52:27 +1000
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail28.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4UGqDd8011312
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 31 May 2013 02:52:14 +1000
Date: Fri, 31 May 2013 02:52:13 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Warner Losh <imp@bsdimp.com>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
In-Reply-To: <A3633CF7-B0D3-4E09-88FC-1D40197C652C@bsdimp.com>
Message-ID: <20130531015915.N65390@besplex.bde.org>
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU>
 <A3633CF7-B0D3-4E09-88FC-1D40197C652C@bsdimp.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=BPvrNysG c=1 sm=1 a=Qub1x3MNGSYA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=AyPkC9FW8vsA:10
 a=gmrSIYXE1WnqeYESaG8A:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: David Chisnall <theraven@FreeBSD.org>,
 Stephen Montgomery-Smith <stephen@missouri.edu>, pfg@FreeBSD.org,
 freebsd-numerics@FreeBSD.org, David Schultz <das@FreeBSD.org>,
 freebsd-standards@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 May 2013 16:52:38 -0000

On Thu, 30 May 2013, Warner Losh wrote:

> I'm all for getting everything we can into the tree that produces an answer that's not perfect, but close. What's the error that would be generated with the naive implementation of
>
> long double tgammal(long double f) { return tgamma(f); }

On x86, 11 low bits wrong, for an error of 2048 ulps, in addition to any
errors in tgamma().  tgamma() on i386 inherits errors of 9 peta-ulps
(all 53 bits wrong) from i387 trig functions, but is OK on small args on
i386 and better on large args on amd64.

On sparc64, 60 low bits wrong, for an error of 1 exa-ulp, in addition
to any errors in tgamma(); the latter are the same as on amd64.  Sparc64
users of long double precision pay for it with a loss of performance
of a factor of several hundred, so they should be unhappy to not get
the extra bits when they ask for them (but the above inaccurate version
doesn't give them what they asked for).

On arches with long double == double, no difference.

On i386 with the default rounding precision of double, little difference.
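
The ulp counts above follow from the mantissa widths alone: a double
result returned through a long double wrapper leaves
LDBL_MANT_DIG - DBL_MANT_DIG low bits unaccounted for, and n such bits
span 2**n long double ulps.  A minimal sketch of that arithmetic (the
function name is made up for illustration):

```c
#include <math.h>

/*
 * Number of wide-format ulps spanned by the low bits that a result
 * carrying only narrow_bits of precision cannot represent.  Intended
 * for wide_bits > narrow_bits.
 */
static double
wrapper_error_ulps(int narrow_bits, int wide_bits)
{
	return (ldexp(1.0, wide_bits - narrow_bits));
}
```

With 53 and 64 bits (x86) this gives 2048; with 53 and 113 bits
(sparc64) it gives 2**60, or about 1.15e18.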

> But assuming that, for some reason, produces errors larger than difference in precision between double and long double due to extreme non-linearity of these functions, having only a couple of stragglers is a far better position to be in than we are today.

Such extra errors normally don't happen.  In fact, my accuracy tests for
double functions are essentially to upcast the results of double functions
and compare the resulting bits with the corresponding results for long
double functions.  Nonlinearities tend to only happen at zeros and poles
of functions and then they are due to bugs, and for NaNs, and then they are 
due to implementation-defined behaviour.  It is difficult to even determine
the location of zeros and poles for some functions, and most of the
complexities in libm are to use especially careful calculations near
them when they are known.
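
A minimal sketch of that kind of comparison, assuming a long double
routine is available to act as the reference.  ulp_error() is a
hypothetical helper written for this note, not the actual test program:

```c
#include <float.h>
#include <math.h>

/*
 * Error of a double result, measured in double-precision ulps, against
 * a long double reference value.
 */
static double
ulp_error(double got, long double ref)
{
	long double ulp;
	int e;

	(void)frexpl(ref, &e);		/* ref = m * 2**e, 0.5 <= |m| < 1 */
	ulp = ldexpl(1.0L, e - DBL_MANT_DIG);	/* double spacing near ref */
	return ((double)(fabsl((long double)got - ref) / ulp));
}
```

For example, ulp_error(exp(x), expl(x)) estimates the error of exp()
at x, to the extent that expl() itself is accurate there.  Where long
double == double the comparison degenerates and says nothing.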

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 16:56:14 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 8BF9A8E8;
 Thu, 30 May 2013 16:56:14 +0000 (UTC) (envelope-from das@FreeBSD.ORG)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net
 [50.196.151.174])
 by mx1.freebsd.org (Postfix) with ESMTP id 614A1690;
 Thu, 30 May 2013 16:56:13 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
 by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4UGuBTH093763;
 Thu, 30 May 2013 09:56:11 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
 by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4UGuARj093762;
 Thu, 30 May 2013 09:56:10 -0700 (PDT) (envelope-from das@FreeBSD.ORG)
Date: Thu, 30 May 2013 09:56:10 -0700
From: David Schultz <das@FreeBSD.ORG>
To: Warner Losh <imp@bsdimp.com>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
Message-ID: <20130530165610.GA93684@zim.MIT.EDU>
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU>
 <A3633CF7-B0D3-4E09-88FC-1D40197C652C@bsdimp.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <A3633CF7-B0D3-4E09-88FC-1D40197C652C@bsdimp.com>
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>,
 David Chisnall <theraven@freebsd.org>, pfg@freebsd.org,
 freebsd-standards@freebsd.org, freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 May 2013 16:56:14 -0000

On Thu, May 30, 2013, Warner Losh wrote:
> 
> On May 30, 2013, at 12:46 AM, David Schultz wrote:
> > On Fri, Feb 22, 2013, David Chisnall wrote:
> > I was wondering if you could explain a bit about what your goal is
> > here, though.  Is there some kind of certification you are trying
> > to achieve?  Why can't you just comment out the few missing
> > functions?  You've been adamant about this issue ever since
> > joining the Project, even suggesting that we commit bogus
> > implementations just for the sake of having the symbols.  I
> > completely agree with you that the lack of progress is
> > unacceptable, and I'm sorry I haven't had more time to work on
> > this stuff myself, but I also don't understand the source of your
> > urgency.
> 
> More and more projects are refusing to work around our
> gridlock. We have to report R each new release because they have
> taken out the checks for the missing symbols. It is really an
> embarrassment to the project. We've let the perfect be the enemy
> of the good. There are R scripts that run elsewhere and not on
> FreeBSD. R is the one I know most about since I've been using R
> a lot to crunch numbers for work, but there are others.
> 
> The urgency is we'd like to have this stuff done for 10, if at
> all possible. And if not done, then a lot closer to done than
> where we are today.

It looks like the R in ports just wants logl(), which isn't
surprising, and there's already code for that.  So getting that in
for 10 is achievable.

> > The reason I'm asking is that I'm pushing to get a lot of stuff
> > into the tree quickly, but realistically, in the short term we're
> > only going to get 95% of the way there.  I doubt good
> > implementations of complicated functions that nobody uses, such as
> > erfcl() and tgammal(), are going to appear overnight.  Thus, I
> > would like to know whether the last 5% is needed quickly, and if
> > so, why.
> 
> I'm all for getting everything we can into the tree that
> produces an answer that's not perfect, but close. What's the
> error that would be generated with the naive implementation of
> 
> long double tgammal(long double f) { return tgamma(f); }
> 
> But assuming that, for some reason, produces errors larger than
> difference in precision between double and long double due to
> extreme non-linearity of these functions, having only a couple
> of stragglers is a far better position to be in than we are
> today.

Whether this is acceptable depends a lot on who needs it in the
first place, which is part of why I was asking.  For many years,
the only software that cared was libstdc++, and libstdc++ only
wanted to wrap it.

Here are some of my notes on the status of things:

long double     log2l(long double);                     -- bde
long double     logl(long double);                      -- bde
long double     log1pl(long double);                    -- bde

Bruce has these written.  We can commit them with a little cleanup.

  long double     acoshl(long double);                  -- sgk
  long double     asinhl(long double);                  -- sgk
  long double     atanhl(long double);                  -- sgk
  long double     log10l(long double);                  -- bde

These are trivial given the first three. I believe Bruce and Steve
have the code for them already.

long double     expl(long double);                      -- sgk
long double     expm1l(long double);                    -- sgk

Steve has perfectly committable patches that I've already approved
(and furthermore, he doesn't need my approval anymore!)

  long double     coshl(long double);
  long double     sinhl(long double);
  long double     tanhl(long double);
  long double     erfcl(long double);
  long double     erfl(long double);

These are easy given expl() and expm1l().

long double     powl(long double, long double);

This is not so easy, but important, so we can make it a priority.

long double     lgammal(long double);
long double     tgammal(long double);

These are neither easy nor important; this gets back to my question.

float complex clogf(float complex);                     -- bde
double complex clog(double complex);                    -- bde

Bruce has code for these, which should be straightforward to turn
into something committable.

float complex cpowf(float complex, float complex);
double complex cpow(double complex, double complex);

This one is tough to do well and even tougher to test -- lots of
nasty corner cases.

long double complex cexpl(long double complex);
long double complex ccosl(long double complex);
  long double complex ccoshl(long double complex);
long double complex csinl(long double complex);
  long double complex csinhl(long double complex);
long double complex ctanl(long double complex);
  long double complex ctanhl(long double complex);
long double complex cacosl(long double complex);
  long double complex cacoshl(long double complex);
long double complex casinl(long double complex);
  long double complex casinhl(long double complex);
long double complex catanl(long double complex);
  long double complex catanhl(long double complex);
long double complex clogl(long double complex);
long double complex cpowl(long double complex, long double complex);

The long double versions of the complex math functions are trivial
once the long double versions of the corresponding real functions
are written.

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 17:13:49 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id BB96DAFF;
 Thu, 30 May 2013 17:13:49 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 8442A78E;
 Thu, 30 May 2013 17:13:49 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4UHDm7M067303; 
 Thu, 30 May 2013 10:13:48 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4UHDmUZ067302;
 Thu, 30 May 2013 10:13:48 -0700 (PDT) (envelope-from sgk)
Date: Thu, 30 May 2013 10:13:48 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Pedro Giffuni <pfg@FreeBSD.org>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
Message-ID: <20130530171348.GA67170@troutmask.apl.washington.edu>
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <51A77324.2070702@FreeBSD.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>,
 David Chisnall <theraven@freebsd.org>, David Schultz <das@FreeBSD.ORG>,
 freebsd-numerics@freebsd.org, freebsd-standards@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 May 2013 17:13:49 -0000

On Thu, May 30, 2013 at 10:41:24AM -0500, Pedro Giffuni wrote:
> 
> I may be wrong but with long double support people that
> need erfcl() and tgamma() can get them from boost.
> The problem is therefore not implementing everything but
> getting enough to turn on the features supported by
> libstdc++ and boost.
> 

Of course, you're wrong. :-) :-) <-- Note smileys.

C99 defines many long double functions.  Anyone wanting
to use C and libm, and not C++ and boost, will need 
quality implementations of these functions.  Of course,
the lack of any actual C99 compiler tends to dampen 
this argument.  

What I find appalling is reading "people are tired
of the situation with libm, so I'm  going to commit
some atrocious hack".   The proper response should be
"so I'm going to help implement and test the missing
functionality".  It's unfortunate that only a few
individuals are working to fix libm, but such is
life. 

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 19:44:00 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id BFB419D
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 19:44:00 +0000 (UTC)
 (envelope-from pfg@FreeBSD.org)
Received: from nm38-vm1.bullet.mail.ne1.yahoo.com
 (nm38-vm1.bullet.mail.ne1.yahoo.com [98.138.229.145])
 by mx1.freebsd.org (Postfix) with ESMTP id 70C3A136
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 19:44:00 +0000 (UTC)
Received: from [98.138.90.50] by nm38.bullet.mail.ne1.yahoo.com with NNFMP;
 30 May 2013 19:43:54 -0000
Received: from [98.138.226.63] by tm3.bullet.mail.ne1.yahoo.com with NNFMP;
 30 May 2013 19:43:54 -0000
Received: from [127.0.0.1] by smtp214.mail.ne1.yahoo.com with NNFMP;
 30 May 2013 19:43:54 -0000
X-Yahoo-Newman-Id: 290118.23681.bm@smtp214.mail.ne1.yahoo.com
X-Yahoo-Newman-Property: ymail-3
X-YMail-OSG: DAxlDJAVM1kUk.zJqk8IuYUBgsk_lxZ1KTr5UNtnK7jr9uU
 uM48rCnNJr7M6M0lsK6ut7vCuicK86y_14.m0lq3VfD39jJJZH8mcwrJq.S6
 z5RHMRYHH5QrfVb1Cv69vHnSKlGaXPscHpds8CIfSL2bW28IfMNFaRNLqZls
 4WFD8YgFOcPkmc.gPj0a65MKvlDWMXxkhiEcxqmnDpB2YKeRUf.n4kA6i4dr
 iHnTxteidqstQnOwm1UPcyy0xChMfsJ2wq9GLhcgSRzma1vWPrufHBRGOmSq
 VNZGb21QFRL4OA5gLwrJ.mopMkOyRHGKbqcVAnZvze.P1Habbua82ykfeAFv
 e71qqI3B9QsEkeZkfiVKzuDgYMObD.Om1hW.he4DxrcYiPB_NyneTl3sBVmB
 ZT_h7QmjU_jEfStf1uZ0Ugsl1oid8RtK.ggAmXpa9Ut6YZjH.6m5wlnymCOi
 _tYtsTKcF5ekaskpxWbtQhZYP0hvvz8XXxaL_0QnnhMyNpge6g_9.kTeA
X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf
X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with )
 by smtp214.mail.ne1.yahoo.com with SMTP; 30 May 2013 12:43:54 -0700 PDT
Message-ID: <51A7ABF7.6060807@FreeBSD.org>
Date: Thu, 30 May 2013 14:43:51 -0500
From: Pedro Giffuni <pfg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:17.0) Gecko/20130407 Thunderbird/17.0.5
MIME-Version: 1.0
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org>
 <20130530171348.GA67170@troutmask.apl.washington.edu>
In-Reply-To: <20130530171348.GA67170@troutmask.apl.washington.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>,
 freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 May 2013 19:44:00 -0000

(I trimmed the CC list a bit.)

On 30.05.2013 12:13, Steve Kargl wrote:
> On Thu, May 30, 2013 at 10:41:24AM -0500, Pedro Giffuni wrote:
>> I may be wrong but with long double support people that
>> need erfcl() and tgamma() can get them from boost.
>> The problem is therefore not implementing everything but
>> getting enough to turn on the features supported by
>> libstdc++ and boost.
>>
> Of course, you're wrong. :-) :-) <-- Note smileys.

And I knew I could be likely wrong from the start ;).

> C99 defines many long double functions.  Anyone wanting
> to use C and libm, and not C++ and boost, will need
> quality implementations of these functions.  Of course,
> the lack of any actual C99 compiler tends to dampen
> this argument.
>
> What I find appalling is reading "people are tired
> of the situation with libm, so I'm  going to commit
> some atrocious hack".   The proper response should be
> "so I'm going to help implement and test the missing
> functionality".  It's unfortunate that only a few
> individuals are working to fix libm, but such is
> life.
>

I guess I was trying to hint that Boost is a good
place to look at to get ideas for the implementations
for such stuff. Stephen knows this well though since
he actually fixed some complex functions in boost :).

The implementations of erfc and tgamma in
OpenOffice are based on the Boost code, with
the important difference that Boost does
automatic type promotion when it can.

FWIW, I was about to change OpenOffice to use
boost but then I noticed that the type promotion
doesn't work on FreeBSD (due to the lack of long
double math) so in general there was
not much gain in changing the status quo.

Pedro.


From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 20:15:14 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 15C07E15;
 Thu, 30 May 2013 20:15:14 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id EC154302;
 Thu, 30 May 2013 20:15:13 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4UKFDwx068665; 
 Thu, 30 May 2013 13:15:13 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4UKFDo2068664;
 Thu, 30 May 2013 13:15:13 -0700 (PDT) (envelope-from sgk)
Date: Thu, 30 May 2013 13:15:13 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Pedro Giffuni <pfg@FreeBSD.org>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
Message-ID: <20130530201513.GA68512@troutmask.apl.washington.edu>
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org>
 <20130530171348.GA67170@troutmask.apl.washington.edu>
 <51A7ABF7.6060807@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <51A7ABF7.6060807@FreeBSD.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>,
 freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 May 2013 20:15:14 -0000

On Thu, May 30, 2013 at 02:43:51PM -0500, Pedro Giffuni wrote:
> On 30.05.2013 12:13, Steve Kargl wrote:
> > C99 defines many long double functions.  Anyone wanting
> > to use C and libm, and not C++ and boost, will need
> > quality implementations of these functions.  Of course,
> > the lack of any actual C99 compiler tends to dampen
> > this argument.
> >
> > What I find appalling is reading "people are tired
> > of the situation with libm, so I'm  going to commit
> > some atrocious hack".   The proper response should be
> > "so I'm going to help implement and test the missing
> > functionality".  It's unfortunate that only a few
> > individuals are working to fix libm, but such is
> > life.
> >
> 
> I guess I was trying to hint that Boost is a good
> place to look at to get ideas for the implementations
> for such stuff. Stephen knows this well though since
> he actually fixed some complex functions in boost :).
> 

Boost might be a good place to look for implementation
ideas.  Looking at the msun code also works.  As does
searching with google.  This is all secondary to the 
real issue.  The real problem is no one is willing to
step forward to actually help write and test the code.
Everyone seems to be waiting (and complaining!) for
someone else to do the work.  I've been chipping away at
libm issues since 2003, and given my available free time
I should have a fully compliant C99 libm around 2025 or
so.

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 20:19:30 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 84ED0EED
 for <freebsd-numerics@FreeBSD.org>; Thu, 30 May 2013 20:19:30 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from fallbackmx07.syd.optusnet.com.au
 (fallbackmx07.syd.optusnet.com.au [211.29.132.9])
 by mx1.freebsd.org (Postfix) with ESMTP id 0CB2E34E
 for <freebsd-numerics@FreeBSD.org>; Thu, 30 May 2013 20:19:29 +0000 (UTC)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
 [211.29.132.184])
 by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
 r4UKJI9W008054
 for <freebsd-numerics@FreeBSD.org>; Fri, 31 May 2013 06:19:18 +1000
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4UKJ9FE011708
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 31 May 2013 06:19:10 +1000
Date: Fri, 31 May 2013 06:19:09 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: Patches for s_expl.c
In-Reply-To: <20130530162723.GB66755@troutmask.apl.washington.edu>
Message-ID: <20130531053652.H65974@besplex.bde.org>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
 <20130529162441.GA58773@troutmask.apl.washington.edu>
 <20130530045951.Y4776@besplex.bde.org>
 <20130530162723.GB66755@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=kj9zAlcOel0A:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10
 a=Iu-xyOGO5_ZyKUnWV68A:9 a=CjuIK1q_8ugA:10 a=-W0hRMvl23hXUl_A:21
 a=5eAh3lsampImaI_r:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: freebsd-numerics@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 May 2013 20:19:30 -0000

On Thu, 30 May 2013, Steve Kargl wrote:

> OK, I've restored whitespace to hopefully match your expectations.
> Removed excess digits in exponents (e.g., 1.234e08 --> 1.234e8).
> Restored XXX comments.
> Removed (unnecessary?) blank lines.
> Restored the order of computing r = r1 + r2 in ld128.
> Moved the |x| < 0x1p-113 if-block back into the [T1:T3] interval.

I like the ld80 version now.  My diffs for the ld128 version are below.

> Final questions.  What is your preference for committing expm1l?
> Should it be included in s_expl.c or should I use 'svn cp' to
> copy s_expl.c to s_expm1l.c and add the implementation of
> expm1l to the copied version?

I prefer it in the same file.  The big table is hard to manage in a
separate file (if the functions are split, then the table should be
too, since it is the largest component), and some constants would have
to be made public or duplicated.  Accesses to public tables and scalars
cannot be optimized (by the compiler) as much as static ones.  But when
you implement exp() so that it works as well as expl(), the table should
be shared in the ld80 case, so at least the table should be split then.

@ --- z22/s_expl.c	Fri May 31 04:31:30 2013
@ +++ s_expl.c	Fri May 31 05:32:51 2013
@ @@ -70,7 +70,13 @@
@ 
@ +/*
@ + * XXX values in hex in comments have been lost (or were never present)
@ + * from here.
@ + */

This patch fixes just a few.  All the double precision coeffs are in a
standard format now.

@  static const long double
@  /*
@ - * Domain [-0.002708, 0.002708], range ~[-2.4011e-38, 2.4244e-38]:
@ + * Domain [-0.002708, 0.002708], range ~[-2.4021e-38, 2.4234e-38]:

Checking the range showed that it is not quite the claimed one.  I
think the old values are from a previous check, but I improved the
checking program so the new values are hopefully more accurate.

Oops, I'm not quite happy with the ld80 version, since the checker
says that its B range differs from the claimed one by much more than
this range does.

@   * |exp(x) - p(x)| < 2**-124.9
@   * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
@ + *
@ + * XXX the coeffs aren't very carefully rounded, and I get 2.3 more bits.
@   */

Perhaps the coeffs are rounded carefully enough now.  They can be chosen
better.

@ @@ -83,8 +89,25 @@
@  static const double
@ -A7  =  1.9841269841269471e-4,
@ -A8  =  2.4801587301585284e-5,
@ -A9  =  2.7557324277411234e-6,
@ -A10 =  2.7557333722375072e-7;
@ +A7  =  1.9841269841269470e-4,		/*  0x1.a01a01a019f91p-13 */
@ +A8  =  2.4801587301585286e-5,		/*  0x1.71de3ec75a967p-19 */
@ +A9  =  2.7557324277411235e-6,		/*  0x1.71de3ec75a967p-19 */
@ +A10 =  2.7557333722375069e-7;		/*  0x1.27e505ab56259p-22 */

Act on an old reminder to fix things and round the values properly
(just re-print the values given by the C declarations).  Also add
comments.
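
A minimal sketch of the re-printing step described above, assuming the
coefficients are plain doubles.  The name format_coeff is hypothetical
and is not part of s_expl.c.  It prints 17 significant digits, which is
enough for any double to convert back exactly, and appends the %a form
for the comment:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Format v as a decimal constant with 17 significant digits, followed
 * by its hex (%a) form inside a comment.  Returns 1 if the decimal
 * string converts back to exactly v, else 0.
 */
int
format_coeff(double v, char *buf, size_t len)
{
	char dec[32];

	/* %.16e gives 1 digit before the point and 16 after it. */
	snprintf(dec, sizeof(dec), "%.16e", v);
	snprintf(buf, len, "%s,\t\t/* %a */", dec, v);
	return (strtod(dec, NULL) == v);
}
```

printf() always writes at least two exponent digits (e-04 rather than
e-4), so the output would still need trimming to match the style used
in the patch, and the exact layout of the %a digits can differ between
C libraries.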

@ 
@  static const struct {
@ +	/*
@ +	 * hi must be rounded to at most 106 bits so that multiplication
@ +	 * by r1 in expm1l() is exact, but it is rounded to 88 bits due to
@ +	 * historical accidents.

Keep this part of the comment.

@ +	 *
@ +	 * XXX it is wasteful to use long double for both hi and lo.  ld128
@ +	 * exp2l() uses only float for lo (in a very differently organized
@ +	 * table; ld80 exp2l() is different again.  It uses 2 doubles in a
@ +	 * table organized like this one.  1 double and 1 float would
@ +	 * suffice).  There are different packing/locality/alignment/caching
@ +	 * problems with these methods.
@ +	 *
@ +	 * XXX C's bad %a format makes the bits unreadable.  They happen
@ +	 * to all line up for the hi values 1 before the point and 88
@ +	 * in 22 nybbles, but for the low values the nybbles are shifted
@ +	 * randomly.
@ +	 */

Reminders of things to fix.

In a development version, I need hi to have only about 56 bits.  It is
easy to re-split hi+lo for testing this.  A 24-bit or 53-bit hi is
sufficient and would give this automatically.
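
One way such a re-split could look, as a sketch only.  It uses double
instead of long double, the name resplit is hypothetical, and the
Veltkamp splitting trick is a substitute technique; nothing in this
thread says how the development version actually does it:

```c
/*
 * Re-split hi + lo so that the new hi keeps only 'bits' significant
 * bits (2 <= bits <= 51 for a 53-bit double).  The value hi + lo is
 * preserved as long as the new lo is representable.
 */
void
resplit(double *hi, double *lo, int bits)
{
	/* volatile discourages evaluation in extra precision. */
	volatile double t, d, nhi;
	double c = 1.0;
	int i;

	/* Veltkamp splitter: 2**s + 1 with s = 53 - bits. */
	for (i = 0; i < 53 - bits; i++)
		c *= 2.0;
	c += 1.0;

	t = c * *hi;
	d = t - *hi;
	nhi = t - d;		/* hi rounded to 'bits' bits */
	*lo = (*hi - nhi) + *lo;	/* move the dropped bits into lo */
	*hi = nhi;
}
```

For example, with hi = 1 + 2**-30 and lo = 2**-60, asking for 24 bits
leaves hi = 1 and moves the 2**-30 term into lo.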

@  	long double	hi;
@ @@ -311,5 +336,11 @@
@   * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2].
@ - * Setting T3 to 0 would require the |x| < 0x1p-113  condition to appear
@ + * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear
@   * in both subintervals, so set T3 = 2**-5, which places the condition
@   * into the [T1:T3] interval.
@ + *
@ + * XXX we now do this more to (partially) balance the number of terms
@ + * in the C and D polys than to avoid checking the condition in both
@ + * intervals.
@ + *
@ + * XXX these micro-optimizations are excessive.
@   */
@ @@ -319,7 +350,25 @@
@  /*
@ - * XXX Estimated range is for absolute error.
@ - * Domain [-0.1659, 0.03125], range ~[-1.8933e-38, 1.8943e-38]:
@ - * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-125.3
@ + * Domain [-0.1659, 0.03125], range ~[2.9134e-44, 1.8404e-37]:
@ + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-122.03

The relative error should be the one documented.

@ + *
@ + * XXX the coeffs aren't very carefully rounded.  I got 10.3 more bits with
@ + * the old version for [-0.1659, -0.03125].  Now T3 is better balanced, and
@ + * I would expect only 7-8 extra bits.
@ + *
@ + * XXX the number of terms can be reduced by 1.  Then I get a few more bits
@ + * with the same number of doubles (5), and 0.7 more bits with 8 doubles.
@ + * This much accuracy is hard to explain, and it isn't clear that reduction
@ + * of x to double is valid at the same point that reduction of the coeffs to
@ + * double.  With C10 double, the absolute errors from rounding it are up to
@ + * about 2**-53 * 0.1659**10/10! ~= 2**-100.8.  Remes apparently improves
@ + * this to 2**-122.1.
@   */

Better polynomials should be used someday, but I want you to generate them.
After fixing the generator to minimize the relative error instead of the
absolute error, you should get ones like mine.

@  static const long double
@ +/*
@ + * XXX none of the long double C or D coeffs except C10 is correctly printed.
@ + * If you re-print their values in %.35Le format, the result is always
@ + * different.  For example, the last 2 digits in C3 should be 59, not 67.
@ + * 67 is apparently from rounding an extra-precision value to 36 decimal
@ + * places.
@ + */
@  C3  =  1.66666666666666666666666666666666667e-1L,

I didn't fix these.

@ @@ -337,3 +386,3 @@
@  static const double
@ -C14 =  1.1470745580491932e-11,		/*  0x1.93974a81dae3p-37 */
@ +C14 =  1.1470745580491932e-11,		/*  0x1.93974a81dae30p-37 */
@  C15 =  7.6471620181090468e-13,		/*  0x1.ae7f3820adab1p-41 */
@ @@ -344,5 +393,17 @@
@  /*
@ - * XXX Estimated range is for absolute error.
@ - * Domain [0.03125, 0.1659], range ~[-2.7597e-38, 2.7602e-38]:
@ - * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-124.8
@ + * Domain [0.03125, 0.1659], range ~[-2.7676e-37, -1.0367e-38]:
@ + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-121.44
@ + *
@ + * XXX the coeffs aren't very carefully rounded. I get 5.2 more bits with
@ + * the old version for [-0.03125, 0.1659].  Now T3 is better balanced, and
@ + * I would expect 7-8 extra bits.
@ + *
@ + * XXX the number of terms can be reduced by 1.  Then I get a few more bits
@ + * with the same number of doubles (4), and 1.1 more bits with 6 doubles.
@ + * This much accuracy is hard to explain, etc., as above.  With D11 double,
@ + * the absolute errors from rounding it are up to about
@ + * 2**-53 * 0.1659**11/11! ~= 2**-106.8.
@ + *
@ + * Note that with my coeffs, although this side needs 1 fewer term, it needs
@ + * 1 more long double term, so it is probably actually slower on sparc64.
@   */

It's painful to have separate polys C and D for Tang's B.
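
The rounding estimates quoted in the two XXX comments above have the
form 2**-53 * xmax**n / n!.  A small sketch for reproducing them
follows; the function name is hypothetical, and it assumes that the
degree-n coefficient is close to 1/n! and is rounded to a 53-bit double:

```c
/*
 * Estimate of the absolute error caused by rounding a coefficient of
 * size about 1/n! to double when |x| <= xmax:
 * 2**-53 * xmax**n / n!.
 */
double
coeff_rounding_bound(double xmax, int n)
{
	double b = 0x1p-53;
	int i;

	/* Accumulate xmax**n / n! one factor at a time. */
	for (i = 1; i <= n; i++)
		b *= xmax / i;
	return (b);
}
```

With xmax = 0.1659 this gives a value between 2**-101 and 2**-100 for
n = 10 (the C10 case) and between 2**-107 and 2**-106 for n = 11 (the
D11 case), consistent with the figures in the comments.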

@ @@ -403,3 +466,2 @@
@  	if (T1 < x && x < T2) {
@ -
@  		x2 = x * x;

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 20:35:14 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 74AEE31B;
 Thu, 30 May 2013 20:35:14 +0000 (UTC)
 (envelope-from wollman@khavrinen.csail.mit.edu)
Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu
 [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0])
 by mx1.freebsd.org (Postfix) with ESMTP id 4D363640;
 Thu, 30 May 2013 20:35:14 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1])
 by khavrinen.csail.mit.edu (8.14.5/8.14.5) with ESMTP id r4UKZCqQ069030
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256
 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA);
 Thu, 30 May 2013 16:35:12 -0400 (EDT)
 (envelope-from wollman@khavrinen.csail.mit.edu)
Received: (from wollman@localhost)
 by khavrinen.csail.mit.edu (8.14.5/8.14.5/Submit) id r4UKZCOZ069027;
 Thu, 30 May 2013 16:35:12 -0400 (EDT) (envelope-from wollman)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <20903.47104.38977.577307@khavrinen.csail.mit.edu>
Date: Thu, 30 May 2013 16:35:12 -0400
From: Garrett Wollman <wollman@csail.mit.edu>
To: Warner Losh <imp@bsdimp.com>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
In-Reply-To: <A3633CF7-B0D3-4E09-88FC-1D40197C652C@bsdimp.com>
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU>
 <A3633CF7-B0D3-4E09-88FC-1D40197C652C@bsdimp.com>
X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (khavrinen.csail.mit.edu [127.0.0.1]); Thu, 30 May 2013 16:35:12 -0400 (EDT)
Cc: freebsd-numerics@freebsd.org, freebsd-standards@freebsd.org
X-List-Received-Date: Thu, 30 May 2013 20:35:14 -0000

<<On Thu, 30 May 2013 07:56:24 -0600, Warner Losh <imp@bsdimp.com> said:

> I'm all for getting everything we can into the tree that produces an
> answer that's not perfect, but close. What's the error that would be
> generated with the naive implementation of

> long double tgammal(long double f) { return tgamma(f); }

Perhaps we could implement these functions in such a way that they
logged a message to inform the user (once per process) that they were
using a low-quality implementation.  That would allow us to implement
these functions without totally losing the incentive to implement them
properly, and those users who don't actually call those functions
would not have to pay the price of further delay.  (This would be a
non-conforming implementation, since it would have side effects other
than those specified by the standard, but we already fail to conform
by not implementing the functions at all, so it wouldn't make things
*worse*.)
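
A sketch of how the once-per-process notice might be wired up.  All
names here (lowq_warn_once, lowq_eval) are hypothetical, the message
text is invented for illustration, and the single flag is not
thread-safe:

```c
#include <stdio.h>

/* Set after the first warning so that later calls stay quiet. */
static int lowq_warned;

/*
 * Print a one-time notice on stderr.  Returns 1 if this call printed
 * the notice, 0 if it had already been printed.
 */
int
lowq_warn_once(const char *name)
{
	if (lowq_warned)
		return (0);
	lowq_warned = 1;
	fprintf(stderr, "libm: %s uses a low-quality fallback\n", name);
	return (1);
}

/* Evaluate a long double function through a double routine. */
long double
lowq_eval(double (*fn)(double), const char *name, long double x)
{
	(void)lowq_warn_once(name);
	return (fn((double)x));
}
```

The naive tgammal() quoted above would then be roughly
"return (lowq_eval(tgamma, "tgammal", f));".  A per-function flag
instead of one global flag would report each fallback separately.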

-GAWollman

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 20:35:21 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 3565F322
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 20:35:21 +0000 (UTC)
 (envelope-from pfg@FreeBSD.org)
Received: from nm18.bullet.mail.ne1.yahoo.com (nm18.bullet.mail.ne1.yahoo.com
 [98.138.90.81]) by mx1.freebsd.org (Postfix) with ESMTP id CB3D7641
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 20:35:20 +0000 (UTC)
Received: from [98.138.226.179] by nm18.bullet.mail.ne1.yahoo.com with NNFMP;
 30 May 2013 20:32:02 -0000
Received: from [98.138.226.61] by tm14.bullet.mail.ne1.yahoo.com with NNFMP;
 30 May 2013 20:32:02 -0000
Received: from [127.0.0.1] by smtp212.mail.ne1.yahoo.com with NNFMP;
 30 May 2013 20:32:02 -0000
X-Yahoo-Newman-Id: 634308.14639.bm@smtp212.mail.ne1.yahoo.com
X-Yahoo-Newman-Property: ymail-3
X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf
X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with )
 by smtp212.mail.ne1.yahoo.com with SMTP; 30 May 2013 13:32:02 -0700 PDT
Message-ID: <51A7B73F.8040409@FreeBSD.org>
Date: Thu, 30 May 2013 15:31:59 -0500
From: Pedro Giffuni <pfg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:17.0) Gecko/20130407 Thunderbird/17.0.5
MIME-Version: 1.0
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org>
 <20130530171348.GA67170@troutmask.apl.washington.edu>
 <51A7ABF7.6060807@FreeBSD.org>
 <20130530201513.GA68512@troutmask.apl.washington.edu>
In-Reply-To: <20130530201513.GA68512@troutmask.apl.washington.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-numerics@freebsd.org
X-List-Received-Date: Thu, 30 May 2013 20:35:21 -0000

On 30.05.2013 15:15, Steve Kargl wrote:
> On Thu, May 30, 2013 at 02:43:51PM -0500, Pedro Giffuni wrote:
>> On 30.05.2013 12:13, Steve Kargl wrote:
>>> C99 defines many long double functions.  Anyone wanting
>>> to use C and libm, and not C++ and boost, will need
>>> quality implementations of these functions.  Of course,
>>> the lack of any actual C99 compiler tends to dampen
>>> this argument.
>>>
>>> What I find appalling is reading "people are tired
>>> of the situation with libm, so I'm  going to commit
>>> some atrocious hack".   The proper response should be
>>> "so I'm going to help implement and test the missing
>>> functionality".  It's unfortunate that only a few
>>> individuals are working to fix libm, but such is
>>> life.
>>>
>> I guess I was trying to hint that Boost is a good
>> place to look at to get ideas for the implementations
>> for such stuff. Stephen knows this well though since
>> he actually fixed some complex functions in boost :).
>>
> Boost might be a good place to look for implementation
> ideas.  Looking at the msun code also works.  As does
> searching with google.  This is all secondary to the
> real issue.  The real problem is no one is willing to
> step forward to actually help write and test the code.
> Everyone seems to be waiting (and complaining!) for
> someone else to do the work.  I've been chipping away at
> libm issues since 2003, and given my available free time
> I should have a fully compliant C99 libm around 2025 or
> so.
>

And it happens all around the tree ...

The guys fixing clang seem pretty overloaded too.
We really need a better installer, and to add more DTrace
providers and while here more filesystems ... it never stops
and we are all just volunteers.

All in all, feedback is not necessarily a bad thing.
Even if there are few heroic developers working on it, it
would help to have a list of open tasks like this:

http://www.freebsd.org/projects/c99/

so that someone asking about the status is just pointed
there and gets the picture.

Just my $0.02, sorry that I am busy with other stuff.

Pedro.

From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 20:56:11 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id D99EE175
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 20:56:11 +0000 (UTC)
 (envelope-from s.montgomerysmith@gmail.com)
Received: from mail-ie0-x234.google.com (mail-ie0-x234.google.com
 [IPv6:2607:f8b0:4001:c03::234])
 by mx1.freebsd.org (Postfix) with ESMTP id ABEA681D
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 20:56:11 +0000 (UTC)
Received: by mail-ie0-f180.google.com with SMTP id b11so1809393iee.25
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 13:56:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:subject
 :references:in-reply-to:x-enigmail-version:content-type
 :content-transfer-encoding;
 bh=24dgC1PLNQEsF310L0ySC2NAhY4O5N3bHfBB4oDzRMA=;
 b=Hjg9YbdL2IBVQEZRTc72st+QiM2SgearD+UJsF2HUKo04m3nAh8M7+07X6t79EeNHq
 ZtmbdnuhrnAJmFN5CsNSQP2M5QmTZc7i31z451tmiWVCAuhAHTgtJ9rVQjm9YghyvBfI
 PMyMOz1mzuxN6haa8pW0P6JPX0bcYR99o/o9VMmBZtOJ/NTqLv0H0JkZhDhk6968Gs1F
 L4hCX0d+btV1M44k8Fv0ftPVlEeZlvBlzHasp24xHMnOwMNfQAC/+SnGCZUlf7Y7QoBi
 pXeZ5xBM+u9HnYoM3JwGMsMOCq/Gs6U7Ea3zoIIgqBRPlWIPYCq/NhnXnniOO2BnCrLc
 WvOw==
X-Received: by 10.42.250.202 with SMTP id mp10mr3825516icb.21.1369947371458;
 Thu, 30 May 2013 13:56:11 -0700 (PDT)
Received: from [10.7.39.35] ([161.130.188.204])
 by mx.google.com with ESMTPSA id qr3sm791242igb.1.2013.05.30.13.56.09
 for <freebsd-numerics@freebsd.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Thu, 30 May 2013 13:56:10 -0700 (PDT)
Sender: Stephen Montgomery-Smith <s.montgomerysmith@gmail.com>
Message-ID: <51A7BCE8.3010001@missouri.edu>
Date: Thu, 30 May 2013 15:56:08 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
 rv:17.0) Gecko/20130510 Thunderbird/17.0.6
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org>
 <20130530171348.GA67170@troutmask.apl.washington.edu>
In-Reply-To: <20130530171348.GA67170@troutmask.apl.washington.edu>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-List-Received-Date: Thu, 30 May 2013 20:56:11 -0000

On 05/30/2013 12:13 PM, Steve Kargl wrote:

> What I find appalling is reading "people are tired
> of the situation with libm, so I'm  going to commit
> some atrocious hack".   The proper response should be
> "so I'm going to help implement and test the missing
> functionality".  It's unfortunate that only a few
> individuals are working to fix libm, but such is
> life. 

I don't think the problem is that there are too few individuals.  I
think the problem is that the standards are set too high.  I presented
numerically accurate complex arc-trig functions a long time ago, and I
became increasingly frustrated at the lack of progress.

I am pleased that it got committed a few days ago.

But I feel that the change requests, particularly the style change
requests, became too much.  I dutifully complied with the many style
changes, but it became overwhelming.

There is a happy medium between simply copying the *l functions to the *
functions, and what we have now.  I am all for having reasonable
standards, but what we currently have is gridlock that is unacceptable.


From owner-freebsd-numerics@FreeBSD.ORG  Thu May 30 21:17:12 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id F27E9565
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 21:17:12 +0000 (UTC)
 (envelope-from imp@bsdimp.com)
Received: from mail-ie0-x22c.google.com (mail-ie0-x22c.google.com
 [IPv6:2607:f8b0:4001:c03::22c])
 by mx1.freebsd.org (Postfix) with ESMTP id C019F969
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 21:17:12 +0000 (UTC)
Received: by mail-ie0-f172.google.com with SMTP id 17so1960582iea.3
 for <freebsd-numerics@freebsd.org>; Thu, 30 May 2013 14:17:12 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer
 :x-gm-message-state;
 bh=5fL5r8OZ7yMWK400mw0a+tJ4HcpentBZd9UtNAB+EUM=;
 b=E7wiEuJS5N1fdvbn8MBcwSa1tUrXkYsI/yGSceLRC+n2U78I5owVX0Irys8CUVNKc7
 eJcM1hS16UNMSzOwrU0HQxb6IBUU0ZnzqgkEplE6toYYHJ8dxYt6jZGe8qLkoHZdm93R
 MUE7z3pUPdqIDE/R94ABNIbea4YZRqz3Lu1rOfkXmMkhOt3yHEXDtNkPE1Sf9oAwmDxj
 pxOz2SUMu8oF5O4kZeMl4Lq5NRe4gbcNZwj7k1ZzZlCq1VL2Vb9xG7SNQWxQaEjkCc63
 /gOyQquPA03+39GfpFd4GKt/x7L9oVXaGfEQTWanGOXsGqAbznrypQRd96ewftv1KM2A
 SYNA==
X-Received: by 10.50.43.234 with SMTP id z10mr247671igl.92.1369948632263;
 Thu, 30 May 2013 14:17:12 -0700 (PDT)
Received: from monkey-bot.int.fusionio.com ([209.117.142.2])
 by mx.google.com with ESMTPSA id k10sm193977ige.0.2013.05.30.14.17.10
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Thu, 30 May 2013 14:17:11 -0700 (PDT)
Sender: Warner Losh <wlosh@bsdimp.com>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
Mime-Version: 1.0 (Apple Message framework v1085)
Content-Type: text/plain; charset=us-ascii
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <20130530171348.GA67170@troutmask.apl.washington.edu>
Date: Thu, 30 May 2013 15:17:07 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <486AC985-2F3A-4CEB-A229-DF5F4AE9C50F@bsdimp.com>
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org>
 <20130530171348.GA67170@troutmask.apl.washington.edu>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
X-Mailer: Apple Mail (2.1085)
X-Gm-Message-State: ALoCoQnN+91s005OkUbnJ3QApmJa8IRnUZuNIDfsZF6T9L7cpTYFnqHqHm1x4pJhq3Y5VOx0m3bi
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>,
 David Schultz <das@FreeBSD.ORG>, Pedro Giffuni <pfg@FreeBSD.org>,
 freebsd-standards@freebsd.org, freebsd-numerics@freebsd.org
X-List-Received-Date: Thu, 30 May 2013 21:17:13 -0000


On May 30, 2013, at 11:13 AM, Steve Kargl wrote:

> On Thu, May 30, 2013 at 10:41:24AM -0500, Pedro Giffuni wrote:
>>
>> I may be wrong but with long double support people that
>> need erfcl() and tgamma() can get them from boost.
>> The problem is therefore not implementing everything but
>> getting enough to turn on the features supported by
>> libstdc++ and boost.
>>
>
> Of course, you're wrong. :-) :-) <-- Note smileys.
>
> C99 defines many long double functions.  Anyone wanting
> to use C and libm, and not C++ and boost, will need
> quality implementations of these functions.  Of course,
> the lack of any actual C99 compiler tends to dampen
> this argument.
>
> What I find appalling is reading "people are tired
> of the situation with libm, so I'm  going to commit
> some atrocious hack".   The proper response should be
> "so I'm going to help implement and test the missing
> functionality".  It's unfortunate that only a few
> individuals are working to fix libm, but such is
> life.

I'd help, but the barriers to entry are somewhat steep and prickly. I
tried to help, and got no end of grief for documenting the differences
in an algorithm that was actually different, though people told me it
was the same. In that environment, you suck the enthusiasm out of the
air and wind up in the something-is-better-than-nothing camp quite
quickly.

Warner


From owner-freebsd-numerics@FreeBSD.ORG  Fri May 31 03:38:13 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 7257917CC
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 03:38:13 +0000 (UTC)
 (envelope-from das@FreeBSD.org)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net
 [50.196.151.174]) by mx1.freebsd.org (Postfix) with ESMTP id 52E64CCA
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 03:38:13 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
 by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4V3cCFd095032;
 Thu, 30 May 2013 20:38:12 -0700 (PDT) (envelope-from das@FreeBSD.org)
Received: (from das@localhost)
 by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4V3cBu8095031;
 Thu, 30 May 2013 20:38:11 -0700 (PDT) (envelope-from das@FreeBSD.org)
Date: Thu, 30 May 2013 20:38:11 -0700
From: David Schultz <das@FreeBSD.org>
To: Stephen Montgomery-Smith <stephen@missouri.edu>
Subject: Re: standards/175811: libstdc++ needs complex support in order use C99
Message-ID: <20130531033811.GA95005@zim.MIT.EDU>
References: <201302040328.r143SUd3039504@freefall.freebsd.org>
 <510F306A.6090009@missouri.edu>
 <C5BD0238-121D-4D8B-924A-230C07222666@FreeBSD.org>
 <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org>
 <20130530171348.GA67170@troutmask.apl.washington.edu>
 <51A7BCE8.3010001@missouri.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <51A7BCE8.3010001@missouri.edu>
Cc: freebsd-numerics@freebsd.org
X-List-Received-Date: Fri, 31 May 2013 03:38:13 -0000

On Thu, May 30, 2013, Stephen Montgomery-Smith wrote:
> On 05/30/2013 12:13 PM, Steve Kargl wrote:
> 
> > What I find appalling is reading "people are tired
> > of the situation with libm, so I'm  going to commit
> > some atrocious hack".   The proper response should be
> > "so I'm going to help implement and test the missing
> > functionality".  It's unfortunate that only a few
> > individuals are working to fix libm, but such is
> > life. 
> 
> I don't think the problem is that there are too few individuals.  I
> think the problem is that the standards are set too high.  I presented
> numerically accurate complex arc-trig functions a long time ago, and I
> became increasingly frustrated at the lack of progress.
> 
> I am pleased that it got committed a few days ago.
> 
> But I feel that the change requests, particularly the style change
> requests, became too much.  I dutifully complied with the many style
> changes, but it became overwhelming.

Bruce is very meticulous and has a lot of good feedback, but it's
important to understand that Bruce's standards are not the minimum
standards for committing a change.  Bruce doesn't commit directly
anymore in any case.  I don't think I have ever committed a change
that Bruce could find no flaws in, including patches submitted by
Bruce himself. :) It's okay to commit some working code first and
then improve it later.

From owner-freebsd-numerics@FreeBSD.ORG  Fri May 31 15:46:09 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id D4490DFD
 for <freebsd-numerics@FreeBSD.org>; Fri, 31 May 2013 15:46:09 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id A0E6AC1D
 for <freebsd-numerics@FreeBSD.org>; Fri, 31 May 2013 15:46:09 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4VFk8iJ073260; 
 Fri, 31 May 2013 08:46:08 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4VFk83v073259;
 Fri, 31 May 2013 08:46:08 -0700 (PDT) (envelope-from sgk)
Date: Fri, 31 May 2013 08:46:08 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: Patches for s_expl.c
Message-ID: <20130531154608.GA73175@troutmask.apl.washington.edu>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
 <20130529162441.GA58773@troutmask.apl.washington.edu>
 <20130530045951.Y4776@besplex.bde.org>
 <20130530162723.GB66755@troutmask.apl.washington.edu>
 <20130531053652.H65974@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130531053652.H65974@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@FreeBSD.org
X-List-Received-Date: Fri, 31 May 2013 15:46:09 -0000

On Fri, May 31, 2013 at 06:19:09AM +1000, Bruce Evans wrote:
> On Thu, 30 May 2013, Steve Kargl wrote:
> 
> > OK, I've restored whitespace to hopefully match your expectations.
> > Removed excess digits in exponents (e.g., 1.234e08 --> 1.234e8).
> > Restored XXX comments.
> > Removed (unnecessary?) blank lines.
> > Restored the order of computing r = r1 + r2 in ld128.
> > Moved the |x| < 0x1p-113 if-block back into the [T1:T3] interval.
> 
> I like the ld80 version now.  My diffs for the ld128 version are below.

:-) 

> > Final questions.  What is your preference for committing expm1l?
> > Should it be included in s_expl.c or should I use 'svn cp' to
> > copy s_expl.c to s_expm1l.c and add the implementation of
> > expm1l to the copied version?
> 
> I prefer it in the same file.  The big table is hard to manage in a
> separate file (if the functions are split, then the table should be
> too, since it is the largest component), and some constants would have
> to be made public or duplicated.  Accesses to public tables and scalars
> cannot be optimized (by the compiler) as much as static ones.  But when
> you implement exp() so that it works as well as expl(), the table should
> be shared in the ld80 case, so at least the table should be split then.

OK.  I'll commit expm1l into s_expl.c.

I did briefly look at splitting the code into a k_expm1l.{c|h}
and s_exp[m1]l.c, but I could not convince myself that it would 
provide us with any clear benefit due to the size and differences
in constructing the final result.

I've added most of your suggestions.

> @  static const struct {
> @ +	/*
> @ +	 * hi must be rounded to at most 106 bits so that multiplication
> @ +	 * by r1 in expm1l() is exact, but it is rounded to 88 bits due to
> @ +	 * historical accidents.
> 
> Keep this part of the comment.

OK.

> @ +	 *
> @ +	 * XXX it is wasteful to use long double for both hi and lo.  ld128
> @ +	 * exp2l() uses only float for lo (in a very differently organized
> @ +	 * table; ld80 exp2l() is different again.  It uses 2 doubles in a
> @ +	 * table organized like this one.  1 double and 1 float would
> @ +	 * suffice).  There are different packing/locality/alignment/caching
> @ +	 * problems with these methods.
> @ +	 *
> @ +	 * XXX C's bad %a format makes the bits unreadable.  They happen
> @ +	 * to all line up for the hi values 1 before the point and 88
> @ +	 * in 22 nybbles, but for the low values the nybbles are shifted
> @ +	 * randomly.
> @ +	 */

I left these XXX out of the new version, and have archived your
email in my development tree.  I may someday look at whether
changing the tables provides an improvement.

> 
> Reminders of things to fix.
> 
> In a development version, I need hi to have only about 56 bits.  It is
> easy to re-split hi+lo for testing this.  A 24-bit or 53-bit hi is
> sufficient and would give this automatically.

Is this a version where you try to eliminate the C and D polynomials?

> @ + * XXX the coeffs aren't very carefully rounded.  I got 10.3 more bits with
> @ + * the old version for [-0.1659, -0.03125].  Now T3 is better balanced, and
> @ + * I would expect only 7-8 extra bits.
> @ + *
> @ + * XXX the number of terms can be reduced by 1.  Then I get a few more bits
> @ + * with the same number of doubles (5), and 0.7 more bits with 8 doubles.
> @ + * This much accuracy is hard to explain, and it isn't clear that reduction
> @ + * of x to double is valid at the same point that reduction of the coeffs to
> @ + * double.  With C10 double, the absolute errors from rounding it are up to
> @ + * about 2**-53 * 0.1659**10/10! ~= 2**-100.8.  Remes apparently improves
> @ + * this to 2**-122.1.
> @   */
> 
> Better polynomials should be used someday, but I want you to generate them.
> After fixing the generator to minimize the relative error instead of the
> absolute error, you should get ones like mine.

I left these XXX out as well.  I have a plan for possibly generating 
new polynomials, but it depends on acquiring some external funding 
to completely rewrite how I implemented the Remes algorithm.

> @  static const long double
> @ +/*
> @ + * XXX none of the long double C or D coeffs except C10 is correctly printed.
> @ + * If you re-print their values in %.35Le format, the result is always
> @ + * different.  For example, the last 2 digits in C3 should be 59, not 67.
> @ + * 67 is apparently from rounding an extra-precision value to 36 decimal
> @ + * places.
> @ + */
> @  C3  =  1.66666666666666666666666666666666667e-1L,
> 
> I didn't fix these.
> 

I didn't fix the coefficients either.  I'll do it if I ever get 
around to regenerating the coefficients.  The limited testing
that I've been able to do on flame gave max ULP < 0.51.  This,
IMO, is good enough for now.

> @ + *
> @ + * XXX the coeffs aren't very carefully rounded. I get 5.2 more bits with
> @ + * the old version for [-0.03125, 0.1659].  Now T3 is better balanced, and
> @ + * I would expect 7-8 extra bits.
> @ + *
> @ + * XXX the number of terms can be reduced by 1.  Then I get a few more bits
> @ + * with the same number of doubles (4), and 1.1 more bits with 6 doubles.
> @ + * This much accuracy is hard to explain, etc., as above.  With D11 double,
> @ + * the absolute errors from rounding it are up to about
> @ + * 2**-53 * 0.1659**11/11! ~= 2**-106.8.
> @ + *
> @ + * Note that with my coeffs, although this side needs 1 fewer term, it needs
> @ + * 1 more long double term, so it is probably actually slower on sparc64.
> @   */

I did not include this dialogue as the reference to "I" would
appear ambiguous to the casual reader of the code.

Thanks for helping with getting the code to its current state.

Final diff(?).

-- 
Steve

Index: ld80/s_expl.c
===================================================================
--- ld80/s_expl.c	(revision 251146)
+++ ld80/s_expl.c	(working copy)
@@ -1,5 +1,5 @@
 /*-
- * Copyright (c) 2009-2012 Steven G. Kargl
+ * Copyright (c) 2009-2013 Steven G. Kargl
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -29,7 +29,7 @@
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
-/*-
+/**
  * Compute the exponential of x for Intel 80-bit format.  This is based on:
  *
  *   PTP Tang, "Table-driven implementation of the exponential function
@@ -50,6 +50,7 @@
 #include "math_private.h"
 
 #define	INTERVALS	128
+#define	LOG2_INTERVALS	7
 #define	BIAS	(LDBL_MAX_EXP - 1)
 
 static const long double
@@ -60,9 +61,12 @@
 
 static const union IEEEl2bits
 /* log(2**16384 - 0.5) rounded towards zero: */
-o_threshold = LD80C(0xb17217f7d1cf79ab, 13,  11356.5234062941439488L),
+/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
+o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13,  11356.5234062941439488L),
+#define o_threshold	 (o_thresholdu.e)
 /* log(2**(-16381-64-1)) rounded towards zero: */
-u_threshold = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+u_thresholdu = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+#define u_threshold	 (u_thresholdu.e)
 
 static const double
 /*
@@ -78,11 +82,11 @@
  * |exp(x) - p(x)| < 2**-77.2
  * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
  */
-P2 =  0.5,
-P3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
-P4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
-P5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
-P6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
+A2 =  0.5,
+A3 =  1.6666666666666119e-1,		/*  0x15555555555490.0p-55 */
+A4 =  4.1666666666665887e-2,		/*  0x155555555554e5.0p-57 */
+A5 =  8.3333354987869413e-3,		/*  0x1111115b789919.0p-59 */
+A6 =  1.3888891738560272e-3;		/*  0x16c16c651633ae.0p-62 */
 
 /*
  * 2^(i/INTERVALS) for i in [0,INTERVALS] is represented by two values where
@@ -96,8 +100,7 @@
 static const struct {
 	double	hi;
 	double	lo;
-/* XXX should rename 's'. */
-} s[INTERVALS] = {
+} tbl[INTERVALS] = {
 	0x1p+0, 0x0p+0,
 	0x1.0163da9fb3335p+0, 0x1.b61299ab8cdb7p-54,
 	0x1.02c9a3e778060p+0, 0x1.dcdef95949ef4p-53,
@@ -232,7 +235,8 @@
 expl(long double x)
 {
 	union IEEEl2bits u, v;
-	long double fn, q, r, r1, r2, t, t23, t45, twopk, twopkp10000, z;
+	long double fn, q, r, r1, r2, t, twopk, twopkp10000;
+	long double z;
 	int k, n, n2;
 	uint16_t hx, ix;
 
@@ -242,40 +246,39 @@
 	ix = hx & 0x7fff;
 	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
 		if (ix == BIAS + LDBL_MAX_EXP) {
-			if (hx & 0x8000 && u.xbits.man == 1ULL << 63)
-				return (0.0L);	/* x is -Inf */
-			return (x + x); /* x is +Inf, NaN or unsupported */
+			if (hx & 0x8000)  /* x is -Inf, -NaN or unsupported */
+				return (-1 / x);
+ 			return (x + x);	/* x is +Inf, +NaN or unsupported */
 		}
-		if (x > o_threshold.e)
+		if (x > o_threshold)
 			return (huge * huge);
-		if (x < u_threshold.e)
+		if (x < u_threshold)
 			return (tiny * tiny);
-	} else if (ix < BIAS - 66) {	/* |x| < 0x1p-66 */
-					/* includes pseudo-denormals */
-		if (huge + x > 1.0L)	/* trigger inexact iff x != 0 */
-			return (1.0L + x);
+	} else if (ix < BIAS - 65) {	/* |x| < 0x1p-65 (includes pseudos) */
+		return (1 + x);		/* 1 with inexact iff x != 0 */
 	}
 
 	ENTERI();
 
-	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
 	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
 	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
 	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */
 #if defined(HAVE_EFFICIENT_IRINTL)
-	n  = irintl(fn);
+	n = irintl(fn);
 #elif defined(HAVE_EFFICIENT_IRINT)
-	n  = irint(fn);
+	n = irint(fn);
 #else
-	n  = (int)fn;
+	n = (int)fn;
 #endif
 	n2 = (unsigned)n % INTERVALS;
-	k = (n - n2) / INTERVALS;
+	/* Depend on the sign bit being propagated: */
+	k = n >> LOG2_INTERVALS;
 	r1 = x - fn * L1;
-	r2 = -fn * L2;
+	r2 = fn * -L2;
 
 	/* Prepare scale factors. */
-	v.xbits.man = 1ULL << 63;
+	v.e = 1;
 	if (k >= LDBL_MIN_EXP) {
 		v.xbits.expsign = BIAS + k;
 		twopk = v.e;
@@ -284,21 +287,183 @@
 		twopkp10000 = v.e;
 	}
 
-	/* Evaluate expl(midpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */
-	/* Here q = q(r), not q(r1), since r1 is lopped like L1. */
-	t45 = r * P5 + P4;
+	/* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */
 	z = r * r;
-	t23 = r * P3 + P2;
-	q = r2 + z * t23 + z * z * t45 + z * z * z * P6;
-	t = (long double)s[n2].lo + s[n2].hi;
-	t = s[n2].lo + t * (q + r1) + s[n2].hi;
+	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
+	t = (long double)tbl[n2].lo + tbl[n2].hi;
+	t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
 
 	/* Scale by 2**k. */
 	if (k >= LDBL_MIN_EXP) {
 		if (k == LDBL_MAX_EXP)
-			RETURNI(t * 2.0L * 0x1p16383L);
+			RETURNI(t * 2 * 0x1p16383L);
 		RETURNI(t * twopk);
 	} else {
 		RETURNI(t * twopkp10000 * twom10000);
 	}
 }
+
+/**
+ * Compute expm1l(x) for Intel 80-bit format.  This is based on:
+ *
+ *   PTP Tang, "Table-driven implementation of the Expm1 function
+ *   in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
+ *   211-222 (1992).
+ */
+
+/*
+ * Our T1 and T2 are chosen to be approximately the points where method
+ * A and method B have the same accuracy.  Tang's T1 and T2 are the
+ * points where method A's accuracy changes by a full bit.  For Tang,
+ * this drop in accuracy makes method A immediately less accurate than
+ * method B, but our larger INTERVALS makes method A 2 bits more
+ * accurate so it remains the most accurate method significantly
+ * closer to the origin despite losing the full bit in our extended
+ * range for it.
+ */
+static const double
+T1 = -0.1659,				/* ~-30.625/128 * log(2) */
+T2 =  0.1659;				/* ~30.625/128 * log(2) */
+
+/*
+ * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]:
+ * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2
+ */
+static const union IEEEl2bits
+B3 = LD80C(0xaaaaaaaaaaaaaaab, -3,  1.66666666666666666671e-1L),
+B4 = LD80C(0xaaaaaaaaaaaaaaac, -5,  4.16666666666666666712e-2L);
+
+static const double
+B5  =  8.3333333333333245e-3,		/*  0x1.111111111110cp-7 */
+B6  =  1.3888888888888861e-3,		/*  0x1.6c16c16c16c0ap-10 */
+B7  =  1.9841269841532042e-4,		/*  0x1.a01a01a0319f9p-13 */
+B8  =  2.4801587302069236e-5,		/*  0x1.a01a01a03cbbcp-16 */
+B9  =  2.7557316558468562e-6,		/*  0x1.71de37fd33d67p-19 */
+B10 =  2.7557315829785151e-7,		/*  0x1.27e4f91418144p-22 */
+B11 =  2.5063168199779829e-8,		/*  0x1.ae94fabdc6b27p-26 */
+B12 =  2.0887164654459567e-9;		/*  0x1.1f122d6413fe1p-29 */
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+	long double fn, hx2_hi, hx2_lo, q, r, r1, r2, t, twomk, twopk, x_hi;
+	long double x_lo, x2, z;
+	long double x4;
+	int k, n, n2;
+	uint16_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 6) {		/* |x| >= 64 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000)  /* x is -Inf, -NaN or unsupported */
+				return (-1 / x - 1);
+			return (x + x);	/* x is +Inf, +NaN or unsupported */
+		}
+		if (x > o_threshold)
+			return (huge * huge);
+		/*
+		 * expm1l() never underflows, but it must avoid
+		 * unrepresentable large negative exponents.  We used a
+		 * much smaller threshold for large |x| above than in
+		 * expl() so as to handle not so large negative exponents
+		 * in the same way as large ones here.
+		 */
+		if (hx & 0x8000)	/* x <= -64 */
+			return (tiny - 1);	/* good for x < -65ln2 - eps */
+	}
+
+	ENTERI();
+
+	if (T1 < x && x < T2) {
+		if (ix < BIAS - 64) {	/* |x| < 0x1p-64 (includes pseudos) */
+			/* x (rounded) with inexact if x != 0: */
+			RETURNI(x == 0 ? x :
+			    (0x1p100 * x + fabsl(x)) * 0x1p-100);
+		}
+
+		x2 = x * x;
+		x4 = x2 * x2;
+		q = x4 * (x2 * (x4 *
+		    /*
+		     * XXX the number of terms is no longer good for
+		     * pairwise grouping of all except B3, and the
+		     * grouping is no longer from highest down.
+		     */
+		    (x2 *            B12  + (x * B11 + B10)) +
+		    (x2 * (x * B9 +  B8) +  (x * B7 +  B6))) +
+			  (x * B5 +  B4.e)) + x2 * x * B3.e;
+
+		x_hi = (float)x;
+		x_lo = x - x_hi;
+		hx2_hi = x_hi * x_hi / 2;
+		hx2_lo = x_lo * (x + x_hi) / 2;
+		if (ix >= BIAS - 7)
+			RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi));
+		else
+			RETURNI(hx2_lo + q + hx2_hi + x);
+	}
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
+#if defined(HAVE_EFFICIENT_IRINTL)
+	n = irintl(fn);
+#elif defined(HAVE_EFFICIENT_IRINT)
+	n = irint(fn);
+#else
+	n = (int)fn;
+#endif
+	n2 = (unsigned)n % INTERVALS;
+	k = n >> LOG2_INTERVALS;
+	r1 = x - fn * L1;
+	r2 = fn * -L2;
+	r = r1 + r2;
+
+	/* Prepare scale factor. */
+	v.e = 1;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	/*
+	 * Evaluate lower terms of
+	 * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2).
+	 */
+	z = r * r;
+	q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6;
+
+	t = (long double)tbl[n2].lo + tbl[n2].hi;
+
+	if (k == 0) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
+		    (tbl[n2].hi - 1);
+		RETURNI(t);
+	}
+	if (k == -1) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + 
+		    (tbl[n2].hi - 2);
+		RETURNI(t / 2);
+	}
+	if (k < -7) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		RETURNI(t * twopk - 1);
+	}
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			RETURNI(t * 2 * 0x1p16383L - 1);
+		RETURNI(t * twopk - 1);
+	}
+
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+
+	if (k > LDBL_MANT_DIG - 1)
+		t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi;
+	else
+		t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk);
+	RETURNI(t * twopk);
+}
Index: ld128/s_expl.c
===================================================================
--- ld128/s_expl.c	(revision 251146)
+++ ld128/s_expl.c	(working copy)
@@ -1,5 +1,5 @@
 /*-
- * Copyright (c) 2012 Steven G. Kargl
+ * Copyright (c) 2009-2013 Steven G. Kargl
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -22,6 +22,8 @@
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Optimized by Bruce D. Evans.
  */
 
 #include <sys/cdefs.h>
@@ -38,35 +40,67 @@
 #include "math_private.h"
 
 #define	INTERVALS	128
+#define	LOG2_INTERVALS	7
 #define	BIAS	(LDBL_MAX_EXP - 1)
 
+static const long double
+huge = 0x1p10000L,
+twom10000 = 0x1p-10000L;
+/* XXX Prevent gcc from erroneously constant folding this: */
 static volatile const long double tiny = 0x1p-10000L;
 
 static const long double
-INV_L = 1.84664965233787316142070359168242182e+02L,
-L1 = 5.41521234812457272982212595914567508e-03L,
-L2 = -1.02536706388947310094527932552595546e-29L,
-huge = 0x1p10000L,
+/* log(2**16384 - 0.5) rounded towards zero: */
+/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
 o_threshold =  11356.523406294143949491931077970763428L,
-twom10000 = 0x1p-10000L,
+/* log(2**(-16381-64-1)) rounded towards zero: */
 u_threshold = -11433.462743336297878837243843452621503L;
 
+static const double
+/*
+ * ln2/INTERVALS = L1+L2 (hi+lo decomposition for multiplication).  L1 must
+ * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest
+ * bits zero so that multiplication of it by n is exact.
+ */
+INV_L = 1.8466496523378731e+2,		/*  0x171547652b82fe.0p-45 */
+L2 = -1.0253670638894731e-29;		/* -0x1.9ff0342542fc3p-97 */
 static const long double
-P2 = 5.00000000000000000000000000000000000e-1L,
-P3 = 1.66666666666666666666666666666666972e-1L,
-P4 = 4.16666666666666666666666666653708268e-2L,
-P5 = 8.33333333333333333333333315069867254e-3L,
-P6 = 1.38888888888888888888996596213795377e-3L,
-P7 = 1.98412698412698412718821436278644414e-4L,
-P8 = 2.48015873015869681884882576649543128e-5L,
-P9 = 2.75573192240103867817876199544468806e-6L,
-P10 = 2.75573236172670046201884000197885520e-7L,
-P11 = 2.50517544183909126492878226167697856e-8L;
+/* 0x1.62e42fefa39ef35793c768000000p-8 */
+L1 =  5.41521234812457272982212595914567508e-3L;
 
+/*
+ * XXX values in hex in comments have been lost (or were never present)
+ * from here.
+ */
+static const long double
+/*
+ * Domain [-0.002708, 0.002708], range ~[-2.4021e-38, 2.4234e-38]:
+ * |exp(x) - p(x)| < 2**-124.9
+ * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
+ *
+ * XXX the coeffs aren't very carefully rounded, and I get 2.3 more bits.
+ */
+A2  =  0.5,
+A3  =  1.66666666666666666666666666651085500e-1L,
+A4  =  4.16666666666666666666666666425885320e-2L,
+A5  =  8.33333333333333333334522877160175842e-3L,
+A6  =  1.38888888888888888889971139751596836e-3L;
+
+static const double
+A7  =  1.9841269841269470e-4,		/*  0x1.a01a01a019f91p-13 */
+A8  =  2.4801587301585286e-5,		/*  0x1.71de3ec75a967p-19 */
+A9  =  2.7557324277411235e-6,		/*  0x1.71de3ec75a967p-19 */
+A10 =  2.7557333722375069e-7;		/*  0x1.27e505ab56259p-22 */
+
 static const struct {
+	/*
+	 * hi must be rounded to at most 106 bits so that multiplication
+	 * by r1 in expm1l() is exact, but it is rounded to 88 bits due to
+	 * historical accidents.
+	 */
 	long double	hi;
 	long double	lo;
-} s[INTERVALS] = {
+} tbl[INTERVALS] = {
 	0x1p0L, 0x0p0L,
 	0x1.0163da9fb33356d84a66aep0L, 0x3.36dcdfa4003ec04c360be2404078p-92L,
 	0x1.02c9a3e778060ee6f7cacap0L, 0x4.f7a29bde93d70a2cabc5cb89ba10p-92L,
@@ -201,9 +235,10 @@
 expl(long double x)
 {
 	union IEEEl2bits u, v;
-	long double fn, r, r1, r2, q, t, twopk, twopkp10000;
+	long double q, r, r1, t, twopk, twopkp10000;
+	double dr, fn, r2;
 	int k, n, n2;
-	uint32_t hx, ix;
+	uint16_t hx, ix;
 
 	/* Filter out exceptional cases. */
 	u.e = x;
@@ -211,31 +246,39 @@
 	ix = hx & 0x7fff;
 	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
 		if (ix == BIAS + LDBL_MAX_EXP) {
-			if (hx & 0x8000 && u.xbits.manh == 0 &&
-			    u.xbits.manl == 0)
-				return (0.0L);	/* x is -Inf */
-			return (x + x);	/* x is +Inf or NaN */
+			if (hx & 0x8000)  /* x is -Inf or -NaN */
+				return (-1 / x);
+			return (x + x);	/* x is +Inf or +NaN */
 		}
 		if (x > o_threshold)
 			return (huge * huge);
 		if (x < u_threshold)
 			return (tiny * tiny);
-	} else if (ix < BIAS - 115) {	/* |x| < 0x1p-115 */
-	    	if (huge + x > 1.0L)	/* trigger inexact iff x != 0 */
-			return (1.0L + x);
+	} else if (ix < BIAS - 114) {	/* |x| < 0x1p-114 */
+		return (1 + x);		/* 1 with inexact iff x != 0 */
 	}
 
-	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
-	fn = x * INV_L + 0x1.8p112 - 0x1.8p112;
-	n  = (int)fn;
+	ENTERI();
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	/* XXX assume no extra precision for the additions, as for trig fns. */
+	/* XXX this set of comments is now quadruplicated. */
+	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
+#if defined(HAVE_EFFICIENT_IRINT)
+	n = irint(fn);
+#else
+	n = (int)fn;
+#endif
 	n2 = (unsigned)n % INTERVALS;
-	k = (n - n2) / INTERVALS;
+	k = n >> LOG2_INTERVALS;
 	r1 = x - fn * L1;
-	r2 = -fn * L2;
+	r2 = fn * -L2;
+	r = r1 + r2;
 
 	/* Prepare scale factors. */
-	v.xbits.manh = 0;
-	v.xbits.manl = 0;
+	/* XXX sparc64 multiplication is so slow that scalbnl() is faster. */
+	v.e = 1;
 	if (k >= LDBL_MIN_EXP) {
 		v.xbits.expsign = BIAS + k;
 		twopk = v.e;
@@ -244,18 +287,224 @@
 		twopkp10000 = v.e;
 	}
 
-	r = r1 + r2;
-	q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 +
-	    r * (P8 + r * (P9 + r * (P10 + r * P11)))))))));
-	t = s[n2].lo + s[n2].hi;
-	t = s[n2].hi + (s[n2].lo + t * (r2 + q + r1));
+	/* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */
+	dr = r;
+	q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
+	    dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));
+	t = tbl[n2].lo + tbl[n2].hi;
+	t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
 
 	/* Scale by 2**k. */
 	if (k >= LDBL_MIN_EXP) {
 		if (k == LDBL_MAX_EXP)
-			return (t * 2.0L * 0x1p16383L);
-		return (t * twopk);
+			RETURNI(t * 2 * 0x1p16383L);
+		RETURNI(t * twopk);
 	} else {
-		return (t * twopkp10000 * twom10000);
+		RETURNI(t * twopkp10000 * twom10000);
 	}
 }
+
+/*
+ * Our T1 and T2 are chosen to be approximately the points where method
+ * A and method B have the same accuracy.  Tang's T1 and T2 are the
+ * points where method A's accuracy changes by a full bit.  For Tang,
+ * this drop in accuracy makes method A immediately less accurate than
+ * method B, but our larger INTERVALS makes method A 2 bits more
+ * accurate so it remains the most accurate method significantly
+ * closer to the origin despite losing the full bit in our extended
+ * range for it.
+ */
+static const double
+T1 = -0.1659,				/* ~-30.625/128 * log(2) */
+T2 =  0.1659;				/* ~30.625/128 * log(2) */
+
+/*
+ * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2].
+ * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear
+ * in both subintervals, so set T3 = 2**-5, which places the condition
+ * into the [T1:T3] interval.
+ *
+ * XXX we now do this more to (partially) balance the number of terms
+ * in the C and D polys than to avoid checking the condition in both
+ * intervals.
+ *
+ * XXX these micro-optimizations are excessive.
+ */
+static const double
+T3 =  0.03125;
+
+/*
+ * Domain [-0.1659, 0.03125], range ~[2.9134e-44, 1.8404e-37]:
+ * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-122.03
+ */
+static const long double
+C3  =  1.66666666666666666666666666666666667e-1L,
+C4  =  4.16666666666666666666666666666666645e-2L,
+C5  =  8.33333333333333333333333333333371638e-3L,
+C6  =  1.38888888888888888888888888891188658e-3L,
+C7  =  1.98412698412698412698412697235950394e-4L,
+C8  =  2.48015873015873015873015112487849040e-5L,
+C9  =  2.75573192239858906525606685484412005e-6L,
+C10 =  2.75573192239858906612966093057020362e-7L,
+C11 =  2.50521083854417203619031960151253944e-8L,
+C12 =  2.08767569878679576457272282566520649e-9L,
+C13 =  1.60590438367252471783548748824255707e-10L;
+
+static const double
+C14 =  1.1470745580491932e-11,		/*  0x1.93974a81dae30p-37 */
+C15 =  7.6471620181090468e-13,		/*  0x1.ae7f3820adab1p-41 */
+C16 =  4.7793721460260450e-14,		/*  0x1.ae7cd18a18eacp-45 */
+C17 =  2.8074757356658877e-15,		/*  0x1.949992a1937d9p-49 */
+C18 =  1.4760610323699476e-16;		/*  0x1.545b43aabfbcdp-53 */
+
+/*
+ * Domain [0.03125, 0.1659], range ~[-2.7676e-37, -1.0367e-38]:
+ * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-121.44
+ *
+ */
+static const long double
+D3  =  1.66666666666666666666666666666682245e-1L,
+D4  =  4.16666666666666666666666666634228324e-2L,
+D5  =  8.33333333333333333333333364022244481e-3L,
+D6  =  1.38888888888888888888887138722762072e-3L,
+D7  =  1.98412698412698412699085805424661471e-4L,
+D8  =  2.48015873015873015687993712101479612e-5L,
+D9  =  2.75573192239858944101036288338208042e-6L,
+D10 =  2.75573192239853161148064676533754048e-7L,
+D11 =  2.50521083855084570046480450935267433e-8L,
+D12 =  2.08767569819738524488686318024854942e-9L,
+D13 =  1.60590442297008495301927448122499313e-10L;
+
+static const double
+D14 =  1.1470726176204336e-11,		/*  0x1.93971dc395d9ep-37 */
+D15 =  7.6478532249581686e-13,		/*  0x1.ae892e3d16fcep-41 */
+D16 =  4.7628892832607741e-14,		/*  0x1.ad00dfe41feccp-45 */
+D17 =  3.0524857220358650e-15;		/*  0x1.d7e8d886df921p-49 */
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+	long double hx2_hi, hx2_lo, q, r, r1, t, twomk, twopk, x_hi;
+	long double x_lo, x2;
+	double dr, dx, fn, r2;
+	int k, n, n2;
+	uint16_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 7) {		/* |x| >= 128 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000)  /* x is -Inf or -NaN */
+				return (-1 / x - 1);
+			return (x + x);	/* x is +Inf or +NaN */
+		}
+		if (x > o_threshold)
+			return (huge * huge);
+		/*
+		 * expm1l() never underflows, but it must avoid
+		 * unrepresentable large negative exponents.  We used a
+		 * much smaller threshold for large |x| above than in
+		 * expl() so as to handle not so large negative exponents
+		 * in the same way as large ones here.
+		 */
+		if (hx & 0x8000)	/* x <= -128 */
+			return (tiny - 1);	/* good for x < -114ln2 - eps */
+	}
+
+	ENTERI();
+
+	if (T1 < x && x < T2) {
+		x2 = x * x;
+		dx = x;
+
+		if (x < T3) {
+			if (ix < BIAS - 113) {	/* |x| < 0x1p-113 */
+				/* x (rounded) with inexact if x != 0: */
+				RETURNI(x == 0 ? x :
+				    (0x1p200 * x + fabsl(x)) * 0x1p-200);
+			}
+			q = x * x2 * C3 + x2 * x2 * (C4 + x * (C5 + x * (C6 +
+			    x * (C7 + x * (C8 + x * (C9 + x * (C10 +
+			    x * (C11 + x * (C12 + x * (C13 +
+			    dx * (C14 + dx * (C15 + dx * (C16 +
+			    dx * (C17 + dx * C18))))))))))))));
+		} else {
+			q = x * x2 * D3 + x2 * x2 * (D4 + x * (D5 + x * (D6 +
+			    x * (D7 + x * (D8 + x * (D9 + x * (D10 +
+			    x * (D11 + x * (D12 + x * (D13 +
+			    dx * (D14 + dx * (D15 + dx * (D16 +
+			    dx * D17)))))))))))));
+		}
+
+		x_hi = (float)x;
+		x_lo = x - x_hi;
+		hx2_hi = x_hi * x_hi / 2;
+		hx2_lo = x_lo * (x + x_hi) / 2;
+		if (ix >= BIAS - 7)
+			RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi));
+		else
+			RETURNI(hx2_lo + q + hx2_hi + x);
+	}
+
+	/* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
+#if defined(HAVE_EFFICIENT_IRINT)
+	n = irint(fn);
+#else
+	n = (int)fn;
+#endif
+	n2 = (unsigned)n % INTERVALS;
+	k = n >> LOG2_INTERVALS;
+	r1 = x - fn * L1;
+	r2 = fn * -L2;
+	r = r1 + r2;
+
+	/* Prepare scale factor. */
+	v.e = 1;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	/*
+	 * Evaluate lower terms of
+	 * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2).
+	 */
+	dr = r;
+	q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
+	    dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));
+
+	t = tbl[n2].lo + tbl[n2].hi;
+
+	if (k == 0) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
+		    (tbl[n2].hi - 1);
+		RETURNI(t);
+	}
+	if (k == -1) {
+		t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + 
+		    (tbl[n2].hi - 2);
+		RETURNI(t / 2);
+	}
+	if (k < -7) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		RETURNI(t * twopk - 1);
+	}
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			RETURNI(t * 2 * 0x1p16383L - 1);
+		RETURNI(t * twopk - 1);
+	}
+
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+
+	if (k > LDBL_MANT_DIG - 1)
+		t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi;
+	else
+		t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk);
+	RETURNI(t * twopk);
+}

From owner-freebsd-numerics@FreeBSD.ORG  Fri May 31 17:02:28 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 81EE555D
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 17:02:28 +0000 (UTC)
 (envelope-from s.montgomerysmith@gmail.com)
Received: from mail-ie0-x234.google.com (mail-ie0-x234.google.com
 [IPv6:2607:f8b0:4001:c03::234])
 by mx1.freebsd.org (Postfix) with ESMTP id 5866EFB7
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 17:02:28 +0000 (UTC)
Received: by mail-ie0-f180.google.com with SMTP id b11so4710449iee.11
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 10:02:27 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:subject
 :x-enigmail-version:content-type:content-transfer-encoding;
 bh=4ekXWtv/VLO5u0H6k7jVSf7z16PjejY29KIH8UQyiU0=;
 b=U/WKFnJ36dFL285BtTAevY/UvwMVHNFHt5rLPBTOkBTvNwWprrFP/kh10FK29tVt6k
 1V8SHscKetR6m+nf/prkp11t5HJ0b9Yy2myeUHVhHSd/taAT+GjBluIaHnmllJcvwAQM
 dXxX08Ktfx5kKRwlUsXVOAircR0abF0yOgbic+UxcKoyQZdHGYmiXWqEiJlaq95ShBr2
 kglCbYTJB2ta4/MWVDgoOEv+icJI+0XLqu5z0hjiui63OEAwNG9Z2nNICtlTAFsmWN8n
 DQ58hpJkXTd+k7Llcy8xebhHXwfHF8d53Gk3UbuWT39uR+Nh8zZrNX6uL/fhh7Ej0nA3
 FQAQ==
X-Received: by 10.50.136.201 with SMTP id qc9mr2159648igb.47.1370019747862;
 Fri, 31 May 2013 10:02:27 -0700 (PDT)
Received: from [10.7.129.223] ([161.130.188.41])
 by mx.google.com with ESMTPSA id z6sm864780igw.8.2013.05.31.10.02.25
 for <freebsd-numerics@freebsd.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Fri, 31 May 2013 10:02:26 -0700 (PDT)
Sender: Stephen Montgomery-Smith <s.montgomerysmith@gmail.com>
Message-ID: <51A8D7A0.5060905@missouri.edu>
Date: Fri, 31 May 2013 12:02:24 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
 rv:17.0) Gecko/20130510 Thunderbird/17.0.6
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
Subject: cacosh etc and bin/170206
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-List-Received-Date: Fri, 31 May 2013 17:02:28 -0000

Do you think it is OK to close PR bin/170206?  The only reason to keep
it open is that the long double functions haven't been committed yet.
But I don't see how keeping this PR open will have any effect on how
fast this will happen.

From owner-freebsd-numerics@FreeBSD.ORG  Fri May 31 19:14:10 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 7CADEB72
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 19:14:10 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 62755868
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 19:14:10 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4VJEA0D074365
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 12:14:10 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4VJEA9d074364
 for freebsd-numerics@freebsd.org; Fri, 31 May 2013 12:14:10 -0700 (PDT)
 (envelope-from sgk)
Date: Fri, 31 May 2013 12:14:10 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: cosh magic number?
Message-ID: <20130531191410.GA74343@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 May 2013 19:14:10 -0000

In msun/src/e_cosh.c, one finds the comment

 *
 *                                 exp(x) +  1/exp(x)
 * ln2/2 <= x <= 22 :  cosh(x) := -------------------
 *                                        2

Where does the magic number 22 come from?

Using exp(-|2x|) = 2**(1-p) with p = 53 for double, I
arrive at 18.022, which is a little too small.

#include <stdio.h>
#include <math.h>

int
main(void)
{
	double x, y, z;
	x = 18.022;
	/* x = 19; */
	y = exp(x);
	z = cosh(x);
	printf("%a\n%a\n%a\n", z, 0.5*(y + 1/y), 0.5 * y);
	return 0;
}

% cc -o z -O a.c -lm && ./z
0x1.000b5bd5b4beep+25
0x1.000b5bd5b4beep+25
0x1.000b5bd5b4bedp+25

Rounding up to 19 gives

% cc -o z -O a.c -lm && ./z
0x1.546d8f9ed26e1p+26
0x1.546d8f9ed26e1p+26
0x1.546d8f9ed26e1p+26

So, why 22?

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Fri May 31 19:18:24 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id B8C40BB1
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 19:18:24 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au
 [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 65FD1883
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 19:18:24 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id EE442D41FF3;
 Sat,  1 Jun 2013 05:18:16 +1000 (EST)
Date: Sat, 1 Jun 2013 05:18:15 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: Patches for s_expl.c
In-Reply-To: <20130531154608.GA73175@troutmask.apl.washington.edu>
Message-ID: <20130601044545.B15695@besplex.bde.org>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
 <20130529062437.V4648@besplex.bde.org>
 <20130529162441.GA58773@troutmask.apl.washington.edu>
 <20130530045951.Y4776@besplex.bde.org>
 <20130530162723.GB66755@troutmask.apl.washington.edu>
 <20130531053652.H65974@besplex.bde.org>
 <20130531154608.GA73175@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=kj9zAlcOel0A:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10
 a=eYD37nbmresOHEikpvgA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: freebsd-numerics@freebsd.org, Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 May 2013 19:18:24 -0000

On Fri, 31 May 2013, Steve Kargl wrote:

> On Fri, May 31, 2013 at 06:19:09AM +1000, Bruce Evans wrote:
>> On Thu, 30 May 2013, Steve Kargl wrote:

> I've added most of your suggestions.

Perhaps too many :-).  Leave out and/or act on more of my XXX comments and
I'm happy with it.

>> In a development version, I need hi to have only about 56 bits.  It is
>> easy to re-split hi+lo for testing this.  A 24-bit or 53-bit hi is
>> sufficient and would give this automatically.
>
> Is this a version where you try to eliminate the C and D polynomials?

Yes.  It all works fine except for efficiency, but efficiency was the main
reason to eliminate them (also simplicity -- we avoid a special case and
hope that the pipelining effects from this compensate for a few extra
instructions for the general case).

>> @  static const long double
>> @ +/*
>> @ + * XXX none of the long double C or D coeffs except C10 is correctly printed.
>> @ + * If you re-print their values in %.35Le format, the result is always
>> @ + * different.  For example, the last 2 digits in C3 should be 59, not 67.
>> @ + * 67 is apparently from rounding an extra-precision value to 36 decimal
>> @ + * places.
>> @ + */
>> @  C3  =  1.66666666666666666666666666666666667e-1L,
>>
>> I didn't fix these.
>
> I didn't fix the coefficients either.  I'll do it if I ever get
> around to regenerating the coefficients.  The limited testing
> that I've been able to do on flame gave max ULP < 0.51.  This,
> IMO, is good enough for now.

This is just cosmetic.  In order to verify the coeffs, I like to be
able to at least print them and get back the same results.  My pari
program that verifies them (by plotting the error function) does a
little more.  It has to round them to binary fractions, since any
extra precision in them would make them appear to be more accurate
than they are -- pari would use the extra precision of the decimal
values, but the compiler has to convert to binary for the CPU to use.

Here is a program to print their actual values (after rounding to
binary and back to decimal):

@ #include <float.h>
@ #include <stdio.h>
@ 
@ static const long double
@ o_threshold =  11356.523406294143949491931077970763428L,
@ u_threshold = -11433.462743336297878837243843452621503L,
@ L1 =  5.41521234812457272982212595914567508e-3L,
@ A3  =  1.66666666666666666666666666651085500e-1L,
@ A4  =  4.16666666666666666666666666425885320e-2L,
@ A5  =  8.33333333333333333334522877160175842e-3L,
@ A6  =  1.38888888888888888889971139751596836e-3L,
@ C3  =  1.66666666666666666666666666666666667e-1L,
@ C4  =  4.16666666666666666666666666666666645e-2L,
@ C5  =  8.33333333333333333333333333333371638e-3L,
@ C6  =  1.38888888888888888888888888891188658e-3L,
@ C7  =  1.98412698412698412698412697235950394e-4L,
@ C8  =  2.48015873015873015873015112487849040e-5L,
@ C9  =  2.75573192239858906525606685484412005e-6L,
@ C10 =  2.75573192239858906612966093057020362e-7L,
@ C11 =  2.50521083854417203619031960151253944e-8L,
@ C12 =  2.08767569878679576457272282566520649e-9L,
@ C13 =  1.60590438367252471783548748824255707e-10L,
@ D3  =  1.66666666666666666666666666666682245e-1L,
@ D4  =  4.16666666666666666666666666634228324e-2L,
@ D5  =  8.33333333333333333333333364022244481e-3L,
@ D6  =  1.38888888888888888888887138722762072e-3L,
@ D7  =  1.98412698412698412699085805424661471e-4L,
@ D8  =  2.48015873015873015687993712101479612e-5L,
@ D9  =  2.75573192239858944101036288338208042e-6L,
@ D10 =  2.75573192239853161148064676533754048e-7L,
@ D11 =  2.50521083855084570046480450935267433e-8L,
@ D12 =  2.08767569819738524488686318024854942e-9L,
@ D13 =  1.60590442297008495301927448122499313e-10L;
@ 
@ static const double
@ INV_L = 1.8466496523378731e+2,		/*  0x171547652b82fe.0p-45 */
@ L2 = -1.0253670638894731e-29,		/* -0x1.9ff0342542fc3p-97 */
@ A7  =  1.9841269841269471e-4,
@ A8  =  2.4801587301585284e-5,
@ A9  =  2.7557324277411234e-6,
@ A10 =  2.7557333722375072e-7,
@ C14 =  1.1470745580491932e-11,		/*  0x1.93974a81dae3p-37 */
@ C15 =  7.6471620181090468e-13,		/*  0x1.ae7f3820adab1p-41 */
@ C16 =  4.7793721460260450e-14,		/*  0x1.ae7cd18a18eacp-45 */
@ C17 =  2.8074757356658877e-15,		/*  0x1.949992a1937d9p-49 */
@ C18 =  1.4760610323699476e-16,		/*  0x1.545b43aabfbcdp-53 */
@ D14 =  1.1470726176204336e-11,		/*  0x1.93971dc395d9ep-37 */
@ D15 =  7.6478532249581686e-13,		/*  0x1.ae892e3d16fcep-41 */
@ D16 =  4.7628892832607741e-14,		/*  0x1.ad00dfe41feccp-45 */
@ D17 =  3.0524857220358650e-15;		/*  0x1.d7e8d886df921p-49 */
@ 
@ int
@ main(void)
@ {
@ 	printf("       %.35Le\n", o_threshold);
@ 	printf("       %.35Le\n", u_threshold);
@ 	printf("       %.35Le\n", L1);
@ 	printf("       %.35Le\n", A3);
@ 	printf("       %.35Le\n", A4);
@ 	printf("       %.35Le\n", A5);
@ 	printf("       %.35Le\n", A6);
@ 	printf("       %.35Le\n", C3);
@ 	printf("       %.35Le\n", C4);
@ 	printf("       %.35Le\n", C5);
@ 	printf("       %.35Le\n", C6);
@ 	printf("       %.35Le\n", C7);
@ 	printf("       %.35Le\n", C8);
@ 	printf("       %.35Le\n", C9);
@ 	printf("       %.35Le\n", C10);
@ 	printf("       %.35Le\n", C11);
@ 	printf("       %.35Le\n", C12);
@ 	printf("       %.35Le\n", C13);
@ 	printf("       %.35Le\n", D3);
@ 	printf("       %.35Le\n", D4);
@ 	printf("       %.35Le\n", D5);
@ 	printf("       %.35Le\n", D6);
@ 	printf("       %.35Le\n", D7);
@ 	printf("       %.35Le\n", D8);
@ 	printf("       %.35Le\n", D9);
@ 	printf("       %.35Le\n", D10);
@ 	printf("       %.35Le\n", D11);
@ 	printf("       %.35Le\n", D12);
@ 	printf("       %.35Le\n", D13);
@ 
@ 	printf("      %.16e                %a\n", INV_L, INV_L);
@ 	printf("      %.16e                %a\n", L2, L2);
@ 	printf("      %.16e                %a\n", A7, A7);
@ 	printf("      %.16e                %a\n", A8, A8);
@ 	printf("      %.16e                %a\n", A9, A9);
@ 	printf("      %.16e                %a\n", A10, A10);
@ 	printf("       %.16e               %a\n", C14, C14);
@ 	printf("       %.16e               %a\n", C15, C15);
@ 	printf("       %.16e               %a\n", C16, C16);
@ 	printf("       %.16e               %a\n", C17, C17);
@ 	printf("       %.16e               %a\n", C18, C18);
@ 	printf("       %.16e               %a\n", D14, D14);
@ 	printf("       %.16e               %a\n", D15, D15);
@ 	printf("       %.16e               %a\n", D16, D16);
@ 	printf("       %.16e               %a\n", D17, D17);
@ }

> Final diff(?).

Just omit some new XXX comments and fix one of the new XXX comments:

> Index: ld128/s_expl.c
> ===================================================================
> --- ld128/s_expl.c	(revision 251146)
> +++ ld128/s_expl.c	(working copy)
> ...
> @@ -38,35 +40,67 @@
> ...
> +/*
> + * XXX values in hex in comments have been lost (or were never present)
> + * from here.
> + */

Omit.

> +static const long double
> +/*
> + * Domain [-0.002708, 0.002708], range ~[-2.4021e-38, 2.4234e-38]:
> + * |exp(x) - p(x)| < 2**-124.9
> + * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
> + *
> + * XXX the coeffs aren't very carefully rounded, and I get 2.3 more bits.
> + */

Omit the XXX part.

> ...
> @@ -244,18 +287,224 @@
> +static const double
> +T1 = -0.1659,				/* ~-30.625/128 * log(2) */
> +T2 =  0.1659;				/* ~30.625/128 * log(2) */
> +
> +/*
> + * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2].
> + * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear
> + * in both subintervals, so set T3 = 2**-5, which places the condition
> + * into the [T1:T3] interval.
> + *
> + * XXX we now do this more to (partially) balance the number of terms
> + * in the C and D polys than to avoid checking the condition in both
> + * intervals.

Merge with the previous comment and remove XXX.

I just noticed that you use a different notation for intervals than me --
[T1:T2] instead of [T1, T2].  The former looks like it is from a
programming language and the latter is normal math notation.

> ...
> +/*
> + * Domain [0.03125, 0.1659], range ~[-2.7676e-37, -1.0367e-38]:
> + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-121.44
> + *
> + */

Extra empty line.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Fri May 31 20:51:34 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id A5DA542A
 for <freebsd-numerics@FreeBSD.org>; Fri, 31 May 2013 20:51:34 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au
 [211.29.132.191]) by mx1.freebsd.org (Postfix) with ESMTP id 2EB08C01
 for <freebsd-numerics@FreeBSD.org>; Fri, 31 May 2013 20:51:33 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4VKpOxf018305
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Sat, 1 Jun 2013 06:51:26 +1000
Date: Sat, 1 Jun 2013 06:51:24 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: cosh magic number?
In-Reply-To: <20130531191410.GA74343@troutmask.apl.washington.edu>
Message-ID: <20130601052415.H15844@besplex.bde.org>
References: <20130531191410.GA74343@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=kj9zAlcOel0A:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=WUpannN5onsA:10
 a=qPt0-ISivhtabmaQ0fEA:9 a=CjuIK1q_8ugA:10 a=gwKr3FwWfh0Jz3qo:21
 a=etSL3OnsOLLAcWig:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 May 2013 20:51:34 -0000

On Fri, 31 May 2013, Steve Kargl wrote:

> In msun/src/e_cosh.c, one finds the comment
>
> *
> *                                 exp(x) +  1/exp(x)
> * ln2/2 <= x <= 22 :  cosh(x) := -------------------
> *                                        2
>
> Where does the magic number 22 come from?

It is just a threshold at which a sloppier approximation becomes
adequate.  But you know that...

> Using exp(-|2x|) = 2**(1-p) with p = 53 for double, I
> arrive at 18.022, which is a little too small.

I get 18.368 using exp(-|2x|) = 2**-p for the natural threshold.
(Consider x+y instead of E+1/E.  When x is 1+eps (with eps giving
1 in the last place), adding y = eps/2 causes rounding up to even.
This y is 2**p times smaller than x.  If x has extra precision,
then y still needs to start more than <full number of bits in x>
bits further out for adding y to have no effect, even if the
final result has no extra precision.)

I first thought that the extras are guard bits.  Perhaps they are,
but guard bits are not representable unless there is extra precision.
22 gives log2(exp(44)) ~= 63.479 bits, which fits well with the 64-bit
x86 extra precision.

This is easier to test in float precision.  Try all integer thresholds
near the chosen one, on all x.  Expect a difference for extra precision.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Sat Jun  1 00:48:16 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 79F38BC9
 for <freebsd-numerics@freebsd.org>; Sat,  1 Jun 2013 00:48:16 +0000 (UTC)
 (envelope-from lists@eitanadler.com)
Received: from mail-pd0-f172.google.com (mail-pd0-f172.google.com
 [209.85.192.172]) by mx1.freebsd.org (Postfix) with ESMTP id 56C5F38C
 for <freebsd-numerics@freebsd.org>; Sat,  1 Jun 2013 00:48:15 +0000 (UTC)
Received: by mail-pd0-f172.google.com with SMTP id 10so3026112pdi.3
 for <freebsd-numerics@freebsd.org>; Fri, 31 May 2013 17:48:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=eitanadler.com; s=0xdeadbeef;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-type;
 bh=lKZEx8SzF437Bx0J9U79+xM85p3sHRYavF3+O23ZWMM=;
 b=KlMSDeVys6bYO1tu04fGiMytXXfOlP/I9wA9icvB8HR1pS7rFOn3xJEt3FT/RHSJWz
 C5kb7YGurzS+8TNHSy7wr3Q5RuT65aKHX5NgNE6CEkdFeDdtxaVxjWv32X0t1KIgYG+g
 jv037AkfkUdUC0C3OM9yvGYDimZqKVg0wNCp0=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-type:x-gm-message-state;
 bh=lKZEx8SzF437Bx0J9U79+xM85p3sHRYavF3+O23ZWMM=;
 b=haYR2Gb+S+njylixzPIhzgwPE59MRyQ/yYb/t5VB0TSOckuYtdMCikhHMsnOi2HMpY
 4rDA+dEUCHSw/6U+CPI3TC4bEVGgEQkIJ03TlIckyygEf+bJgkUiq226vXpYPFigQ9rC
 /m3H3fUt7lV2ba6+XNsntHBMWVDGW4VNzjLMNZUg5FczN2mxeXQwBOvUr8SNw7PIoPpl
 mQaZoFz9Ohjl3LwtwCBRbwxMRcRTo01rk6c4rwqRhyqmxgs/p0CAAev5V4Hb+66LcCi9
 jdgZs6pNE327o7SDMrACau95ULPGhfFYdWOGNvoybbnte7yvQJzYtxsHiy1QAXk45WiB
 NnNw==
X-Received: by 10.66.240.70 with SMTP id vy6mr16160275pac.70.1370047695634;
 Fri, 31 May 2013 17:48:15 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.70.91.139 with HTTP; Fri, 31 May 2013 17:47:45 -0700 (PDT)
In-Reply-To: <51A8D7A0.5060905@missouri.edu>
References: <51A8D7A0.5060905@missouri.edu>
From: Eitan Adler <lists@eitanadler.com>
Date: Sat, 1 Jun 2013 02:47:45 +0200
Message-ID: <CAF6rxgmNjhZTywQV=PmJyKQJuZX+ChKBtPRO6D54Te0j8Yto+A@mail.gmail.com>
Subject: Re: cacosh etc and bin/170206
To: Stephen Montgomery-Smith <stephen@missouri.edu>
Content-Type: text/plain; charset=UTF-8
X-Gm-Message-State: ALoCoQkt+23IzFnzAH1CqPsBXomigXLsb5Mtbi5uVfjyOdqDzEuZTzgGhNGj2hXeXMn1Bas++h7X
Cc: freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 01 Jun 2013 00:48:16 -0000

On 31 May 2013 19:02, Stephen Montgomery-Smith <stephen@missouri.edu> wrote:
> Do you think it is OK to close PR bin/170206?  The only reason to keep
> it open is that the long double functions haven't been committed yet.
> But I don't see how keeping this PR open will have any effect on how
> fast this will happen.

Please leave it open until the patches that are relevant are committed
and MFCed (if appropriate).  This isn't to speed up the final result,
but to serve as a place to track current status.

-- 
Eitan Adler