From owner-freebsd-numerics@FreeBSD.ORG Mon May 27 11:06:51 2013
Return-Path:
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
	by hub.freebsd.org (Postfix) with ESMTP id 50EC4371
	for ; Mon, 27 May 2013 11:06:51 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87])
	by mx1.freebsd.org (Postfix) with ESMTP id 28A566D0
	for ; Mon, 27 May 2013 11:06:51 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r4RB6pSi016110
	for ; Mon, 27 May 2013 11:06:51 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r4RB6oWc016108
	for freebsd-numerics@FreeBSD.org; Mon, 27 May 2013 11:06:50 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 27 May 2013 11:06:50 GMT
Message-Id: <201305271106.r4RB6oWc016108@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-numerics@FreeBSD.org
Subject: Current problem reports assigned to freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 27 May 2013 11:06:51 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o stand/175811 numerics   libstdc++ needs complex support in order use C99
o bin/170206   numerics   [msun] [patch] complex arcsinh, log, etc.

2 problems total.

From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 04:32:24 2013
Return-Path:
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
	by hub.freebsd.org (Postfix) with ESMTP id 016CACE2;
	Tue, 28 May 2013 04:32:24 +0000 (UTC)
	(envelope-from das@freebsd.org)
Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net [50.196.151.174])
	by mx1.freebsd.org (Postfix) with ESMTP id D776875B;
	Tue, 28 May 2013 04:32:23 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
	by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4S4W8sJ012895;
	Mon, 27 May 2013 21:32:08 -0700 (PDT)
	(envelope-from das@freebsd.org)
Received: (from das@localhost)
	by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4S4W5tG012894;
	Mon, 27 May 2013 21:32:05 -0700 (PDT)
	(envelope-from das@freebsd.org)
Date: Mon, 27 May 2013 21:32:05 -0700
From: David Schultz
To: Stephen Montgomery-Smith
Subject: Re: Use of C99 extra long double math functions after r236148
Message-ID: <20130528043205.GA3282@zim.MIT.EDU>
References: <500DAD41.5030104@missouri.edu>
	<20120724113214.G934@besplex.bde.org>
	<501204AD.30605@missouri.edu>
	<20120727032611.GB25690@server.rulingia.com>
	<20120728125824.GA26553@server.rulingia.com>
	<501460BB.30806@missouri.edu>
	<20120728231300.GA20741@server.rulingia.com>
	<50148F02.4020104@missouri.edu>
	<20120729222706.GA29048@server.rulingia.com>
	<5015BB9F.90807@missouri.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5015BB9F.90807@missouri.edu>
X-Mailman-Approved-At: Tue, 28 May 2013 11:20:14 +0000
Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , freebsd-numerics@freebsd.org, Bruce Evans , Steve Kargl , Peter Jeremy , Warner
Losh X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 04:32:24 -0000 On Sun, Jul 29, 2012, Stephen Montgomery-Smith wrote: > Also I forgot that the real part of casinh(0+I*x) isn't always 0. If > |x|>1, it is something non-zero. And so you need to check that > creal(casinh(0+I*x)) and creal(casinh(-0+I*x)) have opposite signs in > this case. > > > I'm less sure of the next logical > > step, which is to check things like > > casinh(x + I*0) = asinh(x) + I*0 > > Does C99 mandate this? My programs probably won't satisfy this, because > I realized that the computation works in these cases anyway. Of course, > it would be easy to make it happen. Hi Stephen, I wrote some tests to cover the corner cases for the complex inverse trig functions. They don't find any nontrivial bugs in your implementations. :-) Now that you have a commit bit, would you like to commit your code, or shall I? Below is a diff of all the changes needed to integrate it. 
I have a short list of style fixes, but otherwise I think what you have is good: - wrap lines to 80 chars, please - spaces between operators - "static inline", not "inline static" - don't use "inline" on large functions Index: lib/msun/Makefile =================================================================== --- lib/msun/Makefile (revision 251024) +++ lib/msun/Makefile (working copy) @@ -105,7 +105,8 @@ .endif # C99 complex functions -COMMON_SRCS+= s_ccosh.c s_ccoshf.c s_cexp.c s_cexpf.c \ +COMMON_SRCS+= catrig.c catrigf.c \ + s_ccosh.c s_ccoshf.c s_cexp.c s_cexpf.c \ s_cimag.c s_cimagf.c s_cimagl.c \ s_conj.c s_conjf.c s_conjl.c \ s_cproj.c s_cprojf.c s_creal.c s_crealf.c s_creall.c \ @@ -126,7 +127,7 @@ INCS+= fenv.h math.h MAN= acos.3 acosh.3 asin.3 asinh.3 atan.3 atan2.3 atanh.3 \ - ceil.3 ccos.3 ccosh.3 cexp.3 \ + ceil.3 cacos.3 ccos.3 ccosh.3 cexp.3 \ cimag.3 copysign.3 cos.3 cosh.3 csqrt.3 erf.3 exp.3 fabs.3 fdim.3 \ feclearexcept.3 feenableexcept.3 fegetenv.3 \ fegetround.3 fenv.3 floor.3 \ @@ -144,6 +145,9 @@ MLINKS+=atanh.3 atanhf.3 MLINKS+=atan2.3 atan2f.3 atan2.3 atan2l.3 \ atan2.3 carg.3 atan2.3 cargf.3 atan2.3 cargl.3 +MLINKS+=cacos.3 cacosf.3 cacos.3 cacosh.3 cacos.3 cacoshf.3 \ + cacos.3 casin.3 cacos.3 casinf.3 cacos.3 casinh.3 cacos.3 casinhf.3 \ + cacos.3 catan.3 cacos.3 catanf.3 cacos.3 catanh.3 cacos.3 catanhf.3 MLINKS+=ccos.3 ccosf.3 ccos.3 csin.3 ccos.3 csinf.3 ccos.3 ctan.3 ccos.3 ctanf.3 MLINKS+=ccosh.3 ccoshf.3 ccosh.3 csinh.3 ccosh.3 csinhf.3 \ ccosh.3 ctanh.3 ccosh.3 ctanhf.3 Index: lib/msun/Symbol.map =================================================================== --- lib/msun/Symbol.map (revision 251024) +++ lib/msun/Symbol.map (working copy) @@ -237,6 +237,18 @@ fegetround; fesetround; fesetenv; + cacos; + cacosf; + cacosh; + cacoshf; + casin; + casinf; + casinh; + casinhf; + catan; + catanf; + catanh; + catanhf; csin; csinf; csinh; Index: lib/msun/man/cacos.3 =================================================================== 
--- lib/msun/man/cacos.3 (revision 0) +++ lib/msun/man/cacos.3 (working copy) @@ -0,0 +1,128 @@ +.\" Copyright (c) 2013 David Schultz +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\"
+.\" $FreeBSD$
+.\"
+.Dd May 27, 2013
+.Dt CACOS 3
+.Os
+.Sh NAME
+.Nm cacos ,
+.Nm cacosf ,
+.Nm cacosh ,
+.Nm cacoshf ,
+.Nm casin ,
+.Nm casinf ,
+.Nm casinh ,
+.Nm casinhf ,
+.Nm catan ,
+.Nm catanf ,
+.Nm catanh ,
+.Nm catanhf
+.Nd complex arc trigonometric and hyperbolic functions
+.Sh LIBRARY
+.Lb libm
+.Sh SYNOPSIS
+.In complex.h
+.Ft double complex
+.Fn cacos "double complex z"
+.Ft float complex
+.Fn cacosf "float complex z"
+.Ft double complex
+.Fn cacosh "double complex z"
+.Ft float complex
+.Fn cacoshf "float complex z"
+.Ft double complex
+.Fn casin "double complex z"
+.Ft float complex
+.Fn casinf "float complex z"
+.Ft double complex
+.Fn casinh "double complex z"
+.Ft float complex
+.Fn casinhf "float complex z"
+.Ft double complex
+.Fn catan "double complex z"
+.Ft float complex
+.Fn catanf "float complex z"
+.Ft double complex
+.Fn catanh "double complex z"
+.Ft float complex
+.Fn catanhf "float complex z"
+.Sh DESCRIPTION
+The
+.Fn cacos ,
+.Fn casin ,
+and
+.Fn catan
+functions compute the principal value of the inverse cosine, sine,
+and tangent of the complex number
+.Fa z ,
+respectively.
+The
+.Fn cacosh ,
+.Fn casinh ,
+and
+.Fn catanh
+functions compute the principal value of the inverse hyperbolic
+cosine, sine, and tangent, respectively.
+The
+.Fn cacosf ,
+.Fn casinf ,
+.Fn catanf ,
+.Fn cacoshf ,
+.Fn casinhf ,
+and
+.Fn catanhf
+functions perform the same operations in
+.Fa float
+precision.
+.Pp
+.ie '\*[.T]'utf8'
+. ds Un \[cu]
+.el
+. ds Un U
+.
+There is no universal convention for defining the principal values of
+these functions. The following table gives the branch cuts, and the
+corresponding ranges for the return values, adopted by the C language.
+.Bl -column ".Sy Function" ".Sy (-\*(If*I, -I) \*(Un (I, \*(If*I)" ".Sy [-\*(Pi/2*I, \*(Pi/2*I]" +.It Sy Function Ta Sy Branch Cut(s) Ta Sy Range +.It cacos Ta (-\*(If, -1) \*(Un (1, \*(If) Ta [0, \*(Pi] +.It casin Ta (-\*(If, -1) \*(Un (1, \*(If) Ta [-\*(Pi/2, \*(Pi/2] +.It catan Ta (-\*(If*I, -i) \*(Un (I, \*(If*I) Ta [-\*(Pi/2, \*(Pi/2] +.It cacosh Ta (-\*(If, 1) Ta [-\*(Pi*I, \*(Pi*I] +.It casinh Ta (-\*(If*I, -i) \*(Un (I, \*(If*I) Ta [-\*(Pi/2*I, \*(Pi/2*I] +.It catanh Ta (-\*(If, -1) \*(Un (1, \*(If) Ta [-\*(Pi/2*I, \*(Pi/2*I] +.El +.Sh SEE ALSO +.Xr cacosh 3 , +.Xr ccosh 3 , +.Xr complex 3 , +.Xr cos 3 , +.Xr math 3 , +.Xr sin 3 , +.Xr tan 3 +.Sh STANDARDS +These functions conform to +.St -isoC-99 . Property changes on: lib/msun/man/cacos.3 ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Index: lib/msun/man/ccos.3 =================================================================== --- lib/msun/man/ccos.3 (revision 251024) +++ lib/msun/man/ccos.3 (working copy) @@ -69,6 +69,7 @@ .Fa float precision. .Sh SEE ALSO +.Xr cacos 3 , .Xr ccosh 3 , .Xr complex 3 , .Xr cos 3 , Index: lib/msun/man/ccosh.3 =================================================================== --- lib/msun/man/ccosh.3 (revision 251024) +++ lib/msun/man/ccosh.3 (working copy) @@ -69,6 +69,7 @@ .Fa float precision. 
.Sh SEE ALSO +.Xr cacosh 3 , .Xr ccos 3 , .Xr complex 3 , .Xr cosh 3 , Index: lib/msun/man/complex.3 =================================================================== --- lib/msun/man/complex.3 (revision 251024) +++ lib/msun/man/complex.3 (working copy) @@ -89,6 +89,12 @@ .\" Section 7.3.5-6 of ISO C99 standard .Ss Trigonometric and Hyperbolic Functions .Cl +cacos arc cosine +cacosh arc hyperbolic cosine +casin arc sine +casinh arc hyperbolic sine +catan arc tangent +catanh arc hyperbolic tangent ccos cosine ccosh hyperbolic cosine csin sine @@ -111,20 +117,8 @@ functions described here conform to .St -isoC-99 . .Sh BUGS -The inverse trigonometric and hyperbolic functions -.Fn cacos , -.Fn cacosh , -.Fn casin , -.Fn casinh , -.Fn catan , -and -.Fn catanh -are not implemented. -.Pp The logarithmic functions .Fn clog -are not implemented. -.Pp -The power functions +and the power functions .Fn cpow are not implemented. Index: tools/regression/lib/msun/Makefile =================================================================== --- tools/regression/lib/msun/Makefile (revision 251024) +++ tools/regression/lib/msun/Makefile (working copy) @@ -2,7 +2,8 @@ TESTS= test-cexp test-conj test-csqrt test-ctrig \ test-exponential test-fenv test-fma \ - test-fmaxmin test-ilogb test-invtrig test-logarithm test-lrint \ + test-fmaxmin test-ilogb test-invtrig test-invctrig \ + test-logarithm test-lrint \ test-lround test-nan test-nearbyint test-next test-rem test-trig CFLAGS+= -O0 -lm Index: tools/regression/lib/msun/test-invctrig.c =================================================================== --- tools/regression/lib/msun/test-invctrig.c (revision 0) +++ tools/regression/lib/msun/test-invctrig.c (working copy) @@ -0,0 +1,467 @@ +/*- + * Copyright (c) 2008-2013 David Schultz + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. 
Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* + * Tests for casin[h](), cacos[h](), and catan[h](). + */ + +#include +__FBSDID("$FreeBSD$"); + +#include +#include +#include +#include +#include +#include + +#define ALL_STD_EXCEPT (FE_DIVBYZERO | FE_INEXACT | FE_INVALID | \ + FE_OVERFLOW | FE_UNDERFLOW) +#define OPT_INVALID (ALL_STD_EXCEPT & ~FE_INVALID) +#define OPT_INEXACT (ALL_STD_EXCEPT & ~FE_INEXACT) +#define FLT_ULP() ldexpl(1.0, 1 - FLT_MANT_DIG) +#define DBL_ULP() ldexpl(1.0, 1 - DBL_MANT_DIG) +#define LDBL_ULP() ldexpl(1.0, 1 - LDBL_MANT_DIG) + +#pragma STDC FENV_ACCESS ON +#pragma STDC CX_LIMITED_RANGE OFF + +/* + * XXX gcc implements complex multiplication incorrectly. In + * particular, it implements it as if the CX_LIMITED_RANGE pragma + * were ON. 
Consequently, we need this function to form numbers
+ * such as x + INFINITY * I, since gcc evaluates INFINITY * I as
+ * NaN + INFINITY * I.
+ */
+static inline long double complex
+cpackl(long double x, long double y)
+{
+	long double complex z;
+
+	__real__ z = x;
+	__imag__ z = y;
+	return (z);
+}
+
+/* Flags that determine whether to check the signs of the result. */
+#define CS_REAL 1
+#define CS_IMAG 2
+#define CS_BOTH (CS_REAL | CS_IMAG)
+
+#ifdef DEBUG
+#define debug(...) printf(__VA_ARGS__)
+#else
+#define debug(...) (void)0
+#endif
+
+/*
+ * Test that a function returns the correct value and sets the
+ * exception flags correctly. The exceptmask specifies which
+ * exceptions we should check. We need to be lenient for several
+ * reasons, but mainly because on some architectures it's impossible
+ * to raise FE_OVERFLOW without raising FE_INEXACT.
+ *
+ * These are macros instead of functions so that assert provides more
+ * meaningful error messages.
+ *
+ * XXX The volatile here is to avoid gcc's bogus constant folding and work
+ * around the lack of support for the FENV_ACCESS pragma.
+ */
+#define test_p(func, z, result, exceptmask, excepts, checksign) do { \
+	volatile long double complex _d = z; \
+	debug(" testing %s(%Lg + %Lg I) == %Lg + %Lg I\n", #func, \
+	    creall(_d), cimagl(_d), creall(result), cimagl(result)); \
+	assert(feclearexcept(FE_ALL_EXCEPT) == 0); \
+	assert(cfpequal((func)(_d), (result), (checksign))); \
+	assert(((func), fetestexcept(exceptmask) == (excepts))); \
+} while (0)
+
+/*
+ * Test within a given tolerance. The tolerance indicates relative error
+ * in ulps.
+ */
+#define test_p_tol(func, z, result, tol) do { \
+	volatile long double complex _d = z; \
+	debug(" testing %s(%Lg + %Lg I) ~= %Lg + %Lg I\n", #func, \
+	    creall(_d), cimagl(_d), creall(result), cimagl(result)); \
+	assert(cfpequal_tol((func)(_d), (result), (tol))); \
+} while (0)
+
+/* These wrappers apply the identities f(conj(z)) = conj(f(z)).
*/ +#define test(func, z, result, exceptmask, excepts, checksign) do { \ + test_p(func, z, result, exceptmask, excepts, checksign); \ + test_p(func, conjl(z), conjl(result), exceptmask, excepts, checksign); \ +} while (0) +#define test_tol(func, z, result, tol) do { \ + test_p_tol(func, z, result, tol); \ + test_p_tol(func, conjl(z), conjl(result), tol); \ +} while (0) + +/* Test the given function in all precisions. */ +#define testall(func, x, result, exceptmask, excepts, checksign) do { \ + test(func, x, result, exceptmask, excepts, checksign); \ + test(func##f, x, result, exceptmask, excepts, checksign); \ +} while (0) +#define testall_odd(func, x, result, exceptmask, excepts, checksign) do { \ + testall(func, x, result, exceptmask, excepts, checksign); \ + testall(func, -(x), -result, exceptmask, excepts, checksign); \ +} while (0) +#define testall_even(func, x, result, exceptmask, excepts, checksign) do { \ + testall(func, x, result, exceptmask, excepts, checksign); \ + testall(func, -(x), result, exceptmask, excepts, checksign); \ +} while (0) + +/* + * Test the given function in all precisions, within a given tolerance. + * The tolerance is specified in ulps. + */ +#define testall_tol(func, x, result, tol) do { \ + test_tol(func, x, result, (tol) * DBL_ULP()); \ + test_tol(func##f, x, result, (tol) * FLT_ULP()); \ +} while (0) +#define testall_odd_tol(func, x, result, tol) do { \ + testall_tol(func, x, result, tol); \ + testall_tol(func, -(x), -result, tol); \ +} while (0) +#define testall_even_tol(func, x, result, tol) do { \ + testall_tol(func, x, result, tol); \ + testall_tol(func, -(x), result, tol); \ +} while (0) + +static const long double +pi = 3.14159265358979323846264338327950280L, +c3pi = 9.42477796076937971538793014983850839L; + +/* + * Determine whether x and y are equal, with two special rules: + * +0.0 != -0.0 + * NaN == NaN + * If checksign is 0, we compare the absolute values instead. 
+ */ +static int +fpequal(long double x, long double y, int checksign) +{ + if (isnan(x) && isnan(y)) + return (1); + if (checksign) + return (x == y && !signbit(x) == !signbit(y)); + else + return (fabsl(x) == fabsl(y)); +} + +static int +fpequal_tol(long double x, long double y, long double tol) +{ + fenv_t env; + int ret; + + if (isnan(x) && isnan(y)) + return (1); + if (!signbit(x) != !signbit(y)) + return (0); + if (x == y) + return (1); + if (tol == 0 || y == 0.0) + return (0); + + /* Hard case: need to check the tolerance. */ + feholdexcept(&env); + ret = fabsl(x - y) <= fabsl(y * tol); + fesetenv(&env); + return (ret); +} + +static int +cfpequal(long double complex x, long double complex y, int checksign) +{ + return (fpequal(creal(x), creal(y), checksign & CS_REAL) + && fpequal(cimag(x), cimag(y), checksign & CS_IMAG)); +} + +static int +cfpequal_tol(long double complex x, long double complex y, long double tol) +{ + return (fpequal_tol(creal(x), creal(y), tol) + && fpequal_tol(cimag(x), cimag(y), tol)); +} + + +/* Tests for 0 */ +void +test_zero(void) +{ + long double complex zero = cpackl(0.0, 0.0); + + testall_tol(cacosh, zero, cpackl(0.0, pi / 2), 1); + testall_tol(cacosh, -zero, cpackl(0.0, -pi / 2), 1); + testall_tol(cacos, zero, cpackl(pi / 2, -0.0), 1); + testall_tol(cacos, -zero, cpackl(pi / 2, 0.0), 1); + + testall_odd(casinh, zero, zero, ALL_STD_EXCEPT, 0, CS_BOTH); + testall_odd(casin, zero, zero, ALL_STD_EXCEPT, 0, CS_BOTH); + + testall_odd(catanh, zero, zero, ALL_STD_EXCEPT, 0, CS_BOTH); + testall_odd(catan, zero, zero, ALL_STD_EXCEPT, 0, CS_BOTH); +} + +/* + * Tests for NaN inputs. 
+ */ +void +test_nan() +{ + long double complex nan_nan = cpackl(NAN, NAN); + long double complex z; + + /* + * IN CACOSH CACOS CASINH CATANH + * NaN,NaN NaN,NaN NaN,NaN NaN,NaN NaN,NaN + * finite,NaN NaN,NaN* NaN,NaN* NaN,NaN* NaN,NaN* + * NaN,finite NaN,NaN* NaN,NaN* NaN,NaN* NaN,NaN* + * NaN,Inf Inf,NaN NaN,-Inf ?Inf,NaN ?0,pi/2 + * +-Inf,NaN Inf,NaN NaN,?Inf +-Inf,NaN +-0,NaN + * +-0,NaN NaN,NaN* pi/2,NaN NaN,NaN* +-0,NaN + * NaN,0 NaN,NaN* NaN,NaN* NaN,0 NaN,NaN* + * + * * = raise invalid + */ + z = nan_nan; + testall(cacosh, z, nan_nan, ALL_STD_EXCEPT, 0, 0); + testall(cacos, z, nan_nan, ALL_STD_EXCEPT, 0, 0); + testall(casinh, z, nan_nan, ALL_STD_EXCEPT, 0, 0); + testall(casin, z, nan_nan, ALL_STD_EXCEPT, 0, 0); + testall(catanh, z, nan_nan, ALL_STD_EXCEPT, 0, 0); + testall(catan, z, nan_nan, ALL_STD_EXCEPT, 0, 0); + + z = cpackl(0.5, NAN); + testall(cacosh, z, nan_nan, OPT_INVALID, 0, 0); + testall(cacos, z, nan_nan, OPT_INVALID, 0, 0); + testall(casinh, z, nan_nan, OPT_INVALID, 0, 0); + testall(casin, z, nan_nan, OPT_INVALID, 0, 0); + testall(catanh, z, nan_nan, OPT_INVALID, 0, 0); + testall(catan, z, nan_nan, OPT_INVALID, 0, 0); + + z = cpackl(NAN, 0.5); + testall(cacosh, z, nan_nan, OPT_INVALID, 0, 0); + testall(cacos, z, nan_nan, OPT_INVALID, 0, 0); + testall(casinh, z, nan_nan, OPT_INVALID, 0, 0); + testall(casin, z, nan_nan, OPT_INVALID, 0, 0); + testall(catanh, z, nan_nan, OPT_INVALID, 0, 0); + testall(catan, z, nan_nan, OPT_INVALID, 0, 0); + + z = cpackl(NAN, INFINITY); + testall(cacosh, z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0, CS_REAL); + testall(cacosh, -z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0, CS_REAL); + testall(cacos, z, cpackl(NAN, -INFINITY), ALL_STD_EXCEPT, 0, CS_IMAG); + testall(casinh, z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0, 0); + testall(casin, z, cpackl(NAN, INFINITY), ALL_STD_EXCEPT, 0, CS_IMAG); + testall_tol(catanh, z, cpackl(0.0, pi / 2), 1); + testall(catan, z, cpackl(NAN, 0.0), ALL_STD_EXCEPT, 0, CS_IMAG); + + z = 
cpackl(INFINITY, NAN); + testall_even(cacosh, z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0, + CS_REAL); + testall_even(cacos, z, cpackl(NAN, INFINITY), ALL_STD_EXCEPT, 0, 0); + testall_odd(casinh, z, cpackl(INFINITY, NAN), ALL_STD_EXCEPT, 0, + CS_REAL); + testall_odd(casin, z, cpackl(NAN, INFINITY), ALL_STD_EXCEPT, 0, 0); + testall_odd(catanh, z, cpackl(0.0, NAN), ALL_STD_EXCEPT, 0, CS_REAL); + testall_odd_tol(catan, z, cpackl(pi / 2, 0.0), 1); + + z = cpackl(0.0, NAN); + /* XXX We allow a spurious inexact exception here. */ + testall_even(cacosh, z, nan_nan, OPT_INVALID & ~FE_INEXACT, 0, 0); + testall_even_tol(cacos, z, cpackl(pi / 2, NAN), 1); + testall_odd(casinh, z, nan_nan, OPT_INVALID, 0, 0); + testall_odd(casin, z, cpackl(0.0, NAN), ALL_STD_EXCEPT, 0, CS_REAL); + testall_odd(catanh, z, cpackl(0.0, NAN), OPT_INVALID, 0, CS_REAL); + testall_odd(catan, z, nan_nan, OPT_INVALID, 0, 0); + + z = cpackl(NAN, 0.0); + testall(cacosh, z, nan_nan, OPT_INVALID, 0, 0); + testall(cacos, z, nan_nan, OPT_INVALID, 0, 0); + testall(casinh, z, cpackl(NAN, 0), ALL_STD_EXCEPT, 0, CS_IMAG); + testall(casin, z, nan_nan, OPT_INVALID, 0, 0); + testall(catanh, z, nan_nan, OPT_INVALID, 0, CS_IMAG); + testall(catan, z, cpackl(NAN, 0.0), ALL_STD_EXCEPT, 0, 0); +} + +void +test_inf(void) +{ + long double complex z; + + /* + * IN CACOSH CACOS CASINH CATANH + * Inf,Inf Inf,pi/4 pi/4,-Inf Inf,pi/4 0,pi/2 + * -Inf,Inf Inf,3pi/4 3pi/4,-Inf --- --- + * Inf,finite Inf,0 0,-Inf Inf,0 0,pi/2 + * -Inf,finite Inf,pi pi,-Inf --- --- + * finite,Inf Inf,pi/2 pi/2,-Inf Inf,pi/2 0,pi/2 + */ + z = cpackl(INFINITY, INFINITY); + testall_tol(cacosh, z, cpackl(INFINITY, pi / 4), 1); + testall_tol(cacosh, -z, cpackl(INFINITY, -c3pi / 4), 1); + testall_tol(cacos, z, cpackl(pi / 4, -INFINITY), 1); + testall_tol(cacos, -z, cpackl(c3pi / 4, INFINITY), 1); + testall_odd_tol(casinh, z, cpackl(INFINITY, pi / 4), 1); + testall_odd_tol(casin, z, cpackl(pi / 4, INFINITY), 1); + testall_odd_tol(catanh, z, cpackl(0, pi / 
2), 1); + testall_odd_tol(catan, z, cpackl(pi / 2, 0), 1); + + z = cpackl(INFINITY, 0.5); + /* XXX We allow a spurious inexact exception here. */ + testall(cacosh, z, cpackl(INFINITY, 0), OPT_INEXACT, 0, CS_BOTH); + testall_tol(cacosh, -z, cpackl(INFINITY, -pi), 1); + testall(cacos, z, cpackl(0, -INFINITY), OPT_INEXACT, 0, CS_BOTH); + testall_tol(cacos, -z, cpackl(pi, INFINITY), 1); + testall_odd(casinh, z, cpackl(INFINITY, 0), OPT_INEXACT, 0, CS_BOTH); + testall_odd_tol(casin, z, cpackl(pi / 2, INFINITY), 1); + testall_odd_tol(catanh, z, cpackl(0, pi / 2), 1); + testall_odd_tol(catan, z, cpackl(pi / 2, 0), 1); + + z = cpackl(0.5, INFINITY); + testall_tol(cacosh, z, cpackl(INFINITY, pi / 2), 1); + testall_tol(cacosh, -z, cpackl(INFINITY, -pi / 2), 1); + testall_tol(cacos, z, cpackl(pi / 2, -INFINITY), 1); + testall_tol(cacos, -z, cpackl(pi / 2, INFINITY), 1); + testall_odd_tol(casinh, z, cpackl(INFINITY, pi / 2), 1); + /* XXX We allow a spurious inexact exception here. */ + testall_odd(casin, z, cpackl(0.0, INFINITY), OPT_INEXACT, 0, CS_BOTH); + testall_odd_tol(catanh, z, cpackl(0, pi / 2), 1); + testall_odd_tol(catan, z, cpackl(pi / 2, 0), 1); +} + +/* Tests along the real and imaginary axes. */ +void +test_axes(void) +{ + static const long double nums[] = { + -2, -1, -0.5, 0.5, 1, 2 + }; + long double complex z; + int i; + + for (i = 0; i < sizeof(nums) / sizeof(nums[0]); i++) { + /* Real axis */ + z = cpackl(nums[i], 0.0); + if (fabs(nums[i]) <= 1) { + testall_tol(cacosh, z, cpackl(0.0, acos(nums[i])), 1); + testall_tol(cacos, z, cpackl(acosl(nums[i]), -0.0), 1); + testall_tol(casin, z, cpackl(asinl(nums[i]), 0.0), 1); + testall_tol(catanh, z, cpackl(atanh(nums[i]), 0.0), 1); + } else { + testall_tol(cacosh, z, + cpackl(acosh(fabs(nums[i])), + (nums[i] < 0) ? pi : 0), 1); + testall_tol(cacos, z, + cpackl((nums[i] < 0) ? 
pi : 0, + -acosh(fabs(nums[i]))), 1); + testall_tol(casin, z, + cpackl(copysign(pi / 2, nums[i]), + acosh(fabs(nums[i]))), 1); + testall_tol(catanh, z, + cpackl(atanh(1 / nums[i]), pi / 2), 1); + } + testall_tol(casinh, z, cpackl(asinh(nums[i]), 0.0), 1); + testall_tol(catan, z, cpackl(atan(nums[i]), 0), 1); + + /* TODO: Test the imaginary axis. */ + } +} + +void +test_small(void) +{ + /* + * z = 0.75 + i 0.25 + * acos(z) = Pi/4 - i ln(2)/2 + * asin(z) = Pi/4 + i ln(2)/2 + * atan(z) = atan(4)/2 + i ln(17/9)/4 + */ + static const struct { + long double a, b; + long double acos_a, acos_b; + long double asin_a, asin_b; + long double atan_a, atan_b; + } tests[] = { + { 0.75L, + 0.25L, + pi / 4, + -0.34657359027997265470861606072908828L, + pi / 4, + 0.34657359027997265470861606072908828L, + 0.66290883183401623252961960521423782L, + 0.15899719167999917436476103600701878L }, + }; + long double complex z; + int i; + + for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { + z = cpackl(tests[i].a, tests[i].b); + testall_tol(cacos, z, + cpackl(tests[i].acos_a, tests[i].acos_b), 2); + testall_odd_tol(casin, z, + cpackl(tests[i].asin_a, tests[i].asin_b), 2); + testall_odd_tol(catan, z, + cpackl(tests[i].atan_a, tests[i].atan_b), 2); + } +} + +/* Test inputs that might cause overflow in a sloppy implementation. 
*/ +void +test_large(void) +{ + + /* TODO: Write these tests */ +} + +int +main(int argc, char *argv[]) +{ + + printf("1..6\n"); + + test_zero(); + printf("ok 1 - invctrig zero\n"); + + test_nan(); + printf("ok 2 - invctrig nan\n"); + + test_inf(); + printf("ok 3 - invctrig inf\n"); + + test_axes(); + printf("ok 4 - invctrig axes\n"); + + test_small(); + printf("ok 5 - invctrig small\n"); + + test_large(); + printf("ok 6 - invctrig large\n"); + + return (0); +} Property changes on: tools/regression/lib/msun/test-invctrig.c ___________________________________________________________________ Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 05:57:36 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 799CCC38; Tue, 28 May 2013 05:57:36 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id EB3E8A5A; Tue, 28 May 2013 05:57:35 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4S5v7R0015276 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 28 May 2013 15:57:08 +1000 Date: Tue, 28 May 2013 15:57:07 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: David Schultz Subject: Re: Use of C99 extra long double math functions after r236148 In-Reply-To: <20130528043205.GA3282@zim.MIT.EDU> Message-ID: <20130528150808.F1298@besplex.bde.org> References: <500DAD41.5030104@missouri.edu> <20120724113214.G934@besplex.bde.org> 
	<501204AD.30605@missouri.edu>
	<20120727032611.GB25690@server.rulingia.com>
	<20120728125824.GA26553@server.rulingia.com>
	<501460BB.30806@missouri.edu>
	<20120728231300.GA20741@server.rulingia.com>
	<50148F02.4020104@missouri.edu>
	<20120729222706.GA29048@server.rulingia.com>
	<5015BB9F.90807@missouri.edu>
	<20130528043205.GA3282@zim.MIT.EDU>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=e4Ne0tV/ c=1 sm=1 a=O6A2dy7pM2IA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10 a=3K6gk9kpRNNbVDm7pYwA:9 a=CjuIK1q_8ugA:10 a=Wy-Xl9HimQZDeEWb:21 a=nZJVZjyGvH_yZe87:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117
X-Mailman-Approved-At: Tue, 28 May 2013 11:40:18 +0000
Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , Stephen Montgomery-Smith , freebsd-numerics@freebsd.org, Bruce Evans , Steve Kargl , Peter Jeremy , Warner Losh
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 28 May 2013 05:57:36 -0000

On Mon, 27 May 2013, David Schultz wrote:

> I wrote some tests to cover the corner cases for the complex
> inverse trig functions. They don't find any nontrivial bugs in
> your implementations. :-) Now that you have a commit bit, would
> you like to commit your code, or shall I?
>
> Below is a diff of all the changes needed to integrate it. I have
> a short list of style fixes, but otherwise I think what you have
> is good:
> - wrap lines to 80 chars, please
> - spaces between operators
> - "static inline", not "inline static"
> - don't use "inline" on large functions

indent(1) fixes the spaces between operators fairly well, without
finding many other problems or adding many. It didn't find [m]any
long lines (but it doesn't understand its own line length limit).
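
As a rough illustration of the style points quoted above, a small
helper written to those rules might look like the following. The name
norm_sq is made up for this sketch and does not appear in catrig.c.

```c
/*
 * Hypothetical helper (not from catrig.c): returns x*x + y*y.
 * It is small, so marking it inline is reasonable.  The qualifiers
 * are written in the order "static inline", binary operators have
 * spaces around them, and every line stays under 80 columns.
 */
static inline double
norm_sq(double x, double y)
{

	return (x * x + y * y);
}
```

A call such as norm_sq(3.0, 4.0) exercises it; a much larger function
would simply drop the "inline".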
Here are my local patches.  Just a few that were not integrated by
Stephen after we stopped working on it last October.

@ diff -u2 catrigf.c~ catrigf.c
@ --- catrigf.c~ 2012-09-22 21:13:50.000000000 +0000
@ +++ catrigf.c 2012-09-22 21:35:51.287614000 +0000
@ @@ -353,12 +353,7 @@
@       }
@
@ -     if (ax == 1 && ay < FLT_EPSILON) {
@ -#if 0
@ -             if (ay > 2*FLT_MIN)
@ -                     rx = - logf(ay/2) / 2;
@ -             else
@ -#endif
@ -                     rx = - (logf(ay) - m_ln2) / 2;
@ -     } else
@ +     if (ax == 1 && ay < FLT_EPSILON)
@ +             rx = - (logf(ay) - m_ln2) / 2;
@ +     else
@               rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4;
@

This is in catrig.c, but catrigf.c wasn't regenerated from catrig.c, and
the scripts for the generation and their support file are no longer in
Stephen's public_html directory.

@ diff -u2 catrigl.c~ catrigl.c
@ --- catrigl.c~ 2012-09-22 21:14:24.000000000 +0000
@ +++ catrigl.c 2013-05-26 08:46:10.423187000 +0000
@ @@ -50,4 +50,6 @@
@  #define signbit(x) (__builtin_signbitl(x))
@
@ +long double atanhl(long double);
@ +
@  static const long double
@  A_crossover = 10,

catrigl.c depends on atanhl(), logl() and log1pl() existing.  Stephen
has a not-very-dummy version of s_atanhl.c in his public_html directory.
This needs a more direct conversion from the fdlibm e_atanhl.c to be of
commit quality.  I recently started testing with it, and use my own
logl().  Previously this patch had to change the atanhl() call to
atanh() for catrigl.c to be usable.  I haven't tested the long double
complex functions for anything except efficiency and consistency with
the plain double complex functions yet, so my tests don't show any
difference from switching to atanhl().  They just show that atanhl() is
consistent in its limited use in catrigl.c.  I also haven't tested
atanhl() as a real function.  Strangely, catrigl.c gives complex
acoshl() and asinhl() without needing real acoshl() and asinhl().
The real inverse hyperbolic trig functions seem to be just as easy as
the real inverse trig functions, but you only converted the latter from
the fdlibm versions to create the long double versions.  Hopefully they
are all as easy to translate as e_atanhl.c.

@ @@ -60,6 +62,6 @@
@  #if LDBL_MANT_DIG == 64
@  static const union IEEEl2bits
@ -um_e = LD80C(0xadf85458a2bb4a9b, 1, 0, 2.71828182845904523536e0L),
@ -um_ln2 = LD80C(0xb17217f7d1cf79ac, -1, 0, 6.93147180559945309417e-1L);
@ +um_e = LD80C(0xadf85458a2bb4a9b, 1, 2.71828182845904523536e+0L),
@ +um_ln2 = LD80C(0xb17217f7d1cf79ac, -1, 6.93147180559945309417e-1L);
@  #define m_e um_e.e
@  #define m_ln2 um_ln2.e

Keep up with API changes.

@ @@ -348,5 +350,5 @@
@
@       if (y == 0 && ax <= 1)
@ -             return (cpackl(atanhl(x), y)); /* XXX need atanhl() */
@ +             return (cpackl(atanh(x), y)); /* XXX need atanhl() */
@
@       if (x == 0)

The comment doesn't apply if this file is actually usable.  Don't forget
to remove it before committing.

@ @@ -369,12 +371,7 @@
@       }
@
@ -     if (ax == 1 && ay < LDBL_EPSILON) {
@ -#if 0
@ -             if (ay > 2*LDBL_MIN)
@ -                     rx = - logl(ay/2) / 2;
@ -             else
@ -#endif
@ -                     rx = - (logl(ay) - m_ln2) / 2;
@ -     } else
@ +     if (ax == 1 && ay < LDBL_EPSILON)
@ +             rx = - (logl(ay) - m_ln2) / 2;
@ +     else
@               rx = log1pl(4*ax / sum_squares(ax-1, ay)) / 4;
@

Should be obtained by regeneration, as for catrigf.c.

Back to your changes...  They mostly look good, as usual...

% Index: tools/regression/lib/msun/test-invctrig.c
% ===================================================================
% --- tools/regression/lib/msun/test-invctrig.c (revision 0)
% +++ tools/regression/lib/msun/test-invctrig.c (working copy)
% @@ -0,0 +1,467 @@
% ....
% +#pragma STDC FENV_ACCESS ON
% +#pragma STDC CX_LIMITED_RANGE OFF

Heheh, style rules for #pragma.  I like the old rule which says that
it should be indented 6 feet under.  It is still almost useless, since
we don't even have any C99 compilers that implement the fenv pragmas
yet.
% +/*
% + * XXX gcc implements complex multiplication incorrectly. In
% + * particular, it implements it as if the CX_LIMITED_RANGE pragma
% + * were ON. Consequently, we need this function to form numbers
% + * such as x + INFINITY * I, since gcc evalutes INFINITY * I as
% + * NaN + INFINITY * I.
% + */
% +static inline long double complex
% +cpackl(long double x, long double y)
% +{
% +     long double complex z;
% +
% +     __real__ z = x;
% +     __imag__ z = y;
% +     return (z);
% +}

Why duplicate this?  I guess it is because math_private.h is hard to
include.  I use complicated conditionals (mostly switches on
$(uname -p) and $(hostname) in shell scripts) to locate it when
compiling from external directories.

The tests seem to be compiled with -O0.  That tests a different
environment than the usual runtime one, and in particular misses seeing
most precision bugs.  I mostly test with -O (-O2 with gcc is slower
and even harder to debug, while with clang it makes little difference),
but switch to -O0 to debug.  -g -O is now almost unusable because -O
optimizes away dead variables and -g is broken in many cases (sometimes
it can't even show live variables).
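
For reference, on the cpackl() question above: a minimal sketch, assuming
only C99 <complex.h>, of one portable way to build x + y*I without going
through a complex multiplication.  The names complex_parts and
make_complex() are made up for this illustration; they are not the helpers
from math_private.h or from the test file under review.

```c
#include <complex.h>
#include <math.h>

/*
 * C99 gives each complex type the same representation as an array of
 * two reals (real part first), so both parts can be stored directly.
 * The type and function names here are hypothetical.
 */
union complex_parts {
	double complex z;
	double part[2];
};

static double complex
make_complex(double re, double im)
{
	union complex_parts u;

	u.part[0] = re;		/* real part, stored untouched */
	u.part[1] = im;		/* imaginary part, stored untouched */
	return (u.z);
}
```

With this, make_complex(1.0, INFINITY) keeps the real part at exactly 1.0,
whereas 1.0 + INFINITY * I can end up with a NaN real part when I is a
complex (rather than imaginary) constant, because the 0 * INFINITY term
is NaN.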
Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 06:14:47 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CEB00104; Tue, 28 May 2013 06:14:47 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx08.syd.optusnet.com.au (fallbackmx08.syd.optusnet.com.au [211.29.132.10]) by mx1.freebsd.org (Postfix) with ESMTP id 63285AF7; Tue, 28 May 2013 06:14:46 +0000 (UTC) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4S6EWp4011570; Tue, 28 May 2013 16:14:32 +1000 Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4S6EDZX013018 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 28 May 2013 16:14:14 +1000 Date: Tue, 28 May 2013 16:14:13 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: David Schultz Subject: Re: Use of C99 extra long double math functions after r236148 In-Reply-To: <20130528043205.GA3282@zim.MIT.EDU> Message-ID: <20130528155933.V1298@besplex.bde.org> References: <500DAD41.5030104@missouri.edu> <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=O6A2dy7pM2IA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10 a=MGBSo3QMWewO758DKTcA:9 
a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 X-Mailman-Approved-At: Tue, 28 May 2013 11:40:29 +0000 Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , Stephen Montgomery-Smith , freebsd-numerics@freebsd.org, Bruce Evans , Steve Kargl , Peter Jeremy , Warner Losh X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 06:14:47 -0000

On Mon, 27 May 2013, David Schultz wrote:

> ...
> Below is a diff of all the changes needed to integrate it. I have
> a short list of style fixes, but otherwise I think what you have
> is good:
> - wrap lines to 80 chars, please
> - spaces between operators
> - "static inline", not "inline static"
> - don't use "inline" on large functions

Another reply.

I think I tested "inline" on the large functions (just 2) and found
it useful for efficiency.  This is like inline on large trig support
functions being useful.  The inline parts are duplicated once per
C99-API function, and often the caller only uses one C99-API function.
Actually, the large inlines are not duplicated that much.  cacosh()
and casinh() are just wrappers that call cacos() and casin(),
respectively.  There is no inlining for the last 2 (even larger)
functions.  The overhead for the wrappers is noticeable, but more
inlining didn't seem to reduce it much.

More investigation of the extent of the style bugs:
- only 1 line is longer than 80 columns now and easy to fix.  Other long
  lines are for declarations where I prefer to keep the long comments
  on the same line
- spaces between operations will expand a few lines beyond 80 columns if
  done blindly.  Only a few.
Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 08:12:33 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 15B90B3F; Tue, 28 May 2013 08:12:33 +0000 (UTC) (envelope-from das@freebsd.org) Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net [50.196.151.174]) by mx1.freebsd.org (Postfix) with ESMTP id DA5751AB; Tue, 28 May 2013 08:12:32 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4S8CECE013842; Tue, 28 May 2013 01:12:14 -0700 (PDT) (envelope-from das@freebsd.org) Received: (from das@localhost) by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4S8CCcU013841; Tue, 28 May 2013 01:12:12 -0700 (PDT) (envelope-from das@freebsd.org) Date: Tue, 28 May 2013 01:12:12 -0700 From: David Schultz To: Bruce Evans Subject: Re: Use of C99 extra long double math functions after r236148 Message-ID: <20130528081212.GA13594@zim.MIT.EDU> References: <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU> <20130528150808.F1298@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130528150808.F1298@besplex.bde.org> X-Mailman-Approved-At: Tue, 28 May 2013 11:40:43 +0000 Cc: Diane Bruce , John Baldwin , David Chisnall , Stephen Montgomery-Smith , freebsd-numerics@freebsd.org, Bruce Evans , Steve Kargl , Peter Jeremy , Warner Losh X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 08:12:33 -0000 On Tue, May 28, 2013, Bruce Evans wrote: > @ diff -u2 catrigl.c~ catrigl.c > @ --- catrigl.c~ 2012-09-22 21:14:24.000000000 +0000 > @ +++ catrigl.c 2013-05-26 08:46:10.423187000 +0000 > @ @@ -50,4 +50,6 @@ > @ #define signbit(x) (__builtin_signbitl(x)) > @ > @ +long double atanhl(long double); > @ + > @ static const long double > @ A_crossover = 10, > > catrigl.c depends on atanhl(), logl() and log1pl() existing. Yep, I'm ignoring the complex long double functions until the real long double functions are done. I'm hoping that won't be too long! > % Index: tools/regression/lib/msun/test-invctrig.c > % =================================================================== > % --- tools/regression/lib/msun/test-invctrig.c (revision 0) > % +++ tools/regression/lib/msun/test-invctrig.c (working copy) > % @@ -0,0 +1,467 @@ > % .... > % +#pragma STDC FENV_ACCESS ON > % +#pragma STDC CX_LIMITED_RANGE OFF > > Heheh, style rules for #pragma. I like the old rule which says that > it should be indented 6 feet under. It is still almost useless, since > we don't even have any C99 compilers than implement the fenv pragmas > yet. They are mostly just there to document the fact that this code is expecting FENV_ACCESS to work. Clang, adding insult to injury, generates a warning about these. I don't think they're going to implement the missing C99 features soon. Many bugs have been filed about the issue, but I haven't heard of any progress. When I asked years ago, I was basically told that the LLVM IR can't support the feature without substantial modifications. > % + * XXX gcc implements complex multiplication incorrectly. In > % + * particular, it implements it as if the CX_LIMITED_RANGE pragma > % + * were ON. Consequently, we need this function to form numbers > % + * such as x + INFINITY * I, since gcc evalutes INFINITY * I as > % + * NaN + INFINITY * I. 
> % + */
> % +static inline long double complex
> % +cpackl(long double x, long double y)
> % +{
> % +     long double complex z;
> % +
> % +     __real__ z = x;
> % +     __imag__ z = y;
> % +     return (z);
> % +}
>
> Why duplicate this? I guess it is because math_private,h is hard to
> include. I use complicated conditionals (mostly switches on
> $(uname -p) and $(hostname) in shell scripts to locate it when
> compiling from external directories.

I will change to CMPLXL, now that CMPLXL has been committed.
Thanks for reminding me.  The ability to use complex numbers in
initializers is nice (ignore whitespace munging due to cut/paste):

static const struct {
        complex long double z;
        complex long double acos_z;
        complex long double asin_z;
        complex long double atan_z;
} tests[] = {
        { CMPLXL(0.75L, 0.25L),
          CMPLXL(pi / 4, -0.34657359027997265470861606072908828L),
          CMPLXL(pi / 4, 0.34657359027997265470861606072908828L),
          CMPLXL(0.66290883183401623252961960521423782L,
                 0.15899719167999917436476103600701878L) },
};
int i;

for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
        testall_tol(cacos, tests[i].z, tests[i].acos_z, 2);
        testall_odd_tol(casin, tests[i].z, tests[i].asin_z, 2);
        testall_odd_tol(catan, tests[i].z, tests[i].atan_z, 2);
}

A few more tests would be good (e.g., large inputs, parts of the
range that are close to an axis or discontinuity), but I ran out
of time.

> The tests seem to be compiled with -O0. That tests a different
> environment than the usual runtime one, and in particular misses seeing
> most precision bugs. I mostly test with -O (-O2 with gcc is slower
> and even harder to debug, while with clang it makes little difference),
> but switch to -O0 to debug. -g -O is now almost unusable because -O
> optimizes away dead variables and -g is broken in many cases (sometimes
> it can't even show live variables).

I want the tests to come as close as possible to testing the
behavior that real programs will see.
Unfortunately, any test that exercises different rounding modes or looks at floating-point exceptions is pretty much doomed to fail with gcc and clang, so I gave up. (Sometimes I wonder if there's any point in having a free library that supports them if you need a commercial compiler to take advantage.) However, the tests do sometimes uncover compiler bugs that get fixed. They caught a few bugs in gcc builtins, and an arithmetic bug in clang's constant-folding code, all of which were fixed. From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 08:19:35 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 58AC3BD0; Tue, 28 May 2013 08:19:35 +0000 (UTC) (envelope-from das@freebsd.org) Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net [50.196.151.174]) by mx1.freebsd.org (Postfix) with ESMTP id 1F3DE1EC; Tue, 28 May 2013 08:19:34 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4S8JMeg013858; Tue, 28 May 2013 01:19:23 -0700 (PDT) (envelope-from das@freebsd.org) Received: (from das@localhost) by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4S8JLAo013857; Tue, 28 May 2013 01:19:21 -0700 (PDT) (envelope-from das@freebsd.org) Date: Tue, 28 May 2013 01:19:21 -0700 From: David Schultz To: Bruce Evans Subject: Re: Use of C99 extra long double math functions after r236148 Message-ID: <20130528081921.GB13594@zim.MIT.EDU> References: <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU> <20130528155933.V1298@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline In-Reply-To: <20130528155933.V1298@besplex.bde.org> X-Mailman-Approved-At: Tue, 28 May 2013 11:40:53 +0000 Cc: Diane Bruce , John Baldwin , David Chisnall , Stephen Montgomery-Smith , freebsd-numerics@freebsd.org, Bruce Evans , Steve Kargl , Peter Jeremy , Warner Losh X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 08:19:35 -0000 On Tue, May 28, 2013, Bruce Evans wrote: > On Mon, 27 May 2013, David Schultz wrote: > > > ... > > Below is a diff of all the changes needed to integrate it. I have > > a short list of style fixes, but otherwise I think what you have > > is good: > > - wrap lines to 80 chars, please > > - spaces between operators > > - "static inline", not "inline static" > > - don't use "inline" on large functions > > Another reply. > > I think I tested "inline" on the large functions (just 2) and found > it useful for efficiency. This is like inline on large trig support > functions being useful. The inline parts are duplicated once per > C99-API function, and often the caller only uses on C99-API function. > Actually, the large inlines are not duplicated that much. cacosh() > and casinh() are just wrappers that call cacos() and casin(), > respectively. There is no inlining for the last 2 (even larger) > functions. The overhead for the wrappers is noticeable, but more > inlining didn't seem to reduce it much. > > More investigation of the extent of the style bugs: > - only 1 line is longer than 80 columns now and easy to fix. Other long > lines are for declarations where I prefer to keep the long comments > on the same line > - spaces between operations will expand a few lines beyond 80 columns if > done blindly. Only a few. 
If you did benchmarks to show that using inline is worthwhile despite the cache pressure, then it's fine with me. I had assumed that it was added without much thought. Also, people have been asking for someone to commit this for a long time, so I'm not going to split hairs over the spacing. From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 10:48:11 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5845182D; Tue, 28 May 2013 10:48:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id B88B6D27; Tue, 28 May 2013 10:48:10 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4SAlj87005958 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 28 May 2013 20:47:56 +1000 Date: Tue, 28 May 2013 20:47:45 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: David Schultz Subject: Re: Use of C99 extra long double math functions after r236148 In-Reply-To: <20130528081212.GA13594@zim.MIT.EDU> Message-ID: <20130528195733.Q2294@besplex.bde.org> References: <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU> <20130528150808.F1298@besplex.bde.org> <20130528081212.GA13594@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=BPvrNysG c=1 sm=1 a=O6A2dy7pM2IA:10 a=kj9zAlcOel0A:10 
a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10 a=hG9Faytz-pyrK-G3USYA:9 a=CjuIK1q_8ugA:10 a=QPu_LqNFptFJU9lF:21 a=io-CCv-q2jcpwO-C:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 X-Mailman-Approved-At: Tue, 28 May 2013 11:41:08 +0000 Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , Stephen Montgomery-Smith , freebsd-numerics@FreeBSD.org, Steve Kargl , Peter Jeremy , Warner Losh X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 10:48:11 -0000 On Tue, 28 May 2013, David Schultz wrote: > On Tue, May 28, 2013, Bruce Evans wrote: >> @ diff -u2 catrigl.c~ catrigl.c >> @ --- catrigl.c~ 2012-09-22 21:14:24.000000000 +0000 >> @ +++ catrigl.c 2013-05-26 08:46:10.423187000 +0000 >> @ @@ -50,4 +50,6 @@ >> @ #define signbit(x) (__builtin_signbitl(x)) >> @ >> @ +long double atanhl(long double); >> @ + >> @ static const long double >> @ A_crossover = 10, >> >> catrigl.c depends on atanhl(), logl() and log1pl() existing. > > Yep, I'm ignoring the complex long double functions until the real > long double functions are done. I'm hoping that won't be too long! As usual, you can find my current versions in ~bde/msun/src/zztest/s_log*.c, ~bde/msun/src/zztest/ld128/s_logl.c, and ~bde/msun/src/zztest/cplex.c (clog*). Lots of macros in ~bde/msun/src/zztest/math_private.h are also needed. The header needs more cleaning than the C files, but you can easily extract the parts needed. >> % Index: tools/regression/lib/msun/test-invctrig.c >> % =================================================================== >> % --- tools/regression/lib/msun/test-invctrig.c (revision 0) >> % +++ tools/regression/lib/msun/test-invctrig.c (working copy) >> % @@ -0,0 +1,467 @@ >> % .... > >> % + * XXX gcc implements complex multiplication incorrectly. 
In
>> % + * particular, it implements it as if the CX_LIMITED_RANGE pragma
>> % + * were ON. Consequently, we need this function to form numbers
>> % + * such as x + INFINITY * I, since gcc evalutes INFINITY * I as
>> % + * NaN + INFINITY * I.
>> % + */
>> % +static inline long double complex
>> % +cpackl(long double x, long double y)
>> % +{
>> % +     long double complex z;
>> % +
>> % +     __real__ z = x;
>> % +     __imag__ z = y;
>> % +     return (z);
>> % +}
>>
>> Why duplicate this? I guess it is because math_private,h is hard to
>> include. I use complicated conditionals (mostly switches on
>> $(uname -p) and $(hostname) in shell scripts to locate it when
>> compiling from external directories.
>
> I will change to CMPLXL, now that CMPLXL has been committed.
> Thanks for reminding me.

That won't be very portable.  I already need ifdefs and extra code in
math_private.h to restore the old version that works with old versions
of gcc.

> The ability to use complex numbers in
> initializers is nice (ignore whitespace munging due to cut/paste):
>
> static const struct {
>         complex long double z;
>         complex long double acos_z;
>         complex long double asin_z;
>         complex long double atan_z;
> } tests[] = {
>         { CMPLXL(0.75L, 0.25L),
>           CMPLXL(pi / 4, -0.34657359027997265470861606072908828L),
>           CMPLXL(pi / 4, 0.34657359027997265470861606072908828L),
>           CMPLXL(0.66290883183401623252961960521423782L,
>                  0.15899719167999917436476103600701878L) },
> };

I think you mean "nasty" :-).  Simply x + I * y seems to work correctly
with the following compilers on amd64: gcc-2.95.4, gcc-3.3.3, gcc-3.4.6,
gcc-4.2.1, clang 3.3.  But you cannot use either x + I * y or CMPLXL()
with literals for long doubles, since on i386 most of the gcc's will
round the long doubles to 53 bits, so you must use LD80C() for most
long double constants, and LD80C() won't work inside either x + I * y
or CMPLXL().

I didn't test this with exactly the above.
Untested conversion of it:

        { 0.75L + I * 0.25L,
          pi / 4 + I * -0.34657359027997265470861606072908828L,
          pi / 4 + I * 0.34657359027997265470861606072908828L,
          0.66290883183401623252961960521423782L +
              I * 0.15899719167999917436476103600701878L, },

Is pi a variable, and/or does CMPLXL() work with variables in static
initializers?

Non-static initializers and CMPLXL() can be used on variables
constructed using LD80C().  Now gcc-3.3.3 generates horrible code for
a runtime evaluation and probably causes overflow bugs for exceptional
args (the ones that we invented cpack*() to avoid).  gcc-4.2.1 generates
good code.  The freebsd cluster seems to have crashed while I was
writing this, so I don't have access to the other compilers.

>> The tests seem to be compiled with -O0. That tests a different
>> environment than the usual runtime one, and in particular misses seeing
>> most precision bugs. I mostly test with -O (-O2 with gcc is slower
>> and even harder to debug, while with clang it makes little difference),
>> but switch to -O0 to debug. -g -O is now almost unusable because -O
>> optimizes away dead variables and -g is broken in many cases (sometimes
>> it can't even show live variables).
>
> I want the tests to come as close as possible to testing the
> behavior that real programs will see. Unfortunately, any test that
> exercises different rounding modes or looks at floating-point
> exceptions is pretty much doomed to fail with gcc and clang, so I
> gave up. (Sometimes I wonder if there's any point in having a free
> library that supports them if you need a commercial compiler to
> take advantage.) However, the tests do sometimes uncover compiler
> bugs that get fixed. They caught a few bugs in gcc builtins, and
> an arithmetic bug in clang's constant-folding code, all of which
> were fixed.

But doesn't using -O0 give the opposite of that?
The library is closer to working than tests and real programs since it is relatively careful and the compiler problems usually don't have much effect (since wrong rounding by the compiler tends to show up as errors of >= 1 ulp and gets fixed). Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 11:12:12 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 01C31D5B; Tue, 28 May 2013 11:12:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 7D5DAE92; Tue, 28 May 2013 11:12:11 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 5F0D1D405D6; Tue, 28 May 2013 21:12:00 +1000 (EST) Date: Tue, 28 May 2013 21:12:00 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: David Schultz Subject: Re: Use of C99 extra long double math functions after r236148 In-Reply-To: <20130528081921.GB13594@zim.MIT.EDU> Message-ID: <20130528205441.U2294@besplex.bde.org> References: <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU> <20130528155933.V1298@besplex.bde.org> <20130528081921.GB13594@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=BPvrNysG c=1 sm=1 a=O6A2dy7pM2IA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10 a=SKYg3Y9sK9o-Tfi3u28A:9 a=CjuIK1q_8ugA:10 
a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 X-Mailman-Approved-At: Tue, 28 May 2013 11:41:17 +0000 Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , Stephen Montgomery-Smith , freebsd-numerics@FreeBSD.org, Steve Kargl , Peter Jeremy , Warner Losh X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 11:12:12 -0000 On Tue, 28 May 2013, David Schultz wrote: > On Tue, May 28, 2013, Bruce Evans wrote: >> >> I think I tested "inline" on the large functions (just 2) and found >> it useful for efficiency. This is like inline on large trig support >> functions being useful. The inline parts are duplicated once per >> C99-API function, and often the caller only uses on C99-API function. >> Actually, the large inlines are not duplicated that much. cacosh() >> and casinh() are just wrappers that call cacos() and casin(), >> respectively. There is no inlining for the last 2 (even larger) >> functions. The overhead for the wrappers is noticeable, but more >> inlining didn't seem to reduce it much. > > If you did benchmarks to show that using inline is worthwhile > despite the cache pressure, then it's fine with me. I had assumed > that it was added without much thought. I retested. Inlining the big function do_hard_work() helps for gcc on amd64 (about 5% faster), but makes no significant difference for clang. The previous testing was mostly with gcc. 
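
For readers following the inlining numbers above, here is a minimal sketch
of the pattern being measured: one large static helper shared by thin
public wrappers.  Everything in it is hypothetical: shared_core(),
forced_core() and the wrapper names only stand in for do_hard_work() and
the C99 entry points, and the arithmetic is a placeholder.  It shows the
two spellings that gcc and clang accept: plain "static inline", which
leaves the decision to the compiler, and the always_inline attribute, a
GNU extension that insists on it.

```c
/*
 * Hypothetical stand-in for a large helper such as do_hard_work().
 * "static inline" is only a hint; the compiler may still emit a call.
 */
static inline double
shared_core(double x, int negate)
{
	double r = x * x + 1.0;

	return (negate ? -r : r);
}

/*
 * Same body, but with the GNU always_inline attribute (supported by
 * gcc and clang), which requests inlining regardless of heuristics.
 */
static inline __attribute__((__always_inline__)) double
forced_core(double x, int negate)
{
	double r = x * x + 1.0;

	return (negate ? -r : r);
}

/* Thin public wrappers, in the spirit of cacosh() calling cacos(). */
double
wrapper_plus(double x)
{
	return (shared_core(x, 0));
}

double
wrapper_minus(double x)
{
	return (forced_core(x, 1));
}
```

Both wrappers compute the same function up to sign, so timing them
against each other is one way to see whether a given compiler honours
the plain hint.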
Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 11:55:44 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C3D43919; Tue, 28 May 2013 11:55:44 +0000 (UTC) (envelope-from theraven@FreeBSD.org) Received: from theravensnest.org (theraven.freebsd.your.org [216.14.102.27]) by mx1.freebsd.org (Postfix) with ESMTP id 9438F238; Tue, 28 May 2013 11:55:44 +0000 (UTC) Received: from c120.sec.cl.cam.ac.uk (c120.sec.cl.cam.ac.uk [128.232.18.120]) (authenticated bits=0) by theravensnest.org (8.14.5/8.14.5) with ESMTP id r4SBtccK042149 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Tue, 28 May 2013 11:55:39 GMT (envelope-from theraven@FreeBSD.org) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\)) Subject: Re: Use of C99 extra long double math functions after r236148 From: David Chisnall In-Reply-To: <20130528205441.U2294@besplex.bde.org> Date: Tue, 28 May 2013 12:55:34 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References: <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU> <20130528155933.V1298@besplex.bde.org> <20130528081921.GB13594@zim.MIT.EDU> <20130528205441.U2294@besplex.bde.org> To: Bruce Evans X-Mailer: Apple Mail (2.1503) Cc: Diane Bruce , John Baldwin , Stephen Montgomery-Smith , freebsd-numerics@FreeBSD.org, Steve Kargl , David Schultz , Peter Jeremy , Warner Losh X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 11:55:44 -0000

On 28 May 2013, at 12:12, Bruce Evans wrote:

> Inlining the big function do_hard_work() helps for gcc on
> amd64 (about 5% faster), but makes no significant difference for clang.
> The previous testing was mostly with gcc.

How are you inlining?  With the C99 inline keyword, which changes the
linkage type but only provides an advisory hint to the compiler with
regard to inlining (which, in a modern compiler, is largely ignored), or
with the always_inline attribute, which forces the compiler to inline
the function?

David

From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 12:03:19 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F1802DBA for ; Tue, 28 May 2013 12:03:19 +0000 (UTC) (envelope-from s.montgomerysmith@gmail.com) Received: from mail-ie0-x22a.google.com (mail-ie0-x22a.google.com [IPv6:2607:f8b0:4001:c03::22a]) by mx1.freebsd.org (Postfix) with ESMTP id C1F0C2E4 for ; Tue, 28 May 2013 12:03:19 +0000 (UTC) Received: by mail-ie0-f170.google.com with SMTP id e14so2506268iej.1 for ; Tue, 28 May 2013 05:03:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; bh=1GPUFncYViNScC5uzVOcidJV1nmflw0qt/mpmKv7Z5A=; b=V6eD6lOiPlvkcvoNF32C4EVSufDHqQq/h/78G91MOhkZOlXJnHLVrZIS2VdftYiwmU 3bvqIxoQQ7idgdEcV0rS5B/6dheL2/fDZvTmWvQAshlvMdCigCqLWn8dqihPUDdszDmt Bol2qynaW8S4ff6dK/UjxsMoe/5HMiV/oZ2wHeO/2vTvsb0ta45zABoYYX5G82Sqz+Uo WWinhQDnQ1eEUIIrS4N+Y2451ww87dybHGkh/jXci6v1dzfc8cqIzK5Z0ykobi2cgkgz oQ0UHP1J9qfQZP5G6ffbk1O/JqBeVwc9RGpwf+thPM1UlAj4EInC5+zf39DZXOIQUU+0 g7Zg== X-Received: by 10.42.196.138 with SMTP id
eg10mr19096254icb.5.1369742599562; Tue, 28 May 2013 05:03:19 -0700 (PDT) Received: from [192.168.0.11] (50-82-246-58.client.mchsi.com. [50.82.246.58]) by mx.google.com with ESMTPSA id gz1sm5147957igb.5.2013.05.28.05.03.17 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 28 May 2013 05:03:18 -0700 (PDT) Sender: Stephen Montgomery-Smith Message-ID: <51A49D04.5050409@missouri.edu> Date: Tue, 28 May 2013 07:03:16 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130510 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org Subject: Re: Use of C99 extra long double math functions after r236148 References: <500DAD41.5030104@missouri.edu> <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU> <51A49A40.3040505@missouri.edu> In-Reply-To: <51A49A40.3040505@missouri.edu> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 12:03:20 -0000 On 05/28/2013 06:51 AM, Stephen Montgomery-Smith wrote: > On 05/27/2013 11:32 PM, David Schultz wrote: > >> Hi Stephen, >> >> I wrote some tests to cover the corner cases for the complex >> inverse trig functions. They don't find any nontrivial bugs in >> your implementations. :-) Now that you have a commit bit, would >> you like to commit your code, or shall I? > > I think I only have a commit bit for ports, not src. 
> > In any case, I would much prefer that you commit it. I have a lot on my > plate right now. > > Thank you for doing this. It would be great to see this in FreeBSD. > Also, if I can brag a little, I think the only other implementations of the complex arc-trig functions that are as accurate are the most recent Boost library implementations, and then only because I submitted bug fixes to them. I also found a bug in the Hull, Fairgrieve, and Tang algorithm for cacos/cacosh, which was faulty in certain extreme cases. This bug is documented here: https://svn.boost.org/trac/boost/ticket/7290 From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 11:51:34 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 175068B3; Tue, 28 May 2013 11:51:34 +0000 (UTC) (envelope-from s.montgomerysmith@gmail.com) Received: from mail-ie0-x22a.google.com (mail-ie0-x22a.google.com [IPv6:2607:f8b0:4001:c03::22a]) by mx1.freebsd.org (Postfix) with ESMTP id B26AB211; Tue, 28 May 2013 11:51:33 +0000 (UTC) Received: by mail-ie0-f170.google.com with SMTP id e14so2475756iej.1 for ; Tue, 28 May 2013 04:51:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; bh=H+/7e2OzqUy//muIYaUjn6p9QEFLHwI+uyPNvRXjU2g=; b=aZElcC2MFEmydK12ja/lnqL5/5UvzHTAIu06iAlJT8xhH7dvs28S8OxIuvica4m0wD tk4vb/7fzNf8IEKIAZQ82E9a6AkEnEDB6IpMrvGyc2yDm2Pgn5ojAFZXjPokMgn1nX81 nn0ic/3IbghHnqGcruZhAUw0jmbV8/KSnzkCouhmBtcoMSO9F49SuLGQotBbxntiTKER HTVpwJUjrLELfmF5QIhmmqwOeU3qx85GOBt6o++BHy4x7Qyes6fvOutx6szwDzCz19sT DTl74Ln/P4ZCxkAPm/sMe42eF6SuButH555s/x5yR7U8UUl29eKfFUwcTprkQrulMJEp 4OqQ== X-Received: by 10.42.196.138 with SMTP id eg10mr19076648icb.5.1369741892821; Tue, 28 May 2013 04:51:32 -0700 (PDT) Received: from
[192.168.0.11] (50-82-246-58.client.mchsi.com. [50.82.246.58]) by mx.google.com with ESMTPSA id 9sm17646992igy.7.2013.05.28.04.51.29 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 28 May 2013 04:51:31 -0700 (PDT) Sender: Stephen Montgomery-Smith Message-ID: <51A49A40.3040505@missouri.edu> Date: Tue, 28 May 2013 06:51:28 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130510 Thunderbird/17.0.6 MIME-Version: 1.0 To: David Schultz Subject: Re: Use of C99 extra long double math functions after r236148 References: <500DAD41.5030104@missouri.edu> <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU> In-Reply-To: <20130528043205.GA3282@zim.MIT.EDU> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Tue, 28 May 2013 12:07:52 +0000 Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , freebsd-numerics@freebsd.org, Bruce Evans , Steve Kargl , Peter Jeremy , Warner Losh X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 11:51:34 -0000 On 05/27/2013 11:32 PM, David Schultz wrote: > Hi Stephen, > > I wrote some tests to cover the corner cases for the complex > inverse trig functions. They don't find any nontrivial bugs in > your implementations. :-) Now that you have a commit bit, would > you like to commit your code, or shall I? I think I only have a commit bit for ports, not src. 
In any case, I would much prefer that you commit it. I have a lot on my plate right now. Thank you for doing this. It would be great to see this in FreeBSD. From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 13:03:13 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A5D60ADE; Tue, 28 May 2013 13:03:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 65C2E8A2; Tue, 28 May 2013 13:03:13 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7D79C104139F; Tue, 28 May 2013 22:44:57 +1000 (EST) Date: Tue, 28 May 2013 22:44:22 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: David Chisnall Subject: Re: Use of C99 extra long double math functions after r236148 In-Reply-To: Message-ID: <20130528222541.N2926@besplex.bde.org> References: <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20130528043205.GA3282@zim.MIT.EDU> <20130528155933.V1298@besplex.bde.org> <20130528081921.GB13594@zim.MIT.EDU> <20130528205441.U2294@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=O6A2dy7pM2IA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Zc3-fPm5GV4A:10 a=jpvodMJeLT64p2G4esEA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: Diane Bruce , John Baldwin , Stephen Montgomery-Smith , freebsd-numerics@freebsd.org, Steve 
Kargl , David Schultz , Peter Jeremy , Warner Losh X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 13:03:13 -0000 On Tue, 28 May 2013, David Chisnall wrote: > On 28 May 2013, at 12:12, Bruce Evans wrote: > >> Inlining the big function do_hard_work() helps for gcc on >> amd64 (about 5% faster), but makes no significant difference for clang. >> The previous testing was mostly with gcc. > > How are you inlining? With the C99 inline keyword, which changes the linkage type but only provides and advisory hint to the compiler with regard to inlining (which, in a modern compiler, is largely ignored), or with the always_inline attribute, which forces the compiler to inline the function? Only static inlining in catrig*.c. All compilers follow its hints there. libm sometimes uses static __always_inline instead of static inline elsewhere (but mostly not). 
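The two forms of inlining discussed above can be sketched as follows. This is a minimal illustration, not code from catrig*.c: the function names are hypothetical, and the attribute is spelled out in its GCC/Clang form on the assumption that FreeBSD's __always_inline macro expands to it.

```c
#include <assert.h>

/* Hint only: "static inline" leaves the inlining decision to the compiler. */
static inline double
square_hint(double x)
{
	return (x * x);
}

/*
 * GCC/Clang extension (assumed to be what __always_inline expands to):
 * the compiler must inline every direct call or report an error.
 */
static inline __attribute__((__always_inline__)) double
square_forced(double x)
{
	return (x * x);
}

/* Hypothetical caller that uses both variants. */
static double
sum_of_squares(double a, double b)
{
	return (square_hint(a) + square_forced(b));
}

/* Both variants compute the same value; only code generation may differ. */
static void
check_variants(double x)
{
	assert(square_hint(x) == square_forced(x));
}
```

Whether the hinted version is actually inlined can only be confirmed by inspecting the generated code (for example with the compiler's -S output), so timing differences like the 5% quoted above depend on compiler and flags.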
Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 17:22:43 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 27148BB4 for ; Tue, 28 May 2013 17:22:43 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id E6E2BA07 for ; Tue, 28 May 2013 17:22:42 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4SHMgDf051541 for ; Tue, 28 May 2013 10:22:42 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4SHMghZ051540 for freebsd-numerics@freebsd.org; Tue, 28 May 2013 10:22:42 -0700 (PDT) (envelope-from sgk) Date: Tue, 28 May 2013 10:22:42 -0700 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: Patches for s_expl.c Message-ID: <20130528172242.GA51485@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 17:22:43 -0000 Here are two patches for ld80/s_expl.c and ld128/s_expl.c. Instead of committing the one large patch that I have spent hours testing, I have split it into two. One patch fixes/updates expl(). The other patch is the implementation of expm1l(). My commit messages will be: Patch 1: ld80/s_expl.c: * Use the LOG2_INTERVALS macro instead of hardcoding 7. 
* Use LD80C to set the overflow and underflow thresholds, and then use #defines to access the .e component to reduce diffs with the ld128 version.
* Rename polynomial coefficients P# to A#, the naming used in Tang.
* Remove the use of intermediate results t23 and t45.
* Micro-optimization: remove access to u.xbits.man.
* Fix an off-by-one in the underflow case.
* Replace a factor of the long double constant 2.0L by the integer 2. Let the compiler do the conversion.

ld128/s_expl.c:
* Adjust Copyright years to reflect when bits of the code were actually written.
* Reduce diff between the ld80 and ld128 versions.

Patch 2:

ld80/s_expl.c:
* Compute expm1l(x) for Intel 80-bit format.

ld128/s_expl.c:
* Compute expm1l(x) for IEEE 754 128-bit format.

These are based on:

PTP Tang, "Table-driven implementation of the Expm1 function in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18, 211-222 (1992).

These commit logs may be too terse for some, but quite frankly, after 2 or 3 years of submitting and resubmitting diffs, I've forgotten why some changes have or have not been made.

expm1l() resides in s_expl.c because it shares the same table, polynomial coefficients, and some numerical constants with expl().
-- Steve Patch 1: Index: ld80/s_expl.c =================================================================== --- ld80/s_expl.c (revision 251062) +++ ld80/s_expl.c (working copy) @@ -50,6 +50,7 @@ #include "math_private.h" #define INTERVALS 128 +#define LOG2_INTERVALS 7 #define BIAS (LDBL_MAX_EXP - 1) static const long double @@ -60,9 +61,12 @@ static const union IEEEl2bits /* log(2**16384 - 0.5) rounded towards zero: */ -o_threshold = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L), +/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */ +o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L), +#define o_threshold (o_thresholdu.e) /* log(2**(-16381-64-1)) rounded towards zero: */ -u_threshold = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L); +u_thresholdu = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L); +#define u_threshold (u_thresholdu.e) static const double /* @@ -78,11 +82,11 @@ * |exp(x) - p(x)| < 2**-77.2 * (0.002708 is ln2/(2*INTERVALS) rounded up a little). 
*/ -P2 = 0.5, -P3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ -P4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ -P5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ -P6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ +A2 = 0.5, +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ /* * 2^(i/INTERVALS) for i in [0,INTERVALS] is represented by two values where @@ -232,7 +236,8 @@ expl(long double x) { union IEEEl2bits u, v; - long double fn, q, r, r1, r2, t, t23, t45, twopk, twopkp10000, z; + long double fn, q, r, r1, r2, t, twopk, twopkp10000; + long double z; int k, n, n2; uint16_t hx, ix; @@ -242,23 +247,21 @@ ix = hx & 0x7fff; if (ix >= BIAS + 13) { /* |x| >= 8192 or x is NaN */ if (ix == BIAS + LDBL_MAX_EXP) { - if (hx & 0x8000 && u.xbits.man == 1ULL << 63) - return (0.0L); /* x is -Inf */ - return (x + x); /* x is +Inf, NaN or unsupported */ + if (hx & 0x8000) /* x is -Inf, -NaN or unsupported */ + return (-1 / x); + return (x + x); /* x is +Inf, +NaN or unsupported */ } - if (x > o_threshold.e) + if (x > o_threshold) return (huge * huge); - if (x < u_threshold.e) + if (x < u_threshold) return (tiny * tiny); - } else if (ix < BIAS - 66) { /* |x| < 0x1p-66 */ - /* includes pseudo-denormals */ - if (huge + x > 1.0L) /* trigger inexact iff x != 0 */ - return (1.0L + x); + } else if (ix < BIAS - 65) { /* |x| < 0x1p-65 (includes pseudos) */ + return (1 + x); /* 1 with inexact iff x != 0 */ } ENTERI(); - /* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */ + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ /* Use a specialized rint() to get fn. Assume round-to-nearest. */ fn = x * INV_L + 0x1.8p63 - 0x1.8p63; r = x - fn * L1 - fn * L2; /* r = r1 + r2 done independently. 
*/ @@ -270,12 +273,12 @@ n = (int)fn; #endif n2 = (unsigned)n % INTERVALS; - k = (n - n2) / INTERVALS; + k = n >> LOG2_INTERVALS; r1 = x - fn * L1; - r2 = -fn * L2; + r2 = fn * -L2; /* Prepare scale factors. */ - v.xbits.man = 1ULL << 63; + v.e = 1; if (k >= LDBL_MIN_EXP) { v.xbits.expsign = BIAS + k; twopk = v.e; @@ -284,19 +287,16 @@ twopkp10000 = v.e; } - /* Evaluate expl(midpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */ - /* Here q = q(r), not q(r1), since r1 is lopped like L1. */ - t45 = r * P5 + P4; + /* Evaluate expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */ z = r * r; - t23 = r * P3 + P2; - q = r2 + z * t23 + z * z * t45 + z * z * z * P6; + q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6; t = (long double)s[n2].lo + s[n2].hi; t = s[n2].lo + t * (q + r1) + s[n2].hi; /* Scale by 2**k. */ if (k >= LDBL_MIN_EXP) { if (k == LDBL_MAX_EXP) - RETURNI(t * 2.0L * 0x1p16383L); + RETURNI(t * 2 * 0x1p16383L); RETURNI(t * twopk); } else { RETURNI(t * twopkp10000 * twom10000); Index: ld128/s_expl.c =================================================================== --- ld128/s_expl.c (revision 251062) +++ ld128/s_expl.c (working copy) @@ -1,5 +1,5 @@ /*- - * Copyright (c) 2012 Steven G. Kargl + * Copyright (c) 2009-2012 Steven G. Kargl * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -22,6 +22,8 @@ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * Optimized by Bruce D. Evans. 
*/ #include @@ -38,34 +40,56 @@ #include "math_private.h" #define INTERVALS 128 +#define LOG2_INTERVALS 7 #define BIAS (LDBL_MAX_EXP - 1) +static const long double +huge = 0x1p10000L, +twom10000 = 0x1p-10000L; +/* XXX Prevent gcc from erroneously constant folding this: */ static volatile const long double tiny = 0x1p-10000L; static const long double -INV_L = 1.84664965233787316142070359168242182e+02L, -L1 = 5.41521234812457272982212595914567508e-03L, -L2 = -1.02536706388947310094527932552595546e-29L, -huge = 0x1p10000L, +/* log(2**16384 - 0.5) rounded towards zero: */ +/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */ o_threshold = 11356.523406294143949491931077970763428L, -twom10000 = 0x1p-10000L, +/* log(2**(-16381-64-1)) rounded towards zero: */ u_threshold = -11433.462743336297878837243843452621503L; +/* + * ln2/INTERVALS = L1+L2 (hi+lo decomposition for multiplication). L1 must + * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest + * bits zero so that multiplication of it by n is exact. 
+ */ +static const double +INV_L = 1.8466496523378731e+2, /* 0x171547652b82fe.0p-45 */ +L2 = -1.0253670638894731e-29; /* -0x1.9ff0342542fc3p-97 */ static const long double -P2 = 5.00000000000000000000000000000000000e-1L, -P3 = 1.66666666666666666666666666666666972e-1L, -P4 = 4.16666666666666666666666666653708268e-2L, -P5 = 8.33333333333333333333333315069867254e-3L, -P6 = 1.38888888888888888888996596213795377e-3L, -P7 = 1.98412698412698412718821436278644414e-4L, -P8 = 2.48015873015869681884882576649543128e-5L, -P9 = 2.75573192240103867817876199544468806e-6L, -P10 = 2.75573236172670046201884000197885520e-7L, -P11 = 2.50517544183909126492878226167697856e-8L; +/* 0x1.62e42fefa39ef35793c768000000p-8 */ +L1 = 5.41521234812457272982212595914567508e-03L; +static const long double +/* + * Domain [-0.002708, 0.002708], range ~[-2.4011e-38, 2.4244e-38]: + * |exp(x) - p(x)| < 2**-124.9 + * (0.002708 is ln2/(2*INTERVALS) rounded up a little). + */ +A2 = 0.5, +A3 = 1.66666666666666666666666666651085500e-01L, +A4 = 4.16666666666666666666666666425885320e-02L, +A5 = 8.33333333333333333334522877160175842e-03L, +A6 = 1.38888888888888888889971139751596836e-03L; + +static const double +A7 = 1.9841269841269471e-04, +A8 = 2.4801587301585284e-05, +A9 = 2.7557324277411234e-06, +A10 = 2.7557333722375072e-07; + static const struct { long double hi; long double lo; +/* XXX should rename 's'. */ } s[INTERVALS] = { 0x1p0L, 0x0p0L, 0x1.0163da9fb33356d84a66aep0L, 0x3.36dcdfa4003ec04c360be2404078p-92L, @@ -201,9 +225,10 @@ expl(long double x) { union IEEEl2bits u, v; - long double fn, r, r1, r2, q, t, twopk, twopkp10000; + long double q, r, r1, t, twopk, twopkp10000; + double dr, fn, r2; int k, n, n2; - uint32_t hx, ix; + uint16_t hx, ix; /* Filter out exceptional cases. 
*/ u.e = x; @@ -211,31 +236,38 @@ ix = hx & 0x7fff; if (ix >= BIAS + 13) { /* |x| >= 8192 or x is NaN */ if (ix == BIAS + LDBL_MAX_EXP) { - if (hx & 0x8000 && u.xbits.manh == 0 && - u.xbits.manl == 0) - return (0.0L); /* x is -Inf */ - return (x + x); /* x is +Inf or NaN */ + if (hx & 0x8000) /* x is -Inf or -NaN */ + return (-1 / x); + return (x + x); /* x is +Inf or +NaN */ } if (x > o_threshold) return (huge * huge); if (x < u_threshold) return (tiny * tiny); - } else if (ix < BIAS - 115) { /* |x| < 0x1p-115 */ - if (huge + x > 1.0L) /* trigger inexact iff x != 0 */ - return (1.0L + x); + } else if (ix < BIAS - 114) { /* |x| < 0x1p-114 */ + return (1 + x); /* 1 with inexact iff x != 0 */ } - /* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */ - fn = x * INV_L + 0x1.8p112 - 0x1.8p112; + ENTERI(); + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + /* XXX assume no extra precision for the additions, as for trig fns. */ + fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52; +#if defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else n = (int)fn; +#endif n2 = (unsigned)n % INTERVALS; k = (n - n2) / INTERVALS; r1 = x - fn * L1; - r2 = -fn * L2; + r2 = fn * -L2; + r = r1 + r2; /* Prepare scale factors. */ - v.xbits.manh = 0; - v.xbits.manl = 0; + /* XXX sparc64 multiplication is so slow that scalbnl() is faster. */ + v.e = 1; if (k >= LDBL_MIN_EXP) { v.xbits.expsign = BIAS + k; twopk = v.e; @@ -244,18 +276,19 @@ twopkp10000 = v.e; } - r = r1 + r2; - q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 + - r * (P8 + r * (P9 + r * (P10 + r * P11))))))))); + /* Evaluate expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */ + dr = r; + q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 + + dr * (A7 + dr * (A8 + dr * (A9 + dr * A10)))))))); t = s[n2].lo + s[n2].hi; - t = s[n2].hi + (s[n2].lo + t * (r2 + q + r1)); + t = s[n2].lo + t * (q + r1) + s[n2].hi; /* Scale by 2**k. 
*/ if (k >= LDBL_MIN_EXP) { if (k == LDBL_MAX_EXP) - return (t * 2.0L * 0x1p16383L); - return (t * twopk); + RETURNI(t * 2 * 0x1p16383L); + RETURNI(t * twopk); } else { - return (t * twopkp10000 * twom10000); + RETURNI(t * twopkp10000 * twom10000); } } Patch 2: --- ld80/s_expl.c 2013-05-28 09:36:27.000000000 -0700 +++ ld80/s_expl.c.all 2013-05-28 09:34:41.000000000 -0700 @@ -302,3 +302,166 @@ RETURNI(t * twopkp10000 * twom10000); } } + +/** + * Compute expm1l(x) for Intel 80-bit format. This is based on: + * + * PTP Tang, "Table-driven implementation of the Expm1 function + * in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18, + * 211-222 (1992). + */ + +/* + * Our T1 and T2 are chosen to be approximately the points where method + * A and method B have the same accuracy. Tang's T1 and T2 are the + * points where method A's accuracy changes by a full bit. For Tang, + * this drop in accuracy makes method A immediately less accurate than + * method B, but our larger INTERVALS makes method A 2 bits more + * accurate so it remains the most accurate method significantly + * closer to the origin despite losing the full bit in our extended + * range for it. 
+ */ +static const double +T1 = -0.1659, /* ~-30.625/128 * log(2) */ +T2 = 0.1659; /* ~30.625/128 * log(2) */ + +/* + * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]: + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2 + */ +static const union IEEEl2bits +B3 = LD80C(0xaaaaaaaaaaaaaaab, -3, 1.66666666666666666671e-01L), +B4 = LD80C(0xaaaaaaaaaaaaaaac, -5, 4.16666666666666666712e-02L); + +static const double +B5 = 8.3333333333333245e-03, /* 0x1.111111111110cp-7 */ +B6 = 1.3888888888888861e-03, /* 0x1.6c16c16c16c0ap-10 */ +B7 = 1.9841269841532042e-04, /* 0x1.a01a01a0319f9p-13 */ +B8 = 2.4801587302069236e-05, /* 0x1.a01a01a03cbbcp-16 */ +B9 = 2.7557316558468562e-06, /* 0x1.71de37fd33d67p-19 */ +B10 = 2.7557315829785151e-07, /* 0x1.27e4f91418144p-22 */ +B11 = 2.5063168199779829e-08, /* 0x1.ae94fabdc6b27p-26 */ +B12 = 2.0887164654459567e-09; /* 0x1.1f122d6413fe1p-29 */ + +long double +expm1l(long double x) +{ + union IEEEl2bits u, v; + long double fn, hx2_hi, hx2_lo, q, r, r1, r2, t, twomk, twopk, x_hi; + long double x_lo, x2, z; + long double x4; + int k, n, n2; + uint16_t hx, ix; + + /* Filter out exceptional cases. */ + u.e = x; + hx = u.xbits.expsign; + ix = hx & 0x7fff; + if (ix >= BIAS + 6) { /* |x| >= 64 or x is NaN */ + if (ix == BIAS + LDBL_MAX_EXP) { + if (hx & 0x8000) /* x is -Inf, -NaN or unsupported */ + return (-1 / x - 1); + return (x + x); /* x is +Inf, +NaN or unsupported */ + } + if (x > o_threshold) + return (huge * huge); + /* + * expm1l() never underflows, but it must avoid + * unrepresentable large negative exponents. We used a + * much smaller threshold for large |x| above than in + * expl() so as to handle not so large negative exponents + * in the same way as large ones here. + */ + if (hx & 0x8000) /* x <= -64 */ + return (tiny - 1); /* good for x < -65ln2 - eps */ + } + + ENTERI(); + + if (T1 < x && x < T2) { + if (ix < BIAS - 64) { /* |x| < 0x1p-64 (includes pseudos) */ + /* x (rounded) with inexact if x != 0: */ + RETURNI(x == 0 ? 
x : + (0x1p100 * x + fabsl(x)) * 0x1p-100); + } + + x2 = x * x; + x4 = x2 * x2; + q = x4 * (x2 * (x4 * + (x2 * B12 + (x * B11 + B10)) + + (x2 * (x * B9 + B8) + (x * B7 + B6))) + + (x * B5 + B4.e)) + x2 * x * B3.e; + + x_hi = (float)x; + x_lo = x - x_hi; + hx2_hi = x_hi * x_hi / 2; + hx2_lo = x_lo * (x + x_hi) / 2; + if (ix >= BIAS - 7) + RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi)); + else + RETURNI(hx2_lo + q + hx2_hi + x); + } + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + fn = x * INV_L + 0x1.8p63 - 0x1.8p63; +#if defined(HAVE_EFFICIENT_IRINTL) + n = irintl(fn); +#elif defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif + n2 = (unsigned)n % INTERVALS; + k = n >> LOG2_INTERVALS; + r1 = x - fn * L1; + r2 = fn * -L2; + r = r1 + r2; + + /* Prepare scale factor. */ + v.e = 1; + v.xbits.expsign = BIAS + k; + twopk = v.e; + + /* + * Evaluate lower terms of + * expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). 
+ */ + z = r * r; + q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6; + + t = (long double)s[n2].lo + s[n2].hi; + + if (k == 0) { + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + + (s[n2].hi - 1); + RETURNI(t); + } + + if (k == -1) { + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + + (s[n2].hi - 2); + RETURNI(t / 2); + } + + if (k < -7) { + t = s[n2].lo + t * (q + r1) + s[n2].hi; + RETURNI(t * twopk - 1); + } + + if (k > 2 * LDBL_MANT_DIG - 1) { + t = s[n2].lo + t * (q + r1) + s[n2].hi; + if (k == LDBL_MAX_EXP) + RETURNI(t * 2 * 0x1p16383L - 1); + RETURNI(t * twopk - 1); + } + + v.xbits.expsign = BIAS - k; + twomk = v.e; + + if (k > LDBL_MANT_DIG - 1) + t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi; + else + t = s[n2].lo + t * (q + r1) + (s[n2].hi - twomk); + RETURNI(t * twopk); +} --- ld128/s_expl.c 2013-05-28 09:36:11.000000000 -0700 +++ ld128/s_expl.c.all 2013-05-28 09:34:52.000000000 -0700 @@ -292,3 +292,214 @@ RETURNI(t * twopkp10000 * twom10000); } } + +/* + * Our T1 and T2 are chosen to be approximately the points where method + * A and method B have the same accuracy. Tang's T1 and T2 are the + * points where method A's accuracy changes by a full bit. For Tang, + * this drop in accuracy makes method A immediately less accurate than + * method B, but our larger INTERVALS makes method A 2 bits more + * accurate so it remains the most accurate method significantly + * closer to the origin despite losing the full bit in our extended + * range for it. + */ +static const double +T1 = -0.1659, /* ~-30.625/128 * log(2) */ +T2 = 0.1659; /* ~30.625/128 * log(2) */ + +/* + * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2]. + * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear + * in both subintervals, so set T3 = 2**-5, which places the condition + * into the [T1:T3] interval. + */ +static const double +T3 = 0.03125; + +/* + * XXX Estimated range is for absolute error. 
+ * Domain [-0.1659, 0.03125], range ~[-1.8933e-38, 1.8943e-38]: + * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-125.3 + */ +static const long double +C3 = 1.66666666666666666666666666666666667e-01L, +C4 = 4.16666666666666666666666666666666645e-02L, +C5 = 8.33333333333333333333333333333371638e-03L, +C6 = 1.38888888888888888888888888891188658e-03L, +C7 = 1.98412698412698412698412697235950394e-04L, +C8 = 2.48015873015873015873015112487849040e-05L, +C9 = 2.75573192239858906525606685484412005e-06L, +C10 = 2.75573192239858906612966093057020362e-07L, +C11 = 2.50521083854417203619031960151253944e-08L, +C12 = 2.08767569878679576457272282566520649e-09L, +C13 = 1.60590438367252471783548748824255707e-10L; + +static const double +C14 = 1.1470745580491932e-11, /* 0x1.93974a81dae3p-37 */ +C15 = 7.6471620181090468e-13, /* 0x1.ae7f3820adab1p-41 */ +C16 = 4.7793721460260450e-14, /* 0x1.ae7cd18a18eacp-45 */ +C17 = 2.8074757356658877e-15, /* 0x1.949992a1937d9p-49 */ +C18 = 1.4760610323699476e-16; /* 0x1.545b43aabfbcdp-53 */ + + +/* + * XXX Estimated range is for absolute error. 
+ * Domain [0.03125, 0.1659], range ~[-2.7597e-38, 2.7602e-38]: + * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-124.8 + */ +static const long double +D3 = 1.66666666666666666666666666666682245e-01L, +D4 = 4.16666666666666666666666666634228324e-02L, +D5 = 8.33333333333333333333333364022244481e-03L, +D6 = 1.38888888888888888888887138722762072e-03L, +D7 = 1.98412698412698412699085805424661471e-04L, +D8 = 2.48015873015873015687993712101479612e-05L, +D9 = 2.75573192239858944101036288338208042e-06L, +D10 = 2.75573192239853161148064676533754048e-07L, +D11 = 2.50521083855084570046480450935267433e-08L, +D12 = 2.08767569819738524488686318024854942e-09L, +D13 = 1.60590442297008495301927448122499313e-10L; + +static const double +D14 = 1.1470726176204336e-11, /* 0x1.93971dc395d9ep-37 */ +D15 = 7.6478532249581686e-13, /* 0x1.ae892e3D16fcep-41 */ +D16 = 4.7628892832607741e-14, /* 0x1.ad00Dfe41feccp-45 */ +D17 = 3.0524857220358650e-15; /* 0x1.D7e8d886Df921p-49 */ + +long double +expm1l(long double x) +{ + union IEEEl2bits u, v; + long double hx2_hi, hx2_lo, q, r, r1, t, twomk, twopk, x_hi; + long double x_lo, x2; + double dr, dx, fn, r2; + int k, n, n2; + uint16_t hx, ix; + + /* Filter out exceptional cases. */ + u.e = x; + hx = u.xbits.expsign; + ix = hx & 0x7fff; + if (ix >= BIAS + 7) { /* |x| >= 128 or x is NaN */ + if (ix == BIAS + LDBL_MAX_EXP) { + if (hx & 0x8000) /* x is -Inf or -NaN */ + return (-1 / x - 1); + return (x + x); /* x is +Inf or +NaN */ + } + if (x > o_threshold) + return (huge * huge); + /* + * expm1l() never underflows, but it must avoid + * unrepresentable large negative exponents. We used a + * much smaller threshold for large |x| above than in + * expl() so as to handle not so large negative exponents + * in the same way as large ones here. 
+ */ + if (hx & 0x8000) /* x <= -128 */ + return (tiny - 1); /* good for x < -114ln2 - eps */ + } + + ENTERI(); + + if (T1 < x && x < T2) { + x2 = x * x; + dx = x; + + if (x < T3) { + if (ix < BIAS - 113) { /* |x| < 0x1p-113 */ + /* x (rounded) with inexact if x != 0: */ + RETURNI(x == 0 ? x : + (0x1p200 * x + fabsl(x)) * 0x1p-200); + } + q = x * x2 * C3 + x2 * x2 * (C4 + x * (C5 + x * (C6 + + x * (C7 + x * (C8 + x * (C9 + x * (C10 + + x * (C11 + x * (C12 + x * (C13 + + dx * (C14 + dx * (C15 + dx * (C16 + + dx * (C17 + dx * C18)))))))))))))); + } else { + q = x * x2 * D3 + x2 * x2 * (D4 + x * (D5 + x * (D6 + + x * (D7 + x * (D8 + x * (D9 + x * (D10 + + x * (D11 + x * (D12 + x * (D13 + + dx * (D14 + dx * (D15 + dx * (D16 + + dx * D17))))))))))))); + } + + x_hi = (float)x; + x_lo = x - x_hi; + hx2_hi = x_hi * x_hi / 2; + hx2_lo = x_lo * (x + x_hi) / 2; + if (ix >= BIAS - 7) + RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi)); + else + RETURNI(hx2_lo + q + hx2_hi + x); + } + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + /* XXX assume no extra precision for the additions, as for trig fns. */ + fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52; +#if defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif + n2 = (unsigned)n % INTERVALS; + k = n >> LOG2_INTERVALS; + r1 = x - fn * L1; + r2 = fn * -L2; + r = r1 + r2; + + /* Prepare scale factor. */ + /* XXX sparc64 multiplication is so slow that scalbnl() is faster. */ + v.e = 1; + v.xbits.expsign = BIAS + k; + twopk = v.e; + + /* + * Evaluate lower terms of + * expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). 
+ */ + dr = r; + q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 + + dr * (A7 + dr * (A8 + dr * (A9 + dr * A10)))))))); + + t = s[n2].lo + s[n2].hi; + + if (k == 0) { + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + + (s[n2].hi - 1); + RETURNI(t); + } + + if (k == -1) { + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + + (s[n2].hi - 2); + RETURNI(t / 2); + } + + + if (k < -7) { + t = s[n2].lo + t * (q + r1) + s[n2].hi; + RETURNI(t * twopk - 1); + } + + if (k > 2 * LDBL_MANT_DIG - 1) { + t = s[n2].lo + t * (q + r1) + s[n2].hi; + if (k == LDBL_MAX_EXP) + RETURNI(t * 2 * 0x1p16383L - 1); + RETURNI(t * twopk - 1); + } + + v.xbits.expsign = BIAS - k; + twomk = v.e; + + if (k > LDBL_MANT_DIG - 1) + t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi; + else if (k < 1) + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + + (s[n2].hi - twomk); + else + t = s[n2].lo * (q + r1 + 1) + s[n2].hi * (q + r1) + + (s[n2].hi - twomk); + RETURNI(t * twopk); +} From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 17:37:10 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0208ED9D for ; Tue, 28 May 2013 17:37:10 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id DF6F0A95 for ; Tue, 28 May 2013 17:37:09 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4SHb9mm051666 for ; Tue, 28 May 2013 10:37:09 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4SHb9LG051665 for freebsd-numerics@freebsd.org; Tue, 28 May 2013 10:37:09 -0700 (PDT) (envelope-from sgk) Date: Tue, 28 May 2013 10:37:09 -0700 From: Steve Kargl To: 
freebsd-numerics@freebsd.org
Subject: Re: Patches for s_expl.c
Message-ID: <20130528173709.GA51603@troutmask.apl.washington.edu>
References: <20130528172242.GA51485@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130528172242.GA51485@troutmask.apl.washington.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 28 May 2013 17:37:10 -0000

On Tue, May 28, 2013 at 10:22:42AM -0700, Steve Kargl wrote:
> Here are two patches for ld80/s_expl.c and ld128/s_expl.c.
> Instead of committing the one large patch that I have spent
> hours testing, I have split it into two. One patch fixes/updates
> expl(). The other patch is the implementation of expm1l().

I forgot to send the 3rd patch, which updates the documentation and
math.h, and deals with 53-bit long double targets. Yes, there is some
cruft in the diff, which I'll disentangle when I do the commit.

--
steve

Index: Symbol.map
===================================================================
--- Symbol.map (revision 251062)
+++ Symbol.map (working copy)
@@ -250,4 +250,7 @@
 ctanh;
 ctanhf;
 expl;
+ expm1l;
+ logl;
+ sincos;
 };
Index: man/exp.3
===================================================================
--- man/exp.3 (revision 251062)
+++ man/exp.3 (working copy)
@@ -41,6 +41,7 @@
 .Nm exp2l ,
 .Nm expm1 ,
 .Nm expm1f ,
+.Nm expm1l ,
 .Nm pow ,
 .Nm powf
 .Nd exponential and power functions
@@ -64,6 +65,8 @@
 .Fn expm1 "double x"
 .Ft float
 .Fn expm1f "float x"
+.Ft long double
+.Fn expm1l "long double x"
 .Ft double
 .Fn pow "double x" "double y"
 .Ft float
@@ -88,9 +91,10 @@
 .Fa x .
.Pp The -.Fn expm1 -and the -.Fn expm1f +.Fn expm1 , +.Fn expm1f , +and +.Fn expm1l functions compute the value exp(x)\-1 accurately even for tiny argument .Fa x . .Pp Index: src/math.h =================================================================== --- src/math.h (revision 251062) +++ src/math.h (working copy) @@ -405,6 +405,7 @@ long double cosl(long double); long double exp2l(long double); long double expl(long double); +long double expm1l(long double); long double fabsl(long double) __pure2; long double fdiml(long double, long double); long double floorl(long double); @@ -419,6 +420,7 @@ long long llrintl(long double); long long llroundl(long double); long double logbl(long double); +long double logl(long double); long lrintl(long double); long lroundl(long double); long double modfl(long double, long double *); /* fundamentally !__pure2 */ @@ -440,6 +442,11 @@ long double truncl(long double); #endif /* __ISO_C_VISIBLE >= 1999 */ + +#if __BSD_VISIBLE +void sincos(double, double *, double *); +#endif /* __BSD_VISIBLE */ + __END_DECLS #endif /* !_MATH_H_ */ @@ -462,12 +469,10 @@ long double coshl(long double); long double erfcl(long double); long double erfl(long double); -long double expm1l(long double); long double lgammal(long double); long double log10l(long double); long double log1pl(long double); long double log2l(long double); -long double logl(long double); long double powl(long double, long double); long double sinhl(long double); long double tanhl(long double); Index: src/s_expm1.c =================================================================== --- src/s_expm1.c (revision 251062) +++ src/s_expm1.c (working copy) @@ -216,3 +216,7 @@ } return y; } + +#if (LDBL_MANT_DIG == 53) +__weak_reference(expm1, expm1l); +#endif From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 21:58:38 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 
11750D17 for ; Tue, 28 May 2013 21:58:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 8FE9FD30 for ; Tue, 28 May 2013 21:58:37 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 8A74B122EDD; Wed, 29 May 2013 07:39:12 +1000 (EST) Date: Wed, 29 May 2013 07:39:04 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: Patches for s_expl.c In-Reply-To: <20130528172242.GA51485@troutmask.apl.washington.edu> Message-ID: <20130529062437.V4648@besplex.bde.org> References: <20130528172242.GA51485@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10 a=enA2T3gqEfefmBwEoGAA:9 a=CjuIK1q_8ugA:10 a=tJtbpcaLiRwA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 21:58:38 -0000 On Tue, 28 May 2013, Steve Kargl wrote: > Here are two patches for ld80/s_expl.c and ld128/s_expl.c. > Instead of committing the one large patch that I have spent > hours testing, I have split it into two. One patch fixes/updates > expl(). The other patch is the implementation of expm1l(). > > My commit messages will be: > > Patch 1: > > ld80/s_expl.c: > > * Use the LOG2_INTERVALS macro instead of hardcoding 7. The use of LOG2_INTERVALS isn't merged into the ld128 version. Patch 2 merges its use for expm1l() only. 
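For readers following the LOG2_INTERVALS point: the ld80 hunk quoted further down in this message replaces "k = (n - n2) / INTERVALS" with "k = n >> LOG2_INTERVALS". The following is only a minimal sketch of why the two forms agree, not code from either patch. It assumes INTERVALS is 128 and LOG2_INTERVALS is 7 (the values in the ld128 hunk) and that >> on a negative int propagates the sign bit, which is implementation-defined in C but is what gcc and clang do. The helper names k_by_division, k_by_shift and check_k_forms are hypothetical.

```c
#include <assert.h>

#define INTERVALS 128
#define LOG2_INTERVALS 7

/* Older form: remove the table index n2 from n, then divide exactly. */
static int
k_by_division(int n)
{
	/* n2 is in [0, INTERVALS) even for negative n. */
	int n2 = (unsigned)n % INTERVALS;

	return ((n - n2) / INTERVALS);
}

/*
 * Newer form: one shift.  This depends on the sign bit being
 * propagated, i.e. on >> being an arithmetic shift for negative n.
 */
static int
k_by_shift(int n)
{
	return (n >> LOG2_INTERVALS);
}

/* Assert that both forms give the same k for every n in [lo, hi]. */
static void
check_k_forms(int lo, int hi)
{
	int n;

	for (n = lo; n <= hi; n++)
		assert(k_by_division(n) == k_by_shift(n));
}
```

Calling check_k_forms(-100000, 100000) exercises both signs of n. On a compiler with a logical (zero-filling) shift the assertion would fail for negative n, which is why the dependence on sign propagation deserves a comment in the real code.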
> * Use LD80C to set overflow and underflow thresholds, and then use
> #defines to access the .e component to reduce diffs with ld128 version.
> * Rename polynomial coefficients P# to A#, which is used in Tang.

Almost all the declarations of polynomial coefficients are still
formatted in a nonstandard way, but differently than in previous
development versions. I keep sending you patches for this.

> * Remove the use of intermediate results t23 and t45.
> * Micro-optimization: remove access to u.xbits.man.

On the same line(s) that LOG2_INTERVALS is used, there is a more
important micro-optimization than this one.

> * Fix an off-by-one in the underflow case.
> * Replace a factor the long double constant 2.0L by the integer 2. Let
> the compiler to the conversion.
>
> ld128/s_expl.c:
>
> * Adjust Copyright years to reflect when bits of the code were actually
> written.
> * Reduce diff between the ld80 and ld128 versions.
>
> Patch 2:
>
> ld80/s_expl.c:
>
> * Compute expm1l(x) for Intel 80-bit format.
>
> ld128/s_expl.c:
>
> * Compute expm1l(x) for IEEE 754 128-bit format.

There is a fairly large bug in this, from only merging half of the most
recent micro-optimization in the development version of the ld80
version. This might only be an efficiency bug, but I haven't tested the
ld128 version with either the full merge or the half merge.

The ld128 version still has excessive optimizations for |x| near 0. It
uses a slightly different high-degree polynomial on each side of 0. The
ld80 version uses the same poly on each side. Most of the style bugs in
the 4 exp[!2]l functions are in the coeffs for the polys on each side.
I haven't tried so hard to get you to fix them since I want to remove
them.

>
> These are based on:
>
> PTP Tang, "Table-driven implementation of the Expm1 function
> in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
> 211-222 (1992).
>
> These commit logs may be too terse for some, but quite frankly after
> 2 or 3 years of submitting and resubmitting diffs, I've forgotten
> why some changes have or have not been made.
>
> expm1l() resides in s_expl.c because she shares the same table,
> polynomial coefficients, and some numerical constants with expl().

There are some minor style regressions relative to previous development
versions outside of poly coeffs. Patches later.

> Index: ld80/s_expl.c
> ===================================================================
> --- ld80/s_expl.c (revision 251062)
> +++ ld80/s_expl.c (working copy)
> ...
> @@ -78,11 +82,11 @@
> * |exp(x) - p(x)| < 2**-77.2
> * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
> */
> -P2 = 0.5,
> -P3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */
> -P4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */
> -P5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */
> -P6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */
> +A2 = 0.5,
> +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */
> +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */
> +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */
> +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */

Example of a formatting regression. The extra space that was before the
values is for a possible minus sign. This space is still there for the
hex values. The extra space before the equals sign is used for fancy
formatting to line up the values when the variable names reach A10.
Since the variable names only reach A6, this is not needed.

> ...
> @@ -242,23 +247,21 @@
> ix = hx & 0x7fff;
> if (ix >= BIAS + 13) { /* |x| >= 8192 or x is NaN */
> if (ix == BIAS + LDBL_MAX_EXP) {
> - if (hx & 0x8000 && u.xbits.man == 1ULL << 63)
> - return (0.0L); /* x is -Inf */
> - return (x + x); /* x is +Inf, NaN or unsupported */
> + if (hx & 0x8000) /* x is -Inf, -NaN or unsupported */
> + return (-1 / x);

Micro-optimization here.

> ...
> @@ -270,12 +273,12 @@
> n = (int)fn;
> #endif
> n2 = (unsigned)n % INTERVALS;
> - k = (n - n2) / INTERVALS;
> + k = n >> LOG2_INTERVALS;
> r1 = x - fn * L1;
> - r2 = -fn * L2;
> + r2 = fn * -L2;

2 micro-optimizations.

> Index: ld128/s_expl.c
> ===================================================================
> --- ld128/s_expl.c (revision 251062)
> +++ ld128/s_expl.c (working copy)
> ...
> @@ -38,34 +40,56 @@
> #include "math_private.h"
>
> #define INTERVALS 128
> +#define LOG2_INTERVALS 7

Not used.

> ...
> n2 = (unsigned)n % INTERVALS;
> k = (n - n2) / INTERVALS;
> r1 = x - fn * L1;
> - r2 = -fn * L2;
> + r2 = fn * -L2;
> + r = r1 + r2;

1 micro-optimization (that uses LOG2_INTERVALS) not merged here.

> @@ -244,18 +276,19 @@
> twopkp10000 = v.e;
> }
>
> - r = r1 + r2;
> - q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 +
> - r * (P8 + r * (P9 + r * (P10 + r * P11)))))))));
> + /* Evaluate expl(endpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */
> + dr = r;
> + q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 +
> + dr * (A7 + dr * (A8 + dr * (A9 + dr * A10))))))));

Macro-optimizations here. Quite different from the ld80 ones. The
grouping of terms was already quite different. This merges a
macro-optimization technique from das's old work on the ld128 logl --
evaluate terms in double precision if possible, since long double
precision is so slow on sparc64 (about 1000 times slower than long
double precision on x86; only hundreds of times slower than double
precision on sparc64).

> Patch 2:
>
> --- ld80/s_expl.c 2013-05-28 09:36:27.000000000 -0700
> +++ ld80/s_expl.c.all 2013-05-28 09:34:41.000000000 -0700
> @@ -302,3 +302,166 @@
> ...
> + if (k == 0) {
> + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> + (s[n2].hi - 1);
> + RETURNI(t);
> + }
> +
> + if (k == -1) {
> + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> + (s[n2].hi - 2);
> + RETURNI(t / 2);
> + }

Some cases are optimized here.

> ...
> + if (k > LDBL_MANT_DIG - 1)
> + t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi;
> + else
> + t = s[n2].lo + t * (q + r1) + (s[n2].hi - twomk);

The last statement isn't accurate enough for k = 0 and k = -1, so
handling of those cases was moved earlier so that this statement could
be optimized to what it is now. The ld128 version is missing this.

> ...
> --- ld128/s_expl.c 2013-05-28 09:36:11.000000000 -0700
> +++ ld128/s_expl.c.all 2013-05-28 09:34:52.000000000 -0700
> ...
> + if (k == 0) {
> + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> + (s[n2].hi - 1);
> + RETURNI(t);
> + }
> +
> + if (k == -1) {
> + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> + (s[n2].hi - 2);
> + RETURNI(t / 2);
> + }
> +
> +

Same as for ld80, except for 2 style bugs instead of 1 (1 more extra
blank line).

> + if (k > LDBL_MANT_DIG - 1)
> + t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi;
> + else if (k < 1)
> + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> + (s[n2].hi - twomk);
> + else
> + t = s[n2].lo * (q + r1 + 1) + s[n2].hi * (q + r1) +
> + (s[n2].hi - twomk);

Not the same as for ld80. Still has the old slower code, so it probably
still works, but even more slowly than before except for k == 0 and
k == -1, since there are extra branches to filter out those values.

Some patches relative to my version now instead of later:

@ --- z22/s_expl.c Wed May 29 04:48:10 2013
@ +++ ./s_expl.c Wed May 29 06:16:29 2013
@ @@ -30,5 +30,5 @@
@ __FBSDID("$FreeBSD: src/lib/msun/ld80/s_expl.c,v 1.10 2012/10/13 19:53:11 kargl Exp $");
@
@ -/*-
@ +/**
@ * Compute the exponential of x for Intel 80-bit format. This is based on:
@ *

This ugliness is now required by style(9) :-(. You only made this
change in some places. The indent protection '/*-' was subverted to
mean a copyright markup. Its previously-KNF use for non-copyrights was
purged in some places but not all. It is still used extensively for
non-copyrights in kern/kern_prot.c.
@ @@ -83,9 +83,9 @@ @ * (0.002708 is ln2/(2*INTERVALS) rounded up a little). @ */ @ -A2 = 0.5, @ -A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ @ -A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ @ -A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ @ -A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ @ +A2 = 0.5, @ +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ @ +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ @ +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ @ +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ @ @ /* Fix regressions relative to a previous development version. @ @@ -267,11 +275,12 @@ @ r = x - fn * L1 - fn * L2; /* r = r1 + r2 done independently. */ @ #if defined(HAVE_EFFICIENT_IRINTL) @ - n = irintl(fn); @ + n = irintl(fn); @ #elif defined(HAVE_EFFICIENT_IRINT) @ - n = irint(fn); @ + n = irint(fn); @ #else @ - n = (int)fn; @ + n = (int)fn; Fix more regressions. @ #endif @ n2 = (unsigned)n % INTERVALS; @ + /* Depend on the sign bit being propagated: */ @ k = n >> LOG2_INTERVALS; @ r1 = x - fn * L1; I think a comment is needed. This micro-optimization was merged from s_exp2*.c, where it is commented on more prominently for the long double versions only. @ @@ -327,6 +336,15 @@ @ @ /* @ - * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]: @ - * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2 @ + * Domain [-0.1659, 0.1659], range ~[-2.6155e-22, 2.5507e-23]: @ + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.6 The coeffs were improved a little, but the comment wasn't updated to match. @ + * @ + * XXX the coeffs aren't very carefully rounded, and I get 4.5 more bits, @ + * but unlike for ld128 we can't drop any terms. 
@ + * @ + * XXX this still isn't in standard format: @ + * - extra digits in exponents for decimal values @ + * - no space for a (not present) minus sign in either the decimal or hex @ + * values @ + * - perhaps they are impossible for double values @ */ @ static const union IEEEl2bits The coeffs have lots of style bugs, though not as many as for ld128. I'm not sure where the latest set of B coeffs came from. Looks like you improved your generation of them. You still seem to minimize the absolute error. This gives larger than necessary relative errors, especially near the endpoints. I think I wrote the new and old versions of the comment about the domain and range. I take a proposed set of coeffs and plot the relative error of the function given by them, then copy the results to the comment. @ @@ -389,4 +409,9 @@ @ x4 = x2 * x2; @ q = x4 * (x2 * (x4 * @ + /* @ + * XXX the number of terms is no longer good for @ + * pairwise grouping of all except B3, and the @ + * grouping is no longer from highest down. @ + */ @ (x2 * B12 + (x * B11 + B10)) + @ (x2 * (x * B9 + B8) + (x * B7 + B6))) + @ @@ -407,9 +432,9 @@ @ fn = x * INV_L + 0x1.8p63 - 0x1.8p63; @ #if defined(HAVE_EFFICIENT_IRINTL) @ - n = irintl(fn); @ + n = irintl(fn); @ #elif defined(HAVE_EFFICIENT_IRINT) @ - n = irint(fn); @ + n = irint(fn); @ #else @ - n = (int)fn; @ + n = (int)fn; @ #endif @ n2 = (unsigned)n % INTERVALS; @ @@ -434,22 +459,21 @@ @ @ if (k == 0) { @ - t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + @ - (s[n2].hi - 1); @ + t = SUM2P(s[n2].hi - 1, s[n2].lo * (r1 + 1) + t * q + @ + s[n2].hi * r1); @ RETURNI(t); @ } @ - Style bug (extra blank line between related statements). @ if (k == -1) { @ - t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + @ - (s[n2].hi - 2); @ + t = SUM2P(s[n2].hi - 2, s[n2].lo * (r1 + 1) + t * q + @ + s[n2].hi * r1); @ RETURNI(t / 2); @ } @ This blank line is correct since the statements are unrelated -- the evaluation method changes significantly. 
For k = 0 and k = -1, the evaluation is the same but we repeat it all to avoid using a variable for (k - 1) for the 2 values of k. @ if (k < -7) { @ - t = s[n2].lo + t * (q + r1) + s[n2].hi; @ + t = SUM2P(s[n2].hi, s[n2].lo + t * (q + r1)); @ RETURNI(t * twopk - 1); @ } @ @ if (k > 2 * LDBL_MANT_DIG - 1) { @ - t = s[n2].lo + t * (q + r1) + s[n2].hi; @ + t = SUM2P(s[n2].hi, s[n2].lo + t * (q + r1)); @ if (k == LDBL_MAX_EXP) @ RETURNI(t * 2 * 0x1p16383L - 1); Ignore all the other changes in this hunk. Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 22:53:11 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D1FBAAE8 for ; Tue, 28 May 2013 22:53:11 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id B6264FA2 for ; Tue, 28 May 2013 22:53:11 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4SMrAbL053382; Tue, 28 May 2013 15:53:10 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4SMrAkA053381; Tue, 28 May 2013 15:53:10 -0700 (PDT) (envelope-from sgk) Date: Tue, 28 May 2013 15:53:10 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: Patches for s_expl.c Message-ID: <20130528225310.GA53144@troutmask.apl.washington.edu> References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130529062437.V4648@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org 
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 28 May 2013 22:53:11 -0000

On Wed, May 29, 2013 at 07:39:04AM +1000, Bruce Evans wrote:
> On Tue, 28 May 2013, Steve Kargl wrote:
>
> > Here are two patches for ld80/s_expl.c and ld128/s_expl.c.
> > Instead of committing the one large patch that I have spent
> > hours testing, I have split it into two. One patch fixes/updates
> > expl(). The other patch is the implementation of expm1l().
> >
> > My commit messages will be:
> >
> > Patch 1:
> >
> > ld80/s_expl.c:
> >
> > * Use the LOG2_INTERVALS macro instead of hardcoding 7.
>
> The use of LOG2_INTERVALS isn't merged into the ld128 version. Patch 2
> merges its use for expm1l() only.
>
> > * Use LD80C to set overflow and underflow thresholds, and then use
> > #defines to access the .e component to reduce diffs with ld128 version.
> > * Rename polynomial coefficients P# to A#, which is used in Tang.
>
> Almost all the declarations polynomial coefficients are still formatted
> in a nonstandard way, but differently than in previous development
> versions. I keep sending you patches for this.

Given that I've merged, unmerged, remerged, disremerged, and
undisremerged numerous diffs over the last 2+ years, I am not
surprised that there are issues with the patches. I'm neither
an expert in floating-point arithmetic nor style(9). If I understand
half of what you write when you annotate one of your diffs, I
feel lucky.

(Un)fortunately, I only have a few hours this week to work on
expl/expm1l, and then I'll disappear again for a month or two
(due to work and life). (Un)fortunately, theraven (under the
pretense of core) has threatened to completely render libm into
a crippled, useless mess by mapping all unimplemented long double
functions to their double cousins.
When/if it comes to pass that I have to untangle whatever theraven does, I'll likely just walk away from libm hacking. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Tue May 28 23:17:49 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D3AA390 for ; Tue, 28 May 2013 23:17:49 +0000 (UTC) (envelope-from s.montgomerysmith@gmail.com) Received: from mail-ie0-x229.google.com (mail-ie0-x229.google.com [IPv6:2607:f8b0:4001:c03::229]) by mx1.freebsd.org (Postfix) with ESMTP id A72511CE for ; Tue, 28 May 2013 23:17:49 +0000 (UTC) Received: by mail-ie0-f169.google.com with SMTP id u16so23556194iet.0 for ; Tue, 28 May 2013 16:17:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; bh=SvFDEBzW9er/cY4ta5wvKGwTqj1R7Vi6usIXpyhzSro=; b=BBHPsF/Me+J2MOPl5ruP1iBf5gNUXvBIK4xOv1cXSuPxefFOuwhqqRPzcO84PQx22b vNEeQmZatx0EL2tCz+PkUKoXnkx2uAfhMa0gzV21sD51RWLtyPjeOyR/JqZ2tTH9WJMl 6D/lc2qNNQ24qJGBUiEymmWTzHtUFgOjHGOPfsZVlwuWj3JEYFFnm97S86HcenaXIGly 8gDDXl1UEcNKzSMhDaJxAPPwq39ws0vwxe0WVJF+jLUQTCY+jAwGB8nquxovbrvdysrE DmYv8/WbN6YPArTgHirgrTNjKIzNZGac6HcbU8mvyO/JV3/WVfpemb1qfuXjVNqnGvoy M8nA== X-Received: by 10.50.8.65 with SMTP id p1mr54053iga.19.1369783069366; Tue, 28 May 2013 16:17:49 -0700 (PDT) Received: from [192.168.0.11] (50-82-246-58.client.mchsi.com. 
[50.82.246.58]) by mx.google.com with ESMTPSA id ct8sm20129230igb.7.2013.05.28.16.17.47 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 28 May 2013 16:17:48 -0700 (PDT) Sender: Stephen Montgomery-Smith Message-ID: <51A53B1A.9040607@missouri.edu> Date: Tue, 28 May 2013 18:17:46 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130510 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org Subject: Re: Patches for s_expl.c References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> <20130528225310.GA53144@troutmask.apl.washington.edu> In-Reply-To: <20130528225310.GA53144@troutmask.apl.washington.edu> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 May 2013 23:17:49 -0000 On 05/28/2013 05:53 PM, Steve Kargl wrote: > Given that I've merged, unmerged, remerged, disremerged, and > undisremerged numerous diffs over the last 2+ years, I am not > surprise that there are issues with the patches. I'm neither > an expert in floating arithmetic nor style(9). If I understand > half of what you write when you annotate one of your diffs, I > feel lucky. > > (Un)fortunately, I only have a few hours this week to work on > expl/expm1l, and then I'll disappear again for a month or two > (due to work and life). (Un)fortunately, theraven (under the > pretense of core) has threaten to completely rendered libm into > a crippled useless mess by mapping all unimplemented long double > functions to their double cousins. When/if it comes to pass > that I have to untangle whatever theraven does, I'll likely > just walk away from libm hacking. 
I think it is better to commit "as is" if you cannot make all the changes. As for me, I don't really understand the need to be so consistent with style, nor to get every last drop of optimization. In particular, regarding style, I think it is like people talking different languages. You could insist that everyone speak a common language, but it is far better for the intellectual commons if people learn other peoples' languages. Anyway, I think it is better for Steve to commit, and then for Bruce to make changes later on. From owner-freebsd-numerics@FreeBSD.ORG Wed May 29 00:06:23 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1E0A0AA8 for ; Wed, 29 May 2013 00:06:23 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id DCFC66B6 for ; Wed, 29 May 2013 00:06:22 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4T06MqY053909; Tue, 28 May 2013 17:06:22 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4T06MI4053908; Tue, 28 May 2013 17:06:22 -0700 (PDT) (envelope-from sgk) Date: Tue, 28 May 2013 17:06:22 -0700 From: Steve Kargl To: Stephen Montgomery-Smith Subject: Re: Patches for s_expl.c Message-ID: <20130529000622.GA53899@troutmask.apl.washington.edu> References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> <20130528225310.GA53144@troutmask.apl.washington.edu> <51A53B1A.9040607@missouri.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51A53B1A.9040607@missouri.edu> User-Agent: Mutt/1.5.21 
(2010-09-15)
Cc: freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 29 May 2013 00:06:23 -0000

On Tue, May 28, 2013 at 06:17:46PM -0500, Stephen Montgomery-Smith wrote:
> On 05/28/2013 05:53 PM, Steve Kargl wrote:
>
> > Given that I've merged, unmerged, remerged, disremerged, and
> > undisremerged numerous diffs over the last 2+ years, I am not
> > surprise that there are issues with the patches. I'm neither
> > an expert in floating arithmetic nor style(9). If I understand
> > half of what you write when you annotate one of your diffs, I
> > feel lucky.
> >
> > (Un)fortunately, I only have a few hours this week to work on
> > expl/expm1l, and then I'll disappear again for a month or two
> > (due to work and life). (Un)fortunately, theraven (under the
> > pretense of core) has threaten to completely rendered libm into
> > a crippled useless mess by mapping all unimplemented long double
> > functions to their double cousins. When/if it comes to pass
> > that I have to untangle whatever theraven does, I'll likely
> > just walk away from libm hacking.
>
> I think it is better to commit "as is" if you cannot make all the changes.
>
> As for me, I don't really understand the need to be so consistent with
> style, nor to get every last drop of optimization. In particular,
> regarding style, I think it is like people talking different languages.
> You could insist that everyone speak a common language, but it is far
> better for the intellectual commons if people learn other peoples'
> languages.
>
> Anyway, I think it is better for Steve to commit, and then for Bruce to
> make changes later on.
>

It's too late. Some change I made since the last time I tested
has introduced a massive regression in the computation of expm1l.
laptop-kargl:kargl[204] ./testl -n 5 -b prec: 64 For x in [-64.0000:-0.1659], 5M expm1l calls in 2.176513 seconds. For x in [-0.1659:0.1659], 5M expm1l calls in 0.415051 seconds. For x in [0.1659:11356.0000], 5M expm1l calls in 0.550342 seconds. Notice, the first interval is now 4 to 5 times slower than the other intervals. This was not the case with an older version of the code. :( -- Steve From owner-freebsd-numerics@FreeBSD.ORG Wed May 29 01:21:19 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2E775467 for ; Wed, 29 May 2013 01:21:19 +0000 (UTC) (envelope-from s.montgomerysmith@gmail.com) Received: from mail-ie0-x232.google.com (mail-ie0-x232.google.com [IPv6:2607:f8b0:4001:c03::232]) by mx1.freebsd.org (Postfix) with ESMTP id F2BE7A79 for ; Wed, 29 May 2013 01:21:18 +0000 (UTC) Received: by mail-ie0-f178.google.com with SMTP id f4so7240082iea.37 for ; Tue, 28 May 2013 18:21:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; bh=vZGvRyAYMcXbSQd+Pef+ozY2WNuHDSVSkodRKHfc5CE=; b=L+ejsyWiYDzrzqDLg7R9uQ+Bk32Dl+DULmlvMZsdCPyO5lhO+FeTkquPr5KtBAr1tv AqD91GrLc59FPmMIgnp4/ZaGY7SmE7qrz7MRTkDAlAMgPeaieWt5GoImICPp96T2F2kt +31Tx+X2k/V2nLt3U5n6+Eyfkg8/sGAbCOjCFg7rT8TYptZzKXELM2f45RH3ZGopiQuL sJ0IFeJa3bV0Fq3o5HfWkXkZcGJORreUKVWz+LSwWHH1qczA89eNxYLRecPJ4MIIqXlN Dnkf8mjkhwdjODCMbHJ/VtPQFKo1QCGuTa4b0kWV3lFPADTk4/bV060mVyQ3yIPhaQAP GMkQ== X-Received: by 10.42.84.73 with SMTP id k9mr140386icl.50.1369790478756; Tue, 28 May 2013 18:21:18 -0700 (PDT) Received: from [192.168.0.11] (50-82-246-58.client.mchsi.com. 
[50.82.246.58]) by mx.google.com with ESMTPSA id o10sm20679318igh.2.2013.05.28.18.21.16 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 28 May 2013 18:21:17 -0700 (PDT)
Sender: Stephen Montgomery-Smith
Message-ID: <51A5580C.9000607@missouri.edu>
Date: Tue, 28 May 2013 20:21:16 -0500
From: Stephen Montgomery-Smith
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130510 Thunderbird/17.0.6
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
Subject: Re: Patches for s_expl.c
References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> <20130528225310.GA53144@troutmask.apl.washington.edu> <51A53B1A.9040607@missouri.edu> <20130529000622.GA53899@troutmask.apl.washington.edu>
In-Reply-To: <20130529000622.GA53899@troutmask.apl.washington.edu>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 29 May 2013 01:21:19 -0000

On 05/28/2013 07:06 PM, Steve Kargl wrote:
> On Tue, May 28, 2013 at 06:17:46PM -0500, Stephen Montgomery-Smith wrote:
>> On 05/28/2013 05:53 PM, Steve Kargl wrote:
>>
>>> Given that I've merged, unmerged, remerged, disremerged, and
>>> undisremerged numerous diffs over the last 2+ years, I am not
>>> surprised that there are issues with the patches. I'm neither
>>> an expert in floating arithmetic nor style(9). If I understand
>>> half of what you write when you annotate one of your diffs, I
>>> feel lucky.
>>>
>>> (Un)fortunately, I only have a few hours this week to work on
>>> expl/expm1l, and then I'll disappear again for a month or two
>>> (due to work and life). (Un)fortunately, theraven (under the
>>> pretense of core) has threatened to completely render libm into
>>> a crippled useless mess by mapping all unimplemented long double
>>> functions to their double cousins. When/if it comes to pass
>>> that I have to untangle whatever theraven does, I'll likely
>>> just walk away from libm hacking.
>>
>> I think it is better to commit "as is" if you cannot make all the changes.
>>
>> As for me, I don't really understand the need to be so consistent with
>> style, nor to get every last drop of optimization. In particular,
>> regarding style, I think it is like people talking different languages.
>> You could insist that everyone speak a common language, but it is far
>> better for the intellectual commons if people learn other people's
>> languages.
>>
>> Anyway, I think it is better for Steve to commit, and then for Bruce to
>> make changes later on.
>>
>
> It's too late. Some change I made since the last time I tested
> has introduced a massive regression in the computation of expm1l.
>
> laptop-kargl:kargl[204] ./testl -n 5 -b
> prec: 64
> For x in [-64.0000:-0.1659], 5M expm1l calls in 2.176513 seconds.
> For x in [-0.1659:0.1659], 5M expm1l calls in 0.415051 seconds.
> For x in [0.1659:11356.0000], 5M expm1l calls in 0.550342 seconds.
>
> Notice, the first interval is now 4 to 5 times slower than the
> other intervals. This was not the case with an older version
> of the code.
>
> :(

I think it is still better to commit. Then figure out where the
regression was later, when you have time.
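
For anyone trying to narrow down a per-interval slowdown like the one quoted above, a minimal timing sketch might look like the following. It is not the testl program (its source does not appear in this thread). The names time_interval and stand_in are hypothetical, and clock() only gives coarse CPU time, so treat the numbers as rough.

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: CPU seconds spent calling f on n evenly spaced
 * points in [lo, hi].  The sum of the results is stored through sink so
 * that the compiler cannot discard the calls.
 */
static double
time_interval(long double (*f)(long double), long double lo, long double hi,
    size_t n, long double *sink)
{
	long double step, sum;
	clock_t start, stop;
	size_t i;

	step = n > 1 ? (hi - lo) / (n - 1) : 0;
	sum = 0;
	start = clock();
	for (i = 0; i < n; i++)
		sum += f(lo + i * step);
	stop = clock();
	*sink = sum;
	return ((double)(stop - start) / CLOCKS_PER_SEC);
}

/*
 * Assumption: a stand-in workload so the sketch builds without libm.
 * In real use, include <math.h>, link with -lm and pass expm1l instead.
 */
static long double
stand_in(long double x)
{
	return (x * x);
}
```

Calling time_interval(expm1l, -64.0L, -0.1659L, 5000000, &sink) and repeating for the other two intervals would give figures comparable in spirit to the ones above, which should be enough to bisect between an older and a newer s_expl.c.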
From owner-freebsd-numerics@FreeBSD.ORG Wed May 29 11:04:54 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D32E175A for ; Wed, 29 May 2013 11:04:54 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 7C5C6F33 for ; Wed, 29 May 2013 11:04:54 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 8C5461217FE; Wed, 29 May 2013 21:04:51 +1000 (EST) Date: Wed, 29 May 2013 21:04:50 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith Subject: Re: Patches for s_expl.c In-Reply-To: <51A5580C.9000607@missouri.edu> Message-ID: <20130529203350.V1268@besplex.bde.org> References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> <20130528225310.GA53144@troutmask.apl.washington.edu> <51A53B1A.9040607@missouri.edu> <20130529000622.GA53899@troutmask.apl.washington.edu> <51A5580C.9000607@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=e4Ne0tV/ c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10 a=hyAGcHVSu_I8guswM6YA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." 
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 29 May 2013 11:04:54 -0000

On Tue, 28 May 2013, Stephen Montgomery-Smith wrote:

> On 05/28/2013 07:06 PM, Steve Kargl wrote:
>> On Tue, May 28, 2013 at 06:17:46PM -0500, Stephen Montgomery-Smith wrote:
>>> On 05/28/2013 05:53 PM, Steve Kargl wrote:
>>>
>>>> Given that I've merged, unmerged, remerged, disremerged, and
>>>> undisremerged numerous diffs over the last 2+ years, I am not
>>>> surprised that there are issues with the patches. I'm neither
>>>> an expert in floating arithmetic nor style(9). If I understand
>>>> half of what you write when you annotate one of your diffs, I
>>>> feel lucky.

Mail is not a very suitable medium for exchanging patches (but is
better than a vcs that is not shared, or url).

>>>> (Un)fortunately, I only have a few hours this week to work on
>>>> expl/expm1l, and then I'll disappear again for a month or two
>>>> (due to work and life). (Un)fortunately, theraven (under the
>>>> ...

It can take a long time to merge patches, especially when the
turnaround time is months. I take more than a few hours a week on
this when I'm working on it.

>>> ...
>>> Anyway, I think it is better for Steve to commit, and then for Bruce to
>>> make changes later on.
>>
>> It's too late. Some change I made since the last time I tested
>> has introduced a massive regression in the computation of expm1l.
>>
>> laptop-kargl:kargl[204] ./testl -n 5 -b
>> prec: 64
>> For x in [-64.0000:-0.1659], 5M expm1l calls in 2.176513 seconds.
>> For x in [-0.1659:0.1659], 5M expm1l calls in 0.415051 seconds.
>> For x in [0.1659:11356.0000], 5M expm1l calls in 0.550342 seconds.
>>
>> Notice, the first interval is now 4 to 5 times slower than the
>> other intervals. This was not the case with an older version
>> of the code.

I don't see this (only checked on i386 so far).
expm1l on [-64.0000:-0.1659] takes about 55-59 cycles (22 nsec; 5M calls in 0.11 seconds) on freefall (Xeon i7(?)) when compiled by gcc. Other intervals are only a couple of cycles faster, except when compiled by clang expm1l takes only 44-45 cycles on [-0.1659:0.1659]. Large slowdowns may be caused by exceptions, but I tested the above range with overflow and underflow traps and didn't get any. > I think it is still better to commit. Then figure out where the > regression was later, when you have time. This is OK for transient efficiency regressions, not for accuracy ones. Bruce From owner-freebsd-numerics@FreeBSD.ORG Wed May 29 16:24:48 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C591798D for ; Wed, 29 May 2013 16:24:48 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id A57C7801 for ; Wed, 29 May 2013 16:24:48 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4TGOfSa058882; Wed, 29 May 2013 09:24:41 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4TGOf7Y058881; Wed, 29 May 2013 09:24:41 -0700 (PDT) (envelope-from sgk) Date: Wed, 29 May 2013 09:24:41 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: Patches for s_expl.c Message-ID: <20130529162441.GA58773@troutmask.apl.washington.edu> References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130529062437.V4648@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: 
freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 May 2013 16:24:48 -0000 On Wed, May 29, 2013 at 07:39:04AM +1000, Bruce Evans wrote: > On Tue, 28 May 2013, Steve Kargl wrote: > > > Here are two patches for ld80/s_expl.c and ld128/s_expl.c. > > Instead of committing the one large patch that I have spent > > hours testing, I have split it into two. One patch fixes/updates > > expl(). The other patch is the implementation of expm1l(). > > > > My commit messages will be: > > > > Patch 1: > > > > ld80/s_expl.c: > > > > * Use the LOG2_INTERVALS macro instead of hardcoding 7. > > The use of LOG2_INTERVALS isn't merged into the ld128 version. Patch 2 > merges its use for expm1l() only. Hopefully, fixed. > > * Use LD80C to set overflow and underflow thresholds, and then use > > #defines to access the .e component to reduce diffs with ld128 version. > > * Rename polynomial coefficients P# to A#, which is used in Tang. > > Almost all the declarations polynomial coefficients are still formatted > in a nonstandard way, but differently than in previous development > versions. I keep sending you patches for this. Hopefully, fixed. All fancy whitespace has been removed including in comments with hex values. > > * Compute expm1l(x) for IEEE 754 128-bit format. > > There is a fairly large bug in this, from only merging half of the > most recent micro-optimization in the development version of the ld80 > version. This might only be an efficiency bug, but I haven't tested > the ld128 version with either the full merge or the half merge. > > The ld128 version still has excessive optimizations for |x| near 0. > It uses a slightly different high-degree polynomial on each side of > 0. The ld80 version uses the same poly on each side. 
Most of the > style bugs in the 4 exp[!2]l functions are in the coeffs for the > polys on each side. I haven't tried so hard to get you to fix them > since I want to remove them. Hopefully, fixed to the extent that opened ld80/s_expl.c in one nedit window and ld128/s_expl.c in another. I copied everything from ld80 to ld128 except of course literal constants and polynomials that must be different. > > > > These are based on: > > > > PTP Tang, "Table-driven implementation of the Expm1 function > > in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18, > > 211-222 (1992). > > > > These commit logs may be too terse for some, but quite frankly after > > 2 or 3 years of submitting and resubmitting diffs, I've forgotten > > why some changes have or have not been made. > > > > expm1l() resides in s_expl.c because she shares the same table, > > polynomial coefficients, and some numerical constants with expl(). > > There are some minor style regressions relative to previous development > versions outside of poly coeffs. Patches later. I'm sure you're going to hate the new patch at the end. > > Index: ld80/s_expl.c > > =================================================================== > > --- ld80/s_expl.c (revision 251062) > > +++ ld80/s_expl.c (working copy) > > ... > > @@ -78,11 +82,11 @@ > > * |exp(x) - p(x)| < 2**-77.2 > > * (0.002708 is ln2/(2*INTERVALS) rounded up a little). > > */ > > -P2 = 0.5, > > -P3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ > > -P4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ > > -P5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ > > -P6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ > > +A2 = 0.5, > > +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ > > +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ > > +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ > > +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ > > Example of a formatting regression. 
The extra space that was before the > values is for a possible minus sign. This space is still there for the > hex values. The extra space before the equals sign is used for fancy > formatting to line up the values when the variable names reach A10. Since > thee variable names only reach A6, this is not needed. All coefficient are now formatted with the form: A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ ie., 1 space before and 1 space after =. The space in the comments for the implicit + sign has been removed. > > Index: ld128/s_expl.c > > =================================================================== > > --- ld128/s_expl.c (revision 251062) > > +++ ld128/s_expl.c (working copy) > > ... > > @@ -38,34 +40,56 @@ > > #include "math_private.h" > > > > #define INTERVALS 128 > > +#define LOG2_INTERVALS 7 > > Not used. Hopefully, fixed. > > n2 = (unsigned)n % INTERVALS; > > k = (n - n2) / INTERVALS; > > r1 = x - fn * L1; > > - r2 = -fn * L2; > > + r2 = fn * -L2; > > + r = r1 + r2; > > 1 micro-optimization (that uses LOG2_INTERVALS) not merrged here. > Hopefully, fixed. > > ... > > + if (k > LDBL_MANT_DIG - 1) > > + t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi; > > + else > > + t = s[n2].lo + t * (q + r1) + (s[n2].hi - twomk); > > The last statement isn't accurate enough for k = 0 and k = -1, so > handling of those cases were moved earlier so that this statement > could be optimized to what it is now. The ld128 version is missing > this. ld80 code merged into ld128. > > --- ld128/s_expl.c 2013-05-28 09:36:11.000000000 -0700 > > +++ ld128/s_expl.c.all 2013-05-28 09:34:52.000000000 -0700 > > ... > > + if (k == 0) { > > + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + > > + (s[n2].hi - 1); > > + RETURNI(t); > > + } > > + > > + if (k == -1) { > > + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + > > + (s[n2].hi - 2); > > + RETURNI(t / 2); > > + } > > + > > + > > Same as for ld808, except for 2 style bugs instead of 1 (1 more extra > blank line). 
Hopefully, fixed. > > + if (k > LDBL_MANT_DIG - 1) > > + t = s[n2].lo - twomk + t * (q + r1) + s[n2].hi; > > + else if (k < 1) > > + t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + > > + (s[n2].hi - twomk); > > + else > > + t = s[n2].lo * (q + r1 + 1) + s[n2].hi * (q + r1) + > > + (s[n2].hi - twomk); > > Not the same as for ld128. Still has the old slower code, so it probably > still works, but even more slowly than before except for k == 0 and k == -1, > since there are extra branches to filter out those values. ld80 and ld128 now use identical code. > > Some patches relative to my version now instead of later: > > @ --- z22/s_expl.c Wed May 29 04:48:10 2013 > @ +++ ./s_expl.c Wed May 29 06:16:29 2013 > @ @@ -30,5 +30,5 @@ > @ __FBSDID("$FreeBSD: src/lib/msun/ld80/s_expl.c,v 1.10 2012/10/13 19:53:11 kargl Exp $"); > @ > @ -/*- > @ +/** > @ * Compute the exponential of x for Intel 80-bit format. This is based on: > @ * > > This ugliness is now required by style(9) :-(. You only made this change in > some places places. Hopefully, fixed. > @ @@ -83,9 +83,9 @@ > @ * (0.002708 is ln2/(2*INTERVALS) rounded up a little). > @ */ > @ -A2 = 0.5, > @ -A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ > @ -A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ > @ -A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ > @ -A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ > @ +A2 = 0.5, > @ +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ > @ +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ > @ +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ > @ +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ > @ > @ /* > > Fix regressions relative to a previous development version. I made this conform to style(9). > @ @@ -267,11 +275,12 @@ > @ r = x - fn * L1 - fn * L2; /* r = r1 + r2 done independently. 
*/ > @ #if defined(HAVE_EFFICIENT_IRINTL) > @ - n = irintl(fn); > @ + n = irintl(fn); > @ #elif defined(HAVE_EFFICIENT_IRINT) > @ - n = irint(fn); > @ + n = irint(fn); > @ #else > @ - n = (int)fn; > @ + n = (int)fn; > > Fix more regressions. Hopefully, fixed. > @ #endif > @ n2 = (unsigned)n % INTERVALS; > @ + /* Depend on the sign bit being propagated: */ > @ k = n >> LOG2_INTERVALS; > @ r1 = x - fn * L1; > > I think a comment is needed. This micro-optimization was merged from > s_exp2*.c, where it is commented on more prominently for the long > double versions only. Ignored adding a comment. > > The coeffs have lots of style bugs, though not as many as for ld128. > Hopefully, fixed. > @ @@ -389,4 +409,9 @@ > @ x4 = x2 * x2; > @ q = x4 * (x2 * (x4 * > @ + /* > @ + * XXX the number of terms is no longer good for > @ + * pairwise grouping of all except B3, and the > @ + * grouping is no longer from highest down. > @ + */ > @ (x2 * B12 + (x * B11 + B10)) + > @ (x2 * (x * B9 + B8) + (x * B7 + B6))) + I left this as-is with whitespace and did not add the comment. This should be the only place where there is a substantial deviation from style(9). > @ @@ -407,9 +432,9 @@ > @ fn = x * INV_L + 0x1.8p63 - 0x1.8p63; > @ #if defined(HAVE_EFFICIENT_IRINTL) > @ - n = irintl(fn); > @ + n = irintl(fn); > @ #elif defined(HAVE_EFFICIENT_IRINT) > @ - n = irint(fn); > @ + n = irint(fn); > @ #else > @ - n = (int)fn; > @ + n = (int)fn; > @ #endif Hopefully, fixed. > @ n2 = (unsigned)n % INTERVALS; > @ @@ -434,22 +459,21 @@ > @ > @ if (k == 0) { > @ - t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 + > @ - (s[n2].hi - 1); > @ + t = SUM2P(s[n2].hi - 1, s[n2].lo * (r1 + 1) + t * q + > @ + s[n2].hi * r1); > @ RETURNI(t); > @ } > @ - > > Style bug (extra blank line between related statements). Hopefully, fixed. 
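
On the `k = n >> LOG2_INTERVALS` change discussed above: a small sketch, under the assumption of a compiler that implements right shift of a negative int as an arithmetic shift (C leaves this implementation-defined; gcc and clang propagate the sign bit), shows why the shift can stand in for the older `(n - n2) / INTERVALS` form. The function names k_by_division and k_by_shift are made up for illustration and are not in s_expl.c.

```c
#define	INTERVALS	128
#define	LOG2_INTERVALS	7

/*
 * Older form: n2 is n reduced modulo INTERVALS into [0, INTERVALS), so
 * n - n2 is an exact multiple of INTERVALS and the division is exact.
 */
static int
k_by_division(int n)
{
	int n2;

	n2 = (unsigned)n % INTERVALS;
	return ((n - n2) / INTERVALS);
}

/*
 * Newer form: depends on the sign bit being propagated, i.e. on >>
 * rounding towards minus infinity for negative n.
 */
static int
k_by_shift(int n)
{
	return (n >> LOG2_INTERVALS);
}
```

Both forms give floor(n / INTERVALS), whereas plain `n / INTERVALS` truncates towards zero and would be off by one for negative n that are not multiples of INTERVALS (for example, -129 / 128 is -1, not -2).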
>
> @  	if (k == -1) {
> @ -		t = s[n2].lo * (r1 + 1) + t * q + s[n2].hi * r1 +
> @ -		    (s[n2].hi - 2);
> @ +		t = SUM2P(s[n2].hi - 2, s[n2].lo * (r1 + 1) + t * q +
> @ +		    s[n2].hi * r1);
> @  		RETURNI(t / 2);
> @  	}
> @
>
> This blank line is correct since the statements are unrelated -- the
> evaluation method changes significantly. For k = 0 and k = -1, the
> evaluation is the same but we repeat it all to avoid using a variable
> for (k - 1) for the 2 values of k.
>
> @  	if (k < -7) {
> @ -		t = s[n2].lo + t * (q + r1) + s[n2].hi;
> @ +		t = SUM2P(s[n2].hi, s[n2].lo + t * (q + r1));
> @  		RETURNI(t * twopk - 1);
> @  	}
> @
> @  	if (k > 2 * LDBL_MANT_DIG - 1) {
> @ -		t = s[n2].lo + t * (q + r1) + s[n2].hi;
> @ +		t = SUM2P(s[n2].hi, s[n2].lo + t * (q + r1));
> @  		if (k == LDBL_MAX_EXP)
> @  			RETURNI(t * 2 * 0x1p16383L - 1);
>
> Ignore all the other changes in this hunk.

After making the changes, current unscientific testing gives
(best viewed in a 95 column window):

expl

Timing (seconds):               1M        2M        10M       100M
i386    [-11355.0:11356.0]   0.088302             0.867567   8.64871
amd64   [-11355.0:11356.0]   0.062994             0.631960   6.30295
sparc64 [-11355.0:11356.0]   39.5309   79.1927

Accuracy:                     M  Max ULP  x at Max ULP
i386    [-11355.0:11356.0]    1  0.50465  -3.5510383760383760e+03  -0x1.bbe13a6062b8cdd4p+11
i386    [-11355.0:11356.0]   10  0.50556  -9.6479456830945683e+03  -0x1.2d7f90c24c5c686p+13
i386    [-11355.0:11356.0]  100  0.50654  -7.9982712426427124e+03  -0x1.f3e45702867bb01p+12
amd64   [-11355.0:11356.0]    1  0.50465  -3.5510383760383760e+03  -0x1.bbe13a6062b8cdd4p+11
amd64   [-11355.0:11356.0]   10  0.50556  -9.6479456830945683e+03  -0x1.2d7f90c24c5c686p+13
amd64   [-11355.0:11356.0]  100  0.50654  -7.9982712426427124e+03  -0x1.f3e45702867bb01p+12
sparc64 [-11355.0:11356.0]    1  0.50619  1.79779355979355979355979355979355983e+03
sparc64 [-11355.0:11356.0]    2  0.50541  1.11496704618352309176154588077294027e+04

expm1l

Timing (seconds):               1M        10M       100M
i386    [-64.0000:-0.1659]   0.435783  4.342621  43.41397
i386    [ -0.1659: 0.1659]   0.082880  0.829142   8.28948
i386    [  0.1659:11356.0]   0.110590  1.096098  10.96253
amd64   [-64.0000:-0.1659]   0.066751  0.648734   6.46649
amd64   [ -0.1659: 0.1659]   0.061531  0.614824   6.14377
amd64   [  0.1659:11356.0]   0.071677  0.716927   7.16819
sparc64 [-113.000:-0.1659]   37.84224
sparc64 [ -0.1659: 0.1659]   66.28533
sparc64 [  0.1659:11356.0]   41.20714

Accuracy:                     M  Max ULP  x at Max ULP
i386    [-64.0000:-0.1659]    1  0.50824  -1.7579429539429599e-01  -0x1.6806d6ec55bd2cp-3
i386    [ -0.1659: 0.1659]    1  0.50807   1.5765476175476175e-01   0x1.42e07fee5cecaa04p-3
i386    [  0.1659:11356.0]    1  0.50533   4.6558240641420642e+03   0x1.22fd2f5de1bf8cb2p+12
i386    [-64.0000:-0.1659]   10  0.51163  -1.8666523480652408e-01  -0x1.7e4a57b65a7cp-3
i386    [ -0.1659: 0.1659]   10  0.51031  -1.6139564864956486e-01  -0x1.4a89cd45552be4a8p-3
i386    [  0.1659:11356.0]   10  0.50597   7.2029609713952472e+03   0x1.c22f60238aafa618p+12
i386    [-64.0000:-0.1659]  100  0.51520  -1.8119337383093434e-01  -0x1.731582f6d89b72p-3
i386    [ -0.1659: 0.1659]  100  0.51161   1.6120475455904754e-01   0x1.4a25b7e6539760ecp-3
i386    [  0.1659:11356.0]  100  0.50645   1.5581592136564341e+03   0x1.858a308e79dd8494p+10
amd64   [-64.0000:-0.1659]    1  0.50502  -1.8115636515636515e-01  -0x1.73021bbe7877ccp-3
amd64   [ -0.1659: 0.1659]    1  0.50807   1.5765476175476175e-01   0x1.42e07fee5cecaa04p-3
amd64   [  0.1659:11356.0]    1  0.50522   5.3732636683514684e+03   0x1.4fd437fc4e28bfb6p+12
amd64   [-64.0000:-0.1659]   10  0.51363  -1.7086629347662934e-01  -0x1.5def25b3c452dap-3
amd64   [ -0.1659: 0.1659]   10  0.51031  -1.6139564864956486e-01  -0x1.4a89cd45552be4a8p-3
amd64   [  0.1659:11356.0]   10  0.50595   2.2495034322503431e-01   0x1.ccb2c3fb0104dbe4p-3
amd64   [-64.0000:-0.1659]  100  0.51376  -2.7335577165055771e-01  -0x1.17ea934da5e086p-2
amd64   [ -0.1659: 0.1659]  100  0.51161   1.6120475455904754e-01   0x1.4a25b7e6539760ecp-3
amd64   [  0.1659:11356.0]  100  0.50662   3.9436528827225188e+02   0x1.8a5d83883eef2676p+8
sparc64 [-113.000:-0.1659]    1  0.50339  -4.89331501511501510727132103685011835e+00
sparc64 [ -0.1659: 0.1659]    1  0.50837  -1.28120218820218813976976441251060453e-01
sparc64 [  0.1659:11356.0]    1  0.50514   6.45515777662077662077313264157127259e+03

Testing on flame is excruciatingly slow, especially because rdivacky is
building clang.

Yes, the following is one massive patch.

-- 
Steve

Index: ld80/s_expl.c
===================================================================
--- ld80/s_expl.c (revision 251067)
+++ ld80/s_expl.c (working copy)
@@ -29,7 +29,7 @@
 #include
 __FBSDID("$FreeBSD$");
 
-/*-
+/**
  * Compute the exponential of x for Intel 80-bit format. This is based on:
  *
  * PTP Tang, "Table-driven implementation of the exponential function
@@ -50,6 +50,7 @@
 #include "math_private.h"
 
 #define INTERVALS 128
+#define LOG2_INTERVALS 7
 #define BIAS (LDBL_MAX_EXP - 1)
 
 static const long double
@@ -60,9 +61,12 @@
 
 static const union IEEEl2bits
 /* log(2**16384 - 0.5) rounded towards zero: */
-o_threshold = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L),
+/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */
+o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L),
+#define o_threshold (o_thresholdu.e)
 /* log(2**(-16381-64-1)) rounded towards zero: */
-u_threshold = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+u_thresholdu = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L);
+#define u_threshold (u_thresholdu.e)
 
 static const double
 /*
@@ -70,19 +74,19 @@
  * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest
  * bits zero so that multiplication of it by n is exact.
  */
-INV_L = 1.8466496523378731e+2, /* 0x171547652b82fe.0p-45 */
-L1 = 5.4152123484527692e-3, /* 0x162e42ff000000.0p-60 */
+INV_L = 1.8466496523378731e+2, /* 0x171547652b82fe.0p-45 */
+L1 = 5.4152123484527692e-3, /* 0x162e42ff000000.0p-60 */
 L2 = -3.2819649005320973e-13, /* -0x1718432a1b0e26.0p-94 */
 /*
  * Domain [-0.002708, 0.002708], range ~[-5.7136e-24, 5.7110e-24]:
  * |exp(x) - p(x)| < 2**-77.2
  * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
*/ -P2 = 0.5, -P3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ -P4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ -P5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ -P6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ +A2 = 0.5, +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ /* * 2^(i/INTERVALS) for i in [0,INTERVALS] is represented by two values where @@ -96,8 +100,7 @@ static const struct { double hi; double lo; -/* XXX should rename 's'. */ -} s[INTERVALS] = { +} tbl[INTERVALS] = { 0x1p+0, 0x0p+0, 0x1.0163da9fb3335p+0, 0x1.b61299ab8cdb7p-54, 0x1.02c9a3e778060p+0, 0x1.dcdef95949ef4p-53, @@ -232,7 +235,8 @@ expl(long double x) { union IEEEl2bits u, v; - long double fn, q, r, r1, r2, t, t23, t45, twopk, twopkp10000, z; + long double fn, q, r, r1, r2, t, twopk, twopkp10000; + long double z; int k, n, n2; uint16_t hx, ix; @@ -242,40 +246,38 @@ ix = hx & 0x7fff; if (ix >= BIAS + 13) { /* |x| >= 8192 or x is NaN */ if (ix == BIAS + LDBL_MAX_EXP) { - if (hx & 0x8000 && u.xbits.man == 1ULL << 63) - return (0.0L); /* x is -Inf */ - return (x + x); /* x is +Inf, NaN or unsupported */ + if (hx & 0x8000) /* x is -Inf, -NaN or unsupported */ + return (-1 / x); + return (x + x); /* x is +Inf, +NaN or unsupported */ } - if (x > o_threshold.e) + if (x > o_threshold) return (huge * huge); - if (x < u_threshold.e) + if (x < u_threshold) return (tiny * tiny); - } else if (ix < BIAS - 66) { /* |x| < 0x1p-66 */ - /* includes pseudo-denormals */ - if (huge + x > 1.0L) /* trigger inexact iff x != 0 */ - return (1.0L + x); + } else if (ix < BIAS - 65) { /* |x| < 0x1p-65 (includes pseudos) */ + return (1 + x); /* 1 with inexact iff x != 0 */ } ENTERI(); - /* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */ + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). 
*/ /* Use a specialized rint() to get fn. Assume round-to-nearest. */ fn = x * INV_L + 0x1.8p63 - 0x1.8p63; r = x - fn * L1 - fn * L2; /* r = r1 + r2 done independently. */ #if defined(HAVE_EFFICIENT_IRINTL) - n = irintl(fn); + n = irintl(fn); #elif defined(HAVE_EFFICIENT_IRINT) - n = irint(fn); + n = irint(fn); #else - n = (int)fn; + n = (int)fn; #endif n2 = (unsigned)n % INTERVALS; - k = (n - n2) / INTERVALS; + k = n >> LOG2_INTERVALS; r1 = x - fn * L1; - r2 = -fn * L2; + r2 = fn * -L2; /* Prepare scale factors. */ - v.xbits.man = 1ULL << 63; + v.e = 1; if (k >= LDBL_MIN_EXP) { v.xbits.expsign = BIAS + k; twopk = v.e; @@ -284,21 +286,181 @@ twopkp10000 = v.e; } - /* Evaluate expl(midpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */ - /* Here q = q(r), not q(r1), since r1 is lopped like L1. */ - t45 = r * P5 + P4; + /* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */ z = r * r; - t23 = r * P3 + P2; - q = r2 + z * t23 + z * z * t45 + z * z * z * P6; - t = (long double)s[n2].lo + s[n2].hi; - t = s[n2].lo + t * (q + r1) + s[n2].hi; + q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6; + t = (long double)tbl[n2].lo + tbl[n2].hi; + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; /* Scale by 2**k. */ if (k >= LDBL_MIN_EXP) { if (k == LDBL_MAX_EXP) - RETURNI(t * 2.0L * 0x1p16383L); + RETURNI(t * 2 * 0x1p16383L); RETURNI(t * twopk); } else { RETURNI(t * twopkp10000 * twom10000); } } + +/** + * Compute expm1l(x) for Intel 80-bit format. This is based on: + * + * PTP Tang, "Table-driven implementation of the Expm1 function + * in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18, + * 211-222 (1992). + */ + +/* + * Our T1 and T2 are chosen to be approximately the points where method + * A and method B have the same accuracy. Tang's T1 and T2 are the + * points where method A's accuracy changes by a full bit. 
For Tang, + * this drop in accuracy makes method A immediately less accurate than + * method B, but our larger INTERVALS makes method A 2 bits more + * accurate so it remains the most accurate method significantly + * closer to the origin despite losing the full bit in our extended + * range for it. + */ +static const double +T1 = -0.1659, /* ~-30.625/128 * log(2) */ +T2 = 0.1659; /* ~30.625/128 * log(2) */ + +/* + * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]: + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2 + */ +static const union IEEEl2bits +B3 = LD80C(0xaaaaaaaaaaaaaaab, -3, 1.66666666666666666671e-01L), +B4 = LD80C(0xaaaaaaaaaaaaaaac, -5, 4.16666666666666666712e-02L); + +static const double +B5 = 8.3333333333333245e-03, /* 0x1.111111111110cp-7 */ +B6 = 1.3888888888888861e-03, /* 0x1.6c16c16c16c0ap-10 */ +B7 = 1.9841269841532042e-04, /* 0x1.a01a01a0319f9p-13 */ +B8 = 2.4801587302069236e-05, /* 0x1.a01a01a03cbbcp-16 */ +B9 = 2.7557316558468562e-06, /* 0x1.71de37fd33d67p-19 */ +B10 = 2.7557315829785151e-07, /* 0x1.27e4f91418144p-22 */ +B11 = 2.5063168199779829e-08, /* 0x1.ae94fabdc6b27p-26 */ +B12 = 2.0887164654459567e-09; /* 0x1.1f122d6413fe1p-29 */ + +long double +expm1l(long double x) +{ + union IEEEl2bits u, v; + long double fn, hx2_hi, hx2_lo, q, r, r1, r2, t, twomk, twopk, x_hi; + long double x_lo, x2, z; + long double x4; + int k, n, n2; + uint16_t hx, ix; + + /* Filter out exceptional cases. */ + u.e = x; + hx = u.xbits.expsign; + ix = hx & 0x7fff; + if (ix >= BIAS + 6) { /* |x| >= 64 or x is NaN */ + if (ix == BIAS + LDBL_MAX_EXP) { + if (hx & 0x8000) /* x is -Inf, -NaN or unsupported */ + return (-1 / x - 1); + return (x + x); /* x is +Inf, +NaN or unsupported */ + } + if (x > o_threshold) + return (huge * huge); + /* + * expm1l() never underflows, but it must avoid + * unrepresentable large negative exponents. 
We used a + * much smaller threshold for large |x| above than in + * expl() so as to handle not so large negative exponents + * in the same way as large ones here. + */ + if (hx & 0x8000) /* x <= -64 */ + return (tiny - 1); /* good for x < -65ln2 - eps */ + } + + ENTERI(); + + if (T1 < x && x < T2) { + if (ix < BIAS - 64) { /* |x| < 0x1p-64 (includes pseudos) */ + /* x (rounded) with inexact if x != 0: */ + RETURNI(x == 0 ? x : + (0x1p100 * x + fabsl(x)) * 0x1p-100); + } + + x2 = x * x; + x4 = x2 * x2; + + q = x4 * (x2 * (x4 * + (x2 * B12 + (x * B11 + B10)) + + (x2 * (x * B9 + B8) + (x * B7 + B6))) + + (x * B5 + B4.e)) + x2 * x * B3.e; + + x_hi = (float)x; + x_lo = x - x_hi; + hx2_hi = x_hi * x_hi / 2; + hx2_lo = x_lo * (x + x_hi) / 2; + if (ix >= BIAS - 7) + RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi)); + else + RETURNI(hx2_lo + q + hx2_hi + x); + } + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + fn = x * INV_L + 0x1.8p63 - 0x1.8p63; +#if defined(HAVE_EFFICIENT_IRINTL) + n = irintl(fn); +#elif defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif + n2 = (unsigned)n % INTERVALS; + k = n >> LOG2_INTERVALS; + r1 = x - fn * L1; + r2 = fn * -L2; + r = r1 + r2; + + /* Prepare scale factor. */ + v.e = 1; + v.xbits.expsign = BIAS + k; + twopk = v.e; + + /* + * Evaluate lower terms of + * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). 
+ */ + z = r * r; + q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6; + + t = (long double)tbl[n2].lo + tbl[n2].hi; + + if (k == 0) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 1); + RETURNI(t); + } + + if (k == -1) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 2); + RETURNI(t / 2); + } + + if (k < -7) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + RETURNI(t * twopk - 1); + } + + if (k > 2 * LDBL_MANT_DIG - 1) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + if (k == LDBL_MAX_EXP) + RETURNI(t * 2 * 0x1p16383L - 1); + RETURNI(t * twopk - 1); + } + + v.xbits.expsign = BIAS - k; + twomk = v.e; + if (k > LDBL_MANT_DIG - 1) + t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi; + else + t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk); + RETURNI(t * twopk); +} Index: ld128/s_expl.c =================================================================== --- ld128/s_expl.c (revision 251067) +++ ld128/s_expl.c (working copy) @@ -1,5 +1,5 @@ /*- - * Copyright (c) 2012 Steven G. Kargl + * Copyright (c) 2009-2012 Steven G. Kargl * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -22,6 +22,8 @@ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * Optimized by Bruce D. Evans. 
*/ #include @@ -38,35 +40,56 @@ #include "math_private.h" #define INTERVALS 128 +#define LOG2_INTERVALS 7 #define BIAS (LDBL_MAX_EXP - 1) +static const long double +huge = 0x1p10000L, +twom10000 = 0x1p-10000L; +/* XXX Prevent gcc from erroneously constant folding this: */ static volatile const long double tiny = 0x1p-10000L; static const long double -INV_L = 1.84664965233787316142070359168242182e+02L, -L1 = 5.41521234812457272982212595914567508e-03L, -L2 = -1.02536706388947310094527932552595546e-29L, -huge = 0x1p10000L, -o_threshold = 11356.523406294143949491931077970763428L, -twom10000 = 0x1p-10000L, +/* log(2**16384 - 0.5) rounded towards zero: */ +/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */ +o_threshold = 11356.523406294143949491931077970763428L, +/* log(2**(-16381-64-1)) rounded towards zero: */ u_threshold = -11433.462743336297878837243843452621503L; +static const double +/* + * ln2/INTERVALS = L1+L2 (hi+lo decomposition for multiplication). L1 must + * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest + * bits zero so that multiplication of it by n is exact. 
+ */ +INV_L = 1.8466496523378731e+2, /* 0x171547652b82fe.0p-45 */ +L2 = -1.0253670638894731e-29; /* -0x1.9ff0342542fc3p-97 */ static const long double -P2 = 5.00000000000000000000000000000000000e-1L, -P3 = 1.66666666666666666666666666666666972e-1L, -P4 = 4.16666666666666666666666666653708268e-2L, -P5 = 8.33333333333333333333333315069867254e-3L, -P6 = 1.38888888888888888888996596213795377e-3L, -P7 = 1.98412698412698412718821436278644414e-4L, -P8 = 2.48015873015869681884882576649543128e-5L, -P9 = 2.75573192240103867817876199544468806e-6L, -P10 = 2.75573236172670046201884000197885520e-7L, -P11 = 2.50517544183909126492878226167697856e-8L; +/* 0x1.62e42fefa39ef35793c768000000p-8 */ +L1 = 5.41521234812457272982212595914567508e-03L; +static const long double +/* + * Domain [-0.002708, 0.002708], range ~[-2.4011e-38, 2.4244e-38]: + * |exp(x) - p(x)| < 2**-124.9 + * (0.002708 is ln2/(2*INTERVALS) rounded up a little). + */ +A2 = 0.5, +A3 = 1.66666666666666666666666666651085500e-01L, +A4 = 4.16666666666666666666666666425885320e-02L, +A5 = 8.33333333333333333334522877160175842e-03L, +A6 = 1.38888888888888888889971139751596836e-03L; + +static const double +A7 = 1.9841269841269471e-04, +A8 = 2.4801587301585284e-05, +A9 = 2.7557324277411234e-06, +A10 = 2.7557333722375072e-07; + static const struct { long double hi; long double lo; -} s[INTERVALS] = { +} tbl[INTERVALS] = { 0x1p0L, 0x0p0L, 0x1.0163da9fb33356d84a66aep0L, 0x3.36dcdfa4003ec04c360be2404078p-92L, 0x1.02c9a3e778060ee6f7cacap0L, 0x4.f7a29bde93d70a2cabc5cb89ba10p-92L, @@ -201,9 +224,10 @@ expl(long double x) { union IEEEl2bits u, v; - long double fn, r, r1, r2, q, t, twopk, twopkp10000; + long double q, r, r1, t, twopk, twopkp10000; + double dr, fn, r2; int k, n, n2; - uint32_t hx, ix; + uint16_t hx, ix; /* Filter out exceptional cases. 
*/ u.e = x; @@ -211,31 +235,36 @@ ix = hx & 0x7fff; if (ix >= BIAS + 13) { /* |x| >= 8192 or x is NaN */ if (ix == BIAS + LDBL_MAX_EXP) { - if (hx & 0x8000 && u.xbits.manh == 0 && - u.xbits.manl == 0) - return (0.0L); /* x is -Inf */ - return (x + x); /* x is +Inf or NaN */ + if (hx & 0x8000) /* x is -Inf or -NaN */ + return (-1 / x); + return (x + x); /* x is +Inf or +NaN */ } if (x > o_threshold) return (huge * huge); if (x < u_threshold) return (tiny * tiny); - } else if (ix < BIAS - 115) { /* |x| < 0x1p-115 */ - if (huge + x > 1.0L) /* trigger inexact iff x != 0 */ - return (1.0L + x); + } else if (ix < BIAS - 114) { /* |x| < 0x1p-114 */ + return (1 + x); /* 1 with inexact iff x != 0 */ } - /* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */ - fn = x * INV_L + 0x1.8p112 - 0x1.8p112; - n = (int)fn; + ENTERI(); + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52; + r = x - fn * L1 - fn * L2; /* r = r1 + r2 done independently. */ +#if defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif n2 = (unsigned)n % INTERVALS; - k = (n - n2) / INTERVALS; + k = n >> LOG2_INTERVALS; r1 = x - fn * L1; - r2 = -fn * L2; + r2 = fn * -L2; /* Prepare scale factors. */ - v.xbits.manh = 0; - v.xbits.manl = 0; + v.e = 1; if (k >= LDBL_MIN_EXP) { v.xbits.expsign = BIAS + k; twopk = v.e; @@ -244,18 +273,223 @@ twopkp10000 = v.e; } - r = r1 + r2; - q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 + - r * (P8 + r * (P9 + r * (P10 + r * P11))))))))); - t = s[n2].lo + s[n2].hi; - t = s[n2].hi + (s[n2].lo + t * (r2 + q + r1)); + /* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */ + dr = r; + q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 + + dr * (A7 + dr * (A8 + dr * (A9 + dr * A10)))))))); + t = tbl[n2].lo + tbl[n2].hi; + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; /* Scale by 2**k. 
*/ if (k >= LDBL_MIN_EXP) { if (k == LDBL_MAX_EXP) - return (t * 2.0L * 0x1p16383L); - return (t * twopk); + RETURNI(t * 2 * 0x1p16383L); + RETURNI(t * twopk); } else { - return (t * twopkp10000 * twom10000); + RETURNI(t * twopkp10000 * twom10000); } } + +/* + * Our T1 and T2 are chosen to be approximately the points where method + * A and method B have the same accuracy. Tang's T1 and T2 are the + * points where method A's accuracy changes by a full bit. For Tang, + * this drop in accuracy makes method A immediately less accurate than + * method B, but our larger INTERVALS makes method A 2 bits more + * accurate so it remains the most accurate method significantly + * closer to the origin despite losing the full bit in our extended + * range for it. + */ +static const double +T1 = -0.1659, /* ~-30.625/128 * log(2) */ +T2 = 0.1659; /* ~30.625/128 * log(2) */ + +/* + * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2]. + * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear + * in both subintervals, so set T3 = 2**-5, which places the condition + * into the [T1:T3] interval. + */ +static const double +T3 = 0.03125; + +/* + * XXX Estimated range is for absolute error. 
+ * Domain [-0.1659, 0.03125], range ~[-1.8933e-38, 1.8943e-38]: + * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-125.3 + */ +static const long double +C3 = 1.66666666666666666666666666666666667e-01L, +C4 = 4.16666666666666666666666666666666645e-02L, +C5 = 8.33333333333333333333333333333371638e-03L, +C6 = 1.38888888888888888888888888891188658e-03L, +C7 = 1.98412698412698412698412697235950394e-04L, +C8 = 2.48015873015873015873015112487849040e-05L, +C9 = 2.75573192239858906525606685484412005e-06L, +C10 = 2.75573192239858906612966093057020362e-07L, +C11 = 2.50521083854417203619031960151253944e-08L, +C12 = 2.08767569878679576457272282566520649e-09L, +C13 = 1.60590438367252471783548748824255707e-10L; + +static const double +C14 = 1.1470745580491932e-11, /* 0x1.93974a81dae3p-37 */ +C15 = 7.6471620181090468e-13, /* 0x1.ae7f3820adab1p-41 */ +C16 = 4.7793721460260450e-14, /* 0x1.ae7cd18a18eacp-45 */ +C17 = 2.8074757356658877e-15, /* 0x1.949992a1937d9p-49 */ +C18 = 1.4760610323699476e-16; /* 0x1.545b43aabfbcdp-53 */ + +/* + * XXX Estimated range is for absolute error. 
+ * Domain [0.03125, 0.1659], range ~[-2.7597e-38, 2.7602e-38]: + * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-124.8 + */ +static const long double +D3 = 1.66666666666666666666666666666682245e-01L, +D4 = 4.16666666666666666666666666634228324e-02L, +D5 = 8.33333333333333333333333364022244481e-03L, +D6 = 1.38888888888888888888887138722762072e-03L, +D7 = 1.98412698412698412699085805424661471e-04L, +D8 = 2.48015873015873015687993712101479612e-05L, +D9 = 2.75573192239858944101036288338208042e-06L, +D10 = 2.75573192239853161148064676533754048e-07L, +D11 = 2.50521083855084570046480450935267433e-08L, +D12 = 2.08767569819738524488686318024854942e-09L, +D13 = 1.60590442297008495301927448122499313e-10L; + +static const double +D14 = 1.1470726176204336e-11, /* 0x1.93971dc395d9ep-37 */ +D15 = 7.6478532249581686e-13, /* 0x1.ae892e3D16fcep-41 */ +D16 = 4.7628892832607741e-14, /* 0x1.ad00Dfe41feccp-45 */ +D17 = 3.0524857220358650e-15; /* 0x1.D7e8d886Df921p-49 */ + +long double +expm1l(long double x) +{ + union IEEEl2bits u, v; + long double hx2_hi, hx2_lo, q, r, r1, t, twomk, twopk, x_hi; + long double x_lo, x2; + double dr, dx, fn, r2; + int k, n, n2; + uint16_t hx, ix; + + /* Filter out exceptional cases. */ + u.e = x; + hx = u.xbits.expsign; + ix = hx & 0x7fff; + if (ix >= BIAS + 7) { /* |x| >= 128 or x is NaN */ + if (ix == BIAS + LDBL_MAX_EXP) { + if (hx & 0x8000) /* x is -Inf or -NaN */ + return (-1 / x - 1); + return (x + x); /* x is +Inf or +NaN */ + } + if (x > o_threshold) + return (huge * huge); + /* + * expm1l() never underflows, but it must avoid + * unrepresentable large negative exponents. We used a + * much smaller threshold for large |x| above than in + * expl() so as to handle not so large negative exponents + * in the same way as large ones here. 
+ */ + if (hx & 0x8000) /* x <= -128 */ + return (tiny - 1); /* good for x < -114ln2 - eps */ + } + + ENTERI(); + + if (T1 < x && x < T2) { + if (ix < BIAS - 113) { /* |x| < 0x1p-113 */ + /* x (rounded) with inexact if x != 0: */ + RETURNI(x == 0 ? x : + (0x1p200 * x + fabsl(x)) * 0x1p-200); + } + + x2 = x * x; + dx = x; + + if (x < T3) { + q = x * x2 * C3 + x2 * x2 * (C4 + x * (C5 + x * (C6 + + x * (C7 + x * (C8 + x * (C9 + x * (C10 + + x * (C11 + x * (C12 + x * (C13 + + dx * (C14 + dx * (C15 + dx * (C16 + + dx * (C17 + dx * C18)))))))))))))); + } else { + q = x * x2 * D3 + x2 * x2 * (D4 + x * (D5 + x * (D6 + + x * (D7 + x * (D8 + x * (D9 + x * (D10 + + x * (D11 + x * (D12 + x * (D13 + + dx * (D14 + dx * (D15 + dx * (D16 + + dx * D17))))))))))))); + } + + x_hi = (float)x; + x_lo = x - x_hi; + hx2_hi = x_hi * x_hi / 2; + hx2_lo = x_lo * (x + x_hi) / 2; + if (ix >= BIAS - 7) + RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi)); + else + RETURNI(hx2_lo + q + hx2_hi + x); + } + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52; +#if defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif + n2 = (unsigned)n % INTERVALS; + k = n >> LOG2_INTERVALS; + r1 = x - fn * L1; + r2 = fn * -L2; + r = r1 + r2; + + /* Prepare scale factor. */ + v.e = 1; + v.xbits.expsign = BIAS + k; + twopk = v.e; + + /* + * Evaluate lower terms of + * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). 
+ */ + dr = r; + q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 + + dr * (A7 + dr * (A8 + dr * (A9 + dr * A10)))))))); + + t = tbl[n2].lo + tbl[n2].hi; + + if (k == 0) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 1); + RETURNI(t); + } + + if (k == -1) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 2); + RETURNI(t / 2); + } + + if (k < -7) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + RETURNI(t * twopk - 1); + } + + if (k > 2 * LDBL_MANT_DIG - 1) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + if (k == LDBL_MAX_EXP) + RETURNI(t * 2 * 0x1p16383L - 1); + RETURNI(t * twopk - 1); + } + + v.xbits.expsign = BIAS - k; + twomk = v.e; + + if (k > LDBL_MANT_DIG - 1) + t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi; + else + t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk); + RETURNI(t * twopk); +} From owner-freebsd-numerics@FreeBSD.ORG Wed May 29 20:25:41 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A12315BB for ; Wed, 29 May 2013 20:25:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 293A478D for ; Wed, 29 May 2013 20:25:41 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id C1B963C187A; Thu, 30 May 2013 06:25:32 +1000 (EST) Date: Thu, 30 May 2013 06:25:31 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: Patches for s_expl.c In-Reply-To: <20130529162441.GA58773@troutmask.apl.washington.edu> Message-ID: <20130530045951.Y4776@besplex.bde.org> References: <20130528172242.GA51485@troutmask.apl.washington.edu> 
<20130529062437.V4648@besplex.bde.org> <20130529162441.GA58773@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=e4Ne0tV/ c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10 a=L3_B_2Seth8ID6XF1HAA:9 a=CjuIK1q_8ugA:10 a=qSAIOg-s5ZBGxsML:21 a=vBbP7Dv9lfjiZ7nx:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-numerics@freebsd.org, Bruce Evans X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 May 2013 20:25:41 -0000 On Wed, 29 May 2013, Steve Kargl wrote: > On Wed, May 29, 2013 at 07:39:04AM +1000, Bruce Evans wrote: >> On Tue, 28 May 2013, Steve Kargl wrote: >> >>> Here are two patches for ld80/s_expl.c and ld128/s_expl.c. >>> Instead of committing the one large patch that I have spent >>> hours testing, I have split it into two. One patch fixes/updates >>> expl(). The other patch is the implementation of expm1l(). > ... >>> * Rename polynomial coefficients P# to A#, which is used in Tang. >> >> Almost all the declarations polynomial coefficients are still formatted >> in a nonstandard way, but differently than in previous development >> versions. I keep sending you patches for this. > > Hopefully, fixed. All fancy whitespace has been removed including > in comments with hex values. Er, I asked for them to be formatted in a standard way. This has whitespace for minus signs, since that lines up things better and its too hard to avoid it when using printf() to format tables. Removing it gives much larger diffs than before (although I merged a few of the regressions, I didn't merge them when the formatting was already standard). >>> * Compute expm1l(x) for IEEE 754 128-bit format. 
>> >> There is a fairly large bug in this, from only merging half of the >> most recent micro-optimization in the development version of the ld80 >> version. This might only be an efficiency bug, but I haven't tested >> the ld128 version with either the full merge or the half merge. >> >> The ld128 version still has excessive optimizations for |x| near 0. >> It uses a slightly different high-degree polynomial on each side of >> 0. The ld80 version uses the same poly on each side. Most of the >> style bugs in the 4 exp[!2]l functions are in the coeffs for the >> polys on each side. I haven't tried so hard to get you to fix them >> since I want to remove them. > > Hopefully, fixed to the extent that opened ld80/s_expl.c in one > nedit window and ld128/s_expl.c in another. I copied everything > from ld80 to ld128 except of course literal constants and > polynomials that must be different. Seems to be fixed (matches my version). I have barely started testing my version of it on sparc64. >> There are some minor style regressions relative to previous development >> versions outside of poly coeffs. Patches later. > > I'm sure you're going to hate the new patch at the end. Mainly more whitespace regressions :-). Several non-style regressions for ld128. > All coefficient are now formatted with the form: > > A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ > > ie., 1 space before and 1 space after =. The space in the comments > for the implicit + sign has been removed. As I mentioned above, this is nonstandard and requires manual editing to mess up automatically formatted tables. I used to print the tables not very carefully and had to do lots of editing to match the style in the source. I got tired of this and changed the printing routines to prettyprint in a standard format with all the necessary C syntax so that I could copy whole tables to the source file. 
Signs may or may not be required and it is easiest to always leave
space for them in the standard format and never edit this to add or
remove spaces for them.

>> Some patches relative to my version now instead of later:
> ...
>> @ @@ -83,9 +83,9 @@
>> @ * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
>> @ */
>> @ -A2 = 0.5,
>> @ -A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */
>> @ -A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */
>> @ -A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */
>> @ -A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */
>> @ +A2 = 0.5,
>> @ +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */
>> @ +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */
>> @ +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */
>> @ +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */
>> @
>> @ /*
>>
>> Fix regressions relative to a previous development version.
>
> I made this conform to style(9).

style(9) only implicitly says not to use fancy formatting for
assignments. indent(1) cannot preserve fancy formatting for assignments
or even be directed how to format assignments. However, we were
intentionally using fancy formatting, and the above was one of the few
places in s_expl.c where it was done consistently (after backing out
regressions). You got the standard fancy formatting by copying one of my
automatically generated sets of coeffs when the coeff names were P[2-6].

>> @ #endif
>> @ n2 = (unsigned)n % INTERVALS;
>> @ + /* Depend on the sign bit being propagated: */
>> @ k = n >> LOG2_INTERVALS;
>> @ r1 = x - fn * L1;
>>
>> I think a comment is needed. This micro-optimization was merged from
>> s_exp2*.c, where it is commented on more prominently for the long
>> double versions only.
>
> Ignored adding a comment.

It will be in future diffs.
>> @ @@ -389,4 +409,9 @@
>> @ x4 = x2 * x2;
>> @ q = x4 * (x2 * (x4 *
>> @ + /*
>> @ + * XXX the number of terms is no longer good for
>> @ + * pairwise grouping of all except B3, and the
>> @ + * grouping is no longer from highest down.
>> @ + */
>> @ (x2 * B12 + (x * B11 + B10)) +
>> @ (x2 * (x * B9 + B8) + (x * B7 + B6))) +
>
> I left this as-is with whitespace and did not add the comment.
> This should be the only place where there is a substantial
> deviation from style(9).

The comment is a reminder to fix the grouping of terms.

> After making the changes, current unscientific testing gives
> (best viewed in a 95 column window):
> ...
> expm1l
>
> Timing:
>                            1M 10M 100M
> i386    [-64.0000:-0.1659] 0.435783 4.342621 43.41397

Hmm, only slow on i386. It's still fast for me. Now tested on Athlon64
and core2.

> i386    [ -0.1659: 0.1659] 0.082880 0.829142 8.28948
> i386    [ 0.1659:11356.0] 0.110590 1.096098 10.96253
> amd64   [-64.0000:-0.1659] 0.066751 0.648734 6.46649
> amd64   [ -0.1659: 0.1659] 0.061531 0.614824 6.14377
> amd64   [ 0.1659:11356.0] 0.071677 0.716927 7.16819
> sparc64 [-113.000:-0.1659] 37.84224
> sparc64 [ -0.1659: 0.1659] 66.28533
> sparc64 [ 0.1659:11356.0] 41.20714
> ...
> Testing on flame is excruciatingly slow especially because rdivacky
> is building clang.

I handle the normal slowness by reducing the number of tests by a factor
of 100 for long double precision on sparc64. rdivacky only gave another
factor of 3 slowness :-).

> Yes, the following is one massive patch. Easier to apply that way.
> Index: ld80/s_expl.c
> ...
> Index: ld128/s_expl.c
> ...

Too hard to see or describe regressions in these because they are
relative to an old version.
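
[Editorial sketch, not part of the original message.] The XXX comment
quoted earlier in this message is about how the polynomial terms are
grouped. As a rough illustration only, the C below contrasts Horner's
rule with pairwise grouping from the highest terms down, for a
polynomial with an even number of terms. The coefficients c[] (1/k!)
and the names poly_horner(), poly_pairwise() and poly_check() are
hypothetical; they are not the B# coefficients or any code from
s_expl.c.

```c
#include <assert.h>

/* Hypothetical coefficients (1/k! for k = 0..7), NOT the B# values. */
static const double c[8] = {
	1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24, 1.0 / 120, 1.0 / 720,
	1.0 / 5040
};

/* Horner's rule: one long dependency chain of multiply-adds. */
static double
poly_horner(double x)
{
	double p = c[7];
	int i;

	for (i = 6; i >= 0; i--)
		p = p * x + c[i];
	return (p);
}

/*
 * Pairwise grouping from the highest terms down.  The four pairs
 * (c[2i+1] * x + c[2i]) do not depend on each other, so a pipelined
 * CPU can overlap them.  This only works out evenly when the number
 * of terms is even.
 */
static double
poly_pairwise(double x)
{
	double x2 = x * x;
	double x4 = x2 * x2;

	return (x4 * (x2 * (c[7] * x + c[6]) + (c[5] * x + c[4])) +
	    (x2 * (c[3] * x + c[2]) + (c[1] * x + c[0])));
}

/* Both groupings compute the same polynomial, up to rounding. */
static void
poly_check(double x)
{
	double d = poly_horner(x) - poly_pairwise(x);

	assert(d > -1e-14 && d < 1e-14);
}
```

Calling poly_check() for a few x in [-0.1659, 0.1659] should pass, since
the two forms differ only in rounding. Which grouping is faster or more
accurate for the real coefficients is a separate question that this
sketch does not answer.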
Here are my current diffs for ld80: @ --- z22/s_expl.c Thu May 30 03:56:37 2013 @ +++ ./s_expl.c Thu May 30 04:15:33 2013 @ @@ -63,5 +63,5 @@ @ /* log(2**16384 - 0.5) rounded towards zero: */ @ /* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */ @ -o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L), @ +o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L), @ #define o_threshold (o_thresholdu.e) @ /* log(2**(-16381-64-1)) rounded towards zero: */ @ @@ -75,6 +75,6 @@ @ * bits zero so that multiplication of it by n is exact. @ */ @ -INV_L = 1.8466496523378731e+2, /* 0x171547652b82fe.0p-45 */ @ -L1 = 5.4152123484527692e-3, /* 0x162e42ff000000.0p-60 */ @ +INV_L = 1.8466496523378731e+2, /* 0x171547652b82fe.0p-45 */ @ +L1 = 5.4152123484527692e-3, /* 0x162e42ff000000.0p-60 */ @ L2 = -3.2819649005320973e-13, /* -0x1718432a1b0e26.0p-94 */ @ /* @ @@ -83,9 +83,9 @@ @ * (0.002708 is ln2/(2*INTERVALS) rounded up a little). @ */ @ -A2 = 0.5, @ -A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ @ -A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ @ -A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ @ -A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ @ +A2 = 0.5, @ +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ @ +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ @ +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ @ +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ @ @ /* As in previous version, with the diff larger since I didn't merge whitespace changes that are regressions unless the formatting was already mostly wrong. @ @@ -273,4 +281,5 @@ @ #endif @ n2 = (unsigned)n % INTERVALS; @ + /* Depend on the sign bit being propagated: */ @ k = n >> LOG2_INTERVALS; @ r1 = x - fn * L1; As in previous version. 
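
[Editorial sketch, not part of the original message.] The comment added
in the hunk above concerns the split of n into a table index n2 and an
exponent k. A minimal C illustration follows, assuming a compiler whose
>> on a negative int is an arithmetic shift (C leaves this
implementation-defined, which is why the comment is wanted). The
function names are hypothetical; only INTERVALS, LOG2_INTERVALS and the
three expressions come from the diffs.

```c
#include <assert.h>

#define	INTERVALS	128
#define	LOG2_INTERVALS	7

/* Table index in [0, INTERVALS): the low LOG2_INTERVALS bits of n. */
static int
table_index(int n)
{
	return ((unsigned)n % INTERVALS);
}

/*
 * Exponent k = floor(n / INTERVALS).  For negative n this depends on
 * >> propagating the sign bit, which C leaves implementation-defined.
 */
static int
exponent_shift(int n)
{
	return (n >> LOG2_INTERVALS);
}

/* The older form that the shift replaced: k = (n - n2) / INTERVALS. */
static int
exponent_divide(int n)
{
	return ((n - table_index(n)) / INTERVALS);
}

/* Check n == k * INTERVALS + n2 and that both forms of k agree. */
static void
check_split(int n)
{
	int k = exponent_shift(n);
	int n2 = table_index(n);

	assert(n2 >= 0 && n2 < INTERVALS);
	assert(k * INTERVALS + n2 == n);
	assert(k == exponent_divide(n));
}
```

For example, n = -1 gives n2 = 127 and k = -1 under that assumption, so
k*INTERVALS + n2 reproduces n with a non-negative table index.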
@ @@ -323,9 +332,19 @@
@ static const double
@ T1 = -0.1659, /* ~-30.625/128 * log(2) */
@ -T2 = 0.1659; /* ~30.625/128 * log(2) */
@ +T2 = 0.1659; /* ~30.625/128 * log(2) */
@
@ /*
@ - * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]:
@ - * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2
@ + * Domain [-0.1659, 0.1659], range ~[-2.6155e-22, 2.5507e-23]:
@ + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.6
@ + *
@ + * XXX the coeffs aren't very carefully rounded, and I get 4.5 more bits,
@ + * but unlike for ld128 we can't drop any terms.
@ + *
@ + * XXX this still isn't in standard format:
@ + * - extra digits in exponents for decimal values
@ + * - no spaces to line up equals signs (a new regression)
@ + * - no space for a (not present) minus sign in either the decimal or hex
@ + * values (a new regression for the LD80C hex values)
@ + * - perhaps they are impossible for double values
@ */
@ static const union IEEEl2bits

Mostly as in previous version. I merged a lot of whitespace regressions
here and only added comments saying that there is more to fix now.

@ @@ -387,6 +408,10 @@
@ x2 = x * x;
@ x4 = x2 * x2;
@ -

I didn't merge a new whitespace regression.

@ q = x4 * (x2 * (x4 *
@ + /*
@ + * XXX the number of terms is no longer good for
@ + * pairwise grouping of all except B3, and the
@ + * grouping is no longer from highest down.
@ + */
@ (x2 * B12 + (x * B11 + B10)) +
@ (x2 * (x * B9 + B8) + (x * B7 + B6))) +

As in previous version.

@ @@ -434,22 +459,21 @@
@
@ if (k == 0) {
@ - t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
@ - (tbl[n2].hi - 1);
@ + t = SUM2P(tbl[n2].hi - 1, tbl[n2].lo * (r1 + 1) + t * q +
@ + tbl[n2].hi * r1);
@ RETURNI(t);
@ }
@ -

You don't want most of this, but there is still an extra blank line
here, as in previous version.
@ if (k == -1) {
@ - t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 +
@ - (tbl[n2].hi - 2);
@ + t = SUM2P(tbl[n2].hi - 2, tbl[n2].lo * (r1 + 1) + t * q +
@ + tbl[n2].hi * r1);
@ RETURNI(t / 2);
@ }
@
@ if (k < -7) {
@ - t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
@ + t = SUM2P(tbl[n2].hi, tbl[n2].lo + t * (q + r1));
@ RETURNI(t * twopk - 1);
@ }
@
@ if (k > 2 * LDBL_MANT_DIG - 1) {
@ - t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi;
@ + t = SUM2P(tbl[n2].hi, tbl[n2].lo + t * (q + r1));
@ if (k == LDBL_MAX_EXP)
@ RETURNI(t * 2 * 0x1p16383L - 1);
@ @@ -459,8 +483,9 @@
@ v.xbits.expsign = BIAS - k;
@ twomk = v.e;
@ +

You don't want most of this, but there is now a missing blank line here.
Apparently the extra blank line above was removed here. (The
initialization of twomk was intentionally separated from its use since
the initialization is somewhat special although it is not commented on
like the initialization of twopk.)

@ if (k > LDBL_MANT_DIG - 1)
@ - t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi;
@ + t = SUM2P(tbl[n2].hi, tbl[n2].lo - twomk + t * (q + r1));
@ else
@ - t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk);
@ + t = SUM2P(tbl[n2].hi - twomk, tbl[n2].lo + t * (q + r1));
@ RETURNI(t * twopk);
@ }

Summary of my current diffs for ld128 (the full diffs are hard to
untangle). There are a couple of more serious regressions which these
patches reverse. No comments on formatting. No patches for things done
last year.

% --- z22/s_expl.c Thu May 30 04:21:49 2013
% +++ ./s_expl.c Thu May 30 04:59:06 2013
% ...
% @@ -252,6 +289,7 @@
% /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */
% /* Use a specialized rint() to get fn. Assume round-to-nearest. */
% + /* XXX assume no extra precision for the additions, as for trig fns. */
% + /* XXX this set of comments is now quadruplicated. */
% fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52;
% - r = x - fn * L1 - fn * L2; /* r = r1 + r2 done independently. */

This undoes a regression to the ld80 version.
Initializing r using extra operations here is an optimization for the ld80 version (really for x86 and/or OOE CPUs with fast pipelines). It is a huge pessimization to do 2 extra long double multiplications on sparc64, so it was not done. % #if defined(HAVE_EFFICIENT_IRINT) % n = irint(fn); % @@ -263,6 +301,8 @@ % r1 = x - fn * L1; % r2 = fn * -L2; % + r = r1 + r2; Finish undoing the regression. % % /* Prepare scale factors. */ % + /* XXX sparc64 multiplication is so slow that scalbnl() is faster. */ % v.e = 1; % if (k >= LDBL_MIN_EXP) { Undo the regression of losing an important optimization hint. The x86ish optimization of using a multiplication to scale is not as bad on sparc64 as the one above, but it is still so bad that scalbnl() is better. The old fdlibm scaling method should be used instead of either of these (it is a specialized scalbnl() manually inlined). % @@ -303,11 +343,20 @@ % static const double % T1 = -0.1659, /* ~-30.625/128 * log(2) */ % -T2 = 0.1659; /* ~30.625/128 * log(2) */ % +T2 = 0.1659; /* ~30.625/128 * log(2) */ % % /* % * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2]. % - * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear % + * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear % * in both subintervals, so set T3 = 2**-5, which places the condition % * into the [T1:T3] interval. % + * % + * XXX the above comment has rotted. The condition is now tested for % + * both subintervals (although with T3 nonzero it is only satisfied for % + * [T1:T3]. However, it is now even more critical for other reasons % + * that T3 not being in the middle. We now do this so that the polys % + * for each side can have almost the same degree. It may be slightly % + * misplaced, since the C poly has ended up 1 degree higher. % + * % + * XXX these micro-optimizations are excessive. 
% */ % static const double I'm not sure if the change to test the condition for both intervals is good, but it makes the first paragraph of the comment completely wrong. Bruce From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 06:46:42 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2382A99; Thu, 30 May 2013 06:46:42 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net [50.196.151.174]) by mx1.freebsd.org (Postfix) with ESMTP id E36AE298; Thu, 30 May 2013 06:46:41 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4U6kZEU091680; Wed, 29 May 2013 23:46:35 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4U6kZIJ091679; Wed, 29 May 2013 23:46:35 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Date: Wed, 29 May 2013 23:46:35 -0700 From: David Schultz To: David Chisnall Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 Message-ID: <20130530064635.GA91597@zim.MIT.EDU> References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Cc: Stephen Montgomery-Smith , pfg@freebsd.org, freebsd-standards@freebsd.org, freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 06:46:42 -0000 On Fri, Feb 22, 2013, David Chisnall wrote: > On 4 Feb 2013, at 03:52, Stephen Montgomery-Smith wrote: > > > We do really seem to have a lot of working code right now. 
And the main > > barrier to commitment seems to be style issues. > > > > For example, I have code at http://people.freebsd.org/~stephen/ for the > > complex arctrig functions. And Bruce has clog available. And > > presumably he has logl and atanl also available. > > > > The last I heard about my code is Bruce asking for some style changes. > > However I really don't think I will have time to work on it until at > > least the summer. And to be honest, style just isn't my thing. > > > > I propose (a) that someone else takes over my code (and maybe Bruce's > > code) and make the style changes, or (b) that we get a little less fussy > > about getting it all just so right and start committing stuff. > > > > Let me add that the code we have is already far superior than anything > > in Linux or NetBSD, who clearly didn't worry about huge numerical errors > > in many edge cases. Come on guys, let's start strutting our stuff. > > > > Let's commit what we have, even if it isn't perfect. > > Yes, please can this happen? We are currently on 31 test > failures in the libc++ test suite on -HEAD, of which at least 18 > are due to linker failures trying to find missing libm > functions. We are very close to having a complete C++11 > implementation, yet we are held up by the lack of C99 support, > and we are held up there by style nits? > > On behalf of core, please can we commit the existing code and > worry about the style later? Given the expertise required to > work on the libm functions, most of the people who are able to > hack on the code have already read it and so concerns about > consistency readability are somewhat misplaced. I didn't see this thread until now, but coincidentally, I just wrote tests and manpages for and committed Stephen's implementations of most of the missing double/float complex functions. 
I don't know the status of clog() or cpow(), but murray@ has a patch to port the NetBSD versions, which I'm also willing to commit given the unacceptable delays in producing something better. I was wondering if you could explain a bit about what your goal is here, though. Is there some kind of certification you are trying to achieve? Why can't you just comment out the few missing functions? You've been adamant about this issue ever since joining the Project, even suggesting that we commit bogus implementations just for the sake of having the symbols. I completely agree with you that the lack of progress is unacceptable, and I'm sorry I haven't had more time to work on this stuff myself, but I also don't understand the source of your urgency. The reason I'm asking is that I'm pushing to get a lot of stuff into the tree quickly, but realistically, in the short term we're only going to get 95% of the way there. I doubt good implementations of complicated functions that nobody uses, such as erfcl() and tgammal(), are going to appear overnight. Thus, I would like to know whether the last 5% is needed quickly, and if so, why. 
From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 13:56:27 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6565448F for ; Thu, 30 May 2013 13:56:27 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from mail-ie0-x231.google.com (mail-ie0-x231.google.com [IPv6:2607:f8b0:4001:c03::231]) by mx1.freebsd.org (Postfix) with ESMTP id 349DC370 for ; Thu, 30 May 2013 13:56:27 +0000 (UTC) Received: by mail-ie0-f177.google.com with SMTP id 9so652573iec.36 for ; Thu, 30 May 2013 06:56:27 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=IORAOblq2wOwGWSdSoX5BpqP8LFSxOX+toid05IcPvk=; b=lYqWIwnDHYoJ4UuTJ68N4QI47ZpFs0t+xQ7C/COlUSjZGPAX/FVhq5lV3xttGZiHW/ NdiqICfRTev1G2Bxbd8udgPfvCRi1zZ/eB1/4Ea4t6hKmVzI/tq4ns8ZkTy0qaGybLh1 wC2sgUS/dtZymM7+KHXimOcgprkBfr6qXMbxIB85untn6EtprFQK9avoPsySLwFiSu9e G3pzZ3ZSMso97W4X7Yc+7vKbtBq2t8hjuoev73Uw+ZslXNmM7ByY1vheiwh5qy/k0nBF Kovs+SLKupQXfx1QdOpbk2zwTW/rFLx3PggVkLqJpEBCx5CQEzWnp6eVihR5dFzGO2Bf eQ9g== X-Received: by 10.50.25.4 with SMTP id y4mr3701347igf.111.1369922186925; Thu, 30 May 2013 06:56:26 -0700 (PDT) Received: from 53.imp.bsdimp.com (50-78-194-198-static.hfc.comcastbusiness.net. 
[50.78.194.198]) by mx.google.com with ESMTPSA id ik6sm7054727igb.3.2013.05.30.06.56.25 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 30 May 2013 06:56:26 -0700 (PDT) Sender: Warner Losh Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 Mime-Version: 1.0 (Apple Message framework v1085) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <20130530064635.GA91597@zim.MIT.EDU> Date: Thu, 30 May 2013 07:56:24 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> To: David Schultz X-Mailer: Apple Mail (2.1085) X-Gm-Message-State: ALoCoQlajA6OIWWt/6Iphg9osVLVA0f4PDMjoH3Dp5asn+Mrifzw9mqvNu8CbPnhhGQwmguTAN2s Cc: Stephen Montgomery-Smith , freebsd-standards@freebsd.org, pfg@freebsd.org, David Chisnall , freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 13:56:27 -0000 On May 30, 2013, at 12:46 AM, David Schultz wrote: > On Fri, Feb 22, 2013, David Chisnall wrote: >> On 4 Feb 2013, at 03:52, Stephen Montgomery-Smith wrote: >> >>> We do really seem to have a lot of working code right now. And the main >>> barrier to commitment seems to be style issues. >>> >>> For example, I have code at http://people.freebsd.org/~stephen/ for the >>> complex arctrig functions. And Bruce has clog available. And >>> presumably he has logl and atanl also available. >>> >>> The last I heard about my code is Bruce asking for some style changes. >>> However I really don't think I will have time to work on it until at >>> least the summer. And to be honest, style just isn't my thing.
>>> >>> I propose (a) that someone else takes over my code (and maybe Bruce's >>> code) and make the style changes, or (b) that we get a little less fussy >>> about getting it all just so right and start committing stuff. >>> >>> Let me add that the code we have is already far superior than anything >>> in Linux or NetBSD, who clearly didn't worry about huge numerical errors >>> in many edge cases. Come on guys, let's start strutting our stuff. >>> >>> Let's commit what we have, even if it isn't perfect. >> >> Yes, please can this happen? We are currently on 31 test >> failures in the libc++ test suite on -HEAD, of which at least 18 >> are due to linker failures trying to find missing libm >> functions. We are very close to having a complete C++11 >> implementation, yet we are held up by the lack of C99 support, >> and we are held up there by style nits? >> >> On behalf of core, please can we commit the existing code and >> worry about the style later? Given the expertise required to >> work on the libm functions, most of the people who are able to >> hack on the code have already read it and so concerns about >> consistency readability are somewhat misplaced. > > I didn't see this thread until now, but coincidentally, I just > wrote tests and manpages for and committed Stephen's > implementations of most of the missing double/float complex > functions. I don't know the status of clog() or cpow(), but > murray@ has a patch to port the NetBSD versions, which I'm also > willing to commit given the unacceptable delays in producing > something better. I'm all for better progress... Thank you for your efforts. > I was wondering if you could explain a bit about what your goal is > here, though. Is there some kind of certification you are trying > to achieve? Why can't you just comment out the few missing > functions?
You've been adamant about this issue ever since > joining the Project, even suggesting that we commit bogus > implementations just for the sake of having the symbols. I > completely agree with you that the lack of progress is > unacceptable, and I'm sorry I haven't had more time to work on > this stuff myself, but I also don't understand the source of your > urgency. More and more projects are refusing to work around our gridlock. We have to report R each new release because they have taken out the checks for the missing symbols. It is really an embarrassment to the project. We've let the perfect be the enemy of the good. There are R scripts that run elsewhere and not on FreeBSD. R is the one I know most about since I've been using R a lot to crunch numbers for work, but there are others. The urgency is we'd like to have this stuff done for 10, if at all possible. And if not done, then a lot closer to done than where we are today. > The reason I'm asking is that I'm pushing to get a lot of stuff > into the tree quickly, but realistically, in the short term we're > only going to get 95% of the way there. I doubt good > implementations of complicated functions that nobody uses, such as > erfcl() and tgammal(), are going to appear overnight. Thus, I > would like to know whether the last 5% is needed quickly, and if > so, why. I'm all for getting everything we can into the tree that produces an answer that's not perfect, but close. What's the error that would be generated with the naive implementation of long double tgammal(long double f) { return tgamma(f); } But assuming that, for some reason, produces errors larger than the difference in precision between double and long double due to the extreme non-linearity of these functions, having only a couple of stragglers is a far better position to be in than we are today.
Warner
From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 15:41:34 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 80CE87E5 for ; Thu, 30 May 2013 15:41:34 +0000 (UTC) (envelope-from pfg@FreeBSD.org) Received: from nm1-vm1.bullet.mail.bf1.yahoo.com (nm1-vm1.bullet.mail.bf1.yahoo.com [98.139.213.163]) by mx1.freebsd.org (Postfix) with ESMTP id 32889FC3 for ; Thu, 30 May 2013 15:41:34 +0000 (UTC) Received: from [98.139.212.148] by nm1.bullet.mail.bf1.yahoo.com with NNFMP; 30 May 2013 15:41:26 -0000 Received: from [98.139.213.1] by tm5.bullet.mail.bf1.yahoo.com with NNFMP; 30 May 2013 15:41:26 -0000 Received: from [127.0.0.1] by smtp101.mail.bf1.yahoo.com with NNFMP; 30 May 2013 15:41:26 -0000 X-Yahoo-Newman-Id: 935342.47691.bm@smtp101.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: BabCFEIVM1lG9dYnW0ApjKaEQqUYGRIAOt95Of5VL7lslDw 0SRE1wwwyNyDyIYQy66ygEjPe6WvzbZGx1_14Uqt.pJG7ajhMWlfJ2Rcog3t lc0B38qJMe5sQbKyCW5TWjhnUylJjTyR5n.WRZ2EtMiFrtUTfyZ87dOKlukE U1bKMm8zbhHnofWl1A07iDouURYQuE7K6wWZTLSkq4d5IcGwJXhdLBRB8fRe QZvru99NhC6ilipyqs5L5EQ2nGXuC119FBFGtRW.lnCVIYBoPvgkC4bp6nlb yP06pOh0bLk3iPCLlMV6YesWHCGeHOc8_vey5u0YIpbtMMF5IzmZ86n9BOJi ldaaO5flRnSazS74aaymGXFqnPdu5tMZO1Uk5YKh7CM3qE45jEyuyyNBMNGB ETYNxtK9yZhCb6qj_1shzgXD3lsk9KJpkE.5V7Lx0BISJxMX8.LfzKH3SEEi v6kZhhLXqJ46cc8UiLL_rvMicLBDEPMfzvgmTSJDHUsY.ChNmmQExPlKFtVo VFAOcRTrTq.m1DjlRzF_qK8vVFnBjstRupPyp9ikT2ZhmPf40xePLB1IxMxZ yfl_PCsd1c737uQ-- X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with ) by smtp101.mail.bf1.yahoo.com with SMTP; 30 May 2013 15:41:26 +0000 UTC Message-ID: <51A77324.2070702@FreeBSD.org> Date: Thu, 30 May 2013 10:41:24 -0500 From: Pedro Giffuni User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: David Schultz Subject: Re: 
standards/175811: libstdc++ needs complex support in order use C99 References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> In-Reply-To: <20130530064635.GA91597@zim.MIT.EDU> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Stephen Montgomery-Smith , freebsd-standards@freebsd.org, freebsd-numerics@freebsd.org, David Chisnall X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 15:41:34 -0000 On 30.05.2013 01:46, David Schultz wrote: > On Fri, Feb 22, 2013, David Chisnall wrote: >> On 4 Feb 2013, at 03:52, Stephen Montgomery-Smith wrote: >> >>> We do really seem to have a lot of working code right now. And the main >>> barrier to commitment seems to be style issues. >>> >>> For example, I have code at http://people.freebsd.org/~stephen/ for the >>> complex arctrig functions. And Bruce has clog available. And >>> presumably he has logl and atanl also available. >>> >>> The last I heard about my code is Bruce asking for some style changes. >>> However I really don't think I will have time to work on it until at >>> least the summer. And to be honest, style just isn't my thing. >>> >>> I propose (a) that someone else takes over my code (and maybe Bruce's >>> code) and make the style changes, or (b) that we get a little less fussy >>> about getting it all just so right and start committing stuff. >>> >>> Let me add that the code we have is already far superior than anything >>> in Linux or NetBSD, who clearly didn't worry about huge numerical errors >>> in many edge cases. Come on guys, let's start strutting our stuff. >>> >>> Let's commit what we have, even if it isn't perfect. >> Yes, please can this happen? 
We are currently on 31 test >> failures in the libc++ test suite on -HEAD, of which at least 18 >> are due to linker failures trying to find missing libm >> functions. We are very close to having a complete C++11 >> implementation, yet we are held up by the lack of C99 support, >> and we are held up there by style nits? >> >> On behalf of core, please can we commit the existing code and >> worry about the style later? Given the expertise required to >> work on the libm functions, most of the people who are able to >> hack on the code have already read it and so concerns about >> consistency readability are somewhat misplaced. > I didn't see this thread until now, but coincidentally, I just > wrote tests and manpages for and committed Stephen's > implementations of most of the missing double/float complex > functions. I don't know the status of clog() or cpow(), but > murray@ has a patch to port the NetBSD versions, which I'm also > willing to commit given the unacceptable delays in producing > something better. Thank you !! > I was wondering if you could explain a bit about what your goal is > here, though. Is there some kind of certification you are trying > to achieve? Why can't you just comment out the few missing > functions? You've been adamant about this issue ever since > joining the Project, even suggesting that we commit bogus > implementations just for the sake of having the symbols. I > completely agree with you that the lack of progress is > unacceptable, and I'm sorry I haven't had more time to work on > this stuff myself, but I also don't understand the source of your > urgency. What I am finding rather disappointing is that our libstdc++ lacks so many features relative to what developers used to Linux expect. I think it's reasonable to think that libc++ will require the same features as modern libstdc++ to support a quality port.
In addition to R, the current situation also has undesirable effects in Boost, where we don't support long double (never mind the bogus patch in our ports tree). If we can just get our local libstdc++ to use C99, that would be an advance. The target at this time would be resolving standards/175811, and it would also be interesting to see what the upstream gcc/libstdc++ requires. > The reason I'm asking is that I'm pushing to get a lot of stuff > into the tree quickly, but realistically, in the short term we're > only going to get 95% of the way there. I doubt good > implementations of complicated functions that nobody uses, such as > erfcl() and tgammal(), are going to appear overnight. Thus, I > would like to know whether the last 5% is needed quickly, and if > so, why. I may be wrong, but with long double support, people who need erfcl() and tgamma() can get them from Boost. The problem is therefore not implementing everything but getting enough to turn on the features supported by libstdc++ and Boost. Pedro. 
From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 16:27:29 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7A2D0153 for ; Thu, 30 May 2013 16:27:29 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 469F935B for ; Thu, 30 May 2013 16:27:29 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4UGRNVo067068; Thu, 30 May 2013 09:27:23 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4UGRNTQ067067; Thu, 30 May 2013 09:27:23 -0700 (PDT) (envelope-from sgk) Date: Thu, 30 May 2013 09:27:23 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: Patches for s_expl.c Message-ID: <20130530162723.GB66755@troutmask.apl.washington.edu> References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> <20130529162441.GA58773@troutmask.apl.washington.edu> <20130530045951.Y4776@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130530045951.Y4776@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 16:27:29 -0000 On Thu, May 30, 2013 at 06:25:31AM +1000, Bruce Evans wrote: > On Wed, 29 May 2013, Steve Kargl wrote: > > > Yes, the following is one massive patch. > > Easier to apply that way. 
> OK, I've restored whitespace to hopefully match your expectations. Removed excess digits in exponents (e.g., 1.234e08 --> 1.234e8). Restored XXX comments. Removed (unnecessary?) blank lines. Restored the order of computing r = r1 + r2 in ld128. Moved the |x| < 0x1p-113 if-block back into the [T1:T3] interval. Final questions. What is your preference for committing expm1l? Should it be included in s_expl.c or should I use 'svn cp' to copy s_expl.c to s_expm1l.c and add the implementation of expm1l to the copied version? -- Steve Index: ld80/s_expl.c =================================================================== --- ld80/s_expl.c (revision 251067) +++ ld80/s_expl.c (working copy) @@ -29,7 +29,7 @@ #include __FBSDID("$FreeBSD$"); -/*- +/** * Compute the exponential of x for Intel 80-bit format. This is based on: * * PTP Tang, "Table-driven implementation of the exponential function @@ -50,6 +50,7 @@ #include "math_private.h" #define INTERVALS 128 +#define LOG2_INTERVALS 7 #define BIAS (LDBL_MAX_EXP - 1) static const long double @@ -60,9 +61,12 @@ static const union IEEEl2bits /* log(2**16384 - 0.5) rounded towards zero: */ -o_threshold = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L), +/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */ +o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L), +#define o_threshold (o_thresholdu.e) /* log(2**(-16381-64-1)) rounded towards zero: */ -u_threshold = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L); +u_thresholdu = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L); +#define u_threshold (u_thresholdu.e) static const double /* @@ -78,11 +82,11 @@ * |exp(x) - p(x)| < 2**-77.2 * (0.002708 is ln2/(2*INTERVALS) rounded up a little). 
*/ -P2 = 0.5, -P3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ -P4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ -P5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ -P6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ +A2 = 0.5, +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ /* * 2^(i/INTERVALS) for i in [0,INTERVALS] is represented by two values where @@ -96,8 +100,7 @@ static const struct { double hi; double lo; -/* XXX should rename 's'. */ -} s[INTERVALS] = { +} tbl[INTERVALS] = { 0x1p+0, 0x0p+0, 0x1.0163da9fb3335p+0, 0x1.b61299ab8cdb7p-54, 0x1.02c9a3e778060p+0, 0x1.dcdef95949ef4p-53, @@ -232,7 +235,8 @@ expl(long double x) { union IEEEl2bits u, v; - long double fn, q, r, r1, r2, t, t23, t45, twopk, twopkp10000, z; + long double fn, q, r, r1, r2, t, twopk, twopkp10000; + long double z; int k, n, n2; uint16_t hx, ix; @@ -242,40 +246,39 @@ ix = hx & 0x7fff; if (ix >= BIAS + 13) { /* |x| >= 8192 or x is NaN */ if (ix == BIAS + LDBL_MAX_EXP) { - if (hx & 0x8000 && u.xbits.man == 1ULL << 63) - return (0.0L); /* x is -Inf */ - return (x + x); /* x is +Inf, NaN or unsupported */ + if (hx & 0x8000) /* x is -Inf, -NaN or unsupported */ + return (-1 / x); + return (x + x); /* x is +Inf, +NaN or unsupported */ } - if (x > o_threshold.e) + if (x > o_threshold) return (huge * huge); - if (x < u_threshold.e) + if (x < u_threshold) return (tiny * tiny); - } else if (ix < BIAS - 66) { /* |x| < 0x1p-66 */ - /* includes pseudo-denormals */ - if (huge + x > 1.0L) /* trigger inexact iff x != 0 */ - return (1.0L + x); + } else if (ix < BIAS - 65) { /* |x| < 0x1p-65 (includes pseudos) */ + return (1 + x); /* 1 with inexact iff x != 0 */ } ENTERI(); - /* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */ + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). 
*/ /* Use a specialized rint() to get fn. Assume round-to-nearest. */ fn = x * INV_L + 0x1.8p63 - 0x1.8p63; r = x - fn * L1 - fn * L2; /* r = r1 + r2 done independently. */ #if defined(HAVE_EFFICIENT_IRINTL) - n = irintl(fn); + n = irintl(fn); #elif defined(HAVE_EFFICIENT_IRINT) - n = irint(fn); + n = irint(fn); #else - n = (int)fn; + n = (int)fn; #endif n2 = (unsigned)n % INTERVALS; - k = (n - n2) / INTERVALS; + /* Depend on the sign bit being propagated: */ + k = n >> LOG2_INTERVALS; r1 = x - fn * L1; - r2 = -fn * L2; + r2 = fn * -L2; /* Prepare scale factors. */ - v.xbits.man = 1ULL << 63; + v.e = 1; if (k >= LDBL_MIN_EXP) { v.xbits.expsign = BIAS + k; twopk = v.e; @@ -284,21 +287,183 @@ twopkp10000 = v.e; } - /* Evaluate expl(midpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */ - /* Here q = q(r), not q(r1), since r1 is lopped like L1. */ - t45 = r * P5 + P4; + /* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */ z = r * r; - t23 = r * P3 + P2; - q = r2 + z * t23 + z * z * t45 + z * z * z * P6; - t = (long double)s[n2].lo + s[n2].hi; - t = s[n2].lo + t * (q + r1) + s[n2].hi; + q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6; + t = (long double)tbl[n2].lo + tbl[n2].hi; + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; /* Scale by 2**k. */ if (k >= LDBL_MIN_EXP) { if (k == LDBL_MAX_EXP) - RETURNI(t * 2.0L * 0x1p16383L); + RETURNI(t * 2 * 0x1p16383L); RETURNI(t * twopk); } else { RETURNI(t * twopkp10000 * twom10000); } } + +/** + * Compute expm1l(x) for Intel 80-bit format. This is based on: + * + * PTP Tang, "Table-driven implementation of the Expm1 function + * in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18, + * 211-222 (1992). + */ + +/* + * Our T1 and T2 are chosen to be approximately the points where method + * A and method B have the same accuracy. Tang's T1 and T2 are the + * points where method A's accuracy changes by a full bit. 
For Tang, + * this drop in accuracy makes method A immediately less accurate than + * method B, but our larger INTERVALS makes method A 2 bits more + * accurate so it remains the most accurate method significantly + * closer to the origin despite losing the full bit in our extended + * range for it. + */ +static const double +T1 = -0.1659, /* ~-30.625/128 * log(2) */ +T2 = 0.1659; /* ~30.625/128 * log(2) */ + +/* + * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]: + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2 + */ +static const union IEEEl2bits +B3 = LD80C(0xaaaaaaaaaaaaaaab, -3, 1.66666666666666666671e-1L), +B4 = LD80C(0xaaaaaaaaaaaaaaac, -5, 4.16666666666666666712e-2L); + +static const double +B5 = 8.3333333333333245e-3, /* 0x1.111111111110cp-7 */ +B6 = 1.3888888888888861e-3, /* 0x1.6c16c16c16c0ap-10 */ +B7 = 1.9841269841532042e-4, /* 0x1.a01a01a0319f9p-13 */ +B8 = 2.4801587302069236e-5, /* 0x1.a01a01a03cbbcp-16 */ +B9 = 2.7557316558468562e-6, /* 0x1.71de37fd33d67p-19 */ +B10 = 2.7557315829785151e-7, /* 0x1.27e4f91418144p-22 */ +B11 = 2.5063168199779829e-8, /* 0x1.ae94fabdc6b27p-26 */ +B12 = 2.0887164654459567e-9; /* 0x1.1f122d6413fe1p-29 */ + +long double +expm1l(long double x) +{ + union IEEEl2bits u, v; + long double fn, hx2_hi, hx2_lo, q, r, r1, r2, t, twomk, twopk, x_hi; + long double x_lo, x2, z; + long double x4; + int k, n, n2; + uint16_t hx, ix; + + /* Filter out exceptional cases. */ + u.e = x; + hx = u.xbits.expsign; + ix = hx & 0x7fff; + if (ix >= BIAS + 6) { /* |x| >= 64 or x is NaN */ + if (ix == BIAS + LDBL_MAX_EXP) { + if (hx & 0x8000) /* x is -Inf, -NaN or unsupported */ + return (-1 / x - 1); + return (x + x); /* x is +Inf, +NaN or unsupported */ + } + if (x > o_threshold) + return (huge * huge); + /* + * expm1l() never underflows, but it must avoid + * unrepresentable large negative exponents. 
We used a + * much smaller threshold for large |x| above than in + * expl() so as to handle not so large negative exponents + * in the same way as large ones here. + */ + if (hx & 0x8000) /* x <= -64 */ + return (tiny - 1); /* good for x < -65ln2 - eps */ + } + + ENTERI(); + + if (T1 < x && x < T2) { + if (ix < BIAS - 64) { /* |x| < 0x1p-64 (includes pseudos) */ + /* x (rounded) with inexact if x != 0: */ + RETURNI(x == 0 ? x : + (0x1p100 * x + fabsl(x)) * 0x1p-100); + } + + x2 = x * x; + x4 = x2 * x2; + q = x4 * (x2 * (x4 * + /* + * XXX the number of terms is no longer good for + * pairwise grouping of all except B3, and the + * grouping is no longer from highest down. + */ + (x2 * B12 + (x * B11 + B10)) + + (x2 * (x * B9 + B8) + (x * B7 + B6))) + + (x * B5 + B4.e)) + x2 * x * B3.e; + + x_hi = (float)x; + x_lo = x - x_hi; + hx2_hi = x_hi * x_hi / 2; + hx2_lo = x_lo * (x + x_hi) / 2; + if (ix >= BIAS - 7) + RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi)); + else + RETURNI(hx2_lo + q + hx2_hi + x); + } + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + fn = x * INV_L + 0x1.8p63 - 0x1.8p63; +#if defined(HAVE_EFFICIENT_IRINTL) + n = irintl(fn); +#elif defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif + n2 = (unsigned)n % INTERVALS; + k = n >> LOG2_INTERVALS; + r1 = x - fn * L1; + r2 = fn * -L2; + r = r1 + r2; + + /* Prepare scale factor. */ + v.e = 1; + v.xbits.expsign = BIAS + k; + twopk = v.e; + + /* + * Evaluate lower terms of + * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). 
+ */ + z = r * r; + q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6; + + t = (long double)tbl[n2].lo + tbl[n2].hi; + + if (k == 0) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 1); + RETURNI(t); + } + if (k == -1) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 2); + RETURNI(t / 2); + } + if (k < -7) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + RETURNI(t * twopk - 1); + } + if (k > 2 * LDBL_MANT_DIG - 1) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + if (k == LDBL_MAX_EXP) + RETURNI(t * 2 * 0x1p16383L - 1); + RETURNI(t * twopk - 1); + } + + v.xbits.expsign = BIAS - k; + twomk = v.e; + + if (k > LDBL_MANT_DIG - 1) + t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi; + else + t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk); + RETURNI(t * twopk); +} Index: ld128/s_expl.c =================================================================== --- ld128/s_expl.c (revision 251067) +++ ld128/s_expl.c (working copy) @@ -1,5 +1,5 @@ /*- - * Copyright (c) 2012 Steven G. Kargl + * Copyright (c) 2009-2012 Steven G. Kargl * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -22,6 +22,8 @@ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * Optimized by Bruce D. Evans. 
*/ #include @@ -38,35 +40,56 @@ #include "math_private.h" #define INTERVALS 128 +#define LOG2_INTERVALS 7 #define BIAS (LDBL_MAX_EXP - 1) +static const long double +huge = 0x1p10000L, +twom10000 = 0x1p-10000L; +/* XXX Prevent gcc from erroneously constant folding this: */ static volatile const long double tiny = 0x1p-10000L; static const long double -INV_L = 1.84664965233787316142070359168242182e+02L, -L1 = 5.41521234812457272982212595914567508e-03L, -L2 = -1.02536706388947310094527932552595546e-29L, -huge = 0x1p10000L, +/* log(2**16384 - 0.5) rounded towards zero: */ +/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */ o_threshold = 11356.523406294143949491931077970763428L, -twom10000 = 0x1p-10000L, +/* log(2**(-16381-64-1)) rounded towards zero: */ u_threshold = -11433.462743336297878837243843452621503L; +static const double +/* + * ln2/INTERVALS = L1+L2 (hi+lo decomposition for multiplication). L1 must + * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest + * bits zero so that multiplication of it by n is exact. 
+ */ +INV_L = 1.8466496523378731e+2, /* 0x171547652b82fe.0p-45 */ +L2 = -1.0253670638894731e-29; /* -0x1.9ff0342542fc3p-97 */ static const long double -P2 = 5.00000000000000000000000000000000000e-1L, -P3 = 1.66666666666666666666666666666666972e-1L, -P4 = 4.16666666666666666666666666653708268e-2L, -P5 = 8.33333333333333333333333315069867254e-3L, -P6 = 1.38888888888888888888996596213795377e-3L, -P7 = 1.98412698412698412718821436278644414e-4L, -P8 = 2.48015873015869681884882576649543128e-5L, -P9 = 2.75573192240103867817876199544468806e-6L, -P10 = 2.75573236172670046201884000197885520e-7L, -P11 = 2.50517544183909126492878226167697856e-8L; +/* 0x1.62e42fefa39ef35793c768000000p-8 */ +L1 = 5.41521234812457272982212595914567508e-3L; +static const long double +/* + * Domain [-0.002708, 0.002708], range ~[-2.4011e-38, 2.4244e-38]: + * |exp(x) - p(x)| < 2**-124.9 + * (0.002708 is ln2/(2*INTERVALS) rounded up a little). + */ +A2 = 0.5, +A3 = 1.66666666666666666666666666651085500e-1L, +A4 = 4.16666666666666666666666666425885320e-2L, +A5 = 8.33333333333333333334522877160175842e-3L, +A6 = 1.38888888888888888889971139751596836e-3L; + +static const double +A7 = 1.9841269841269471e-4, +A8 = 2.4801587301585284e-5, +A9 = 2.7557324277411234e-6, +A10 = 2.7557333722375072e-7; + static const struct { long double hi; long double lo; -} s[INTERVALS] = { +} tbl[INTERVALS] = { 0x1p0L, 0x0p0L, 0x1.0163da9fb33356d84a66aep0L, 0x3.36dcdfa4003ec04c360be2404078p-92L, 0x1.02c9a3e778060ee6f7cacap0L, 0x4.f7a29bde93d70a2cabc5cb89ba10p-92L, @@ -201,9 +224,10 @@ expl(long double x) { union IEEEl2bits u, v; - long double fn, r, r1, r2, q, t, twopk, twopkp10000; + long double q, r, r1, t, twopk, twopkp10000; + double dr, fn, r2; int k, n, n2; - uint32_t hx, ix; + uint16_t hx, ix; /* Filter out exceptional cases. 
*/ u.e = x; @@ -211,31 +235,39 @@ ix = hx & 0x7fff; if (ix >= BIAS + 13) { /* |x| >= 8192 or x is NaN */ if (ix == BIAS + LDBL_MAX_EXP) { - if (hx & 0x8000 && u.xbits.manh == 0 && - u.xbits.manl == 0) - return (0.0L); /* x is -Inf */ - return (x + x); /* x is +Inf or NaN */ + if (hx & 0x8000) /* x is -Inf or -NaN */ + return (-1 / x); + return (x + x); /* x is +Inf or +NaN */ } if (x > o_threshold) return (huge * huge); if (x < u_threshold) return (tiny * tiny); - } else if (ix < BIAS - 115) { /* |x| < 0x1p-115 */ - if (huge + x > 1.0L) /* trigger inexact iff x != 0 */ - return (1.0L + x); + } else if (ix < BIAS - 114) { /* |x| < 0x1p-114 */ + return (1 + x); /* 1 with inexact iff x != 0 */ } - /* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */ - fn = x * INV_L + 0x1.8p112 - 0x1.8p112; - n = (int)fn; + ENTERI(); + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + /* XXX assume no extra precision for the additions, as for trig fns. */ + /* XXX this set of comments is now quadruplicated. */ + fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52; +#if defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif n2 = (unsigned)n % INTERVALS; - k = (n - n2) / INTERVALS; + k = n >> LOG2_INTERVALS; r1 = x - fn * L1; - r2 = -fn * L2; + r2 = fn * -L2; + r = r1 + r2; /* Prepare scale factors. */ - v.xbits.manh = 0; - v.xbits.manl = 0; + /* XXX sparc64 multiplication is so slow that scalbnl() is faster. */ + v.e = 1; if (k >= LDBL_MIN_EXP) { v.xbits.expsign = BIAS + k; twopk = v.e; @@ -244,18 +276,220 @@ twopkp10000 = v.e; } - r = r1 + r2; - q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 + - r * (P8 + r * (P9 + r * (P10 + r * P11))))))))); - t = s[n2].lo + s[n2].hi; - t = s[n2].hi + (s[n2].lo + t * (r2 + q + r1)); + /* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). 
*/ + dr = r; + q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 + + dr * (A7 + dr * (A8 + dr * (A9 + dr * A10)))))))); + t = tbl[n2].lo + tbl[n2].hi; + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; /* Scale by 2**k. */ if (k >= LDBL_MIN_EXP) { if (k == LDBL_MAX_EXP) - return (t * 2.0L * 0x1p16383L); - return (t * twopk); + RETURNI(t * 2 * 0x1p16383L); + RETURNI(t * twopk); } else { - return (t * twopkp10000 * twom10000); + RETURNI(t * twopkp10000 * twom10000); } } + +/* + * Our T1 and T2 are chosen to be approximately the points where method + * A and method B have the same accuracy. Tang's T1 and T2 are the + * points where method A's accuracy changes by a full bit. For Tang, + * this drop in accuracy makes method A immediately less accurate than + * method B, but our larger INTERVALS makes method A 2 bits more + * accurate so it remains the most accurate method significantly + * closer to the origin despite losing the full bit in our extended + * range for it. + */ +static const double +T1 = -0.1659, /* ~-30.625/128 * log(2) */ +T2 = 0.1659; /* ~30.625/128 * log(2) */ + +/* + * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2]. + * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear + * in both subintervals, so set T3 = 2**-5, which places the condition + * into the [T1:T3] interval. + */ +static const double +T3 = 0.03125; + +/* + * XXX Estimated range is for absolute error. 
+ * Domain [-0.1659, 0.03125], range ~[-1.8933e-38, 1.8943e-38]: + * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-125.3 + */ +static const long double +C3 = 1.66666666666666666666666666666666667e-1L, +C4 = 4.16666666666666666666666666666666645e-2L, +C5 = 8.33333333333333333333333333333371638e-3L, +C6 = 1.38888888888888888888888888891188658e-3L, +C7 = 1.98412698412698412698412697235950394e-4L, +C8 = 2.48015873015873015873015112487849040e-5L, +C9 = 2.75573192239858906525606685484412005e-6L, +C10 = 2.75573192239858906612966093057020362e-7L, +C11 = 2.50521083854417203619031960151253944e-8L, +C12 = 2.08767569878679576457272282566520649e-9L, +C13 = 1.60590438367252471783548748824255707e-10L; + +static const double +C14 = 1.1470745580491932e-11, /* 0x1.93974a81dae3p-37 */ +C15 = 7.6471620181090468e-13, /* 0x1.ae7f3820adab1p-41 */ +C16 = 4.7793721460260450e-14, /* 0x1.ae7cd18a18eacp-45 */ +C17 = 2.8074757356658877e-15, /* 0x1.949992a1937d9p-49 */ +C18 = 1.4760610323699476e-16; /* 0x1.545b43aabfbcdp-53 */ + +/* + * XXX Estimated range is for absolute error. 
+ * Domain [0.03125, 0.1659], range ~[-2.7597e-38, 2.7602e-38]: + * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-124.8 + */ +static const long double +D3 = 1.66666666666666666666666666666682245e-1L, +D4 = 4.16666666666666666666666666634228324e-2L, +D5 = 8.33333333333333333333333364022244481e-3L, +D6 = 1.38888888888888888888887138722762072e-3L, +D7 = 1.98412698412698412699085805424661471e-4L, +D8 = 2.48015873015873015687993712101479612e-5L, +D9 = 2.75573192239858944101036288338208042e-6L, +D10 = 2.75573192239853161148064676533754048e-7L, +D11 = 2.50521083855084570046480450935267433e-8L, +D12 = 2.08767569819738524488686318024854942e-9L, +D13 = 1.60590442297008495301927448122499313e-10L; + +static const double +D14 = 1.1470726176204336e-11, /* 0x1.93971dc395d9ep-37 */ +D15 = 7.6478532249581686e-13, /* 0x1.ae892e3D16fcep-41 */ +D16 = 4.7628892832607741e-14, /* 0x1.ad00Dfe41feccp-45 */ +D17 = 3.0524857220358650e-15; /* 0x1.D7e8d886Df921p-49 */ + +long double +expm1l(long double x) +{ + union IEEEl2bits u, v; + long double hx2_hi, hx2_lo, q, r, r1, t, twomk, twopk, x_hi; + long double x_lo, x2; + double dr, dx, fn, r2; + int k, n, n2; + uint16_t hx, ix; + + /* Filter out exceptional cases. */ + u.e = x; + hx = u.xbits.expsign; + ix = hx & 0x7fff; + if (ix >= BIAS + 7) { /* |x| >= 128 or x is NaN */ + if (ix == BIAS + LDBL_MAX_EXP) { + if (hx & 0x8000) /* x is -Inf or -NaN */ + return (-1 / x - 1); + return (x + x); /* x is +Inf or +NaN */ + } + if (x > o_threshold) + return (huge * huge); + /* + * expm1l() never underflows, but it must avoid + * unrepresentable large negative exponents. We used a + * much smaller threshold for large |x| above than in + * expl() so as to handle not so large negative exponents + * in the same way as large ones here. 
+ */ + if (hx & 0x8000) /* x <= -128 */ + return (tiny - 1); /* good for x < -114ln2 - eps */ + } + + ENTERI(); + + if (T1 < x && x < T2) { + + x2 = x * x; + dx = x; + + if (x < T3) { + if (ix < BIAS - 113) { /* |x| < 0x1p-113 */ + /* x (rounded) with inexact if x != 0: */ + RETURNI(x == 0 ? x : + (0x1p200 * x + fabsl(x)) * 0x1p-200); + } + q = x * x2 * C3 + x2 * x2 * (C4 + x * (C5 + x * (C6 + + x * (C7 + x * (C8 + x * (C9 + x * (C10 + + x * (C11 + x * (C12 + x * (C13 + + dx * (C14 + dx * (C15 + dx * (C16 + + dx * (C17 + dx * C18)))))))))))))); + } else { + q = x * x2 * D3 + x2 * x2 * (D4 + x * (D5 + x * (D6 + + x * (D7 + x * (D8 + x * (D9 + x * (D10 + + x * (D11 + x * (D12 + x * (D13 + + dx * (D14 + dx * (D15 + dx * (D16 + + dx * D17))))))))))))); + } + + x_hi = (float)x; + x_lo = x - x_hi; + hx2_hi = x_hi * x_hi / 2; + hx2_lo = x_lo * (x + x_hi) / 2; + if (ix >= BIAS - 7) + RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi)); + else + RETURNI(hx2_lo + q + hx2_hi + x); + } + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52; +#if defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif + n2 = (unsigned)n % INTERVALS; + k = n >> LOG2_INTERVALS; + r1 = x - fn * L1; + r2 = fn * -L2; + r = r1 + r2; + + /* Prepare scale factor. */ + v.e = 1; + v.xbits.expsign = BIAS + k; + twopk = v.e; + + /* + * Evaluate lower terms of + * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). 
+ */ + dr = r; + q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 + + dr * (A7 + dr * (A8 + dr * (A9 + dr * A10)))))))); + + t = tbl[n2].lo + tbl[n2].hi; + + if (k == 0) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 1); + RETURNI(t); + } + if (k == -1) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 2); + RETURNI(t / 2); + } + if (k < -7) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + RETURNI(t * twopk - 1); + } + if (k > 2 * LDBL_MANT_DIG - 1) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + if (k == LDBL_MAX_EXP) + RETURNI(t * 2 * 0x1p16383L - 1); + RETURNI(t * twopk - 1); + } + + v.xbits.expsign = BIAS - k; + twomk = v.e; + + if (k > LDBL_MANT_DIG - 1) + t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi; + else + t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk); + RETURNI(t * twopk); +} From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 16:52:38 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9B52D882; Thu, 30 May 2013 16:52:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx08.syd.optusnet.com.au (fallbackmx08.syd.optusnet.com.au [211.29.132.10]) by mx1.freebsd.org (Postfix) with ESMTP id B6C2C669; Thu, 30 May 2013 16:52:37 +0000 (UTC) Received: from mail28.syd.optusnet.com.au (mail28.syd.optusnet.com.au [211.29.133.169]) by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4UGqRRB032572; Fri, 31 May 2013 02:52:27 +1000 Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail28.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4UGqDd8011312 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 31 May 2013 02:52:14 +1000 Date: Fri, 31 May 2013 02:52:13 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Warner Losh Subject: 
Re: standards/175811: libstdc++ needs complex support in order use C99 In-Reply-To: Message-ID: <20130531015915.N65390@besplex.bde.org> References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=BPvrNysG c=1 sm=1 a=Qub1x3MNGSYA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=AyPkC9FW8vsA:10 a=gmrSIYXE1WnqeYESaG8A:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: David Chisnall , Stephen Montgomery-Smith , pfg@FreeBSD.org, freebsd-numerics@FreeBSD.org, David Schultz , freebsd-standards@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 16:52:38 -0000 On Thu, 30 May 2013, Warner Losh wrote: > I'm all for getting everything we can into the tree that produces an answer that's not perfect, but close. What's the error that would be generated with the naive implementation of > > long double tgammal(long double f) { return tgamma(f); } On x86, 11 low bits wrong, for an error of 2048 ulps, in addition to any errors in tgamma(). tgamma() on i386 inherits errors of 9 peta-ulps (all 53 bits wrong) from i387 trig functions, but is OK on small args on i386 and better on large args on amd64. On sparc64, 60 low bits wrong, for an error of 1 exa-ulp, in addition to any errors in tgamma(); the latter are the same as on amd64. Sparc64 users of long double precision pay for it with a loss of performance of a factor of several hundred, so they should be unhappy to not get the extra bits when they ask for them (but the above inaccurate version doesn't give them what they asked for). On arches with long double == double, no difference.
On i386 with the default rounding precision of double, little difference. > But assuming that, for some reason, produces errors larger than difference in precision between double and long double due to extreme non-linearity of these functions, having only a couple of stragglers is a far better position to be in than we are today. Such extra errors normally don't happen. In fact, my accuracy tests for double functions are essentially to upcast the results of double functions and compare the resulting bits with the corresponding results for long double functions. Nonlinearities tend to only happen at zeros and poles of functions and then they are due to bugs, and for NaNs, and then they are due to implementation-defined behaviour. It is difficult to even determine the location of zeros and poles for some functions, and most of the complexities in libm are to use especially careful calculations near them when they are known. Bruce From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 16:56:14 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8BF9A8E8; Thu, 30 May 2013 16:56:14 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net [50.196.151.174]) by mx1.freebsd.org (Postfix) with ESMTP id 614A1690; Thu, 30 May 2013 16:56:13 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4UGuBTH093763; Thu, 30 May 2013 09:56:11 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4UGuARj093762; Thu, 30 May 2013 09:56:10 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Date: Thu, 30 May 2013 09:56:10 -0700 From: David Schultz To: Warner Losh Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 Message-ID: <20130530165610.GA93684@zim.MIT.EDU>
References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Cc: Stephen Montgomery-Smith , David Chisnall , pfg@freebsd.org, freebsd-standards@freebsd.org, freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 16:56:14 -0000 On Thu, May 30, 2013, Warner Losh wrote: > > On May 30, 2013, at 12:46 AM, David Schultz wrote: > > On Fri, Feb 22, 2013, David Chisnall wrote: > > I was wondering if you could explain a bit about what your goal is > > here, though. Is there some kind of certification you are trying > > to achieve? Why can't you just comment out the few missing > > functions? You've been adamant about this issue ever since > > joining the Project, even suggesting that we commit bogus > > implementations just for the sake of having the symbols. I > > completely agree with you that the lack of progress is > > unacceptable, and I'm sorry I haven't had more time to work on > > this stuff myself, but I also don't understand the source of your > > urgency. > > More and more projects are refusing to work around our > gridlock. We have to report R each new release because they have > taken out the checks for the missing symbols. It is really an > embarrassment to the project. We've let the perfect be the enemy > of the good. There are R scripts that run elsewhere and not on > FreeBSD. R is the one I know most about since I've been using R > a lot to crunch numbers for work, but there are others. > > The urgency is we'd like to have this stuff done for 10, if at > all possible. And if not done, then a lot closer to done than > where we are today. 
It looks like the R in ports just wants logl(), which isn't surprising, and there's already code for that. So getting that in for 10 is achievable. > > The reason I'm asking is that I'm pushing to get a lot of stuff > > into the tree quickly, but realistically, in the short term we're > > only going to get 95% of the way there. I doubt good > > implementations of complicated functions that nobody uses, such as > > erfcl() and tgammal(), are going to appear overnight. Thus, I > > would like to know whether the last 5% is needed quickly, and if > > so, why. > > I'm all for getting everything we can into the tree that > produces an answer that's not perfect, but close. What's the > error that would be generated with the naive implementation of > > long double tgammal(long double f) { return tgamma(f); } > > But assuming that, for some reason, produces errors larger than > difference in precision between double and long double due to > extreme non-linearity of these functions, having only a couple > of stragglers is a far better position to be in than we are > today. Whether this is acceptable depends a lot on who needs it in the first place, which is part of why I was asking. For many years, the only software that cared was libstdc++, and libstdc++ only wanted to wrap it. Here are some of my notes on the status of things: long double log2l(long double); -- bde long double logl(long double); -- bde long double log1pl(long double); -- bde Bruce has these written. We can commit them with a little cleanup. long double acoshl(long double); -- sgk long double asinhl(long double); -- sgk long double atanhl(long double); -- sgk long double log10l(long double); -- bde These are trivial given the first three. I believe Bruce and Steve have the code for them already. long double expl(long double); -- sgk long double expm1l(long double); -- sgk Steve has perfectly committable patches that I've already approved (and furthermore, he doesn't need my approval anymore!) 
long double coshl(long double); long double sinhl(long double); long double tanhl(long double); long double erfcl(long double); long double erfl(long double); These are easy given expl() and expm1l(). long double powl(long double, long double); This is not so easy, but important, so we can make it a priority. long double lgammal(long double); long double tgammal(long double); These are neither easy nor important; this gets back to my question. float complex clogf(float complex); -- bde double complex clog(double complex); -- bde Bruce has code for these, which should be straightforward to turn into something committable. float complex cpowf(float complex, float complex); double complex cpow(double complex, double complex); This one is tough to do well and even tougher to test -- lots of nasty corner cases. long double complex cexpl(long double complex); long double complex ccosl(long double complex); long double complex ccoshl(long double complex); long double complex csinl(long double complex); long double complex csinhl(long double complex); long double complex ctanl(long double complex); long double complex ctanhl(long double complex); long double complex cacosl(long double complex); long double complex cacoshl(long double complex); long double complex casinl(long double complex); long double complex casinhl(long double complex); long double complex catanl(long double complex); long double complex catanhl(long double complex); long double complex clogl(long double complex); long double complex cpowl(long double complex, long double complex); The long double versions of the complex math functions are trivial once the long double versions of the corresponding real functions are written. 
From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 17:13:49 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BB96DAFF; Thu, 30 May 2013 17:13:49 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 8442A78E; Thu, 30 May 2013 17:13:49 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4UHDm7M067303; Thu, 30 May 2013 10:13:48 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4UHDmUZ067302; Thu, 30 May 2013 10:13:48 -0700 (PDT) (envelope-from sgk) Date: Thu, 30 May 2013 10:13:48 -0700 From: Steve Kargl To: Pedro Giffuni Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 Message-ID: <20130530171348.GA67170@troutmask.apl.washington.edu> References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51A77324.2070702@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Stephen Montgomery-Smith , David Chisnall , David Schultz , freebsd-numerics@freebsd.org, freebsd-standards@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 17:13:49 -0000 On Thu, May 30, 2013 at 10:41:24AM -0500, Pedro Giffuni wrote: > > I may be wrong but with long double support people that > need erfcl() and tgamma() can get them from boost. > The problem is therefore not implementing everything but > getting enough to turn on the features supported by > libstdc++ and boost. > Of course, you're wrong. :-) :-) <-- Note smileys. C99 defines many long double functions. Anyone wanting to use C and libm, and not C++ and boost, will need quality implementations of these functions. Of course, the lack of any actual C99 compiler tends to dampen this argument. What I find appalling is reading "people are tired of the situation with libm, so I'm going to commit some atrocious hack". The proper response should be "so I'm going to help implement and test the missing functionality". It's unfortunate that only a few individuals are working to fix libm, but such is life. 
-- Steve From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 19:44:00 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BFB419D for ; Thu, 30 May 2013 19:44:00 +0000 (UTC) (envelope-from pfg@FreeBSD.org) Received: from nm38-vm1.bullet.mail.ne1.yahoo.com (nm38-vm1.bullet.mail.ne1.yahoo.com [98.138.229.145]) by mx1.freebsd.org (Postfix) with ESMTP id 70C3A136 for ; Thu, 30 May 2013 19:44:00 +0000 (UTC) Received: from [98.138.90.50] by nm38.bullet.mail.ne1.yahoo.com with NNFMP; 30 May 2013 19:43:54 -0000 Received: from [98.138.226.63] by tm3.bullet.mail.ne1.yahoo.com with NNFMP; 30 May 2013 19:43:54 -0000 Received: from [127.0.0.1] by smtp214.mail.ne1.yahoo.com with NNFMP; 30 May 2013 19:43:54 -0000 X-Yahoo-Newman-Id: 290118.23681.bm@smtp214.mail.ne1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: DAxlDJAVM1kUk.zJqk8IuYUBgsk_lxZ1KTr5UNtnK7jr9uU uM48rCnNJr7M6M0lsK6ut7vCuicK86y_14.m0lq3VfD39jJJZH8mcwrJq.S6 z5RHMRYHH5QrfVb1Cv69vHnSKlGaXPscHpds8CIfSL2bW28IfMNFaRNLqZls 4WFD8YgFOcPkmc.gPj0a65MKvlDWMXxkhiEcxqmnDpB2YKeRUf.n4kA6i4dr iHnTxteidqstQnOwm1UPcyy0xChMfsJ2wq9GLhcgSRzma1vWPrufHBRGOmSq VNZGb21QFRL4OA5gLwrJ.mopMkOyRHGKbqcVAnZvze.P1Habbua82ykfeAFv e71qqI3B9QsEkeZkfiVKzuDgYMObD.Om1hW.he4DxrcYiPB_NyneTl3sBVmB ZT_h7QmjU_jEfStf1uZ0Ugsl1oid8RtK.ggAmXpa9Ut6YZjH.6m5wlnymCOi _tYtsTKcF5ekaskpxWbtQhZYP0hvvz8XXxaL_0QnnhMyNpge6g_9.kTeA X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with ) by smtp214.mail.ne1.yahoo.com with SMTP; 30 May 2013 12:43:54 -0700 PDT Message-ID: <51A7ABF7.6060807@FreeBSD.org> Date: Thu, 30 May 2013 14:43:51 -0500 From: Pedro Giffuni User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: Steve Kargl Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 References: 
<201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org> <20130530171348.GA67170@troutmask.apl.washington.edu> In-Reply-To: <20130530171348.GA67170@troutmask.apl.washington.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Stephen Montgomery-Smith , freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 19:44:00 -0000 ( I stripped a bit the CC list ) On 30.05.2013 12:13, Steve Kargl wrote: > On Thu, May 30, 2013 at 10:41:24AM -0500, Pedro Giffuni wrote: >> I may be wrong but with long double support people that >> need erfcl() and tgamma() can get them from boost. >> The problem is therefore not implementing everything but >> getting enough to turn on the features supported by >> libstdc++ and boost. >> > Of course, you're wrong. :-) :-) <-- Note smileys. And I knew I could be likely wrong from the start ;). > C99 defines many long double functions. Anyone wanting > to use C and libm, and not C++ and boost, will need > quality implementations of these functions. Of course, > the lack of any actual C99 compiler tends to dampen > this argument. > > What I find appalling is reading "people are tired > of the situation with libm, so I'm going to commit > some atrocious hack". The proper response should be > "so I'm going to help implement and test the missing > functionality". It's unfortunate that only a few > individuals are working to fix libm, but such is > life. > I guess I was trying to hint that Boost is a good place to look at to get ideas for the implementations for such stuff. Stephen knows this well though since he actually fixed some complex functions in boost :). 
The implementations of erfc and tgamma in OpenOffice are based on the Boost code with the important difference that boost does the automatic type promotion when it can. FWIW, I was about to change OpenOffice to use boost but then I noticed that the type promotion doesn't work on FreeBSD (due to the lack of long double math) so in general there was not much gain in changing the status quo. Pedro. From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 20:15:14 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 15C07E15; Thu, 30 May 2013 20:15:14 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id EC154302; Thu, 30 May 2013 20:15:13 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4UKFDwx068665; Thu, 30 May 2013 13:15:13 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4UKFDo2068664; Thu, 30 May 2013 13:15:13 -0700 (PDT) (envelope-from sgk) Date: Thu, 30 May 2013 13:15:13 -0700 From: Steve Kargl To: Pedro Giffuni Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 Message-ID: <20130530201513.GA68512@troutmask.apl.washington.edu> References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org> <20130530171348.GA67170@troutmask.apl.washington.edu> <51A7ABF7.6060807@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51A7ABF7.6060807@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Stephen Montgomery-Smith ,
freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 20:15:14 -0000 On Thu, May 30, 2013 at 02:43:51PM -0500, Pedro Giffuni wrote: > On 30.05.2013 12:13, Steve Kargl wrote: > > C99 defines many long double functions. Anyone wanting > > to use C and libm, and not C++ and boost, will need > > quality implementations of these functions. Of course, > > the lack of any actual C99 compiler tends to dampen > > this argument. > > > > What I find appalling is reading "people are tired > > of the situation with libm, so I'm going to commit > > some atrocious hack". The proper response should be > > "so I'm going to help implement and test the missing > > functionality". It's unfortunate that only a few > > individuals are working to fix libm, but such is > > life. > > > > I guess I was trying to hint that Boost is a good > place to look at to get ideas for the implementations > for such stuff. Stephen knows this well though since > he actually fixed some complex functions in boost :). > Boost might be a good place to look for implementation ideas. Looking at the msun code also works. As does searching with google. This is all secondary to the real issue. The real problem is no one is willing to step forward to actually help write and test the code. Everyone seems to be waiting (and complaining!) for someone else to do the work. I've been chipping away at libm issues since 2003, and given my available free time I should have a fully compliant C99 libm around 2025 or so. 
-- Steve From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 20:19:30 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 84ED0EED for ; Thu, 30 May 2013 20:19:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 0CB2E34E for ; Thu, 30 May 2013 20:19:29 +0000 (UTC) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4UKJI9W008054 for ; Fri, 31 May 2013 06:19:18 +1000 Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4UKJ9FE011708 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 31 May 2013 06:19:10 +1000 Date: Fri, 31 May 2013 06:19:09 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: Patches for s_expl.c In-Reply-To: <20130530162723.GB66755@troutmask.apl.washington.edu> Message-ID: <20130531053652.H65974@besplex.bde.org> References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> <20130529162441.GA58773@troutmask.apl.washington.edu> <20130530045951.Y4776@besplex.bde.org> <20130530162723.GB66755@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10 a=Iu-xyOGO5_ZyKUnWV68A:9 a=CjuIK1q_8ugA:10 a=-W0hRMvl23hXUl_A:21 a=5eAh3lsampImaI_r:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-numerics@FreeBSD.org, Bruce Evans X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: 
list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 20:19:30 -0000 On Thu, 30 May 2013, Steve Kargl wrote: > OK, I've restored whitespace to hopefully match your expectations. > Removed excess digits in exponents (e.g., 1.234e08 --> 1.234e8). > Restored XXX comments. > Removed (unnecessary?) blank lines. > Restored the order of computing r = r1 + r2 in ld128. > Moved the |x| < 0x1p-113 if-block back into the [T1:T3] interval. I like the ld80 version now. My diffs for the ld128 version are below. > Final questions. What is your preference for committing expm1l? > Should it be included in s_expl.c or should I use 'svn cp' to > copy s_expl.c to s_expm1l.c and add the implementation of > expm1l to the copied version? I prefer it in the same file. The big table is hard to manage in a separate file (if the functions are split, then the table should be too, since it is the largest component), and some constants would have to be made public or duplicated. Accesses to public tables and scalars cannot be optimized (by the compiler) as much as static ones. But when you implement exp() so that it works as well as expl(), the table should be shared in the ld80 case, so at least the table should be split then. @ --- z22/s_expl.c Fri May 31 04:31:30 2013 @ +++ s_expl.c Fri May 31 05:32:51 2013 @ @@ -70,7 +70,13 @@ @ @ +/* @ + * XXX values in hex in comments have been lost (or were never present) @ + * from here. @ + */ This patch fixes just a few. All the double precision coeffs are in a standard format now. @ static const long double @ /* @ - * Domain [-0.002708, 0.002708], range ~[-2.4011e-38, 2.4244e-38]: @ + * Domain [-0.002708, 0.002708], range ~[-2.4021e-38, 2.4234e-38]: Checking the range showed that it is not quite the claimed one.
I think the old values are from a previous check, but I improved the checking program so the new values are hopefully more accurate. Oops, I'm not quite happy with the ld80 version, since the checker says that its B range is much more different than claimed than this range. @ * |exp(x) - p(x)| < 2**-124.9 @ * (0.002708 is ln2/(2*INTERVALS) rounded up a little). @ + * @ + * XXX the coeffs aren't very carefully rounded, and I get 2.3 more bits. @ */ Perhaps the coeffs are rounded carefully enough now. They can be chosen better. @ @@ -83,8 +89,25 @@ @ static const double @ -A7 = 1.9841269841269471e-4, @ -A8 = 2.4801587301585284e-5, @ -A9 = 2.7557324277411234e-6, @ -A10 = 2.7557333722375072e-7; @ +A7 = 1.9841269841269470e-4, /* 0x1.a01a01a019f91p-13 */ @ +A8 = 2.4801587301585286e-5, /* 0x1.71de3ec75a967p-19 */ @ +A9 = 2.7557324277411235e-6, /* 0x1.71de3ec75a967p-19 */ @ +A10 = 2.7557333722375069e-7; /* 0x1.27e505ab56259p-22 */ Act on an old reminder to fix things and round the values properly (just re-print the values given by the C declarations). Also add comments. @ @ static const struct { @ + /* @ + * hi must be rounded to at most 106 bits so that multiplication @ + * by r1 in expm1l() is exact, but it is rounded to 88 bits due to @ + * historical accidents. Keep this part of the comment. @ + * @ + * XXX it is wasteful to use long double for both hi and lo. ld128 @ + * exp2l() uses only float for lo (in a very differently organized @ + * table; ld80 exp2l() is different again. It uses 2 doubles in a @ + * table organized like this one. 1 double and 1 float would @ + * suffice). There are different packing/locality/alignment/caching @ + * problems with these methods. @ + * @ + * XXX C's bad %a format makes the bits unreadable. They happen @ + * to all line up for the hi values 1 before the point and 88 @ + * in 22 nybbles, but for the low values the nybbles are shifted @ + * randomly. @ + */ Reminders of things to fix. 
In a development version, I need hi to have only about 56 bits. It is easy to re-split hi+lo for testing this. A 24-bit or 53-bit hi is sufficient and would give this automatically. @ long double hi; @ @@ -311,5 +336,11 @@ @ * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2]. @ - * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear @ * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear @ * in both subintervals, so set T3 = 2**-5, which places the condition @ * into the [T1:T3] interval. @ + * @ + * XXX we now do this more to (partially) balance the number of terms @ + * in the C and D polys than to avoid checking the condition in both @ + * intervals. @ + * @ + * XXX these micro-optimizations are excessive. @ */ @ @@ -319,7 +350,25 @@ @ /* @ - * XXX Estimated range is for absolute error. @ - * Domain [-0.1659, 0.03125], range ~[-1.8933e-38, 1.8943e-38]: @ - * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-125.3 @ + * Domain [-0.1659, 0.03125], range ~[2.9134e-44, 1.8404e-37]: @ + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-122.03 The relative error should be the one documented. @ + * @ + * XXX the coeffs aren't very carefully rounded. I got 10.3 more bits with @ + * the old version for [-0.1659, -0.03125]. Now T3 is better balanced, and @ + * I would expect only 7-8 extra bits. @ + * @ + * XXX the number of terms can be reduced by 1. Then I get a few more bits @ + * with the same number of doubles (5), and 0.7 more bits with 8 doubles. @ + * This much accuracy is hard to explain, and it isn't clear that reduction @ + * of x to double is valid at the same point that reduction of the coeffs to @ + * double. With C10 double, the absolute errors from rounding it are up to @ + * about 2**-53 * 0.1659**10/10! ~= 2**-100.8. Remes apparently improves @ + * this to 2**-122.1. @ */ Better polynomials should be used someday, but I want you to generate them.
After fixing the generator to minimize the relative error instead of the absolute error, you should get ones like mine. @ static const long double @ +/* @ + * XXX none of the long double C or D coeffs except C10 is correctly printed. @ + * If you re-print their values in %.35Le format, the result is always @ + * different. For example, the last 2 digits in C3 should be 59, not 67. @ + * 67 is apparently from rounding an extra-precision value to 36 decimal @ + * places. @ + */ @ C3 = 1.66666666666666666666666666666666667e-1L, I didn't fix these. @ @@ -337,3 +386,3 @@ @ static const double @ -C14 = 1.1470745580491932e-11, /* 0x1.93974a81dae3p-37 */ @ +C14 = 1.1470745580491932e-11, /* 0x1.93974a81dae30p-37 */ @ C15 = 7.6471620181090468e-13, /* 0x1.ae7f3820adab1p-41 */ @ @@ -344,5 +393,17 @@ @ /* @ - * XXX Estimated range is for absolute error. @ - * Domain [0.03125, 0.1659], range ~[-2.7597e-38, 2.7602e-38]: @ - * |(exp(x)-1-x-x**2/2)/x**3 - p(x)| < 2**-124.8 @ + * Domain [0.03125, 0.1659], range ~[-2.7676e-37, -1.0367e-38]: @ + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-121.44 @ + * @ + * XXX the coeffs aren't very carefully rounded. I get 5.2 more bits with @ + * the old version for [-0.03125, 0.1659]. Now T3 is better balanced, and @ + * I would expect 7-8 extra bits. @ + * @ + * XXX the number of terms can be reduced by 1. Then I get a few more bits @ + * with the same number of doubles (4), and 1.1 more bits with 6 doubles. @ + * This much accuracy is hard to explain, etc., as above. With D11 double, @ + * the absolute errors from rounding it are up to about @ + * 2**-53 * 0.1659**11/11! ~= 2**-106.8. @ + * @ + * Note that with my coeffs, although this side needs 1 fewer term, it needs @ + * 1 more long double term, so it is probably actually slower on sparc64. @ */ It's painful to have separate polys C and D for Tang's B. 
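For reference, the D11 estimate in the comment above can be checked by taking base-2 logarithms. This is only a worked check, not part of the patch, and it assumes the reading that 2**-53 is the relative rounding error of a double coefficient of size about 1/11! and that 0.1659**11 is the largest value of |x|**11 on the domain. Intermediate values are rounded to two decimals.

```latex
\log_2\!\left(2^{-53}\cdot\frac{0.1659^{11}}{11!}\right)
  = -53 + 11\,\log_2 0.1659 - \log_2 11!
  \approx -53 - 28.51 - 25.25
  \approx -106.8
```

The C10 estimate follows the same pattern with 10 in place of 11.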
@ @@ -403,3 +466,2 @@ @ if (T1 < x && x < T2) { @ - @ x2 = x * x; Bruce From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 20:35:14 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 74AEE31B; Thu, 30 May 2013 20:35:14 +0000 (UTC) (envelope-from wollman@khavrinen.csail.mit.edu) Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0]) by mx1.freebsd.org (Postfix) with ESMTP id 4D363640; Thu, 30 May 2013 20:35:14 +0000 (UTC) Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1]) by khavrinen.csail.mit.edu (8.14.5/8.14.5) with ESMTP id r4UKZCqQ069030 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA); Thu, 30 May 2013 16:35:12 -0400 (EDT) (envelope-from wollman@khavrinen.csail.mit.edu) Received: (from wollman@localhost) by khavrinen.csail.mit.edu (8.14.5/8.14.5/Submit) id r4UKZCOZ069027; Thu, 30 May 2013 16:35:12 -0400 (EDT) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <20903.47104.38977.577307@khavrinen.csail.mit.edu> Date: Thu, 30 May 2013 16:35:12 -0400 From: Garrett Wollman To: Warner Losh Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 In-Reply-To: References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (khavrinen.csail.mit.edu [127.0.0.1]); Thu, 30 May 2013 16:35:12 -0400 (EDT) Cc: freebsd-numerics@freebsd.org, freebsd-standards@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of 
libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 20:35:14 -0000 < said: > I'm all for getting everything we can into the tree that produces an > answer that's not perfect, but close. What's the error that would be > generated with the naive implementation of > long double tgammal(long double f) { return tgamma(f); } Perhaps we could implement these functions in such a way that they logged a message to inform the user (once per process) that they were using a low-quality implementation. That would allow us to implement these functions without totally losing the incentive to implement them properly, and those users who don't actually call those functions would not have to pay the price of further delay. (This would be a non-conforming implementation, since it would have side effects other than those specified by the standard, but we already fail to conform by not implementing the functions at all, so it wouldn't make things *worse*.) 
-GAWollman From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 20:35:21 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 3565F322 for ; Thu, 30 May 2013 20:35:21 +0000 (UTC) (envelope-from pfg@FreeBSD.org) Received: from nm18.bullet.mail.ne1.yahoo.com (nm18.bullet.mail.ne1.yahoo.com [98.138.90.81]) by mx1.freebsd.org (Postfix) with ESMTP id CB3D7641 for ; Thu, 30 May 2013 20:35:20 +0000 (UTC) Received: from [98.138.226.179] by nm18.bullet.mail.ne1.yahoo.com with NNFMP; 30 May 2013 20:32:02 -0000 Received: from [98.138.226.61] by tm14.bullet.mail.ne1.yahoo.com with NNFMP; 30 May 2013 20:32:02 -0000 Received: from [127.0.0.1] by smtp212.mail.ne1.yahoo.com with NNFMP; 30 May 2013 20:32:02 -0000 X-Yahoo-Newman-Id: 634308.14639.bm@smtp212.mail.ne1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: ZY7iBa0VM1no4wHSbRqynOmrw02iYuqXod4iGts_ae69c3e 6Mkfz75YLdkCro9ZbEPV9rrRTQWaYg3yr8_lJLxQdN668cbTG.t9KijWXqPI zSJyAaz.uZiOqJuXnOeQYxH.expwxcbCtuebP3VWHEFyz0gdiSvNCnQ.uy_S PgPrHHXOAermSQI2rDVEsBJTLd6kLbtgvuJBxM0E3MVueUGxK3Jp4YDjRc4Q jLjraFv7.eGFXtRo9Ky0KphA9GcHbLuYW.IzVX2pI7zFUwfbaXTxSHisrYCS l0O9ILZXXsBAcfEbyIqYkTQJFfsZzxgX7ypGJFwomKa.bZu6Chn8ChTekKrZ SF5dAREoZDzd9PbORxqpjsCDzTc0q99Onmh6DVVGMEUnSNMb0x6td.MNiANs yfrz_e_f9l3eBe9I2QIPiAcRYqz82wMBVGbvOWCJqb0jovlnUDzCsfubfVt0 E3fN_7HA0YAGsEEu6HFlMZhqLB4RSi1ZfVjEo7.XqRlCV96Cw7MLMtph1ahL k63QeN7eWx6ocSVrO6wx95EeEaLfwzrV6mu5cXMqsYq9Fm1ngJZp3n8xsKDI Gs451yYNnZTQXNbw- X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with ) by smtp212.mail.ne1.yahoo.com with SMTP; 30 May 2013 13:32:02 -0700 PDT Message-ID: <51A7B73F.8040409@FreeBSD.org> Date: Thu, 30 May 2013 15:31:59 -0500 From: Pedro Giffuni User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: Steve Kargl Subject: Re: standards/175811: 
libstdc++ needs complex support in order use C99 References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org> <20130530171348.GA67170@troutmask.apl.washington.edu> <51A7ABF7.6060807@FreeBSD.org> <20130530201513.GA68512@troutmask.apl.washington.edu> In-Reply-To: <20130530201513.GA68512@troutmask.apl.washington.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 20:35:21 -0000 On 30.05.2013 15:15, Steve Kargl wrote: > On Thu, May 30, 2013 at 02:43:51PM -0500, Pedro Giffuni wrote: >> On 30.05.2013 12:13, Steve Kargl wrote: >>> C99 defines many long double functions. Anyone wanting >>> to use C and libm, and not C++ and boost, will need >>> quality implementations of these functions. Of course, >>> the lack of any actual C99 compiler tends to dampen >>> this argument. >>> >>> What I find appalling is reading "people are tired >>> of the situation with libm, so I'm going to commit >>> some atrocious hack". The proper response should be >>> "so I'm going to help implement and test the missing >>> functionality". It's unfortunate that only a few >>> individuals are working to fix libm, but such is >>> life. >>> >> I guess I was trying to hint that Boost is a good >> place to look at to get ideas for the implementations >> for such stuff. Stephen knows this well though since >> he actually fixed some complex functions in boost :). >> > Boost might be a good place to look for implementation > ideas. Looking at the msun code also works. As does > searching with google. This is all secondary to the > real issue. 
The real problem is no one is willing to > step forward to actually help write and test the code. > Everyone seems to be waiting (and complaining!) for > someone else to do the work. I've been chipping away at > libm issues since 2003, and given my available free time > I should have a fully compliant C99 libm around 2025 or > so. > And it happens all around the tree ... The guys fixing clang seem pretty overloaded too. We really need a better installer, and to add more DTrace providers and while here more filesystems ... it never stops and we are all just volunteers. All in all, feedback is not necessarily a bad thing. Even if there are few heroic developers working on it, it would help to have a list of open tasks like this: http://www.freebsd.org/projects/c99/ so that someone asking about the status is just pointed there and gets the picture. Just my $0.02, sorry that I am busy with other stuff. Pedro. From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 20:56:11 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D99EE175 for ; Thu, 30 May 2013 20:56:11 +0000 (UTC) (envelope-from s.montgomerysmith@gmail.com) Received: from mail-ie0-x234.google.com (mail-ie0-x234.google.com [IPv6:2607:f8b0:4001:c03::234]) by mx1.freebsd.org (Postfix) with ESMTP id ABEA681D for ; Thu, 30 May 2013 20:56:11 +0000 (UTC) Received: by mail-ie0-f180.google.com with SMTP id b11so1809393iee.25 for ; Thu, 30 May 2013 13:56:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; bh=24dgC1PLNQEsF310L0ySC2NAhY4O5N3bHfBB4oDzRMA=; b=Hjg9YbdL2IBVQEZRTc72st+QiM2SgearD+UJsF2HUKo04m3nAh8M7+07X6t79EeNHq ZtmbdnuhrnAJmFN5CsNSQP2M5QmTZc7i31z451tmiWVCAuhAHTgtJ9rVQjm9YghyvBfI 
PMyMOz1mzuxN6haa8pW0P6JPX0bcYR99o/o9VMmBZtOJ/NTqLv0H0JkZhDhk6968Gs1F L4hCX0d+btV1M44k8Fv0ftPVlEeZlvBlzHasp24xHMnOwMNfQAC/+SnGCZUlf7Y7QoBi pXeZ5xBM+u9HnYoM3JwGMsMOCq/Gs6U7Ea3zoIIgqBRPlWIPYCq/NhnXnniOO2BnCrLc WvOw== X-Received: by 10.42.250.202 with SMTP id mp10mr3825516icb.21.1369947371458; Thu, 30 May 2013 13:56:11 -0700 (PDT) Received: from [10.7.39.35] ([161.130.188.204]) by mx.google.com with ESMTPSA id qr3sm791242igb.1.2013.05.30.13.56.09 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 30 May 2013 13:56:10 -0700 (PDT) Sender: Stephen Montgomery-Smith Message-ID: <51A7BCE8.3010001@missouri.edu> Date: Thu, 30 May 2013 15:56:08 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130510 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org> <20130530171348.GA67170@troutmask.apl.washington.edu> In-Reply-To: <20130530171348.GA67170@troutmask.apl.washington.edu> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 20:56:11 -0000 On 05/30/2013 12:13 PM, Steve Kargl wrote: > What I find appalling is reading "people are tired > of the situation with libm, so I'm going to commit > some atrocious hack". The proper response should be > "so I'm going to help implement and test the missing > functionality". It's unfortunate that only a few > individuals are working to fix libm, but such is > life. 
I don't think the problem is that there are too few individuals. I think the problem is that the standards are set too high. I presented numerically accurate complex arc-trig functions a long time ago, and I became increasingly frustrated at the lack of progress. I am pleased that it got committed a few days ago. But I feel that the change requests, particular the style change requests, became too much. I dutifully complied with the many style changes, but it became overwhelming. There is a happy medium between simply copying the *l functions to the * functions, and what we have now. I am all for having reasonable standards, but what we currently have is gridlock that is unacceptable. From owner-freebsd-numerics@FreeBSD.ORG Thu May 30 21:17:12 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F27E9565 for ; Thu, 30 May 2013 21:17:12 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from mail-ie0-x22c.google.com (mail-ie0-x22c.google.com [IPv6:2607:f8b0:4001:c03::22c]) by mx1.freebsd.org (Postfix) with ESMTP id C019F969 for ; Thu, 30 May 2013 21:17:12 +0000 (UTC) Received: by mail-ie0-f172.google.com with SMTP id 17so1960582iea.3 for ; Thu, 30 May 2013 14:17:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=5fL5r8OZ7yMWK400mw0a+tJ4HcpentBZd9UtNAB+EUM=; b=E7wiEuJS5N1fdvbn8MBcwSa1tUrXkYsI/yGSceLRC+n2U78I5owVX0Irys8CUVNKc7 eJcM1hS16UNMSzOwrU0HQxb6IBUU0ZnzqgkEplE6toYYHJ8dxYt6jZGe8qLkoHZdm93R MUE7z3pUPdqIDE/R94ABNIbea4YZRqz3Lu1rOfkXmMkhOt3yHEXDtNkPE1Sf9oAwmDxj pxOz2SUMu8oF5O4kZeMl4Lq5NRe4gbcNZwj7k1ZzZlCq1VL2Vb9xG7SNQWxQaEjkCc63 /gOyQquPA03+39GfpFd4GKt/x7L9oVXaGfEQTWanGOXsGqAbznrypQRd96ewftv1KM2A SYNA== X-Received: by 10.50.43.234 
with SMTP id z10mr247671igl.92.1369948632263; Thu, 30 May 2013 14:17:12 -0700 (PDT) Received: from monkey-bot.int.fusionio.com ([209.117.142.2]) by mx.google.com with ESMTPSA id k10sm193977ige.0.2013.05.30.14.17.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 30 May 2013 14:17:11 -0700 (PDT) Sender: Warner Losh Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 Mime-Version: 1.0 (Apple Message framework v1085) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <20130530171348.GA67170@troutmask.apl.washington.edu> Date: Thu, 30 May 2013 15:17:07 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: <486AC985-2F3A-4CEB-A229-DF5F4AE9C50F@bsdimp.com> References: <201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org> <20130530171348.GA67170@troutmask.apl.washington.edu> To: Steve Kargl X-Mailer: Apple Mail (2.1085) X-Gm-Message-State: ALoCoQnN+91s005OkUbnJ3QApmJa8IRnUZuNIDfsZF6T9L7cpTYFnqHqHm1x4pJhq3Y5VOx0m3bi Cc: Stephen Montgomery-Smith , David Schultz , Pedro Giffuni , freebsd-standards@freebsd.org, freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 May 2013 21:17:13 -0000 On May 30, 2013, at 11:13 AM, Steve Kargl wrote: > On Thu, May 30, 2013 at 10:41:24AM -0500, Pedro Giffuni wrote: >> >> I may be wrong but with long double support people that >> need erfcl() and tgamma() can get them from boost. >> The problem is therefore not implementing everything but >> getting enough to turn on the features supported by >> libstdc++ and boost. >> > > Of course, you're wrong. :-) :-) <-- Note smileys. > > C99 defines many long double functions.
Anyone wanting > to use C and libm, and not C++ and boost, will need > quality implementations of these functions. Of course, > the lack of any actual C99 compiler tends to dampen > this argument. > > What I find appalling is reading "people are tired > of the situation with libm, so I'm going to commit > some atrocious hack". The proper response should be > "so I'm going to help implement and test the missing > functionality". It's unfortunate that only a few > individuals are working to fix libm, but such is > life. I'd help, but the barriers to entry are somewhat steep and prickly. I tried to help, and got no end of grief for documenting the differences in an algorithm that was actually different that people told me was the same. In that environment, you suck the enthusiasm out of the air and wind up in the something is better than nothing camp quite quickly. Warner From owner-freebsd-numerics@FreeBSD.ORG Fri May 31 03:38:13 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7257917CC for ; Fri, 31 May 2013 03:38:13 +0000 (UTC) (envelope-from das@FreeBSD.org) Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net [50.196.151.174]) by mx1.freebsd.org (Postfix) with ESMTP id 52E64CCA for ; Fri, 31 May 2013 03:38:13 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r4V3cCFd095032; Thu, 30 May 2013 20:38:12 -0700 (PDT) (envelope-from das@FreeBSD.org) Received: (from das@localhost) by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r4V3cBu8095031; Thu, 30 May 2013 20:38:11 -0700 (PDT) (envelope-from das@FreeBSD.org) Date: Thu, 30 May 2013 20:38:11 -0700 From: David Schultz To: Stephen Montgomery-Smith Subject: Re: standards/175811: libstdc++ needs complex support in order use C99 Message-ID: <20130531033811.GA95005@zim.MIT.EDU> References:
<201302040328.r143SUd3039504@freefall.freebsd.org> <510F306A.6090009@missouri.edu> <20130530064635.GA91597@zim.MIT.EDU> <51A77324.2070702@FreeBSD.org> <20130530171348.GA67170@troutmask.apl.washington.edu> <51A7BCE8.3010001@missouri.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51A7BCE8.3010001@missouri.edu> Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 May 2013 03:38:13 -0000 On Thu, May 30, 2013, Stephen Montgomery-Smith wrote: > On 05/30/2013 12:13 PM, Steve Kargl wrote: > > > What I find appalling is reading "people are tired > > of the situation with libm, so I'm going to commit > > some atrocious hack". The proper response should be > > "so I'm going to help implement and test the missing > > functionality". It's unfortunate that only a few > > individuals are working to fix libm, but such is > > life. > > I don't think the problem is that there are too few individuals. I > think the problem is that the standards are set too high. I presented > numerically accurate complex arc-trig functions a long time ago, and I > became increasingly frustrated at the lack of progress. > > I am pleased that it got committed a few days ago. > > But I feel that the change requests, particular the style change > requests, became too much. I dutifully complied with the many style > changes, but it became overwhelming. Bruce is very meticulous and has a lot of good feedback, but it's important to understand that Bruce's standards are not the minimum standards for committing a change. Bruce doesn't commit directly anymore in any case. I don't think I have ever committed a change that Bruce could find no flaws in, including patches submitted by Bruce himself. 
:) It's okay to commit some working code first and then improve it later. From owner-freebsd-numerics@FreeBSD.ORG Fri May 31 15:46:09 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D4490DFD for ; Fri, 31 May 2013 15:46:09 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id A0E6AC1D for ; Fri, 31 May 2013 15:46:09 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4VFk8iJ073260; Fri, 31 May 2013 08:46:08 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4VFk83v073259; Fri, 31 May 2013 08:46:08 -0700 (PDT) (envelope-from sgk) Date: Fri, 31 May 2013 08:46:08 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: Patches for s_expl.c Message-ID: <20130531154608.GA73175@troutmask.apl.washington.edu> References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> <20130529162441.GA58773@troutmask.apl.washington.edu> <20130530045951.Y4776@besplex.bde.org> <20130530162723.GB66755@troutmask.apl.washington.edu> <20130531053652.H65974@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130531053652.H65974@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 May 2013 15:46:09 -0000 On Fri, May 31, 2013 at 06:19:09AM +1000, Bruce Evans wrote: > On Thu, 30 May 2013, Steve Kargl wrote: > > > OK, I've restored whitespace to hopefully match your expectations. > > Removed excess digits in exponents (e.g., 1.234e08 --> 1.234e8). > > Restored XXX comments. > > Removed (unnecessary?) blank lines. > > Restored the order of computing r = r1 + r2 in ld128. > > Moved the |x| < 0x1p-113 if-block back into the [T1:T3] interval. > > I like the ld80 version now. My diffs for the ld128 version are below. :-) > > Final questions. What is your preference for committing expm1l? > > Should it be included in s_expl.c or should I use 'svn cp' to > > copy s_expl.c to s_expm1l.c and add the implementation of > > expm1l to the copied version? > > I prefer it in the same file. The big table is hard to manage in a > separate file (if the functions are split, then the table should be > too, since it is the largest component), and some constants would have > to be made public or duplicated. Accesses to public tables and scalars > cannot be optimized (by the compiler) as much as static ones. But when > you implement exp() so that it works as well as expl(), the table should > be shared in the ld80 case, so at least the table should be split then. OK. I'll commit expm1l into s_expl.c. I did briefly look at splitting the code into a k_expm1l.{c|h} and s_exp[m1]l.c, but I could not convince myself that it would provide us with any clear benefit due to the size and differences in constructing the final result. I've added most of your suggestions. > @ static const struct { > @ + /* > @ + * hi must be rounded to at most 106 bits so that multiplication > @ + * by r1 in expm1l() is exact, but it is rounded to 88 bits due to > @ + * historical accidents. > > Keep this part of the comment. OK.
> @ + * > @ + * XXX it is wasteful to use long double for both hi and lo. ld128 > @ + * exp2l() uses only float for lo (in a very differently organized > @ + * table; ld80 exp2l() is different again. It uses 2 doubles in a > @ + * table organized like this one. 1 double and 1 float would > @ + * suffice). There are different packing/locality/alignment/caching > @ + * problems with these methods. > @ + * > @ + * XXX C's bad %a format makes the bits unreadable. They happen > @ + * to all line up for the hi values 1 before the point and 88 > @ + * in 22 nybbles, but for the low values the nybbles are shifted > @ + * randomly. > @ + */ I left these XXX out of the new version, and have archived your email in my development tree. I may someday look at whether changing the tables provides an improvement. > > Reminders of things to fix. > > In a development version, I need hi to have only about 56 bits. It is > easy to re-split hi+lo for testing this. A 24-bit or 53-bit hi is > sufficient and would give this automatically. Is this a version where you try to eliminate the C and D polynomials? > @ + * XXX the coeffs aren't very carefully rounded. I got 10.3 more bits with > @ + * the old version for [-0.1659, -0.03125]. Now T3 is better balanced, and > @ + * I would expect only 7-8 extra bits. > @ + * > @ + * XXX the number of terms can be reduced by 1. Then I get a few more bits > @ + * with the same number of doubles (5), and 0.7 more bits with 8 doubles. > @ + * This much accuracy is hard to explain, and it isn't clear that reduction > @ + * of x to double is valid at the same point that reduction of the coeffs to > @ + * double. With C10 double, the absolute errors from rounding it are up to > @ + * about 2**-53 * 0.1659**10/10! ~= 2**-100.8. Remes apparently improves > @ + * this to 2**-122.1. > @ */ > > Better polynomials should be used someday, but I want you to generate them. 
> After fixing the generator to minimize the relative error instead of the > absolute error, you should get ones like mine. I left these XXX out as well. I have a plan for possibly generating new polynomials, but it depends on acquiring some external funding to completely rewrite how I implemented the Remes algorithm. > @ static const long double > @ +/* > @ + * XXX none of the long double C or D coeffs except C10 is correctly printed. > @ + * If you re-print their values in %.35Le format, the result is always > @ + * different. For example, the last 2 digits in C3 should be 59, not 67. > @ + * 67 is apparently from rounding an extra-precision value to 36 decimal > @ + * places. > @ + */ > @ C3 = 1.66666666666666666666666666666666667e-1L, > > I didn't fix these. > I didn't fix the coefficients either. I'll do it if I ever get around to regenerating the coefficients. The limited testing that I've been able to do on flame gave max ULP < 0.51. This, IMO, is good enough for now. > @ + * > @ + * XXX the coeffs aren't very carefully rounded. I get 5.2 more bits with > @ + * the old version for [-0.03125, 0.1659]. Now T3 is better balanced, and > @ + * I would expect 7-8 extra bits. > @ + * > @ + * XXX the number of terms can be reduced by 1. Then I get a few more bits > @ + * with the same number of doubles (4), and 1.1 more bits with 6 doubles. > @ + * This much accuracy is hard to explain, etc., as above. With D11 double, > @ + * the absolute errors from rounding it are up to about > @ + * 2**-53 * 0.1659**11/11! ~= 2**-106.8. > @ + * > @ + * Note that with my coeffs, although this side needs 1 fewer term, it needs > @ + * 1 more long double term, so it is probably actually slower on sparc64. > @ */ I did not include this dialogue as the reference to "I" would appear ambiguous to the casual reader of the code. Thanks for helping with getting the code to its current state. Final diff(?).
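For reference, the re-splitting of a hi+lo table entry mentioned in the quoted remarks (a 24-bit hi would be sufficient) can be sketched as follows. This is only an illustration under stated assumptions, not code from either patch: struct hilo and resplit24() are hypothetical names, and the entry uses two doubles as in the ld80 table rather than the two long doubles of the ld128 table.

```c
/* Hypothetical table entry holding a value as an unevaluated sum hi + lo. */
struct hilo {
	double hi;
	double lo;
};

/*
 * Re-split hi + lo so that the new hi fits in a float (24 bits).
 * The difference hi - new_hi is exact in double because both operands
 * share their leading bits; adding the old lo may round, so the new
 * pair matches the old sum only to roughly 24 + 53 bits.
 */
static struct hilo
resplit24(double hi, double lo)
{
	struct hilo r;

	r.hi = (float)hi;		/* keep the top 24 bits of hi */
	r.lo = (hi - r.hi) + lo;	/* move the discarded bits into lo */
	return (r);
}
```

Applied to each table entry, this leaves hi exactly representable as a float, so a product of hi with another short operand needs fewer bits to be exact, while hi + lo stays essentially unchanged.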
-- Steve Index: ld80/s_expl.c =================================================================== --- ld80/s_expl.c (revision 251146) +++ ld80/s_expl.c (working copy) @@ -1,5 +1,5 @@ /*- - * Copyright (c) 2009-2012 Steven G. Kargl + * Copyright (c) 2009-2013 Steven G. Kargl * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -29,7 +29,7 @@ #include __FBSDID("$FreeBSD$"); -/*- +/** * Compute the exponential of x for Intel 80-bit format. This is based on: * * PTP Tang, "Table-driven implementation of the exponential function @@ -50,6 +50,7 @@ #include "math_private.h" #define INTERVALS 128 +#define LOG2_INTERVALS 7 #define BIAS (LDBL_MAX_EXP - 1) static const long double @@ -60,9 +61,12 @@ static const union IEEEl2bits /* log(2**16384 - 0.5) rounded towards zero: */ -o_threshold = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L), +/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */ +o_thresholdu = LD80C(0xb17217f7d1cf79ab, 13, 11356.5234062941439488L), +#define o_threshold (o_thresholdu.e) /* log(2**(-16381-64-1)) rounded towards zero: */ -u_threshold = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L); +u_thresholdu = LD80C(0xb21dfe7f09e2baa9, 13, -11399.4985314888605581L); +#define u_threshold (u_thresholdu.e) static const double /* @@ -78,11 +82,11 @@ * |exp(x) - p(x)| < 2**-77.2 * (0.002708 is ln2/(2*INTERVALS) rounded up a little). 
*/ -P2 = 0.5, -P3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ -P4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ -P5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ -P6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ +A2 = 0.5, +A3 = 1.6666666666666119e-1, /* 0x15555555555490.0p-55 */ +A4 = 4.1666666666665887e-2, /* 0x155555555554e5.0p-57 */ +A5 = 8.3333354987869413e-3, /* 0x1111115b789919.0p-59 */ +A6 = 1.3888891738560272e-3; /* 0x16c16c651633ae.0p-62 */ /* * 2^(i/INTERVALS) for i in [0,INTERVALS] is represented by two values where @@ -96,8 +100,7 @@ static const struct { double hi; double lo; -/* XXX should rename 's'. */ -} s[INTERVALS] = { +} tbl[INTERVALS] = { 0x1p+0, 0x0p+0, 0x1.0163da9fb3335p+0, 0x1.b61299ab8cdb7p-54, 0x1.02c9a3e778060p+0, 0x1.dcdef95949ef4p-53, @@ -232,7 +235,8 @@ expl(long double x) { union IEEEl2bits u, v; - long double fn, q, r, r1, r2, t, t23, t45, twopk, twopkp10000, z; + long double fn, q, r, r1, r2, t, twopk, twopkp10000; + long double z; int k, n, n2; uint16_t hx, ix; @@ -242,40 +246,39 @@ ix = hx & 0x7fff; if (ix >= BIAS + 13) { /* |x| >= 8192 or x is NaN */ if (ix == BIAS + LDBL_MAX_EXP) { - if (hx & 0x8000 && u.xbits.man == 1ULL << 63) - return (0.0L); /* x is -Inf */ - return (x + x); /* x is +Inf, NaN or unsupported */ + if (hx & 0x8000) /* x is -Inf, -NaN or unsupported */ + return (-1 / x); + return (x + x); /* x is +Inf, +NaN or unsupported */ } - if (x > o_threshold.e) + if (x > o_threshold) return (huge * huge); - if (x < u_threshold.e) + if (x < u_threshold) return (tiny * tiny); - } else if (ix < BIAS - 66) { /* |x| < 0x1p-66 */ - /* includes pseudo-denormals */ - if (huge + x > 1.0L) /* trigger inexact iff x != 0 */ - return (1.0L + x); + } else if (ix < BIAS - 65) { /* |x| < 0x1p-65 (includes pseudos) */ + return (1 + x); /* 1 with inexact iff x != 0 */ } ENTERI(); - /* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */ + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). 
*/ /* Use a specialized rint() to get fn. Assume round-to-nearest. */ fn = x * INV_L + 0x1.8p63 - 0x1.8p63; r = x - fn * L1 - fn * L2; /* r = r1 + r2 done independently. */ #if defined(HAVE_EFFICIENT_IRINTL) - n = irintl(fn); + n = irintl(fn); #elif defined(HAVE_EFFICIENT_IRINT) - n = irint(fn); + n = irint(fn); #else - n = (int)fn; + n = (int)fn; #endif n2 = (unsigned)n % INTERVALS; - k = (n - n2) / INTERVALS; + /* Depend on the sign bit being propagated: */ + k = n >> LOG2_INTERVALS; r1 = x - fn * L1; - r2 = -fn * L2; + r2 = fn * -L2; /* Prepare scale factors. */ - v.xbits.man = 1ULL << 63; + v.e = 1; if (k >= LDBL_MIN_EXP) { v.xbits.expsign = BIAS + k; twopk = v.e; @@ -284,21 +287,183 @@ twopkp10000 = v.e; } - /* Evaluate expl(midpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */ - /* Here q = q(r), not q(r1), since r1 is lopped like L1. */ - t45 = r * P5 + P4; + /* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). */ z = r * r; - t23 = r * P3 + P2; - q = r2 + z * t23 + z * z * t45 + z * z * z * P6; - t = (long double)s[n2].lo + s[n2].hi; - t = s[n2].lo + t * (q + r1) + s[n2].hi; + q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6; + t = (long double)tbl[n2].lo + tbl[n2].hi; + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; /* Scale by 2**k. */ if (k >= LDBL_MIN_EXP) { if (k == LDBL_MAX_EXP) - RETURNI(t * 2.0L * 0x1p16383L); + RETURNI(t * 2 * 0x1p16383L); RETURNI(t * twopk); } else { RETURNI(t * twopkp10000 * twom10000); } } + +/** + * Compute expm1l(x) for Intel 80-bit format. This is based on: + * + * PTP Tang, "Table-driven implementation of the Expm1 function + * in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18, + * 211-222 (1992). + */ + +/* + * Our T1 and T2 are chosen to be approximately the points where method + * A and method B have the same accuracy. Tang's T1 and T2 are the + * points where method A's accuracy changes by a full bit. 
For Tang, + * this drop in accuracy makes method A immediately less accurate than + * method B, but our larger INTERVALS makes method A 2 bits more + * accurate so it remains the most accurate method significantly + * closer to the origin despite losing the full bit in our extended + * range for it. + */ +static const double +T1 = -0.1659, /* ~-30.625/128 * log(2) */ +T2 = 0.1659; /* ~30.625/128 * log(2) */ + +/* + * Domain [-0.1659, 0.1659], range ~[-1.2027e-22, 3.4417e-22]: + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-71.2 + */ +static const union IEEEl2bits +B3 = LD80C(0xaaaaaaaaaaaaaaab, -3, 1.66666666666666666671e-1L), +B4 = LD80C(0xaaaaaaaaaaaaaaac, -5, 4.16666666666666666712e-2L); + +static const double +B5 = 8.3333333333333245e-3, /* 0x1.111111111110cp-7 */ +B6 = 1.3888888888888861e-3, /* 0x1.6c16c16c16c0ap-10 */ +B7 = 1.9841269841532042e-4, /* 0x1.a01a01a0319f9p-13 */ +B8 = 2.4801587302069236e-5, /* 0x1.a01a01a03cbbcp-16 */ +B9 = 2.7557316558468562e-6, /* 0x1.71de37fd33d67p-19 */ +B10 = 2.7557315829785151e-7, /* 0x1.27e4f91418144p-22 */ +B11 = 2.5063168199779829e-8, /* 0x1.ae94fabdc6b27p-26 */ +B12 = 2.0887164654459567e-9; /* 0x1.1f122d6413fe1p-29 */ + +long double +expm1l(long double x) +{ + union IEEEl2bits u, v; + long double fn, hx2_hi, hx2_lo, q, r, r1, r2, t, twomk, twopk, x_hi; + long double x_lo, x2, z; + long double x4; + int k, n, n2; + uint16_t hx, ix; + + /* Filter out exceptional cases. */ + u.e = x; + hx = u.xbits.expsign; + ix = hx & 0x7fff; + if (ix >= BIAS + 6) { /* |x| >= 64 or x is NaN */ + if (ix == BIAS + LDBL_MAX_EXP) { + if (hx & 0x8000) /* x is -Inf, -NaN or unsupported */ + return (-1 / x - 1); + return (x + x); /* x is +Inf, +NaN or unsupported */ + } + if (x > o_threshold) + return (huge * huge); + /* + * expm1l() never underflows, but it must avoid + * unrepresentable large negative exponents. 
We used a + * much smaller threshold for large |x| above than in + * expl() so as to handle not so large negative exponents + * in the same way as large ones here. + */ + if (hx & 0x8000) /* x <= -64 */ + return (tiny - 1); /* good for x < -65ln2 - eps */ + } + + ENTERI(); + + if (T1 < x && x < T2) { + if (ix < BIAS - 64) { /* |x| < 0x1p-64 (includes pseudos) */ + /* x (rounded) with inexact if x != 0: */ + RETURNI(x == 0 ? x : + (0x1p100 * x + fabsl(x)) * 0x1p-100); + } + + x2 = x * x; + x4 = x2 * x2; + q = x4 * (x2 * (x4 * + /* + * XXX the number of terms is no longer good for + * pairwise grouping of all except B3, and the + * grouping is no longer from highest down. + */ + (x2 * B12 + (x * B11 + B10)) + + (x2 * (x * B9 + B8) + (x * B7 + B6))) + + (x * B5 + B4.e)) + x2 * x * B3.e; + + x_hi = (float)x; + x_lo = x - x_hi; + hx2_hi = x_hi * x_hi / 2; + hx2_lo = x_lo * (x + x_hi) / 2; + if (ix >= BIAS - 7) + RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi)); + else + RETURNI(hx2_lo + q + hx2_hi + x); + } + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + fn = x * INV_L + 0x1.8p63 - 0x1.8p63; +#if defined(HAVE_EFFICIENT_IRINTL) + n = irintl(fn); +#elif defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif + n2 = (unsigned)n % INTERVALS; + k = n >> LOG2_INTERVALS; + r1 = x - fn * L1; + r2 = fn * -L2; + r = r1 + r2; + + /* Prepare scale factor. */ + v.e = 1; + v.xbits.expsign = BIAS + k; + twopk = v.e; + + /* + * Evaluate lower terms of + * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). 
+ */ + z = r * r; + q = r2 + z * (A2 + r * A3) + z * z * (A4 + r * A5) + z * z * z * A6; + + t = (long double)tbl[n2].lo + tbl[n2].hi; + + if (k == 0) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 1); + RETURNI(t); + } + if (k == -1) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 2); + RETURNI(t / 2); + } + if (k < -7) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + RETURNI(t * twopk - 1); + } + if (k > 2 * LDBL_MANT_DIG - 1) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + if (k == LDBL_MAX_EXP) + RETURNI(t * 2 * 0x1p16383L - 1); + RETURNI(t * twopk - 1); + } + + v.xbits.expsign = BIAS - k; + twomk = v.e; + + if (k > LDBL_MANT_DIG - 1) + t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi; + else + t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk); + RETURNI(t * twopk); +} Index: ld128/s_expl.c =================================================================== --- ld128/s_expl.c (revision 251146) +++ ld128/s_expl.c (working copy) @@ -1,5 +1,5 @@ /*- - * Copyright (c) 2012 Steven G. Kargl + * Copyright (c) 2009-2013 Steven G. Kargl * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -22,6 +22,8 @@ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * Optimized by Bruce D. Evans. 
*/ #include @@ -38,35 +40,67 @@ #include "math_private.h" #define INTERVALS 128 +#define LOG2_INTERVALS 7 #define BIAS (LDBL_MAX_EXP - 1) +static const long double +huge = 0x1p10000L, +twom10000 = 0x1p-10000L; +/* XXX Prevent gcc from erroneously constant folding this: */ static volatile const long double tiny = 0x1p-10000L; static const long double -INV_L = 1.84664965233787316142070359168242182e+02L, -L1 = 5.41521234812457272982212595914567508e-03L, -L2 = -1.02536706388947310094527932552595546e-29L, -huge = 0x1p10000L, +/* log(2**16384 - 0.5) rounded towards zero: */ +/* log(2**16384 - 0.5 + 1) rounded towards zero for expm1l() is the same: */ o_threshold = 11356.523406294143949491931077970763428L, -twom10000 = 0x1p-10000L, +/* log(2**(-16381-64-1)) rounded towards zero: */ u_threshold = -11433.462743336297878837243843452621503L; +static const double +/* + * ln2/INTERVALS = L1+L2 (hi+lo decomposition for multiplication). L1 must + * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest + * bits zero so that multiplication of it by n is exact. + */ +INV_L = 1.8466496523378731e+2, /* 0x171547652b82fe.0p-45 */ +L2 = -1.0253670638894731e-29; /* -0x1.9ff0342542fc3p-97 */ static const long double -P2 = 5.00000000000000000000000000000000000e-1L, -P3 = 1.66666666666666666666666666666666972e-1L, -P4 = 4.16666666666666666666666666653708268e-2L, -P5 = 8.33333333333333333333333315069867254e-3L, -P6 = 1.38888888888888888888996596213795377e-3L, -P7 = 1.98412698412698412718821436278644414e-4L, -P8 = 2.48015873015869681884882576649543128e-5L, -P9 = 2.75573192240103867817876199544468806e-6L, -P10 = 2.75573236172670046201884000197885520e-7L, -P11 = 2.50517544183909126492878226167697856e-8L; +/* 0x1.62e42fefa39ef35793c768000000p-8 */ +L1 = 5.41521234812457272982212595914567508e-3L; +/* + * XXX values in hex in comments have been lost (or were never present) + * from here. 
+ */ +static const long double +/* + * Domain [-0.002708, 0.002708], range ~[-2.4021e-38, 2.4234e-38]: + * |exp(x) - p(x)| < 2**-124.9 + * (0.002708 is ln2/(2*INTERVALS) rounded up a little). + * + * XXX the coeffs aren't very carefully rounded, and I get 2.3 more bits. + */ +A2 = 0.5, +A3 = 1.66666666666666666666666666651085500e-1L, +A4 = 4.16666666666666666666666666425885320e-2L, +A5 = 8.33333333333333333334522877160175842e-3L, +A6 = 1.38888888888888888889971139751596836e-3L; + +static const double +A7 = 1.9841269841269470e-4, /* 0x1.a01a01a019f91p-13 */ +A8 = 2.4801587301585286e-5, /* 0x1.71de3ec75a967p-19 */ +A9 = 2.7557324277411235e-6, /* 0x1.71de3ec75a967p-19 */ +A10 = 2.7557333722375069e-7; /* 0x1.27e505ab56259p-22 */ + static const struct { + /* + * hi must be rounded to at most 106 bits so that multiplication + * by r1 in expm1l() is exact, but it is rounded to 88 bits due to + * historical accidents. + */ long double hi; long double lo; -} s[INTERVALS] = { +} tbl[INTERVALS] = { 0x1p0L, 0x0p0L, 0x1.0163da9fb33356d84a66aep0L, 0x3.36dcdfa4003ec04c360be2404078p-92L, 0x1.02c9a3e778060ee6f7cacap0L, 0x4.f7a29bde93d70a2cabc5cb89ba10p-92L, @@ -201,9 +235,10 @@ expl(long double x) { union IEEEl2bits u, v; - long double fn, r, r1, r2, q, t, twopk, twopkp10000; + long double q, r, r1, t, twopk, twopkp10000; + double dr, fn, r2; int k, n, n2; - uint32_t hx, ix; + uint16_t hx, ix; /* Filter out exceptional cases. 
*/ u.e = x; @@ -211,31 +246,39 @@ ix = hx & 0x7fff; if (ix >= BIAS + 13) { /* |x| >= 8192 or x is NaN */ if (ix == BIAS + LDBL_MAX_EXP) { - if (hx & 0x8000 && u.xbits.manh == 0 && - u.xbits.manl == 0) - return (0.0L); /* x is -Inf */ - return (x + x); /* x is +Inf or NaN */ + if (hx & 0x8000) /* x is -Inf or -NaN */ + return (-1 / x); + return (x + x); /* x is +Inf or +NaN */ } if (x > o_threshold) return (huge * huge); if (x < u_threshold) return (tiny * tiny); - } else if (ix < BIAS - 115) { /* |x| < 0x1p-115 */ - if (huge + x > 1.0L) /* trigger inexact iff x != 0 */ - return (1.0L + x); + } else if (ix < BIAS - 114) { /* |x| < 0x1p-114 */ + return (1 + x); /* 1 with inexact iff x != 0 */ } - /* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */ - fn = x * INV_L + 0x1.8p112 - 0x1.8p112; - n = (int)fn; + ENTERI(); + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. */ + /* XXX assume no extra precision for the additions, as for trig fns. */ + /* XXX this set of comments is now quadruplicated. */ + fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52; +#if defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif n2 = (unsigned)n % INTERVALS; - k = (n - n2) / INTERVALS; + k = n >> LOG2_INTERVALS; r1 = x - fn * L1; - r2 = -fn * L2; + r2 = fn * -L2; + r = r1 + r2; /* Prepare scale factors. */ - v.xbits.manh = 0; - v.xbits.manl = 0; + /* XXX sparc64 multiplication is so slow that scalbnl() is faster. */ + v.e = 1; if (k >= LDBL_MIN_EXP) { v.xbits.expsign = BIAS + k; twopk = v.e; @@ -244,18 +287,224 @@ twopkp10000 = v.e; } - r = r1 + r2; - q = r * r * (P2 + r * (P3 + r * (P4 + r * (P5 + r * (P6 + r * (P7 + - r * (P8 + r * (P9 + r * (P10 + r * P11))))))))); - t = s[n2].lo + s[n2].hi; - t = s[n2].hi + (s[n2].lo + t * (r2 + q + r1)); + /* Evaluate expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). 
*/ + dr = r; + q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 + + dr * (A7 + dr * (A8 + dr * (A9 + dr * A10)))))))); + t = tbl[n2].lo + tbl[n2].hi; + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; /* Scale by 2**k. */ if (k >= LDBL_MIN_EXP) { if (k == LDBL_MAX_EXP) - return (t * 2.0L * 0x1p16383L); - return (t * twopk); + RETURNI(t * 2 * 0x1p16383L); + RETURNI(t * twopk); } else { - return (t * twopkp10000 * twom10000); + RETURNI(t * twopkp10000 * twom10000); } } + +/* + * Our T1 and T2 are chosen to be approximately the points where method + * A and method B have the same accuracy. Tang's T1 and T2 are the + * points where method A's accuracy changes by a full bit. For Tang, + * this drop in accuracy makes method A immediately less accurate than + * method B, but our larger INTERVALS makes method A 2 bits more + * accurate so it remains the most accurate method significantly + * closer to the origin despite losing the full bit in our extended + * range for it. + */ +static const double +T1 = -0.1659, /* ~-30.625/128 * log(2) */ +T2 = 0.1659; /* ~30.625/128 * log(2) */ + +/* + * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2]. + * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear + * in both subintervals, so set T3 = 2**-5, which places the condition + * into the [T1:T3] interval. + * + * XXX we now do this more to (partially) balance the number of terms + * in the C and D polys than to avoid checking the conditon in both + * intervals. + * + * XXX these micro-optimizations are excessive. 
+ */ +static const double +T3 = 0.03125; + +/* + * Domain [-0.1659, 0.03125], range ~[2.9134e-44, 1.8404e-37]: + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-122.03 + */ +static const long double +C3 = 1.66666666666666666666666666666666667e-1L, +C4 = 4.16666666666666666666666666666666645e-2L, +C5 = 8.33333333333333333333333333333371638e-3L, +C6 = 1.38888888888888888888888888891188658e-3L, +C7 = 1.98412698412698412698412697235950394e-4L, +C8 = 2.48015873015873015873015112487849040e-5L, +C9 = 2.75573192239858906525606685484412005e-6L, +C10 = 2.75573192239858906612966093057020362e-7L, +C11 = 2.50521083854417203619031960151253944e-8L, +C12 = 2.08767569878679576457272282566520649e-9L, +C13 = 1.60590438367252471783548748824255707e-10L; + +static const double +C14 = 1.1470745580491932e-11, /* 0x1.93974a81dae30p-37 */ +C15 = 7.6471620181090468e-13, /* 0x1.ae7f3820adab1p-41 */ +C16 = 4.7793721460260450e-14, /* 0x1.ae7cd18a18eacp-45 */ +C17 = 2.8074757356658877e-15, /* 0x1.949992a1937d9p-49 */ +C18 = 1.4760610323699476e-16; /* 0x1.545b43aabfbcdp-53 */ + +/* + * Domain [0.03125, 0.1659], range ~[-2.7676e-37, -1.0367e-38]: + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-121.44 + * + */ +static const long double +D3 = 1.66666666666666666666666666666682245e-1L, +D4 = 4.16666666666666666666666666634228324e-2L, +D5 = 8.33333333333333333333333364022244481e-3L, +D6 = 1.38888888888888888888887138722762072e-3L, +D7 = 1.98412698412698412699085805424661471e-4L, +D8 = 2.48015873015873015687993712101479612e-5L, +D9 = 2.75573192239858944101036288338208042e-6L, +D10 = 2.75573192239853161148064676533754048e-7L, +D11 = 2.50521083855084570046480450935267433e-8L, +D12 = 2.08767569819738524488686318024854942e-9L, +D13 = 1.60590442297008495301927448122499313e-10L; + +static const double +D14 = 1.1470726176204336e-11, /* 0x1.93971dc395d9ep-37 */ +D15 = 7.6478532249581686e-13, /* 0x1.ae892e3D16fcep-41 */ +D16 = 4.7628892832607741e-14, /* 0x1.ad00Dfe41feccp-45 */ +D17 = 3.0524857220358650e-15; /* 
0x1.D7e8d886Df921p-49 */ + +long double +expm1l(long double x) +{ + union IEEEl2bits u, v; + long double hx2_hi, hx2_lo, q, r, r1, t, twomk, twopk, x_hi; + long double x_lo, x2; + double dr, dx, fn, r2; + int k, n, n2; + uint16_t hx, ix; + + /* Filter out exceptional cases. */ + u.e = x; + hx = u.xbits.expsign; + ix = hx & 0x7fff; + if (ix >= BIAS + 7) { /* |x| >= 128 or x is NaN */ + if (ix == BIAS + LDBL_MAX_EXP) { + if (hx & 0x8000) /* x is -Inf or -NaN */ + return (-1 / x - 1); + return (x + x); /* x is +Inf or +NaN */ + } + if (x > o_threshold) + return (huge * huge); + /* + * expm1l() never underflows, but it must avoid + * unrepresentable large negative exponents. We used a + * much smaller threshold for large |x| above than in + * expl() so as to handle not so large negative exponents + * in the same way as large ones here. + */ + if (hx & 0x8000) /* x <= -128 */ + return (tiny - 1); /* good for x < -114ln2 - eps */ + } + + ENTERI(); + + if (T1 < x && x < T2) { + x2 = x * x; + dx = x; + + if (x < T3) { + if (ix < BIAS - 113) { /* |x| < 0x1p-113 */ + /* x (rounded) with inexact if x != 0: */ + RETURNI(x == 0 ? x : + (0x1p200 * x + fabsl(x)) * 0x1p-200); + } + q = x * x2 * C3 + x2 * x2 * (C4 + x * (C5 + x * (C6 + + x * (C7 + x * (C8 + x * (C9 + x * (C10 + + x * (C11 + x * (C12 + x * (C13 + + dx * (C14 + dx * (C15 + dx * (C16 + + dx * (C17 + dx * C18)))))))))))))); + } else { + q = x * x2 * D3 + x2 * x2 * (D4 + x * (D5 + x * (D6 + + x * (D7 + x * (D8 + x * (D9 + x * (D10 + + x * (D11 + x * (D12 + x * (D13 + + dx * (D14 + dx * (D15 + dx * (D16 + + dx * D17))))))))))))); + } + + x_hi = (float)x; + x_lo = x - x_hi; + hx2_hi = x_hi * x_hi / 2; + hx2_lo = x_lo * (x + x_hi) / 2; + if (ix >= BIAS - 7) + RETURNI(hx2_lo + x_lo + q + (hx2_hi + x_hi)); + else + RETURNI(hx2_lo + q + hx2_hi + x); + } + + /* Reduce x to (k*ln2 + endpoint[n2] + r1 + r2). */ + /* Use a specialized rint() to get fn. Assume round-to-nearest. 
*/ + fn = (double)x * INV_L + 0x1.8p52 - 0x1.8p52; +#if defined(HAVE_EFFICIENT_IRINT) + n = irint(fn); +#else + n = (int)fn; +#endif + n2 = (unsigned)n % INTERVALS; + k = n >> LOG2_INTERVALS; + r1 = x - fn * L1; + r2 = fn * -L2; + r = r1 + r2; + + /* Prepare scale factor. */ + v.e = 1; + v.xbits.expsign = BIAS + k; + twopk = v.e; + + /* + * Evaluate lower terms of + * expl(endpoint[n2] + r1 + r2) = tbl[n2] * expl(r1 + r2). + */ + dr = r; + q = r2 + r * r * (A2 + r * (A3 + r * (A4 + r * (A5 + r * (A6 + + dr * (A7 + dr * (A8 + dr * (A9 + dr * A10)))))))); + + t = tbl[n2].lo + tbl[n2].hi; + + if (k == 0) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 1); + RETURNI(t); + } + if (k == -1) { + t = tbl[n2].lo * (r1 + 1) + t * q + tbl[n2].hi * r1 + + (tbl[n2].hi - 2); + RETURNI(t / 2); + } + if (k < -7) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + RETURNI(t * twopk - 1); + } + if (k > 2 * LDBL_MANT_DIG - 1) { + t = tbl[n2].lo + t * (q + r1) + tbl[n2].hi; + if (k == LDBL_MAX_EXP) + RETURNI(t * 2 * 0x1p16383L - 1); + RETURNI(t * twopk - 1); + } + + v.xbits.expsign = BIAS - k; + twomk = v.e; + + if (k > LDBL_MANT_DIG - 1) + t = tbl[n2].lo - twomk + t * (q + r1) + tbl[n2].hi; + else + t = tbl[n2].lo + t * (q + r1) + (tbl[n2].hi - twomk); + RETURNI(t * twopk); +} From owner-freebsd-numerics@FreeBSD.ORG Fri May 31 17:02:28 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 81EE555D for ; Fri, 31 May 2013 17:02:28 +0000 (UTC) (envelope-from s.montgomerysmith@gmail.com) Received: from mail-ie0-x234.google.com (mail-ie0-x234.google.com [IPv6:2607:f8b0:4001:c03::234]) by mx1.freebsd.org (Postfix) with ESMTP id 5866EFB7 for ; Fri, 31 May 2013 17:02:28 +0000 (UTC) Received: by mail-ie0-f180.google.com with SMTP id b11so4710449iee.11 for ; Fri, 31 May 2013 10:02:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :x-enigmail-version:content-type:content-transfer-encoding; bh=4ekXWtv/VLO5u0H6k7jVSf7z16PjejY29KIH8UQyiU0=; b=U/WKFnJ36dFL285BtTAevY/UvwMVHNFHt5rLPBTOkBTvNwWprrFP/kh10FK29tVt6k 1V8SHscKetR6m+nf/prkp11t5HJ0b9Yy2myeUHVhHSd/taAT+GjBluIaHnmllJcvwAQM dXxX08Ktfx5kKRwlUsXVOAircR0abF0yOgbic+UxcKoyQZdHGYmiXWqEiJlaq95ShBr2 kglCbYTJB2ta4/MWVDgoOEv+icJI+0XLqu5z0hjiui63OEAwNG9Z2nNICtlTAFsmWN8n DQ58hpJkXTd+k7Llcy8xebhHXwfHF8d53Gk3UbuWT39uR+Nh8zZrNX6uL/fhh7Ej0nA3 FQAQ== X-Received: by 10.50.136.201 with SMTP id qc9mr2159648igb.47.1370019747862; Fri, 31 May 2013 10:02:27 -0700 (PDT) Received: from [10.7.129.223] ([161.130.188.41]) by mx.google.com with ESMTPSA id z6sm864780igw.8.2013.05.31.10.02.25 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 31 May 2013 10:02:26 -0700 (PDT) Sender: Stephen Montgomery-Smith Message-ID: <51A8D7A0.5060905@missouri.edu> Date: Fri, 31 May 2013 12:02:24 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130510 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org Subject: cacosh etc and bin/170206 X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 May 2013 17:02:28 -0000 Do you think it is OK to close PR bin/170206? The only reason to keep it open is that the long double functions haven't been committed yet. But I don't see how keeping this PR open will have any effect on how fast this will happen. 
From owner-freebsd-numerics@FreeBSD.ORG Fri May 31 19:14:10 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7CADEB72 for ; Fri, 31 May 2013 19:14:10 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 62755868 for ; Fri, 31 May 2013 19:14:10 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r4VJEA0D074365 for ; Fri, 31 May 2013 12:14:10 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r4VJEA9d074364 for freebsd-numerics@freebsd.org; Fri, 31 May 2013 12:14:10 -0700 (PDT) (envelope-from sgk) Date: Fri, 31 May 2013 12:14:10 -0700 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: cosh magic number? Message-ID: <20130531191410.GA74343@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 May 2013 19:14:10 -0000

In msun/src/e_cosh.c, one finds the comment

 *
 *                                   exp(x) + 1/exp(x)
 *    ln2/2 <= x <= 22 : cosh(x) := -------------------
 *                                           2

Where does the magic number 22 come from? Using exp(-|2x|) = 2**(1-p)
with p = 53 for double, I arrive at 18.022, which is a little too small.

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double x, y, z;

    x = 18.022;    /* x = 19; */
    y = exp(x);
    z = cosh(x);
    printf("%a\n%a\n%a\n", z, 0.5*(y + 1/y), 0.5 * y);
    return 0;
}

% cc -o z -O a.c -lm && ./z
0x1.000b5bd5b4beep+25
0x1.000b5bd5b4beep+25
0x1.000b5bd5b4bedp+25

Rounding up to 19 gives

% cc -o z -O a.c -lm && ./z
0x1.546d8f9ed26e1p+26
0x1.546d8f9ed26e1p+26
0x1.546d8f9ed26e1p+26

So, why 22?

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG Fri May 31 19:18:24 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B8C40BB1 for ; Fri, 31 May 2013 19:18:24 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 65FD1883 for ; Fri, 31 May 2013 19:18:24 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id EE442D41FF3; Sat, 1 Jun 2013 05:18:16 +1000 (EST) Date: Sat, 1 Jun 2013 05:18:15 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: Patches for s_expl.c In-Reply-To: <20130531154608.GA73175@troutmask.apl.washington.edu> Message-ID: <20130601044545.B15695@besplex.bde.org> References: <20130528172242.GA51485@troutmask.apl.washington.edu> <20130529062437.V4648@besplex.bde.org> <20130529162441.GA58773@troutmask.apl.washington.edu> <20130530045951.Y4776@besplex.bde.org> <20130530162723.GB66755@troutmask.apl.washington.edu> <20130531053652.H65974@besplex.bde.org> <20130531154608.GA73175@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SI7jfN9uMIUA:10 a=eYD37nbmresOHEikpvgA:9 a=CjuIK1q_8ugA:10
a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-numerics@freebsd.org, Bruce Evans X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 May 2013 19:18:24 -0000

On Fri, 31 May 2013, Steve Kargl wrote:

> On Fri, May 31, 2013 at 06:19:09AM +1000, Bruce Evans wrote:
>> On Thu, 30 May 2013, Steve Kargl wrote:
>
> I've added most of your suggestions. Perhaps too many :-).

Leave out and/or act on more of my XXX comments and I'm happy with it.

>> In a development version, I need hi to have only about 56 bits. It is
>> easy to re-split hi+lo for testing this. A 24-bit or 53-bit hi is
>> sufficient and would give this automatically.
>
> Is this a version where you try to eliminate the C and D polynomials?

Yes. It all works fine except for efficiency, but efficiency was the
main reason to eliminate them (also simplicity -- we avoid a special
case and hope that the pipelining effects from this compensate for a
few extra instructions for the general case).

>> @ static const long double
>> @ +/*
>> @ + * XXX none of the long double C or D coeffs except C10 is correctly printed.
>> @ + * If you re-print their values in %.35Le format, the result is always
>> @ + * different. For example, the last 2 digits in C3 should be 59, not 67.
>> @ + * 67 is apparently from rounding an extra-precision value to 36 decimal
>> @ + * places.
>> @ + */
>> @ C3 = 1.66666666666666666666666666666666667e-1L,
>>
>> I didn't fix these.
>
> I didn't fix the coefficients either. I'll do it if I ever get
> around to regenerating the coefficients. The limited testing
> that I've been able to do on flame gave max ULP < 0.51. This,
> IMO, is good enough for now.

This is just cosmetic. In order to verify the coeffs, I like to be able
to at least print them and get back the same results.
My pari program that verifies them (by plotting the error function) does
a little more. It has to round them to binary fractions, since any extra
precision in them would make them appear to be more accurate than they
are -- pari would use the extra precision of the decimal values, but the
compiler has to convert to binary for the CPU to use. Here is a program
to print their actual values (after rounding to binary and back to
decimal):

@ #include <stdio.h>
@
@ static const long double
@ o_threshold = 11356.523406294143949491931077970763428L,
@ u_threshold = -11433.462743336297878837243843452621503L,
@ L1 = 5.41521234812457272982212595914567508e-3L,
@ A3 = 1.66666666666666666666666666651085500e-1L,
@ A4 = 4.16666666666666666666666666425885320e-2L,
@ A5 = 8.33333333333333333334522877160175842e-3L,
@ A6 = 1.38888888888888888889971139751596836e-3L,
@ C3 = 1.66666666666666666666666666666666667e-1L,
@ C4 = 4.16666666666666666666666666666666645e-2L,
@ C5 = 8.33333333333333333333333333333371638e-3L,
@ C6 = 1.38888888888888888888888888891188658e-3L,
@ C7 = 1.98412698412698412698412697235950394e-4L,
@ C8 = 2.48015873015873015873015112487849040e-5L,
@ C9 = 2.75573192239858906525606685484412005e-6L,
@ C10 = 2.75573192239858906612966093057020362e-7L,
@ C11 = 2.50521083854417203619031960151253944e-8L,
@ C12 = 2.08767569878679576457272282566520649e-9L,
@ C13 = 1.60590438367252471783548748824255707e-10L,
@ D3 = 1.66666666666666666666666666666682245e-1L,
@ D4 = 4.16666666666666666666666666634228324e-2L,
@ D5 = 8.33333333333333333333333364022244481e-3L,
@ D6 = 1.38888888888888888888887138722762072e-3L,
@ D7 = 1.98412698412698412699085805424661471e-4L,
@ D8 = 2.48015873015873015687993712101479612e-5L,
@ D9 = 2.75573192239858944101036288338208042e-6L,
@ D10 = 2.75573192239853161148064676533754048e-7L,
@ D11 = 2.50521083855084570046480450935267433e-8L,
@ D12 = 2.08767569819738524488686318024854942e-9L,
@ D13 = 1.60590442297008495301927448122499313e-10L;
@
@ static const double
@ INV_L
= 1.8466496523378731e+2, /* 0x171547652b82fe.0p-45 */
@ L2 = -1.0253670638894731e-29, /* -0x1.9ff0342542fc3p-97 */
@ A7 = 1.9841269841269471e-4,
@ A8 = 2.4801587301585284e-5,
@ A9 = 2.7557324277411234e-6,
@ A10 = 2.7557333722375072e-7,
@ C14 = 1.1470745580491932e-11, /* 0x1.93974a81dae3p-37 */
@ C15 = 7.6471620181090468e-13, /* 0x1.ae7f3820adab1p-41 */
@ C16 = 4.7793721460260450e-14, /* 0x1.ae7cd18a18eacp-45 */
@ C17 = 2.8074757356658877e-15, /* 0x1.949992a1937d9p-49 */
@ C18 = 1.4760610323699476e-16, /* 0x1.545b43aabfbcdp-53 */
@ D14 = 1.1470726176204336e-11, /* 0x1.93971dc395d9ep-37 */
@ D15 = 7.6478532249581686e-13, /* 0x1.ae892e3d16fcep-41 */
@ D16 = 4.7628892832607741e-14, /* 0x1.ad00dfe41feccp-45 */
@ D17 = 3.0524857220358650e-15; /* 0x1.d7e8d886df921p-49 */
@
@ int
@ main(void)
@ {
@     printf(" %.35Le\n", o_threshold, o_threshold);
@     printf(" %.35Le\n", u_threshold, u_threshold);
@     printf(" %.35Le\n", L1, L1);
@     printf(" %.35Le\n", A3, A3);
@     printf(" %.35Le\n", A4, A4);
@     printf(" %.35Le\n", A5, A5);
@     printf(" %.35Le\n", A6, A6);
@     printf(" %.35Le\n", C3, C3);
@     printf(" %.35Le\n", C4, C4);
@     printf(" %.35Le\n", C5, C5);
@     printf(" %.35Le\n", C6, C6);
@     printf(" %.35Le\n", C7, C7);
@     printf(" %.35Le\n", C8, C8);
@     printf(" %.35Le\n", C9, C9);
@     printf(" %.35Le\n", C10, C10);
@     printf(" %.35Le\n", C11, C11);
@     printf(" %.35Le\n", C12, C12);
@     printf(" %.35Le\n", C13, C13);
@     printf(" %.35Le\n", D3, D3);
@     printf(" %.35Le\n", D4, D4);
@     printf(" %.35Le\n", D5, D5);
@     printf(" %.35Le\n", D6, D6);
@     printf(" %.35Le\n", D7, D7);
@     printf(" %.35Le\n", D8, D8);
@     printf(" %.35Le\n", D9, D9);
@     printf(" %.35Le\n", D10, D10);
@     printf(" %.35Le\n", D11, D11);
@     printf(" %.35Le\n", D12, D12);
@     printf(" %.35Le\n", D13, D13);
@
@     printf(" %.16e %a\n", INV_L, INV_L);
@     printf(" %.16e %a\n", L2, L2);
@     printf(" %.16e %a\n", A7, A7);
@     printf(" %.16e %a\n", A8, A8);
@     printf(" %.16e %a\n", A9, A9);
@     printf(" %.16e %a\n", A10, A10);
@     printf(" %.16e %a\n", C14, C14);
@     printf(" %.16e %a\n", C15, C15);
@     printf(" %.16e %a\n", C16, C16);
@     printf(" %.16e %a\n", C17, C17);
@     printf(" %.16e %a\n", C18, C18);
@     printf(" %.16e %a\n", D14, D14);
@     printf(" %.16e %a\n", D15, D15);
@     printf(" %.16e %a\n", D16, D16);
@     printf(" %.16e %a\n", D17, D17);
@ }

> Final diff(?).

Just omit some new XXX comments and fix one of the new XXX comments:

> Index: ld128/s_expl.c
> ===================================================================
> --- ld128/s_expl.c (revision 251146)
> +++ ld128/s_expl.c (working copy)
> ...
> @@ -38,35 +40,67 @@
> ...
> +/*
> + * XXX values in hex in comments have been lost (or were never present)
> + * from here.
> + */

Omit.

> +static const long double
> +/*
> + * Domain [-0.002708, 0.002708], range ~[-2.4021e-38, 2.4234e-38]:
> + * |exp(x) - p(x)| < 2**-124.9
> + * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
> + *
> + * XXX the coeffs aren't very carefully rounded, and I get 2.3 more bits.
> + */

Omit the XXX part.

> ...
> @@ -244,18 +287,224 @@
> +static const double
> +T1 = -0.1659, /* ~-30.625/128 * log(2) */
> +T2 = 0.1659; /* ~30.625/128 * log(2) */
> +
> +/*
> + * Split the interval [T1:T2] into two intervals [T1:T3] and [T3:T2].
> + * Setting T3 to 0 would require the |x| < 0x1p-113 condition to appear
> + * in both subintervals, so set T3 = 2**-5, which places the condition
> + * into the [T1:T3] interval.
> + *
> + * XXX we now do this more to (partially) balance the number of terms
> + * in the C and D polys than to avoid checking the condition in both
> + * intervals.

Merge with the previous comment and remove XXX. I just noticed that you
use a different notation for intervals than me -- [T1:T2] instead of
[T1, T2]. The former looks like it is from a programming language and
the latter is normal math notation.

> ...
> +/*
> + * Domain [0.03125, 0.1659], range ~[-2.7676e-37, -1.0367e-38]:
> + * |(exp(x)-1-x-x**2/2)/x - p(x)| < 2**-121.44
> + *
> + */

Extra empty line.
Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri May 31 20:51:34 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A5DA542A for ; Fri, 31 May 2013 20:51:34 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by mx1.freebsd.org (Postfix) with ESMTP id 2EB08C01 for ; Fri, 31 May 2013 20:51:33 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r4VKpOxf018305 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 1 Jun 2013 06:51:26 +1000 Date: Sat, 1 Jun 2013 06:51:24 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: cosh magic number? In-Reply-To: <20130531191410.GA74343@troutmask.apl.washington.edu> Message-ID: <20130601052415.H15844@besplex.bde.org> References: <20130531191410.GA74343@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=WUpannN5onsA:10 a=qPt0-ISivhtabmaQ0fEA:9 a=CjuIK1q_8ugA:10 a=gwKr3FwWfh0Jz3qo:21 a=etSL3OnsOLLAcWig:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." 
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 31 May 2013 20:51:34 -0000

On Fri, 31 May 2013, Steve Kargl wrote:

> In msun/src/e_cosh.c, one finds the comment
>
>  *
>  *                                          exp(x) + 1/exp(x)
>  *      ln2/2 <= x <= 22     :  cosh(x) := -------------------
>  *                                                  2
>
> Where does the magic number 22 come from?

It is just a threshold at which a sloppier approximation becomes
adequate. But you know that...

> Using exp(-|2x|) = 2**(1-p) with p = 53 for double, I
> arrive at 18.022, which is a little too small.

I get 18.368 using exp(-|2x|) = 2**-p for the natural threshold.
(Consider x+y instead of E+1/E. When x is 1+eps (with eps giving 1 in
the last place), adding y = eps/2 causes rounding up to even. This y
is 2**p times smaller than x. If x has extra precision, then y still
needs to start more bits further out for adding y to have no effect,
even if the final result has no extra precision.)

I first thought that the extras are guard bits. Perhaps they are, but
guard bits are not representable unless there is extra precision. 22
gives log2(exp(44)) ~= 63.479 bits. 64 fits well with x86 extra
precision.

This is easier to test in float precision. Try all integer thresholds
near the chosen one, on all x. Expect a difference for extra
precision.
Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Jun 1 00:48:16 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 79F38BC9 for ; Sat, 1 Jun 2013 00:48:16 +0000 (UTC) (envelope-from lists@eitanadler.com) Received: from mail-pd0-f172.google.com (mail-pd0-f172.google.com [209.85.192.172]) by mx1.freebsd.org (Postfix) with ESMTP id 56C5F38C for ; Sat, 1 Jun 2013 00:48:15 +0000 (UTC) Received: by mail-pd0-f172.google.com with SMTP id 10so3026112pdi.3 for ; Fri, 31 May 2013 17:48:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=lKZEx8SzF437Bx0J9U79+xM85p3sHRYavF3+O23ZWMM=; b=KlMSDeVys6bYO1tu04fGiMytXXfOlP/I9wA9icvB8HR1pS7rFOn3xJEt3FT/RHSJWz C5kb7YGurzS+8TNHSy7wr3Q5RuT65aKHX5NgNE6CEkdFeDdtxaVxjWv32X0t1KIgYG+g jv037AkfkUdUC0C3OM9yvGYDimZqKVg0wNCp0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:x-gm-message-state; bh=lKZEx8SzF437Bx0J9U79+xM85p3sHRYavF3+O23ZWMM=; b=haYR2Gb+S+njylixzPIhzgwPE59MRyQ/yYb/t5VB0TSOckuYtdMCikhHMsnOi2HMpY 4rDA+dEUCHSw/6U+CPI3TC4bEVGgEQkIJ03TlIckyygEf+bJgkUiq226vXpYPFigQ9rC /m3H3fUt7lV2ba6+XNsntHBMWVDGW4VNzjLMNZUg5FczN2mxeXQwBOvUr8SNw7PIoPpl mQaZoFz9Ohjl3LwtwCBRbwxMRcRTo01rk6c4rwqRhyqmxgs/p0CAAev5V4Hb+66LcCi9 jdgZs6pNE327o7SDMrACau95ULPGhfFYdWOGNvoybbnte7yvQJzYtxsHiy1QAXk45WiB NnNw== X-Received: by 10.66.240.70 with SMTP id vy6mr16160275pac.70.1370047695634; Fri, 31 May 2013 17:48:15 -0700 (PDT) MIME-Version: 1.0 Received: by 10.70.91.139 with HTTP; Fri, 31 May 2013 17:47:45 -0700 (PDT) In-Reply-To: <51A8D7A0.5060905@missouri.edu> References: <51A8D7A0.5060905@missouri.edu> From: Eitan Adler Date: Sat, 1 Jun 2013 02:47:45 +0200 Message-ID: Subject: 
Re: cacosh etc and bin/170206
To: Stephen Montgomery-Smith
Content-Type: text/plain; charset=UTF-8
X-Gm-Message-State: ALoCoQkt+23IzFnzAH1CqPsBXomigXLsb5Mtbi5uVfjyOdqDzEuZTzgGhNGj2hXeXMn1Bas++h7X
Cc: freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 01 Jun 2013 00:48:16 -0000

On 31 May 2013 19:02, Stephen Montgomery-Smith wrote:

> Do you think it is OK to close PR bin/170206? The only reason to keep
> it open is that the long double functions haven't been committed yet.
> But I don't see how keeping this PR open will have any effect on how
> fast this will happen.

Please leave it open until the patches that are relevant are committed
and MFCed (if appropriate). This isn't to speed up the final result,
but to serve as a place to track current status.

--
Eitan Adler