From owner-freebsd-numerics@freebsd.org Tue May 16 18:00:17 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 037EFD70897; Tue, 16 May 2017 18:00:17 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D52896D8; Tue, 16 May 2017 18:00:16 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4GI080B035243 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 16 May 2017 11:00:09 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4GI03I7035241; Tue, 16 May 2017 11:00:03 -0700 (PDT) (envelope-from sgk) Date: Tue, 16 May 2017 11:00:02 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-hackers@freebsd.org, freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170516180002.GA35211@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170428165658.GA17560@troutmask.apl.washington.edu> <20170429035131.E3406@besplex.bde.org> <20170428201522.GA32785@troutmask.apl.washington.edu> <20170429070036.A4005@besplex.bde.org> <20170428233552.GA34580@troutmask.apl.washington.edu> <20170429005924.GA37947@troutmask.apl.washington.edu> <20170429151457.F809@besplex.bde.org> <20170429194239.P3294@besplex.bde.org> <20170513205517.GA91911@troutmask.apl.washington.edu> <20170514071942.T1084@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170514071942.T1084@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 May 2017 18:00:17 -0000 On Sun, May 14, 2017 at 08:30:34AM +1000, Bruce Evans wrote: > On Sat, 13 May 2017, Steve Kargl wrote: > > > Based on other replies in this email exchange, I have gone back > > and looked at improvements to my __kernel_{cos|sin|tan}pi[fl] > > routines. The improvements where for both accuracy and speed. > > I really don't want another set of kernels (or more sets for degrees > instead of radians, and sincos). Ugh. __kernel_tan(x,y,iy) and __kernel_tanl(x,y,iy) seem to have inverted logic for iy. That is, iy = 1 for __kernel_tan() needs to be changed to y = -1 for __kernel_tanl(), and vice versa. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Tue May 16 22:46:21 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9840DD6E8D3 for ; Tue, 16 May 2017 22:46:21 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 77F01983 for ; Tue, 16 May 2017 22:46:21 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4GMkJBH041449 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 16 May 2017 15:46:19 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4GMkI3u041448; Tue, 16 May 2017 15:46:18 -0700 (PDT) (envelope-from sgk) Date: Tue, 16 May 2017 15:46:18 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170516224618.GA40855@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170428175851.A1386@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 May 2017 22:46:21 -0000 On Fri, Apr 28, 2017 at 06:00:19PM +1000, Bruce Evans wrote: > [cc list trimmed] > > On Thu, 27 Apr 2017, Steve Kargl wrote: > > > I have attached a new diff to the bugzilla report. The > > diff is 3090 lines and won't be broadcast the mailing list. > > But no one will see it there. > Picking an almost email to reply to. I have removed my new kernels and written shims to use the fdlibm trig kernels. I know that in some cases this leads to slower code, but a small (if not tiny) improvement in accuracy. This also leads to a significant reduction in code size. A couple of notes: 1) The ld128 routines have not been compiled, and so by extension these routines are untested. I have no access to an ld128 platform with sufficient infrastructure for testing. 2) Argument reduction is of the form x = n + r where n is an integer and 0 <= r < 1. Thus, this is indeed a half-cycle implementation. I have found discussions of argument reduction of the form x = 2 * n + r. I have no plans to pursue this. For sinpi and tanpi, one must test for even or odd n to determine the proper sign. 3) Patch against trunk follows. Index: lib/msun/Makefile =================================================================== --- lib/msun/Makefile (revision 318383) +++ lib/msun/Makefile (working copy) @@ -59,6 +59,7 @@ s_asinh.c s_asinhf.c s_atan.c s_atanf.c s_carg.c s_cargf.c s_cargl.c \ s_cbrt.c s_cbrtf.c s_ceil.c s_ceilf.c \ s_copysign.c s_copysignf.c s_cos.c s_cosf.c \ + s_cospi.c s_cospif.c \ s_csqrt.c s_csqrtf.c s_erf.c s_erff.c \ s_exp2.c s_exp2f.c s_expm1.c s_expm1f.c s_fabsf.c s_fdim.c \ s_finite.c s_finitef.c \ @@ -74,7 +75,10 @@ s_rint.c s_rintf.c s_round.c s_roundf.c \ s_scalbln.c s_scalbn.c s_scalbnf.c s_signbit.c \ s_signgam.c s_significand.c s_significandf.c s_sin.c s_sinf.c \ - s_tan.c s_tanf.c s_tanh.c s_tanhf.c s_tgammaf.c s_trunc.c s_truncf.c \ + s_sinpi.c s_sinpif.c \ + s_tan.c s_tanf.c s_tanh.c s_tanhf.c \ + s_tanpi.c s_tanpif.c \ + s_tgammaf.c s_trunc.c s_truncf.c \ w_cabs.c w_cabsf.c w_drem.c w_dremf.c # Location of fpmath.h and _fpmath.h @@ -100,11 +104,16 @@ e_lgammal.c e_lgammal_r.c \ e_remainderl.c e_sinhl.c e_sqrtl.c \ invtrig.c k_cosl.c k_sinl.c k_tanl.c \ - s_asinhl.c s_atanl.c s_cbrtl.c s_ceill.c s_cosl.c s_cprojl.c \ + s_asinhl.c s_atanl.c s_cbrtl.c s_ceill.c s_cosl.c \ + s_cospil.c \ + s_cprojl.c \ s_csqrtl.c s_erfl.c s_exp2l.c s_expl.c s_floorl.c s_fmal.c \ s_fmaxl.c s_fminl.c s_frexpl.c s_logbl.c s_logl.c s_nanl.c \ s_nextafterl.c s_nexttoward.c s_remquol.c s_rintl.c s_roundl.c \ - s_scalbnl.c s_sinl.c s_tanhl.c s_tanl.c s_truncl.c w_cabsl.c + s_scalbnl.c s_sinl.c \ + s_sinpil.c \ + s_tanhl.c s_tanl.c \ + s_tanpil.c s_truncl.c w_cabsl.c .endif # C99 complex functions @@ -131,13 +140,19 @@ MAN= acos.3 acosh.3 asin.3 asinh.3 atan.3 atan2.3 atanh.3 \ ceil.3 cacos.3 ccos.3 ccosh.3 cexp.3 \ - cimag.3 copysign.3 cos.3 cosh.3 csqrt.3 erf.3 exp.3 fabs.3 fdim.3 \ + cimag.3 copysign.3 cos.3 cosh.3 \ + cospi.3 \ + csqrt.3 erf.3 exp.3 fabs.3 fdim.3 \ feclearexcept.3 feenableexcept.3 fegetenv.3 \ fegetround.3 fenv.3 floor.3 \ fma.3 fmax.3 fmod.3 hypot.3 ieee.3 ieee_test.3 ilogb.3 j0.3 \ lgamma.3 log.3 lrint.3 lround.3 math.3 nan.3 \ nextafter.3 remainder.3 rint.3 \ - round.3 scalbn.3 signbit.3 sin.3 sinh.3 sqrt.3 tan.3 tanh.3 trunc.3 \ + round.3 scalbn.3 signbit.3 sin.3 sinh.3 \ + sinpi.3 \ + sqrt.3 tan.3 tanh.3 \ + tanpi.3 \ + trunc.3 \ complex.3 MLINKS+=acos.3 acosf.3 acos.3 acosl.3 @@ -166,6 +181,7 @@ MLINKS+=copysign.3 copysignf.3 copysign.3 copysignl.3 MLINKS+=cos.3 cosf.3 cos.3 cosl.3 MLINKS+=cosh.3 coshf.3 cosh.3 coshl.3 +MLINKS+=cospi.3 cospif.3 cospi.3 cospil.3 MLINKS+=csqrt.3 csqrtf.3 csqrt.3 csqrtl.3 MLINKS+=erf.3 erfc.3 erf.3 erff.3 erf.3 erfcf.3 erf.3 erfl.3 erf.3 erfcl.3 MLINKS+=exp.3 expm1.3 exp.3 expm1f.3 exp.3 expm1l.3 exp.3 pow.3 exp.3 powf.3 \ @@ -216,10 +232,12 @@ MLINKS+=scalbn.3 scalbnf.3 scalbn.3 scalbnl.3 MLINKS+=sin.3 sinf.3 sin.3 sinl.3 MLINKS+=sinh.3 sinhf.3 sinh.3 sinhl.3 +MLINKS+=sinpi.3 sinpif.3 sinpi.3 sinpil.3 MLINKS+=sqrt.3 cbrt.3 sqrt.3 cbrtf.3 sqrt.3 cbrtl.3 sqrt.3 sqrtf.3 \ sqrt.3 sqrtl.3 MLINKS+=tan.3 tanf.3 tan.3 tanl.3 MLINKS+=tanh.3 tanhf.3 tanh.3 tanhl.3 +MLINKS+=tanpi.3 tanpif.3 tanpi.3 tanpil.3 MLINKS+=trunc.3 truncf.3 trunc.3 truncl.3 .include Index: lib/msun/Symbol.map =================================================================== --- lib/msun/Symbol.map (revision 318383) +++ lib/msun/Symbol.map (working copy) @@ -294,4 +294,13 @@ casinl; catanl; catanhl; + cospi; + cospif; + cospil; + sinpi; + sinpif; + sinpil; + tanpi; + tanpif; + tanpil; }; Index: lib/msun/ld128/k_cospil.c =================================================================== --- lib/msun/ld128/k_cospil.c (nonexistent) +++ lib/msun/ld128/k_cospil.c (working copy) @@ -0,0 +1,41 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/k_cospi.c for implementation details. + */ + +static inline long double +__kernel_cospil(long double x) +{ + long double hi, lo; + hi = (double)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + return (__kernel_cosl(hi, lo)); +} Property changes on: lib/msun/ld128/k_cospil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld128/k_sinpil.c =================================================================== --- lib/msun/ld128/k_sinpil.c (nonexistent) +++ lib/msun/ld128/k_sinpil.c (working copy) @@ -0,0 +1,41 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/k_sinpi.c for implementation details. + */ + +static inline long double +__kernel_sinpil(long double x) +{ + long double hi, lo; + hi = (double)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + return (__kernel_sinl(hi, lo, 1)); +} Property changes on: lib/msun/ld128/k_sinpil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld128/s_cospil.c =================================================================== --- lib/msun/ld128/s_cospil.c (nonexistent) +++ lib/msun/ld128/s_cospil.c (working copy) @@ -0,0 +1,109 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/s_cospi.c for implementation details. + * + * FIXME: This has not been compiled nor has it been tested for accuracy. + * FIXME: This should use bit twiddling. + */ + +#include "math.h" +#include "math_private.h" + +static const long double +pihi = 3.14159265358979322702026593105983920e+00L, +pilo = 1.14423774522196636802434264184180742e-17L; + +#include "k_cospil.c" +#include "k_sinpil.c" + +long double +cospil(long double x) +{ + volatile static const double vzero = 0; + long double ax, c, xf; + uint32_t ix; + + ax = fabsl(x); + + if (ax < 1) { + if (ax < 0.25) { + if (ax < 0x1p-60) { + if ((int)x == 0) + return (1); + } + return (__kernel_cospil(ax)); + } + + if (ax < 0.5) + c = __kernel_sinpil(0.5 - ax); + else if (ax < 0.75) { + if (ax == 0.5) + return (0); + c = -__kernel_sinpil(ax - 0.5); + } else + c = -__kernel_cospil(1 - ax); + return (c); + } + + if (ax < 0x1p112) { + + xf = floorl(ax); + + ax -= xf; + + if (x < 0.5) { + if (x < 0.25) + c = ax == 0 ? 1 : __kernel_cospil(ax); + else + c = __kernel_sinpil(0.5 - ax); + } else { + if (x < 0.75) { + if (ax == 0.5) + return (0); + c = -__kernel_sinpil(ax - 0.5); + } else + c = -__kernel_cospil(1 - ax); + } + + if (xf > 0x1p50) xf -= 0x1p50; + if (xf > 0x1p30) xf -= 0x1p30; + ix = (uint32_t)xf; + return (ix & 1 ? -c : c); + } + + if (isinf(x) || isnan(x)) + return (vzero / vzero); + + /* + * |x| >= 0x1p112 is always an even integer, so return 1. + * FIXME: should this raise FE_INEXACT or FE_INVALID. + */ + if (ax + 1 > 1) + ax = 1; + return (ax); +} Property changes on: lib/msun/ld128/s_cospil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld128/s_sinpil.c =================================================================== --- lib/msun/ld128/s_sinpil.c (nonexistent) +++ lib/msun/ld128/s_sinpil.c (working copy) @@ -0,0 +1,117 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/s_sinpi.c for implementation details. + * + * FIXME: This has not been compiled nor has it been tested for accuracy. + * FIXME: This should use bit twiddling. + */ + +#include "math.h" +#include "math_private.h" + +static const long double +pihi = 3.14159265358979322702026593105983920e+00L, +pilo = 1.14423774522196636802434264184180742e-17L; + +#include "k_cospil.c" +#include "k_sinpil.c" + +long double +sinpil(long double x) +{ + volatile static const double vzero = 0; + long double ax, s, xf, xhi, xlo; + uint32_t ix; + + ax = fabsl(x); + + if (ax < 1) { + if (ax < 0.25) { + if (ax < 0x1p-60) { + if (x == 0) + return (x); + ax = (double)x; + x -= ax; + s = pilo * x + pihi * x + pilo * ax + + pihi * ax; + return (s); + } + + s = __kernel_sinpil(ax); + return (copysignl(s, x)); + } + + if (ax < 0.5) + s = __kernel_cospil(0.5 - ax); + else if (ax < 0.75) + s = __kernel_cospil(ax - 0.5); + else + s = __kernel_sinpil(1 - ax); + return (copysignl(s, x)); + } + + if (ax < 0x1p112) { + + xf = floorl(ax); + + ax -= xf; + + if (ax == 0) { + s = 0; + } else { + if (ax < 0.5) { + if (ax <= 0.25) + s = __kernel_sinpil(ax); + else + s = __kernel_cospil(0.5 - ax); + } else { + if (ax < 0.75) + s = __kernel_cospil(ax - 0.5); + else + s = __kernel_sinpil(1 - ax); + } + + if (xf > 0x1p50) xf -= 0x1p50; + if (xf > 0x1p30) xf -= 0x1p30; + ix = (uint32_t)xf; + if (ix & 1) s = -s; + } + return (copysignl(s, x)); + } + + if (isinf(x) || isnan(x)) + return (vzero / vzero); + + /* + * |x| >= 0x1p112 is always an integer, so return +-0. + * FIXME: should this raise FE_INEXACT or FE_INVALID. + */ + if (ax + 1 > 1) + ax = copysignl(0, x); + return (ax); +} Property changes on: lib/msun/ld128/s_sinpil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld128/s_tanpil.c =================================================================== --- lib/msun/ld128/s_tanpil.c (nonexistent) +++ lib/msun/ld128/s_tanpil.c (working copy) @@ -0,0 +1,123 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/s_tanpi.c for implementation details. + * + * FIXME: This has not been compiled nor has it been tested for accuracy. + * FIXME: This should use bit twiddling. + * FIXME: This should use a polynomial approximation in [0,0.25], but the + * FIXME: increase in max ULP is probably small, so who cares. + */ + +#include "math.h" +#include "math_private.h" + +static const long double +pihi = 3.14159265358979322702026593105983920e+00L, +pilo = 1.14423774522196636802434264184180742e-17L; + +static inline long double +__kernel_tanpi(long double x) +{ + long double hi, lo, t; + + if (x < 0.25) { + hi = (double)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + t = __kernel_tanl(hi, lo, -1); + } else if (x > 0.25) { + x = 0.5 - x; + hi = (double)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + t = - __kernel_tanl(hi, lo, 1); + } else + t = 1; + + return (t); +} + +long double +tanpil(long double x) +{ + volatile static const double vzero = 0; + long double ax, xf; + uint32_t ix; + + ax = fabsl(ax); + + if (ax < 1) { + if (ax < 0.5) { + if (ax < 0x1p-60) { + if (x == 0) + return (x); + ax = (double)x; + x -= ax; + t = pilo * x + pihi * x + pilo * ax + + pihi * ax; + return (t); + } + t = __kernel_tanpil(ax); + } else if (ax == 0.5) + return ((ax - ax) / (ax - ax)); + else + t = -__kernel_tanpil(1 - ax); + return (copysignl(t, x)); + } + + if (ix < 0x1p112) { + + xf = floorl(ax); + + ax -= xf; + + if (ax < 0.5) + t = ax == 0 ? 0 : __kernel_tanpil(ax); + else if (ax == 0.5) + return ((ax - ax) / (ax - ax)); + else + t = -__kernel_tanpil(1 - ax); + return (copysignl(t, x)); + } + + /* x = +-inf or nan. */ + if (isinf(x) || isnan(x)) + return (vzero / vzero); + + /* + * |x| >= 0x1p53 is always an integer, so return +-0. + * FIXME: should this raise FE_INEXACT or FE_INVALID. + */ + if (ax + 1 > 1) + ax = copysignl(0, x); + return (ax); +} Property changes on: lib/msun/ld128/s_tanpil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld80/k_cospil.c =================================================================== --- lib/msun/ld80/k_cospil.c (nonexistent) +++ lib/msun/ld80/k_cospil.c (working copy) @@ -0,0 +1,41 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/k_cospi.c for implementation details. + */ + +static inline long double +__kernel_cospil(long double x) +{ + long double hi, lo; + hi = (float)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + return (__kernel_cosl(hi, lo)); +} Property changes on: lib/msun/ld80/k_cospil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld80/k_sinpil.c =================================================================== --- lib/msun/ld80/k_sinpil.c (nonexistent) +++ lib/msun/ld80/k_sinpil.c (working copy) @@ -0,0 +1,41 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/k_sinpi.c for implementation details. + */ + +static inline long double +__kernel_sinpil(long double x) +{ + long double hi, lo; + hi = (float)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + return (__kernel_sinl(hi, lo, 1)); +} Property changes on: lib/msun/ld80/k_sinpil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld80/s_cospil.c =================================================================== --- lib/msun/ld80/s_cospil.c (nonexistent) +++ lib/msun/ld80/s_cospil.c (working copy) @@ -0,0 +1,129 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/s_cospi.c for implementation details. + */ + +#ifdef __i386__ +#include +#endif +#include + +#include "fpmath.h" +#include "math.h" +#include "math_private.h" + +static const double +pihi = 3.1415926814079285e+00, /* 0x400921fb 0x58000000 */ +pilo =-2.7818135228334233e-08; /* 0xbe5dde97 0x3dcb3b3a */ + +#include "k_cospil.c" +#include "k_sinpil.c" + +long double +cospil(long double x) +{ + volatile static const double vzero = 0; + long double ax, c; + uint32_t j0; + uint64_t lx; + uint16_t hx, ix; + + EXTRACT_LDBL80_WORDS(hx, lx, x); + ix = hx & 0x7fff; + INSERT_LDBL80_WORDS(ax, ix, lx); + + ENTERI(); + + if (ix < 0x3fff) { /* |x| < 1 */ + if (ix < 0x3ffd) { /* |x| < 0.25 */ + if (ix < 0x3fdd) { /* |x| < 0x1p-34 */ + if ((int)x == 0) + RETURNI(1); + } + RETURNI(__kernel_cospil(ax)); + } + + if (ix < 0x3ffe) /* |x| < 0.5 */ + c = __kernel_sinpil(0.5 - ax); + else if (lx < 0xc000000000000000ull) { /* |x| < 0.75 */ + if (ax == 0.5) + RETURNI(0); + c = -__kernel_sinpil(ax - 0.5); + } else + c = -__kernel_cospil(1 - ax); + RETURNI(c); + } + + if (ix < 0x403e) { /* 1 <= |x| < 0x1p63 */ + /* Determine integer part of ax. */ + j0 = ix - 0x3fff + 1; + if (j0 < 32) { + lx = (lx >> 32) << 32; + lx &= ~(((lx << 32)-1) >> j0); + } else { + uint64_t m = (uint64_t)-1 >> (j0 + 1); + if (lx & m) lx &= ~m; + } + INSERT_LDBL80_WORDS(x, ix, lx); + + ax -= x; + EXTRACT_LDBL80_WORDS(ix, lx, ax); + + if (ix < 0x3ffe) { /* |x| < 0.5 */ + if (ix < 0x3ffd) /* |x| < 0.25 */ + c = ix == 0 ? 1 : __kernel_cospil(ax); + else + c = __kernel_sinpil(0.5 - ax); + + } else { + if (lx < 0xc000000000000000ull) { /* |x| < 0.75 */ + if (ax == 0.5) + RETURNI(0); + c = -__kernel_sinpil(ax - 0.5); + } else + c = -__kernel_cospil(1 - ax); + } + + if (j0 > 40) x -= 0x1p40; + if (j0 > 30) x -= 0x1p30; + j0 = (uint32_t)x; + + RETURNI(j0 & 1 ? -c : c); + } + + if (ix >= 0x7fff) + RETURNI(vzero / vzero); + + /* + * |x| >= 0x1p63 is always an even integer, so return 1. + * FIXME: should this raise FE_INEXACT or FE_INVALID. + */ + if (ax + 1 > 1) + ax = 1; + RETURNI(ax); +} Property changes on: lib/msun/ld80/s_cospil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld80/s_sinpil.c =================================================================== --- lib/msun/ld80/s_sinpil.c (nonexistent) +++ lib/msun/ld80/s_sinpil.c (working copy) @@ -0,0 +1,136 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/s_sinpi.c for implementation details. + */ + +#ifdef __i386__ +#include +#endif +#include + +#include "fpmath.h" +#include "math.h" +#include "math_private.h" + +static const double +pihi = 3.1415926814079285e+00, /* 0x400921fb 0x58000000 */ +pilo =-2.7818135228334233e-08; /* 0xbe5dde97 0x3dcb3b3a */ + +#include "k_cospil.c" +#include "k_sinpil.c" + +long double +sinpil(long double x) +{ + volatile static const double vzero = 0; + long double ax, s; + uint32_t j0; + uint64_t lx; + uint16_t hx, ix; + + EXTRACT_LDBL80_WORDS(hx, lx, x); + ix = hx & 0x7fff; + INSERT_LDBL80_WORDS(ax, ix, lx); + + ENTERI(); + + if (ix < 0x3fff) { /* |x| < 1 */ + if (ix < 0x3ffd) { /* |x| < 0.25 */ + if (ix < 0x3fdd) { /* |x| < 0x1p-34 */ + if (x == 0) + RETURNI(x); + ax = (float)x; + x -= ax; + s = pilo * x + pihi * x + pilo * ax + + pihi * ax; + RETURNI(s); + } + s = __kernel_sinpil(ax); + RETURNI((hx & 0x8000) ? -s : s); + } + + if (ix < 0x3ffe) /* |x| < 0.5 */ + s = __kernel_cospil(0.5 - ax); + else if (lx < 0xc000000000000000ull) /* |x| < 0.75 */ + s = __kernel_cospil(ax - 0.5); + else + s = __kernel_sinpil(1 - ax); + RETURNI((hx & 0x8000) ? -s : s); + } + + if (ix < 0x403e) { /* 1 <= |x| < 0x1p63 */ + /* Determine integer part of ax. */ + j0 = ix - 0x3fff + 1; + if (j0 < 32) { + lx = (lx >> 32) << 32; + lx &= ~(((lx << 32)-1) >> j0); + } else { + uint64_t m = (uint64_t)-1 >> (j0 + 1); + if (lx & m) lx &= ~m; + } + INSERT_LDBL80_WORDS(x, ix, lx); + + ax -= x; + EXTRACT_LDBL80_WORDS(ix, lx, ax); + + if (ix == 0) { + s = 0; + } else { + if (ix < 0x3ffe) { /* |x| < 0.5 */ + if (ix < 0x3ffd) /* |x| < 0.25 */ + s = __kernel_sinpil(ax); + else + s = __kernel_cospil(0.5 - ax); + } else { + /* |x| < 0.75 */ + if (lx < 0xc000000000000000ull) + s = __kernel_cospil(ax - 0.5); + else + s = __kernel_sinpil(1 - ax); + } + + if (j0 > 40) x -= 0x1p40; + if (j0 > 30) x -= 0x1p30; + j0 = (uint32_t)x; + if (j0 & 1) s = -s; + } + RETURNI((hx & 0x8000) ? -s : s); + } + + /* x = +-inf or nan. */ + if (ix >= 0x7fff) + RETURNI(vzero / vzero); + + /* + * |x| >= 0x1p63 is always an integer, so return +-0. + * FIXME: should this raise FE_INEXACT or FE_INVALID. + */ + if (ax + 1 > 1) + ax = copysignl(0, x); + RETURNI(ax); +} Property changes on: lib/msun/ld80/s_sinpil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld80/s_tanpil.c =================================================================== --- lib/msun/ld80/s_tanpil.c (nonexistent) +++ lib/msun/ld80/s_tanpil.c (working copy) @@ -0,0 +1,139 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/s_tanpi.c for implementation details. + */ + +#ifdef __i386__ +#include +#endif +#include + +#include "fpmath.h" +#include "math.h" +#include "math_private.h" + +static const double +pihi = 3.1415926814079285e+00, /* 0x400921fb 0x58000000 */ +pilo = -2.7818135228334233e-08; /* 0xbe5dde97 0x3dcb3b3a */ + +static inline long double +__kernel_tanpil(long double x) +{ + long double hi, lo, t; + + if (x < 0.25) { + hi = (float)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + t = __kernel_tanl(hi, lo, -1); + } else if (x > 0.25) { + x = 0.5 - x; + hi = (float)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + t = - __kernel_tanl(hi, lo, 1); + } else + t = 1; + + return (t); +} + +long double +tanpil(long double x) +{ + volatile static const double vzero = 0; + long double ax, t; + uint32_t j0; + uint64_t lx; + uint16_t hx, ix; + + EXTRACT_LDBL80_WORDS(hx, lx, x); + ix = hx & 0x7fff; + INSERT_LDBL80_WORDS(ax, ix, lx); + + ENTERI(); + + if (ix < 0x3fff) { /* |x| < 1 */ + if (ix < 0x3ffe) { /* |x| < 0.5 */ + if (ix < 0x3fdd) { /* |x| < 0x1p-34 */ + if (x == 0) + RETURNI(x); + ax = (float)x; + x -= ax; + t = pilo * x + pihi * x + pilo * ax + + pihi * ax; + RETURNI(t); + } + t = __kernel_tanpil(ax); + } else if (ax == 0.5) + RETURNI((ax - ax) / (ax - ax)); + else + t = -__kernel_tanpil(1 - ax); + RETURNI((hx & 0x8000) ? -t : t); + } + + if (ix < 0x403e) { /* 1 <= |x| < 0x1p63 */ + /* Determine integer part of ax. */ + j0 = ix - 0x3fff + 1; + if (j0 < 32) { + lx = (lx >> 32) << 32; + lx &= ~(((lx << 32)-1) >> j0); + } else { + uint64_t m = (uint64_t)-1 >> (j0 + 1); + if (lx & m) lx &= ~m; + } + INSERT_LDBL80_WORDS(x, ix, lx); + + ax -= x; + EXTRACT_LDBL80_WORDS(ix, lx, ax); + + if (ix < 0x3ffe) /* |x| < 0.5 */ + t = ax == 0 ? 0 : __kernel_tanpil(ax); + else if (ax == 0.5) + RETURNI((ax - ax) / (ax - ax)); + else + t = -__kernel_tanpil(1 - ax); + RETURNI((hx & 0x8000) ? -t : t); + } + + /* x = +-inf or nan. */ + if (ix >= 0x7fff) + RETURNI(vzero / vzero); + + /* + * |x| >= 0x1p63 is always an integer, so return +-0. + * FIXME: should this raise FE_INEXACT or FE_INVALID. + */ + if (ax + 1 > 1) + ax = copysignl(0, x); + RETURNI(ax); +} Property changes on: lib/msun/ld80/s_tanpil.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/man/cospi.3 =================================================================== --- lib/msun/man/cospi.3 (nonexistent) +++ lib/msun/man/cospi.3 (working copy) @@ -0,0 +1,111 @@ +.\" Copyright (c) 2017 Steven G. Kargl +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd April 1, 2017 +.Dt COSPI 3 +.Os +.Sh NAME +.Nm cospi , +.Nm cospif , +.Nm cospil +.Nd half\(encycle cosine functions +.Sh LIBRARY +.Lb libm +.Sh SYNOPSIS +.In math.h +.Ft double +.Fn cospi "double x" +.Ft float +.Fn cospif "float x" +.Ft long double +.Fn cospil "long double x" +.Sh DESCRIPTION +The +.Fn cospi , +.Fn cospif , +and +.Fn cospil +functions compute the cosine of +.Fa "\(*p \(mu x" . +and measure angles in half-cycles. +.Sh RETURN VALUES +The +.Fn cospi , +.Fn cospif , +and +.Fn cospil +functions returns +.Fn cos "\(*p \(mu x" . +If \*(Bax\*(Ba \*(Ge 2^(p - 1) +where p is the floating\(enpoint precision of +.Ar x , +then the returned value is 1 and it has no significance. +.Sh SPECIAL VALUES +.Bl -tag +.It +.Fn cospi \*(Pm0 +returns 1. +.It +.Fn cospi \*(Pmn/2 +returns 0 for positive integers +.Ar n . +.It +.Fn cospi n +returns 1 for even integers +.Ar n . +.It +.Fn cospi n +returns \-1 for odd integers +.Ar n . +.It +.Fn cospi \*(Pm\(if +return an \*(Na and raises an FE_INVALID exception. +.It +.Fn cospi \*(Na +return an \*(Na and raises an FE_INVALID exception. +.El +.Sh SEE ALSO +.Xr cos 3 , +.Xr fenv 3 , +.Xr math 3 , +.Xr sin 3 , +.Xr sinpi 3 , +.Xr tan 3 , +.Xr tanpi 3 +.Sh AUTHORS +The half\(encycle trignometric functions were written by +.An Steven G. Kargl Aq Mt kargl@FreeBSD.org . +.Sh STANDARDS +These functions conform to +IEEE Std 754\(tm\(en2008 , +\(dqIEEE Standard for Floating-Point Arithmetic\(dq +and to +ISO/IEC TS 18661-4 , +\(dqInformation technology \(em Programming languages, their environments, +and system software interfaces \(em Floating\(enpoint extensions for +C\(dq \(em Part 4: Supplementary functions. + + Property changes on: lib/msun/man/cospi.3 ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/man/sinpi.3 =================================================================== --- lib/msun/man/sinpi.3 (nonexistent) +++ lib/msun/man/sinpi.3 (working copy) @@ -0,0 +1,102 @@ +.\" Copyright (c) 2017 Steven G. Kargl +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd April 1, 2017 +.Dt SINPI 3 +.Os +.Sh NAME +.Nm sinpi , +.Nm sinpif , +.Nm sinpil +.Nd half\(encycle sine functions +.Sh LIBRARY +.Lb libm +.Sh SYNOPSIS +.In math.h +.Ft double +.Fn sinpi "double x" +.Ft float +.Fn sinpif "float x" +.Ft long double +.Fn sinpil "long double x" +.Sh DESCRIPTION +The +.Fn sinpi , +.Fn sinpif , +and +.Fn sinpil +functions compute the sine of +.Fa "\(*p \(mu x" . +and measure angles in half-cycles. +.Sh RETURN VALUES +The +.Fn sinpi , +.Fn sinpif , +and +.Fn sinpil +functions returns +.Fn sin "\(*p \(mu x" . +If \*(Bax\*(Ba \*(Ge 2^(p - 1) +where p is the floating\(enpoint precision of +.Ar x , +then the returned value is \*(Pm0 and it has no significance. +.Sh SPECIAL VALUES +.Bl -tag +.It +.Fn sinpi \*(Pm0 +returns \*(Pm0. +.It +.Fn sinpi \*(Pmn +returns \*(Pm0 for positive integers +.Ar n . +.It +.Fn sinpi \*(Pm\(if +return an \*(Na and raises an FE_INVALID exception. +.It +.Fn sinpi \*(Na +return an \*(Na and raises an FE_INVALID exception. +.El +.Sh SEE ALSO +.Xr cos 3 , +.Xr cospi 3 , +.Xr fenv 3 , +.Xr math 3 , +.Xr sin 3 , +.Xr tan 3 , +.Xr tanpi 3 +.Sh AUTHORS +The half\(encycle trignometric functions were written by +.An Steven G. Kargl Aq Mt kargl@FreeBSD.org . +.Sh STANDARDS +These functions conform to +IEEE Std 754\(tm\(en2008 , +\(dqIEEE Standard for Floating-Point Arithmetic\(dq +and to +ISO/IEC TS 18661-4 , +\(dqInformation technology \(em Programming languages, their environments, +and system software interfaces \(em Floating\(enpoint extensions for +C\(dq \(em Part 4: Supplementary functions. + Property changes on: lib/msun/man/sinpi.3 ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/man/tanpi.3 =================================================================== --- lib/msun/man/tanpi.3 (nonexistent) +++ lib/msun/man/tanpi.3 (working copy) @@ -0,0 +1,106 @@ +.\" Copyright (c) 2017 Steven G. Kargl +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd April 1, 2017 +.Dt TANPI 3 +.Os +.Sh NAME +.Nm tanpi , +.Nm tanpif , +.Nm tanpil +.Nd half\(encycle tangent functions +.Sh LIBRARY +.Lb libm +.Sh SYNOPSIS +.In math.h +.Ft double +.Fn tanpi "double x" +.Ft float +.Fn tanpif "float x" +.Ft long double +.Fn tanpil "long double x" +.Sh DESCRIPTION +The +.Fn tanpi , +.Fn tanpif , +and +.Fn tanpil +functions compute the tangent of +.Fa "\(*p \(mu x" +and measure angles in half-cycles. +.Sh RETURN VALUES +The +.Fn tanpi , +.Fn tanpif , +and +.Fn tanpil +functions returns +.Fn tan "\(*p \(mu x" . +If \*(Bax\*(Ba \*(Ge 2^(p - 1) +where p is the floating\(enpoint precision of +.Ar x , +then the returned value is \*(Pm0 and it has no significance. +.Sh SPECIAL VALUES +.Bl -tag +.It +.Fn tanpi \*(Pm0 +returns \*(Pm0. +.It +.Fn tanpi \*(Pmn +returns \*(Pm0 for positive integers +.Ar n . +.It +.Fn tanpi \*(Pmn/2 +returns \*(Na for n > 0 and raises an FE_INVALID exception. +.It +.Fn tanpi \*(Pm\(if +return an \*(Na and raises an FE_INVALID exception. +.It +.Fn tanpi \*(Na +return an \*(Na and raises an FE_INVALID exception. +.El +.Sh SEE ALSO +.Xr cos 3 , +.Xr cospi 3 , +.Xr fenv 3 , +.Xr math 3 , +.Xr sin 3 , +.Xr sinpi 3 , +.Xr tan 3 , +.Sh AUTHORS +The half\(encycle trignometric functions were written by +.An Steven G. Kargl Aq Mt kargl@FreeBSD.org . +.Sh STANDARDS +These functions conform to +IEEE Std 754\(tm\(en2008 , +\(dqIEEE Standard for Floating-Point Arithmetic\(dq +and to +ISO/IEC TS 18661-4 , +\(dqInformation technology \(em Programming languages, their environments, +and system software interfaces \(em Floating\(enpoint extensions for +C\(dq \(em Part 4: Supplementary functions. + + Property changes on: lib/msun/man/tanpi.3 ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/src/k_cospi.c =================================================================== --- lib/msun/src/k_cospi.c (nonexistent) +++ lib/msun/src/k_cospi.c (working copy) @@ -0,0 +1,45 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * The basic kernel for x in [0,0.25]. To exploit the kernel for cos(x), + * the argument to __kernel_cospi() must be multiply by pi. As pi is an + * irrational number, this allows the splitting of pi*x into high and low + * parts. + */ + +static inline double +__kernel_cospi(double x) +{ + double hi, lo; + hi = (float)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + return (__kernel_cos(hi, lo)); +} + Property changes on: lib/msun/src/k_cospi.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/src/k_sinpi.c =================================================================== --- lib/msun/src/k_sinpi.c (nonexistent) +++ lib/msun/src/k_sinpi.c (working copy) @@ -0,0 +1,44 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * The basic kernel for x in [0,0.25]. To exploit the kernel for sin(x), + * the argument to __kernel_sinpi() must be multiply by pi. As pi is an + * irrational number, this allows the splitting of pi*x into high and low + * parts. + */ + +static inline double +__kernel_sinpi(double x) +{ + double hi, lo; + hi = (float)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + return (__kernel_sin(hi, lo, 1)); +} Property changes on: lib/msun/src/k_sinpi.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/src/math.h =================================================================== --- lib/msun/src/math.h (revision 318383) +++ lib/msun/src/math.h (working copy) @@ -500,6 +500,15 @@ #if __BSD_VISIBLE long double lgammal_r(long double, int *); +double cospi(double); +float cospif(float); +long double cospil(long double); +double sinpi(double); +float sinpif(float); +long double sinpil(long double); +double tanpi(double); +float tanpif(float); +long double tanpil(long double); #endif __END_DECLS Index: lib/msun/src/s_cospi.c =================================================================== --- lib/msun/src/s_cospi.c (nonexistent) +++ lib/msun/src/s_cospi.c (working copy) @@ -0,0 +1,152 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/** + * cospi(x) computes cos(pi*x) without multiplication by pi (almost). First, + * note that cospi(-x) = cospi(x), so the algorithm considers only |x|. The + * method used depends on the magnitude of x. + * + * 1. For small |x|, cospi(x) = 1 with FE_INEXACT raised where a sloppy + * threshold is used. The threshold is |x| < 0x1pN with N = -(P/2+M). + * P is the precision of the floating-point type and M = 2 to 4. + * + * 2. For |x| < 1, argument reduction is not required and sinpi(x) is + * computed by calling a kernel that leverages the kernels for sin(x) + * ans cos(x). See k_sinpi.c and k_cospi.c for details. + * + * 3. For 1 <= |x| < 0x1p(P-1), argument reduction is required where + * |x| = j0 + r with j0 an integer and the remainder r satisfies + * 0 <= r < 1. With the given domain, a simplified inline floor(x) + * is used. Also, note the following identity + * + * cospi(x) = cos(pi*(j0+r)) + * = cos(pi*j0) * cos(pi*r) - sin(pi*j0) * sin(pi*r) + * = cos(pi*j0) * cos(pi*r) + * = +-cospi(r) + * + * If j0 is even, then cos(pi*j0) = 1. If j0 is odd, then cos(pi*j0) = -1. + * cospi(r) is then computed via an appropriate kernel. + * + * 4. For |x| >= 0x1p(P-1), |x| is integral and cospi(x) = 1. + * + * 5. Special cases: + * + * cospi(+-0) = 1. + * cospi(n.5) = 0 for n an integer. + * cospi(+-inf) = nan. Raises the "invalid" floating-point exception. + * cospi(nan) = nan. Raises the "invalid" floating-point exception. + */ + +#include "math.h" +#include "math_private.h" + +static const double +pihi = 3.1415926814079285e+00, /* 0x400921fb 0x58000000 */ +pilo =-2.7818135228334233e-08; /* 0xbe5dde97 0x3dcb3b3a */ + +#include "k_cospi.c" +#include "k_sinpi.c" + +double +cospi(double x) +{ + volatile static const double vzero = 0; + double ax, c; + uint32_t hx, ix, j0, lx; + + EXTRACT_WORDS(hx, lx, x); + ix = hx & 0x7fffffff; + INSERT_WORDS(ax, ix, lx); + + if (ix < 0x3ff00000) { /* |x| < 1 */ + if (ix < 0x3fd00000) { /* |x| < 0.25 */ + if (ix < 0x3e200000) { /* |x| < 0x1p-29 */ + if ((int)ax == 0) + return (1); + } + return (__kernel_cospi(ax)); + } + + if (ix < 0x3fe00000) /* |x| < 0.5 */ + c = __kernel_sinpi(0.5 - ax); + else if (ix < 0x3fe80000){ /* |x| < 0.75 */ + if (ax == 0.5) + return (0); + c = -__kernel_sinpi(ax - 0.5); + } else + c = -__kernel_cospi(1 - ax); + return (c); + } + + if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ + /* Determine integer part of ax. */ + j0 = ((ix >> 20) & 0x7ff) - 0x3ff; + if (j0 < 20) { + ix &= ~(0x000fffff >> j0); + lx = 0; + } else { + lx &= ~((uint32_t)0xffffffff >> (j0 - 20)); + } + INSERT_WORDS(x, ix, lx); + + ax -= x; + EXTRACT_WORDS(ix, lx, ax); + + + if (ix < 0x3fe00000) { /* |x| < 0.5 */ + if (ix < 0x3fd00000) /* |x| < 0.25 */ + c = ix == 0 ? 1 : __kernel_cospi(ax); + else + c = __kernel_sinpi(0.5 - ax); + } else { + if (ix < 0x3fe80000) { /* |x| < 0.75 */ + if (ax == 0.5) + return (0); + c = -__kernel_sinpi(ax - 0.5); + } else + c = -__kernel_cospi(1 - ax); + } + + if (j0 > 30) x -= 0x1p30; + j0 = (uint32_t)x; + return (j0 & 1 ? -c : c); + } + + if (ix >= 0x7f800000) + return (vzero / vzero); + + /* + * |x| >= 0x1p52 is always an even integer, so return 1. + * FIXME: should this raise FE_INEXACT or FE_INVALID? + */ + if (ax + 1 > 1) + ax = 1; + return (ax); +} + +#if LDBL_MANT_DIG == 53 +__weak_reference(cospi, cospil); +#endif Property changes on: lib/msun/src/s_cospi.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/src/s_cospif.c =================================================================== --- lib/msun/src/s_cospif.c (nonexistent) +++ lib/msun/src/s_cospif.c (working copy) @@ -0,0 +1,111 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/s_cospi.c for implementation details. + */ +#define INLINE_KERNEL_SINDF +#define INLINE_KERNEL_COSDF + +#include "math.h" +#include "math_private.h" +#include "k_cosf.c" +#include "k_sinf.c" + +#define __kernel_cospif(x) ((float)__kernel_cosdf(M_PI * (x))) +#define __kernel_sinpif(x) ((float)__kernel_sindf(M_PI * (x))) + +float +cospif(float x) +{ + volatile static const float vzero = 0; + float ax, c; + uint32_t ix, j0; + + GET_FLOAT_WORD(ix, x); + ix = ix & 0x7fffffff; + SET_FLOAT_WORD(ax, ix); + + if (ix < 0x3f800000) { /* |x| < 1 */ + if (ix < 0x3e800000) { /* |x| < 0.25 */ + if (ix < 0x38800000) { /* |x| < 0x1p-14 */ + /* Raise inexact iff != 0. */ + if ((int)ax == 0) + return (1); + } + return (__kernel_cospif(ax)); + } + + if (ix < 0x3f000000) /* |x| < 0.5 */ + c = __kernel_sinpif(0.5f - ax); + else if (ix < 0x3f400000) { /* |x| < 0.75 */ + if (ix == 0x3f000000) + return (0); + c = -__kernel_sinpif(ax - 0.5f); + } else + c = -__kernel_cospif(1 - ax); + return (c); + } + + if (ix < 0x4b000000) { /* 1 <= |x| < 0x1p23 */ + /* Determine integer part of ax. */ + j0 = ((ix >> 23) & 0xff) - 0x7f; + ix &= ~(0x007fffff >> j0); + SET_FLOAT_WORD(x, ix); + + ax -= x; + GET_FLOAT_WORD(ix, ax); + + if (ix < 0x3f000000) { /* |x| < 0.5 */ + if (ix < 0x3e800000) /* |x| < 0.25 */ + c = ix == 0 ? 1 : __kernel_cospif(ax); + else + c = __kernel_sinpif(0.5f - ax); + } else { + if (ix < 0x3f400000) { /* |x| < 0.75 */ + if (ix == 0x3f000000) + return (0); + c = -__kernel_sinpif(ax - 0.5f); + } else + c = -__kernel_cospif(1 - ax); + } + + j0 = (uint32_t)x; + return (j0 & 1 ? -c : c); + } + + /* x = +-inf or nan. */ + if (ix >= 0x7f800000) + return (vzero / vzero); + + /* + * |x| >= 0x1p23 is always an even integer, so return 1. + * FIXME: should this raise FE_INEXACT or FE_INVALID. + */ + if (ax + 1 > 1) + ax = 1; + return (ax); +} Property changes on: lib/msun/src/s_cospif.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/src/s_sinpi.c =================================================================== --- lib/msun/src/s_sinpi.c (nonexistent) +++ lib/msun/src/s_sinpi.c (working copy) @@ -0,0 +1,163 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/** + * sinpi(x) computes sin(pi*x) without multiplication by pi (almost). First, + * note that sinpi(-x) = -sinpi(x), so the algorithm considers only |x| and + * includes reflection symmetry by considering the sign of x on output. The + * method used depends on the magnitude of x. + * + * 1. For small |x|, sinpi(x) = pi * x where a sloppy threshold is used. The + * threshold is |x| < 0x1pN with N = -(P/2+M). P is the precision of the + * floating-point type and M = 2 to 4. To achieve high accuracy, pi is + * decomposed into high and low parts with the high part containing a + * number of trailing zero bits. x is also split into high and low parts. + * + * 2. For |x| < 1, argument reduction is not required and sinpi(x) is + * computed by calling a kernel that leverages the kernels for sin(x) + * ans cos(x). See k_sinpi.c and k_cospi.c for details. + * + * 3. For 1 <= |x| < 0x1p(P-1), argument reduction is required where + * |x| = j0 + r with j0 an integer and the remainder r satisfies + * 0 <= r < 1. With the given domain, a simplified inline floor(x) + * is used. Also, note the following identity + * + * sinpi(x) = sin(pi*(j0+r)) + * = sin(pi*j0) * cos(pi*r) + cos(pi*j0) * sin(pi*r) + * = cos(pi*j0) * sin(pi*r) + * = +-sinpi(r) + * + * If j0 is even, then cos(pi*j0) = 1. If j0 is odd, then cos(pi*j0) = -1. + * sinpi(r) is then computed via an appropriate kernel. + * + * 4. For |x| >= 0x1p(P-1), |x| is integral and sinpi(x) = copysign(0,x). + * + * 5. Special cases: + * + * sinpi(+-0) = +-0 + * sinpi(+-n) = +-0, for positive integers n. + * sinpi(+-inf) = nan. Raises the "invalid" floating-point exception. + * sinpi(nan) = nan. Raises the "invalid" floating-point exception. + */ + +#include "math.h" +#include "math_private.h" + +static const double +pihi = 3.1415926814079285e+00, /* 0x400921fb 0x58000000 */ +pilo =-2.7818135228334233e-08; /* 0xbe5dde97 0x3dcb3b3a */ + +#include "k_cospi.c" +#include "k_sinpi.c" + +double +sinpi(double x) +{ + volatile static const double vzero = 0; + double ax, s; + uint32_t hx, ix, j0, lx; + + EXTRACT_WORDS(hx, lx, x); + ix = hx & 0x7fffffff; + INSERT_WORDS(ax, ix, lx); + + if (ix < 0x3ff00000) { /* |x| < 1 */ + if (ix < 0x3fd00000) { /* |x| < 0.25 */ + if (ix < 0x3e200000) { /* |x| < 0x1p-29 */ + if (x == 0) + return (x); + INSERT_WORDS(ax, hx, 0); + x -= ax; + s = pilo * x + pihi * x + pilo * ax + + pihi * ax; + return (s); + } + + s = __kernel_sinpi(ax); + return ((hx & 0x80000000) ? -s : s); + } + + if (ix < 0x3fe00000) /* |x| < 0.5 */ + s = __kernel_cospi(0.5 - ax); + else if (ix < 0x3fe80000) /* |x| < 0.75 */ + s = __kernel_cospi(ax - 0.5); + else + s = __kernel_sinpi(1 - ax); + return ((hx & 0x80000000) ? -s : s); + } + + if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ + /* Determine integer part of ax. */ + j0 = ((ix >> 20) & 0x7ff) - 0x3ff; + if (j0 < 20) { + ix &= ~(0x000fffff >> j0); + lx = 0; + } else { + lx &= ~((uint32_t)0xffffffff >> (j0 - 20)); + } + INSERT_WORDS(x, ix, lx); + + ax -= x; + EXTRACT_WORDS(ix, lx, ax); + + if (ix == 0) + s = 0; + else { + if (ix < 0x3fe00000) { /* |x| < 0.5 */ + if (ix < 0x3fd00000) /* |x| < 0.25 */ + s = __kernel_sinpi(ax); + else + s = __kernel_cospi(0.5 - ax); + } else { + if (ix < 0x3fe80000) /* |x| < 0.75 */ + s = __kernel_cospi(ax - 0.5); + else + s = __kernel_sinpi(1 - ax); + } + + if (j0 > 30) x -= 0x1p30; + j0 = (uint32_t)x; + if (j0 & 1) s = -s; + } + + return ((hx & 0x80000000) ? -s : s); + } + + if (ix >= 0x7f800000) + return (vzero / vzero); + + /* + * |x| >= 0x1p52 is always an integer, so return +-0. + * FIXME: should this raise FE_INEXACT or FE_INVALID? + */ + if (ax + 1 > 1) + ax = copysign(0, x); + return (ax); +} + +#if LDBL_MANT_DIG == 53 +__weak_reference(sinpi, sinpil); +#endif Property changes on: lib/msun/src/s_sinpi.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/src/s_sinpif.c =================================================================== --- lib/msun/src/s_sinpif.c (nonexistent) +++ lib/msun/src/s_sinpif.c (working copy) @@ -0,0 +1,122 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/s_sinpi.c for implementation details. + */ + +#define INLINE_KERNEL_SINDF +#define INLINE_KERNEL_COSDF + +#include "math.h" +#include "math_private.h" +#include "k_cosf.c" +#include "k_sinf.c" + +#define __kernel_cospif(x) ((float)__kernel_cosdf(M_PI * (x))) +#define __kernel_sinpif(x) ((float)__kernel_sindf(M_PI * (x))) + +static const float pihi = 3.14160156e+00f; /* 0x40491000 */ +static const float pilo = -8.90890988e-06f; /* 0xb715777a */ + +float +sinpif(float x) +{ + volatile static const float vzero = 0; + float ax, s; + uint32_t hx, ix, j0; + + GET_FLOAT_WORD(hx, x); + ix = hx & 0x7fffffff; + SET_FLOAT_WORD(ax, ix); + + if (ix < 0x3f800000) { /* |x| < 1 */ + if (ix < 0x3e800000) { /* |x| < 0.25 */ + if (ix < 0x38800000) { /* |x| < 0x1p-14 */ + if (x == 0) + return (x); + SET_FLOAT_WORD(ax, hx & 0xffff0000); + x -= ax; + s = pilo * x + pihi * x + pilo * ax + + pihi * ax; + return (s); + } + + s = __kernel_sinpif(ax); + return ((hx & 0x80000000) ? -s : s); + } + + if (ix < 0x3f000000) /* |x| < 0.5 */ + s = __kernel_cospif(0.5f - ax); + else if (ix < 0x3f400000) /* |x| < 0.75 */ + s = __kernel_cospif(ax - 0.5f); + else + s = __kernel_sinpif(1 - ax); + return ((hx & 0x80000000) ? -s : s); + } + + if (ix < 0x4b000000) { /* 1 <= |x| < 0x1p23 */ + /* Determine integer part of ax. */ + j0 = ((ix >> 23) & 0xff) - 0x7f; + ix &= ~(0x007fffff >> j0); + SET_FLOAT_WORD(x, ix); + + ax -= x; + GET_FLOAT_WORD(ix, ax); + + if (ix == 0) + s = 0; + else { + if (ix < 0x3f000000) { /* |x| < 0.5 */ + if (ix < 0x3e800000) /* |x| < 0.25 */ + s = __kernel_sinpif(ax); + else + s = __kernel_cospif(0.5f - ax); + } else { + if (ix < 0x3f400000) /* |x| < 0.75 */ + s = __kernel_cospif(ax - 0.5f); + else + s = __kernel_sinpif(1 - ax); + } + + j0 = (uint32_t)x; + s = (j0 & 1) ? -s : s; + } + return ((hx & 0x80000000) ? -s : s); + } + + /* x = +-inf or nan. */ + if (ix >= 0x7f800000) + return (vzero / vzero); + + /* + * |x| >= 0x1p23 is always an integer, so return +-0. + * FIXME: should this raise FE_INEXACT or FE_INVALID? + */ + if (ax + 1 > 1) + ax = copysignf(0, x); + return (ax); +} Property changes on: lib/msun/src/s_sinpif.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/src/s_tanpi.c =================================================================== --- lib/msun/src/s_tanpi.c (nonexistent) +++ lib/msun/src/s_tanpi.c (working copy) @@ -0,0 +1,172 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/** + * tanpi(x) computes tan(pi*x) without multiplication by pi (almost). First, + * note that tanpi(-x) = -tanpi(x), so the algorithm considers only |x| and + * includes reflection symmetry by considering the sign of x on output. The + * method used depends on the magnitude of x. + * + * 1. For small |x|, tanpi(x) = pi * x where a sloppy threshold is used. The + * threshold is |x| < 0x1pN with N = -(P/2+M). P is the precision of the + * floating-point type and M = 2 to 4. To achieve high accuracy, pi is + * decomposed into high and low parts with the high part containing a + * number of trailing zero bits. x is also split into high and low parts. + * + * 2. For |x| < 1, argument reduction is not required and tanpi(x) is + * computed by a direct call to a kernel, which uses the kernel for + * tan(x). See below. + * + * 3. For 1 <= |x| < 0x1p(P-1), argument reduction is required where + * |x| = j0 + r with j0 an integer and the remainder r satisfies + * 0 <= r < 1. With the given domain, a simplified inline floor(x) + * is used. Also, note the following identity + * + * tan(pi*j0) + tan(pi*r) + * tanpi(x) = tan(pi*(j0+r)) = ---------------------------- = tanpi(r) + * 1 - tan(pi*j0) * tan(pi*r) + * + * So, after argument reduction, the kernel is again invoked. + * + * 4. For |x| >= 0x1p(P-1), |x| is integral and tanpi(x) = copysign(0,x). + * + * 5. Special cases: + * + * tanpi(+-0) = +-0 + * tanpi(+-n) = +-0, for positive integers n. + * tanpi(+-n+1/4) = +-1, for positive integers n. + * tanpi(+-n+1/2) = NaN, for positive integers n. + * tanpi(+-inf) = NaN. Raises the "invalid" floating-point exception. + * tanpi(nan) = NaN. Raises the "invalid" floating-point exception. + */ + +#include "math.h" +#include "math_private.h" + +static const double +pihi = 3.1415926814079285e+00, /* 0x400921fb 0x58000000 */ +pilo = -2.7818135228334233e-08; /* 0xbe5dde97 0x3dcb3b3a */ + +/* + * The kernel for tanpi(x) multiplies x by an 80-bit approximation of + * pi, where the hi and lo parts are used with with kernel for tan(x). + */ +static inline double +__kernel_tanpi(double x) +{ + double hi, lo, t; + + if (x < 0.25) { + hi = (float)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + t = __kernel_tan(hi, lo, 1); + } else if (x > 0.25) { + x = 0.5 - x; + hi = (float)x; + lo = x - hi; + lo = lo * (pilo + pihi) + hi * pilo; + hi *= pihi; + _2sumF(hi, lo); + t = - __kernel_tan(hi, lo, -1); + } else + t = 1; + + return (t); +} + +double +tanpi(double x) +{ + volatile static const double vzero = 0; + double ax, t; + uint32_t hx, ix, j0, lx; + + EXTRACT_WORDS(hx, lx, x); + ix = hx & 0x7fffffff; + INSERT_WORDS(ax, ix, lx); + + if (ix < 0x3ff00000) { /* |x| < 1 */ + if (ix < 0x3fe00000) { /* |x| < 0.5 */ + if (ix < 0x3e200000) { /* |x| < 0x1p-29 */ + if (x == 0) + return (x); + ax = (float)x; + x -= ax; + t = pilo * x + pihi * x + pilo * ax + + pihi * ax; + return (t); + } + t = __kernel_tanpi(ax); + } else if (ax == 0.5) + return ((ax - ax) / (ax - ax)); + else + t = - __kernel_tanpi(1 - ax); + return ((hx & 0x80000000) ? -t : t); + } + + if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ + /* Determine integer part of ax. */ + j0 = ((ix >> 20) & 0x7ff) - 0x3ff; + if (j0 < 20) { + ix &= ~(0x000fffff >> j0); + lx = 0; + } else { + lx &= ~(((uint32_t)(0xffffffff)) >> (j0 - 20)); + } + INSERT_WORDS(x,ix,lx); + + ax -= x; + EXTRACT_WORDS(ix, lx, ax); + + if (ix < 0x3fe00000) /* |x| < 0.5 */ + t = ax == 0 ? 0 : __kernel_tanpi(ax); + else if (ax == 0.5) + return ((ax - ax) / (ax - ax)); + else + t = - __kernel_tanpi(1 - ax); + + return ((hx & 0x80000000) ? -t : t); + } + + /* x = +-inf or nan. */ + if (ix >= 0x7f800000) + return (vzero / vzero); + + /* + * |x| >= 0x1p52 is always an integer, so return +-0. + * FIXME: should this raise FE_INEXACT or FE_INVALID? + */ + if (ax + 1 > 1) + ax = copysign(0, x); + return (ax); +} + +#if LDBL_MANT_DIG == 53 +__weak_reference(tanpi, tanpil); +#endif Property changes on: lib/msun/src/s_tanpi.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/src/s_tanpif.c =================================================================== --- lib/msun/src/s_tanpif.c (nonexistent) +++ lib/msun/src/s_tanpif.c (working copy) @@ -0,0 +1,115 @@ +/*- + * Copyright (c) 2017 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * See ../src/s_tanpi.c for implementation details. + */ + +#define INLINE_KERNEL_TANDF + +#include "math.h" +#include "math_private.h" +#include "k_tanf.c" + +static const float +pihi = 3.14160156e+00f, /* 0x40491000 */ +pilo = -8.90890988e-06f; /* 0xb715777a */ + +static inline float +__kernel_tanpif(float x) +{ + float t; + + if (x < 0.25f) + t = __kernel_tandf(M_PI * x, 1); + else if (x > 0.25f) + t = -__kernel_tandf(M_PI * (0.5 - x), -1); + else + t = 1; + + return (t); +} + +float +tanpif(float x) +{ + volatile static const float vzero = 0; + float ax, t; + uint32_t hx, ix, j0; + + GET_FLOAT_WORD(hx, x); + ix = hx & 0x7fffffff; + SET_FLOAT_WORD(ax, ix); + + if (ix < 0x3f800000) { /* |x| < 1 */ + if (ix < 0x3f000000) { /* |x| < 0.5 */ + if (ix < 0x38800000) { /* |x| < 0x1p-14 */ + if (ix == 0) + return (x); + SET_FLOAT_WORD(ax, hx & 0xffff0000); + x -= ax; + t = pilo * x + pihi * x + pilo * ax + + pihi * ax; + return (t); + } + t = __kernel_tanpif(ax); + } else if (ix == 0x3f000000) + return ((ax - ax) / (ax - ax)); + else + t = - __kernel_tanpif(1 - ax); + return ((hx & 0x80000000) ? -t : t); + } + + if (ix < 0x4b000000) { /* 1 <= |x| < 0x1p23 */ + /* Determine integer part of ax. */ + j0 = ((ix >> 23) & 0xff) - 0x7f; + ix &= ~(0x007fffff >> j0); + SET_FLOAT_WORD(x, ix); + + ax -= x; + GET_FLOAT_WORD(ix, ax); + + if (ix < 0x3f000000) /* |x| < 0.5 */ + t = ix == 0 ? 0 : __kernel_tanpif(ax); + else if (ix == 0x3f000000) + return ((ax - ax) / (ax - ax)); + else + t = - __kernel_tanpif(1 - ax); + return ((hx & 0x80000000) ? -t : t); + } + + /* x = +-inf or nan. */ + if (ix >= 0x7f800000) + return (vzero / vzero); + + /* + * |x| >= 0x1p23 is always an integer, so return +-0. + * FIXME: should this raise FE_INEXACT or FE_INVALID. + */ + if (ax + 1 > 1) + ax = copysignf(0, x); + return (ax); +} Property changes on: lib/msun/src/s_tanpif.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Wed May 17 02:49:56 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 757B7D70100 for ; Wed, 17 May 2017 02:49:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id E77A21E60 for ; Wed, 17 May 2017 02:49:55 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 1924B1043A6C; Wed, 17 May 2017 12:49:45 +1000 (AEST) Date: Wed, 17 May 2017 12:49:45 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170516224618.GA40855@troutmask.apl.washington.edu> Message-ID: <20170517094848.A52247@besplex.bde.org> References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=VjOiZrt8L6bhyw-5nSAA:9 a=DCdrCWzBJeGlZEVK:21 a=Pt8S-Y3xSoovzgYL:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 May 2017 02:49:56 -0000 On Tue, 16 May 2017, Steve Kargl wrote: > ... > I have removed my new kernels and written shims to > use the fdlibm trig kernels. I know that in some > cases this leads to slower code, but a small (if not > tiny) improvement in accuracy. This also leads to > a significant reduction in code size. A couple of > notes: Still quite large. On Tue, 16 May 2017, Steve Kargl wrote: > ... > I have removed my new kernels and written shims to > use the fdlibm trig kernels. I know that in some > cases this leads to slower code, but a small (if not > tiny) improvement in accuracy. This also leads to > a significant reduction in code size. A couple of > notes: Still quite large. > Index: lib/msun/ld128/k_cospil.c This is a header file, not a .c file or even a self-sufficient kernel, since it only compiles when included by C files which declare the things that it uses. My kernels for exp are header files and are also self-sufficient (except they only define static functions and objects, so are useless unless they are included). > +static inline long double > +__kernel_cospil(long double x) > +{ > + long double hi, lo; > + hi = (double)x; Style (missing blank line after declararions). > Index: lib/msun/ld128/k_sinpil.c > ... > + long double hi, lo; > + hi = (double)x; Style. > Index: lib/msun/ld128/s_cospil.c > ... > +static const long double > +pihi = 3.14159265358979322702026593105983920e+00L, > +pilo = 1.14423774522196636802434264184180742e-17L; These are not in normal format, and are hard to read. I can't see if pihi has the correct number of zero bits for exact multiplication. These don't have the normal spelling. fdlibm never uses pihi or pio2hi, or pi_hi. It often uses pio2_hi and other pio2_*. My s_sinpi.c uses pi_hi. > +long double > +cospil(long double x) > +{ > + volatile static const double vzero = 0; > + long double ax, c, xf; > + uint32_t ix; > + > + ax = fabsl(x); > + Style (lots of extra blank lines). > + if (ax < 0x1p112) { > + > + xf = floorl(ax); > + > + ax -= xf; > + These extra blank lines are especially bad, since they separate related code. > + if (xf > 0x1p50) xf -= 0x1p50; > + if (xf > 0x1p30) xf -= 0x1p30; Style. > Index: lib/msun/ld128/s_sinpil.c Similarly. > Index: lib/msun/ld128/s_tanpil.c Similarly. > + * FIXME: This should use a polynomial approximation in [0,0.25], but the > + * FIXME: increase in max ULP is probably small, so who cares. Doesn't it already use the standard kernel, which uses a poly approx? Splitting up [0, Pi/4] doesn't work very well for reducing the degree of the poly. The current poly has degree about 56 (maybe 52 or 54), and I couldn't reduce it to below 40 using reasonably large interval. I think it can be reduced to degree about 10 using intervals of length 1/128 as for logl(), but unlike for logl() this obvious way of doing this needs about 128 diferent polys. > +static inline long double > +__kernel_tanpi(long double x) > +{ > + long double hi, lo, t; > + > + if (x < 0.25) { > + hi = (double)x; > + lo = x - hi; > + lo = lo * (pilo + pihi) + hi * pilo; > + hi *= pihi; > + _2sumF(hi, lo); > + t = __kernel_tanl(hi, lo, -1); Oops, I forgot that this 0.25 is really Pi/4 when scaled. It is impossible to use 1 poly over the whole range, since the poly would need to have even higher degree than 50+, and would still have bad numeric properties. so __kernel_tan*() uses special methods. > +long double > +tanpil(long double x) > +{ > + volatile static const double vzero = 0; > + long double ax, xf; > + uint32_t ix; > + > + ax = fabsl(ax); > + > + if (ax < 1) { > + if (ax < 0.5) { > + if (ax < 0x1p-60) { > + if (x == 0) > + return (x); > + ax = (double)x; Style (unnecessary cast). > + x -= ax; > + t = pilo * x + pihi * x + pilo * ax > + + pihi * ax; Style: - unnecessary line splitting - gnu style for placement of the operator on a new line. If not following fdlibm style, use strict KNF. > + return (t); > + } > + t = __kernel_tanpil(ax); For most long double functions, we put everything except the kernels in the same files, with ifdefs for LDBL_MANT_DIG. Here even the hi+lo decomposition and multiplication by Pi can be in a kernel. The style bugs are especially painful to read when sextuplicated. (The ld80 variants are quite different due to being better optimized, but this is a different style bug.) > Index: lib/msun/ld80/k_cospil.c No more comments on sextuplicated style bugs. > Index: lib/msun/ld80/s_cospil.c > ... > + if (ix < 0x3ffe) /* |x| < 0.5 */ > + c = __kernel_sinpil(0.5 - ax); > + else if (lx < 0xc000000000000000ull) { /* |x| < 0.75 */ Style bug (spelling of the long long abomination using "ull"). The explicit abomination is unfortunately needed by old versions of gcc with CFLAGS asking for nearly C90. > + if (ax == 0.5) > + RETURNI(0); Perhaps pessimal to not test for 0.5 in bits. It is pessimal to avoid generating inexact for 0.5 at all. I don't see a much better way. > + if (ix < 0x403e) { /* 1 <= |x| < 0x1p63 */ > + /* Determine integer part of ax. */ > + j0 = ix - 0x3fff + 1; > + if (j0 < 32) { > + lx = (lx >> 32) << 32; > + lx &= ~(((lx << 32)-1) >> j0); > + } else { > + uint64_t m = (uint64_t)-1 >> (j0 + 1); > + if (lx & m) lx &= ~m; Style: - nested declaration - initialization in declaration - half the code in the initialization - no newline for statement after 'if' > + } > + INSERT_LDBL80_WORDS(x, ix, lx); > + > + ax -= x; > + EXTRACT_LDBL80_WORDS(ix, lx, ax); I don't know if this is a good method. INSERT/EXTRACT are slow, so I doubt that it is. In e_lgamma*.c:sin_pi*(), we use only a single GET of the high word. This function already used EXTRACT/INSERT to classify, and 2 more here. (_e_lgamma*.c used GET to classify before calling sin_pi*(), then 1 more GET.) INSERT is especially slow. Care is needed near the 0x1p63 boundary, since the fast methods require adding approx 0x1p63. At or above the boundary, these methods don't obviously work. Perhaps they give the correct result starting at 2 times the boundary, except for setting inexact. Ugh. Do you avoid the fast methods since they set inexact for most half-integers? That costs more than avoiding inexact for 0.5. tanpi(0.25 is exact too, so quarter-integers need special treatment too if you really want to avoid all spurious inexacts. I now see that some of the above complications are from inlining floor(). floor() can't use the fast method since it is careful about inexact, and the spec may require this. It actually sets inexact whenever the result is not an integer. So it use it for sinpi() without getting spurious inexact, the arg must be multiplied by 2 first, and for tanpi(), multiply by 4 first. This is excessive. > Index: lib/msun/ld80/s_sinpil.c Similarly. > Index: lib/msun/ld80/s_tanpil.c Similarly. > Index: lib/msun/src/k_cospi.c > ... > +/* > + * The basic kernel for x in [0,0.25]. To exploit the kernel for cos(x), > + * the argument to __kernel_cospi() must be multiply by pi. As pi is an > + * irrational number, this allows the splitting of pi*x into high and low > + * parts. > + */ Allows? > + > +static inline double > +__kernel_cospi(double x) > +{ > + double hi, lo; > + hi = (float)x; _2sumF is extensively documented to not work with double. It requires double_t here. Many style bugs are actually duodecimally (?) tuplicated (3 copies for ld128, 3 for ld80, 3 for double, 3 for float). > Index: lib/msun/src/k_sinpi.c Similarly. > Index: lib/msun/src/math.h > =================================================================== > --- lib/msun/src/math.h (revision 318383) > +++ lib/msun/src/math.h (working copy) > @@ -500,6 +500,15 @@ > > #if __BSD_VISIBLE > long double lgammal_r(long double, int *); > +double cospi(double); > +float cospif(float); > +long double cospil(long double); > +double sinpi(double); > +float sinpif(float); > +long double sinpil(long double); > +double tanpi(double); > +float tanpif(float); > +long double tanpil(long double); > #endif Should be sorted into the double, float and long double sections. > Index: lib/msun/src/s_cospi.c > +/** > + * cospi(x) computes cos(pi*x) without multiplication by pi (almost). First, You mean with multiplication by pi in extra precision. See s_sinpi.c for more. > Index: lib/msun/src/s_cospif.c Similarly. > Index: lib/msun/src/s_cospif.c > +#define INLINE_KERNEL_SINDF > +#define INLINE_KERNEL_COSDF Prossibly not worth inlining. > +#define __kernel_cospif(x) ((float)__kernel_cosdf(M_PI * (x))) > +#define __kernel_sinpif(x) ((float)__kernel_sindf(M_PI * (x))) Style bugs (not even corrupt tabs, but 2 spaces after #define and 3 spaces instead of a second tab). Bogus casts. The kernels already return float, although that is pessimal and destroys their the xtra precision that they naturally have. The cost of avoiding inexact is clearer here. Returning the non-kernel sin(M_PI * (x)), etc. with no classifation would probably be faster than the slow reduction done here (except for large x). See s_sinpi.c for more. > Index: lib/msun/src/s_sinpi.c > ... > +double > +sinpi(double x) > +{ > + volatile static const double vzero = 0; > + double ax, s; > + uint32_t hx, ix, j0, lx; > + > + EXTRACT_WORDS(hx, lx, x); > + ix = hx & 0x7fffffff; > + INSERT_WORDS(ax, ix, lx); A slow way to set ax to fabs(x). This is otherwise much slower than sin_pi() for lgamma(). That started with just a GET of hx, while this EXTRACTs 2 words, like floor(), apparently to avoid spurious inexact. Note that lgamma(0.5) is naturally inexact. lgamma(integer) should be exact, but I doubt that it is. Maybe we broke it. > + > + if (ix < 0x3ff00000) { /* |x| < 1 */ > + if (ix < 0x3fd00000) { /* |x| < 0.25 */ > + if (ix < 0x3e200000) { /* |x| < 0x1p-29 */ > + if (x == 0) > + return (x); > + INSERT_WORDS(ax, hx, 0); > + x -= ax; Flipping minus signs to reduce the number of cases for odd functions is a bad way to do things. I speeded up several functions by 4-10 cycles (5-20%) by avoiding such flipping. Here there is a dependency on ax and a slower than usual calculation of it with. (On modern x86, fabs would take ~2 cycles latency and ~1/2 cycles throughput), while going throigh these extractions and insertions might take 10 cycles latency at best and 50 or more with store to load mismatch penalties on not so modern x86). Apart from being slower, sign flipping it prevents gives different rounding in non-default modes. C99 says that sin() is odd, so may require the flipping to break the rounding modes intentionally. If so, then the bug is in C99. I believe that it intentionally doesn't require avoiding spurious inexact because this would be onerous. It also doesn't require sin() to be rounded in the current rounding mode. Requiring sin() to be perfectly old is similarly onerous. On Haswell, fdlibm floor() takes about 33 cycles and fdlibm sin() about 44 cycles below 2*Pi, so any sinpi() that uses floor() or its internals is going to be about twice as slow as sin() because its classification is too slow. > + if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > + /* Determine integer part of ax. */ > + j0 = ((ix >> 20) & 0x7ff) - 0x3ff; > + if (j0 < 20) { > + ix &= ~(0x000fffff >> j0); > + lx = 0; > + } else { > + lx &= ~((uint32_t)0xffffffff >> (j0 - 20)); > + } > + INSERT_WORDS(x, ix, lx); > + > + ax -= x; > + EXTRACT_WORDS(ix, lx, ax); Much slower than what lgamma()s sin_pi() does. After our optimizations, that does basically rint() inline using ax + 0x1p52 - 0x1p52, then a further += 0x1p50 to get octants instead of half-cycles. This takes approx 16 cycles latency on modern x86, which is lot. I think and64 can do the above in 10-20 cycles, while i386 might take 20-40 or much longer with store-to-load mismatch penalties. I think we can avoid the spurious inexacts with a little care. When ax is an integer (and < 0x1p52), we can add at least 0x1p51 to it exactly. We also want to avoid inexact for half-integers for sinpi and cospi, and quarter-integers for tanpi, and it is good to reduce to quadrants anyway. So we might intitially multiply by 4 and do the += 0x1p52 trick only if 4*ax < 0x1p52 (or whatever the safe threshold is). Efficiency doesn't matter for larger ax, so slow methods using things like floor(16*ax) are adequate, or we can reduce by a few powers of 2 using special cases: v = 4 * ax; if (v >= 0x1p54) ; /* ax is an even integer (check factor) / else { if (v >= 0x1p53) v -= 0x1p53; if (v >= 0x1p52) v -= 0x1p52; } I will finish this for lgamma*(). The fdlibm version does multiplications by 2 and we seem to have removed a bit too much of that method. > + /* > + * |x| >= 0x1p52 is always an integer, so return +-0. > + * FIXME: should this raise FE_INEXACT or FE_INVALID? > + */ The result is always +0, exact and valid (maybe -0 in unsupported unusual rounding modes). We just did a lot of work to avoid inexact. > + if (ax + 1 > 1) > + ax = copysign(0, x); An even slower way of setting the sign than INSERT, especially if the compiler doesn't inline copysign(). Using the correct +0 fixes this. We might have classified slightly smaller values as odd integers and have to select the sign for them. This is done differently now. > Index: lib/msun/src/s_sinpif.c > ... > Index: lib/msun/src/s_tanpi.c > ... > Index: lib/msun/src/s_tanpif.c > ... Similarly. Bruce From owner-freebsd-numerics@freebsd.org Wed May 17 04:20:25 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 74D97D70925 for ; Wed, 17 May 2017 04:20:25 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 5DACF106D for ; Wed, 17 May 2017 04:20:25 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4H4KN2R043753 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 16 May 2017 21:20:24 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4H4KNYs043752; Tue, 16 May 2017 21:20:23 -0700 (PDT) (envelope-from sgk) Date: Tue, 16 May 2017 21:20:23 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170517042023.GA43647@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170517094848.A52247@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 May 2017 04:20:25 -0000 On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote: > On Tue, 16 May 2017, Steve Kargl wrote: > > Index: lib/msun/ld128/s_cospil.c > > +long double > > +cospil(long double x) > > +{ > > + volatile static const double vzero = 0; > > + long double ax, c, xf; > > + uint32_t ix; > > + > > + ax = fabsl(x); > > + > > Style (lots of extra blank lines). If there are only style issues, then we are fortunate as I have not even compiled the code. > > + ax -= xf; > > + > > These extra blank lines are especially bad, since they separate related > code. Extra blank lines make things easier to read. > > > + * FIXME: This should use a polynomial approximation in [0,0.25], but the > > + * FIXME: increase in max ULP is probably small, so who cares. > > Doesn't it already use the standard kernel, which uses a poly approx? Whoops forgot to remove the unneeded comment. > > + volatile static const double vzero = 0; > > + long double ax, xf; > > + uint32_t ix; > > + > > + ax = fabsl(ax); > > + > > + if (ax < 1) { > > + if (ax < 0.5) { > > + if (ax < 0x1p-60) { > > + if (x == 0) > > + return (x); > > + ax = (double)x; > > Style (unnecessary cast). Huh? ax is long double. x is long double. The cast should remove 11 bits. > > + x -= ax; > > + t = pilo * x + pihi * x + pilo * ax > > + + pihi * ax; > > Style: > - unnecessary line splitting The is line is 80 characters without splitting (and even longer if one mandates pi_lo and pi_hi). I use an 80 character wide and wrap lines that are 79+ characters. > > Index: lib/msun/ld80/s_cospil.c > > ... > > + if (ix < 0x3ffe) /* |x| < 0.5 */ > > + c = __kernel_sinpil(0.5 - ax); > > + else if (lx < 0xc000000000000000ull) { /* |x| < 0.75 */ > > Style bug (spelling of the long long abomination using "ull"). The > explicit abomination is unfortunately needed by old versions of gcc > with CFLAGS asking for nearly C90. Whoops that one may have slipped through. > > + if (ax == 0.5) > > + RETURNI(0); > > Perhaps pessimal to not test for 0.5 in bits. > > It is pessimal to avoid generating inexact for 0.5 at all. I don't see > a much better way. I tried using bits and floating point. Saw no difference. cospi(n.5) is exact (see n1950.pdf). See comment in s_cospi.c that documents special cases. > > + } > > + INSERT_LDBL80_WORDS(x, ix, lx); > > + > > + ax -= x; > > + EXTRACT_LDBL80_WORDS(ix, lx, ax); > > I don't know if this is a good method. INSERT/EXTRACT are slow, so I > doubt that it is. In e_lgamma*.c:sin_pi*(), we use only a single GET > of the high word. This function already used EXTRACT/INSERT to classify, > and 2 more here. (_e_lgamma*.c used GET to classify before calling > sin_pi*(), then 1 more GET.) INSERT is especially slow. > > Care is needed near the 0x1p63 boundary, since the fast methods require > adding approx 0x1p63. At or above the boundary, these methods don't > obviously work. Perhaps they give the correct result starting at 2 times > the boundary, except for setting inexact. Ugh. Do you avoid the fast > methods since they set inexact for most half-integers? That costs more > than avoiding inexact for 0.5. n1950.pdf says cospi(n.5) is exact. Perhaps, the __kernel_cos() and __kernel_sin() provide an exact result. I did not investigate. > tanpi(0.25 is exact too, so quarter-integers need special treatment too > if you really want to avoid all spurious inexacts. It does. > I now see that some of the above complications are from inlining floor(). Yes, it is an inline of floor(). > > +/* > > + * The basic kernel for x in [0,0.25]. To exploit the kernel for cos(x), > > + * the argument to __kernel_cospi() must be multiply by pi. As pi is an > > + * irrational number, this allows the splitting of pi*x into high and low > > + * parts. > > + */ > > Allows? x = n + r. For double, r has 53 bits. There are no trailing low bits from the argument reduction. pi is irrational, and therefore not exactly representable by 53 bits (or any number of bits). __kernel_cospi() uses __kernel_cos(), and the low bits come about because of pi. I split x into high in low. So, we have lo = pilo * xlo + pilo * xhi + pihi * xlo; /* all the low bits */ hi = pihi * xhi; /* Exact */ _2sumF(hi, lo); /* Maybe superfluous? */ > > + > > +static inline double > > +__kernel_cospi(double x) > > +{ > > + double hi, lo; > > + hi = (float)x; > > _2sumF is extensively documented to not work with double. It requires > double_t here. It seems to work base on testing hundreds of millions of values. :-) > > +#define INLINE_KERNEL_SINDF > > +#define INLINE_KERNEL_COSDF > > Prossibly not worth inlining. > > > +#define __kernel_cospif(x) ((float)__kernel_cosdf(M_PI * (x))) > > +#define __kernel_sinpif(x) ((float)__kernel_sindf(M_PI * (x))) > > Style bugs (not even corrupt tabs, but 2 spaces after #define and 3 spaces > instead of a second tab). Ugh. My bad. I have nedit set to do 3-space tabs. I forgot to reset to use hard tabs. > Bogus casts. The kernels already return float, although that is pessimal > and destroys their the xtra precision that they naturally have. Exhaustive testing shows max ULP < 0.500xyyy where might be 0, but can't recall now. How exact to you want these? > > Index: lib/msun/src/s_sinpi.c > > ... > > +double > > +sinpi(double x) > > +{ > > + volatile static const double vzero = 0; > > + double ax, s; > > + uint32_t hx, ix, j0, lx; > > + > > + EXTRACT_WORDS(hx, lx, x); > > + ix = hx & 0x7fffffff; > > + INSERT_WORDS(ax, ix, lx); > > A slow way to set ax to fabs(x). Make it work. Make it correct. Make it fast. I've undertaken the first two. > This is otherwise much slower than sin_pi() for lgamma(). That started > with just a GET of hx, while this EXTRACTs 2 words, like floor(), apparently > to avoid spurious inexact. > > Note that lgamma(0.5) is naturally inexact. lgamma(integer) should be exact, > but I doubt that it is. Maybe we broke it. See below. > > + if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > > + /* Determine integer part of ax. */ > > + j0 = ((ix >> 20) & 0x7ff) - 0x3ff; > > + if (j0 < 20) { > > + ix &= ~(0x000fffff >> j0); > > + lx = 0; > > + } else { > > + lx &= ~((uint32_t)0xffffffff >> (j0 - 20)); > > + } > > + INSERT_WORDS(x, ix, lx); > > + > > + ax -= x; > > + EXTRACT_WORDS(ix, lx, ax); > > Much slower than what lgamma()s sin_pi() does. After our optimizations, > that does basically rint() inline using ax + 0x1p52 - 0x1p52, I found a situation where this trick seems to round in the wrong direction. You participated in the discussion. https://lists.freebsd.org/pipermail/freebsd-numerics/2017-March/000658.html https://lists.freebsd.org/pipermail/freebsd-numerics/2017-March/000660.html > then a > further += 0x1p50 to get octants instead of half-cycles. This takes > approx 16 cycles latency on modern x86, which is lot. I think and64 can > do the above in 10-20 cycles, while i386 might take 20-40 or much > longer with store-to-load mismatch penalties. > > I think we can avoid the spurious inexacts with a little care. When > ax is an integer (and < 0x1p52), we can add at least 0x1p51 to it exactly. > We also want to avoid inexact for half-integers for sinpi and cospi, and > quarter-integers for tanpi, and it is good to reduce to quadrants anyway. > So we might intitially multiply by 4 and do the += 0x1p52 trick only if > 4*ax < 0x1p52 (or whatever the safe threshold is). Efficiency doesn't > matter for larger ax, so slow methods using things like floor(16*ax) > are adequate, or we can reduce by a few powers of 2 using special cases: > > v = 4 * ax; > if (v >= 0x1p54) > ; /* ax is an even integer (check factor) / > else { > if (v >= 0x1p53) > v -= 0x1p53; > if (v >= 0x1p52) > v -= 0x1p52; > } I'll see if I can understand this and modify waht I have. Unfortunately, I'm running out of time to work on this. > The result is always +0, exact and valid (maybe -0 in unsupported > unusual rounding modes). We just did a lot of work to avoid inexact. Not according to n1950.pdf. It states sinpi(+-n) = +-0 for positive integers. > > > + if (ax + 1 > 1) > > + ax = copysign(0, x); > > An even slower way of setting the sign than INSERT, especially if > the compiler doesn't inline copysign(). Using the correct +0 fixes > this. |x| > 0x1p(P-1) is (should be) exceptional. Who cares if it's fast. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Wed May 17 10:48:33 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4AC42D7113F for ; Wed, 17 May 2017 10:48:33 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id E98CF993 for ; Wed, 17 May 2017 10:48:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 711AB42AB13; Wed, 17 May 2017 20:48:23 +1000 (AEST) Date: Wed, 17 May 2017 20:48:22 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170517042023.GA43647@troutmask.apl.washington.edu> Message-ID: <20170517184838.O60700@besplex.bde.org> References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517042023.GA43647@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=zS366X0SCmc1Lw_eJcsA:9 a=KAAw_Sxtfs94dvDO:21 a=rDEoI6jNYpUyrzqe:21 a=CjuIK1q_8ugA:10 a=TEblM0x7jKwA:10 a=R-d4_Vt-f0IA:10 a=IjZwj45LgO3ly-622nXo:22 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 May 2017 10:48:33 -0000 On Tue, 16 May 2017, Steve Kargl wrote: > On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote: >> On Tue, 16 May 2017, Steve Kargl wrote: >>> Index: lib/msun/ld128/s_cospil.c >>> +long double >>> +cospil(long double x) >>> +{ >>> + volatile static const double vzero = 0; >>> + long double ax, c, xf; >>> + uint32_t ix; >>> + >>> + ax = fabsl(x); >>> + >> >> Style (lots of extra blank lines). > > If there are only style issues, then we are fortunate as > I have not even compiled the code. > >>> + ax -= xf; >>> + >> >> These extra blank lines are especially bad, since they separate related >> code. > > Extra blank lines make things easier to read. Harder. Related lines should be together. >>> + volatile static const double vzero = 0; >>> + long double ax, xf; >>> + uint32_t ix; >>> + >>> + ax = fabsl(ax); >>> + >>> + if (ax < 1) { >>> + if (ax < 0.5) { >>> + if (ax < 0x1p-60) { >>> + if (x == 0) >>> + return (x); >>> + ax = (double)x; >> >> Style (unnecessary cast). > > Huh? ax is long double. x is long double. The cast should > remove 11 bits. Oops. This special case for small x is annoying. >>> + x -= ax; >>> + t = pilo * x + pihi * x + pilo * ax >>> + + pihi * ax; >> >> Style: >> - unnecessary line splitting > > The is line is 80 characters without splitting (and even longer if > one mandates pi_lo and pi_hi). I use an 80 character wide and wrap > lines that are 79+ characters. It seemed a bit longer when I tested it before. I missed the confusing names pessimal arrangement of terms before. ax is clobbered to hold the hi term, and x is clobbered to hold the lo term. Only the pihi * hi term (spelled pihi * ax) needs to be evaluated exactly or even accurately. pilo * x can possibly be ignored, but it is better to group terms so that evaluating it is free. After grouping terms, the expression fits in 79 columns: t = (pihi + pilo) * x + pilo * ax + pihi * ax; Burt with better variable names, it no longer fits, unless (pihi + pilo) is spelled pi. >>> Index: lib/msun/ld80/s_cospil.c >>> ... >>> + if (ix < 0x3ffe) /* |x| < 0.5 */ >>> + c = __kernel_sinpil(0.5 - ax); >>> + else if (lx < 0xc000000000000000ull) { /* |x| < 0.75 */ >> >> Style bug (spelling of the long long abomination using "ull"). The >> explicit abomination is unfortunately needed by old versions of gcc >> with CFLAGS asking for nearly C90. > > Whoops that one may have slipped through. There are also some f suffixes which should be F. >>> + if (ax == 0.5) >>> + RETURNI(0); >> >> Perhaps pessimal to not test for 0.5 in bits. >> >> It is pessimal to avoid generating inexact for 0.5 at all. I don't see >> a much better way. > > I tried using bits and floating point. Saw no difference. > cospi(n.5) is exact (see n1950.pdf). See comment in s_cospi.c > that documents special cases. I havven't looked at n1950.pdf. Exact shouldn't mean "no spurious inexact". >>> +/* >>> + * The basic kernel for x in [0,0.25]. To exploit the kernel for cos(x), >>> + * the argument to __kernel_cospi() must be multiply by pi. As pi is an >>> + * irrational number, this allows the splitting of pi*x into high and low >>> + * parts. >>> + */ >> >> Allows? > > x = n + r. For double, r has 53 bits. There are no trailing low bits > from the argument reduction. pi is irrational, and therefore not exactly > representable by 53 bits (or any number of bits). __kernel_cospi() uses > __kernel_cos(), and the low bits come about because of pi. I split > x into high in low. So, we have It is a misdescription or grammar error. Errors start with "exploit" instead of "use" (this is only a style bug). Then "multiply" instead of "multiplied". Irrationality has little to do with things. Most numbers, rational, irrational and algebraic and transcendental, are not exactly representable. This doesn't allow splitting, but usually requires splitting. Transcendental numbers are less likely to need splitting than albegraic numbers since they are more likely to be nearly representable, but since no numeric accident for pi occurs or is arranged or easily arrangeable near here, it does need splitting. The source code is not the place to describe basic splitting. > lo = pilo * xlo + pilo * xhi + pihi * xlo; /* all the low bits */ > hi = pihi * xhi; /* Exact */ > _2sumF(hi, lo); /* Maybe superfluous? */ In my experiments, _2sumF was needed to prevent too much being in the lo term. The lo term has pessimal grouping here. It should be written as (pihi + pilo) * xlo + pilo * xhi or pilo * x + pihi * xlo. The second one is probably better since it has a shorter dependency chain for pilo * x. >>> +static inline double >>> +__kernel_cospi(double x) >>> +{ >>> + double hi, lo; >>> + hi = (float)x; >> >> _2sumF is extensively documented to not work with double. It requires >> double_t here. > > It seems to work base on testing hundreds of millions of values. :-) It never works with extra precision. Except when double is accidentally the same as double_t. That is, when the variables declared as double remain in registers. While they are there, they act like a hi-lo decomposition in double_t. This breaks as soon as the variables are spilled or passed to a function. You inlined the kernels and this masks the problem. The problem still occurs if the compiler decides not to inline. Example: let x be just 1.25. The initial (hi, lo) is (1.25, 0). After multiplying by pi, it is (1.25 * pihi, 1.25 * pilo). The lo term here is small enough to work directly. But we have to merger the terms for the general case. We use _2sumF(). With extra precision, this puts most of the lo term in the hi term. (hi, lo) is now approx. (1.25 * pi, 0) in extra precision. Passing this to a function or assigning it using a C99 compiler loses all of the extra precision and gives approx. (1.25 * pi, 0) in double precision. If there is extra precision and the variable type is plain double, then _2sumF() also depends on not being compiled by a C99 compiler for efficiency. C99 compilers produce a (hi, lo) decomposition in plain doubles, but this is too slow to use. My logl() depended on the double variables staying in registers not so accidentally for many years until I found a portable way to write an efficient _2sumF(). >>> +#define INLINE_KERNEL_SINDF >>> +#define INLINE_KERNEL_COSDF >> >> Prossibly not worth inlining. Inlining is a small pessimization in most of my tests in lgamma. >> Bogus casts. The kernels already return float, although that is pessimal >> and destroys their the xtra precision that they naturally have. > > Exhaustive testing shows max ULP < 0.500xyyy where might be 0, but > can't recall now. How exact to you want these? The same number of bits as there are internally. This is about 10 extra bits to get the accuracy of 0.5009 ulps (the error is about 1 in 1024 ot 0.0009 internally and 0.5000 more for rounding to float). >>> Index: lib/msun/src/s_sinpi.c >>> ... >>> +double >>> +sinpi(double x) >>> +{ >>> + volatile static const double vzero = 0; >>> + double ax, s; >>> + uint32_t hx, ix, j0, lx; >>> + >>> + EXTRACT_WORDS(hx, lx, x); >>> + ix = hx & 0x7fffffff; >>> + INSERT_WORDS(ax, ix, lx); >> >> A slow way to set ax to fabs(x). > > Make it work. Make it correct. Make it fast. I've undertaken > the first two. Simple fabs() already does the first 2 much more easily. The fdlibm implemenation of it is slow essentially because it does the above EXTRACT/INSERT (a subset, but almost as slow). fabs() is normally a builtin and doesn't use these. ax = (x < 0 ? -x : x) is also faster than the above, but doesn't work right for NaNs. It can be done after testing for NaNs. >> Note that lgamma(0.5) is naturally inexact. lgamma(integer) should be exact, >> but I doubt that it is. Maybe we broke it. We didn't. >>> + if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ >>> + /* Determine integer part of ax. */ >>> + j0 = ((ix >> 20) & 0x7ff) - 0x3ff; >>> + if (j0 < 20) { >>> + ix &= ~(0x000fffff >> j0); >>> + lx = 0; >>> + } else { >>> + lx &= ~((uint32_t)0xffffffff >> (j0 - 20)); >>> + } >>> + INSERT_WORDS(x, ix, lx); >>> + >>> + ax -= x; >>> + EXTRACT_WORDS(ix, lx, ax); >> >> Much slower than what lgamma()s sin_pi() does. After our optimizations, >> that does basically rint() inline using ax + 0x1p52 - 0x1p52, > > I found a situation where this trick seems to round in the wrong direction. > You participated in the discussion. > > https://lists.freebsd.org/pipermail/freebsd-numerics/2017-March/000658.html > https://lists.freebsd.org/pipermail/freebsd-numerics/2017-March/000660.html We always had a fix for that in lgamma. Now I'm not sure why floor() and rint() don't use our method. >> I think we can avoid the spurious inexacts with a little care. When >> ax is an integer (and < 0x1p52), we can add at least 0x1p51 to it exactly. >> We also want to avoid inexact for half-integers for sinpi and cospi, and >> quarter-integers for tanpi, and it is good to reduce to quadrants anyway. >> So we might intitially multiply by 4 and do the += 0x1p52 trick only if >> 4*ax < 0x1p52 (or whatever the safe threshold is). Efficiency doesn't >> matter for larger ax, so slow methods using things like floor(16*ax) >> are adequate, or we can reduce by a few powers of 2 using special cases: >> >> v = 4 * ax; >> if (v >= 0x1p54) >> ; /* ax is an even integer (check factor) / >> else { >> if (v >= 0x1p53) >> v -= 0x1p53; >> if (v >= 0x1p52) >> v -= 0x1p52; >> } I ended up going the mostly other way and removing all of the bit fiddling. The preliminary +-0x1p52 was mostly a waste of time, so I removed it, but ;ater I brought it back in a different form and removed the secondatry +-0x1p50. Then I compressed the case statement. The result is much simpler but only slightly faster: XX /* XX * Compute sin(pi*x), with the multiplication done after reducing x. XX * sin_pi(x) is only called for x < 0 and |x| < 2**(p-1) where p is XX * the precision of x. XX */ XX static double XX sin_pi(double x) XX { XX volatile double vz; XX double s,y,z; XX XX y = -x; XX XX /* XX * Redundant step to demonstrate a general version. The usual magic XX * for rounding needs only no overflow, which this step ensures. XX */ XX if (y >= 0x1p53) XX return zero; /* even integer */ XX XX if (y < 2) XX goto y_reduced; XX vz = y+0x1p53; XX z = vz-0x1p53; /* y rounded to a multiple of 2 */ XX if (z > y) XX z -= 2; /* adjust to round down */ XX y = y - z; /* y mod 2 */ XX y_reduced: XX if (y >= one) { XX s = -one; XX y -= one; XX } else XX s = one; XX if (y == zero) XX return zero; /* avoid inexact at poles */ XX if (y >= 0.75) XX y -= one; /* join with y < 0.25 case */ XX if (y < 0.25) { XX return __kernel_sin(pi*y,zero,0); XX } else XX return __kernel_cos(pi*(y-half),zero); XX } YY static float YY sin_pif(float x) YY { YY volatile float vz; YY float s,y,z; YY YY y = -x; YY YY if (y >= 0x1p24) YY return zero; /* y is even integer */ YY YY if (y < 2) YY goto y_reduced; YY vz = y+0x1p24F; YY z = vz-0x1p24F; /* y rounded to a multiple of 2 */ YY if (z > y) YY z -= 2; /* adjust to round down */ YY y = y - z; /* y mod 2 */ YY y_reduced: YY if (y > one) { YY s = -one; YY y -= one; YY } else YY s = one; YY if (y == zero) YY return zero; /* avoid inexact at poles */ YY if (y >= 0.75F) YY y -= one; /* join with y < 0.25 case */ YY if (y < 0.25F) YY return s*__kernel_sindf(pi*y); YY else YY return s*__kernel_cosdf(pi*(y-half)); YY } This also does the sign stuff more like you. It works to be very sloppy with signs here. lgamma squares the result. But leaving out the s's makes little difference to the speed. After compressing the case statement, the sign handling takes little space too. This sin_pif() with its inaccurate multiplication by 'float pi' gives more accuracy for lgammaf() than your more accurate version. Replacing pi in the multiplications by M_PI acts similarly strangely. This seems to be because lgamma() does the calculation pi/(sin_pif(x)*x) using its pi. When x is small, this pi/(pi*x*x) and it is apparently better to have the same pi in the numerator and denominator. My test gives too much coverage to x's near 0, enough for the error count for these cases to dominate improvements elsewhere. >> The result is always +0, exact and valid (maybe -0 in unsupported >> unusual rounding modes). We just did a lot of work to avoid inexact. > > Not according to n1950.pdf. It states > > sinpi(+-n) = +-0 for positive integers. Bug in n1950.pdf. The sign depends on which side of +-n you are on, and there is no reason to prefer either. Rather the opposite -- limits with x increasing gives the opposite. >>> + if (ax + 1 > 1) >>> + ax = copysign(0, x); >> >> An even slower way of setting the sign than INSERT, especially if >> the compiler doesn't inline copysign(). Using the correct +0 fixes >> this. > > |x| > 0x1p(P-1) is (should be) exceptional. Who cares if it's fast. It is also buggy. ax here is large, so ax + 1 > 1 is always true. If ax is DBL_MAX, then ax + 1 gives a spurious overflow exception only in unsupported rounding mode, and then only if there is no extra precision. The test is apparently intended to check for odd integers. Something like (ax + 1 > ax) is closer to working for that. But it doesn't work with extra precision, and risks overflow. My classification handles this by reducing to 0 <= y < 2 by subtracting an even integer from y = |x|. y == 0 means an even integer and y == 1 means an odd integer. Then it takes special care to return +0 for odd integers :-). It reduces 1 to 0; it sets the sign, but doesn't use it when y was 1. Falling through to s*__kernel_sin(0...) would probably work to give the n1950.pdf bug and not give spurious inexact. Bruce From owner-freebsd-numerics@freebsd.org Wed May 17 18:09:34 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A761DD713C5 for ; Wed, 17 May 2017 18:09:34 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 84F351647 for ; Wed, 17 May 2017 18:09:34 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4HI9SHA054575 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 17 May 2017 11:09:28 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4HI9REn054574; Wed, 17 May 2017 11:09:27 -0700 (PDT) (envelope-from sgk) Date: Wed, 17 May 2017 11:09:27 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170517180927.GA54431@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170517094848.A52247@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 May 2017 18:09:34 -0000 On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote: > On Tue, 16 May 2017, Steve Kargl wrote: > > > Index: lib/msun/ld128/s_cospil.c > > ... > > +static const long double > > +pihi = 3.14159265358979322702026593105983920e+00L, > > +pilo = 1.14423774522196636802434264184180742e-17L; > > These are not in normal format, and are hard to read. I can't see if > pihi has the correct number of zero bits for exact multiplication. I don't have access to an ld128 system with suitable facilities to allow me to spit out the hex representation. As such, I've added a comment /* * pi_hi contains the leading 56 bits of a 169 bit approximation for pi. */ > > These don't have the normal spelling. fdlibm never uses pihi or pio2hi, > or pi_hi. It often uses pio2_hi and other pio2_*. My s_sinpi.c uses > pi_hi. fdlibm has no convention. For example, e_log10.c actually uses a mixature of convensions. static const double two54 = 1.80143985094819840000e+16, /* 0x43500000, 0x00000000 */ ivln10hi = 4.34294481878168880939e-01, /* 0x3fdbcb7b, 0x15200000 */ ivln10lo = 2.50829467116452752298e-11, /* 0x3dbb9438, 0xca9aadd5 */ log10_2hi = 3.01029995663611771306e-01, /* 0x3FD34413, 0x509F6000 */ log10_2lo = 3.69423907715893078616e-13; /* 0x3D59FEF3, 0x11F12B36 */ and double f,hfsq,hi,lo,r,val_hi,val_lo,w,y,y2; -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Wed May 17 23:26:09 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B6EC1D71CBA for ; Wed, 17 May 2017 23:26:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 6620514CB for ; Wed, 17 May 2017 23:26:08 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id E87F8429AEB; Thu, 18 May 2017 09:25:59 +1000 (AEST) Date: Thu, 18 May 2017 09:25:58 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170517180927.GA54431@troutmask.apl.washington.edu> Message-ID: <20170518072636.E951@besplex.bde.org> References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=Sek2kApjW0X8YfFk6SkA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 May 2017 23:26:09 -0000 On Wed, 17 May 2017, Steve Kargl wrote: > On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote: >> On Tue, 16 May 2017, Steve Kargl wrote: >> >>> Index: lib/msun/ld128/s_cospil.c >>> ... >>> +static const long double >>> +pihi = 3.14159265358979322702026593105983920e+00L, >>> +pilo = 1.14423774522196636802434264184180742e-17L; >> >> These are not in normal format, and are hard to read. I can't see if >> pihi has the correct number of zero bits for exact multiplication. > > I don't have access to an ld128 system with suitable > facilities to allow me to spit out the hex representation. flame is still in the freebsd cluster, but is almost unusable because its files are an old copy of actual home directories on freefall. Utilities for printing hex and FP in good formats are remarkably rare. pari/gp supports many special math types but can only input or output ordinarly numbers in decimal. I use my library routines to print hex and special FP formats for it. sh, printf(1) and awk(1) are limited by native tyoes and have many other defects. > As such, I've added a comment > > /* > * pi_hi contains the leading 56 bits of a 169 bit approximation for pi. > */ Why 56? 53 + 56 = 109. >> These don't have the normal spelling. fdlibm never uses pihi or pio2hi, >> or pi_hi. It often uses pio2_hi and other pio2_*. My s_sinpi.c uses >> pi_hi. > > fdlibm has no convention. For example, e_log10.c actually uses > a mixature of convensions. > > static const double > two54 = 1.80143985094819840000e+16, /* 0x43500000, 0x00000000 */ > ivln10hi = 4.34294481878168880939e-01, /* 0x3fdbcb7b, 0x15200000 */ > ivln10lo = 2.50829467116452752298e-11, /* 0x3dbb9438, 0xca9aadd5 */ > log10_2hi = 3.01029995663611771306e-01, /* 0x3FD34413, 0x509F6000 */ > log10_2lo = 3.69423907715893078616e-13; /* 0x3D59FEF3, 0x11F12B36 */ My log* (committed only for long double precision) cleans this up a bit and uses invln10_hi, etc. My improvements to lgamma()'s sin_pi*() are buggy. I forgot the original target of avoiding inexact better: XX --- /tmp/bak/e_lgamma_r.c Thu Oct 30 09:20:46 2014 XX +++ e_lgamma_r.c Thu May 18 07:46:37 2017 XX @@ -157,5 +157,5 @@ XX XX /* XX - * Compute sin(pi*x) without actually doing the pi*x multiplication. XX + * Compute sin(pi*x), with the multiplication done after reducing x. XX * sin_pi(x) is only called for x < 0 and |x| < 2**(p-1) where p is XX * the precision of x. You added this comment. It is not very useful. Its first sentence was wrong about not doing the multiplication (fixed). It's second sentence is redundant. The caller is nearby, and spells p-1 literally so that it is easier to see that the magic numbers here match up. The caller's code is: if(ix>=0x43300000) /* |x|>=2**52, must be -integer */ return one/vzero; t = sin_pi(x); if(t==zero) return one/vzero; /* -integer */ This is a not so good optimization optimization. This function has to be careful with large values and it would be clearer to do all the checking itself. Then the t==zero check will handle the large integer. My bug and old code and optimizations are related to this. XX @@ -165,35 +165,149 @@ XX { XX volatile double vz; XX - double y,z; XX - int n; XX + double s,y,z; XX XX y = -x; XX XX - vz = y+0x1p52; /* depend on 0 <= y < 0x1p52 */ The dependencies are for 2 things: - no overflow when y is very large - no spurious inexact when y is just 0x1p52. For y below 0x1p52, the addition keeps the integer part and loses the fractional part, so it gives inexact if the fractional part is nonzero. This is just what we want for lgamma. However, for general sinpi() we don't want inexact for half-integers, and for general tanpi() we don't want inexact for quarter-integers. XX - z = vz-0x1p52; /* rint(y) for the above range */ XX - if (z == y) XX - return zero; My initially fix was to remove the above code. Integers can be classifyed later. fdlibm starts with floor() here. This works for lgamma(), but gives the same inexact for half- and quarter-integers as the above, so is not suitable for general used. XX - XX - vz = y+0x1p50; XX - GET_LOW_WORD(n,vz); /* bits for rounded y (units 0.25) */ XX - z = vz-0x1p50; /* y rounded to a multiple of 0.25 */ XX - if (z > y) { XX - z -= 0.25; /* adjust to round down */ XX - n--; This classifies quarter-integers well. y is 0 at exact quarter-integers, and there is no spurious inexact for them. n tells you which quarter-integer it was. XX + /* XX + * Redundant step to demonstrate a general version. The usual magic XX + * for rounding needs only no overflow, which this step ensures. XX + */ XX + if (y >= 0x1p53) XX + return zero; /* even integer */ The comment is wrong. I changed from 0x1p5{0,2} to 0x1p53 to get y in the full range [0, 2) instead of [0, 0.25) or [0, 1), but this gives spurious inexact for odd integers between 0x1p52 and 0x1p53 as well as for the half- and quarter-integers. This can be fixed using special classification near 0x1p52: - >= 0x1p53: even integer - in [0x1p52, 0x1p53): integer - in [0x1p51, 0x1p52): half-integer (that is, integer or integer + 0.5 - in [0x1p50, 0x1p51): quarter-integer - < 0x1p50: use general method but it seems better to go back to the general method of adding and subtracting 0x1p50. That gives a null rounding step (without inexact) for all values >= 0x1p50 provided only that overflow doesn't occur. I changed this because it reduces y a bit too much, and needs something like the GET of n to classify the quadrant. XX + XX + if (y < 2) XX + goto y_reduced; XX + vz = y+0x1p53; XX + z = vz-0x1p53; /* y rounded to a multiple of 2 */ XX + if (z > y) XX + z -= 2; /* adjust to round down */ XX + y = y - z; /* y mod 2 */ XX +y_reduced: The problem with this method is fundamental. Rounding of y to an integer multiple of 2 in any normal way should cause inexact whenever y is not such a multiple, but we want to avoid inexact except for multiples of 0.25. I think the remainder operation doesn't set inexact, so remainder(y, 2) would work. I expected it to be slow, but it is actually faster than the above on i386! (remainder is in hardware on x86, but it uses the i387, so is not quite as good on amd64.) Code using remainder(): XX /* XX * Redundant step to demonstrate a general version. The usual magic XX * for rounding needs only no overflow, which this step ensures. XX */ XX if (y >= 0x1p53) XX return zero; /* even integer */ XX XX if (y < 2) XX goto y_reduced; XX #if 0 XX vz = y+0x1p53; XX z = vz-0x1p53; /* y rounded to a multiple of 2 */ XX if (z > y) XX z -= 2; /* adjust to round down */ XX y = y - z; /* y mod 2 */ XX #else XX y = remainder(y, 2); XX if (y < 0) XX y += 2; XX #endif XX y_reduced: x86 remainder() has a slow loop for large args, so although it would work up to DBL_MAX, the special case to handle large args is still needed as an optimization. If this works, then it also works for conversions from degrees to radians (first take the remainder mod 360 instead of 2 in the above), and later scale by pi/180 instead of pi. Bruce From owner-freebsd-numerics@freebsd.org Thu May 18 01:42:28 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 150C3D712C0 for ; Thu, 18 May 2017 01:42:28 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D3F74198A for ; Thu, 18 May 2017 01:42:27 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4I1gQvg063967 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 17 May 2017 18:42:26 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4I1gQAl063966; Wed, 17 May 2017 18:42:26 -0700 (PDT) (envelope-from sgk) Date: Wed, 17 May 2017 18:42:26 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170518014226.GA63819@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170518072636.E951@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 May 2017 01:42:28 -0000 On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote: > On Wed, 17 May 2017, Steve Kargl wrote: > > > On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote: > >> On Tue, 16 May 2017, Steve Kargl wrote: > >> > >>> Index: lib/msun/ld128/s_cospil.c > >>> ... > >>> +static const long double > >>> +pihi = 3.14159265358979322702026593105983920e+00L, > >>> +pilo = 1.14423774522196636802434264184180742e-17L; > >> > >> These are not in normal format, and are hard to read. I can't see if > >> pihi has the correct number of zero bits for exact multiplication. > > > > I don't have access to an ld128 system with suitable > > facilities to allow me to spit out the hex representation. > > flame is still in the freebsd cluster, but is almost unusable because > its files are an old copy of actual home directories on freefall. Yes, I know flame is still available. I deleted all infrastructure I had on freefall over 2 years agos. Need at a minimum GMP and MPFR. > Utilities for printing hex and FP in good formats are remarkably rare. > pari/gp supports many special math types but can only input or output > ordinarly numbers in decimal. I use my library routines to print hex > and special FP formats for it. sh, printf(1) and awk(1) are limited > by native tyoes and have many other defects. I wrote my own utility with the catchy name hex. > > As such, I've added a comment > > > > /* > > * pi_hi contains the leading 56 bits of a 169 bit approximation for pi. > > */ > > Why 56? 53 + 56 = 109. This is ld128. static const long double pi_hi = 3.14159265358979322702026593105983920e+00L, pi_lo = 1.14423774522196636802434264184180742e-17L; pi_hi has 113/2 = 56 bits. pi_lo has 113 bit. 56 + 113 = 169 > > >> These don't have the normal spelling. fdlibm never uses pihi or pio2hi, > >> or pi_hi. It often uses pio2_hi and other pio2_*. My s_sinpi.c uses > >> pi_hi. I've changed my spelling. > > My improvements to lgamma()'s sin_pi*() are buggy. I forgot the original > target of avoiding inexact better: I'll need to wait until the weekend to study your improved sin_pi. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Thu May 18 06:35:26 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5B41D72F03 for ; Thu, 18 May 2017 06:35:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 71E551DC2 for ; Thu, 18 May 2017 06:35:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 36E473C9655; Thu, 18 May 2017 16:35:22 +1000 (AEST) Date: Thu, 18 May 2017 16:34:57 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170518014226.GA63819@troutmask.apl.washington.edu> Message-ID: <20170518154820.A8280@besplex.bde.org> References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> <20170518014226.GA63819@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=AT3K7O-BTnsdAHa8EAYA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 May 2017 06:35:26 -0000 On Wed, 17 May 2017, Steve Kargl wrote: > On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote: >> On Wed, 17 May 2017, Steve Kargl wrote: > ... >>> As such, I've added a comment >>> >>> /* >>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi. >>> */ >> >> Why 56? 53 + 56 = 109. > > This is ld128. ld128 has precision 113. > static const long double > pi_hi = 3.14159265358979322702026593105983920e+00L, > pi_lo = 1.14423774522196636802434264184180742e-17L; > > pi_hi has 113/2 = 56 bits. > pi_lo has 113 bit. > > 56 + 113 = 169 But hi is set intentionally sloppily by converting to double, so it has 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up). >> My improvements to lgamma()'s sin_pi*() are buggy. I forgot the original >> target of avoiding inexact better: > > I'll need to wait until the weekend to study your improved sin_pi. I now have closer to production version. sinpi() runs more than 2 times faster that your version, without doing much different except not having so much EXTRACT/INSERT or special cases. (66 cycles instead of 150 on Haswell-i386, and the 66 can be reduced to 58 by removing the STRICT_ASSIGN() which breaks only i386 with non-default extra precision. The difference would be smaller on amd64 or better compilers , but Haswell-i386 has much the same slownesses as old Athlon from the EXTRACTs and INSERTs. However, both versions fail tests. Min has a max error of 2048 ulps and an everage error of 928 ulps and yours has a maximum error of many giga-ups for the sign bit wrong and an average error of 561 ulps. The enormous errors are probably for denormals or NaNs. XX double XX ysinpi(double x) XX { XX double s,y,z; XX int32_t hx,ix; XX int n; XX XX GET_HIGH_WORD(hx,x); XX ix = hx & 0x7fffffff; XX XX if (ix < 0x3fd00000) { /* |x| < 0.25 */ XX if (ix < 0x3e200000) /* |x| < 2**-29 */ XX /* XXX: extra precision would generate spurious denormals. */ XX if(((int)x)==0) return M_PI*x; /* pi*x with inexact if x != 0 */ Note a problem with denormals. The hi+lo decomposition on values up to about 2**53 times the largest denormal creates denormals and thus spurious denormal and underflow exceptions and possibly slowness. I'm not sure if it gives correct results either. Your version has a special case for calculating Pi*x, but doesn't handle this. I think it would work to scale up into non-denormal range for the calculation. I just multiply by M_PI. I suspects this gives an error of nearly 1 ulp and will test to see if it is strictly below 1 ulp (the details depend on how closely M_PI approximates pi). XX return __kernel_sinpi(x); XX } XX XX if (ix>=0x7ff00000) return x-x; /* Inf or NaN -> NaN */ XX This classification was cloned from sin(). Then adjust constants as in your version. Not doing so much here probably gives half of the 150:66 speed improvement. Don't bother with special cases for |x| <= 2 or 2.25 as done by sinf(). We don't really want the special case for |x| <= 0.25, but need it to avoid denormals and underflow. BTW, I just noticed that I broke this case in lgamma*(). I moved the check for tiny x out of the kernels into the sin*() and cos*() callers (or kept the duplicate of it there), but didn't add it for the lgamma*() callers. I broke it for sinf() or sin() too. You broke it for sinl(). XX if (ix >= 0x43400000) return 0; /* even integer -> 0 */ XX XX y = fabs(x); XX XX STRICT_ASSIGN(double,z,y+0x1p50); XX GET_LOW_WORD(n,z); /* bits for rounded y (units 0.25) */ XX z -= 0x1p50; /* y rounded to a multiple of 0.25 */ XX if (z > y) { XX z -= 0.25; /* adjust to round down */ XX n--; XX } XX n &= 7; /* octant of y mod 2 */ XX y -= z; /* y mod 0.25 */ This is from old optimizations in lgamma() with minor improvements: - remove extra reduction step to classify integers - use STRICT_ASSIGN() instead of hard-coded volatile to avoid pessimizing arches without extra precision XX XX if (n >= 4) { XX s = -1; XX n -= 4; XX } else XX s = 1; XX if (n == 0) XX return s*__kernel_sinpi(y); /* this handles poles right */ Oops, the poles are specific to gamma(). This handles integers right. y is 0 and the kernel returns +=0 without setting inexact (depend on knowing its internals for the latter). Don't be careful about the sign. I think it is always +0 with the default rounding mode. XX else if (n == 1) XX return s*__kernel_cospi(y-0.25); XX else if (n == 2) XX return s*__kernel_cospi(y); This handles half-integers right. No problem with the sign since the kernel always returns 1. XX else XX return s*__kernel_sinpi(y-0.25); XX } This is from old optimizations in lgamma with minor improvements (but larger code changes): - don't use a case statement - don't add n * 0.25 to y only to have to a multiple of 0.25 from y in more cases later. Bruce From owner-freebsd-numerics@freebsd.org Thu May 18 17:54:37 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9070DD70642 for ; Thu, 18 May 2017 17:54:37 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 6455FCF7 for ; Thu, 18 May 2017 17:54:37 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4IHsZVu074686 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 18 May 2017 10:54:35 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4IHsZFV074685; Thu, 18 May 2017 10:54:35 -0700 (PDT) (envelope-from sgk) Date: Thu, 18 May 2017 10:54:34 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170518175434.GA74453@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> <20170518014226.GA63819@troutmask.apl.washington.edu> <20170518154820.A8280@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170518154820.A8280@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 May 2017 17:54:37 -0000 On Thu, May 18, 2017 at 04:34:57PM +1000, Bruce Evans wrote: > On Wed, 17 May 2017, Steve Kargl wrote: > > > On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote: > >> On Wed, 17 May 2017, Steve Kargl wrote: > > ... > >>> As such, I've added a comment > >>> > >>> /* > >>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi. > >>> */ > >> > >> Why 56? 53 + 56 = 109. > > > > This is ld128. > > ld128 has precision 113. Yes. I know. > > static const long double > > pi_hi = 3.14159265358979322702026593105983920e+00L, > > pi_lo = 1.14423774522196636802434264184180742e-17L; > > > > pi_hi has 113/2 = 56 bits. > > pi_lo has 113 bit. > > > > 56 + 113 = 169 > > But hi is set intentionally sloppily by converting to double, so it has > 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up). A 53-bit entity times a 56-bit entity needs 109-bits for the result. A 113-bit entity can hold a 109-bit entity. Or am I missing something? > >> My improvements to lgamma()'s sin_pi*() are buggy. I forgot the original > >> target of avoiding inexact better: > > > > I'll need to wait until the weekend to study your improved sin_pi. > > I now have closer to production version. sinpi() runs more than 2 > times faster that your version, without doing much different except > not having so much EXTRACT/INSERT or special cases. (66 cycles instead > of 150 on Haswell-i386, and the 66 can be reduced to 58 by removing > the STRICT_ASSIGN() which breaks only i386 with non-default extra > precision. The difference would be smaller on amd64 or better compilers > , but Haswell-i386 has much the same slownesses as old Athlon from the > EXTRACTs and INSERTs. > > However, both versions fail tests. Min has a max error of 2048 ulps > and an everage error of 928 ulps and yours has a maximum error of many > giga-ups for the sign bit wrong and an average error of 561 ulps. The > enormous errors are probably for denormals or NaNs. You're correct that I did not give much consideration to subnormal number. For float this, can be fixed by if (ix < 0x38800000) { /* |x| < 0x1p-14 */ if (x == 0) return (x); SET_FLOAT_WORD(hi, hx & 0xffff0000); if (ix <= 0x0b7f0001) { /* Avoid spurios underflow. */ hi *= 0x1p23F; lo = x * 0x1p23F - hi; s = (pi_lo + pi_hi) * lo + pi_lo * hi + pi_hi * hi; return (s * 0x1p-23F); } lo = x - hi; s = (pi_lo + pi_hi) * lo + pi_lo * hi + pi_hi * hi; return (s); } or if you want to avoid the extra if-statement. Simply scale if (ix < 0x38800000) { /* |x| < 0x1p-14 */ if (x == 0) return (x); SET_FLOAT_WORD(hi, hx & 0xffff0000); hi *= 0x1p23F; lo = x * 0x1p23F - hi; s = (pi_lo + pi_hi) * lo + pi_lo * hi + pi_hi * hi; return (s * 0x1p-23F); } Starting with x = 0x1p-14 and using nextafterf(x,-1.f), the first value where underflow is signalled is % ./testf -S -D sinpif(3.74171353e-39) = 1.17549407e-38 x is subnormal sinpif(x) is subnormal Exhaustive testing of numbers in the domain [FLT_MIN, 0x1p-14] yields % ./testf -S -m 1.17549435E-38F -M 0x1p-14F -ed MAX ULP: 0.58366567 Total tested: 939524096 -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Thu May 18 22:56:41 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B4D4AD71A0F for ; Thu, 18 May 2017 22:56:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 529461D4B for ; Thu, 18 May 2017 22:56:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 4FCCBD67DD8; Fri, 19 May 2017 08:56:28 +1000 (AEST) Date: Fri, 19 May 2017 08:56:27 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170518175434.GA74453@troutmask.apl.washington.edu> Message-ID: <20170519074512.M884@besplex.bde.org> References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> <20170518014226.GA63819@troutmask.apl.washington.edu> <20170518154820.A8280@besplex.bde.org> <20170518175434.GA74453@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=30vkLdqRwj5xFZukw2YA:9 a=2GCknW8xBUceK4Wk:21 a=Tm1mFJjD2G5J-NMC:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 May 2017 22:56:41 -0000 On Thu, 18 May 2017, Steve Kargl wrote: > On Thu, May 18, 2017 at 04:34:57PM +1000, Bruce Evans wrote: >> On Wed, 17 May 2017, Steve Kargl wrote: >> >>> On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote: >>>> On Wed, 17 May 2017, Steve Kargl wrote: >>> ... >>>>> As such, I've added a comment >>>>> >>>>> /* >>>>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi. >>>>> */ >>>> >>>> Why 56? 53 + 56 = 109. >>> >>> This is ld128. >> >> ld128 has precision 113. > > Yes. I know. > >>> static const long double >>> pi_hi = 3.14159265358979322702026593105983920e+00L, >>> pi_lo = 1.14423774522196636802434264184180742e-17L; >>> >>> pi_hi has 113/2 = 56 bits. >>> pi_lo has 113 bit. >>> >>> 56 + 113 = 169 >> >> But hi is set intentionally sloppily by converting to double, so it has >> 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up). > > A 53-bit entity times a 56-bit entity needs 109-bits for the result. > A 113-bit entity can hold a 109-bit entity. Or am I missing something? It wastes 4 bits. This reader has to guess if this is intentional for magic elsewhere, or just sloppy. Also, the rounded value might naturally have some extra low zero bits. So looking at the bits, it is not easy to see if there are the right number. This is not worth a comment. The reader should trust that the right (maximal) number of bits are used, and only check it approximately. The double precision pi*hi and pi*lo also seemed to be sloppily rounded. My version has less precision (28 low zero bits in pi_hi instead of the minimal 24), while yours has 27 low zero bits in pihi. The explanation for this is something like: - I copied pi_hi from another file with slightly different requirements - the choice makes pi_lo positive but pilo negative. The other file might have need pi_lo positive, or just extra zero bits in pi_hi. Most likely just the latter. - your 27 is apparently from the calculation 53 - 53 / 2 for almost even splitting of x, but the splitting of x is uneven (24 + 29). >>> I'll need to wait until the weekend to study your improved sin_pi. >> .. >> However, both versions fail tests. Min has a max error of 2048 ulps >> and an everage error of 928 ulps and yours has a maximum error of many >> giga-ups for the sign bit wrong and an average error of 561 ulps. The >> enormous errors are probably for denormals or NaNs. The were mostly incomplete setup of the tests. After fixing this, many real bugs were found: - 2 or 3 wrong magic numbers in your version (different ones for sinpi() and sinpil()) - wrong quadrant in my version - incompatible signs in my version (sign errors are reported as value errors in the exa-ulp range, so my tests can't pass with even 1) - the non-use of double_t breaks sinpi() completely, and gives errors of up to about 1.4 ulps (slightly worse than a simple multiplication by M_PI). Using double_t is no better, since the kernels don't support it. hi+lo ends up with extra precision in hi either way, and this gets destroyed on calling the kernel. The tests find this by switching the mode to extra precision to actually test it. The bug would be more obvious using the same algorithm in float precision, since then the default precision would be extra. This can be fixed using STRICT_ASSIGN() or volatile hacks in _2sumF(), but that would turn _2sumF() into back into one of its slow development versions. Options for controlling this were intentionally left out to get an easy to ise API for _2sumF(). The quick fix of declaring hi and lo volatile seems to work, but is probably slower than doing it in _2sumnvF() and is certainly worse without extra precision. Only 1 more strict assignment is needed. The quick fix of compiling with -ffloat-store works. > You're correct that I did not give much consideration to subnormal > number. For float this, can be fixed by > > if (ix < 0x38800000) { /* |x| < 0x1p-14 */ > if (x == 0) > return (x); > SET_FLOAT_WORD(hi, hx & 0xffff0000); > if (ix <= 0x0b7f0001) { /* Avoid spurios underflow. */ > hi *= 0x1p23F; > lo = x * 0x1p23F - hi; > s = (pi_lo + pi_hi) * lo + pi_lo * hi + > pi_hi * hi; > return (s * 0x1p-23F); > } > lo = x - hi; > s = (pi_lo + pi_hi) * lo + pi_lo * hi + > pi_hi * hi; > return (s); > } Far too verbose. The unfixed version was bad enough. > or if you want to avoid the extra if-statement. Simply scale > > if (ix < 0x38800000) { /* |x| < 0x1p-14 */ > if (x == 0) > return (x); > SET_FLOAT_WORD(hi, hx & 0xffff0000); > hi *= 0x1p23F; > lo = x * 0x1p23F - hi; > s = (pi_lo + pi_hi) * lo + pi_lo * hi + > pi_hi * hi; > return (s * 0x1p-23F); > } Better, but this is bogus for float. Just multiply by M_PI in double precision, as is done everywhere else in the function. My version for double precision moves this multiplication to a kernel to defend against n-tuplication, but botches the scaling by splitting x before scaling it: XX static inline double XX __kernel_mpi(double x) XX { XX double hi, lo; XX XX hi = (float)x; XX lo = x - hi; XX return (0x1p1000 * pi_lo * x + 0x1p1000 * pi_hi * lo + XX 0x1p1000 * pi_hi * hi) * 0x1p-1000; XX } XX ... XX if (ix < 0x3fd00000) { /* |x| < 0.25 */ XX if (ix < 0x3e200000) { /* |x| < 2**-29 */ XX if (x == 0) XX return x; XX if ((int)x == 0) /* generate inexact if x != 0 */ XX return __kernel_mpi(x); XX } XX return __kernel_sinpi(x); XX } The hi+lo decomposition can also be done in a kernel. This requires returning hi and lo indirectly, which is fast enough with inlining to make the indirections not actually indirect. My exp kernels return hi, lo, and also (approx log2(x)) indirectly, where hi and lo are closer to the final result. Other exp kernels re-assemble with several variations of (hi + lo) * 2**k and return that directly. My sinpif() now passes all tests and gives the same results as yours, except possibly near 0. The maximum error in float precision is about 0.54 ulps. This is much larger than 0.5009 ulps for sin() and cos(). I have no explanation. The approx. to pi should give only 12 bits of inaccuracy, so the errror shouldn't be much more than 1/1024 more. XX float XX sinpif(float x) XX { XX float s,y,z; XX int32_t hx,ix; XX int n; XX XX GET_FLOAT_WORD(hx,x); XX ix = hx & 0x7fffffff; XX XX if (ix < 0x3e800000) { /* |x| < 0.25 */ XX if (ix < 0x38800000) /* |x| < 2**-14 */ XX if(((int)x)==0) return M_PI*x; /* pi*x with inexact if x != 0 */ XX return __kernel_sindf(M_PI*x); XX } We multiply by M_PI for the usual case, so it is silly not to multiply by it when x is small. XX XX if (ix>=0x7f800000) return x-x; /* Inf or NaN -> NaN */ XX XX if (ix >= 0x4b800000) return (x < 0 ? -0.0F : 0); /* even int -> +-0 */ You said that n1950.pdf documents returning +0 at even integers and -0 at odd intgers. I couldn't find n1950.pdf, and your version doesn't do anything like that. It just returns +0 for all nonnegative integers and -0 for all negative integers, as actually makes sense. The threshold in the above is to classify even integers so as to possibly return +0 for them and -0 as part of the general classification below. It can be reduced to possibly give special cases for odd and even large integers above, as in your versions. XX XX y = fabs(x); XX XX STRICT_ASSIGN(float,z,y+0x1p21F); XX GET_FLOAT_WORD(n,z); /* bits for rounded y (units 0.25) */ XX z -= 0x1p21F; /* y rounded to a multiple of 0.25 */ XX if (z > y) { XX z -= 0.25F; /* adjust to round down */ XX n--; XX } XX n &= 7; /* octant of y mod 2 */ XX y -= z; /* y mod 0.25 */ XX XX if (n >= 3 && n < 7) XX s = -1; XX else XX s = 1; XX if (x < 0) XX s = -s; XX n &= 3; XX if (y == 0 && n == 0) XX return (x < 0 ? -0.0F : 0); I don't like the complications for signs, but they seem to execute fast enough using direct logic. On modern OOE x86, 's' can be calculated in parallel, so it only costs about 4 cycles of latency to assemble with it at the end, and my tests de-emphasize latency a bit too much. XX if (n == 0) XX return s*__kernel_sindf(M_PI*y); XX else if (n == 1) XX return s*__kernel_cosdf(M_PI*(y-0.25F)); XX else if (n == 2) XX return s*__kernel_cosdf(M_PI*y); XX else XX return s*__kernel_sindf(M_PI*(y-0.25F)); XX } Bruce From owner-freebsd-numerics@freebsd.org Thu May 18 23:47:30 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2DD05D73CB5 for ; Thu, 18 May 2017 23:47:30 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 14C831763 for ; Thu, 18 May 2017 23:47:30 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4INlMhu077565 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 18 May 2017 16:47:22 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4INlMRp077564; Thu, 18 May 2017 16:47:22 -0700 (PDT) (envelope-from sgk) Date: Thu, 18 May 2017 16:47:22 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170518234722.GB77471@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> <20170518014226.GA63819@troutmask.apl.washington.edu> <20170518154820.A8280@besplex.bde.org> <20170518175434.GA74453@troutmask.apl.washington.edu> <20170519074512.M884@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170519074512.M884@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 May 2017 23:47:30 -0000 On Fri, May 19, 2017 at 08:56:27AM +1000, Bruce Evans wrote: > On Thu, 18 May 2017, Steve Kargl wrote: > > > On Thu, May 18, 2017 at 04:34:57PM +1000, Bruce Evans wrote: > >> On Wed, 17 May 2017, Steve Kargl wrote: > >> > >>> On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote: > >>>> On Wed, 17 May 2017, Steve Kargl wrote: > >>> ... > >>>>> As such, I've added a comment > >>>>> > >>>>> /* > >>>>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi. > >>>>> */ > >>>> > >>>> Why 56? 53 + 56 = 109. > >>> > >>> This is ld128. > >> > >> ld128 has precision 113. > > > > Yes. I know. > > > >>> static const long double > >>> pi_hi = 3.14159265358979322702026593105983920e+00L, > >>> pi_lo = 1.14423774522196636802434264184180742e-17L; > >>> > >>> pi_hi has 113/2 = 56 bits. > >>> pi_lo has 113 bit. > >>> > >>> 56 + 113 = 169 > >> > >> But hi is set intentionally sloppily by converting to double, so it has > >> 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up). > > > > A 53-bit entity times a 56-bit entity needs 109-bits for the result. > > A 113-bit entity can hold a 109-bit entity. Or am I missing something? > > It wastes 4 bits. This reader has to guess if this is intentional for > magic elsewhere, or just sloppy. > > Also, the rounded value might naturally have some extra low zero bits. > So looking at the bits, it is not easy to see if there are the right > number. This is not worth a comment. The reader should trust that > the right (maximal) number of bits are used, and only check it > approximately. Unfortunately, I don't have the capibility to easily convert an ld128 number to hex; otherwise, I would include a comment with the hex representation. > The double precision pi*hi and pi*lo also seemed to be sloppily > rounded. My version has less precision (28 low zero bits in pi_hi > instead of the minimal 24), while yours has 27 low zero bits in > pihi. The explanation for this is something like: > - I copied pi_hi from another file with slightly different requirements > - the choice makes pi_lo positive but pilo negative. The other file > might have need pi_lo positive, or just extra zero bits in pi_hi. > Most likely just the latter. > - your 27 is apparently from the calculation 53 - 53 / 2 for almost > even splitting of x, but the splitting of x is uneven (24 + 29). The last one is the closest. I did p/2+1. Yes, the splitting for x is too sloppy. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Fri May 19 06:52:52 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D0B69D74F2C for ; Fri, 19 May 2017 06:52:52 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 5576513B0 for ; Fri, 19 May 2017 06:52:51 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 1706B42B46B; Fri, 19 May 2017 16:52:02 +1000 (AEST) Date: Fri, 19 May 2017 16:52:02 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170518234722.GB77471@troutmask.apl.washington.edu> Message-ID: <20170519153326.A20099@besplex.bde.org> References: <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> <20170518014226.GA63819@troutmask.apl.washington.edu> <20170518154820.A8280@besplex.bde.org> <20170518175434.GA74453@troutmask.apl.washington.edu> <20170519074512.M884@besplex.bde.org> <20170518234722.GB77471@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=nyoC9Yw7XdoYBhFG8wcA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 May 2017 06:52:52 -0000 I fixed all bugs found by my tests: - sinpif() and cospif() were unnecessarily inaccurate. Now they are have the same maximum error of 0.5009 ulps as sinf() and cosf(), and more accuracy on average. - sinpi(), cospi() and tanpi() were broken on i386 with extra precision. Now they have a smaller error than sin(), etc. (about 0.55 ulps; the bug gave 1.5 ulps). - fix cospi*(odd integer) returning +1 for odd integers. Clean up related but accidentally working code for sinpi*() and tanpi*() (12 copies altogether). - fix broken threshold for Infs and NaNs in sinpi(), cosp() and tanpi() - try to fix off-by-1 error in translation of floorl() for ld80. This seems to work. The bug showed up as 2**62 being treated as 2**63, so args above 2**62 and below 2**63 only worked accidentally for sinl(), cosl() and tanl(). - fix the return value for NaNs. XX diff -u2 k_cospi.c~ k_cospi.c XX --- k_cospi.c~ Fri May 19 09:05:14 2017 XX +++ k_cospi.c Fri May 19 12:02:24 2017 XX @@ -35,5 +35,8 @@ XX __kernel_cospi(double x) XX { XX - double hi, lo; XX + /* XXX: volatile hack to abuse _2sumF(): */ XX + volatile double hi; XX + double lo; XX + XX hi = (float)x; XX lo = x - hi; This hack is sufficent to get strict assignment to hi. It is a bit slow. It costs about 30% on i386 (70 cycles increased to 90) in my version. XX diff -u2 k_sinpi.c~ k_sinpi.c XX --- k_sinpi.c~ Fri May 19 09:05:14 2017 XX +++ k_sinpi.c Fri May 19 12:02:56 2017 XX @@ -35,5 +35,8 @@ XX __kernel_sinpi(double x) XX { XX - double hi, lo; XX + /* XXX: volatile hack to abuse _2sumF(): */ XX + volatile double hi; XX + double lo; XX + XX hi = (float)x; XX lo = x - hi; XX diff -u2 ld128-s_cospil.c~ ld128-s_cospil.c XX --- ld128-s_cospil.c~ Fri May 19 11:45:43 2017 XX +++ ld128-s_cospil.c Fri May 19 12:30:56 2017 XX @@ -45,5 +45,4 @@ XX cospil(long double x) XX { XX - volatile static const double vzero = 0; XX long double ax, c, xf; XX uint32_t ix; This variable was only used to return wrong results for NaNs. XX @@ -73,5 +72,5 @@ XX if (ax < 0x1p112) { XX XX - xf = floorl(ax); XX + xf = ax - remainderl(ax, 1); /* floorl() without inexact */ XX XX ax -= xf; Try to fix spurious inexact for half-integers (quarter-integers for tanpil*()). I didn't test ld128. I tested ld80 enough to find inconsisties with double. They were much the same as inconsistencies with float. XX @@ -98,12 +97,6 @@ XX XX if (isinf(x) || isnan(x)) XX - return (vzero / vzero); XX + return (x - x); Use the standard convention and method for producing NaNs (+-Inf -> default NaN; NaN -> quiet(same NaN). XX XX - /* XX - * |x| >= 0x1p112 is always an even integer, so return 1. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID. XX - */ XX - if (ax + 1 > 1) XX - ax = 1; XX - return (ax); XX + return (ix >= 0x403f || (lx & 1) == 0 ? 1 : -1); XX } Remove lots of style bugs and make this work. ax + 1 > 1 was always true, so this always returned +1, but it must return -1 at odd integers. The details are intentionally only given in s_cospi.c. XX diff -u2 ld128-s_sinpil.c~ ld128-s_sinpil.c XX --- ld128-s_sinpil.c~ Fri May 19 09:19:00 2017 XX +++ ld128-s_sinpil.c Fri May 19 12:34:02 2017 XX @@ -45,5 +45,4 @@ XX sinpil(long double x) XX { XX - volatile static const double vzero = 0; XX long double ax, s, xf, xhi, xlo; XX uint32_t ix; XX @@ -78,5 +77,5 @@ XX if (ax < 0x1p112) { XX XX - xf = floorl(ax); XX + xf = ax - remainderl(ax, 1); /* floorl() without inexact */ XX XX ax -= xf; XX @@ -106,12 +105,6 @@ XX XX if (isinf(x) || isnan(x)) XX - return (vzero / vzero); XX + return (x - x); XX XX - /* XX - * |x| >= 0x1p112 is always an integer, so return +-0. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID. XX - */ XX - if (ax + 1 > 1) XX - ax = copysignl(0, x); XX - return (ax); XX + return (x < 0 ? -0.0 : 0); XX } The sign wasn't even copied for cospil(). As usual, ax + 1 > 1 is always true, so this returned the correct value of 0 with the sign of x in all cases. I only fixed this because the bug was more serious for cospi*() so I had to edit that, and I wanted to remove the nearly-duplicated style bugs. XX diff -u2 ld128-s_tanpil.c~ ld128-s_tanpil.c XX --- ld128-s_tanpil.c~ Fri May 19 09:15:52 2017 XX +++ ld128-s_tanpil.c Fri May 19 12:34:41 2017 XX @@ -70,5 +70,4 @@ XX tanpil(long double x) XX { XX - volatile static const double vzero = 0; XX long double ax, xf; XX uint32_t ix; XX @@ -97,5 +96,5 @@ XX if (ix < 0x1p112) { XX XX - xf = floorl(ax); XX + xf = ax - remainderl(ax, 1); /* floorl() without inexact */ XX XX ax -= xf; XX @@ -112,12 +111,6 @@ XX /* x = +-inf or nan. */ XX if (isinf(x) || isnan(x)) XX - return (vzero / vzero); XX + return (x - x); XX XX - /* XX - * |x| >= 0x1p53 is always an integer, so return +-0. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID. XX - */ XX - if (ax + 1 > 1) XX - ax = copysignl(0, x); XX - return (ax); XX + return (x < 0 ? -0.0 : 0); XX } XX diff -u2 s_cospi.c~ s_cospi.c XX --- s_cospi.c~ Fri May 19 09:05:16 2017 XX +++ s_cospi.c Fri May 19 12:31:08 2017 XX @@ -74,5 +74,4 @@ XX cospi(double x) XX { XX - volatile static const double vzero = 0; XX double ax, c; XX uint32_t hx, ix, j0, lx; XX @@ -136,14 +135,12 @@ XX } XX XX - if (ix >= 0x7f800000) XX - return (vzero / vzero); XX + if (ix >= 0x7ff00000) XX + return (x - x); The wrong threshold was copied from the float case. XX XX /* XX - * |x| >= 0x1p52 is always an even integer, so return 1. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID? XX + * 0x1p53 <= |x|, then x is an even integer. Otherwise, XX + * 0x1p52 <= |x| < 0x1p53 and is odd iff lx is odd. XX */ XX - if (ax + 1 > 1) XX - ax = 1; XX - return (ax); XX + return (ix >= 0x43400000 || (lx & 1) == 0 ? 1 : -1); XX } XX This gives details of how to decide if an value is an even or odd integer. XX diff -u2 s_cospif.c~ s_cospif.c XX --- s_cospif.c~ Fri May 19 09:05:16 2017 XX +++ s_cospif.c Fri May 19 12:56:24 2017 XX @@ -42,5 +42,4 @@ XX cospif(float x) XX { XX - volatile static const float vzero = 0; XX float ax, c; XX uint32_t ix, j0; XX @@ -100,12 +99,6 @@ XX /* x = +-inf or nan. */ XX if (ix >= 0x7f800000) XX - return (vzero / vzero); XX + return (x - x); XX XX - /* XX - * |x| >= 0x1p23 is always an even integer, so return 1. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID. XX - */ XX - if (ax + 1 > 1) XX - ax = 1; XX - return (ax); XX + return (ix >= 0x4b800000 || (ix & 1) == 0 ? 1 : -1); XX } XX diff -u2 s_cospil.c~ s_cospil.c XX --- s_cospil.c~ Fri May 19 09:22:48 2017 XX +++ s_cospil.c Fri May 19 12:31:28 2017 XX @@ -48,5 +48,4 @@ XX cospil(long double x) XX { XX - volatile static const double vzero = 0; XX long double ax, c; XX uint32_t j0; XX @@ -87,5 +86,5 @@ XX lx &= ~(((lx << 32)-1) >> j0); XX } else { XX - uint64_t m = (uint64_t)-1 >> (j0 + 1); XX + uint64_t m = (uint64_t)-1 >> j0; XX if (lx & m) lx &= ~m; XX } XX @@ -118,12 +117,6 @@ XX XX if (ix >= 0x7fff) XX - RETURNI(vzero / vzero); XX + RETURNI(x - x); XX XX - /* XX - * |x| >= 0x1p63 is always an even integer, so return 1. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID. XX - */ XX - if (ax + 1 > 1) XX - ax = 1; XX - RETURNI(ax); XX + RETURNI(ix >= 0x403f || (lx & 1) == 0 ? 1 : -1); XX } XX diff -u2 s_sinpi.c~ s_sinpi.c XX --- s_sinpi.c~ Fri May 19 09:05:16 2017 XX +++ s_sinpi.c Fri May 19 13:44:44 2017 XX @@ -77,5 +77,4 @@ XX sinpi(double x) XX { XX - volatile static const double vzero = 0; XX double ax, s; XX uint32_t hx, ix, j0, lx; XX @@ -87,5 +86,5 @@ XX if (ix < 0x3ff00000) { /* |x| < 1 */ XX if (ix < 0x3fd00000) { /* |x| < 0.25 */ XX - if (ix < 0x3e200000) { /* |x| < 0x1p-29 */ XX + if (ix < 0x3de00000) { /* |x| < 2**-33 */ XX if (x == 0) XX return (x); Reduce this threshold by a large factor for sinpi() and tanpi(), and by an even larger factor for sinpif() and tanpif(). For some reason, the x for sin(x), etc. is just as accurate as the poly up to a much higher threshold than pi*x for sinpi(x), etc. This reduces the maximum error for sinpif() from ~0.54 ulps to the same ~0.5009 ulps as for sinf(). For tanpif(), it only reduces the error like that for small x (tanpif()'s max error is large elswhere). For sinpi() and tanpi(), similarly except sinpi()'s maximum error is smaller than tanpi()'s (it seems to be smaller than sin()'s too -- I haven't seen it larger than 0.5312 ulps, where .0312 is from the poly error of 2**-5). For sinpil() and tanpil(), I didn't change this since I can't test it. This might depend on extra precision. Accuracy was thrown away for some other functions by increasing the threshold to what works without extra precision. It is unusual to even test with extra precision for doubles. Start fixing style bugs -- use the Fortran spelling of powers and not C hex FP values when the code doesn't use the latter. XX @@ -147,14 +146,8 @@ XX } XX XX - if (ix >= 0x7f800000) XX - return (vzero / vzero); XX + if (ix >= 0x7ff00000) XX + return (x - x); XX XX - /* XX - * |x| >= 0x1p52 is always an integer, so return +-0. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID? XX - */ XX - if (ax + 1 > 1) XX - ax = copysign(0, x); XX - return (ax); XX + return (x < 0 ? -0.0 : 0); XX } XX XX diff -u2 s_sinpif.c~ s_sinpif.c XX --- s_sinpif.c~ Fri May 19 09:05:16 2017 XX +++ s_sinpif.c Fri May 19 13:22:55 2017 XX @@ -46,5 +46,4 @@ XX sinpif(float x) XX { XX - volatile static const float vzero = 0; XX float ax, s; XX uint32_t hx, ix, j0; XX @@ -56,5 +55,5 @@ XX if (ix < 0x3f800000) { /* |x| < 1 */ XX if (ix < 0x3e800000) { /* |x| < 0.25 */ XX - if (ix < 0x38800000) { /* |x| < 0x1p-14 */ XX + if (ix < 0x34800000) { /* |x| < 2**-22 */ XX if (x == 0) XX return (x); XX @@ -111,12 +110,6 @@ XX /* x = +-inf or nan. */ XX if (ix >= 0x7f800000) XX - return (vzero / vzero); XX + return (x - x); XX XX - /* XX - * |x| >= 0x1p23 is always an integer, so return +-0. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID? XX - */ XX - if (ax + 1 > 1) XX - ax = copysignf(0, x); XX - return (ax); XX + return (x < 0 ? -0.0F : 0); XX } XX diff -u2 s_sinpil.c~ s_sinpil.c XX --- s_sinpil.c~ Fri May 19 09:23:00 2017 XX +++ s_sinpil.c Fri May 19 12:34:28 2017 XX @@ -48,5 +48,4 @@ XX sinpil(long double x) XX { XX - volatile static const double vzero = 0; XX long double ax, s; XX uint32_t j0; XX @@ -91,5 +90,5 @@ XX lx &= ~(((lx << 32)-1) >> j0); XX } else { XX - uint64_t m = (uint64_t)-1 >> (j0 + 1); XX + uint64_t m = (uint64_t)-1 >> j0; XX if (lx & m) lx &= ~m; XX } XX @@ -125,12 +124,6 @@ XX /* x = +-inf or nan. */ XX if (ix >= 0x7fff) XX - RETURNI(vzero / vzero); XX + RETURNI(x - x); XX XX - /* XX - * |x| >= 0x1p63 is always an integer, so return +-0. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID. XX - */ XX - if (ax + 1 > 1) XX - ax = copysignl(0, x); XX - RETURNI(ax); XX + RETURNI(x < 0 ? -0.0 : 0); XX } XX diff -u2 s_tanpi.c~ s_tanpi.c XX --- s_tanpi.c~ Fri May 19 09:05:16 2017 XX +++ s_tanpi.c Fri May 19 14:02:34 2017 XX @@ -78,5 +78,7 @@ XX __kernel_tanpi(double x) XX { XX - double hi, lo, t; XX + /* XXX: volatile hack to abuse _2sumF(): */ XX + volatile double hi; XX + double lo, t; XX XX if (x < 0.25) { XX @@ -104,5 +106,4 @@ XX tanpi(double x) XX { XX - volatile static const double vzero = 0; XX double ax, t; XX uint32_t hx, ix, j0, lx; XX @@ -114,5 +115,5 @@ XX if (ix < 0x3ff00000) { /* |x| < 1 */ XX if (ix < 0x3fe00000) { /* |x| < 0.5 */ XX - if (ix < 0x3e200000) { /* |x| < 0x1p-29 */ XX + if (ix < 0x3df00000) { /* |x| < 0x1p-32 */ XX if (x == 0) XX return (x); XX @@ -156,14 +157,8 @@ XX XX /* x = +-inf or nan. */ XX - if (ix >= 0x7f800000) XX - return (vzero / vzero); XX + if (ix >= 0x7ff00000) XX + return (x - x); XX XX - /* XX - * |x| >= 0x1p52 is always an integer, so return +-0. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID? XX - */ XX - if (ax + 1 > 1) XX - ax = copysign(0, x); XX - return (ax); XX + return (x < 0 ? -0.0 : 0); XX } XX XX diff -u2 s_tanpif.c~ s_tanpif.c XX --- s_tanpif.c~ Fri May 19 09:05:16 2017 XX +++ s_tanpif.c Fri May 19 13:20:29 2017 XX @@ -57,5 +57,4 @@ XX tanpif(float x) XX { XX - volatile static const float vzero = 0; XX float ax, t; XX uint32_t hx, ix, j0; XX @@ -67,5 +66,5 @@ XX if (ix < 0x3f800000) { /* |x| < 1 */ XX if (ix < 0x3f000000) { /* |x| < 0.5 */ XX - if (ix < 0x38800000) { /* |x| < 0x1p-14 */ XX + if (ix < 0x34800000) { /* |x| < 2**-22 */ XX if (ix == 0) XX return (x); XX @@ -104,12 +103,6 @@ XX /* x = +-inf or nan. */ XX if (ix >= 0x7f800000) XX - return (vzero / vzero); XX + return (x - x); XX XX - /* XX - * |x| >= 0x1p23 is always an integer, so return +-0. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID. XX - */ XX - if (ax + 1 > 1) XX - ax = copysignf(0, x); XX - return (ax); XX + return (x < 0 ? -0.0F : 0); XX } XX diff -u2 s_tanpil.c~ s_tanpil.c XX --- s_tanpil.c~ Fri May 19 09:15:55 2017 XX +++ s_tanpil.c Fri May 19 12:35:05 2017 XX @@ -71,5 +71,4 @@ XX tanpil(long double x) XX { XX - volatile static const double vzero = 0; XX long double ax, t; XX uint32_t j0; XX @@ -109,5 +108,5 @@ XX lx &= ~(((lx << 32)-1) >> j0); XX } else { XX - uint64_t m = (uint64_t)-1 >> (j0 + 1); XX + uint64_t m = (uint64_t)-1 >> j0; XX if (lx & m) lx &= ~m; XX } XX @@ -128,12 +127,6 @@ XX /* x = +-inf or nan. */ XX if (ix >= 0x7fff) XX - RETURNI(vzero / vzero); XX + RETURNI(x - x); XX XX - /* XX - * |x| >= 0x1p63 is always an integer, so return +-0. XX - * FIXME: should this raise FE_INEXACT or FE_INVALID. XX - */ XX - if (ax + 1 > 1) XX - ax = copysignl(0, x); XX - RETURNI(ax); XX + RETURNI(x < 0 ? -0.0 : 0); XX } I now have complete and debugged versions of sinpif() and sinpil(). They are still about twice as fast, and competitive with sinf() etc. instead of more than twice as slow. They needed further fixes for classifications of half-integers and quarter-integers near the threshold. lgamma*() was broken (probably by us) by not doing this right, but for lgamma() these errors are hidden in the noise of other large errors. Here my a complete sinpinf(). It grew larger and slower than I like to become correct: YY #include YY YY #include "math.h" YY #include "math_private.h" YY YY float YY ysinpif(float x) YY { YY float s,y,z; YY int32_t hx,ix; YY int n; YY YY GET_FLOAT_WORD(hx,x); YY ix = hx & 0x7fffffff; YY YY if (ix < 0x3e800000) { /* |x| < 0.25 */ YY if (ix < 0x34800000) /* |x| < 2**-22 */ YY if(((int)x)==0) return M_PI*x; /* pi*x with inexact if x != 0 */ YY return __kernel_sindf(M_PI*x); YY } YY YY if (ix>=0x7f800000) return x-x; /* Inf or NaN -> NaN */ YY YY if (ix >= 0x4b000000) return (x < 0 ? -0.0F : 0); /* integer -> +-0 */ YY YY y = fabs(x); YY YY if (ix >= 0x4a800000) { /* half-integer */ YY n = (ix & 3) * 2; YY y = 0; YY } else if (ix >= 0x4a000000) { /* quarter-integer */ YY n = ix & 7; YY y = 0; The half-integer and quarter-intger classification is new. Adding 0x1p21F to classify only works below 0x1p21F. The commments should translate the thresholds. The version for lgamma*() in -current is broken too, but the error is reduced by a preliminary classification of all integers in the above ranges. Adding 0x1p21F doesn't work for these either. So lgamma*() only gives wrong results for half-integers and quarter-integers in the above range. I think gamma*() overflows on these values, and the errors are relatively not so enormous for lgamma*(). YY } else { YY STRICT_ASSIGN(float,z,y+0x1p21F); YY GET_FLOAT_WORD(n,z); /* bits for rounded y (units 0.25) */ YY z -= 0x1p21F; /* y rounded to a multiple of 0.25 */ YY if (z > y) { YY z -= 0.25F; /* adjust to round down */ YY n--; YY } YY n &= 7; /* octant of y mod 2 */ YY y -= z; /* y mod 0.25 */ YY } YY YY if (n >= 3 && n < 7) YY s = -1; YY else YY s = 1; YY if (x < 0) YY s = -s; YY n &= 3; YY if (y == 0 && n == 0) YY return (x < 0 ? -0.0F : 0); YY if (n == 0) YY return s*__kernel_sindf(M_PI*y); YY else if (n == 1) YY return s*__kernel_cosdf(M_PI*(y-0.25F)); YY else if (n == 2) YY return s*__kernel_cosdf(M_PI*y); YY else YY return s*__kernel_sindf(M_PI*(y-0.25F)); YY } This still has style bugs suitable for copying back to lgamma*. Bruce From owner-freebsd-numerics@freebsd.org Sat May 20 00:58:14 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 36C77D738D7 for ; Sat, 20 May 2017 00:58:14 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 95D5D18AD for ; Sat, 20 May 2017 00:58:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id B8DA8104973C; Sat, 20 May 2017 10:58:04 +1000 (AEST) Date: Sat, 20 May 2017 10:58:03 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Steve Kargl , freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170519153326.A20099@besplex.bde.org> Message-ID: <20170520105748.X1017@besplex.bde.org> References: <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> <20170518014226.GA63819@troutmask.apl.washington.edu> <20170518154820.A8280@besplex.bde.org> <20170518175434.GA74453@troutmask.apl.washington.edu> <20170519074512.M884@besplex.bde.org> <20170518234722.GB77471@troutmask.apl.washington.edu> <20170519153326.A20099@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=rYtKBivQexIEMgcjFssA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 May 2017 00:58:14 -0000 On Fri, 19 May 2017, Bruce Evans wrote: > I fixed all bugs found by my tests: > ... > - sinpi(), cospi() and tanpi() were broken on i386 with extra precision. > Now they have a smaller error than sin(), etc. (about 0.55 ulps; the > bug gave 1.5 ulps). > ... > > XX diff -u2 k_cospi.c~ k_cospi.c > XX --- k_cospi.c~ Fri May 19 09:05:14 2017 > XX +++ k_cospi.c Fri May 19 12:02:24 2017 > XX @@ -35,5 +35,8 @@ > XX __kernel_cospi(double x) > XX { > XX - double hi, lo; > XX + /* XXX: volatile hack to abuse _2sumF(): */ > XX + volatile double hi; > XX + double lo; > XX + > XX hi = (float)x; > XX lo = x - hi; > > This hack is sufficent to get strict assignment to hi. It is a bit slow. > It costs about 30% on i386 (70 cycles increased to 90) in my version. I fixed this more cleanly in the infrastructure: XX diff -u2 math_private.h~ math_private.h XX --- math_private.h~ Tue Jun 4 21:39:14 2013 XX +++ math_private.h Sun Nov 27 17:58:57 2005 XX @@ -21,4 +21,6 @@ XX #include XX XX +#include /* XXX: pollution for STRICT_ASSIGN() */ XX + XX /* XX * The original fdlibm code used statements like: XX @@ -322,5 +537,5 @@ XX __typeof(a) __s, __w; \ XX \ XX - __w = (a) + (b); \ XX + STRICT_ASSIGN(__typeof(a), __w, (a) + (b)); \ XX __s = __w - (a); \ XX (b) = ((a) - (__w - __s)) + ((b) - __s); \ XX @@ -355,4 +570,8 @@ XX * reduce their own extra-precision and efficiciency problems. In XX * particular, they shouldn't convert back and forth just to call here. XX + * XX + * Update: the 2sum algorithms now use STRICT_ASSIGN(), so they should work XX + * with any FP type, and only be slow if that type is not float_t, double_t XX + * or long double. XX */ XX #ifdef DEBUG XX @@ -365,5 +584,5 @@ XX assert(__ia == 0 || fabsl(__ia) >= fabsl(__ib)); \ XX \ XX - __w = (a) + (b); \ XX + STRICT_ASSIGN(__typeof(a), __w, (a) + (b)); \ XX (b) = ((a) - __w) + (b); \ XX (a) = __w; \ XX @@ -380,5 +599,5 @@ XX __typeof(a) __w; \ XX \ XX - __w = (a) + (b); \ XX + STRICT_ASSIGN(__typeof(a), __w, (a) + (b)); \ XX (b) = ((a) - __w) + (b); \ XX (a) = __w; \ Other fixes: my tests only passed because they were on i386 with extra precision for doubles. On amd64 and on i386 with the default precision for doubles, the problem with the hi+lo decomposition for denormals showed up as errors of about 1.5 ulps for sinpif(), tanpif(), sinpi() and tanpi(). Previously-discussed fixes for this worked. I rewrote them from scratch (not very cleanly -- they should be in a header file, as should be the kernels. 1 header file can keep kernels and functions to multiply by pi for all precisions): XX diff -u2 s_sinpi.c~ s_sinpi.c XX --- s_sinpi.c~ Fri May 19 09:05:16 2017 XX +++ s_sinpi.c Fri May 19 19:52:07 2017 XX @@ -77,6 +77,5 @@ XX sinpi(double x) XX { XX - volatile static const double vzero = 0; XX - double ax, s; XX + double ax, hi, lo, s; XX uint32_t hx, ix, j0, lx; XX XX @@ -87,12 +86,15 @@ XX if (ix < 0x3ff00000) { /* |x| < 1 */ XX if (ix < 0x3fd00000) { /* |x| < 0.25 */ XX - if (ix < 0x3e200000) { /* |x| < 0x1p-29 */ XX + if (ix < 0x3de00000) { /* |x| < 2**-33 */ XX if (x == 0) XX return (x); XX - INSERT_WORDS(ax, hx, 0); XX - x -= ax; XX - s = pilo * x + pihi * x + pilo * ax XX - + pihi * ax; XX - return (s); XX + if ((int)x == 1) /* gen. inexact if x != 0 */ XX + return (x); /* NOTREACHED */ XX + hi = x; XX + SET_LOW_WORD(hi, 0); XX + hi *= 0x1p1000; XX + lo = 0x1p1000 * x - hi; XX + return ((0x1p1000 * pilo * x + pihi * lo + XX + pihi * hi) * 0x1p-1000); XX } XX This also fixes the style bugs of clobbering x and ax to hold lo and hi, and the pessimization of using INSER_WORDS instead of SET_LOW_WORD(), and the pessimization of not combining low terms, and the bug of not always setting inexact for nonzero x (e.g., for x = 2**-34), and the style bug and pessimization of handling the sign explicitly (rounding to nearest gives oddness automatically exept for x = 0), and has 2 older fixes. Multiplication by pi is remarkably complicated. I noticed that you already avoided the trap of using the usual method of truncating x by converting to float -- this fails for denormals and other small values since floats don't have enough range. Converting long doubles fails similarly. So 2 different methods for multiplying by pi are needed: - a slower general method that works for small values (use bit-fiddling to truncate), and scale up - the usual method of casting to float for medium-sized values - mercifully, a third method that works on large values is not needed. These should be in a header file and not duplicated. But I used the quick fix of duplicating the above for tanpi(). XX diff -u2 s_sinpif.c~ s_sinpif.c XX --- s_sinpif.c~ Fri May 19 09:05:16 2017 XX +++ s_sinpif.c Fri May 19 19:49:33 2017 XX @@ -46,5 +46,4 @@ XX sinpif(float x) XX { XX - volatile static const float vzero = 0; XX float ax, s; XX uint32_t hx, ix, j0; XX @@ -56,14 +55,7 @@ XX if (ix < 0x3f800000) { /* |x| < 1 */ XX if (ix < 0x3e800000) { /* |x| < 0.25 */ XX - if (ix < 0x38800000) { /* |x| < 0x1p-14 */ XX - if (x == 0) XX - return (x); XX - SET_FLOAT_WORD(ax, hx & 0xffff0000); XX - x -= ax; XX - s = pilo * x + pihi * x + pilo * ax XX - + pihi * ax; XX - return (s); XX - } XX - XX + if (ix < 0x34800000) /* |x| < 2**-22 */ XX + if ((int)x == 0) /* gen. inexact if x != 0 */ XX + return (M_PI * x); XX s = __kernel_sinpif(ax); XX return ((hx & 0x80000000) ? -s : s); This is simpler because it can just use double precision. Also fix missing setting of inexact and 1 style bug. This can use essentially the same code as sinf(). It doesn't need a special case for x == 0 since multiplication if +-0 by M_PI preserves the sign. I didn't fix the style bug and pessimal sign handling for non-small x. This should be the same as for sinf() too. The kernel preserves oddness in the default rounding mode except for x == 0 which is not handled by the kernel. My sinpi() needed similar fixes for the scaling. I changed all of its formatting away from fdlibm towards KNF: XX #include XX XX #include "math.h" XX #include "math_private.h" XX XX static const double XX pi_hi = 3.1415926218032837e+00, /* 0x400921fb 0x50000000 */ XX pi_lo = 3.1786509547050787e-08; /* 0x3e6110b4 0x611a5f14 */ XX XX static double XX __kernel_cospi(double x) XX { XX double hi, lo; XX XX hi = (float)x; XX lo = x - hi; XX lo = (pi_lo + pi_hi) * lo + pi_lo * hi; XX hi = pi_hi * hi; XX _2sumF(hi, lo); XX return (__kernel_cos(hi, lo)); XX } XX XX static double XX __kernel_sinpi(double x) XX { XX double hi, lo; XX XX hi = (float)x; XX lo = x - hi; XX lo = (pi_lo + pi_hi) * lo + pi_lo * hi; XX hi = pi_hi * hi; XX _2sumF(hi, lo); XX return (__kernel_sin(hi, lo, 1)); XX } XX XX static inline double XX __kernel_mpi(double x) XX { XX double hi, lo; XX int32_t hx, lx; XX XX EXTRACT_WORDS(hx, lx, x); XX hi = x; XX SET_LOW_WORD(hi, 0); XX hi *= 0x1p1000; XX lo = 0x1p1000 * x - hi; XX return ((0x1p1000 * pi_lo * x + pi_hi * lo + pi_hi * hi) * 0x1p-1000); XX } This is cleaner, but not yet suitable for a kernel in a header file since it does a pessimal second extraction (which should be only of the high word again). Since this is inline, perhaps the compile can optimize away the second extraction. It can do this easily only if the extractions are the same. If the compiler can't do this (perhaps because this isn't inlined), then hx (and lx if available) should be passed to the kernel. The above isn't a general multiplication kernel as originally intended, since it has to scale up to handle small values, so doesn't work for large values. Actually, it would work up to 0x1p53, but be slower. The amount of scaling up is tricky to determine. 0x1p1000 is about half-way in the range of ~2048 exponents, to scale up denormals to nearly 1 so that they obviously aren't denormals. XX XX double XX ysinpi(double x) XX { XX double s, y, z; XX int32_t hx, ix; XX int n; XX XX GET_HIGH_WORD(hx, x); XX ix = hx & 0x7fffffff; XX XX if (ix < 0x3fd00000) { /* |x| < 0.25 */ XX if (ix < 0x3de00000) { /* |x| < 2**-33 */ XX if (x == 0) XX return (x); XX if ((int)x == 0) /* generate inexact if x != 0 */ XX return __kernel_mpi(x); XX } XX return (__kernel_sinpi(x)); XX } XX if (ix >= 0x7ff00000) /* Inf or NaN -> NaN */ XX return (x - x); XX if (ix >= 0x43300000) /* 2**52 <= |x| (integer) -> +-0 */ XX return (x < 0 ? -0.0 : 0); XX if (ix >= 0x43200000) { /* 2**51 <= |x| < 2**52 (int/2) */ XX GET_LOW_WORD(n, x); XX n = (n & 3) * 2; XX y = 0; XX } else if (ix >= 0x43100000) { /* 2**50 <= |x| < 2**51 (int/4) */ XX GET_LOW_WORD(n, x); XX n &= 7; XX y = 0; XX } else { XX y = fabs(x); XX STRICT_ASSIGN(double, z, y + 0x1p50); XX GET_LOW_WORD(n, z); /* bits for rounded y (units 0.25) */ XX z -= 0x1p50; /* y rounded to a multiple of 0.25 */ XX if (z > y) { XX z -= 0.25; /* adjust to round down */ XX n--; XX } XX n &= 7; /* octant of y mod 2 */ XX y -= z; /* y mod 0.25 */ XX } XX XX s = (n >= 3 && n < 7) ? -1 : 1; XX if (x < 0) XX s = -s; XX n &= 3; XX if (y == 0 && n == 0) XX return (x < 0 ? -0.0 : 0); XX if (n == 0) XX return (s * __kernel_sinpi(y)); XX if (n == 1) XX return (s * __kernel_cospi(y - 0.25)); XX if (n == 2) XX return (s * __kernel_cospi(y)); XX return (s * __kernel_sinpi(y - 0.25)); XX } With better-designed kernels, there would be no kernels for cospi or sinpi, and the code would look something like: return (s * __k_cos(k_mpi(y))); where: - __k_cos() is essentially __kernel_cos() with the following fixes: - it must take double_t args to work optimally and simply with _2sumF() - these args should be packaged in a struct for convenience - rename it because its API changed. Fix its verbose name. I'd like to remove the leading underscores too, but inlining many copies of it would be too much. It isn't even inlined now, and inlining it tends to be a pessimizaition. - k_mpi() is the first part of __kernel_cospi() with the hi+lo decomposition packaged as a struct for convenience. It takes a fairly smart compiler to not pessimize such structs (gcc-3.3.3 pessimizes them, but gcc-4.2.1 mostly handles them optimally, and clang always handles them optimally). k_mpi() should always be inline so it doesn't need underscores. The kernels can't quite be type-generic without complications. The truncation part depends on the precision. The fix for _2sumF() is needed mainly because the current kernels don't support float_t or double_t (except the internal one in logl(), and my uncommitted ones for exp() and expf()). Requiring use types without extra precision for _2sumF() was intentional to promote use of double_t, etc., but I forgot that the old kernels didn't support it. Bruce From owner-freebsd-numerics@freebsd.org Sat May 20 14:58:15 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BBA47D767F2 for ; Sat, 20 May 2017 14:58:15 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id A304A1562 for ; Sat, 20 May 2017 14:58:15 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4KEwE43031478 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 20 May 2017 07:58:14 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4KEwDfW031477; Sat, 20 May 2017 07:58:13 -0700 (PDT) (envelope-from sgk) Date: Sat, 20 May 2017 07:58:13 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170520145813.GA31395@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> <20170518014226.GA63819@troutmask.apl.washington.edu> <20170518154820.A8280@besplex.bde.org> <20170518175434.GA74453@troutmask.apl.washington.edu> <20170519074512.M884@besplex.bde.org> <20170518234722.GB77471@troutmask.apl.washington.edu> <20170519153326.A20099@besplex.bde.org> <20170520105748.X1017@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170520105748.X1017@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 May 2017 14:58:15 -0000 On Sat, May 20, 2017 at 10:58:03AM +1000, Bruce Evans wrote: > On Fri, 19 May 2017, Bruce Evans wrote: > > XX - __w = (a) + (b); \ > XX + STRICT_ASSIGN(__typeof(a), __w, (a) + (b)); \ > XX (b) = ((a) - __w) + (b); \ > XX (a) = __w; \ > > Other fixes: my tests only passed because they were on i386 with extra > precision for doubles. On amd64 and on i386 with the default precision > for doubles, the problem with the hi+lo decomposition for denormals > showed up as errors of about 1.5 ulps for sinpif(), tanpif(), sinpi() > and tanpi(). Previously-discussed fixes for this worked. I rewrote > them from scratch (not very cleanly -- they should be in a header file, > as should be the kernels. 1 header file can keep kernels and functions > to multiply by pi for all precisions): After you mentioned denormals in another email, I tested at sinpif/cospif. I don't remember the magnitude of the ULP observed, but it was much larger thanm I expected. I added scaling similar to what you have done but used a different constant. > XX diff -u2 s_sinpi.c~ s_sinpi.c > XX --- s_sinpi.c~ Fri May 19 09:05:16 2017 > XX +++ s_sinpi.c Fri May 19 19:52:07 2017 > XX @@ -77,6 +77,5 @@ > XX sinpi(double x) > XX { > XX - volatile static const double vzero = 0; > XX - double ax, s; > XX + double ax, hi, lo, s; I also introduced hi and lo as you have done. If you initially missed my overloading use of ax and x, I assumed others would as well. > XX uint32_t hx, ix, j0, lx; > XX > XX @@ -87,12 +86,15 @@ > XX if (ix < 0x3ff00000) { /* |x| < 1 */ > XX if (ix < 0x3fd00000) { /* |x| < 0.25 */ > XX - if (ix < 0x3e200000) { /* |x| < 0x1p-29 */ > XX + if (ix < 0x3de00000) { /* |x| < 2**-33 */ > XX if (x == 0) > XX return (x); > XX - INSERT_WORDS(ax, hx, 0); > XX - x -= ax; > XX - s = pilo * x + pihi * x + pilo * ax > XX - + pihi * ax; > XX - return (s); > XX + if ((int)x == 1) /* gen. inexact if x != 0 */ > XX + return (x); /* NOTREACHED */ Is this simply to guarantee that inexact is raised? I had assumed that inexact would be raised by one or more if the * and + operation in s. > Multiplication by pi is remarkably complicated. I noticed that you > already avoided the trap of using the usual method of truncating x > by converting to float -- this fails for denormals and other small > values since floats don't have enough range. Converting long doubles > fails similarly. I avoided the trap in some cases, but use cast elsewhere. I'll need to check if those are ok. > So 2 different methods for multiplying by pi are needed: > - a slower general method that works for small values (use bit-fiddling > to truncate), and scale up > - the usual method of casting to float for medium-sized values > - mercifully, a third method that works on large values is not needed. > These should be in a header file and not duplicated. But I used the > quick fix of duplicating the above for tanpi(). > > XX diff -u2 s_sinpif.c~ s_sinpif.c > XX --- s_sinpif.c~ Fri May 19 09:05:16 2017 > XX +++ s_sinpif.c Fri May 19 19:49:33 2017 > XX @@ -46,5 +46,4 @@ > XX sinpif(float x) > XX { > XX - volatile static const float vzero = 0; > XX float ax, s; > XX uint32_t hx, ix, j0; > XX @@ -56,14 +55,7 @@ > XX if (ix < 0x3f800000) { /* |x| < 1 */ > XX if (ix < 0x3e800000) { /* |x| < 0.25 */ > XX - if (ix < 0x38800000) { /* |x| < 0x1p-14 */ > XX - if (x == 0) > XX - return (x); > XX - SET_FLOAT_WORD(ax, hx & 0xffff0000); > XX - x -= ax; > XX - s = pilo * x + pihi * x + pilo * ax > XX - + pihi * ax; > XX - return (s); > XX - } > XX - > XX + if (ix < 0x34800000) /* |x| < 2**-22 */ > XX + if ((int)x == 0) /* gen. inexact if x != 0 */ > XX + return (M_PI * x); > XX s = __kernel_sinpif(ax); > XX return ((hx & 0x80000000) ? -s : s); > > This is simpler because it can just use double precision. It defeats the use of the same algorithm in all precisions. In particular, it is easier to develop an algorithm in float, test it, and then generalize to other precisions. > Also fix missing > setting of inexact and 1 style bug. This can use essentially the same code > as sinf(). It doesn't need a special case for x == 0 since multiplication > if +-0 by M_PI preserves the sign. I didn't fix the style bug and pessimal > sign handling for non-small x. This should be the same as for sinf() too. > The kernel preserves oddness in the default rounding mode except for x == 0 > which is not handled by the kernel. > > XX static inline double > XX __kernel_mpi(double x) > XX { > XX double hi, lo; > XX int32_t hx, lx; > XX > XX EXTRACT_WORDS(hx, lx, x); > XX hi = x; > XX SET_LOW_WORD(hi, 0); > XX hi *= 0x1p1000; > XX lo = 0x1p1000 * x - hi; > XX return ((0x1p1000 * pi_lo * x + pi_hi * lo + pi_hi * hi) * 0x1p-1000); > XX } > > This is cleaner, but not yet suitable for a kernel in a header file since > it does a pessimal second extraction (which should be only of the high word > again). Why the 2nd extraction? Neither hx nor lx are used. Also, int32_t should be uint32_t (or to be in line with math_private.h perhaps u_int32_t). > Since this is inline, perhaps the compile can optimize away the > second extraction. It can do this easily only if the extractions are the > same. If the compiler can't do this (perhaps because this isn't inlined), > then hx (and lx if available) should be passed to the kernel. Why? Neither is used in the kernel. > The above isn't a general multiplication kernel as originally intended, > since it has to scale up to handle small values, so doesn't work for > large values. Actually, it would work up to 0x1p53, but be slower. > > The amount of scaling up is tricky to determine. 0x1p1000 is about > half-way in the range of ~2048 exponents, to scale up denormals to > nearly 1 so that they obviously aren't denormals. > > XX > XX double > XX ysinpi(double x) I'll need to spend part of today studying this function. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Sat May 20 23:52:43 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8B847D76664 for ; Sat, 20 May 2017 23:52:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 392671F8 for ; Sat, 20 May 2017 23:52:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 84C92104B9A1; Sun, 21 May 2017 09:52:40 +1000 (AEST) Date: Sun, 21 May 2017 09:52:39 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170520145813.GA31395@troutmask.apl.washington.edu> Message-ID: <20170521082808.T1142@besplex.bde.org> References: <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> <20170518014226.GA63819@troutmask.apl.washington.edu> <20170518154820.A8280@besplex.bde.org> <20170518175434.GA74453@troutmask.apl.washington.edu> <20170519074512.M884@besplex.bde.org> <20170518234722.GB77471@troutmask.apl.washington.edu> <20170519153326.A20099@besplex.bde.org> <20170520105748.X1017@besplex.bde.org> <20170520145813.GA31395@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=k3lLdbPPBgMaOkpglPkA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 May 2017 23:52:43 -0000 On Sat, 20 May 2017, Steve Kargl wrote: > On Sat, May 20, 2017 at 10:58:03AM +1000, Bruce Evans wrote: >> On Fri, 19 May 2017, Bruce Evans wrote: > >> XX uint32_t hx, ix, j0, lx; >> XX >> XX @@ -87,12 +86,15 @@ >> XX if (ix < 0x3ff00000) { /* |x| < 1 */ >> XX if (ix < 0x3fd00000) { /* |x| < 0.25 */ >> XX - if (ix < 0x3e200000) { /* |x| < 0x1p-29 */ >> XX + if (ix < 0x3de00000) { /* |x| < 2**-33 */ >> XX if (x == 0) >> XX return (x); >> XX - INSERT_WORDS(ax, hx, 0); >> XX - x -= ax; >> XX - s = pilo * x + pihi * x + pilo * ax >> XX - + pihi * ax; >> XX - return (s); >> XX + if ((int)x == 1) /* gen. inexact if x != 0 */ >> XX + return (x); /* NOTREACHED */ > > Is this simply to guarantee that inexact is raised? I had > assumed that inexact would be raised by one or more if the > * and + operation in s. I was confused, but accidentally correct. Let x be a power of 2. Then all the multiplications are exact, but the additions are usually inexact. However, the additions may be exact if there is extra precision, and other cases are not clear either. Simpler example: suppose we approximate cos(x) by 1 - x*x/2. Then the result is obviously exact for must powers of 2. It is unclear if poly coeffs have enough low bits nonzero to give inexact for _all_ args in general. Relevant cases here: (1) (M_PI * x), where x is float and a small power of 2, is exact in double precision, but inexact when rounded to float precision. The C standard requires destruction of the extra precision on return, since the return type has lower rank than the nominal expression type. (I use a hack to prevent this.) It is lots of low bits 1 in M_PI that give the extra precision and the destruction of them is what sets inexact. (2) Expressions like: s = pilo * x + pihi * x + pilo * ax + pihi * ax; C allows the rvalue to be evaluated in extra precision. This may be exact, depending on how much extra precision there is. (2a) C requires discarding the extra precision on assignment. (This is now discarding and not destruction, since there must be ways to remove the extra precision intentionally and assignment is the most natural way -- the user should have used float_t to avoid loss.) (2b) However, (2a) is too slow when there actually is extra precision, so x86 non-C compilers don't do it. gcc starting with about gcc-4.8 attempts to be a C99 compiler with certain CFLAGS. clang doesn't support these flags. So s may be exact too. (2c) C11 requires destroying the extra precision on return. (This is now perverse destruction and incompatible with C99. It defeats FLT_EVAL_METHOD giving extra precision for parts of expressions using functions, at least if the functions are written in C. So if s is exact, C11 requires it to become inexact on return. (2d) Howver, (2c) is too slow and broken, so x86 non-C compilers don't do it. gcc starting with about gcc-4.8 attempts to be a C11 compiler with with just -std=c11 in CFLAGS and the user must use the reverse of the more nonstandard CFLAGS to get C99 conformats to avoid this bug. clang doesn't attempt to support this bug. Callers can see the extra precision even if they are compiled with a C compiler: (3a) float f = sinpif(x); C compilers remove any extra precision and set inxact if there was any. (3b) float_t f = sinpif(x); C compilers keep any extra precision, and code should usually be written like this to keep it intentionally. But if sinpif() was compiled by a C11 compiler, it is not allowed to return extra precision. It is unclear if callers compiled with a C11 compiler can depend on sinpif() being compiled with a C11 compiler so that it can't have extra precision. (3c) float_t f or float f; STRICT_ASSIGN(float, f, sinpif(x)); My code to not see any extra precision when testing sinpif() does this. >> XX diff -u2 s_sinpif.c~ s_sinpif.c >> ... >> XX + if (ix < 0x34800000) /* |x| < 2**-22 */ >> XX + if ((int)x == 0) /* gen. inexact if x != 0 */ >> XX + return (M_PI * x); >> XX s = __kernel_sinpif(ax); >> XX return ((hx & 0x80000000) ? -s : s); >> >> This is simpler because it can just use double precision. > > It defeats the use of the same algorithm in all precisions. > In particular, it is easier to develop an algorithm in float, > test it, and then generalize to other precisions. The algorithm is already very different for the usual case since it has to map to the kernels which use double precision. It would be a bit much to restore all the unoptimized kernels that use float precision just to develop the sinpif() family. But I have considered going back to pure float precision to get further optimizations. Switching precisions costs about 2-4 cycles per switch on x86, which is a lot when some of the functions take less than 20 cycles. On i386, the switch should be free, but destruction of extra precision on return costs more than 4 cycles. On and64, there is a switch to promote to double precision on entry and another to demote to float precision on return. >> XX static inline double >> XX __kernel_mpi(double x) >> XX { >> XX double hi, lo; >> XX int32_t hx, lx; >> XX >> XX EXTRACT_WORDS(hx, lx, x); >> XX hi = x; >> XX SET_LOW_WORD(hi, 0); >> XX hi *= 0x1p1000; >> XX lo = 0x1p1000 * x - hi; >> XX return ((0x1p1000 * pi_lo * x + pi_hi * lo + pi_hi * hi) * 0x1p-1000); >> XX } >> >> This is cleaner, but not yet suitable for a kernel in a header file since >> it does a pessimal second extraction (which should be only of the high word >> again). > > Why the 2nd extraction? Neither hx nor lx are used. Also, It is a bug. Larger than I remembered, but the compiler will optimize away the extra extraction. If there is an extraction, then it should match the one in the caller so that it can be optimized away. > int32_t should be uint32_t (or to be in line with math_private.h > perhaps u_int32_t). u[_]int32_t is technically correct for raw bits, but int32_t works too since it is pure 2's complement. fdlibm uses int32_t to do signed comparisons on the result or when it is just sloppy, and I must have copied one of the sloppy places. >> Since this is inline, perhaps the compile can optimize away the >> second extraction. It can do this easily only if the extractions are the >> same. If the compiler can't do this (perhaps because this isn't inlined), >> then hx (and lx if available) should be passed to the kernel. > > Why? Neither is used in the kernel. They need to be the same to avoid repeated insertions or extractions. Since my s_sinpi() only GETs the high word and __kernel_mpi() only does a null extraction followed by a SET of the low word, the operations appear to be independent. However, the only efficient way to do them is 64 bits at a time for each, with operations merged. This would be more needed to clear 53/2 instead of 32 low bits in 'hi'. Then lx would be needed explicitly. But efficient code has to extract and insert 64 bits at a time anyway (but on x86, keep it in unnamed SSE registers unless the bits are used explicitly). Bruce