From owner-freebsd-numerics@freebsd.org  Tue May 16 18:00:17 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 037EFD70897;
 Tue, 16 May 2017 18:00:17 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id D52896D8;
 Tue, 16 May 2017 18:00:16 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4GI080B035243
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Tue, 16 May 2017 11:00:09 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4GI03I7035241;
 Tue, 16 May 2017 11:00:03 -0700 (PDT) (envelope-from sgk)
Date: Tue, 16 May 2017 11:00:02 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-hackers@freebsd.org, freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170516180002.GA35211@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170428165658.GA17560@troutmask.apl.washington.edu>
 <20170429035131.E3406@besplex.bde.org>
 <20170428201522.GA32785@troutmask.apl.washington.edu>
 <20170429070036.A4005@besplex.bde.org>
 <20170428233552.GA34580@troutmask.apl.washington.edu>
 <20170429005924.GA37947@troutmask.apl.washington.edu>
 <20170429151457.F809@besplex.bde.org>
 <20170429194239.P3294@besplex.bde.org>
 <20170513205517.GA91911@troutmask.apl.washington.edu>
 <20170514071942.T1084@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170514071942.T1084@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 May 2017 18:00:17 -0000

On Sun, May 14, 2017 at 08:30:34AM +1000, Bruce Evans wrote:
> On Sat, 13 May 2017, Steve Kargl wrote:
> 
> > Based on other replies in this email exchange, I have gone back
> > and looked at improvements to my __kernel_{cos|sin|tan}pi[fl]
> > routines.  The improvements where for both accuracy and speed.
> 
> I really don't want another set of kernels (or more sets for degrees
> instead of radians, and sincos).

Ugh.  __kernel_tan(x,y,iy) and __kernel_tanl(x,y,iy) seem to
have inverted logic for iy.  That is, iy = 1 for __kernel_tan()
needs to be changed to y = -1 for __kernel_tanl(), and vice versa.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Tue May 16 22:46:21 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9840DD6E8D3
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Tue, 16 May 2017 22:46:21 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 77F01983
 for <freebsd-numerics@freebsd.org>; Tue, 16 May 2017 22:46:21 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4GMkJBH041449
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Tue, 16 May 2017 15:46:19 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4GMkI3u041448;
 Tue, 16 May 2017 15:46:18 -0700 (PDT) (envelope-from sgk)
Date: Tue, 16 May 2017 15:46:18 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170516224618.GA40855@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170428175851.A1386@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 May 2017 22:46:21 -0000

On Fri, Apr 28, 2017 at 06:00:19PM +1000, Bruce Evans wrote:
> [cc list trimmed]
> 
> On Thu, 27 Apr 2017, Steve Kargl wrote:
> 
> > I have attached a new diff to the bugzilla report.  The
> > diff is 3090 lines and won't be broadcast the mailing list.
> 
> But no one will see it there.
> 

Picking an almost email to reply to.

I have removed my new kernels and written shims to 
use the fdlibm trig kernels.  I know that in some
cases this leads to slower code, but a small (if not
tiny) improvement in accuracy.  This also leads to
a significant reduction in code size.  A couple of
notes:

1) The ld128 routines have not been compiled, and so
   by extension these routines are untested.  I have 
   no access to an ld128 platform with sufficient
   infrastructure for testing.

2) Argument reduction is of the form x = n + r where n
   is an integer and 0 <= r < 1.  Thus, this is indeed
   a half-cycle implementation.  I have found discussions
   of argument reduction of the form x = 2 * n + r.  I
   have no plans to pursue this.  For sinpi and tanpi,
   one must test for even or odd n to determine the
   proper sign. 

3) Patch against trunk follows.

Index: lib/msun/Makefile
===================================================================
--- lib/msun/Makefile	(revision 318383)
+++ lib/msun/Makefile	(working copy)
@@ -59,6 +59,7 @@
 	s_asinh.c s_asinhf.c s_atan.c s_atanf.c s_carg.c s_cargf.c s_cargl.c \
 	s_cbrt.c s_cbrtf.c s_ceil.c s_ceilf.c \
 	s_copysign.c s_copysignf.c s_cos.c s_cosf.c \
+	s_cospi.c s_cospif.c \
 	s_csqrt.c s_csqrtf.c s_erf.c s_erff.c \
 	s_exp2.c s_exp2f.c s_expm1.c s_expm1f.c s_fabsf.c s_fdim.c \
 	s_finite.c s_finitef.c \
@@ -74,7 +75,10 @@
 	s_rint.c s_rintf.c s_round.c s_roundf.c \
 	s_scalbln.c s_scalbn.c s_scalbnf.c s_signbit.c \
 	s_signgam.c s_significand.c s_significandf.c s_sin.c s_sinf.c \
-	s_tan.c s_tanf.c s_tanh.c s_tanhf.c s_tgammaf.c s_trunc.c s_truncf.c \
+	s_sinpi.c s_sinpif.c \
+	s_tan.c s_tanf.c s_tanh.c s_tanhf.c \
+	s_tanpi.c s_tanpif.c \
+	s_tgammaf.c s_trunc.c s_truncf.c \
 	w_cabs.c w_cabsf.c w_drem.c w_dremf.c
 
 # Location of fpmath.h and _fpmath.h
@@ -100,11 +104,16 @@
 	e_lgammal.c e_lgammal_r.c \
 	e_remainderl.c e_sinhl.c e_sqrtl.c \
 	invtrig.c k_cosl.c k_sinl.c k_tanl.c \
-	s_asinhl.c s_atanl.c s_cbrtl.c s_ceill.c s_cosl.c s_cprojl.c \
+	s_asinhl.c s_atanl.c s_cbrtl.c s_ceill.c s_cosl.c \
+	s_cospil.c \
+	s_cprojl.c \
 	s_csqrtl.c s_erfl.c s_exp2l.c s_expl.c s_floorl.c s_fmal.c \
 	s_fmaxl.c s_fminl.c s_frexpl.c s_logbl.c s_logl.c s_nanl.c \
 	s_nextafterl.c s_nexttoward.c s_remquol.c s_rintl.c s_roundl.c \
-	s_scalbnl.c s_sinl.c s_tanhl.c s_tanl.c s_truncl.c w_cabsl.c
+	s_scalbnl.c s_sinl.c \
+	s_sinpil.c \
+	s_tanhl.c s_tanl.c \
+	s_tanpil.c s_truncl.c w_cabsl.c
 .endif
 
 # C99 complex functions
@@ -131,13 +140,19 @@
 
 MAN=	acos.3 acosh.3 asin.3 asinh.3 atan.3 atan2.3 atanh.3 \
 	ceil.3 cacos.3 ccos.3 ccosh.3 cexp.3 \
-	cimag.3 copysign.3 cos.3 cosh.3 csqrt.3 erf.3 exp.3 fabs.3 fdim.3 \
+	cimag.3 copysign.3 cos.3 cosh.3 \
+	cospi.3 \
+	csqrt.3 erf.3 exp.3 fabs.3 fdim.3 \
 	feclearexcept.3 feenableexcept.3 fegetenv.3 \
 	fegetround.3 fenv.3 floor.3 \
 	fma.3 fmax.3 fmod.3 hypot.3 ieee.3 ieee_test.3 ilogb.3 j0.3 \
 	lgamma.3 log.3 lrint.3 lround.3 math.3 nan.3 \
 	nextafter.3 remainder.3 rint.3 \
-	round.3 scalbn.3 signbit.3 sin.3 sinh.3 sqrt.3 tan.3 tanh.3 trunc.3 \
+	round.3 scalbn.3 signbit.3 sin.3 sinh.3 \
+	sinpi.3 \
+	sqrt.3 tan.3 tanh.3 \
+	tanpi.3 \
+	trunc.3 \
 	complex.3
 
 MLINKS+=acos.3 acosf.3 acos.3 acosl.3
@@ -166,6 +181,7 @@
 MLINKS+=copysign.3 copysignf.3 copysign.3 copysignl.3
 MLINKS+=cos.3 cosf.3 cos.3 cosl.3
 MLINKS+=cosh.3 coshf.3 cosh.3 coshl.3
+MLINKS+=cospi.3 cospif.3 cospi.3 cospil.3
 MLINKS+=csqrt.3 csqrtf.3 csqrt.3 csqrtl.3
 MLINKS+=erf.3 erfc.3 erf.3 erff.3 erf.3 erfcf.3 erf.3 erfl.3 erf.3 erfcl.3
 MLINKS+=exp.3 expm1.3 exp.3 expm1f.3 exp.3 expm1l.3 exp.3 pow.3 exp.3 powf.3 \
@@ -216,10 +232,12 @@
 MLINKS+=scalbn.3 scalbnf.3 scalbn.3 scalbnl.3
 MLINKS+=sin.3 sinf.3 sin.3 sinl.3
 MLINKS+=sinh.3 sinhf.3 sinh.3 sinhl.3
+MLINKS+=sinpi.3 sinpif.3 sinpi.3 sinpil.3
 MLINKS+=sqrt.3 cbrt.3 sqrt.3 cbrtf.3 sqrt.3 cbrtl.3 sqrt.3 sqrtf.3 \
 	sqrt.3 sqrtl.3
 MLINKS+=tan.3 tanf.3 tan.3 tanl.3
 MLINKS+=tanh.3 tanhf.3 tanh.3 tanhl.3
+MLINKS+=tanpi.3 tanpif.3 tanpi.3 tanpil.3
 MLINKS+=trunc.3 truncf.3 trunc.3 truncl.3
 
 .include <src.opts.mk>
Index: lib/msun/Symbol.map
===================================================================
--- lib/msun/Symbol.map	(revision 318383)
+++ lib/msun/Symbol.map	(working copy)
@@ -294,4 +294,13 @@
 	casinl;
 	catanl;
 	catanhl;
+	cospi;
+	cospif;
+	cospil;
+	sinpi;
+	sinpif;
+	sinpil;
+	tanpi;
+	tanpif;
+	tanpil;
 };
Index: lib/msun/ld128/k_cospil.c
===================================================================
--- lib/msun/ld128/k_cospil.c	(nonexistent)
+++ lib/msun/ld128/k_cospil.c	(working copy)
@@ -0,0 +1,41 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/k_cospi.c for implementation details.
+ */
+
+static inline long double
+__kernel_cospil(long double x)
+{
+	long double hi, lo;
+	hi = (double)x;
+	lo = x - hi;
+	lo = lo * (pilo + pihi) + hi * pilo;
+	hi *= pihi;
+	_2sumF(hi, lo);
+	return (__kernel_cosl(hi, lo));
+}

Property changes on: lib/msun/ld128/k_cospil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld128/k_sinpil.c
===================================================================
--- lib/msun/ld128/k_sinpil.c	(nonexistent)
+++ lib/msun/ld128/k_sinpil.c	(working copy)
@@ -0,0 +1,41 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/k_sinpi.c for implementation details.
+ */
+
+static inline long double
+__kernel_sinpil(long double x)
+{
+	long double hi, lo;
+	hi = (double)x;
+	lo = x - hi;
+	lo = lo * (pilo + pihi) + hi * pilo;
+	hi *= pihi;
+	_2sumF(hi, lo);
+	return (__kernel_sinl(hi, lo, 1));
+}

Property changes on: lib/msun/ld128/k_sinpil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld128/s_cospil.c
===================================================================
--- lib/msun/ld128/s_cospil.c	(nonexistent)
+++ lib/msun/ld128/s_cospil.c	(working copy)
@@ -0,0 +1,109 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/s_cospi.c for implementation details.
+ *
+ * FIXME:  This has not been compiled nor has it been tested for accuracy.
+ * FIXME:  This should use bit twiddling.
+ */
+
+#include "math.h"
+#include "math_private.h"
+
+static const long double
+pihi = 3.14159265358979322702026593105983920e+00L,
+pilo = 1.14423774522196636802434264184180742e-17L;
+
+#include "k_cospil.c"
+#include "k_sinpil.c"
+
+long double
+cospil(long double x)
+{
+	volatile static const double vzero = 0;
+	long double ax, c, xf;
+	uint32_t ix;
+
+	ax = fabsl(x);
+
+	if (ax < 1) {
+		if (ax < 0.25) {
+			if (ax < 0x1p-60) {
+				if ((int)x == 0)
+					return (1);
+			}
+			return (__kernel_cospil(ax));
+		}
+
+		if (ax < 0.5)
+			c = __kernel_sinpil(0.5 - ax);
+		else if (ax < 0.75) {
+			if (ax == 0.5)
+				return (0);
+			c = -__kernel_sinpil(ax - 0.5);
+		} else
+			c = -__kernel_cospil(1 - ax);
+		return (c);
+	}
+
+	if (ax < 0x1p112) {
+
+		xf = floorl(ax);
+
+		ax -= xf;
+
+		if (x < 0.5) {
+			if (x < 0.25)
+				c = ax == 0 ? 1 : __kernel_cospil(ax);
+			else
+				c = __kernel_sinpil(0.5 - ax);
+		} else {
+			if (x < 0.75) {
+				if (ax == 0.5)
+					return (0);
+				c = -__kernel_sinpil(ax - 0.5);
+			} else
+				c = -__kernel_cospil(1 - ax);
+		}
+
+		if (xf > 0x1p50) xf -= 0x1p50;
+		if (xf > 0x1p30) xf -= 0x1p30;
+		ix = (uint32_t)xf;
+		return (ix & 1 ? -c : c);
+	}
+
+	if (isinf(x) || isnan(x))
+		return (vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p112 is always an even integer, so return 1.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
+	 */
+	if (ax + 1 > 1)
+		ax = 1;
+	return (ax);
+}

Property changes on: lib/msun/ld128/s_cospil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld128/s_sinpil.c
===================================================================
--- lib/msun/ld128/s_sinpil.c	(nonexistent)
+++ lib/msun/ld128/s_sinpil.c	(working copy)
@@ -0,0 +1,117 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/s_sinpi.c for implementation details.
+ *
+ * FIXME:  This has not been compiled nor has it been tested for accuracy.
+ * FIXME:  This should use bit twiddling.
+ */
+
+#include "math.h"
+#include "math_private.h"
+
+static const long double
+pihi = 3.14159265358979322702026593105983920e+00L,
+pilo = 1.14423774522196636802434264184180742e-17L;
+
+#include "k_cospil.c"
+#include "k_sinpil.c"
+
+long double
+sinpil(long double x)
+{
+	volatile static const double vzero = 0;
+	long double ax, s, xf, xhi, xlo;
+	uint32_t ix;
+
+	ax = fabsl(x);
+
+	if (ax < 1) {
+		if (ax < 0.25) {
+			if (ax < 0x1p-60) {
+				if (x == 0)
+					return (x);
+				ax = (double)x;
+				x -= ax;
+				s = pilo * x + pihi * x + pilo * ax
+				    + pihi * ax;
+				return (s);
+			}
+
+			s = __kernel_sinpil(ax);
+			return (copysignl(s, x));
+		}
+
+		if (ax < 0.5)
+			s = __kernel_cospil(0.5 - ax);
+		else if (ax < 0.75)
+			s = __kernel_cospil(ax - 0.5);
+		else
+			s = __kernel_sinpil(1 - ax);
+		return (copysignl(s, x));
+	}
+
+	if (ax < 0x1p112) {
+
+		xf = floorl(ax);
+
+		ax -= xf;
+
+		if (ax == 0) {
+			s = 0;
+		} else {
+			if (ax < 0.5) {
+				if (ax <= 0.25)
+					s = __kernel_sinpil(ax);
+				else
+					s = __kernel_cospil(0.5 - ax);
+			} else {
+				if (ax < 0.75)
+					s = __kernel_cospil(ax - 0.5);
+				else
+					s = __kernel_sinpil(1 - ax);
+			}
+
+			if (xf > 0x1p50) xf -= 0x1p50;
+			if (xf > 0x1p30) xf -= 0x1p30;
+			ix = (uint32_t)xf;
+			if (ix & 1) s = -s;
+		}
+		return (copysignl(s, x));
+	}
+
+	if (isinf(x) || isnan(x))
+		return (vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p112 is always an integer, so return +-0.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
+	 */
+	if (ax + 1 > 1)
+		ax = copysignl(0, x);
+	return (ax);
+}

Property changes on: lib/msun/ld128/s_sinpil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld128/s_tanpil.c
===================================================================
--- lib/msun/ld128/s_tanpil.c	(nonexistent)
+++ lib/msun/ld128/s_tanpil.c	(working copy)
@@ -0,0 +1,123 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/s_tanpi.c for implementation details.
+ *
+ * FIXME: This has not been compiled nor has it been tested for accuracy.
+ * FIXME: This should use bit twiddling.
+ * FIXME: This should use a polynomial approximation in [0,0.25], but the
+ * FIXME: increase in max ULP is probably small, so who cares.
+ */
+
+#include "math.h"
+#include "math_private.h"
+
+static const long double
+pihi = 3.14159265358979322702026593105983920e+00L,
+pilo = 1.14423774522196636802434264184180742e-17L;
+
+static inline long double
+__kernel_tanpi(long double x)
+{
+	long double hi, lo, t;
+
+	if (x < 0.25) {
+		hi = (double)x;
+		lo = x - hi;
+		lo = lo * (pilo + pihi) + hi * pilo;
+		hi *= pihi;
+		_2sumF(hi, lo);
+		t = __kernel_tanl(hi, lo, -1);
+	} else if (x > 0.25) {
+		x = 0.5 - x;
+		hi = (double)x;
+		lo = x - hi;
+		lo = lo * (pilo + pihi) + hi * pilo;
+		hi *= pihi;
+		_2sumF(hi, lo);
+		t = - __kernel_tanl(hi, lo, 1);
+	} else
+		t = 1;
+
+	return (t);
+}
+
+long double
+tanpil(long double x)
+{
+	volatile static const double vzero = 0;
+	long double ax, xf;
+	uint32_t ix;
+
+	ax = fabsl(ax);
+
+	if (ax < 1) {
+		if (ax < 0.5) {
+			if (ax < 0x1p-60) {
+				if (x == 0)
+					return (x);
+				ax = (double)x;
+				x -= ax;
+				t = pilo * x + pihi * x + pilo * ax
+				    + pihi * ax;
+				return (t);
+			}
+			t = __kernel_tanpil(ax);
+		} else if (ax == 0.5)
+			return ((ax - ax) / (ax - ax));
+		else
+			t = -__kernel_tanpil(1 - ax);
+		return (copysignl(t, x));
+	}
+
+	if (ix < 0x1p112) {
+
+		xf = floorl(ax);
+
+		ax -= xf;
+
+		if (ax < 0.5)
+			t = ax == 0 ? 0 : __kernel_tanpil(ax);
+		else if (ax == 0.5)
+			return ((ax - ax) / (ax - ax));
+		else
+			t = -__kernel_tanpil(1 - ax);
+		return (copysignl(t, x));
+	}
+
+	/* x = +-inf or nan. */
+	if (isinf(x) || isnan(x))
+		return (vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p53 is always an integer, so return +-0.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
+	 */
+	if (ax + 1 > 1)
+		ax = copysignl(0, x);
+	return (ax);
+}

Property changes on: lib/msun/ld128/s_tanpil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld80/k_cospil.c
===================================================================
--- lib/msun/ld80/k_cospil.c	(nonexistent)
+++ lib/msun/ld80/k_cospil.c	(working copy)
@@ -0,0 +1,41 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/k_cospi.c for implementation details.
+ */
+
+static inline long double
+__kernel_cospil(long double x)
+{
+	long double hi, lo;
+	hi = (float)x;
+	lo = x - hi;
+	lo = lo * (pilo + pihi) + hi * pilo;
+	hi *= pihi;
+	_2sumF(hi, lo);
+	return (__kernel_cosl(hi, lo));
+}

Property changes on: lib/msun/ld80/k_cospil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld80/k_sinpil.c
===================================================================
--- lib/msun/ld80/k_sinpil.c	(nonexistent)
+++ lib/msun/ld80/k_sinpil.c	(working copy)
@@ -0,0 +1,41 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/k_sinpi.c for implementation details.
+ */
+
+static inline long double
+__kernel_sinpil(long double x)
+{
+	long double hi, lo;
+	hi = (float)x;
+	lo = x - hi;
+	lo = lo * (pilo + pihi) + hi * pilo;
+	hi *= pihi;
+	_2sumF(hi, lo);
+	return (__kernel_sinl(hi, lo, 1));
+}

Property changes on: lib/msun/ld80/k_sinpil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld80/s_cospil.c
===================================================================
--- lib/msun/ld80/s_cospil.c	(nonexistent)
+++ lib/msun/ld80/s_cospil.c	(working copy)
@@ -0,0 +1,129 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/s_cospi.c for implementation details.
+ */
+
+#ifdef __i386__
+#include <ieeefp.h>
+#endif
+#include <stdint.h>
+
+#include "fpmath.h"
+#include "math.h"
+#include "math_private.h"
+
+static const double
+pihi = 3.1415926814079285e+00,	/* 0x400921fb 0x58000000 */
+pilo =-2.7818135228334233e-08;	/* 0xbe5dde97 0x3dcb3b3a */
+
+#include "k_cospil.c"
+#include "k_sinpil.c"
+
+long double
+cospil(long double x)
+{
+	volatile static const double vzero = 0;
+	long double ax, c;
+	uint32_t j0;
+	uint64_t lx;
+	uint16_t hx, ix;
+
+	EXTRACT_LDBL80_WORDS(hx, lx, x);
+	ix = hx & 0x7fff;
+	INSERT_LDBL80_WORDS(ax, ix, lx);
+
+	ENTERI();
+
+	if (ix < 0x3fff) {			/* |x| < 1 */
+		if (ix < 0x3ffd) {		/* |x| < 0.25 */
+			if (ix < 0x3fdd) {	/* |x| < 0x1p-34 */
+				if ((int)x == 0)
+					RETURNI(1);
+			}
+			RETURNI(__kernel_cospil(ax));
+		}
+
+		if (ix < 0x3ffe)			/* |x| < 0.5 */
+			c = __kernel_sinpil(0.5 - ax);
+		else if (lx < 0xc000000000000000ull) {	/* |x| < 0.75 */
+			if (ax == 0.5)
+				RETURNI(0);
+			c = -__kernel_sinpil(ax - 0.5);
+		} else
+			c = -__kernel_cospil(1 - ax);
+		RETURNI(c);
+	}
+
+	if (ix < 0x403e) {		/* 1 <= |x| < 0x1p63 */
+		/* Determine integer part of ax. */
+		j0 = ix - 0x3fff + 1;
+		if (j0 < 32) {
+			lx = (lx >> 32) << 32;
+			lx &= ~(((lx << 32)-1) >> j0);
+		} else {
+			uint64_t m = (uint64_t)-1 >> (j0 + 1);
+			if (lx & m) lx &= ~m;
+		}
+		INSERT_LDBL80_WORDS(x, ix, lx);
+
+		ax -= x;
+		EXTRACT_LDBL80_WORDS(ix, lx, ax);
+
+		if (ix < 0x3ffe) {			/* |x| < 0.5 */
+			if (ix < 0x3ffd)		/* |x| < 0.25 */
+				c = ix == 0 ? 1 : __kernel_cospil(ax);
+			else
+				c = __kernel_sinpil(0.5 - ax);
+
+		} else {
+			if (lx < 0xc000000000000000ull) { /* |x| < 0.75 */
+				if (ax == 0.5)
+					RETURNI(0);
+				c = -__kernel_sinpil(ax - 0.5);
+			} else
+				c = -__kernel_cospil(1 - ax);
+		}
+
+		if (j0 > 40) x -= 0x1p40;
+		if (j0 > 30) x -= 0x1p30;
+		j0 = (uint32_t)x;
+
+		RETURNI(j0 & 1 ? -c : c);
+	}
+
+	if (ix >= 0x7fff)
+		RETURNI(vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p63 is always an even integer, so return 1.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
+	 */
+	if (ax + 1 > 1)
+		ax = 1;
+	RETURNI(ax);
+}

Property changes on: lib/msun/ld80/s_cospil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld80/s_sinpil.c
===================================================================
--- lib/msun/ld80/s_sinpil.c	(nonexistent)
+++ lib/msun/ld80/s_sinpil.c	(working copy)
@@ -0,0 +1,136 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/s_sinpi.c for implementation details.
+ */
+
+#ifdef __i386__
+#include <ieeefp.h>
+#endif
+#include <stdint.h>
+
+#include "fpmath.h"
+#include "math.h"
+#include "math_private.h"
+
+static const double
+pihi = 3.1415926814079285e+00,	/* 0x400921fb 0x58000000 */
+pilo =-2.7818135228334233e-08;	/* 0xbe5dde97 0x3dcb3b3a */
+
+#include "k_cospil.c"
+#include "k_sinpil.c"
+
+long double
+sinpil(long double x)
+{
+	volatile static const double vzero = 0;
+	long double ax, s;
+	uint32_t j0;
+	uint64_t lx;
+	uint16_t hx, ix;
+
+	EXTRACT_LDBL80_WORDS(hx, lx, x);
+	ix = hx & 0x7fff;
+	INSERT_LDBL80_WORDS(ax, ix, lx);
+
+	ENTERI();
+
+	if (ix < 0x3fff) {			/* |x| < 1 */
+		if (ix < 0x3ffd) {		/* |x| < 0.25 */
+			if (ix < 0x3fdd) {	/* |x| < 0x1p-34 */
+				if (x == 0)
+					RETURNI(x);
+				ax = (float)x;
+				x -= ax;
+				s = pilo * x + pihi * x + pilo * ax
+				    + pihi * ax;
+				RETURNI(s);
+			}
+			s = __kernel_sinpil(ax);
+			RETURNI((hx & 0x8000) ? -s : s);
+		}
+
+		if (ix < 0x3ffe)			/* |x| < 0.5 */
+			s = __kernel_cospil(0.5 - ax);
+		else if (lx < 0xc000000000000000ull)	/* |x| < 0.75 */
+			s = __kernel_cospil(ax - 0.5);
+		else
+			s = __kernel_sinpil(1 - ax);
+		RETURNI((hx & 0x8000) ? -s : s);
+	}
+
+	if (ix < 0x403e) {		/* 1 <= |x| < 0x1p63 */
+		/* Determine integer part of ax. */
+		j0 = ix - 0x3fff + 1;
+		if (j0 < 32) {
+			lx = (lx >> 32) << 32;
+			lx &= ~(((lx << 32)-1) >> j0);
+		} else {
+			uint64_t m = (uint64_t)-1 >> (j0 + 1);
+			if (lx & m) lx &= ~m;
+		}
+		INSERT_LDBL80_WORDS(x, ix, lx);
+
+		ax -= x;
+		EXTRACT_LDBL80_WORDS(ix, lx, ax);
+
+		if (ix == 0) {
+			s = 0;
+		} else {
+			if (ix < 0x3ffe) {		/* |x| < 0.5 */
+				if (ix < 0x3ffd)	/* |x| < 0.25 */
+					s = __kernel_sinpil(ax);
+				else 
+					s = __kernel_cospil(0.5 - ax);
+			} else {
+							/* |x| < 0.75 */
+				if (lx < 0xc000000000000000ull)
+					s = __kernel_cospil(ax - 0.5);
+				else
+					s = __kernel_sinpil(1 - ax);
+			}
+
+			if (j0 > 40) x -= 0x1p40;
+			if (j0 > 30) x -= 0x1p30;
+			j0 = (uint32_t)x;
+			if (j0 & 1) s = -s;
+		}
+		RETURNI((hx & 0x8000) ? -s : s);
+	}
+
+	/* x = +-inf or nan. */
+	if (ix >= 0x7fff)
+		RETURNI(vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p63 is always an integer, so return +-0.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
+	 */
+	if (ax + 1 > 1)
+		ax = copysignl(0, x);
+	RETURNI(ax);
+}

Property changes on: lib/msun/ld80/s_sinpil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld80/s_tanpil.c
===================================================================
--- lib/msun/ld80/s_tanpil.c	(nonexistent)
+++ lib/msun/ld80/s_tanpil.c	(working copy)
@@ -0,0 +1,139 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/s_tanpi.c for implementation details.
+ */
+
+#ifdef __i386__
+#include <ieeefp.h>
+#endif
+#include <stdint.h>
+
+#include "fpmath.h"
+#include "math.h"
+#include "math_private.h"
+
+static const double
+pihi =  3.1415926814079285e+00,	/* 0x400921fb 0x58000000 */
+pilo = -2.7818135228334233e-08;	/* 0xbe5dde97 0x3dcb3b3a */
+
+static inline long double
+__kernel_tanpil(long double x)
+{
+	long double hi, lo, t;
+
+	if (x < 0.25) {
+		hi = (float)x;
+		lo = x - hi;
+		lo = lo * (pilo + pihi) + hi * pilo;
+		hi *= pihi;
+		_2sumF(hi, lo);
+		t = __kernel_tanl(hi, lo, -1);
+	} else if (x > 0.25) {
+		x = 0.5 - x;
+		hi = (float)x;
+		lo = x - hi;
+		lo = lo * (pilo + pihi) + hi * pilo;
+		hi *= pihi;
+		_2sumF(hi, lo);
+		t = - __kernel_tanl(hi, lo, 1);
+	} else
+		t = 1;
+
+	return (t);
+}
+
+long double
+tanpil(long double x)
+{
+	volatile static const double vzero = 0;
+	long double ax, t;
+	uint32_t j0;
+	uint64_t lx;
+	uint16_t hx, ix;
+
+	EXTRACT_LDBL80_WORDS(hx, lx, x);
+	ix = hx & 0x7fff;
+	INSERT_LDBL80_WORDS(ax, ix, lx);
+
+	ENTERI();
+
+	if (ix < 0x3fff) {			/* |x| < 1 */
+		if (ix < 0x3ffe) {		/* |x| < 0.5 */
+			if (ix < 0x3fdd) {	/* |x| < 0x1p-34 */
+				if (x == 0)
+					RETURNI(x);
+				ax = (float)x;
+				x -= ax;
+				t = pilo * x + pihi * x + pilo * ax
+				    + pihi * ax;
+				RETURNI(t);
+			}
+			t = __kernel_tanpil(ax);
+		} else if (ax == 0.5)
+			RETURNI((ax - ax) / (ax - ax));
+		else
+			t = -__kernel_tanpil(1 - ax);
+		RETURNI((hx & 0x8000) ? -t : t);
+	}
+
+	if (ix < 0x403e) {		/* 1 <= |x| < 0x1p63 */
+		/* Determine integer part of ax. */
+		j0 = ix - 0x3fff + 1;
+		if (j0 < 32) {
+			lx = (lx >> 32) << 32;
+			lx &= ~(((lx << 32)-1) >> j0);
+		} else {
+			uint64_t m = (uint64_t)-1 >> (j0 + 1);
+			if (lx & m) lx &= ~m;
+		}
+		INSERT_LDBL80_WORDS(x, ix, lx);
+
+		ax -= x;
+		EXTRACT_LDBL80_WORDS(ix, lx, ax);
+
+		if (ix < 0x3ffe)		/* |x| < 0.5 */
+			t = ax == 0 ? 0 : __kernel_tanpil(ax);
+		else if (ax == 0.5)
+			RETURNI((ax - ax) / (ax - ax));
+		else
+			t = -__kernel_tanpil(1 - ax);
+		RETURNI((hx & 0x8000) ? -t : t);
+	}
+
+	/* x = +-inf or nan. */
+	if (ix >= 0x7fff)
+		RETURNI(vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p63 is always an integer, so return +-0.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
+	 */
+	if (ax + 1 > 1)
+		ax = copysignl(0, x);
+	RETURNI(ax);
+}

Property changes on: lib/msun/ld80/s_tanpil.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/man/cospi.3
===================================================================
--- lib/msun/man/cospi.3	(nonexistent)
+++ lib/msun/man/cospi.3	(working copy)
@@ -0,0 +1,111 @@
+.\" Copyright (c) 2017 Steven G. Kargl <kargl@FreeBSD.org>
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd April 1, 2017
+.Dt COSPI 3
+.Os
+.Sh NAME
+.Nm cospi ,
+.Nm cospif ,
+.Nm cospil
+.Nd half\(encycle cosine functions
+.Sh LIBRARY
+.Lb libm
+.Sh SYNOPSIS
+.In math.h
+.Ft double
+.Fn cospi "double x"
+.Ft float
+.Fn cospif "float x"
+.Ft long double
+.Fn cospil "long double x"
+.Sh DESCRIPTION
+The
+.Fn cospi ,
+.Fn cospif ,
+and
+.Fn cospil
+functions compute the cosine of
+.Fa "\(*p \(mu x" .
+and measure angles in half-cycles.
+.Sh RETURN VALUES
+The
+.Fn cospi ,
+.Fn cospif ,
+and
+.Fn cospil
+functions returns
+.Fn cos "\(*p \(mu x" .
+If \*(Bax\*(Ba \*(Ge 2^(p - 1)
+where p is the floating\(enpoint precision of 
+.Ar x ,
+then the returned value is 1 and it has no significance.
+.Sh SPECIAL VALUES
+.Bl -tag
+.It
+.Fn cospi \*(Pm0
+returns 1.
+.It
+.Fn cospi \*(Pmn/2
+returns 0 for positive integers
+.Ar n .
+.It
+.Fn cospi n
+returns 1 for even integers
+.Ar n .
+.It
+.Fn cospi n
+returns \-1 for odd integers
+.Ar n .
+.It
+.Fn cospi \*(Pm\(if
+return an \*(Na and raises an FE_INVALID exception.
+.It
+.Fn cospi \*(Na
+return an \*(Na and raises an FE_INVALID exception.
+.El
+.Sh SEE ALSO
+.Xr cos 3 ,
+.Xr fenv 3 ,
+.Xr math 3 ,
+.Xr sin 3 ,
+.Xr sinpi 3 ,
+.Xr tan 3 ,
+.Xr tanpi 3
+.Sh AUTHORS
+The half\(encycle trignometric functions were written by
+.An Steven G. Kargl Aq Mt kargl@FreeBSD.org .
+.Sh STANDARDS
+These functions conform to
+IEEE Std 754\(tm\(en2008 ,
+\(dqIEEE Standard for Floating-Point Arithmetic\(dq
+and to
+ISO/IEC TS 18661-4 , 
+\(dqInformation technology \(em Programming languages, their environments,
+and system software interfaces \(em Floating\(enpoint extensions for
+C\(dq \(em Part 4: Supplementary functions.
+
+

Property changes on: lib/msun/man/cospi.3
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/man/sinpi.3
===================================================================
--- lib/msun/man/sinpi.3	(nonexistent)
+++ lib/msun/man/sinpi.3	(working copy)
@@ -0,0 +1,102 @@
+.\" Copyright (c) 2017 Steven G. Kargl <kargl@FreeBSD.org>
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd April 1, 2017
+.Dt SINPI 3
+.Os
+.Sh NAME
+.Nm sinpi ,
+.Nm sinpif ,
+.Nm sinpil
+.Nd half\(encycle sine functions
+.Sh LIBRARY
+.Lb libm
+.Sh SYNOPSIS
+.In math.h
+.Ft double
+.Fn sinpi "double x"
+.Ft float
+.Fn sinpif "float x"
+.Ft long double
+.Fn sinpil "long double x"
+.Sh DESCRIPTION
+The
+.Fn sinpi ,
+.Fn sinpif ,
+and
+.Fn sinpil
+functions compute the sine of
+.Fa "\(*p \(mu x" .
+and measure angles in half-cycles.
+.Sh RETURN VALUES
+The
+.Fn sinpi ,
+.Fn sinpif ,
+and
+.Fn sinpil
+functions returns
+.Fn sin "\(*p \(mu x" .
+If \*(Bax\*(Ba \*(Ge 2^(p - 1)
+where p is the floating\(enpoint precision of 
+.Ar x ,
+then the returned value is \*(Pm0 and it has no significance.
+.Sh SPECIAL VALUES
+.Bl -tag
+.It
+.Fn sinpi \*(Pm0
+returns \*(Pm0.
+.It
+.Fn sinpi \*(Pmn
+returns \*(Pm0 for positive integers
+.Ar n .
+.It
+.Fn sinpi \*(Pm\(if
+return an \*(Na and raises an FE_INVALID exception.
+.It
+.Fn sinpi \*(Na
+return an \*(Na and raises an FE_INVALID exception.
+.El
+.Sh SEE ALSO
+.Xr cos 3 ,
+.Xr cospi 3 ,
+.Xr fenv 3 ,
+.Xr math 3 ,
+.Xr sin 3 ,
+.Xr tan 3 ,
+.Xr tanpi 3
+.Sh AUTHORS
+The half\(encycle trignometric functions were written by
+.An Steven G. Kargl Aq Mt kargl@FreeBSD.org .
+.Sh STANDARDS
+These functions conform to
+IEEE Std 754\(tm\(en2008 ,
+\(dqIEEE Standard for Floating-Point Arithmetic\(dq
+and to
+ISO/IEC TS 18661-4 , 
+\(dqInformation technology \(em Programming languages, their environments,
+and system software interfaces \(em Floating\(enpoint extensions for
+C\(dq \(em Part 4: Supplementary functions.
+

Property changes on: lib/msun/man/sinpi.3
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/man/tanpi.3
===================================================================
--- lib/msun/man/tanpi.3	(nonexistent)
+++ lib/msun/man/tanpi.3	(working copy)
@@ -0,0 +1,106 @@
+.\" Copyright (c) 2017 Steven G. Kargl <kargl@FreeBSD.org>
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd April 1, 2017
+.Dt TANPI 3
+.Os
+.Sh NAME
+.Nm tanpi ,
+.Nm tanpif ,
+.Nm tanpil
+.Nd half\(encycle tangent functions
+.Sh LIBRARY
+.Lb libm
+.Sh SYNOPSIS
+.In math.h
+.Ft double
+.Fn tanpi "double x"
+.Ft float
+.Fn tanpif "float x"
+.Ft long double
+.Fn tanpil "long double x"
+.Sh DESCRIPTION
+The
+.Fn tanpi ,
+.Fn tanpif ,
+and
+.Fn tanpil
+functions compute the tangent of
+.Fa "\(*p \(mu x" 
+and measure angles in half-cycles.
+.Sh RETURN VALUES
+The
+.Fn tanpi ,
+.Fn tanpif ,
+and
+.Fn tanpil
+functions returns
+.Fn tan "\(*p \(mu x" .
+If \*(Bax\*(Ba \*(Ge 2^(p - 1)
+where p is the floating\(enpoint precision of 
+.Ar x ,
+then the returned value is \*(Pm0 and it has no significance.
+.Sh SPECIAL VALUES
+.Bl -tag
+.It
+.Fn tanpi \*(Pm0
+returns \*(Pm0.
+.It
+.Fn tanpi \*(Pmn
+returns \*(Pm0 for positive integers
+.Ar n .
+.It
+.Fn tanpi \*(Pmn/2
+returns \*(Na for n > 0 and raises an FE_INVALID exception.
+.It
+.Fn tanpi \*(Pm\(if
+return an \*(Na and raises an FE_INVALID exception.
+.It
+.Fn tanpi \*(Na
+return an \*(Na and raises an FE_INVALID exception.
+.El
+.Sh SEE ALSO
+.Xr cos 3 ,
+.Xr cospi 3 ,
+.Xr fenv 3 ,
+.Xr math 3 ,
+.Xr sin 3 ,
+.Xr sinpi 3 ,
+.Xr tan 3 ,
+.Sh AUTHORS
+The half\(encycle trignometric functions were written by
+.An Steven G. Kargl Aq Mt kargl@FreeBSD.org .
+.Sh STANDARDS
+These functions conform to
+IEEE Std 754\(tm\(en2008 ,
+\(dqIEEE Standard for Floating-Point Arithmetic\(dq
+and to
+ISO/IEC TS 18661-4 , 
+\(dqInformation technology \(em Programming languages, their environments,
+and system software interfaces \(em Floating\(enpoint extensions for
+C\(dq \(em Part 4: Supplementary functions.
+
+

Property changes on: lib/msun/man/tanpi.3
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/src/k_cospi.c
===================================================================
--- lib/msun/src/k_cospi.c	(nonexistent)
+++ lib/msun/src/k_cospi.c	(working copy)
@@ -0,0 +1,45 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * The basic kernel for x in [0,0.25].  To exploit the kernel for cos(x),
+ * the argument to __kernel_cospi() must be multiply by pi.  As pi is an
+ * irrational number, this allows the splitting of pi*x into high and low
+ * parts.
+ */
+
+static inline double
+__kernel_cospi(double x)
+{
+	double hi, lo;
+	hi = (float)x;
+	lo = x - hi;
+	lo = lo * (pilo + pihi) + hi * pilo;
+	hi *= pihi;
+	_2sumF(hi, lo);
+	return (__kernel_cos(hi, lo));
+}
+

Property changes on: lib/msun/src/k_cospi.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/src/k_sinpi.c
===================================================================
--- lib/msun/src/k_sinpi.c	(nonexistent)
+++ lib/msun/src/k_sinpi.c	(working copy)
@@ -0,0 +1,44 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * The basic kernel for x in [0,0.25].  To exploit the kernel for sin(x),
+ * the argument to __kernel_sinpi() must be multiply by pi.  As pi is an
+ * irrational number, this allows the splitting of pi*x into high and low
+ * parts.
+ */
+
+static inline double
+__kernel_sinpi(double x)
+{
+	double hi, lo;
+	hi = (float)x;
+	lo = x - hi;
+	lo = lo * (pilo + pihi) + hi * pilo;
+	hi *= pihi;
+	_2sumF(hi, lo);
+	return (__kernel_sin(hi, lo, 1));
+}

Property changes on: lib/msun/src/k_sinpi.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/src/math.h
===================================================================
--- lib/msun/src/math.h	(revision 318383)
+++ lib/msun/src/math.h	(working copy)
@@ -500,6 +500,15 @@
 
 #if __BSD_VISIBLE
 long double	lgammal_r(long double, int *);
+double		cospi(double);
+float		cospif(float);
+long double	cospil(long double);
+double		sinpi(double);
+float		sinpif(float);
+long double	sinpil(long double);
+double		tanpi(double);
+float		tanpif(float);
+long double	tanpil(long double);
 #endif
 
 __END_DECLS
Index: lib/msun/src/s_cospi.c
===================================================================
--- lib/msun/src/s_cospi.c	(nonexistent)
+++ lib/msun/src/s_cospi.c	(working copy)
@@ -0,0 +1,152 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * cospi(x) computes cos(pi*x) without multiplication by pi (almost).  First,
+ * note that cospi(-x) = cospi(x), so the algorithm considers only |x|.  The
+ * method used depends on the magnitude of x.
+ *
+ * 1. For small |x|, cospi(x) = 1 with FE_INEXACT raised where a sloppy
+ *    threshold is used.  The threshold is |x| < 0x1pN with N = -(P/2+M).
+ *    P is the precision of the floating-point type and M = 2 to 4.
+ *
+ * 2. For |x| < 1, argument reduction is not required and sinpi(x) is 
+ *    computed by calling a kernel that leverages the kernels for sin(x)
+ *    ans cos(x).  See k_sinpi.c and k_cospi.c for details.
+ *
+ * 3. For 1 <= |x| < 0x1p(P-1), argument reduction is required where
+ *    |x| = j0 + r with j0 an integer and the remainder r satisfies
+ *    0 <= r < 1.  With the given domain, a simplified inline floor(x)
+ *    is used.  Also, note the following identity
+ *
+ *    cospi(x) = cos(pi*(j0+r))
+ *             = cos(pi*j0) * cos(pi*r) - sin(pi*j0) * sin(pi*r)
+ *             = cos(pi*j0) * cos(pi*r)
+ *             = +-cospi(r)
+ *
+ *    If j0 is even, then cos(pi*j0) = 1. If j0 is odd, then cos(pi*j0) = -1.
+ *    cospi(r) is then computed via an appropriate kernel.
+ *
+ * 4. For |x| >= 0x1p(P-1), |x| is integral and cospi(x) = 1.
+ *
+ * 5. Special cases:
+ *
+ *    cospi(+-0) = 1.
+ *    cospi(n.5) = 0 for n an integer.
+ *    cospi(+-inf) = nan.  Raises the "invalid" floating-point exception.
+ *    cospi(nan) = nan.  Raises the "invalid" floating-point exception.
+ */
+
+#include "math.h"
+#include "math_private.h"
+
+static const double
+pihi = 3.1415926814079285e+00,	/* 0x400921fb 0x58000000 */
+pilo =-2.7818135228334233e-08;	/* 0xbe5dde97 0x3dcb3b3a */
+
+#include "k_cospi.c"
+#include "k_sinpi.c"
+
+double
+cospi(double x)
+{
+	volatile static const double vzero = 0;
+	double ax, c;
+	uint32_t hx, ix, j0, lx;
+
+	EXTRACT_WORDS(hx, lx, x);
+	ix = hx & 0x7fffffff;
+	INSERT_WORDS(ax, ix, lx);
+
+	if (ix < 0x3ff00000) {			/* |x| < 1 */
+		if (ix < 0x3fd00000) {		/* |x| < 0.25 */
+			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
+				if ((int)ax == 0)
+					return (1);
+			}
+			return (__kernel_cospi(ax));
+		}
+
+		if (ix < 0x3fe00000)		/* |x| < 0.5 */
+			c = __kernel_sinpi(0.5 - ax);
+		else if (ix < 0x3fe80000){	/* |x| < 0.75 */
+			if (ax == 0.5)
+				return (0);
+			c = -__kernel_sinpi(ax - 0.5);
+		} else
+			c = -__kernel_cospi(1 - ax);
+		return (c);
+	}
+
+	if (ix < 0x43300000) {		/* 1 <= |x| < 0x1p52 */
+		/* Determine integer part of ax. */
+		j0 = ((ix >> 20) & 0x7ff) - 0x3ff;
+		if (j0 < 20) {
+			ix &= ~(0x000fffff >> j0);
+			lx = 0;
+		} else {
+			lx &= ~((uint32_t)0xffffffff >> (j0 - 20));
+		}
+		INSERT_WORDS(x, ix, lx);
+
+		ax -= x;
+		EXTRACT_WORDS(ix, lx, ax);
+
+
+		if (ix < 0x3fe00000) {		/* |x| < 0.5 */
+			if (ix < 0x3fd00000)	/* |x| < 0.25 */
+				c = ix == 0 ? 1 : __kernel_cospi(ax);
+			else 
+				c = __kernel_sinpi(0.5 - ax);
+		} else {
+			if (ix < 0x3fe80000) {	/* |x| < 0.75 */
+				if (ax == 0.5)
+					return (0);
+				c = -__kernel_sinpi(ax - 0.5);
+			} else
+				c = -__kernel_cospi(1 - ax);
+		}
+
+		if (j0 > 30) x -= 0x1p30;
+		j0 = (uint32_t)x;
+		return (j0 & 1 ? -c : c);
+	}
+
+	if (ix >= 0x7f800000)
+		return (vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p52 is always an even integer, so return 1.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID?
+	 */
+	if (ax + 1 > 1)
+		ax = 1;
+	return (ax);
+}
+
+#if LDBL_MANT_DIG == 53
+__weak_reference(cospi, cospil);
+#endif

Property changes on: lib/msun/src/s_cospi.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/src/s_cospif.c
===================================================================
--- lib/msun/src/s_cospif.c	(nonexistent)
+++ lib/msun/src/s_cospif.c	(working copy)
@@ -0,0 +1,111 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/s_cospi.c for implementation details.
+ */
+#define	INLINE_KERNEL_SINDF
+#define	INLINE_KERNEL_COSDF
+
+#include "math.h"
+#include "math_private.h"
+#include "k_cosf.c"
+#include "k_sinf.c"
+
+#define  __kernel_cospif(x)   ((float)__kernel_cosdf(M_PI * (x)))
+#define  __kernel_sinpif(x)   ((float)__kernel_sindf(M_PI * (x)))
+
+float
+cospif(float x)
+{
+	volatile static const float vzero = 0;
+	float ax, c;
+	uint32_t ix, j0;
+
+	GET_FLOAT_WORD(ix, x);
+	ix = ix & 0x7fffffff;
+	SET_FLOAT_WORD(ax, ix);
+
+	if (ix < 0x3f800000) {			/* |x| < 1 */
+		if (ix < 0x3e800000) {		/* |x| < 0.25 */
+			if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
+				/* Raise inexact iff != 0. */
+				if ((int)ax == 0)
+					return (1);
+			}
+			return (__kernel_cospif(ax));
+		}
+
+		if (ix < 0x3f000000)		/* |x| < 0.5 */
+			c = __kernel_sinpif(0.5f - ax);
+		else if (ix < 0x3f400000) {	/* |x| < 0.75 */
+			if (ix == 0x3f000000)
+				return (0);
+			c = -__kernel_sinpif(ax - 0.5f);
+		} else
+			c = -__kernel_cospif(1 - ax);
+		return (c);
+	}
+
+	if (ix < 0x4b000000) {			/* 1 <= |x| < 0x1p23 */
+		/* Determine integer part of ax. */
+		j0 = ((ix >> 23) & 0xff) - 0x7f;
+		ix &= ~(0x007fffff >> j0);
+		SET_FLOAT_WORD(x, ix);
+
+		ax -= x;
+		GET_FLOAT_WORD(ix, ax);
+
+		if (ix < 0x3f000000) {		/* |x| < 0.5 */
+			if (ix < 0x3e800000)	/* |x| < 0.25 */
+				c = ix == 0 ? 1 : __kernel_cospif(ax);
+			else
+				c = __kernel_sinpif(0.5f - ax);
+		} else {
+			if (ix < 0x3f400000) {	/* |x| < 0.75 */
+				if (ix == 0x3f000000)
+					return (0);
+				c = -__kernel_sinpif(ax - 0.5f);
+			} else
+				c = -__kernel_cospif(1 - ax);
+		}
+
+		j0 = (uint32_t)x;
+		return (j0 & 1 ? -c : c);
+	}
+
+	/* x = +-inf or nan. */
+	if (ix >= 0x7f800000)
+		return (vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p23 is always an even integer, so return 1.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
+	 */
+	if (ax + 1 > 1)
+		ax = 1;
+	return (ax);
+}

Property changes on: lib/msun/src/s_cospif.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/src/s_sinpi.c
===================================================================
--- lib/msun/src/s_sinpi.c	(nonexistent)
+++ lib/msun/src/s_sinpi.c	(working copy)
@@ -0,0 +1,163 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * sinpi(x) computes sin(pi*x) without multiplication by pi (almost).  First,
+ * note that sinpi(-x) = -sinpi(x), so the algorithm considers only |x| and
+ * includes reflection symmetry by considering the sign of x on output.  The
+ * method used depends on the magnitude of x.
+ *
+ * 1. For small |x|, sinpi(x) = pi * x where a sloppy threshold is used.  The
+ *    threshold is |x| < 0x1pN with N = -(P/2+M).  P is the precision of the
+ *    floating-point type and M = 2 to 4.  To achieve high accuracy, pi is 
+ *    decomposed into high and low parts with the high part containing a
+ *    number of trailing zero bits.  x is also split into high and low parts.
+ *
+ * 2. For |x| < 1, argument reduction is not required and sinpi(x) is 
+ *    computed by calling a kernel that leverages the kernels for sin(x)
+ *    ans cos(x).  See k_sinpi.c and k_cospi.c for details.
+ *
+ * 3. For 1 <= |x| < 0x1p(P-1), argument reduction is required where
+ *    |x| = j0 + r with j0 an integer and the remainder r satisfies
+ *    0 <= r < 1.  With the given domain, a simplified inline floor(x)
+ *    is used.  Also, note the following identity
+ *
+ *    sinpi(x) = sin(pi*(j0+r))
+ *             = sin(pi*j0) * cos(pi*r) + cos(pi*j0) * sin(pi*r)
+ *             = cos(pi*j0) * sin(pi*r)
+ *             = +-sinpi(r)
+ *
+ *    If j0 is even, then cos(pi*j0) = 1. If j0 is odd, then cos(pi*j0) = -1.
+ *    sinpi(r) is then computed via an appropriate kernel.
+ *
+ * 4. For |x| >= 0x1p(P-1), |x| is integral and sinpi(x) = copysign(0,x).
+ *
+ * 5. Special cases:
+ *
+ *    sinpi(+-0) = +-0
+ *    sinpi(+-n) = +-0, for positive integers n.
+ *    sinpi(+-inf) = nan.  Raises the "invalid" floating-point exception.
+ *    sinpi(nan) = nan.  Raises the "invalid" floating-point exception.
+ */
+
+#include "math.h"
+#include "math_private.h"
+
+static const double
+pihi = 3.1415926814079285e+00,	/* 0x400921fb 0x58000000 */
+pilo =-2.7818135228334233e-08;	/* 0xbe5dde97 0x3dcb3b3a */
+
+#include "k_cospi.c"
+#include "k_sinpi.c"
+
+double
+sinpi(double x)
+{
+	volatile static const double vzero = 0;
+	double ax, s;
+	uint32_t hx, ix, j0, lx;
+
+	EXTRACT_WORDS(hx, lx, x);
+	ix = hx & 0x7fffffff;
+	INSERT_WORDS(ax, ix, lx);
+
+	if (ix < 0x3ff00000) {			/* |x| < 1 */
+		if (ix < 0x3fd00000) {		/* |x| < 0.25 */
+			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
+				if (x == 0)
+					return (x);
+				INSERT_WORDS(ax, hx, 0);
+				x -= ax;
+				s = pilo * x + pihi * x + pilo * ax
+				    + pihi * ax;
+				return (s);
+			}
+
+			s = __kernel_sinpi(ax);
+			return ((hx & 0x80000000) ? -s : s);
+		}
+
+		if (ix < 0x3fe00000)		/* |x| < 0.5 */
+			s = __kernel_cospi(0.5 - ax);
+		else if (ix < 0x3fe80000)	/* |x| < 0.75 */
+			s = __kernel_cospi(ax - 0.5);
+		else
+			s = __kernel_sinpi(1 - ax);
+		return ((hx & 0x80000000) ? -s : s);
+	}
+
+	if (ix < 0x43300000) {			/* 1 <= |x| < 0x1p52 */
+		/* Determine integer part of ax. */
+		j0 = ((ix >> 20) & 0x7ff) - 0x3ff;
+		if (j0 < 20) {
+			ix &= ~(0x000fffff >> j0);
+			lx = 0;
+		} else {
+			lx &= ~((uint32_t)0xffffffff >> (j0 - 20));
+		}
+		INSERT_WORDS(x, ix, lx);
+
+		ax -= x;
+		EXTRACT_WORDS(ix, lx, ax);
+
+		if (ix == 0)
+			s = 0;
+		else {
+			if (ix < 0x3fe00000) {		/* |x| < 0.5 */
+				if (ix < 0x3fd00000)	/* |x| < 0.25 */
+					s = __kernel_sinpi(ax);
+				else 
+					s = __kernel_cospi(0.5 - ax);
+			} else {
+				if (ix < 0x3fe80000)	/* |x| < 0.75 */
+					s = __kernel_cospi(ax - 0.5);
+				else
+					s = __kernel_sinpi(1 - ax);
+			}
+
+			if (j0 > 30) x -= 0x1p30;
+			j0 = (uint32_t)x;
+			if (j0 & 1) s = -s;
+		}
+
+		return ((hx & 0x80000000) ? -s : s);
+	}
+
+	if (ix >= 0x7f800000)
+		return (vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p52 is always an integer, so return +-0.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID?
+	 */
+	if (ax + 1 > 1)
+		ax = copysign(0, x);
+	return (ax);
+}
+
+#if LDBL_MANT_DIG == 53
+__weak_reference(sinpi, sinpil);
+#endif

Property changes on: lib/msun/src/s_sinpi.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/src/s_sinpif.c
===================================================================
--- lib/msun/src/s_sinpif.c	(nonexistent)
+++ lib/msun/src/s_sinpif.c	(working copy)
@@ -0,0 +1,122 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/s_sinpi.c for implementation details.
+ */
+
+#define	INLINE_KERNEL_SINDF
+#define	INLINE_KERNEL_COSDF
+
+#include "math.h"
+#include "math_private.h"
+#include "k_cosf.c"
+#include "k_sinf.c"
+
+#define  __kernel_cospif(x)   ((float)__kernel_cosdf(M_PI * (x)))
+#define  __kernel_sinpif(x)   ((float)__kernel_sindf(M_PI * (x)))
+
+static const float pihi =  3.14160156e+00f;	/* 0x40491000 */
+static const float pilo = -8.90890988e-06f;	/* 0xb715777a */
+
+float
+sinpif(float x)
+{
+	volatile static const float vzero = 0;
+	float ax, s;
+	uint32_t hx, ix, j0;
+
+	GET_FLOAT_WORD(hx, x);
+	ix = hx & 0x7fffffff;
+	SET_FLOAT_WORD(ax, ix);
+
+	if (ix < 0x3f800000) {			/* |x| < 1 */
+		if (ix < 0x3e800000) {		/* |x| < 0.25 */
+	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
+				if (x == 0)
+					return (x);
+				SET_FLOAT_WORD(ax, hx & 0xffff0000);
+				x -= ax;
+				s = pilo * x + pihi * x + pilo * ax
+				    + pihi * ax;
+				return (s);
+			}
+
+			s = __kernel_sinpif(ax);
+			return ((hx & 0x80000000) ? -s : s);
+		}
+
+		if (ix < 0x3f000000)		/* |x| < 0.5 */
+			s = __kernel_cospif(0.5f - ax);
+		else if (ix < 0x3f400000)	/* |x| < 0.75 */
+			s = __kernel_cospif(ax - 0.5f);
+		else
+			s = __kernel_sinpif(1 - ax);
+		return ((hx & 0x80000000) ? -s : s);
+	}
+
+	if (ix < 0x4b000000) {			/* 1 <= |x| < 0x1p23 */
+		/* Determine integer part of ax. */
+		j0 = ((ix >> 23) & 0xff) - 0x7f;
+		ix &= ~(0x007fffff >> j0);
+		SET_FLOAT_WORD(x, ix);
+
+		ax -= x;
+		GET_FLOAT_WORD(ix, ax);
+
+		if (ix == 0)
+			s = 0;
+		else {
+			if (ix < 0x3f000000) {		/* |x| < 0.5 */
+				if (ix < 0x3e800000)	/* |x| < 0.25 */
+					s = __kernel_sinpif(ax);
+				else
+					s = __kernel_cospif(0.5f - ax);
+			} else {
+				if (ix < 0x3f400000)	/* |x| < 0.75 */
+					s = __kernel_cospif(ax - 0.5f);
+				else
+					s = __kernel_sinpif(1 - ax);
+			}
+
+			j0 = (uint32_t)x;
+			s = (j0 & 1) ? -s : s;
+		}
+		return ((hx & 0x80000000) ? -s : s);
+	}
+
+	/* x = +-inf or nan. */
+	if (ix >= 0x7f800000)
+		return (vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p23 is always an integer, so return +-0.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID?
+	 */
+	if (ax + 1 > 1)
+		ax = copysignf(0, x);
+	return (ax);
+}

Property changes on: lib/msun/src/s_sinpif.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/src/s_tanpi.c
===================================================================
--- lib/msun/src/s_tanpi.c	(nonexistent)
+++ lib/msun/src/s_tanpi.c	(working copy)
@@ -0,0 +1,172 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * tanpi(x) computes tan(pi*x) without multiplication by pi (almost).  First,
+ * note that tanpi(-x) = -tanpi(x), so the algorithm considers only |x| and
+ * includes reflection symmetry by considering the sign of x on output.  The
+ * method used depends on the magnitude of x.
+ *
+ * 1. For small |x|, tanpi(x) = pi * x where a sloppy threshold is used.  The
+ *    threshold is |x| < 0x1pN with N = -(P/2+M).  P is the precision of the
+ *    floating-point type and M = 2 to 4.  To achieve high accuracy, pi is
+ *    decomposed into high and low parts with the high part containing a
+ *    number of trailing zero bits.  x is also split into high and low parts.
+ *
+ * 2. For |x| < 1, argument reduction is not required and tanpi(x) is 
+ *    computed by a direct call to a kernel, which uses the kernel for
+ *    tan(x).  See below.
+ *
+ * 3. For 1 <= |x| < 0x1p(P-1), argument reduction is required where
+ *    |x| = j0 + r with j0 an integer and the remainder r satisfies
+ *    0 <= r < 1.  With the given domain, a simplified inline floor(x)
+ *    is used.  Also, note the following identity
+ *
+ *                                   tan(pi*j0) + tan(pi*r)
+ *    tanpi(x) = tan(pi*(j0+r)) = ---------------------------- = tanpi(r)
+ *                                 1 - tan(pi*j0) * tan(pi*r)
+ * 
+ *    So, after argument reduction, the kernel is again invoked.
+ *
+ * 4. For |x| >= 0x1p(P-1), |x| is integral and tanpi(x) = copysign(0,x).
+ *
+ * 5. Special cases:
+ *
+ *    tanpi(+-0) = +-0
+ *    tanpi(+-n) = +-0, for positive integers n.
+ *    tanpi(+-n+1/4) = +-1, for positive integers n.
+ *    tanpi(+-n+1/2) = NaN, for positive integers n.
+ *    tanpi(+-inf) = NaN.  Raises the "invalid" floating-point exception.
+ *    tanpi(nan) = NaN.  Raises the "invalid" floating-point exception.
+ */
+
+#include "math.h"
+#include "math_private.h"
+
+static const double 
+pihi  =  3.1415926814079285e+00,	/* 0x400921fb 0x58000000 */
+pilo  = -2.7818135228334233e-08;	/* 0xbe5dde97 0x3dcb3b3a */
+
+/*
+ * The kernel for tanpi(x) multiplies x by an 80-bit approximation of
+ * pi, where the hi and lo parts are used with with kernel for tan(x).
+ */
+static inline double
+__kernel_tanpi(double x)
+{
+	double hi, lo, t;
+
+	if (x < 0.25) {
+		hi = (float)x;
+		lo = x - hi;
+		lo = lo * (pilo + pihi) + hi * pilo;
+		hi *= pihi;
+		_2sumF(hi, lo);
+		t = __kernel_tan(hi, lo, 1);
+	} else if (x > 0.25) {
+		x = 0.5 - x;
+		hi = (float)x;
+		lo = x - hi;
+		lo = lo * (pilo + pihi) + hi * pilo;
+		hi *= pihi;
+		_2sumF(hi, lo);
+		t = - __kernel_tan(hi, lo, -1);
+	} else
+		t = 1;
+
+	return (t);
+}
+
+double
+tanpi(double x)
+{
+	volatile static const double vzero = 0;
+	double ax, t;
+	uint32_t hx, ix, j0, lx;
+
+	EXTRACT_WORDS(hx, lx, x);
+	ix = hx & 0x7fffffff;
+	INSERT_WORDS(ax, ix, lx);
+
+	if (ix < 0x3ff00000) {			/* |x| < 1 */
+		if (ix < 0x3fe00000) {		/* |x| < 0.5 */
+			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
+				if (x == 0)
+					return (x);
+				ax = (float)x;
+				x -= ax;
+				t = pilo * x + pihi * x + pilo * ax
+				    + pihi * ax;
+				return (t);
+			}
+			t = __kernel_tanpi(ax);
+		} else if (ax == 0.5)
+			return ((ax - ax) / (ax - ax));
+		else
+			t = - __kernel_tanpi(1 - ax);
+		return ((hx & 0x80000000) ? -t : t);
+	}
+
+	if (ix < 0x43300000) {		/* 1 <= |x| < 0x1p52 */
+		/* Determine integer part of ax. */
+		j0 = ((ix >> 20) & 0x7ff) - 0x3ff;
+		if (j0 < 20) {
+			ix &= ~(0x000fffff >> j0);
+			lx = 0;
+		} else {
+			lx &= ~(((uint32_t)(0xffffffff)) >> (j0 - 20));
+		}
+		INSERT_WORDS(x,ix,lx);
+
+		ax -= x;
+		EXTRACT_WORDS(ix, lx, ax);
+
+		if (ix < 0x3fe00000)		/* |x| < 0.5 */
+			t = ax == 0 ? 0 : __kernel_tanpi(ax);
+		else if (ax == 0.5)
+			return ((ax - ax) / (ax - ax));
+		else
+			t = - __kernel_tanpi(1 - ax);
+
+		return ((hx & 0x80000000) ? -t : t);
+	}
+
+	/* x = +-inf or nan. */
+	if (ix >= 0x7f800000)
+		return (vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p52 is always an integer, so return +-0.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID?
+	 */
+	if (ax + 1 > 1)
+		ax = copysign(0, x);
+	return (ax);
+}
+
+#if LDBL_MANT_DIG == 53
+__weak_reference(tanpi, tanpil);
+#endif

Property changes on: lib/msun/src/s_tanpi.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/src/s_tanpif.c
===================================================================
--- lib/msun/src/s_tanpif.c	(nonexistent)
+++ lib/msun/src/s_tanpif.c	(working copy)
@@ -0,0 +1,115 @@
+/*-
+ * Copyright (c) 2017 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * See ../src/s_tanpi.c for implementation details.
+ */
+
+#define INLINE_KERNEL_TANDF
+
+#include "math.h"
+#include "math_private.h"
+#include "k_tanf.c"
+
+static const float
+pihi =  3.14160156e+00f,	/* 0x40491000 */
+pilo = -8.90890988e-06f;	/* 0xb715777a */
+
+static inline float
+__kernel_tanpif(float x)
+{
+	float t;
+
+	if (x < 0.25f)
+		t = __kernel_tandf(M_PI * x, 1);
+	else if (x > 0.25f)
+		t = -__kernel_tandf(M_PI * (0.5 - x), -1);
+	else
+		t = 1;
+
+	return (t);
+}
+
+float
+tanpif(float x)
+{
+	volatile static const float vzero = 0;
+	float ax, t;
+	uint32_t hx, ix, j0;
+
+	GET_FLOAT_WORD(hx, x);
+	ix = hx & 0x7fffffff;
+	SET_FLOAT_WORD(ax, ix);
+
+	if (ix < 0x3f800000) {			/* |x| < 1 */
+		if (ix < 0x3f000000) {		/* |x| < 0.5 */
+			if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
+				if (ix == 0)
+					return (x);
+				SET_FLOAT_WORD(ax, hx & 0xffff0000);
+				x -= ax;
+				t = pilo * x + pihi * x + pilo * ax
+				    + pihi * ax;
+				return (t);
+			}
+			t = __kernel_tanpif(ax);
+		} else if (ix == 0x3f000000)
+			return ((ax - ax) / (ax - ax));
+		else
+			t = - __kernel_tanpif(1 - ax);
+		return ((hx & 0x80000000) ? -t : t);
+	}
+
+	if (ix < 0x4b000000) {		/* 1 <= |x| < 0x1p23 */
+		/* Determine integer part of ax. */
+		j0 = ((ix >> 23) & 0xff) - 0x7f;
+		ix &= ~(0x007fffff >> j0);
+		SET_FLOAT_WORD(x, ix);
+
+		ax -= x;
+		GET_FLOAT_WORD(ix, ax);
+
+		if (ix < 0x3f000000)		/* |x| < 0.5 */
+			t = ix == 0 ? 0 : __kernel_tanpif(ax);
+		else if (ix == 0x3f000000)
+			return ((ax - ax) / (ax - ax));
+		else
+			t = - __kernel_tanpif(1 - ax);
+		return ((hx & 0x80000000) ? -t : t);
+	}
+
+	/* x = +-inf or nan. */
+	if (ix >= 0x7f800000)
+		return (vzero / vzero);
+
+	/*
+	 * |x| >= 0x1p23 is always an integer, so return +-0.
+	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
+	 */
+	if (ax + 1 > 1)
+		ax = copysignf(0, x);
+	return (ax);
+}

Property changes on: lib/msun/src/s_tanpif.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Wed May 17 02:49:56 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 757B7D70100
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 17 May 2017 02:49:56 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249])
 by mx1.freebsd.org (Postfix) with ESMTP id E77A21E60
 for <freebsd-numerics@freebsd.org>; Wed, 17 May 2017 02:49:55 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 1924B1043A6C;
 Wed, 17 May 2017 12:49:45 +1000 (AEST)
Date: Wed, 17 May 2017 12:49:45 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170516224618.GA40855@troutmask.apl.washington.edu>
Message-ID: <20170517094848.A52247@besplex.bde.org>
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=VjOiZrt8L6bhyw-5nSAA:9 a=DCdrCWzBJeGlZEVK:21
 a=Pt8S-Y3xSoovzgYL:21 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 May 2017 02:49:56 -0000

On Tue, 16 May 2017, Steve Kargl wrote:

> ...
> I have removed my new kernels and written shims to
> use the fdlibm trig kernels.  I know that in some
> cases this leads to slower code, but a small (if not
> tiny) improvement in accuracy.  This also leads to
> a significant reduction in code size.  A couple of
> notes:

Still quite large.
On Tue, 16 May 2017, Steve Kargl wrote:

> ...
> I have removed my new kernels and written shims to
> use the fdlibm trig kernels.  I know that in some
> cases this leads to slower code, but a small (if not
> tiny) improvement in accuracy.  This also leads to
> a significant reduction in code size.  A couple of
> notes:

Still quite large.

> Index: lib/msun/ld128/k_cospil.c

This is a header file, not a .c file or even a self-sufficient kernel,
since it only compiles when included by C files which declare the things
that it uses.

My kernels for exp are header files and are also self-sufficient (except
they only define static functions and objects, so are useless unless they
are included).

> +static inline long double
> +__kernel_cospil(long double x)
> +{
> +	long double hi, lo;
> +	hi = (double)x;

Style (missing blank line after declararions).

> Index: lib/msun/ld128/k_sinpil.c
> ...
> +	long double hi, lo;
> +	hi = (double)x;

Style.

> Index: lib/msun/ld128/s_cospil.c
> ...
> +static const long double
> +pihi = 3.14159265358979322702026593105983920e+00L,
> +pilo = 1.14423774522196636802434264184180742e-17L;

These are not in normal format, and are hard to read.  I can't see if
pihi has the correct number of zero bits for exact multiplication.

These don't have the normal spelling.  fdlibm never uses pihi or pio2hi,
or pi_hi.  It often uses pio2_hi and other pio2_*.  My s_sinpi.c uses
pi_hi.


> +long double
> +cospil(long double x)
> +{
> +	volatile static const double vzero = 0;
> +	long double ax, c, xf;
> +	uint32_t ix;
> +
> +	ax = fabsl(x);
> +

Style (lots of extra blank lines).

> +	if (ax < 0x1p112) {
> +
> +		xf = floorl(ax);
> +
> +		ax -= xf;
> +

These extra blank lines are especially bad, since they separate related
code.

> +		if (xf > 0x1p50) xf -= 0x1p50;
> +		if (xf > 0x1p30) xf -= 0x1p30;

Style.

> Index: lib/msun/ld128/s_sinpil.c

Similarly.

> Index: lib/msun/ld128/s_tanpil.c

Similarly.

> + * FIXME: This should use a polynomial approximation in [0,0.25], but the
> + * FIXME: increase in max ULP is probably small, so who cares.

Doesn't it already use the standard kernel, which uses a poly approx?

Splitting up [0, Pi/4] doesn't work very well for reducing the degree of
the poly.  The current poly has degree about 56 (maybe 52 or 54), and I
couldn't reduce it to below 40 using reasonably large interval.  I think
it can be reduced to degree about 10 using intervals of length 1/128 as
for logl(), but unlike for logl() this obvious way of doing this needs
about 128 diferent polys.

> +static inline long double
> +__kernel_tanpi(long double x)
> +{
> +	long double hi, lo, t;
> +
> +	if (x < 0.25) {
> +		hi = (double)x;
> +		lo = x - hi;
> +		lo = lo * (pilo + pihi) + hi * pilo;
> +		hi *= pihi;
> +		_2sumF(hi, lo);
> +		t = __kernel_tanl(hi, lo, -1);

Oops, I forgot that this 0.25 is really Pi/4 when scaled.  It is impossible
to use 1 poly over the whole range, since the poly would need to have even
higher degree than 50+, and would still have bad numeric properties.  so
__kernel_tan*() uses special methods.

> +long double
> +tanpil(long double x)
> +{
> +	volatile static const double vzero = 0;
> +	long double ax, xf;
> +	uint32_t ix;
> +
> +	ax = fabsl(ax);
> +
> +	if (ax < 1) {
> +		if (ax < 0.5) {
> +			if (ax < 0x1p-60) {
> +				if (x == 0)
> +					return (x);
> +				ax = (double)x;

Style (unnecessary cast).

> +				x -= ax;
> +				t = pilo * x + pihi * x + pilo * ax
> +				    + pihi * ax;

Style:
- unnecessary line splitting
- gnu style for placement of the operator on a new line.  If not following
   fdlibm style, use strict KNF.

> +				return (t);
> +			}
> +			t = __kernel_tanpil(ax);

For most long double functions, we put everything except the kernels in
the same files, with ifdefs for LDBL_MANT_DIG.  Here even the hi+lo
decomposition and multiplication by Pi can be in a kernel.  The style
bugs are especially painful to read when sextuplicated.  (The ld80
variants are quite different due to being better optimized, but this
is a different style bug.)

> Index: lib/msun/ld80/k_cospil.c

No more comments on sextuplicated style bugs.

> Index: lib/msun/ld80/s_cospil.c
> ...
> +		if (ix < 0x3ffe)			/* |x| < 0.5 */
> +			c = __kernel_sinpil(0.5 - ax);
> +		else if (lx < 0xc000000000000000ull) {	/* |x| < 0.75 */

Style bug (spelling of the long long abomination using "ull").  The
explicit abomination is unfortunately needed by old versions of gcc
with CFLAGS asking for nearly C90.

> +			if (ax == 0.5)
> +				RETURNI(0);

Perhaps pessimal to not test for 0.5 in bits.

It is pessimal to avoid generating inexact for 0.5 at all.  I don't see
a much better way.

> +	if (ix < 0x403e) {		/* 1 <= |x| < 0x1p63 */
> +		/* Determine integer part of ax. */
> +		j0 = ix - 0x3fff + 1;
> +		if (j0 < 32) {
> +			lx = (lx >> 32) << 32;
> +			lx &= ~(((lx << 32)-1) >> j0);
> +		} else {
> +			uint64_t m = (uint64_t)-1 >> (j0 + 1);
> +			if (lx & m) lx &= ~m;

Style:
- nested declaration
- initialization in declaration
- half the code in the initialization
- no newline for statement after 'if'

> +		}
> +		INSERT_LDBL80_WORDS(x, ix, lx);
> +
> +		ax -= x;
> +		EXTRACT_LDBL80_WORDS(ix, lx, ax);

I don't know if this is a good method.  INSERT/EXTRACT are slow, so I
doubt that it is.  In e_lgamma*.c:sin_pi*(), we use only a single GET
of the high word.  This function already used EXTRACT/INSERT to classify,
and 2 more here.  (_e_lgamma*.c used GET to classify before calling
sin_pi*(), then 1 more GET.)  INSERT is especially slow.

Care is needed near the 0x1p63 boundary, since the fast methods require
adding approx 0x1p63.  At or above the boundary, these methods don't
obviously work.  Perhaps they give the correct result starting at 2 times
the boundary, except for setting inexact.  Ugh.  Do you avoid the fast
methods since they set inexact for most half-integers?  That costs more
than avoiding inexact for 0.5.

tanpi(0.25 is exact too, so quarter-integers need special treatment too
if you really want to avoid all spurious inexacts.

I now see that some of the above complications are from inlining floor().
floor() can't use the fast method since it is careful about inexact, and
the spec may require this.  It actually sets inexact whenever the result
is not an integer.  So it use it for sinpi() without getting spurious
inexact, the arg must be multiplied by 2 first, and for tanpi(), multiply
by 4 first.  This is excessive.

> Index: lib/msun/ld80/s_sinpil.c

Similarly.

> Index: lib/msun/ld80/s_tanpil.c

Similarly.

> Index: lib/msun/src/k_cospi.c
> ...
> +/*
> + * The basic kernel for x in [0,0.25].  To exploit the kernel for cos(x),
> + * the argument to __kernel_cospi() must be multiply by pi.  As pi is an
> + * irrational number, this allows the splitting of pi*x into high and low
> + * parts.
> + */

Allows?

> +
> +static inline double
> +__kernel_cospi(double x)
> +{
> +	double hi, lo;
> +	hi = (float)x;

_2sumF is extensively documented to not work with double.  It requires
double_t here.

Many style bugs are actually duodecimally (?) tuplicated (3 copies for ld128,
3 for ld80, 3 for double, 3 for float).

> Index: lib/msun/src/k_sinpi.c

Similarly.

> Index: lib/msun/src/math.h
> ===================================================================
> --- lib/msun/src/math.h	(revision 318383)
> +++ lib/msun/src/math.h	(working copy)
> @@ -500,6 +500,15 @@
>
> #if __BSD_VISIBLE
> long double	lgammal_r(long double, int *);
> +double		cospi(double);
> +float		cospif(float);
> +long double	cospil(long double);
> +double		sinpi(double);
> +float		sinpif(float);
> +long double	sinpil(long double);
> +double		tanpi(double);
> +float		tanpif(float);
> +long double	tanpil(long double);
> #endif

Should be sorted into the double, float and long double sections.

> Index: lib/msun/src/s_cospi.c

> +/**
> + * cospi(x) computes cos(pi*x) without multiplication by pi (almost).  First,

You mean with multiplication by pi in extra precision.

See s_sinpi.c for more.

> Index: lib/msun/src/s_cospif.c

Similarly.

> Index: lib/msun/src/s_cospif.c

> +#define	INLINE_KERNEL_SINDF
> +#define	INLINE_KERNEL_COSDF

Prossibly not worth inlining.

> +#define  __kernel_cospif(x)   ((float)__kernel_cosdf(M_PI * (x)))
> +#define  __kernel_sinpif(x)   ((float)__kernel_sindf(M_PI * (x)))

Style bugs (not even corrupt tabs, but 2 spaces after #define and 3 spaces
instead of a second tab).

Bogus casts.  The kernels already return float, although that is pessimal
and destroys their the xtra precision that they naturally have.

The cost of avoiding inexact is clearer here.  Returning the non-kernel
sin(M_PI * (x)), etc. with no classifation would probably be faster than
the slow reduction done here (except for large x).

See s_sinpi.c for more.

> Index: lib/msun/src/s_sinpi.c
> ...
> +double
> +sinpi(double x)
> +{
> +	volatile static const double vzero = 0;
> +	double ax, s;
> +	uint32_t hx, ix, j0, lx;
> +
> +	EXTRACT_WORDS(hx, lx, x);
> +	ix = hx & 0x7fffffff;
> +	INSERT_WORDS(ax, ix, lx);

A slow way to set ax to fabs(x).

This is otherwise much slower than sin_pi() for lgamma().  That started
with just a GET of hx, while this EXTRACTs 2 words, like floor(), apparently
to avoid spurious inexact.

Note that lgamma(0.5) is naturally inexact.  lgamma(integer) should be exact,
but I doubt that it is.  Maybe we broke it.

> +
> +	if (ix < 0x3ff00000) {			/* |x| < 1 */
> +		if (ix < 0x3fd00000) {		/* |x| < 0.25 */
> +			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
> +				if (x == 0)
> +					return (x);
> +				INSERT_WORDS(ax, hx, 0);
> +				x -= ax;

Flipping minus signs to reduce the number of cases for odd functions
is a bad way to do things.  I speeded up several functions by 4-10
cycles (5-20%) by avoiding such flipping.  Here there is a dependency
on ax and a slower than usual calculation of it with.  (On modern
x86, fabs would take ~2 cycles latency and ~1/2 cycles throughput),
while going throigh these extractions and insertions might take 10
cycles latency at best and 50 or more with store to load mismatch
penalties on not so modern x86).

Apart from being slower, sign flipping it prevents gives different
rounding in non-default modes.  C99 says that sin() is odd, so may
require the flipping to break the rounding modes intentionally.  If
so, then the bug is in C99.  I believe that it intentionally doesn't
require avoiding spurious inexact because this would be onerous.  It
also doesn't require sin() to be rounded in the current rounding mode.
Requiring sin() to be perfectly old is similarly onerous.

On Haswell, fdlibm floor() takes about 33 cycles and fdlibm sin() about
44 cycles below 2*Pi, so any sinpi() that uses floor() or its internals
is going to be about twice as slow as sin() because its classification
is too slow.

> +	if (ix < 0x43300000) {			/* 1 <= |x| < 0x1p52 */
> +		/* Determine integer part of ax. */
> +		j0 = ((ix >> 20) & 0x7ff) - 0x3ff;
> +		if (j0 < 20) {
> +			ix &= ~(0x000fffff >> j0);
> +			lx = 0;
> +		} else {
> +			lx &= ~((uint32_t)0xffffffff >> (j0 - 20));
> +		}
> +		INSERT_WORDS(x, ix, lx);
> +
> +		ax -= x;
> +		EXTRACT_WORDS(ix, lx, ax);

Much slower than what lgamma()s sin_pi() does.  After our optimizations,
that does basically rint() inline using ax + 0x1p52 - 0x1p52, then a
further += 0x1p50 to get octants instead of half-cycles.  This takes
approx 16 cycles latency on modern x86, which is lot.  I think and64 can
do the above in 10-20 cycles, while i386 might take 20-40 or much
longer with store-to-load mismatch penalties.

I think we can avoid the spurious inexacts with a little care.  When
ax is an integer (and < 0x1p52), we can add at least 0x1p51 to it exactly.
We also want to avoid inexact for half-integers for sinpi and cospi, and
quarter-integers for tanpi, and it is good to reduce to quadrants anyway.
So we might intitially multiply by 4 and do the += 0x1p52 trick only if
4*ax < 0x1p52 (or whatever the safe threshold is).  Efficiency doesn't
matter for larger ax, so slow methods using things like floor(16*ax)
are adequate, or we can reduce by a few powers of 2 using special cases:

 	v = 4 * ax;
 	if (v >= 0x1p54)
 		;		/* ax is an even integer (check factor) /
 	else {
 		if (v >= 0x1p53)
 			v -= 0x1p53;
 		if (v >= 0x1p52)
 			v -= 0x1p52;
 	}

I will finish this for lgamma*().  The fdlibm version does multiplications
by 2 and we seem to have removed a bit too much of that method.

> +	/*
> +	 * |x| >= 0x1p52 is always an integer, so return +-0.
> +	 * FIXME: should this raise FE_INEXACT or FE_INVALID?
> +	 */

The result is always +0, exact and valid (maybe -0 in unsupported
unusual rounding modes).  We just did a lot of work to avoid inexact.

> +	if (ax + 1 > 1)
> +		ax = copysign(0, x);

An even slower way of setting the sign than INSERT, especially if
the compiler doesn't inline copysign().  Using the correct +0 fixes
this.

We might have classified slightly smaller values as odd integers and
have to select the sign for them.  This is done differently now.

> Index: lib/msun/src/s_sinpif.c
> ...
> Index: lib/msun/src/s_tanpi.c
> ...
> Index: lib/msun/src/s_tanpif.c
> ...

Similarly.

Bruce

From owner-freebsd-numerics@freebsd.org  Wed May 17 04:20:25 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 74D97D70925
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 17 May 2017 04:20:25 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5DACF106D
 for <freebsd-numerics@freebsd.org>; Wed, 17 May 2017 04:20:25 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4H4KN2R043753
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Tue, 16 May 2017 21:20:24 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4H4KNYs043752;
 Tue, 16 May 2017 21:20:23 -0700 (PDT) (envelope-from sgk)
Date: Tue, 16 May 2017 21:20:23 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170517042023.GA43647@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170517094848.A52247@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 May 2017 04:20:25 -0000

On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote:
> On Tue, 16 May 2017, Steve Kargl wrote:
> > Index: lib/msun/ld128/s_cospil.c
> > +long double
> > +cospil(long double x)
> > +{
> > +	volatile static const double vzero = 0;
> > +	long double ax, c, xf;
> > +	uint32_t ix;
> > +
> > +	ax = fabsl(x);
> > +
> 
> Style (lots of extra blank lines).

If there are only style issues, then we are fortunate as
I have not even compiled the code.

> > +		ax -= xf;
> > +
> 
> These extra blank lines are especially bad, since they separate related
> code.

Extra blank lines make things easier to read.

> 
> > + * FIXME: This should use a polynomial approximation in [0,0.25], but the
> > + * FIXME: increase in max ULP is probably small, so who cares.
> 
> Doesn't it already use the standard kernel, which uses a poly approx?

Whoops forgot to remove the unneeded comment.

> > +	volatile static const double vzero = 0;
> > +	long double ax, xf;
> > +	uint32_t ix;
> > +
> > +	ax = fabsl(ax);
> > +
> > +	if (ax < 1) {
> > +		if (ax < 0.5) {
> > +			if (ax < 0x1p-60) {
> > +				if (x == 0)
> > +					return (x);
> > +				ax = (double)x;
> 
> Style (unnecessary cast).

Huh?  ax is long double.  x is long double.  The cast should
remove 11 bits.

> > +				x -= ax;
> > +				t = pilo * x + pihi * x + pilo * ax
> > +				    + pihi * ax;
> 
> Style:
> - unnecessary line splitting

The is line is 80 characters without splitting (and even longer if 
one mandates pi_lo and pi_hi).  I use an 80 character wide and wrap
lines that are 79+ characters.

> > Index: lib/msun/ld80/s_cospil.c
> > ...
> > +		if (ix < 0x3ffe)			/* |x| < 0.5 */
> > +			c = __kernel_sinpil(0.5 - ax);
> > +		else if (lx < 0xc000000000000000ull) {	/* |x| < 0.75 */
> 
> Style bug (spelling of the long long abomination using "ull").  The
> explicit abomination is unfortunately needed by old versions of gcc
> with CFLAGS asking for nearly C90.

Whoops that one may have slipped through.

> > +			if (ax == 0.5)
> > +				RETURNI(0);
> 
> Perhaps pessimal to not test for 0.5 in bits.
> 
> It is pessimal to avoid generating inexact for 0.5 at all.  I don't see
> a much better way.

I tried using bits and floating point.  Saw no difference.
cospi(n.5) is exact (see n1950.pdf).  See comment in s_cospi.c
that documents special cases.

> > +		}
> > +		INSERT_LDBL80_WORDS(x, ix, lx);
> > +
> > +		ax -= x;
> > +		EXTRACT_LDBL80_WORDS(ix, lx, ax);
> 
> I don't know if this is a good method.  INSERT/EXTRACT are slow, so I
> doubt that it is.  In e_lgamma*.c:sin_pi*(), we use only a single GET
> of the high word.  This function already used EXTRACT/INSERT to classify,
> and 2 more here.  (_e_lgamma*.c used GET to classify before calling
> sin_pi*(), then 1 more GET.)  INSERT is especially slow.
> 
> Care is needed near the 0x1p63 boundary, since the fast methods require
> adding approx 0x1p63.  At or above the boundary, these methods don't
> obviously work.  Perhaps they give the correct result starting at 2 times
> the boundary, except for setting inexact.  Ugh.  Do you avoid the fast
> methods since they set inexact for most half-integers?  That costs more
> than avoiding inexact for 0.5.

n1950.pdf says cospi(n.5) is exact.  Perhaps, the __kernel_cos()
and __kernel_sin() provide an exact result.  I did not investigate.

> tanpi(0.25 is exact too, so quarter-integers need special treatment too
> if you really want to avoid all spurious inexacts.

It does.

> I now see that some of the above complications are from inlining floor().

Yes, it is an inline of floor().

> > +/*
> > + * The basic kernel for x in [0,0.25].  To exploit the kernel for cos(x),
> > + * the argument to __kernel_cospi() must be multiply by pi.  As pi is an
> > + * irrational number, this allows the splitting of pi*x into high and low
> > + * parts.
> > + */
> 
> Allows?

x = n + r.  For double, r has 53 bits.  There are no trailing low bits
from the argument reduction.  pi is irrational, and therefore not exactly
representable by 53 bits (or any number of bits).  __kernel_cospi() uses
__kernel_cos(), and the low bits come about because of pi.  I split
x into high in low.  So, we have

lo = pilo * xlo + pilo * xhi + pihi * xlo;  /* all the low bits */
hi = pihi * xhi;  /* Exact */
_2sumF(hi, lo);   /* Maybe superfluous? */

> > +
> > +static inline double
> > +__kernel_cospi(double x)
> > +{
> > +	double hi, lo;
> > +	hi = (float)x;
> 
> _2sumF is extensively documented to not work with double.  It requires
> double_t here.

It seems to work base on testing hundreds of millions of values. :-)

> > +#define	INLINE_KERNEL_SINDF
> > +#define	INLINE_KERNEL_COSDF
> 
> Prossibly not worth inlining.
> 
> > +#define  __kernel_cospif(x)   ((float)__kernel_cosdf(M_PI * (x)))
> > +#define  __kernel_sinpif(x)   ((float)__kernel_sindf(M_PI * (x)))
> 
> Style bugs (not even corrupt tabs, but 2 spaces after #define and 3 spaces
> instead of a second tab).

Ugh.  My bad.  I have nedit set to do 3-space tabs.  I forgot to
reset to use hard tabs.

> Bogus casts.  The kernels already return float, although that is pessimal
> and destroys their the xtra precision that they naturally have.

Exhaustive testing shows max ULP < 0.500xyyy where might be 0, but
can't recall now.  How exact to you want these?

> > Index: lib/msun/src/s_sinpi.c
> > ...
> > +double
> > +sinpi(double x)
> > +{
> > +	volatile static const double vzero = 0;
> > +	double ax, s;
> > +	uint32_t hx, ix, j0, lx;
> > +
> > +	EXTRACT_WORDS(hx, lx, x);
> > +	ix = hx & 0x7fffffff;
> > +	INSERT_WORDS(ax, ix, lx);
> 
> A slow way to set ax to fabs(x).

Make it work.  Make it correct.  Make it fast.  I've undertaken
the first two.

> This is otherwise much slower than sin_pi() for lgamma().  That started
> with just a GET of hx, while this EXTRACTs 2 words, like floor(), apparently
> to avoid spurious inexact.
> 
> Note that lgamma(0.5) is naturally inexact.  lgamma(integer) should be exact,
> but I doubt that it is.  Maybe we broke it.

See below.

> > +	if (ix < 0x43300000) {			/* 1 <= |x| < 0x1p52 */
> > +		/* Determine integer part of ax. */
> > +		j0 = ((ix >> 20) & 0x7ff) - 0x3ff;
> > +		if (j0 < 20) {
> > +			ix &= ~(0x000fffff >> j0);
> > +			lx = 0;
> > +		} else {
> > +			lx &= ~((uint32_t)0xffffffff >> (j0 - 20));
> > +		}
> > +		INSERT_WORDS(x, ix, lx);
> > +
> > +		ax -= x;
> > +		EXTRACT_WORDS(ix, lx, ax);
> 
> Much slower than what lgamma()s sin_pi() does.  After our optimizations,
> that does basically rint() inline using ax + 0x1p52 - 0x1p52,

I found a situation where this trick seems to round in the wrong direction.
You participated in the discussion. 

https://lists.freebsd.org/pipermail/freebsd-numerics/2017-March/000658.html
https://lists.freebsd.org/pipermail/freebsd-numerics/2017-March/000660.html

> then a
> further += 0x1p50 to get octants instead of half-cycles.  This takes
> approx 16 cycles latency on modern x86, which is lot.  I think and64 can
> do the above in 10-20 cycles, while i386 might take 20-40 or much
> longer with store-to-load mismatch penalties.
> 
> I think we can avoid the spurious inexacts with a little care.  When
> ax is an integer (and < 0x1p52), we can add at least 0x1p51 to it exactly.
> We also want to avoid inexact for half-integers for sinpi and cospi, and
> quarter-integers for tanpi, and it is good to reduce to quadrants anyway.
> So we might intitially multiply by 4 and do the += 0x1p52 trick only if
> 4*ax < 0x1p52 (or whatever the safe threshold is).  Efficiency doesn't
> matter for larger ax, so slow methods using things like floor(16*ax)
> are adequate, or we can reduce by a few powers of 2 using special cases:
> 
>  	v = 4 * ax;
>  	if (v >= 0x1p54)
>  		;		/* ax is an even integer (check factor) /
>  	else {
>  		if (v >= 0x1p53)
>  			v -= 0x1p53;
>  		if (v >= 0x1p52)
>  			v -= 0x1p52;
>  	}

I'll see if I can understand this and modify waht I have. 
Unfortunately, I'm running out of time to work on this.


> The result is always +0, exact and valid (maybe -0 in unsupported
> unusual rounding modes).  We just did a lot of work to avoid inexact.

Not according to n1950.pdf.  It states

sinpi(+-n) = +-0 for positive integers.

> 
> > +	if (ax + 1 > 1)
> > +		ax = copysign(0, x);
> 
> An even slower way of setting the sign than INSERT, especially if
> the compiler doesn't inline copysign().  Using the correct +0 fixes
> this.

|x| > 0x1p(P-1) is (should be) exceptional.   Who cares if it's fast.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Wed May 17 10:48:33 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4AC42D7113F
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 17 May 2017 10:48:33 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id E98CF993
 for <freebsd-numerics@freebsd.org>; Wed, 17 May 2017 10:48:32 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 711AB42AB13;
 Wed, 17 May 2017 20:48:23 +1000 (AEST)
Date: Wed, 17 May 2017 20:48:22 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170517042023.GA43647@troutmask.apl.washington.edu>
Message-ID: <20170517184838.O60700@besplex.bde.org>
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517042023.GA43647@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=zS366X0SCmc1Lw_eJcsA:9
 a=KAAw_Sxtfs94dvDO:21 a=rDEoI6jNYpUyrzqe:21 a=CjuIK1q_8ugA:10
 a=TEblM0x7jKwA:10 a=R-d4_Vt-f0IA:10 a=IjZwj45LgO3ly-622nXo:22
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 May 2017 10:48:33 -0000

On Tue, 16 May 2017, Steve Kargl wrote:

> On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote:
>> On Tue, 16 May 2017, Steve Kargl wrote:
>>> Index: lib/msun/ld128/s_cospil.c
>>> +long double
>>> +cospil(long double x)
>>> +{
>>> +	volatile static const double vzero = 0;
>>> +	long double ax, c, xf;
>>> +	uint32_t ix;
>>> +
>>> +	ax = fabsl(x);
>>> +
>>
>> Style (lots of extra blank lines).
>
> If there are only style issues, then we are fortunate as
> I have not even compiled the code.
>
>>> +		ax -= xf;
>>> +
>>
>> These extra blank lines are especially bad, since they separate related
>> code.
>
> Extra blank lines make things easier to read.

Harder.  Related lines should be together.

>>> +	volatile static const double vzero = 0;
>>> +	long double ax, xf;
>>> +	uint32_t ix;
>>> +
>>> +	ax = fabsl(ax);
>>> +
>>> +	if (ax < 1) {
>>> +		if (ax < 0.5) {
>>> +			if (ax < 0x1p-60) {
>>> +				if (x == 0)
>>> +					return (x);
>>> +				ax = (double)x;
>>
>> Style (unnecessary cast).
>
> Huh?  ax is long double.  x is long double.  The cast should
> remove 11 bits.

Oops.  This special case for small x is annoying.

>>> +				x -= ax;
>>> +				t = pilo * x + pihi * x + pilo * ax
>>> +				    + pihi * ax;
>>
>> Style:
>> - unnecessary line splitting
>
> The is line is 80 characters without splitting (and even longer if
> one mandates pi_lo and pi_hi).  I use an 80 character wide and wrap
> lines that are 79+ characters.

It seemed a bit longer when I tested it before.

I missed the confusing names pessimal arrangement of terms before.  ax
is clobbered to hold the hi term, and x is clobbered to hold the lo
term.  Only the pihi * hi term (spelled pihi * ax) needs to be evaluated
exactly or even accurately.  pilo * x can possibly be ignored, but it
is better to group terms so that evaluating it is free.  After grouping
terms, the expression fits in 79 columns:

 				t = (pihi + pilo) * x + pilo * ax + pihi * ax;

Burt with better variable names, it no longer fits, unless (pihi + pilo) is
spelled pi.

>>> Index: lib/msun/ld80/s_cospil.c
>>> ...
>>> +		if (ix < 0x3ffe)			/* |x| < 0.5 */
>>> +			c = __kernel_sinpil(0.5 - ax);
>>> +		else if (lx < 0xc000000000000000ull) {	/* |x| < 0.75 */
>>
>> Style bug (spelling of the long long abomination using "ull").  The
>> explicit abomination is unfortunately needed by old versions of gcc
>> with CFLAGS asking for nearly C90.
>
> Whoops that one may have slipped through.

There are also some f suffixes which should be F.

>>> +			if (ax == 0.5)
>>> +				RETURNI(0);
>>
>> Perhaps pessimal to not test for 0.5 in bits.
>>
>> It is pessimal to avoid generating inexact for 0.5 at all.  I don't see
>> a much better way.
>
> I tried using bits and floating point.  Saw no difference.
> cospi(n.5) is exact (see n1950.pdf).  See comment in s_cospi.c
> that documents special cases.

I havven't looked at n1950.pdf.  Exact shouldn't mean "no spurious inexact".

>>> +/*
>>> + * The basic kernel for x in [0,0.25].  To exploit the kernel for cos(x),
>>> + * the argument to __kernel_cospi() must be multiply by pi.  As pi is an
>>> + * irrational number, this allows the splitting of pi*x into high and low
>>> + * parts.
>>> + */
>>
>> Allows?
>
> x = n + r.  For double, r has 53 bits.  There are no trailing low bits
> from the argument reduction.  pi is irrational, and therefore not exactly
> representable by 53 bits (or any number of bits).  __kernel_cospi() uses
> __kernel_cos(), and the low bits come about because of pi.  I split
> x into high in low.  So, we have

It is a misdescription or grammar error.  Errors start with "exploit"
instead of "use" (this is only a style bug).  Then "multiply" instead of
"multiplied".  Irrationality has little to do with things.  Most numbers,
rational, irrational and algebraic and transcendental, are not exactly
representable.  This doesn't allow splitting, but usually requires
splitting.  Transcendental numbers are less likely to need splitting than
albegraic numbers since they are more likely to be nearly representable,
but since no numeric accident for pi occurs or is arranged or easily
arrangeable near here, it does need splitting.

The source code is not the place to describe basic splitting.

> lo = pilo * xlo + pilo * xhi + pihi * xlo;  /* all the low bits */
> hi = pihi * xhi;  /* Exact */
> _2sumF(hi, lo);   /* Maybe superfluous? */

In my experiments, _2sumF was needed to prevent too much being in the lo
term.

The lo term has pessimal grouping here.  It should be written as
(pihi + pilo) * xlo + pilo * xhi or pilo * x + pihi * xlo.  The second one
is probably better since it has a shorter dependency chain for pilo * x.

>>> +static inline double
>>> +__kernel_cospi(double x)
>>> +{
>>> +	double hi, lo;
>>> +	hi = (float)x;
>>
>> _2sumF is extensively documented to not work with double.  It requires
>> double_t here.
>
> It seems to work base on testing hundreds of millions of values. :-)

It never works with extra precision.  Except when double is accidentally
the same as double_t.  That is, when the variables declared as double
remain in registers.  While they are there, they act like a hi-lo
decomposition in double_t.  This breaks as soon as the variables are
spilled or passed to a function.  You inlined the kernels and this masks
the problem.  The problem still occurs if the compiler decides not to
inline.

Example: let x be just 1.25.  The initial (hi, lo) is (1.25, 0).  After
multiplying by pi, it is (1.25 * pihi, 1.25 * pilo).  The lo term here is
small enough to work directly.  But we have to merger the terms for the
general case.  We use _2sumF().  With extra precision, this puts most
of the lo term in the hi term.  (hi, lo) is now approx. (1.25 * pi, 0)
in extra precision.  Passing this to a function or assigning it using a
C99 compiler loses all of the extra precision and gives approx.
(1.25 * pi, 0) in double precision.  If there is extra precision and the
variable type is plain double, then _2sumF() also depends on not being
compiled by a C99 compiler for efficiency.  C99 compilers produce a
(hi, lo) decomposition in plain doubles, but this is too slow to use.

My logl() depended on the double variables staying in registers not
so accidentally for many years until I found a portable way to write
an efficient _2sumF().

>>> +#define	INLINE_KERNEL_SINDF
>>> +#define	INLINE_KERNEL_COSDF
>>
>> Prossibly not worth inlining.

Inlining is a small pessimization in most of my tests in lgamma.

>> Bogus casts.  The kernels already return float, although that is pessimal
>> and destroys their the xtra precision that they naturally have.
>
> Exhaustive testing shows max ULP < 0.500xyyy where might be 0, but
> can't recall now.  How exact to you want these?

The same number of bits as there are internally.  This is about 10 extra
bits to get the accuracy of 0.5009 ulps (the error is about 1 in 1024 ot
0.0009 internally and 0.5000 more for rounding to float).


>>> Index: lib/msun/src/s_sinpi.c
>>> ...
>>> +double
>>> +sinpi(double x)
>>> +{
>>> +	volatile static const double vzero = 0;
>>> +	double ax, s;
>>> +	uint32_t hx, ix, j0, lx;
>>> +
>>> +	EXTRACT_WORDS(hx, lx, x);
>>> +	ix = hx & 0x7fffffff;
>>> +	INSERT_WORDS(ax, ix, lx);
>>
>> A slow way to set ax to fabs(x).
>
> Make it work.  Make it correct.  Make it fast.  I've undertaken
> the first two.

Simple fabs() already does the first 2 much more easily.  The fdlibm
implemenation of it is slow essentially because it does the above
EXTRACT/INSERT (a subset, but almost as slow).  fabs() is normally
a builtin and doesn't use these.

ax = (x < 0 ? -x : x) is also faster than the above, but doesn't work
right for NaNs.  It can be done after testing for NaNs.

>> Note that lgamma(0.5) is naturally inexact.  lgamma(integer) should be exact,
>> but I doubt that it is.  Maybe we broke it.

We didn't.

>>> +	if (ix < 0x43300000) {			/* 1 <= |x| < 0x1p52 */
>>> +		/* Determine integer part of ax. */
>>> +		j0 = ((ix >> 20) & 0x7ff) - 0x3ff;
>>> +		if (j0 < 20) {
>>> +			ix &= ~(0x000fffff >> j0);
>>> +			lx = 0;
>>> +		} else {
>>> +			lx &= ~((uint32_t)0xffffffff >> (j0 - 20));
>>> +		}
>>> +		INSERT_WORDS(x, ix, lx);
>>> +
>>> +		ax -= x;
>>> +		EXTRACT_WORDS(ix, lx, ax);
>>
>> Much slower than what lgamma()s sin_pi() does.  After our optimizations,
>> that does basically rint() inline using ax + 0x1p52 - 0x1p52,
>
> I found a situation where this trick seems to round in the wrong direction.
> You participated in the discussion.
>
> https://lists.freebsd.org/pipermail/freebsd-numerics/2017-March/000658.html
> https://lists.freebsd.org/pipermail/freebsd-numerics/2017-March/000660.html

We always had a fix for that in lgamma.  Now I'm not sure why floor() and
rint() don't use our method.

>> I think we can avoid the spurious inexacts with a little care.  When
>> ax is an integer (and < 0x1p52), we can add at least 0x1p51 to it exactly.
>> We also want to avoid inexact for half-integers for sinpi and cospi, and
>> quarter-integers for tanpi, and it is good to reduce to quadrants anyway.
>> So we might intitially multiply by 4 and do the += 0x1p52 trick only if
>> 4*ax < 0x1p52 (or whatever the safe threshold is).  Efficiency doesn't
>> matter for larger ax, so slow methods using things like floor(16*ax)
>> are adequate, or we can reduce by a few powers of 2 using special cases:
>>
>>  	v = 4 * ax;
>>  	if (v >= 0x1p54)
>>  		;		/* ax is an even integer (check factor) /
>>  	else {
>>  		if (v >= 0x1p53)
>>  			v -= 0x1p53;
>>  		if (v >= 0x1p52)
>>  			v -= 0x1p52;
>>  	}

I ended up going the mostly other way and removing all of the bit fiddling.
The preliminary +-0x1p52 was mostly a waste of time, so I removed it, but
;ater I brought it back in a different form and removed the secondatry
+-0x1p50.  Then I compressed the case statement.  The result is much simpler
but only slightly faster:

XX /*
XX  * Compute sin(pi*x), with the multiplication done after reducing x.
XX  * sin_pi(x) is only called for x < 0 and |x| < 2**(p-1) where p is
XX  * the precision of x.
XX  */
XX static double
XX sin_pi(double x)
XX {
XX 	volatile double vz;
XX 	double s,y,z;
XX 
XX 	y = -x;
XX 
XX 	/*
XX 	 * Redundant step to demonstrate a general version.  The usual magic
XX 	 * for rounding needs only no overflow, which this step ensures.
XX 	 */
XX 	if (y >= 0x1p53)
XX 	    return zero;		/* even integer */
XX 
XX 	if (y < 2)
XX 	    goto y_reduced;
XX 	vz = y+0x1p53;
XX 	z = vz-0x1p53;			/* y rounded to a multiple of 2 */
XX 	if (z > y)
XX 	    z -= 2;			/* adjust to round down */
XX 	y = y - z;			/* y mod 2 */
XX y_reduced:
XX 	if (y >= one) {
XX 	    s = -one;
XX 	    y -= one;
XX 	} else
XX 	    s = one;
XX 	if (y == zero)
XX 	    return zero;		/* avoid inexact at poles */
XX 	if (y >= 0.75)
XX 	    y -= one;			/* join with y < 0.25 case */
XX 	if (y < 0.25) {
XX 	    return __kernel_sin(pi*y,zero,0);
XX 	} else
XX 	    return __kernel_cos(pi*(y-half),zero);
XX }

YY static float
YY sin_pif(float x)
YY {
YY 	volatile float vz;
YY 	float s,y,z;
YY 
YY 	y = -x;
YY 
YY 	if (y >= 0x1p24)
YY 	    return zero;		/* y is even integer */
YY 
YY 	if (y < 2)
YY 	    goto y_reduced;
YY 	vz = y+0x1p24F;
YY 	z = vz-0x1p24F;			/* y rounded to a multiple of 2 */
YY 	if (z > y)
YY 	    z -= 2;			/* adjust to round down */
YY 	y = y - z;			/* y mod 2 */
YY y_reduced:
YY 	if (y > one) {
YY 	    s = -one;
YY 	    y -= one;
YY 	} else
YY 	    s = one;
YY 	if (y == zero)
YY 	    return zero;		/* avoid inexact at poles */
YY 	if (y >= 0.75F)
YY 	    y -= one;			/* join with y < 0.25 case */
YY 	if (y < 0.25F)
YY 	    return s*__kernel_sindf(pi*y);
YY 	else
YY 	    return s*__kernel_cosdf(pi*(y-half));
YY }

This also does the sign stuff more like you.

It works to be very sloppy with signs here.  lgamma squares the result.
But leaving out the s's makes little difference to the speed.  After
compressing the case statement, the sign handling takes little space too.

This sin_pif() with its inaccurate multiplication by 'float pi' gives
more accuracy for lgammaf() than your more accurate version.  Replacing
pi in the multiplications by M_PI acts similarly strangely.  This seems
to be because lgamma() does the calculation pi/(sin_pif(x)*x) using its
pi.  When x is small, this pi/(pi*x*x) and it is apparently better to
have the same pi in the numerator and denominator.  My test gives too
much coverage to x's near 0, enough for the error count for these cases
to dominate improvements elsewhere.

>> The result is always +0, exact and valid (maybe -0 in unsupported
>> unusual rounding modes).  We just did a lot of work to avoid inexact.
>
> Not according to n1950.pdf.  It states
>
> sinpi(+-n) = +-0 for positive integers.

Bug in n1950.pdf.  The sign depends on which side of +-n you are on,
and there is no reason to prefer either.  Rather the opposite -- 
limits with x increasing gives the opposite.

>>> +	if (ax + 1 > 1)
>>> +		ax = copysign(0, x);
>>
>> An even slower way of setting the sign than INSERT, especially if
>> the compiler doesn't inline copysign().  Using the correct +0 fixes
>> this.
>
> |x| > 0x1p(P-1) is (should be) exceptional.   Who cares if it's fast.

It is also buggy.  ax here is large, so ax + 1 > 1 is always true.
If ax is DBL_MAX, then ax + 1 gives a spurious overflow exception only
in unsupported rounding mode, and then only if there is no extra precision.

The test is apparently intended to check for odd integers.  Something
like (ax + 1 > ax) is closer to working for that.  But it doesn't work
with extra precision, and risks overflow.

My classification handles this by reducing to 0 <= y < 2 by subtracting
an even integer from y = |x|.  y == 0 means an even integer and y == 1
means an odd integer.  Then it takes special care to return +0 for odd
integers :-).  It reduces 1 to 0; it sets the sign, but doesn't use it
when y was 1.  Falling through to s*__kernel_sin(0...)  would probably
work to give the n1950.pdf bug and not give spurious inexact.

Bruce

From owner-freebsd-numerics@freebsd.org  Wed May 17 18:09:34 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A761DD713C5
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 17 May 2017 18:09:34 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 84F351647
 for <freebsd-numerics@freebsd.org>; Wed, 17 May 2017 18:09:34 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4HI9SHA054575
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Wed, 17 May 2017 11:09:28 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4HI9REn054574;
 Wed, 17 May 2017 11:09:27 -0700 (PDT) (envelope-from sgk)
Date: Wed, 17 May 2017 11:09:27 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170517180927.GA54431@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170517094848.A52247@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 May 2017 18:09:34 -0000

On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote:
> On Tue, 16 May 2017, Steve Kargl wrote:
> 
> > Index: lib/msun/ld128/s_cospil.c
> > ...
> > +static const long double
> > +pihi = 3.14159265358979322702026593105983920e+00L,
> > +pilo = 1.14423774522196636802434264184180742e-17L;
> 
> These are not in normal format, and are hard to read.  I can't see if
> pihi has the correct number of zero bits for exact multiplication.

I don't have access to an ld128 system with suitable 
facilities to allow me to spit out the hex representation.
As such, I've added a comment

/*
 * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
 */

> 
> These don't have the normal spelling.  fdlibm never uses pihi or pio2hi,
> or pi_hi.  It often uses pio2_hi and other pio2_*.  My s_sinpi.c uses
> pi_hi.

fdlibm has no convention.  For example, e_log10.c actually uses
a mixature of convensions.

static const double
two54      =  1.80143985094819840000e+16, /* 0x43500000, 0x00000000 */
ivln10hi   =  4.34294481878168880939e-01, /* 0x3fdbcb7b, 0x15200000 */
ivln10lo   =  2.50829467116452752298e-11, /* 0x3dbb9438, 0xca9aadd5 */
log10_2hi  =  3.01029995663611771306e-01, /* 0x3FD34413, 0x509F6000 */
log10_2lo  =  3.69423907715893078616e-13; /* 0x3D59FEF3, 0x11F12B36 */

and 

        double f,hfsq,hi,lo,r,val_hi,val_lo,w,y,y2;

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Wed May 17 23:26:09 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B6EC1D71CBA
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 17 May 2017 23:26:09 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 6620514CB
 for <freebsd-numerics@freebsd.org>; Wed, 17 May 2017 23:26:08 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id E87F8429AEB;
 Thu, 18 May 2017 09:25:59 +1000 (AEST)
Date: Thu, 18 May 2017 09:25:58 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170517180927.GA54431@troutmask.apl.washington.edu>
Message-ID: <20170518072636.E951@besplex.bde.org>
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=Sek2kApjW0X8YfFk6SkA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 May 2017 23:26:09 -0000

On Wed, 17 May 2017, Steve Kargl wrote:

> On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote:
>> On Tue, 16 May 2017, Steve Kargl wrote:
>>
>>> Index: lib/msun/ld128/s_cospil.c
>>> ...
>>> +static const long double
>>> +pihi = 3.14159265358979322702026593105983920e+00L,
>>> +pilo = 1.14423774522196636802434264184180742e-17L;
>>
>> These are not in normal format, and are hard to read.  I can't see if
>> pihi has the correct number of zero bits for exact multiplication.
>
> I don't have access to an ld128 system with suitable
> facilities to allow me to spit out the hex representation.

flame is still in the freebsd cluster, but is almost unusable because
its files are an old copy of actual home directories on freefall.

Utilities for printing hex and FP in good formats are remarkably rare.
pari/gp supports many special math types but can only input or output
ordinarly numbers in decimal.  I use my library routines to print hex
and special FP formats for it.  sh, printf(1) and awk(1) are limited
by native tyoes and have many other defects.

> As such, I've added a comment
>
> /*
> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
> */

Why 56?  53 + 56 = 109.

>> These don't have the normal spelling.  fdlibm never uses pihi or pio2hi,
>> or pi_hi.  It often uses pio2_hi and other pio2_*.  My s_sinpi.c uses
>> pi_hi.
>
> fdlibm has no convention.  For example, e_log10.c actually uses
> a mixature of convensions.
>
> static const double
> two54      =  1.80143985094819840000e+16, /* 0x43500000, 0x00000000 */
> ivln10hi   =  4.34294481878168880939e-01, /* 0x3fdbcb7b, 0x15200000 */
> ivln10lo   =  2.50829467116452752298e-11, /* 0x3dbb9438, 0xca9aadd5 */
> log10_2hi  =  3.01029995663611771306e-01, /* 0x3FD34413, 0x509F6000 */
> log10_2lo  =  3.69423907715893078616e-13; /* 0x3D59FEF3, 0x11F12B36 */

My log* (committed only for long double precision) cleans this up a bit
and uses invln10_hi, etc.

My improvements to lgamma()'s sin_pi*() are buggy.  I forgot the original
target of avoiding inexact better:

XX --- /tmp/bak/e_lgamma_r.c	Thu Oct 30 09:20:46 2014
XX +++ e_lgamma_r.c	Thu May 18 07:46:37 2017
XX @@ -157,5 +157,5 @@
XX 
XX  /*
XX - * Compute sin(pi*x) without actually doing the pi*x multiplication.
XX + * Compute sin(pi*x), with the multiplication done after reducing x.
XX   * sin_pi(x) is only called for x < 0 and |x| < 2**(p-1) where p is
XX   * the precision of x.

You added this comment.  It is not very useful.  Its first sentence was
wrong about not doing the multiplication (fixed).  It's second sentence
is redundant.  The caller is nearby, and spells p-1 literally so that
it is easier to see that the magic numbers here match up.  The caller's
code is:

 	    if(ix>=0x43300000) 	/* |x|>=2**52, must be -integer */
 		return one/vzero;
 	    t = sin_pi(x);
 	    if(t==zero) return one/vzero; /* -integer */

This is a not so good optimization optimization.  This function has to be
careful with large values and it would be clearer to do all the checking
itself.  Then the t==zero check will handle the large integer.  My bug
and old code and optimizations are related to this.

XX @@ -165,35 +165,149 @@
XX  {
XX  	volatile double vz;
XX -	double y,z;
XX -	int n;
XX +	double s,y,z;
XX 
XX  	y = -x;
XX 
XX -	vz = y+0x1p52;			/* depend on 0 <= y < 0x1p52 */

The dependencies are for 2 things:
- no overflow when y is very large
- no spurious inexact when y is just 0x1p52.  For y below 0x1p52, the
   addition keeps the integer part and loses the fractional part, so
   it gives inexact if the fractional part is nonzero.  This is just
   what we want for lgamma.  However, for general sinpi() we don't
   want inexact for half-integers, and for general tanpi() we don't
   want inexact for quarter-integers.

XX -	z = vz-0x1p52;			/* rint(y) for the above range */
XX -	if (z == y)
XX -	    return zero;

My initially fix was to remove the above code.  Integers can be classifyed
later.

fdlibm starts with floor() here.  This works for lgamma(), but gives the
same inexact for half- and quarter-integers as the above, so is not
suitable for general used.

XX -
XX -	vz = y+0x1p50;
XX -	GET_LOW_WORD(n,vz);		/* bits for rounded y (units 0.25) */
XX -	z = vz-0x1p50;			/* y rounded to a multiple of 0.25 */
XX -	if (z > y) {
XX -	    z -= 0.25;			/* adjust to round down */
XX -	    n--;

This classifies quarter-integers well.  y is 0 at exact quarter-integers, and
there is no spurious inexact for them.  n tells you which quarter-integer
it was.

XX +	/*
XX +	 * Redundant step to demonstrate a general version.  The usual magic
XX +	 * for rounding needs only no overflow, which this step ensures.
XX +	 */
XX +	if (y >= 0x1p53)
XX +	    return zero;		/* even integer */

The comment is wrong.  I changed from 0x1p5{0,2} to 0x1p53 to get y in the
full range [0, 2) instead of [0, 0.25) or [0, 1), but this gives spurious
inexact for odd integers between 0x1p52 and 0x1p53 as well as for the half-
and quarter-integers.

This can be fixed using special classification near 0x1p52:

- >= 0x1p53: even integer
- in [0x1p52, 0x1p53): integer
- in [0x1p51, 0x1p52): half-integer (that is, integer or integer + 0.5
- in [0x1p50, 0x1p51): quarter-integer
- < 0x1p50: use general method

but it seems better to go back to the general method of adding and
subtracting 0x1p50.  That gives a null rounding step (without inexact)
for all values >= 0x1p50 provided only that overflow doesn't occur.
I changed this because it reduces y a bit too much, and needs something
like the GET of n to classify the quadrant.

XX +
XX +	if (y < 2)
XX +	    goto y_reduced;
XX +	vz = y+0x1p53;
XX +	z = vz-0x1p53;			/* y rounded to a multiple of 2 */
XX +	if (z > y)
XX +	    z -= 2;			/* adjust to round down */
XX +	y = y - z;			/* y mod 2 */
XX +y_reduced:

The problem with this method is fundamental.  Rounding of y to an
integer multiple of 2 in any normal way should cause inexact whenever
y is not such a multiple, but we want to avoid inexact except for
multiples of 0.25.

I think the remainder operation doesn't set inexact, so remainder(y, 2)
would work.  I expected it to be slow, but it is actually faster than
the above on i386!  (remainder is in hardware on x86, but it uses the
i387, so is not quite as good on amd64.)  Code using remainder():

XX 	/*
XX 	 * Redundant step to demonstrate a general version.  The usual magic
XX 	 * for rounding needs only no overflow, which this step ensures.
XX 	 */
XX 	if (y >= 0x1p53)
XX 	    return zero;		/* even integer */
XX 
XX 	if (y < 2)
XX 	    goto y_reduced;
XX #if 0
XX 	vz = y+0x1p53;
XX 	z = vz-0x1p53;			/* y rounded to a multiple of 2 */
XX 	if (z > y)
XX 	    z -= 2;			/* adjust to round down */
XX 	y = y - z;			/* y mod 2 */
XX #else
XX 	y = remainder(y, 2);
XX 	if (y < 0)
XX 		y += 2;
XX #endif
XX y_reduced:

x86 remainder() has a slow loop for large args, so although it would work
up to DBL_MAX, the special case to handle large args is still needed as
an optimization.

If this works, then it also works for conversions from degrees to radians
(first take the remainder mod 360 instead of 2 in the above), and later
scale by pi/180 instead of pi.

Bruce

From owner-freebsd-numerics@freebsd.org  Thu May 18 01:42:28 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 150C3D712C0
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu, 18 May 2017 01:42:28 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id D3F74198A
 for <freebsd-numerics@freebsd.org>; Thu, 18 May 2017 01:42:27 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4I1gQvg063967
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Wed, 17 May 2017 18:42:26 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4I1gQAl063966;
 Wed, 17 May 2017 18:42:26 -0700 (PDT) (envelope-from sgk)
Date: Wed, 17 May 2017 18:42:26 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170518014226.GA63819@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170518072636.E951@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 May 2017 01:42:28 -0000

On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote:
> On Wed, 17 May 2017, Steve Kargl wrote:
> 
> > On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote:
> >> On Tue, 16 May 2017, Steve Kargl wrote:
> >>
> >>> Index: lib/msun/ld128/s_cospil.c
> >>> ...
> >>> +static const long double
> >>> +pihi = 3.14159265358979322702026593105983920e+00L,
> >>> +pilo = 1.14423774522196636802434264184180742e-17L;
> >>
> >> These are not in normal format, and are hard to read.  I can't see if
> >> pihi has the correct number of zero bits for exact multiplication.
> >
> > I don't have access to an ld128 system with suitable
> > facilities to allow me to spit out the hex representation.
> 
> flame is still in the freebsd cluster, but is almost unusable because
> its files are an old copy of actual home directories on freefall.

Yes, I know flame is still available.  I deleted all infrastructure
I had on freefall over 2 years agos.  Need at a minimum GMP and 
MPFR.

> Utilities for printing hex and FP in good formats are remarkably rare.
> pari/gp supports many special math types but can only input or output
> ordinarly numbers in decimal.  I use my library routines to print hex
> and special FP formats for it.  sh, printf(1) and awk(1) are limited
> by native tyoes and have many other defects.

I wrote my own utility with the catchy name hex.

> > As such, I've added a comment
> >
> > /*
> > * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
> > */
> 
> Why 56?  53 + 56 = 109.

This is ld128.

static const long double
pi_hi = 3.14159265358979322702026593105983920e+00L,
pi_lo = 1.14423774522196636802434264184180742e-17L;

pi_hi has 113/2 = 56 bits.
pi_lo has 113 bit.

56 + 113 = 169 

> 
> >> These don't have the normal spelling.  fdlibm never uses pihi or pio2hi,
> >> or pi_hi.  It often uses pio2_hi and other pio2_*.  My s_sinpi.c uses
> >> pi_hi.

I've changed my spelling.

> 
> My improvements to lgamma()'s sin_pi*() are buggy.  I forgot the original
> target of avoiding inexact better:

I'll need to wait until the weekend to study your improved sin_pi.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Thu May 18 06:35:26 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5B41D72F03
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu, 18 May 2017 06:35:26 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 71E551DC2
 for <freebsd-numerics@freebsd.org>; Thu, 18 May 2017 06:35:26 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 36E473C9655;
 Thu, 18 May 2017 16:35:22 +1000 (AEST)
Date: Thu, 18 May 2017 16:34:57 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170518014226.GA63819@troutmask.apl.washington.edu>
Message-ID: <20170518154820.A8280@besplex.bde.org>
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
 <20170518014226.GA63819@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=AT3K7O-BTnsdAHa8EAYA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 May 2017 06:35:26 -0000

On Wed, 17 May 2017, Steve Kargl wrote:

> On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote:
>> On Wed, 17 May 2017, Steve Kargl wrote:
> ...
>>> As such, I've added a comment
>>>
>>> /*
>>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
>>> */
>>
>> Why 56?  53 + 56 = 109.
>
> This is ld128.

ld128 has precision 113.

> static const long double
> pi_hi = 3.14159265358979322702026593105983920e+00L,
> pi_lo = 1.14423774522196636802434264184180742e-17L;
>
> pi_hi has 113/2 = 56 bits.
> pi_lo has 113 bit.
>
> 56 + 113 = 169

But hi is set intentionally sloppily by converting to double, so it has
53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up).

>> My improvements to lgamma()'s sin_pi*() are buggy.  I forgot the original
>> target of avoiding inexact better:
>
> I'll need to wait until the weekend to study your improved sin_pi.

I now have closer to production version.  sinpi() runs more than 2
times faster that your version, without doing much different except
not having so much EXTRACT/INSERT or special cases.  (66 cycles instead
of 150 on Haswell-i386, and the 66 can be reduced to 58 by removing
the STRICT_ASSIGN() which breaks only i386 with non-default extra
precision.  The difference would be smaller on amd64 or better compilers
, but Haswell-i386 has much the same slownesses as old Athlon from the
EXTRACTs and INSERTs.

However, both versions fail tests.  Min has a max error of 2048 ulps
and an everage error of 928 ulps and yours has a maximum error of many
giga-ups for the sign bit wrong and an average error of 561 ulps.  The
enormous errors are probably for denormals or NaNs.

XX double
XX ysinpi(double x)
XX {
XX 	double s,y,z;
XX 	int32_t hx,ix;
XX 	int n;
XX 
XX 	GET_HIGH_WORD(hx,x);
XX 	ix = hx & 0x7fffffff;
XX 
XX 	if (ix < 0x3fd00000) {		/* |x| < 0.25 */
XX 	    if (ix < 0x3e200000)	/* |x| < 2**-29 */
XX 		/* XXX: extra precision would generate spurious denormals. */
XX 		if(((int)x)==0) return M_PI*x; /* pi*x with inexact if x != 0 */

Note a problem with denormals.  The hi+lo decomposition on values up
to about 2**53 times the largest denormal creates denormals and thus
spurious denormal and underflow exceptions and possibly slowness.  I'm
not sure if it gives correct results either.  Your version has a special
case for calculating Pi*x, but doesn't handle this.  I think it would
work to scale up into non-denormal range for the calculation.  I just
multiply by M_PI.  I suspects this gives an error of nearly 1 ulp and
will test to see if it is strictly below 1 ulp (the details depend on
how closely M_PI approximates pi).

XX 	    return __kernel_sinpi(x);
XX 	}
XX 
XX 	if (ix>=0x7ff00000) return x-x;	/* Inf or NaN -> NaN */
XX

This classification was cloned from sin().  Then adjust constants as in
your version.  Not doing so much here probably gives half of the 150:66
speed improvement.

Don't bother with special cases for |x| <= 2 or 2.25 as done by sinf().
We don't really want the special case for |x| <= 0.25, but need it to
avoid denormals and underflow.  BTW, I just noticed that I broke this
case in lgamma*().  I moved the check for tiny x out of the kernels into
the sin*() and cos*() callers (or kept the duplicate of it there), but
didn't add it for the lgamma*() callers.  I broke it for sinf() or sin()
too.  You broke it for sinl().

XX 	if (ix >= 0x43400000) return 0;	/* even integer -> 0 */
XX 
XX 	y = fabs(x);
XX 
XX 	STRICT_ASSIGN(double,z,y+0x1p50);
XX 	GET_LOW_WORD(n,z);		/* bits for rounded y (units 0.25) */
XX 	z -= 0x1p50;			/* y rounded to a multiple of 0.25 */
XX 	if (z > y) {
XX 	    z -= 0.25;			/* adjust to round down */
XX 	    n--;
XX 	}
XX 	n &= 7;				/* octant of y mod 2 */
XX 	y -= z;				/* y mod 0.25 */

This is from old optimizations in lgamma() with minor improvements:
- remove extra reduction step to classify integers
- use STRICT_ASSIGN() instead of hard-coded volatile to avoid pessimizing
   arches without extra precision

XX 
XX 	if (n >= 4) {
XX 	    s = -1;
XX 	    n -= 4;
XX 	} else
XX 	    s = 1;
XX 	if (n == 0)
XX 	    return s*__kernel_sinpi(y);	/* this handles poles right */

Oops, the poles are specific to gamma().

This handles integers right.  y is 0 and the kernel returns +=0 without
setting inexact (depend on knowing its internals for the latter).  Don't
be careful about the sign.  I think it is always +0 with the default rounding
mode.

XX 	else if (n == 1)
XX 	    return s*__kernel_cospi(y-0.25);
XX 	else if (n == 2)
XX 	    return s*__kernel_cospi(y);

This handles half-integers right.  No problem with the sign since the kernel
always returns 1.

XX 	else
XX 	    return s*__kernel_sinpi(y-0.25);
XX }

This is from old optimizations in lgamma with minor improvements (but
larger code changes):
- don't use a case statement
- don't add n * 0.25 to y only to have to a multiple of 0.25 from y in more
   cases later.

Bruce

From owner-freebsd-numerics@freebsd.org  Thu May 18 17:54:37 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9070DD70642
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu, 18 May 2017 17:54:37 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 6455FCF7
 for <freebsd-numerics@freebsd.org>; Thu, 18 May 2017 17:54:37 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4IHsZVu074686
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Thu, 18 May 2017 10:54:35 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4IHsZFV074685;
 Thu, 18 May 2017 10:54:35 -0700 (PDT) (envelope-from sgk)
Date: Thu, 18 May 2017 10:54:34 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170518175434.GA74453@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
 <20170518014226.GA63819@troutmask.apl.washington.edu>
 <20170518154820.A8280@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170518154820.A8280@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 May 2017 17:54:37 -0000

On Thu, May 18, 2017 at 04:34:57PM +1000, Bruce Evans wrote:
> On Wed, 17 May 2017, Steve Kargl wrote:
> 
> > On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote:
> >> On Wed, 17 May 2017, Steve Kargl wrote:
> > ...
> >>> As such, I've added a comment
> >>>
> >>> /*
> >>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
> >>> */
> >>
> >> Why 56?  53 + 56 = 109.
> >
> > This is ld128.
> 
> ld128 has precision 113.

Yes. I know.

> > static const long double
> > pi_hi = 3.14159265358979322702026593105983920e+00L,
> > pi_lo = 1.14423774522196636802434264184180742e-17L;
> >
> > pi_hi has 113/2 = 56 bits.
> > pi_lo has 113 bit.
> >
> > 56 + 113 = 169
> 
> But hi is set intentionally sloppily by converting to double, so it has
> 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up).

A 53-bit entity times a 56-bit entity needs 109-bits for the result.
A 113-bit entity can hold a 109-bit entity.  Or am I missing something?

> >> My improvements to lgamma()'s sin_pi*() are buggy.  I forgot the original
> >> target of avoiding inexact better:
> >
> > I'll need to wait until the weekend to study your improved sin_pi.
> 
> I now have closer to production version.  sinpi() runs more than 2
> times faster that your version, without doing much different except
> not having so much EXTRACT/INSERT or special cases.  (66 cycles instead
> of 150 on Haswell-i386, and the 66 can be reduced to 58 by removing
> the STRICT_ASSIGN() which breaks only i386 with non-default extra
> precision.  The difference would be smaller on amd64 or better compilers
> , but Haswell-i386 has much the same slownesses as old Athlon from the
> EXTRACTs and INSERTs.
> 
> However, both versions fail tests.  Min has a max error of 2048 ulps
> and an everage error of 928 ulps and yours has a maximum error of many
> giga-ups for the sign bit wrong and an average error of 561 ulps.  The
> enormous errors are probably for denormals or NaNs.

You're correct that I did not give much consideration to subnormal
number.  For float this, can be fixed by

	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
				if (x == 0)
					return (x);
				SET_FLOAT_WORD(hi, hx & 0xffff0000);
				if (ix <= 0x0b7f0001) { /* Avoid spurios underflow. */
					hi *= 0x1p23F;
					lo = x * 0x1p23F - hi;
					s = (pi_lo + pi_hi) * lo + pi_lo * hi +
					    pi_hi * hi;
					return (s * 0x1p-23F);
				}
				lo = x - hi;
				s = (pi_lo + pi_hi) * lo + pi_lo * hi +
				    pi_hi * hi;
				return (s);
			}
or if you want to avoid the extra if-statement.  Simply scale

	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
				if (x == 0)
					return (x);
				SET_FLOAT_WORD(hi, hx & 0xffff0000);
				hi *= 0x1p23F;
				lo = x * 0x1p23F - hi;
				s = (pi_lo + pi_hi) * lo + pi_lo * hi +
				    pi_hi * hi;
				return (s * 0x1p-23F);
			}

Starting with x = 0x1p-14  and using nextafterf(x,-1.f), the 
first value where underflow is signalled is

% ./testf -S -D
sinpif(3.74171353e-39) = 1.17549407e-38
x is subnormal
sinpif(x) is subnormal

Exhaustive testing of numbers in the domain [FLT_MIN, 0x1p-14] yields

% ./testf -S -m 1.17549435E-38F -M 0x1p-14F -ed
         MAX ULP: 0.58366567
    Total tested: 939524096

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Thu May 18 22:56:41 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B4D4AD71A0F
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu, 18 May 2017 22:56:41 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au
 [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 529461D4B
 for <freebsd-numerics@freebsd.org>; Thu, 18 May 2017 22:56:41 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 4FCCBD67DD8;
 Fri, 19 May 2017 08:56:28 +1000 (AEST)
Date: Fri, 19 May 2017 08:56:27 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170518175434.GA74453@troutmask.apl.washington.edu>
Message-ID: <20170519074512.M884@besplex.bde.org>
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
 <20170518014226.GA63819@troutmask.apl.washington.edu>
 <20170518154820.A8280@besplex.bde.org>
 <20170518175434.GA74453@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=30vkLdqRwj5xFZukw2YA:9 a=2GCknW8xBUceK4Wk:21
 a=Tm1mFJjD2G5J-NMC:21 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 May 2017 22:56:41 -0000

On Thu, 18 May 2017, Steve Kargl wrote:

> On Thu, May 18, 2017 at 04:34:57PM +1000, Bruce Evans wrote:
>> On Wed, 17 May 2017, Steve Kargl wrote:
>>
>>> On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote:
>>>> On Wed, 17 May 2017, Steve Kargl wrote:
>>> ...
>>>>> As such, I've added a comment
>>>>>
>>>>> /*
>>>>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
>>>>> */
>>>>
>>>> Why 56?  53 + 56 = 109.
>>>
>>> This is ld128.
>>
>> ld128 has precision 113.
>
> Yes. I know.
>
>>> static const long double
>>> pi_hi = 3.14159265358979322702026593105983920e+00L,
>>> pi_lo = 1.14423774522196636802434264184180742e-17L;
>>>
>>> pi_hi has 113/2 = 56 bits.
>>> pi_lo has 113 bit.
>>>
>>> 56 + 113 = 169
>>
>> But hi is set intentionally sloppily by converting to double, so it has
>> 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up).
>
> A 53-bit entity times a 56-bit entity needs 109-bits for the result.
> A 113-bit entity can hold a 109-bit entity.  Or am I missing something?

It wastes 4 bits.  This reader has to guess if this is intentional for
magic elsewhere, or just sloppy.

Also, the rounded value might naturally have some extra low zero bits.
So looking at the bits, it is not easy to see if there are the right
number.  This is not worth a comment.  The reader should trust that
the right (maximal) number of bits are used, and only check it
approximately.

The double precision pi*hi and pi*lo also seemed to be sloppily
rounded.  My version has less precision (28 low zero bits in pi_hi
instead of the minimal 24), while yours has 27 low zero bits in
pihi.  The explanation for this is something like:
- I copied pi_hi from another file with slightly different requirements
- the choice makes pi_lo positive but pilo negative.  The other file
   might have need pi_lo positive, or just extra zero bits in pi_hi.
   Most likely just the latter.
- your 27 is apparently from the calculation 53 - 53 / 2 for almost
   even splitting of x, but the splitting of x is uneven (24 + 29).

>>> I'll need to wait until the weekend to study your improved sin_pi.
>> ..
>> However, both versions fail tests.  Min has a max error of 2048 ulps
>> and an everage error of 928 ulps and yours has a maximum error of many
>> giga-ups for the sign bit wrong and an average error of 561 ulps.  The
>> enormous errors are probably for denormals or NaNs.

The were mostly incomplete setup of the tests.  After fixing this, many
real bugs were found:
- 2 or 3 wrong magic numbers in your version (different ones for sinpi()
   and sinpil())
- wrong quadrant in my version
- incompatible signs in my version (sign errors are reported as value
   errors in the exa-ulp range, so my tests can't pass with even 1)
- the non-use of double_t breaks sinpi() completely, and gives errors
   of up to about 1.4 ulps (slightly worse than a simple multiplication
   by M_PI).  Using double_t is no better, since the kernels don't support
   it.  hi+lo ends up with extra precision in hi either way, and this
   gets destroyed on calling the kernel.  The tests find this by switching
   the mode to extra precision to actually test it.  The bug would be
   more obvious using the same algorithm in float precision, since then
   the default precision would be extra.

   This can be fixed using STRICT_ASSIGN() or volatile hacks in _2sumF(),
   but that would turn _2sumF() into back into one of its slow development
   versions.  Options for controlling this were intentionally left out
   to get an easy to ise API for _2sumF().

   The quick fix of declaring hi and lo volatile seems to work, but is
   probably slower than doing it in _2sumnvF() and is certainly worse
   without extra precision.  Only 1 more strict assignment is needed.

   The quick fix of compiling with -ffloat-store works.

> You're correct that I did not give much consideration to subnormal
> number.  For float this, can be fixed by
>
> 	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
> 				if (x == 0)
> 					return (x);
> 				SET_FLOAT_WORD(hi, hx & 0xffff0000);
> 				if (ix <= 0x0b7f0001) { /* Avoid spurios underflow. */
> 					hi *= 0x1p23F;
> 					lo = x * 0x1p23F - hi;
> 					s = (pi_lo + pi_hi) * lo + pi_lo * hi +
> 					    pi_hi * hi;
> 					return (s * 0x1p-23F);
> 				}
> 				lo = x - hi;
> 				s = (pi_lo + pi_hi) * lo + pi_lo * hi +
> 				    pi_hi * hi;
> 				return (s);
> 			}

Far too verbose.  The unfixed version was bad enough.

> or if you want to avoid the extra if-statement.  Simply scale
>
> 	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
> 				if (x == 0)
> 					return (x);
> 				SET_FLOAT_WORD(hi, hx & 0xffff0000);
> 				hi *= 0x1p23F;
> 				lo = x * 0x1p23F - hi;
> 				s = (pi_lo + pi_hi) * lo + pi_lo * hi +
> 				    pi_hi * hi;
> 				return (s * 0x1p-23F);
> 			}

Better, but this is bogus for float.  Just multiply by M_PI in double
precision, as is done everywhere else in the function.

My version for double precision moves this multiplication to a kernel
to defend against n-tuplication, but botches the scaling by splitting
x before scaling it:

XX static inline double
XX __kernel_mpi(double x)
XX {
XX 	double hi, lo;
XX 
XX 	hi = (float)x;
XX 	lo = x - hi;
XX 	return (0x1p1000 * pi_lo * x + 0x1p1000 * pi_hi * lo +
XX 	    0x1p1000 * pi_hi * hi) * 0x1p-1000;
XX }
XX ...
XX 	if (ix < 0x3fd00000) {		/* |x| < 0.25 */
XX 	    if (ix < 0x3e200000) {	/* |x| < 2**-29 */
XX 		if (x == 0)
XX 		    return x;
XX 		if ((int)x == 0)	/* generate inexact if x != 0 */
XX 		    return __kernel_mpi(x);
XX 	    }
XX 	    return __kernel_sinpi(x);
XX 	}

The hi+lo decomposition can also be done in a kernel.  This requires
returning hi and lo indirectly, which is fast enough with inlining to
make the indirections not actually indirect.  My exp kernels return
hi, lo, and also (approx log2(x)) indirectly, where hi and lo are
closer to the final result.  Other exp kernels re-assemble with several
variations of (hi + lo) * 2**k and return that directly.

My sinpif() now passes all tests and gives the same results as yours,
except possibly near 0.

The maximum error in float precision is about 0.54 ulps.  This is much
larger than 0.5009 ulps for sin() and cos().  I have no explanation.
The approx. to pi should give only 12 bits of inaccuracy, so the errror
shouldn't be much more than 1/1024 more.

XX float
XX sinpif(float x)
XX {
XX 	float s,y,z;
XX 	int32_t hx,ix;
XX 	int n;
XX 
XX 	GET_FLOAT_WORD(hx,x);
XX 	ix = hx & 0x7fffffff;
XX 
XX 	if (ix < 0x3e800000) {		/* |x| < 0.25 */
XX 	    if (ix < 0x38800000)	/* |x| < 2**-14 */
XX 		if(((int)x)==0) return M_PI*x; /* pi*x with inexact if x != 0 */
XX 	    return __kernel_sindf(M_PI*x);
XX 	}

We multiply by M_PI for the usual case, so it is silly not to multiply by
it when x is small.

XX 
XX 	if (ix>=0x7f800000) return x-x;	/* Inf or NaN -> NaN */
XX 
XX 	if (ix >= 0x4b800000) return (x < 0 ? -0.0F : 0); /* even int -> +-0 */

You said that n1950.pdf documents returning +0 at even integers and -0 at
odd intgers.  I couldn't find n1950.pdf, and your version doesn't do
anything like that.  It just returns +0 for all nonnegative integers and
-0 for all negative integers, as actually makes sense.

The threshold in the above is to classify even integers so as to possibly
return +0 for them and -0 as part of the general classification below.  It
can be reduced to possibly give special cases for odd and even large
integers above, as in your versions.

XX 
XX 	y = fabs(x);
XX 
XX 	STRICT_ASSIGN(float,z,y+0x1p21F);
XX 	GET_FLOAT_WORD(n,z);		/* bits for rounded y (units 0.25) */
XX 	z -= 0x1p21F;			/* y rounded to a multiple of 0.25 */
XX 	if (z > y) {
XX 	    z -= 0.25F;			/* adjust to round down */
XX 	    n--;
XX 	}
XX 	n &= 7;				/* octant of y mod 2 */
XX 	y -= z;				/* y mod 0.25 */
XX 
XX 	if (n >= 3 && n < 7)
XX 	    s = -1;
XX 	else
XX 	    s = 1;
XX 	if (x < 0)
XX 	    s = -s;
XX 	n &= 3;
XX 	if (y == 0 && n == 0)
XX 	    return (x < 0 ? -0.0F : 0);

I don't like the complications for signs, but they seem to execute fast
enough using direct logic.  On modern OOE x86, 's' can be calculated in
parallel, so it only costs about 4 cycles of latency to assemble with it
at the end, and my tests de-emphasize latency a bit too much.

XX 	if (n == 0)
XX 	    return s*__kernel_sindf(M_PI*y);
XX 	else if (n == 1)
XX 	    return s*__kernel_cosdf(M_PI*(y-0.25F));
XX 	else if (n == 2)
XX 	    return s*__kernel_cosdf(M_PI*y);
XX 	else
XX 	    return s*__kernel_sindf(M_PI*(y-0.25F));
XX }

Bruce

From owner-freebsd-numerics@freebsd.org  Thu May 18 23:47:30 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2DD05D73CB5
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu, 18 May 2017 23:47:30 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 14C831763
 for <freebsd-numerics@freebsd.org>; Thu, 18 May 2017 23:47:30 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4INlMhu077565
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Thu, 18 May 2017 16:47:22 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4INlMRp077564;
 Thu, 18 May 2017 16:47:22 -0700 (PDT) (envelope-from sgk)
Date: Thu, 18 May 2017 16:47:22 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170518234722.GB77471@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
 <20170518014226.GA63819@troutmask.apl.washington.edu>
 <20170518154820.A8280@besplex.bde.org>
 <20170518175434.GA74453@troutmask.apl.washington.edu>
 <20170519074512.M884@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170519074512.M884@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 May 2017 23:47:30 -0000

On Fri, May 19, 2017 at 08:56:27AM +1000, Bruce Evans wrote:
> On Thu, 18 May 2017, Steve Kargl wrote:
> 
> > On Thu, May 18, 2017 at 04:34:57PM +1000, Bruce Evans wrote:
> >> On Wed, 17 May 2017, Steve Kargl wrote:
> >>
> >>> On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote:
> >>>> On Wed, 17 May 2017, Steve Kargl wrote:
> >>> ...
> >>>>> As such, I've added a comment
> >>>>>
> >>>>> /*
> >>>>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
> >>>>> */
> >>>>
> >>>> Why 56?  53 + 56 = 109.
> >>>
> >>> This is ld128.
> >>
> >> ld128 has precision 113.
> >
> > Yes. I know.
> >
> >>> static const long double
> >>> pi_hi = 3.14159265358979322702026593105983920e+00L,
> >>> pi_lo = 1.14423774522196636802434264184180742e-17L;
> >>>
> >>> pi_hi has 113/2 = 56 bits.
> >>> pi_lo has 113 bit.
> >>>
> >>> 56 + 113 = 169
> >>
> >> But hi is set intentionally sloppily by converting to double, so it has
> >> 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up).
> >
> > A 53-bit entity times a 56-bit entity needs 109-bits for the result.
> > A 113-bit entity can hold a 109-bit entity.  Or am I missing something?
> 
> It wastes 4 bits.  This reader has to guess if this is intentional for
> magic elsewhere, or just sloppy.
> 
> Also, the rounded value might naturally have some extra low zero bits.
> So looking at the bits, it is not easy to see if there are the right
> number.  This is not worth a comment.  The reader should trust that
> the right (maximal) number of bits are used, and only check it
> approximately.

Unfortunately, I don't have the capibility to easily convert
an ld128 number to hex; otherwise, I would include a comment
with the hex representation.

> The double precision pi*hi and pi*lo also seemed to be sloppily
> rounded.  My version has less precision (28 low zero bits in pi_hi
> instead of the minimal 24), while yours has 27 low zero bits in
> pihi.  The explanation for this is something like:
> - I copied pi_hi from another file with slightly different requirements
> - the choice makes pi_lo positive but pilo negative.  The other file
>    might have need pi_lo positive, or just extra zero bits in pi_hi.
>    Most likely just the latter.
> - your 27 is apparently from the calculation 53 - 53 / 2 for almost
>    even splitting of x, but the splitting of x is uneven (24 + 29).

The last one is the closest.  I did p/2+1.  Yes, the splitting for
x is too sloppy.  

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Fri May 19 06:52:52 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D0B69D74F2C
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Fri, 19 May 2017 06:52:52 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 5576513B0
 for <freebsd-numerics@freebsd.org>; Fri, 19 May 2017 06:52:51 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 1706B42B46B;
 Fri, 19 May 2017 16:52:02 +1000 (AEST)
Date: Fri, 19 May 2017 16:52:02 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170518234722.GB77471@troutmask.apl.washington.edu>
Message-ID: <20170519153326.A20099@besplex.bde.org>
References: <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
 <20170518014226.GA63819@troutmask.apl.washington.edu>
 <20170518154820.A8280@besplex.bde.org>
 <20170518175434.GA74453@troutmask.apl.washington.edu>
 <20170519074512.M884@besplex.bde.org>
 <20170518234722.GB77471@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=nyoC9Yw7XdoYBhFG8wcA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 May 2017 06:52:52 -0000

I fixed all bugs found by my tests:
- sinpif() and cospif() were unnecessarily inaccurate.  Now they are
   have the same maximum error of 0.5009 ulps as sinf() and cosf(), and
   more accuracy on average.
- sinpi(), cospi() and tanpi() were broken on i386 with extra precision.
   Now they have a smaller error than sin(), etc. (about 0.55 ulps; the
   bug gave 1.5 ulps).
- fix cospi*(odd integer) returning +1 for odd integers.  Clean up related
   but accidentally working code for sinpi*() and tanpi*() (12 copies
   altogether).
- fix broken threshold for Infs and NaNs in sinpi(), cosp() and tanpi()
- try to fix off-by-1 error in translation of floorl() for ld80.  This
   seems to work.  The bug showed up as 2**62 being treated as 2**63, so
   args above 2**62 and below 2**63 only worked accidentally for sinl(),
   cosl() and tanl().
- fix the return value for NaNs.

XX diff -u2 k_cospi.c~ k_cospi.c
XX --- k_cospi.c~	Fri May 19 09:05:14 2017
XX +++ k_cospi.c	Fri May 19 12:02:24 2017
XX @@ -35,5 +35,8 @@
XX  __kernel_cospi(double x)
XX  {
XX -	double hi, lo;
XX +	/* XXX: volatile hack to abuse _2sumF(): */
XX +	volatile double hi;
XX +	double lo;
XX +
XX  	hi = (float)x;
XX  	lo = x - hi;

This hack is sufficent to get strict assignment to hi.  It is a bit slow.
It costs about 30% on i386 (70 cycles increased to 90) in my version.

XX diff -u2 k_sinpi.c~ k_sinpi.c
XX --- k_sinpi.c~	Fri May 19 09:05:14 2017
XX +++ k_sinpi.c	Fri May 19 12:02:56 2017
XX @@ -35,5 +35,8 @@
XX  __kernel_sinpi(double x)
XX  {
XX -	double hi, lo;
XX +	/* XXX: volatile hack to abuse _2sumF(): */
XX +	volatile double hi;
XX +	double lo;
XX +
XX  	hi = (float)x;
XX  	lo = x - hi;
XX diff -u2 ld128-s_cospil.c~ ld128-s_cospil.c
XX --- ld128-s_cospil.c~	Fri May 19 11:45:43 2017
XX +++ ld128-s_cospil.c	Fri May 19 12:30:56 2017
XX @@ -45,5 +45,4 @@
XX  cospil(long double x)
XX  {
XX -	volatile static const double vzero = 0;
XX  	long double ax, c, xf;
XX  	uint32_t ix;

This variable was only used to return wrong results for NaNs.

XX @@ -73,5 +72,5 @@
XX  	if (ax < 0x1p112) {
XX 
XX -		xf = floorl(ax);
XX +		xf = ax - remainderl(ax, 1);	/* floorl() without inexact */
XX 
XX  		ax -= xf;

Try to fix spurious inexact for half-integers (quarter-integers for
tanpil*()).  I didn't test ld128.  I tested ld80 enough to find
inconsisties with double.  They were much the same as inconsistencies with
float.

XX @@ -98,12 +97,6 @@
XX 
XX  	if (isinf(x) || isnan(x))
XX -		return (vzero / vzero);
XX +		return (x - x);

Use the standard convention and method for producing NaNs (+-Inf ->
default NaN; NaN -> quiet(same NaN).

XX 
XX -	/*
XX -	 * |x| >= 0x1p112 is always an even integer, so return 1.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = 1;
XX -	return (ax);
XX +	return (ix >= 0x403f || (lx & 1) == 0 ? 1 : -1);
XX  }

Remove lots of style bugs and make this work.  ax + 1 > 1 was always
true, so this always returned +1, but it must return -1 at odd integers.

The details are intentionally only given in s_cospi.c.

XX diff -u2 ld128-s_sinpil.c~ ld128-s_sinpil.c
XX --- ld128-s_sinpil.c~	Fri May 19 09:19:00 2017
XX +++ ld128-s_sinpil.c	Fri May 19 12:34:02 2017
XX @@ -45,5 +45,4 @@
XX  sinpil(long double x)
XX  {
XX -	volatile static const double vzero = 0;
XX  	long double ax, s, xf, xhi, xlo;
XX  	uint32_t ix;
XX @@ -78,5 +77,5 @@
XX  	if (ax < 0x1p112) {
XX 
XX -		xf = floorl(ax);
XX +		xf = ax - remainderl(ax, 1);	/* floorl() without inexact */
XX 
XX  		ax -= xf;
XX @@ -106,12 +105,6 @@
XX 
XX  	if (isinf(x) || isnan(x))
XX -		return (vzero / vzero);
XX +		return (x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p112 is always an integer, so return +-0.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = copysignl(0, x);
XX -	return (ax);
XX +	return (x < 0 ? -0.0 : 0);
XX  }

The sign wasn't even copied for cospil().

As usual, ax + 1 > 1 is always true, so this returned the correct value
of 0 with the sign of x in all cases.  I only fixed this because the
bug was more serious for cospi*() so I had to edit that, and I wanted
to remove the nearly-duplicated style bugs.

XX diff -u2 ld128-s_tanpil.c~ ld128-s_tanpil.c
XX --- ld128-s_tanpil.c~	Fri May 19 09:15:52 2017
XX +++ ld128-s_tanpil.c	Fri May 19 12:34:41 2017
XX @@ -70,5 +70,4 @@
XX  tanpil(long double x)
XX  {
XX -	volatile static const double vzero = 0;
XX  	long double ax, xf;
XX  	uint32_t ix;
XX @@ -97,5 +96,5 @@
XX  	if (ix < 0x1p112) {
XX 
XX -		xf = floorl(ax);
XX +		xf = ax - remainderl(ax, 1);	/* floorl() without inexact */
XX 
XX  		ax -= xf;
XX @@ -112,12 +111,6 @@
XX  	/* x = +-inf or nan. */
XX  	if (isinf(x) || isnan(x))
XX -		return (vzero / vzero);
XX +		return (x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p53 is always an integer, so return +-0.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = copysignl(0, x);
XX -	return (ax);
XX +	return (x < 0 ? -0.0 : 0);
XX  }
XX diff -u2 s_cospi.c~ s_cospi.c
XX --- s_cospi.c~	Fri May 19 09:05:16 2017
XX +++ s_cospi.c	Fri May 19 12:31:08 2017
XX @@ -74,5 +74,4 @@
XX  cospi(double x)
XX  {
XX -	volatile static const double vzero = 0;
XX  	double ax, c;
XX  	uint32_t hx, ix, j0, lx;
XX @@ -136,14 +135,12 @@
XX  	}
XX 
XX -	if (ix >= 0x7f800000)
XX -		return (vzero / vzero);
XX +	if (ix >= 0x7ff00000)
XX +		return (x - x);

The wrong threshold was copied from the float case.

XX 
XX  	/*
XX -	 * |x| >= 0x1p52 is always an even integer, so return 1.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID?
XX +	 * 0x1p53 <= |x|, then x is an even integer.  Otherwise,
XX +	 * 0x1p52 <= |x| < 0x1p53 and is odd iff lx is odd.
XX  	 */
XX -	if (ax + 1 > 1)
XX -		ax = 1;
XX -	return (ax);
XX +	return (ix >= 0x43400000 || (lx & 1) == 0 ? 1 : -1);
XX  }
XX

This gives details of how to decide if an value is an even or odd
integer.

XX diff -u2 s_cospif.c~ s_cospif.c
XX --- s_cospif.c~	Fri May 19 09:05:16 2017
XX +++ s_cospif.c	Fri May 19 12:56:24 2017
XX @@ -42,5 +42,4 @@
XX  cospif(float x)
XX  {
XX -	volatile static const float vzero = 0;
XX  	float ax, c;
XX  	uint32_t ix, j0;
XX @@ -100,12 +99,6 @@
XX  	/* x = +-inf or nan. */
XX  	if (ix >= 0x7f800000)
XX -		return (vzero / vzero);
XX +		return (x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p23 is always an even integer, so return 1.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = 1;
XX -	return (ax);
XX +	return (ix >= 0x4b800000 || (ix & 1) == 0 ? 1 : -1);
XX  }
XX diff -u2 s_cospil.c~ s_cospil.c
XX --- s_cospil.c~	Fri May 19 09:22:48 2017
XX +++ s_cospil.c	Fri May 19 12:31:28 2017
XX @@ -48,5 +48,4 @@
XX  cospil(long double x)
XX  {
XX -	volatile static const double vzero = 0;
XX  	long double ax, c;
XX  	uint32_t j0;
XX @@ -87,5 +86,5 @@
XX  			lx &= ~(((lx << 32)-1) >> j0);
XX  		} else {
XX -			uint64_t m = (uint64_t)-1 >> (j0 + 1);
XX +			uint64_t m = (uint64_t)-1 >> j0;
XX  			if (lx & m) lx &= ~m;
XX  		}
XX @@ -118,12 +117,6 @@
XX 
XX  	if (ix >= 0x7fff)
XX -		RETURNI(vzero / vzero);
XX +		RETURNI(x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p63 is always an even integer, so return 1.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = 1;
XX -	RETURNI(ax);
XX +	RETURNI(ix >= 0x403f || (lx & 1) == 0 ? 1 : -1);
XX  }
XX diff -u2 s_sinpi.c~ s_sinpi.c
XX --- s_sinpi.c~	Fri May 19 09:05:16 2017
XX +++ s_sinpi.c	Fri May 19 13:44:44 2017
XX @@ -77,5 +77,4 @@
XX  sinpi(double x)
XX  {
XX -	volatile static const double vzero = 0;
XX  	double ax, s;
XX  	uint32_t hx, ix, j0, lx;
XX @@ -87,5 +86,5 @@
XX  	if (ix < 0x3ff00000) {			/* |x| < 1 */
XX  		if (ix < 0x3fd00000) {		/* |x| < 0.25 */
XX -			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
XX +			if (ix < 0x3de00000) {	/* |x| < 2**-33 */
XX  				if (x == 0)
XX  					return (x);

Reduce this threshold by a large factor for sinpi() and tanpi(), and by
an even larger factor for sinpif() and tanpif().  For some reason, the
x for sin(x), etc. is just as accurate as the poly up to a much higher
threshold than pi*x for sinpi(x), etc.

This reduces the maximum error for sinpif() from ~0.54 ulps to the same
~0.5009 ulps as for sinf().  For tanpif(), it only reduces the error like
that for small x (tanpif()'s max error is large elswhere).  For sinpi()
and tanpi(), similarly except sinpi()'s maximum error is smaller than
tanpi()'s (it seems to be smaller than sin()'s too -- I haven't seen it
larger than 0.5312 ulps, where .0312 is from the poly error of 2**-5).
For sinpil() and tanpil(), I didn't change this since I can't test it.

This might depend on extra precision.  Accuracy was thrown away for some
other functions by increasing the threshold to what works without extra
precision.  It is unusual to even test with extra precision for doubles.

Start fixing style bugs -- use the Fortran spelling of powers and not
C hex FP values when the code doesn't use the latter.

XX @@ -147,14 +146,8 @@
XX  	}
XX 
XX -	if (ix >= 0x7f800000)
XX -		return (vzero / vzero);
XX +	if (ix >= 0x7ff00000)
XX +		return (x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p52 is always an integer, so return +-0.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID?
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = copysign(0, x);
XX -	return (ax);
XX +	return (x < 0 ? -0.0 : 0);
XX  }
XX 
XX diff -u2 s_sinpif.c~ s_sinpif.c
XX --- s_sinpif.c~	Fri May 19 09:05:16 2017
XX +++ s_sinpif.c	Fri May 19 13:22:55 2017
XX @@ -46,5 +46,4 @@
XX  sinpif(float x)
XX  {
XX -	volatile static const float vzero = 0;
XX  	float ax, s;
XX  	uint32_t hx, ix, j0;
XX @@ -56,5 +55,5 @@
XX  	if (ix < 0x3f800000) {			/* |x| < 1 */
XX  		if (ix < 0x3e800000) {		/* |x| < 0.25 */
XX -	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
XX +	 		if (ix < 0x34800000) {	/* |x| < 2**-22 */
XX  				if (x == 0)
XX  					return (x);
XX @@ -111,12 +110,6 @@
XX  	/* x = +-inf or nan. */
XX  	if (ix >= 0x7f800000)
XX -		return (vzero / vzero);
XX +		return (x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p23 is always an integer, so return +-0.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID?
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = copysignf(0, x);
XX -	return (ax);
XX +	return (x < 0 ? -0.0F : 0);
XX  }
XX diff -u2 s_sinpil.c~ s_sinpil.c
XX --- s_sinpil.c~	Fri May 19 09:23:00 2017
XX +++ s_sinpil.c	Fri May 19 12:34:28 2017
XX @@ -48,5 +48,4 @@
XX  sinpil(long double x)
XX  {
XX -	volatile static const double vzero = 0;
XX  	long double ax, s;
XX  	uint32_t j0;
XX @@ -91,5 +90,5 @@
XX  			lx &= ~(((lx << 32)-1) >> j0);
XX  		} else {
XX -			uint64_t m = (uint64_t)-1 >> (j0 + 1);
XX +			uint64_t m = (uint64_t)-1 >> j0;
XX  			if (lx & m) lx &= ~m;
XX  		}
XX @@ -125,12 +124,6 @@
XX  	/* x = +-inf or nan. */
XX  	if (ix >= 0x7fff)
XX -		RETURNI(vzero / vzero);
XX +		RETURNI(x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p63 is always an integer, so return +-0.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = copysignl(0, x);
XX -	RETURNI(ax);
XX +	RETURNI(x < 0 ? -0.0 : 0);
XX  }
XX diff -u2 s_tanpi.c~ s_tanpi.c
XX --- s_tanpi.c~	Fri May 19 09:05:16 2017
XX +++ s_tanpi.c	Fri May 19 14:02:34 2017
XX @@ -78,5 +78,7 @@
XX  __kernel_tanpi(double x)
XX  {
XX -	double hi, lo, t;
XX +	/* XXX: volatile hack to abuse _2sumF(): */
XX +	volatile double hi;
XX +	double lo, t;
XX 
XX  	if (x < 0.25) {
XX @@ -104,5 +106,4 @@
XX  tanpi(double x)
XX  {
XX -	volatile static const double vzero = 0;
XX  	double ax, t;
XX  	uint32_t hx, ix, j0, lx;
XX @@ -114,5 +115,5 @@
XX  	if (ix < 0x3ff00000) {			/* |x| < 1 */
XX  		if (ix < 0x3fe00000) {		/* |x| < 0.5 */
XX -			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
XX +			if (ix < 0x3df00000) {	/* |x| < 0x1p-32 */
XX  				if (x == 0)
XX  					return (x);
XX @@ -156,14 +157,8 @@
XX 
XX  	/* x = +-inf or nan. */
XX -	if (ix >= 0x7f800000)
XX -		return (vzero / vzero);
XX +	if (ix >= 0x7ff00000)
XX +		return (x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p52 is always an integer, so return +-0.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID?
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = copysign(0, x);
XX -	return (ax);
XX +	return (x < 0 ? -0.0 : 0);
XX  }
XX 
XX diff -u2 s_tanpif.c~ s_tanpif.c
XX --- s_tanpif.c~	Fri May 19 09:05:16 2017
XX +++ s_tanpif.c	Fri May 19 13:20:29 2017
XX @@ -57,5 +57,4 @@
XX  tanpif(float x)
XX  {
XX -	volatile static const float vzero = 0;
XX  	float ax, t;
XX  	uint32_t hx, ix, j0;
XX @@ -67,5 +66,5 @@
XX  	if (ix < 0x3f800000) {			/* |x| < 1 */
XX  		if (ix < 0x3f000000) {		/* |x| < 0.5 */
XX -			if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
XX +	 		if (ix < 0x34800000) {	/* |x| < 2**-22 */
XX  				if (ix == 0)
XX  					return (x);
XX @@ -104,12 +103,6 @@
XX  	/* x = +-inf or nan. */
XX  	if (ix >= 0x7f800000)
XX -		return (vzero / vzero);
XX +		return (x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p23 is always an integer, so return +-0.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = copysignf(0, x);
XX -	return (ax);
XX +	return (x < 0 ? -0.0F : 0);
XX  }
XX diff -u2 s_tanpil.c~ s_tanpil.c
XX --- s_tanpil.c~	Fri May 19 09:15:55 2017
XX +++ s_tanpil.c	Fri May 19 12:35:05 2017
XX @@ -71,5 +71,4 @@
XX  tanpil(long double x)
XX  {
XX -	volatile static const double vzero = 0;
XX  	long double ax, t;
XX  	uint32_t j0;
XX @@ -109,5 +108,5 @@
XX  			lx &= ~(((lx << 32)-1) >> j0);
XX  		} else {
XX -			uint64_t m = (uint64_t)-1 >> (j0 + 1);
XX +			uint64_t m = (uint64_t)-1 >> j0;
XX  			if (lx & m) lx &= ~m;
XX  		}
XX @@ -128,12 +127,6 @@
XX  	/* x = +-inf or nan. */
XX  	if (ix >= 0x7fff)
XX -		RETURNI(vzero / vzero);
XX +		RETURNI(x - x);
XX 
XX -	/*
XX -	 * |x| >= 0x1p63 is always an integer, so return +-0.
XX -	 * FIXME: should this raise FE_INEXACT or FE_INVALID.
XX -	 */
XX -	if (ax + 1 > 1)
XX -		ax = copysignl(0, x);
XX -	RETURNI(ax);
XX +	RETURNI(x < 0 ? -0.0 : 0);
XX  }

I now have complete and debugged versions of sinpif() and sinpil().
They are still about twice as fast, and competitive with sinf() etc.
instead of more than twice as slow.  They needed further fixes for
classifications of half-integers and quarter-integers near the
threshold.  lgamma*() was broken (probably by us) by not doing this
right, but for lgamma() these errors are hidden in the noise of
other large errors.  Here my a complete sinpinf().  It grew larger
and slower than I like to become correct:

YY #include <float.h>
YY 
YY #include "math.h"
YY #include "math_private.h"
YY 
YY float
YY ysinpif(float x)
YY {
YY 	float s,y,z;
YY 	int32_t hx,ix;
YY 	int n;
YY 
YY 	GET_FLOAT_WORD(hx,x);
YY 	ix = hx & 0x7fffffff;
YY 
YY 	if (ix < 0x3e800000) {		/* |x| < 0.25 */
YY 	    if (ix < 0x34800000)	/* |x| < 2**-22 */
YY 		if(((int)x)==0) return M_PI*x; /* pi*x with inexact if x != 0 */
YY 	    return __kernel_sindf(M_PI*x);
YY 	}
YY 
YY 	if (ix>=0x7f800000) return x-x;	/* Inf or NaN -> NaN */
YY 
YY 	if (ix >= 0x4b000000) return (x < 0 ? -0.0F : 0); /* integer -> +-0 */
YY 
YY 	y = fabs(x);
YY 
YY 	if (ix >= 0x4a800000) {		/* half-integer */
YY 	    n = (ix & 3) * 2;
YY 	    y = 0;
YY 	} else if (ix >= 0x4a000000) {	/* quarter-integer */
YY 	    n = ix & 7;
YY 	    y = 0;

The half-integer and quarter-intger classification is new.  Adding 0x1p21F
to classify only works below 0x1p21F.  The commments should translate the
thresholds.

The version for lgamma*() in -current is broken too, but the error is
reduced by a preliminary classification of all integers in the above
ranges.  Adding 0x1p21F doesn't work for these either.  So lgamma*() only
gives wrong results for half-integers and quarter-integers in the above
range.  I think gamma*() overflows on these values, and the errors are
relatively not so enormous for lgamma*().

YY 	} else {
YY 	    STRICT_ASSIGN(float,z,y+0x1p21F);
YY 	    GET_FLOAT_WORD(n,z);	/* bits for rounded y (units 0.25) */
YY 	    z -= 0x1p21F;		/* y rounded to a multiple of 0.25 */
YY 	    if (z > y) {
YY 		z -= 0.25F;		/* adjust to round down */
YY 		n--;
YY 	    }
YY 	    n &= 7;			/* octant of y mod 2 */
YY 	    y -= z;			/* y mod 0.25 */
YY 	}
YY 
YY 	if (n >= 3 && n < 7)
YY 	    s = -1;
YY 	else
YY 	    s = 1;
YY 	if (x < 0)
YY 	    s = -s;
YY 	n &= 3;
YY 	if (y == 0 && n == 0)
YY 	    return (x < 0 ? -0.0F : 0);
YY 	if (n == 0)
YY 	    return s*__kernel_sindf(M_PI*y);
YY 	else if (n == 1)
YY 	    return s*__kernel_cosdf(M_PI*(y-0.25F));
YY 	else if (n == 2)
YY 	    return s*__kernel_cosdf(M_PI*y);
YY 	else
YY 	    return s*__kernel_sindf(M_PI*(y-0.25F));
YY }

This still has style bugs suitable for copying back to lgamma*.

Bruce

From owner-freebsd-numerics@freebsd.org  Sat May 20 00:58:14 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 36C77D738D7
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 20 May 2017 00:58:14 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249])
 by mx1.freebsd.org (Postfix) with ESMTP id 95D5D18AD
 for <freebsd-numerics@freebsd.org>; Sat, 20 May 2017 00:58:13 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id B8DA8104973C;
 Sat, 20 May 2017 10:58:04 +1000 (AEST)
Date: Sat, 20 May 2017 10:58:03 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
cc: Steve Kargl <sgk@troutmask.apl.washington.edu>, 
 freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170519153326.A20099@besplex.bde.org>
Message-ID: <20170520105748.X1017@besplex.bde.org>
References: <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
 <20170518014226.GA63819@troutmask.apl.washington.edu>
 <20170518154820.A8280@besplex.bde.org>
 <20170518175434.GA74453@troutmask.apl.washington.edu>
 <20170519074512.M884@besplex.bde.org>
 <20170518234722.GB77471@troutmask.apl.washington.edu>
 <20170519153326.A20099@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=rYtKBivQexIEMgcjFssA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 May 2017 00:58:14 -0000

On Fri, 19 May 2017, Bruce Evans wrote:

> I fixed all bugs found by my tests:
> ...
> - sinpi(), cospi() and tanpi() were broken on i386 with extra precision.
>  Now they have a smaller error than sin(), etc. (about 0.55 ulps; the
>  bug gave 1.5 ulps).
> ...
>
> XX diff -u2 k_cospi.c~ k_cospi.c
> XX --- k_cospi.c~	Fri May 19 09:05:14 2017
> XX +++ k_cospi.c	Fri May 19 12:02:24 2017
> XX @@ -35,5 +35,8 @@
> XX  __kernel_cospi(double x)
> XX  {
> XX -	double hi, lo;
> XX +	/* XXX: volatile hack to abuse _2sumF(): */
> XX +	volatile double hi;
> XX +	double lo;
> XX +
> XX  	hi = (float)x;
> XX  	lo = x - hi;
>
> This hack is sufficent to get strict assignment to hi.  It is a bit slow.
> It costs about 30% on i386 (70 cycles increased to 90) in my version.

I fixed this more cleanly in the infrastructure:

XX diff -u2 math_private.h~ math_private.h
XX --- math_private.h~	Tue Jun  4 21:39:14 2013
XX +++ math_private.h	Sun Nov 27 17:58:57 2005
XX @@ -21,4 +21,6 @@
XX  #include <machine/endian.h>
XX 
XX +#include <float.h>		/* XXX: pollution for STRICT_ASSIGN() */
XX +
XX  /*
XX   * The original fdlibm code used statements like:
XX @@ -322,5 +537,5 @@
XX  	__typeof(a) __s, __w;	\
XX  				\
XX -	__w = (a) + (b);	\
XX +	STRICT_ASSIGN(__typeof(a), __w, (a) + (b)); \
XX  	__s = __w - (a);	\
XX  	(b) = ((a) - (__w - __s)) + ((b) - __s); \
XX @@ -355,4 +570,8 @@
XX   * reduce their own extra-precision and efficiciency problems.  In
XX   * particular, they shouldn't convert back and forth just to call here.
XX + *
XX + * Update: the 2sum algorithms now use STRICT_ASSIGN(), so they should work
XX + * with any FP type, and only be slow if that type is not float_t, double_t
XX + * or long double.
XX   */
XX  #ifdef DEBUG
XX @@ -365,5 +584,5 @@
XX  	assert(__ia == 0 || fabsl(__ia) >= fabsl(__ib));	\
XX  							\
XX -	__w = (a) + (b);				\
XX +	STRICT_ASSIGN(__typeof(a), __w, (a) + (b));	\
XX  	(b) = ((a) - __w) + (b);			\
XX  	(a) = __w;					\
XX @@ -380,5 +599,5 @@
XX  	__typeof(a) __w;	\
XX  				\
XX -	__w = (a) + (b);	\
XX +	STRICT_ASSIGN(__typeof(a), __w, (a) + (b)); \
XX  	(b) = ((a) - __w) + (b); \
XX  	(a) = __w;		\

Other fixes: my tests only passed because they were on i386 with extra
precision for doubles.  On amd64 and on i386 with the default precision
for doubles, the problem with the hi+lo decomposition for denormals
showed up as errors of about 1.5 ulps for sinpif(), tanpif(), sinpi()
and tanpi().  Previously-discussed fixes for this worked.  I rewrote
them from scratch (not very cleanly -- they should be in a header file,
as should be the kernels.  1 header file can keep kernels and functions
to multiply by pi for all precisions):

XX diff -u2 s_sinpi.c~ s_sinpi.c
XX --- s_sinpi.c~	Fri May 19 09:05:16 2017
XX +++ s_sinpi.c	Fri May 19 19:52:07 2017
XX @@ -77,6 +77,5 @@
XX  sinpi(double x)
XX  {
XX -	volatile static const double vzero = 0;
XX -	double ax, s;
XX +	double ax, hi, lo, s;
XX  	uint32_t hx, ix, j0, lx;
XX 
XX @@ -87,12 +86,15 @@
XX  	if (ix < 0x3ff00000) {			/* |x| < 1 */
XX  		if (ix < 0x3fd00000) {		/* |x| < 0.25 */
XX -			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
XX +			if (ix < 0x3de00000) {	/* |x| < 2**-33 */
XX  				if (x == 0)
XX  					return (x);
XX -				INSERT_WORDS(ax, hx, 0);
XX -				x -= ax;
XX -				s = pilo * x + pihi * x + pilo * ax
XX -				    + pihi * ax;
XX -				return (s);
XX +				if ((int)x == 1) /* gen. inexact if x != 0 */
XX +					return (x);	/* NOTREACHED */
XX +				hi = x;
XX +				SET_LOW_WORD(hi, 0);
XX +				hi *= 0x1p1000;
XX +				lo = 0x1p1000 * x  - hi;
XX +				return ((0x1p1000 * pilo * x + pihi * lo +
XX +				    pihi * hi) * 0x1p-1000);
XX  			}
XX

This also fixes the style bugs of clobbering x and ax to hold lo and hi,
and the pessimization of using INSER_WORDS instead of SET_LOW_WORD(),
and the pessimization of not combining low terms, and the bug of not
always setting inexact for nonzero x (e.g., for x = 2**-34), and the
style bug and pessimization of handling the sign explicitly (rounding
to nearest gives oddness automatically exept for x = 0), and has 2
older fixes.

Multiplication by pi is remarkably complicated.  I noticed that you
already avoided the trap of using the usual method of truncating x
by converting to float -- this fails for denormals and other small
values since floats don't have enough range.  Converting long doubles
fails similarly.

So 2 different methods for multiplying by pi are needed:
- a slower general method that works for small values (use bit-fiddling
     to truncate), and scale up
- the usual method of casting to float for medium-sized values
- mercifully, a third method that works on large values is not needed.
These should be in a header file and not duplicated.  But I used the
quick fix of duplicating the above for tanpi().

XX diff -u2 s_sinpif.c~ s_sinpif.c
XX --- s_sinpif.c~	Fri May 19 09:05:16 2017
XX +++ s_sinpif.c	Fri May 19 19:49:33 2017
XX @@ -46,5 +46,4 @@
XX  sinpif(float x)
XX  {
XX -	volatile static const float vzero = 0;
XX  	float ax, s;
XX  	uint32_t hx, ix, j0;
XX @@ -56,14 +55,7 @@
XX  	if (ix < 0x3f800000) {			/* |x| < 1 */
XX  		if (ix < 0x3e800000) {		/* |x| < 0.25 */
XX -	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
XX -				if (x == 0)
XX -					return (x);
XX -				SET_FLOAT_WORD(ax, hx & 0xffff0000);
XX -				x -= ax;
XX -				s = pilo * x + pihi * x + pilo * ax
XX -				    + pihi * ax;
XX -				return (s);
XX -			}
XX -
XX +	 		if (ix < 0x34800000)	/* |x| < 2**-22 */
XX +				if ((int)x == 0) /* gen. inexact if x != 0 */
XX +					return (M_PI * x);
XX  			s = __kernel_sinpif(ax);
XX  			return ((hx & 0x80000000) ? -s : s);

This is simpler because it can just use double precision.  Also fix missing
setting of inexact and 1 style bug.  This can use essentially the same code
as sinf().  It doesn't need a special case for x == 0 since multiplication
if +-0 by M_PI preserves the sign.  I didn't fix the style bug and pessimal
sign handling for non-small x.  This should be the same as for sinf() too.
The kernel preserves oddness in the default rounding mode except for x == 0
which is not handled by the kernel.

My sinpi() needed similar fixes for the scaling.  I changed all of its
formatting away from fdlibm towards KNF:

XX #include <float.h>
XX 
XX #include "math.h"
XX #include "math_private.h"
XX 
XX static const double
XX pi_hi = 3.1415926218032837e+00,	/* 0x400921fb 0x50000000 */
XX pi_lo = 3.1786509547050787e-08;	/* 0x3e6110b4 0x611a5f14 */
XX 
XX static double
XX __kernel_cospi(double x)
XX {
XX 	double hi, lo;
XX 
XX 	hi = (float)x;
XX 	lo = x - hi;
XX 	lo = (pi_lo + pi_hi) * lo + pi_lo * hi;
XX 	hi = pi_hi * hi;
XX 	_2sumF(hi, lo);
XX 	return (__kernel_cos(hi, lo));
XX }
XX 
XX static double
XX __kernel_sinpi(double x)
XX {
XX 	double hi, lo;
XX 
XX 	hi = (float)x;
XX 	lo = x - hi;
XX 	lo = (pi_lo + pi_hi) * lo + pi_lo * hi;
XX 	hi = pi_hi * hi;
XX 	_2sumF(hi, lo);
XX 	return (__kernel_sin(hi, lo, 1));
XX }
XX 
XX static inline double
XX __kernel_mpi(double x)
XX {
XX 	double hi, lo;
XX 	int32_t hx, lx;
XX 
XX 	EXTRACT_WORDS(hx, lx, x);
XX 	hi = x;
XX 	SET_LOW_WORD(hi, 0);
XX 	hi *= 0x1p1000;
XX 	lo = 0x1p1000 * x  - hi;
XX 	return ((0x1p1000 * pi_lo * x + pi_hi * lo + pi_hi * hi) * 0x1p-1000);
XX }

This is cleaner, but not yet suitable for a kernel in a header file since
it does a pessimal second extraction (which should be only of the high word
again).  Since this is inline, perhaps the compile can optimize away the
second extraction.  It can do this easily only if the extractions are the
same.  If the compiler can't do this (perhaps because this isn't inlined),
then hx (and lx if available) should be passed to the kernel.

The above isn't a general multiplication kernel as originally intended,
since it has to scale up to handle small values, so doesn't work for
large values.  Actually, it would work up to 0x1p53, but be slower.

The amount of scaling up is tricky to determine.  0x1p1000 is about
half-way in the range of ~2048 exponents, to scale up denormals to
nearly 1 so that they obviously aren't denormals.

XX 
XX double
XX ysinpi(double x)
XX {
XX 	double s, y, z;
XX 	int32_t hx, ix;
XX 	int n;
XX 
XX 	GET_HIGH_WORD(hx, x);
XX 	ix = hx & 0x7fffffff;
XX 
XX 	if (ix < 0x3fd00000) {		/* |x| < 0.25 */
XX 		if (ix < 0x3de00000) {	/* |x| < 2**-33 */
XX 			if (x == 0)
XX 				return (x);
XX 			if ((int)x == 0) /* generate inexact if x != 0 */
XX 				return __kernel_mpi(x);
XX 		}
XX 		return (__kernel_sinpi(x));
XX 	}
XX 	if (ix >= 0x7ff00000)		/* Inf or NaN -> NaN */
XX 		return (x - x);
XX 	if (ix >= 0x43300000)		/* 2**52 <= |x| (integer) -> +-0 */
XX 		return (x < 0 ? -0.0 : 0);
XX 	if (ix >= 0x43200000) {		/* 2**51 <= |x| < 2**52 (int/2) */
XX 		GET_LOW_WORD(n, x);
XX 		n = (n & 3) * 2;
XX 		y = 0;
XX 	} else if (ix >= 0x43100000) {	/* 2**50 <= |x| < 2**51 (int/4) */
XX 		GET_LOW_WORD(n, x);
XX 		n &= 7;
XX 		y = 0;
XX 	} else {
XX 		y = fabs(x);
XX 		STRICT_ASSIGN(double, z, y + 0x1p50);
XX 		GET_LOW_WORD(n, z);	/* bits for rounded y (units 0.25) */
XX 		z -= 0x1p50;		/* y rounded to a multiple of 0.25 */
XX 		if (z > y) {
XX 			z -= 0.25;	/* adjust to round down */
XX 			n--;
XX 		}
XX 		n &= 7;			/* octant of y mod 2 */
XX 		y -= z;			/* y mod 0.25 */
XX 	}
XX 
XX 	s = (n >= 3 && n < 7) ? -1 : 1;
XX 	if (x < 0)
XX 		s = -s;
XX 	n &= 3;
XX 	if (y == 0 && n == 0)
XX 		return (x < 0 ? -0.0 : 0);
XX 	if (n == 0)
XX 		return (s * __kernel_sinpi(y));
XX 	if (n == 1)
XX 		return (s * __kernel_cospi(y - 0.25));
XX 	if (n == 2)
XX 		return (s * __kernel_cospi(y));
XX 	return (s * __kernel_sinpi(y - 0.25));
XX }

With better-designed kernels, there would be no kernels for cospi or sinpi,
and the code would look something like:

   		return (s * __k_cos(k_mpi(y)));

where:
- __k_cos() is essentially __kernel_cos() with the following fixes:
     - it must take double_t args to work optimally and simply with _2sumF()
     - these args should be packaged in a struct for convenience
     - rename it because its API changed.  Fix its verbose name.  I'd like to
       remove the leading underscores too, but inlining many copies of it
       would be too much.  It isn't even inlined now, and inlining it tends
       to be a pessimizaition.
- k_mpi() is the first part of __kernel_cospi() with the hi+lo decomposition
     packaged as a struct for convenience.  It takes a fairly smart compiler
     to not pessimize such structs (gcc-3.3.3 pessimizes them, but gcc-4.2.1
     mostly handles them optimally, and clang always handles them optimally).
     k_mpi() should always be inline so it doesn't need underscores.

The kernels can't quite be type-generic without complications.  The
truncation part depends on the precision.

The fix for _2sumF() is needed mainly because the current kernels don't
support float_t or double_t (except the internal one in logl(), and
my uncommitted ones for exp() and expf()).  Requiring use types without
extra precision for _2sumF() was intentional to promote use of double_t,
etc., but I forgot that the old kernels didn't support it.

Bruce

From owner-freebsd-numerics@freebsd.org  Sat May 20 14:58:15 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id BBA47D767F2
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 20 May 2017 14:58:15 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id A304A1562
 for <freebsd-numerics@freebsd.org>; Sat, 20 May 2017 14:58:15 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4KEwE43031478
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sat, 20 May 2017 07:58:14 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4KEwDfW031477;
 Sat, 20 May 2017 07:58:13 -0700 (PDT) (envelope-from sgk)
Date: Sat, 20 May 2017 07:58:13 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170520145813.GA31395@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
 <20170518014226.GA63819@troutmask.apl.washington.edu>
 <20170518154820.A8280@besplex.bde.org>
 <20170518175434.GA74453@troutmask.apl.washington.edu>
 <20170519074512.M884@besplex.bde.org>
 <20170518234722.GB77471@troutmask.apl.washington.edu>
 <20170519153326.A20099@besplex.bde.org>
 <20170520105748.X1017@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170520105748.X1017@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 May 2017 14:58:15 -0000

On Sat, May 20, 2017 at 10:58:03AM +1000, Bruce Evans wrote:
> On Fri, 19 May 2017, Bruce Evans wrote:
> 
> XX -	__w = (a) + (b);	\
> XX +	STRICT_ASSIGN(__typeof(a), __w, (a) + (b)); \
> XX  	(b) = ((a) - __w) + (b); \
> XX  	(a) = __w;		\
> 
> Other fixes: my tests only passed because they were on i386 with extra
> precision for doubles.  On amd64 and on i386 with the default precision
> for doubles, the problem with the hi+lo decomposition for denormals
> showed up as errors of about 1.5 ulps for sinpif(), tanpif(), sinpi()
> and tanpi().  Previously-discussed fixes for this worked.  I rewrote
> them from scratch (not very cleanly -- they should be in a header file,
> as should be the kernels.  1 header file can keep kernels and functions
> to multiply by pi for all precisions):

After you mentioned denormals in another email, I tested at sinpif/cospif.
I don't remember the magnitude of the ULP observed, but it was much larger
thanm I expected.  I added scaling similar to what you have done but used
a different constant. 

> XX diff -u2 s_sinpi.c~ s_sinpi.c
> XX --- s_sinpi.c~	Fri May 19 09:05:16 2017
> XX +++ s_sinpi.c	Fri May 19 19:52:07 2017
> XX @@ -77,6 +77,5 @@
> XX  sinpi(double x)
> XX  {
> XX -	volatile static const double vzero = 0;
> XX -	double ax, s;
> XX +	double ax, hi, lo, s;

I also introduced hi and lo as you have done.  If you initially
missed my overloading use of ax and x, I assumed others would
as well.

> XX  	uint32_t hx, ix, j0, lx;
> XX 
> XX @@ -87,12 +86,15 @@
> XX  	if (ix < 0x3ff00000) {			/* |x| < 1 */
> XX  		if (ix < 0x3fd00000) {		/* |x| < 0.25 */
> XX -			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
> XX +			if (ix < 0x3de00000) {	/* |x| < 2**-33 */
> XX  				if (x == 0)
> XX  					return (x);
> XX -				INSERT_WORDS(ax, hx, 0);
> XX -				x -= ax;
> XX -				s = pilo * x + pihi * x + pilo * ax
> XX -				    + pihi * ax;
> XX -				return (s);
> XX +				if ((int)x == 1) /* gen. inexact if x != 0 */
> XX +					return (x);	/* NOTREACHED */

Is this simply to guarantee that inexact is raised?  I had
assumed that inexact would be raised by one or more if the
* and + operation in s.

> Multiplication by pi is remarkably complicated.  I noticed that you
> already avoided the trap of using the usual method of truncating x
> by converting to float -- this fails for denormals and other small
> values since floats don't have enough range.  Converting long doubles
> fails similarly.

I avoided the trap in some cases, but use cast elsewhere.
I'll need to check if those are ok.

> So 2 different methods for multiplying by pi are needed:
> - a slower general method that works for small values (use bit-fiddling
>      to truncate), and scale up
> - the usual method of casting to float for medium-sized values
> - mercifully, a third method that works on large values is not needed.
> These should be in a header file and not duplicated.  But I used the
> quick fix of duplicating the above for tanpi().
> 
> XX diff -u2 s_sinpif.c~ s_sinpif.c
> XX --- s_sinpif.c~	Fri May 19 09:05:16 2017
> XX +++ s_sinpif.c	Fri May 19 19:49:33 2017
> XX @@ -46,5 +46,4 @@
> XX  sinpif(float x)
> XX  {
> XX -	volatile static const float vzero = 0;
> XX  	float ax, s;
> XX  	uint32_t hx, ix, j0;
> XX @@ -56,14 +55,7 @@
> XX  	if (ix < 0x3f800000) {			/* |x| < 1 */
> XX  		if (ix < 0x3e800000) {		/* |x| < 0.25 */
> XX -	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
> XX -				if (x == 0)
> XX -					return (x);
> XX -				SET_FLOAT_WORD(ax, hx & 0xffff0000);
> XX -				x -= ax;
> XX -				s = pilo * x + pihi * x + pilo * ax
> XX -				    + pihi * ax;
> XX -				return (s);
> XX -			}
> XX -
> XX +	 		if (ix < 0x34800000)	/* |x| < 2**-22 */
> XX +				if ((int)x == 0) /* gen. inexact if x != 0 */
> XX +					return (M_PI * x);
> XX  			s = __kernel_sinpif(ax);
> XX  			return ((hx & 0x80000000) ? -s : s);
> 
> This is simpler because it can just use double precision.

It defeats the use of the same algorithm in all precisions.
In particular, it is easier to develop an algorithm in float,
test it, and then generalize to other precisions.

> Also fix missing
> setting of inexact and 1 style bug.  This can use essentially the same code
> as sinf().  It doesn't need a special case for x == 0 since multiplication
> if +-0 by M_PI preserves the sign.  I didn't fix the style bug and pessimal
> sign handling for non-small x.  This should be the same as for sinf() too.
> The kernel preserves oddness in the default rounding mode except for x == 0
> which is not handled by the kernel.
> 
> XX static inline double
> XX __kernel_mpi(double x)
> XX {
> XX 	double hi, lo;
> XX 	int32_t hx, lx;
> XX 
> XX 	EXTRACT_WORDS(hx, lx, x);
> XX 	hi = x;
> XX 	SET_LOW_WORD(hi, 0);
> XX 	hi *= 0x1p1000;
> XX 	lo = 0x1p1000 * x  - hi;
> XX 	return ((0x1p1000 * pi_lo * x + pi_hi * lo + pi_hi * hi) * 0x1p-1000);
> XX }
> 
> This is cleaner, but not yet suitable for a kernel in a header file since
> it does a pessimal second extraction (which should be only of the high word
> again).

Why the 2nd extraction?  Neither hx nor lx are used.  Also,
int32_t should be uint32_t (or to be in line with math_private.h
perhaps u_int32_t).

> Since this is inline, perhaps the compile can optimize away the
> second extraction.  It can do this easily only if the extractions are the
> same.  If the compiler can't do this (perhaps because this isn't inlined),
> then hx (and lx if available) should be passed to the kernel.

Why?  Neither is used in the kernel.

> The above isn't a general multiplication kernel as originally intended,
> since it has to scale up to handle small values, so doesn't work for
> large values.  Actually, it would work up to 0x1p53, but be slower.
> 
> The amount of scaling up is tricky to determine.  0x1p1000 is about
> half-way in the range of ~2048 exponents, to scale up denormals to
> nearly 1 so that they obviously aren't denormals.
> 
> XX 
> XX double
> XX ysinpi(double x)

I'll need to spend part of today studying this function.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Sat May 20 23:52:43 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8B847D76664
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 20 May 2017 23:52:43 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 392671F8
 for <freebsd-numerics@freebsd.org>; Sat, 20 May 2017 23:52:42 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 84C92104B9A1;
 Sun, 21 May 2017 09:52:40 +1000 (AEST)
Date: Sun, 21 May 2017 09:52:39 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170520145813.GA31395@troutmask.apl.washington.edu>
Message-ID: <20170521082808.T1142@besplex.bde.org>
References: <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
 <20170518014226.GA63819@troutmask.apl.washington.edu>
 <20170518154820.A8280@besplex.bde.org>
 <20170518175434.GA74453@troutmask.apl.washington.edu>
 <20170519074512.M884@besplex.bde.org>
 <20170518234722.GB77471@troutmask.apl.washington.edu>
 <20170519153326.A20099@besplex.bde.org>
 <20170520105748.X1017@besplex.bde.org>
 <20170520145813.GA31395@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=k3lLdbPPBgMaOkpglPkA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 May 2017 23:52:43 -0000

On Sat, 20 May 2017, Steve Kargl wrote:

> On Sat, May 20, 2017 at 10:58:03AM +1000, Bruce Evans wrote:
>> On Fri, 19 May 2017, Bruce Evans wrote:
>
>> XX  	uint32_t hx, ix, j0, lx;
>> XX
>> XX @@ -87,12 +86,15 @@
>> XX  	if (ix < 0x3ff00000) {			/* |x| < 1 */
>> XX  		if (ix < 0x3fd00000) {		/* |x| < 0.25 */
>> XX -			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
>> XX +			if (ix < 0x3de00000) {	/* |x| < 2**-33 */
>> XX  				if (x == 0)
>> XX  					return (x);
>> XX -				INSERT_WORDS(ax, hx, 0);
>> XX -				x -= ax;
>> XX -				s = pilo * x + pihi * x + pilo * ax
>> XX -				    + pihi * ax;
>> XX -				return (s);
>> XX +				if ((int)x == 1) /* gen. inexact if x != 0 */
>> XX +					return (x);	/* NOTREACHED */
>
> Is this simply to guarantee that inexact is raised?  I had
> assumed that inexact would be raised by one or more if the
> * and + operation in s.

I was confused, but accidentally correct.  Let x be a power of 2.  Then
all the multiplications are exact, but the additions are usually inexact.
However, the additions may be exact if there is extra precision, and other
cases are not clear either.

Simpler example: suppose we approximate cos(x) by 1 - x*x/2.  Then the
result is obviously exact for must powers of 2.  It is unclear if poly
coeffs have enough low bits nonzero to give inexact for _all_ args in
general.

Relevant cases here:

(1) (M_PI * x), where x is float and a small power of 2, is exact in
double precision, but inexact when rounded to float precision.  The C
standard requires destruction of the extra precision on return, since
the return type has lower rank than the nominal expression type.  (I
use a hack to prevent this.)  It is lots of low bits 1 in M_PI that
give the extra precision and the destruction of them is what sets
inexact.

(2) Expressions like:
 				s = pilo * x + pihi * x + pilo * ax
 				    + pihi * ax;
C allows the rvalue to be evaluated in extra precision.  This may be
exact, depending on how much extra precision there is.

(2a) C requires discarding the extra precision on assignment.  (This is
now discarding and not destruction, since there must be ways to remove
the extra precision intentionally and assignment is the most natural
way -- the user should have used float_t to avoid loss.)

(2b) However, (2a) is too slow when there actually is extra precision,
so x86 non-C compilers don't do it.  gcc starting with about gcc-4.8
attempts to be a C99 compiler with certain CFLAGS.  clang doesn't support
these flags.

So s may be exact too.

(2c) C11 requires destroying the extra precision on return.  (This is now
perverse destruction and incompatible with C99.  It defeats FLT_EVAL_METHOD
giving extra precision for parts of expressions using functions, at least
if the functions are written in C.

So if s is exact, C11 requires it to become inexact on return.

(2d) Howver, (2c) is too slow and broken, so x86 non-C compilers don't
do it.  gcc starting with about gcc-4.8 attempts to be a C11 compiler
with with just -std=c11 in CFLAGS and the user must use the reverse
of the more nonstandard CFLAGS to get C99 conformats to avoid this bug.
clang doesn't attempt to support this bug.

Callers can see the extra precision even if they are compiled with a C
compiler:

(3a)
 	float f = sinpif(x);
C compilers remove any extra precision and set inxact if there was any.

(3b)
 	float_t f = sinpif(x);
C compilers keep any extra precision, and code should usually be written
like this to keep it intentionally.  But if sinpif() was compiled by a
C11 compiler, it is not allowed to return extra precision.  It is unclear
if callers compiled with a C11 compiler can depend on sinpif() being
compiled with a C11 compiler so that it can't have extra precision.

(3c)
 	float_t f or float f;
 	STRICT_ASSIGN(float, f, sinpif(x));

My code to not see any extra precision when testing sinpif() does this.

>> XX diff -u2 s_sinpif.c~ s_sinpif.c
>> ...
>> XX +	 		if (ix < 0x34800000)	/* |x| < 2**-22 */
>> XX +				if ((int)x == 0) /* gen. inexact if x != 0 */
>> XX +					return (M_PI * x);
>> XX  			s = __kernel_sinpif(ax);
>> XX  			return ((hx & 0x80000000) ? -s : s);
>>
>> This is simpler because it can just use double precision.
>
> It defeats the use of the same algorithm in all precisions.
> In particular, it is easier to develop an algorithm in float,
> test it, and then generalize to other precisions.

The algorithm is already very different for the usual case since it has to
map to the kernels which use double precision.  It would be a bit much to
restore all the unoptimized kernels that use float precision just to
develop the sinpif() family.  But I have considered going back to pure
float precision to get further optimizations.  Switching precisions costs
about 2-4 cycles per switch on x86, which is a lot when some of the functions
take less than 20 cycles.  On i386, the switch should be free, but
destruction of extra precision on return costs more than 4 cycles.  On
and64, there is a switch to promote to double precision on entry and
another to demote to float precision on return.

>> XX static inline double
>> XX __kernel_mpi(double x)
>> XX {
>> XX 	double hi, lo;
>> XX 	int32_t hx, lx;
>> XX
>> XX 	EXTRACT_WORDS(hx, lx, x);
>> XX 	hi = x;
>> XX 	SET_LOW_WORD(hi, 0);
>> XX 	hi *= 0x1p1000;
>> XX 	lo = 0x1p1000 * x  - hi;
>> XX 	return ((0x1p1000 * pi_lo * x + pi_hi * lo + pi_hi * hi) * 0x1p-1000);
>> XX }
>>
>> This is cleaner, but not yet suitable for a kernel in a header file since
>> it does a pessimal second extraction (which should be only of the high word
>> again).
>
> Why the 2nd extraction?  Neither hx nor lx are used.  Also,

It is a bug.  Larger than I remembered, but the compiler will optimize
away the extra extraction.  If there is an extraction, then it should
match the one in the caller so that it can be optimized away.

> int32_t should be uint32_t (or to be in line with math_private.h
> perhaps u_int32_t).

u[_]int32_t is technically correct for raw bits, but int32_t works too
since it is pure 2's complement.  fdlibm uses int32_t to do signed
comparisons on the result or when it is just sloppy, and I must have
copied one of the sloppy places.

>> Since this is inline, perhaps the compile can optimize away the
>> second extraction.  It can do this easily only if the extractions are the
>> same.  If the compiler can't do this (perhaps because this isn't inlined),
>> then hx (and lx if available) should be passed to the kernel.
>
> Why?  Neither is used in the kernel.

They need to be the same to avoid repeated insertions or extractions.
Since my s_sinpi() only GETs the high word and __kernel_mpi() only does
a null extraction followed by a SET of the low word, the operations appear
to be independent.  However, the only efficient way to do them is 64 bits
at a time for each, with operations merged.  This would be more needed to
clear 53/2 instead of 32 low bits in 'hi'.  Then lx would be needed
explicitly.  But efficient code has to extract and insert 64 bits at a
time anyway (but on x86, keep it in unnamed SSE registers unless the
bits are used explicitly).

Bruce