From owner-freebsd-numerics@freebsd.org  Wed Feb 27 20:15:27 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 45F3A1503C6C
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 20:15:27 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249])
 by mx1.freebsd.org (Postfix) with ESMTP id D30AA8DAB2
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 20:15:24 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7426B105FEF3;
 Thu, 28 Feb 2019 07:15:14 +1100 (AEDT)
Date: Thu, 28 Feb 2019 07:15:14 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro
In-Reply-To: <20190227161906.GA77785@troutmask.apl.washington.edu>
Message-ID: <20190228060920.R4413@besplex.bde.org>
References: <20190226191825.GA68479@troutmask.apl.washington.edu>
 <20190227145002.P907@besplex.bde.org>
 <20190227074811.GA75972@troutmask.apl.washington.edu>
 <20190227201214.V1823@besplex.bde.org>
 <20190227161906.GA77785@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 20:15:27 -0000

On Wed, 27 Feb 2019, Steve Kargl wrote:

> On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote:
>>
>> ENTERI() hard-codes the long double for simplicity.  Remember, it is only
>> needed for long double precision on i386.  But I forgot about long double
>> complex types, and didn't dream about indirect long double types in sincosl().
>
> That simplicity does not work for long double complex.  We will
> need either ENTERIC as in
>
> #define ENTERIC() ENTERIT(long double complex)
>
> or a direct use of ENTERIT as you have done in s_clogl.c

I wrote ENTERIT() to work around this problem.

>>> I'm fine with making ENTERI() only toggle precision, and adding
>>> a LEAVEI() to reset precision.  RETURNI(r) would then be
>>>
>>> #define RETURNI(r)	\
>>> do {		\
>>>   LEAVEI();		\
>>>   return (r);	\
>>> } while (0)
>>
>> No, r may be an expression, so it must be evaluated before LEAVEI().  This
>> is the reason for existence of the variable to hold the result.
>
> So, we'll need RETURNI for long double and one for long double complex.
> Or, we give RETURNI a second parameter, which is the input parameter of
> the function

I said to use your method of __typeof().  I tested this:

XX --- /tmp/math_private.h	Sun Nov 27 17:58:57 2005
XX +++ ./math_private.h	Thu Feb 28 06:17:26 2019
XX @@ -474,21 +474,22 @@
XX  /* Support switching the mode to FP_PE if necessary. */
XX  #if defined(__i386__) && !defined(NO_FPSETPREC)
XX -#define	ENTERI() ENTERIT(long double)
XX -#define	ENTERIT(returntype)			\
XX -	returntype __retval;			\
XX +#define	ENTERI()				\
XX  	fp_prec_t __oprec;			\
XX  						\
XX  	if ((__oprec = fpgetprec()) != FP_PE)	\
XX  		fpsetprec(FP_PE)
XX -#define	RETURNI(x) do {				\
XX -	__retval = (x);				\
XX -	if (__oprec != FP_PE)			\
XX -		fpsetprec(__oprec);		\
XX +#define	LEAVEI()				\
XX +	if (__oprec != FP_PE)			\
XX +		fpsetprec(__oprec)
XX +#define	RETURNI(expr) do {			\
XX +	__typeof(expr) __retval = (expr);	\
XX +						\
XX +	LEAVEI();				\
XX  	RETURNF(__retval);			\
XX  } while (0)
XX  #else
XX  #define	ENTERI()
XX -#define	ENTERIT(x)
XX -#define	RETURNI(x)	RETURNF(x)
XX +#define	LEAVEI()
XX +#define	RETURNI(expr)	RETURNF(expr)
XX  #endif
XX

This compiles, but has minor problems.  Note that the apparent style
bug of initializing __retval in its declaration is needed in cases
where __typeof() gives a const type.  This happens in my code that
uses RETURNI(1 + tiny) to set inexact.  I think it would also happen
for RETURNI(1).  The type is then int instead of floating point, and
I need to check that this is harmless.
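
The save/evaluate/restore ordering can be sketched portably as below.
This is only an illustration, not the math_private.h code: the sim_*
names and SIM_* macros are hypothetical stand-ins for fpgetprec(),
fpsetprec() and the real macros, and it assumes LEAVEI() is meant to
restore the precision saved by ENTERI().  It also shows why the
temporary is initialized in its declaration: __typeof__ of a const
object is const, so a later assignment would not compile.

```c
#include <assert.h>

/* Hypothetical stand-in for the x87 precision-control state. */
enum sim_prec { SIM_PD, SIM_PE };
static enum sim_prec sim_cur = SIM_PD;

static enum sim_prec sim_getprec(void) { return (sim_cur); }
static void
sim_setprec(enum sim_prec p)
{
	assert(p == SIM_PD || p == SIM_PE);
	sim_cur = p;
}

/* Save the old mode and switch to extended precision if necessary. */
#define	SIM_ENTERI()					\
	enum sim_prec sim_oprec;			\
							\
	if ((sim_oprec = sim_getprec()) != SIM_PE)	\
		sim_setprec(SIM_PE)
/* Restore the mode saved by SIM_ENTERI(). */
#define	SIM_LEAVEI()					\
	if (sim_oprec != SIM_PE)			\
		sim_setprec(sim_oprec)
/* Evaluate expr first, then restore the mode, then return. */
#define	SIM_RETURNI(expr) do {				\
	__typeof__(expr) sim_retval = (expr);		\
							\
	SIM_LEAVEI();					\
	return (sim_retval);				\
} while (0)

static const long double tiny = 0x1p-1000L;
static enum sim_prec seen_during_eval;

/* Records which mode was active while the expression was evaluated. */
static long double
record_and_add(long double a, long double b)
{
	seen_during_eval = sim_getprec();
	return (a + b);
}

static long double
sketch_addl(long double a, long double b)
{
	SIM_ENTERI();
	SIM_RETURNI(record_and_add(a, b));
}

/* __typeof__(tiny) is const long double; initialization still works. */
static long double
sketch_tiny(void)
{
	SIM_ENTERI();
	SIM_RETURNI(tiny);
}

/* The temporary has type int here; it is converted on return. */
static long double
sketch_one(void)
{
	SIM_ENTERI();
	SIM_RETURNI(1);
}
```

Calling sketch_addl(1.0L, 2.0L) leaves seen_during_eval at SIM_PE and
the simulated mode back at SIM_PD, which is the ordering that matters.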

clogl() is the only user of ENTERIT().  Its size expands from 2302
bytes text to 2399 when compiled by gcc-3.3.3.  I hope that this is
just gcc not doing a very good job optimizing the returns (there are
many RETURNI()s for clogl()).  Repeating the return code instead of
jumping to it might even be optimal.

> #define RETURNI(x, r)	\
> do {			\
>   x = (r);		\
>   LEAVEI();		\
>   return (x);		\
> } while (0)
>
> This will cause a lot of churn.

Indeed.

My version causes 1 line of churn:

XX --- /tmp/s_clogl.c	Fri Jul 20 16:00:11 2018
XX +++ ./s_clogl.c	Thu Feb 28 05:58:05 2019
XX @@ -66,5 +66,5 @@
XX  	int kx, ky;
XX 
XX -	ENTERIT(long double complex);
XX +	ENTERI();
XX 
XX  	x = creall(z);

>> Combined sin and cos probably does work better outside of benchmarks for
>> sin and cos alone, since it does less work so leaves more resources for
>> other, more useful things.
>
> Exactly!  I have a significant amount of Fortran code that does
>
>   z = cmplx(cos(x), sin(x))
>
> in modern C this is 'z = CMPLX(cos(x), sin(x))'.  GCC with optimization
> enabled will convert this to z = cexp(cmplx(0,x)) where it expects cexp
> to optimize this to sincos().

This is a pessimization unless everything is inlined.  An optimization
would convert cexp(cmplx(0,x)) to sin(x) and cos(x) or sincos(x).

> GCC on FreeBSD will not do this optimization
> because FreeBSD's libm is not C99 compliant.

It is more conformant than most for cexp().  I think old gcc just doesn't
attempt such optimizations.

> When I worked on sincos() I tried a few variations.  This included
> the simplest implementation:
>
> void
> sincos(double x, double *s, double *c)
> {
>  *c = cos(x);
>  *s = sin(x);
> }
>
> I tried argument reduction with kernels.
>
> void
> sincos(double x, double *s, double *c)
> {
>  a = inline argument reduction done to set a.
>  *c = k_cos(x);
>  *s = k_sin(x);
> }

You mean *c = s_cos(x), etc.  That was good enough.

> And finally the version that was committed where k_cos and k_sin
> were manually inlined and re-arranged to reduce redundant computations.

That has excessive manual inlining.  It should have only inlined s_cos()
and s_sin(), and changed k_cos() and k_sin() from extern to static inline.
Someday the data for these inline functions should be deduplicated, but
the data is small compared with that for the expl kernel.

Bruce