FreeBSD Mail Archives

Date:      Thu, 28 Feb 2019 07:15:14 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        freebsd-numerics@freebsd.org
Subject:   Re: Update ENTERI() macro
Message-ID:  <20190228060920.R4413@besplex.bde.org>
In-Reply-To: <20190227161906.GA77785@troutmask.apl.washington.edu>
References:  <20190226191825.GA68479@troutmask.apl.washington.edu> <20190227145002.P907@besplex.bde.org> <20190227074811.GA75972@troutmask.apl.washington.edu> <20190227201214.V1823@besplex.bde.org> <20190227161906.GA77785@troutmask.apl.washington.edu>

On Wed, 27 Feb 2019, Steve Kargl wrote:

> On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote:
>>
>> ENTERI() hard-codes the long double for simplicity.  Remember, it is only
>> needed for long double precision on i386.  But I forgot about long double
>> complex types, and didn't dream about indirect long double types in sincosl().
>
> That simplicity does not work for long double complex.  We will
> 
> need either ENTERIC as in
>
> #define ENTERIC() ENTERIT(long double complex)
>
> or a direct use of ENTERIT as you have done s_clogl.c

I wrote ENTERIT() to work around this problem.

>>> I'm fine with making ENTERI() only toggle precision, and adding
>>> a LEAVEI() to reset precision.  RETURNI(r) would then be
>>>
>>> #define RETURNI(r)	\
>>> do {		\
>>>   LEAVEI();		\
>>>   return (r);	\
>>> } while (0)
>>
>> No, may be an expression, so it must be evaluated before LEAVEI().  This
>> is the reason for existence of the variable to hold the result.
>
> So, we'll need RETURNI for long double and one for long double complex.
> Or, we give RETURNI a second parameter, which is the input parameter of
> the function

I said to use your method of __typeof().  I tested this:

XX --- /tmp/math_private.h	Sun Nov 27 17:58:57 2005
XX +++ ./math_private.h	Thu Feb 28 06:17:26 2019
XX @@ -474,21 +474,22 @@
XX  /* Support switching the mode to FP_PE if necessary. */
XX  #if defined(__i386__) && !defined(NO_FPSETPREC)
XX -#define	ENTERI() ENTERIT(long double)
XX -#define	ENTERIT(returntype)			\
XX -	returntype __retval;			\
XX +#define	ENTERI()				\
XX  	fp_prec_t __oprec;			\
XX  						\
XX  	if ((__oprec = fpgetprec()) != FP_PE)	\
XX  		fpsetprec(FP_PE)
XX -#define	RETURNI(x) do {				\
XX -	__retval = (x);				\
XX -	if (__oprec != FP_PE)			\
XX -		fpsetprec(__oprec);		\
XX +#define	LEAVEI()				\
XX +	if ((__oprec = fpgetprec()) != FP_PE)	\
XX +		fpsetprec(FP_PE)
XX +#define	RETURNI(expr) do {			\
XX +	__typeof(expr) __retval = (expr);	\
XX +						\
XX +	LEAVEI();				\
XX  	RETURNF(__retval);			\
XX  } while (0)
XX  #else
XX  #define	ENTERI()
XX -#define	ENTERIT(x)
XX -#define	RETURNI(x)	RETURNF(x)
XX +#define	LEAVEI()
XX +#define	RETURNI(expr)	RETURNF(expr)
XX  #endif
XX

This compiles, but has minor problems.  Note that the apparent style
bug of initializing __retval in its declaration is needed in cases
where __typeof() gives a const type.  This happens in my code that
uses RETURNI(1 + tiny) to set inexact.  I think it would also happen
for RETURNI(1).  The type is then int instead of floating point, and
I need to check that this is harmless.

clogl() is the only user of ENTERIT().  Its size expands from 2302
bytes text to 2399 when compiled by gcc-3.3.3.  I hope that this is
just gcc not doing a very good job optimizing the returns (there are
many RETURNI()s fpr clogl()).  Repeating the return code instead of
jumping to it might even be optimal.

> #define RETURNI(x, r)	\
> do {			\
>   x = (r)		\
>   LEAVEI();		\
>   return (r);		\
> } while (0)
>
> This will cause a lot of churn.

Indeed.

My version causes 1 line of churn:

XX --- /tmp/s_clogl.c	Fri Jul 20 16:00:11 2018
XX +++ ./s_clogl.c	Thu Feb 28 05:58:05 2019
XX @@ -66,5 +66,5 @@
XX  	int kx, ky;
XX 
XX -	ENTERIT(long double complex);
XX +	ENTERI();
XX 
XX  	x = creall(z);

>> Combined sin and cos probably does work better outside of benchmarks for
>> sin and cos alone, since it does less work so leaves more resources for
>> the, more useful things.
>
> Exactly!  I have a significant amount of Fortran code that does
>
>   z = cmplx(cos(x), sin(x))
>
> in modern C this is 'z = CMPLX(cos(x), sin(x))'.  GCC with optimization
> enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp
> to optimize this to sincos().

This is an pessimization unless everything is inlined.  An optimization
would convert cexp(cmplx(0,x)) to sin(x) and cos(x) or sincos(x).

> GCC on FreeBSD will not do this optimization
> because FreeBSD's libm is not C99 compliant.

It is more conformant than most for cexp().  I think old gcc just doesn't
attempt such optimizations.

> When I worked on sincos() I tried a few variations.  This included
> the simpliest implementation:
>
> void
> sincos(double x, double *s, double *c)
> {
>  *c = cos(x);
>  *s=  sin(x);
> }
>
> I tried argument reduction with kernels.
>
> void
> sincos(double x, double *s, double *c)
> {
>  a = inline argument reduction done to set a.
>  *c = k_cos(x);
>  *s=  k_sin(x);
> }

You mean *c = s_cos(x), etc.  That was good enough.

> And finally the version that was committed where k_cos and k_sin
> were manually inlined and re-arranged to reduce redundant computations.

That has excessive manual inlining.  It should have only inlined s_cos()
and s_sin(), and changed k_cos() and k_sin() from extern to static inline.
Someday the data for these inline functions should be deduplicated, but
the data is small compared with that for the expl kernel.

Bruce

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190228060920.R4413>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation