Date: Thu, 28 Feb 2019 07:15:14 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Steve Kargl <sgk@troutmask.apl.washington.edu> Cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro Message-ID: <20190228060920.R4413@besplex.bde.org> In-Reply-To: <20190227161906.GA77785@troutmask.apl.washington.edu> References: <20190226191825.GA68479@troutmask.apl.washington.edu> <20190227145002.P907@besplex.bde.org> <20190227074811.GA75972@troutmask.apl.washington.edu> <20190227201214.V1823@besplex.bde.org> <20190227161906.GA77785@troutmask.apl.washington.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 27 Feb 2019, Steve Kargl wrote: > On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote: >> >> ENTERI() hard-codes the long double for simplicity. Remember, it is only >> needed for long double precision on i386. But I forgot about long double >> complex types, and didn't dream about indirect long double types in sincosl(). > > That simplicity does not work for long double complex. We will > > need either ENTERIC as in > > #define ENTERIC() ENTERIT(long double complex) > > or a direct use of ENTERIT as you have done s_clogl.c I wrote ENTERIT() to work around this problem. >>> I'm fine with making ENTERI() only toggle precision, and adding >>> a LEAVEI() to reset precision. RETURNI(r) would then be >>> >>> #define RETURNI(r) \ >>> do { \ >>> LEAVEI(); \ >>> return (r); \ >>> } while (0) >> >> No, may be an expression, so it must be evaluated before LEAVEI(). This >> is the reason for existence of the variable to hold the result. > > So, we'll need RETURNI for long double and one for long double complex. > Or, we give RETURNI a second parameter, which is the input parameter of > the function I said to use your method of __typeof(). I tested this: XX --- /tmp/math_private.h Sun Nov 27 17:58:57 2005 XX +++ ./math_private.h Thu Feb 28 06:17:26 2019 XX @@ -474,21 +474,22 @@ XX /* Support switching the mode to FP_PE if necessary. */ XX #if defined(__i386__) && !defined(NO_FPSETPREC) XX -#define ENTERI() ENTERIT(long double) XX -#define ENTERIT(returntype) \ XX - returntype __retval; \ XX +#define ENTERI() \ XX fp_prec_t __oprec; \ XX \ XX if ((__oprec = fpgetprec()) != FP_PE) \ XX fpsetprec(FP_PE) XX -#define RETURNI(x) do { \ XX - __retval = (x); \ XX - if (__oprec != FP_PE) \ XX - fpsetprec(__oprec); \ XX +#define LEAVEI() \ XX + if ((__oprec = fpgetprec()) != FP_PE) \ XX + fpsetprec(FP_PE) XX +#define RETURNI(expr) do { \ XX + __typeof(expr) __retval = (expr); \ XX + \ XX + LEAVEI(); \ XX RETURNF(__retval); \ XX } while (0) XX #else XX #define ENTERI() XX -#define ENTERIT(x) XX -#define RETURNI(x) RETURNF(x) XX +#define LEAVEI() XX +#define RETURNI(expr) RETURNF(expr) XX #endif XX This compiles, but has minor problems. Note that the apparent style bug of initializing __retval in its declaration is needed in cases where __typeof() gives a const type. This happens in my code that uses RETURNI(1 + tiny) to set inexact. I think it would also happen for RETURNI(1). The type is then int instead of floating point, and I need to check that this is harmless. clogl() is the only user of ENTERIT(). Its size expands from 2302 bytes text to 2399 when compiled by gcc-3.3.3. I hope that this is just gcc not doing a very good job optimizing the returns (there are many RETURNI()s fpr clogl()). Repeating the return code instead of jumping to it might even be optimal. > #define RETURNI(x, r) \ > do { \ > x = (r) \ > LEAVEI(); \ > return (r); \ > } while (0) > > This will cause a lot of churn. Indeed. My version causes 1 line of churn: XX --- /tmp/s_clogl.c Fri Jul 20 16:00:11 2018 XX +++ ./s_clogl.c Thu Feb 28 05:58:05 2019 XX @@ -66,5 +66,5 @@ XX int kx, ky; XX XX - ENTERIT(long double complex); XX + ENTERI(); XX XX x = creall(z); >> Combined sin and cos probably does work better outside of benchmarks for >> sin and cos alone, since it does less work so leaves more resources for >> the, more useful things. > > Exactly! I have a significant amount of Fortran code that does > > z = cmplx(cos(x), sin(x)) > > in modern C this is 'z = CMPLX(cos(x), sin(x))'. GCC with optimization > enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp > to optimize this to sincos(). This is an pessimization unless everything is inlined. An optimization would convert cexp(cmplx(0,x)) to sin(x) and cos(x) or sincos(x). > GCC on FreeBSD will not do this optimization > because FreeBSD's libm is not C99 compliant. It is more conformant than most for cexp(). I think old gcc just doesn't attempt such optimizations. > When I worked on sincos() I tried a few variations. This included > the simpliest implementation: > > void > sincos(double x, double *s, double *c) > { > *c = cos(x); > *s= sin(x); > } > > I tried argument reduction with kernels. > > void > sincos(double x, double *s, double *c) > { > a = inline argument reduction done to set a. > *c = k_cos(x); > *s= k_sin(x); > } You mean *c = s_cos(x), etc. That was good enough. > And finally the version that was committed where k_cos and k_sin > were manually inlined and re-arranged to reduce redundant computations. That has excessive manual inlining. It should have only inlined s_cos() and s_sin(), and changed k_cos() and k_sin() from extern to static inline. Someday the data for these inline functions should be deduplicated, but the data is small compared with that for the expl kernel. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190228060920.R4413>