Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 27 Feb 2019 17:05:15 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        freebsd-numerics@freebsd.org
Subject:   Re: Update ENTERI() macro
Message-ID:  <20190227145002.P907@besplex.bde.org>
In-Reply-To: <20190226191825.GA68479@troutmask.apl.washington.edu>
References:  <20190226191825.GA68479@troutmask.apl.washington.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 26 Feb 2019, Steve Kargl wrote:

> I have submitted
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236063
>
> Update the ENTERI() macro in math_private.h to take a parameter.
> This parameter is used by __typeof() to set the type for an
> intermediate variable that holds the return value of a function
> on i386 FreeBSD.  The patch updates the consumers of ENTERI().

I don't like this.  It churns and complicates all the simple cases
that only need ENTERI().  It bogotifies the existence of ENTERIT(),
whose reason for existence is to avoid this.  (Actually, it removes
ENTERIT().)  __typeof() is unnecessarily unportable.  The implementation
uses literal typesin thousands of places so gains little or less than
nothing by not spelling types literally for ENTERIT().

> ...
> --- /data/kargl/freebsd/src/lib/msun/src/math_private.h	2018-10-31 16:15:21.809877000 -0700
> +++ ./src/math_private.h	2019-02-25 16:54:26.897247000 -0800
> @@ -335,15 +348,14 @@
>
> /* Support switching the mode to FP_PE if necessary. */
> #if defined(__i386__) && !defined(NO_FPSETPREC)
> -#define	ENTERI() ENTERIT(long double)
> -#define	ENTERIT(returntype)			\
> -	returntype __retval;			\
> +#define	ENTERI(a)				\
> +	__typeof(a) __retval;			\
> 	fp_prec_t __oprec;			\
> 						\
> 	if ((__oprec = fpgetprec()) != FP_PE)	\
> 		fpsetprec(FP_PE)
> -#define	RETURNI(x) do {				\
> -	__retval = (x);				\
> +#define	RETURNI(a) do {				\
> +	__retval = (a);				\
> 	if (__oprec != FP_PE)			\
> 		fpsetprec(__oprec);		\
> 	RETURNF(__retval);			\
> @@ -359,9 +371,8 @@
> 	return;			\
> } while (0)
> #else
> -#define	ENTERI()
> -#define	ENTERIT(x)
> -#define	RETURNI(x)	RETURNF(x)
> +#define	ENTERI(a)
> +#define	RETURNI(a)	RETURNF(a)
> #define	ENTERV()
> #define	RETURNV()	return
> #endif

This also changes RETURNI(), by unimproving its parameter name.  'x' for
ENTERI() wasn't a very good name for a type, but is good for a variable.
'x' for RETURNI() is slightly worse than 'r', but better than 'a'

I had forgotten that ENTERIT() creates a local variable, and don't like this,
but this is its main reason for existence.  Now I see a better way,
especially if if we accept unportabilities like __typeof():

ENTERIT() doesn't need to declare local variable with nearly function scope,
since the variable is only needed for returning.  Only the precision variable
needs to have nearly function scope.  The return variable can be created in
block scope in RETURNI().  Something like:

#define	RETURNI(r) do {				\
 	__typeof(r) __retval;			\
 						\
 	__retval = (r);				\
 	if ((__oprec = fpgetprec()) != FP_PE)	\
 		fpsetprec(FP_PE)		\
 	RETURNF(__retval);			\
} while (0)

You already added ENTERV() and RETURNV().  That seemed reasonable, but now
I don't like it.  With the above change (and of course also remove the
variable from ENTERI()), ENTERV() becomes the same as ENTERI().  RETURNV()
still needs to be different.

Another bug in ENTERV() is its name.  It is missing the 'I' to indicate that
it is a complication for only i386.

RETURNV() has the same naming problem, and more.  It can complement
ENTERI() perfectly since it doesn't need to return.  It should be named
LEAVEI(), and the return should be done by the caller.

But I now see 3 more problems.  The return in RETURNI() is not direct,
but goes through the macro RETURNF(x).  In the committed version, this
is a default that just returns x, but in my version it returns
hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed
for long doubles, so there is no interaction with ENTERI()/LEAVEI(),
and I only do this in a few simple cases not including any with
complex types).

In the hack*_t() cases, x has type double_t or float_t, and we want
to avoid C99's requirement to destroy efficiency and accuracy on return
by removing any extra precision in x.  (C99 only requires this
destruction for careful code that uses double_t or float_t in the
function.  Then since the return type is different from the internal
type, destruction is required.  C99 also requires removing extra
precision on assignment or casts, but most compilers with most CFLAGS
don't do this since it is pessimal.  Careful code uses double_t and
float_t internally so as to not depend on compiler bugs for efficiency
and to avoid unintended loss of accuracy on assignment ands casts for
compilers without bugs.  C11 adds further destruction for sloppy code
that uses the return type internally.)  hackdouble_t(x) and
hackfloat_t(x) change from type xxx_t to type xxx using a type pun so
that the destruction is not required in the C99 case.

The 3 problems are:
- the type pun interacts in a complicated way with changing the precision.
   This problem is null since type type pun is not needed for long doubles
   (unless there is a type with more precision than long double, but there
   is no such type for any supported arch, and C99 is missing support for
   it).  If this were a problem, then we would have to be carefull about
   the order of the type pun and the precision change.  Do the type pun
   first.

- also, be careful to apply any __typeof() to an arg with the correct
   type.  Normally do the type pun first, and use the return type for
   the local variable.  However, for printing, the arg should be the
   type that has extra precision, and __typeof() on that doesn't give
   the return type in general.

   In simple cases, The modified RETURNF() expands to something like
     'return (hackfloat_t(hi + lo));',
   where hi + lo is evalated in extra precision using float_t if possible,
   hackfloat_t puns the resulting float_t value to plain float without
   any destruction, and we finally return the value.

   In more complicated cases, the punned returned value is stored in a
   variable for printing using RETURNP().  I forget if the type of this
   variable is the return type or the arg type (printing in the return
   type shows what callers will normally see but printing in the arg
   type gives more details).  The caller uses RETURNP*(), and RETURNP*()
   uses RETURNF() and RETURNF() might do the type pun.  My code is not
   careful enough to handle all cases.  For the common hi + lo case,
   RETURN2P[I]() is used to print hi and lo separately, so the extra
   precision in hi + lo can be checked even on arches where hi + lo is
   not evaluated in extra precision.

- finally, it is necessary to go through a macro to reach the printing.
   Not many functions do this.  But most long double functions get this
   at no extra cost (except complexity in the macros) by having to go
   though ENTERI()/RETURNI().  Currently, the caller must decide, but
   it is a trivaial edit to change RETURNI() to RETURNP() and a simple
   edit to change RETURNI(hi + lo) to RETURN2P(hi, lo).  There is also
   SUM2P() to handle more complicated cases of hi + lo.

The return value is not passed to RETURNV(), so it is harder to get
automatic printing at no extra cost.  In s_sincosf.c and s_sincos.c,
the macros are just not used, the same as for s_sinf.c, etc.  The
float case double on all arches instead of double_t.  On i386, this
reduces to a slightly pessimized use of double_t, and on arches with
no extra precision it sacrifices a little efficiency to get more
accuracy.  In s_sincosl.c, the values are calculated and stored in
local variables, and then RETURNV() returns void.

This reminds me of a reason why I don't like sincos*().  Its API
requires destruction of efficiency and accuracy by returning the values
indirectly.  On i386 with not very old CPUs, this costs about 8 cycles per
long double value.  Float and double values cost about half as much.  On
amd64, the long double case is the same and the float and double cases
are faster.

sinf() and cosf() on small args take only 15-20 cycles (thoughput) on
amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect
return values is a lot.  sincosf() still ends up being slightly faster
than separate sinf()/cosf().

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190227145002.P907>