From: Bruce Evans
Date: Wed, 27 Feb 2019 21:15:52 +1100 (EST)
To: Steve Kargl
cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro

On Tue, 26 Feb 2019, Steve Kargl wrote:

> On Wed, Feb 27, 2019 at 05:05:15PM +1100, Bruce Evans wrote:
>> On Tue, 26 Feb 2019, Steve Kargl wrote:
>> ...
>>> Update the ENTERI() macro in math_private.h to take a parameter.
>> ...
>> I don't like this. It churns and complicates all the simple cases
>> that only need ENTERI(). It bogotifies the existence of ENTERIT(),
> ...
> Okay. The other option is an ENTERC() and RETURNC() as
> we need to toggle FP_PE for long double complex functions.
> I suppose I could follow the one example currently in the
> tree that use
>
>    ENTERIT(long double complex)
>
> I find it somewhat odd that we have
>
>    ENTERI() /* Implicit declaration of __retval to long double. */
>
> but must use directly ENTERIT(long double complex).

ENTERI() hard-codes the long double for simplicity. Remember, it is only
needed for long double precision on i386. But I forgot about long double
complex types, and didn't dream about indirect long double types in
sincosl().

> ...
>>> -#define RETURNI(x) RETURNF(x)
>>> +#define ENTERI(a)
>>> +#define RETURNI(a) RETURNF(a)
>>>  #define ENTERV()
>>>  #define RETURNV() return
>>>  #endif
>>
>> This also changes RETURNI(), by unimproving its parameter name. 'x' for
>> ENTERI() wasn't a very good name for a type, but is good for a variable.
>> 'x' for RETURNI() is slightly worse than 'r', but better than 'a'
>
> The renaming is for consistency. I can use 'r'.

'r' is not quite right either, since the arg can be and is often an
expression. 'a' is good for 'arg'.

>> ...
>> But I now see 3 more problems. The return in RETURNI() is not direct,
>> but goes through the macro RETURNF(x). In the committed version, this
>> is a default that just returns x, but in my version it returns
>> hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed
>> for long doubles, so there is no interaction with ENTERI()/LEAVEI(),
>> and I only do this in a few simple cases not including any with
>> complex types).
>
> I'm fine with making ENTERI() only toggle precision, and adding
> a LEAVEI() to reset precision. RETURNI(r) would then be
>
> #define RETURNI(r) \
> do { \
>     LEAVEI(); \
>     return (r); \
> } while (0)

No, r may be an expression, so it must be evaluated before LEAVEI().
This is the reason for the existence of the variable to hold the result.

>> [... about complications for the general case]
>> This reminds me of a reason why I don't like sincos*(). Its API
>> requires destruction of efficiency and accuracy by returning the values
>> indirectly. On i386 with not very old CPUs, this costs about 8 cycles per
>> long double value. Float and double values cost about half as much. On
>> amd64, the long double case is the same and the float and double cases
>> are faster.
>
> Not sure your efficiency claim holds. I've seen significant improves
> in cexp and cexpf where sin[f]() and cos[f]() are replaced by
> sincos[f].
> On my core2 running i386 freebsd, I see 0.1779 usecs/call
> for cexpf with sinf and cosf and 0.12522 usecs/call for sincosf.
> Yes, that's a 29.6% improvement. For cexp the numbers are 0.2697
> usecs/call for sin and cos and 0.20586 for sincos (ie, 23.7% improvement).
> This is for z = x + I y with x and y in the non-exceptable case.

Combined sin and cos probably does work better outside of benchmarks for
sin and cos alone, since it does less work so leaves more resources for
the more useful things.

>> sinf() and cosf() on small args take only 15-20 cycles (throughput) on
>> amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect
>> return values is a lot. sincosf() still ends up being slightly faster
>> than separate sinf()/cosf().
>
> Seems to be much faster when used in other functions.

It's hard to be much faster than 15-20 cycles. The latency is more like
50 cycles, with 3 sinf()'s or cosf()'s running in parallel.

sincos() is far from the best possible optimization for repeated calls on
the same or nearby args. If sin() and cos() cached the arg reduction, then
separate sin() and cos() on the same arg would run about as fast as
sincos(), and repeated sin()'s on the same arg would run much faster than
now. Caching the arg reduction may also be good when the arg changes
slightly.

However, caching is slower if the args are not close. Even a 1-entry cache
takes a long time to look up relative to the 15-20 cycles taken by sinf()
and cosf(). Caching is complicated by signal handlers and threads. Perhaps
the right API is one where the caller has to ask for caching and provides
the cache storage. Then sincos() could be:

    ...
    _dh_init(x, &dh);           /* prefill 1-entry cache dh */
    s = _sin_cache(x, &dh, 1);  /* cache hit unless x is NaN */
                                /* cache misses update dh */
    c = _cos_cache(x, &dh, 1);  /* cache hit unless x is NaN */
    ...

and with everything inlined this is little different from the current
sincos() except for NaNs.
NaNs can be cache hits too if you compare them as bits, but the
comparison should probably be x == dhp->dh_x for a 1-entry cache, so as
not to have to extract the bits of x.

Bruce