From owner-freebsd-hackers@freebsd.org Fri Apr 28 16:56:59 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A4CE3D549E2; Fri, 28 Apr 2017 16:56:59 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 859451613; Fri, 28 Apr 2017 16:56:59 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v3SGuwdl017872 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 28 Apr 2017 09:56:58 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v3SGuwdV017871; Fri, 28 Apr 2017 09:56:58 -0700 (PDT) (envelope-from sgk) Date: Fri, 28 Apr 2017 09:56:58 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170428165658.GA17560@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428010122.GA12814@troutmask.apl.washington.edu> <20170428183733.V1497@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170428183733.V1497@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Apr 2017 16:56:59 -0000 On Fri, Apr 28, 2017 at 07:13:16PM +1000, Bruce Evans wrote: > On Thu, 27 Apr 2017, Steve Kargl wrote: > > > On Thu, Apr 27, 2017 at 04:14:11PM -0700, Steve Kargl wrote: > >> > >> I have attached a new diff to the bugzilla report. The > >> diff is 3090 lines and won't be broadcast the mailing list. > >> > >> This diff includes fixes for a few inconsequential bugs > >> and implements modified Estrin's method for sum a few > >> ploynomials. If you want the previous Horner's method > >> then add -DHORNER to your CFLAGS. > > > > For those curious about testing, here are some numbers > > for the Estrin's method. These were obtained on an AMD > > FX(tm)-8350 @ 4018.34-MHz. The times are in microseconds > > and the number in parentheses are estimated cycles. > > > > | cospi | sinpi | tanpi > > ------------+--------------+--------------+-------------- > > float | 0.0089 (37) | 0.0130 (54) | 0.0194 (80) > > double | 0.0134 (55) | 0.0160 (66) | 0.0249 (102) > > long double | 0.0557 (228) | 0.0698 (287) | 0.0956 (393) > > ------------+--------------+--------------+-------------- > > > > The timing tests are done on the interval [0,0.25] as > > this is the interval where the polynomial approximations > > apply. Limited accuracy testing gives > > These still seem slow. I did a quick test of naive evaluations of > these functions as standard_function(Pi * x) and get times a bit faster > on Haswell, except 2-4 times faster for the long double case, with the > handicaps of using gcc-3.3.3 and i386. Of course, the naive evaluation > is inaccurate, especially near multiples of Pi/2. The slowness comes from handling extra precision arithmetic. For sinpi(x) = x * p(x) where p(x) = s0 + x2 * S(x2) is the polynomial approximation and x2 = x * x. One needs to take care in the ordering on the evaluation where x and x2 are split into high and low parts. See the source code posted in response to your other email. > > x in [0,0.25] | tanpif | tanpi | tanpil > > -----------------+------------+------------+------------- > > MAX ULP | 1.37954760 | 1.37300168 | 1.38800823 > > Just use the naive evaluation to get similar errors in this > range. It is probably faster too. I haven't checked tanpif, but naive evaluation is faster for sinpif(x). But getting the worry answer fast seems to be counter to what one may expect from a math library. Consider the two naive implementations: float bde1(float x) { return (sinf((float)M_PI * x)); } float bde2(float x) { return (sinf((float)(M_PI * x))); } float bde3(float x) { return ((float)sin(M_PI * x)); } x in [0,0.25] | sinpif | bde1(x) | bde2(x) | bde3(x) -----------------+-------------+-------------+--------------+------------ Time usec (cyc) | 0.0130 (54) | 0.0084 (35) | 0.0093 (38) | 0.0109 (45) MAX ULP | 0.80103672 | 1.94017851 | 1.46830785 | 0.5 1.0 < ULP | 0 | 736496 | 47607 | 0 0.9 < ULP <= 1.0 | 0 | 563122 | 101162 | 0 0.8 < ULP <= 0.9 | 1 | 751605 | 386128 | 0 0.7 < ULP <= 0.8 | 727 | 942349 | 753647 | 0 0.6 < ULP <= 0.7 | 19268 | 1169719 | 1123518 | 0 x in [3990,3992] | sinpif | bde1(x) | bde2(x) | bde3(x) -----------------+-------------+-------------+-------------+------------ Time usec (cyc) | 0.0161 (66) | 0.0137 (56) | 0.0144 (59) | 0.0211 (87) MAX ULP | 0.65193975 | 10458803. | 6471461. | 0.5 1.0 < ULP | 0 | 16773117 | 16762878 | 0 0.9 < ULP <= 1.0 | 0 | 0 | 0 | 0 0.8 < ULP <= 0.9 | 0 | 0 | 0 | 0 0.7 < ULP <= 0.8 | 0 | 0 | 0 | 0 0.6 < ULP <= 0.7 | 19268 | 2047 | 2047 | 0 So bde3(x) is best, but if you're going to use sin(M_PI*x) for float sinpif, then use sinpi((double)x) which is faster than bde3 and just as accurate. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow