From owner-freebsd-hackers@freebsd.org  Fri Apr 28 16:56:59 2017
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A4CE3D549E2;
 Fri, 28 Apr 2017 16:56:59 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 859451613;
 Fri, 28 Apr 2017 16:56:59 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v3SGuwdl017872
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Fri, 28 Apr 2017 09:56:58 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v3SGuwdV017871;
 Fri, 28 Apr 2017 09:56:58 -0700 (PDT) (envelope-from sgk)
Date: Fri, 28 Apr 2017 09:56:58 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170428165658.GA17560@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428010122.GA12814@troutmask.apl.washington.edu>
 <20170428183733.V1497@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170428183733.V1497@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 28 Apr 2017 16:56:59 -0000

On Fri, Apr 28, 2017 at 07:13:16PM +1000, Bruce Evans wrote:
> On Thu, 27 Apr 2017, Steve Kargl wrote:
> 
> > On Thu, Apr 27, 2017 at 04:14:11PM -0700, Steve Kargl wrote:
> >>
> >> I have attached a new diff to the bugzilla report.  The
> >> diff is 3090 lines and won't be broadcast the mailing list.
> >>
> >> This diff includes fixes for a few inconsequential bugs
> >> and implements modified Estrin's method for sum a few
> >> ploynomials.  If you want the previous Horner's method
> >> then add -DHORNER to your CFLAGS.
> >
> > For those curious about testing, here are some numbers
> > for the Estrin's method.  These were obtained on an AMD
> > FX(tm)-8350 @ 4018.34-MHz.  The times are in microseconds
> > and the number in parentheses are estimated cycles.
> >
> >            |    cospi     |    sinpi     |    tanpi
> > ------------+--------------+--------------+--------------
> > float       | 0.0089 (37)  | 0.0130 (54)  | 0.0194 (80)
> > double      | 0.0134 (55)  | 0.0160 (66)  | 0.0249 (102)
> > long double | 0.0557 (228) | 0.0698 (287) | 0.0956 (393)
> > ------------+--------------+--------------+--------------
> >
> > The timing tests are done on the interval [0,0.25] as
> > this is the interval where the polynomial approximations
> > apply.  Limited accuracy testing gives
> 
> These still seem slow.  I did a quick test of naive evaluations of
> these functions as standard_function(Pi * x) and get times a bit faster
> on Haswell, except 2-4 times faster for the long double case, with the
> handicaps of using gcc-3.3.3 and i386.  Of course, the naive evaluation
> is inaccurate, especially near multiples of Pi/2.

The slowness comes from handling extra precision arithmetic.  For
sinpi(x) = x * p(x) where p(x) = s0 + x2 * S(x2) is the polynomial
approximation and x2 = x * x.  One needs to take care in the ordering
on the evaluation where x and x2 are split into high and low parts.
See the source code posted in response to your other email.  

> > x in [0,0.25]   |   tanpif   |   tanpi    |   tanpil
> > -----------------+------------+------------+-------------
> >         MAX ULP | 1.37954760 | 1.37300168 | 1.38800823
> 
> Just use the naive evaluation to get similar errors in this
> range.  It is probably faster too.

I haven't checked tanpif, but naive evaluation is faster
for sinpif(x).  But getting the worry answer fast seems
to be counter to what one may expect from a math library.
Consider the two naive implementations:

float
bde1(float x)
{
   return (sinf((float)M_PI * x));
}

float
bde2(float x)
{
   return (sinf((float)(M_PI * x)));
}

float
bde3(float x)
{
   return ((float)sin(M_PI * x));
}

x in [0,0.25]    |   sinpif    |    bde1(x)  |    bde2(x)   |   bde3(x)
-----------------+-------------+-------------+--------------+------------
Time usec (cyc)  | 0.0130 (54) | 0.0084 (35) | 0.0093 (38)  | 0.0109 (45)
         MAX ULP | 0.80103672  | 1.94017851  | 1.46830785   | 0.5
1.0 < ULP        | 0           | 736496      | 47607        | 0
0.9 < ULP <= 1.0 | 0           | 563122      | 101162       | 0
0.8 < ULP <= 0.9 | 1           | 751605      | 386128       | 0
0.7 < ULP <= 0.8 | 727         | 942349      | 753647       | 0
0.6 < ULP <= 0.7 | 19268       | 1169719     | 1123518      | 0

x in [3990,3992] |   sinpif    |   bde1(x)   |   bde2(x)   |   bde3(x)
-----------------+-------------+-------------+-------------+------------
Time usec (cyc)  | 0.0161 (66) | 0.0137 (56) | 0.0144 (59) | 0.0211 (87)
         MAX ULP | 0.65193975  | 10458803.   | 6471461.    | 0.5
1.0 < ULP        | 0           | 16773117    | 16762878    | 0
0.9 < ULP <= 1.0 | 0           | 0           | 0           | 0
0.8 < ULP <= 0.9 | 0           | 0           | 0           | 0
0.7 < ULP <= 0.8 | 0           | 0           | 0           | 0
0.6 < ULP <= 0.7 | 19268       | 2047        | 2047        | 0

So bde3(x) is best, but if you're going to use sin(M_PI*x) for
float sinpif, then use sinpi((double)x) which is faster than bde3
and just as accurate.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow