Date:      Tue, 22 Feb 2005 00:02:35 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        David Schultz <das@freebsd.org>
Cc:        freebsd-i386@freebsd.org
Subject:   Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs
Message-ID:  <20050221223142.K3458@epsplex.bde.org>
In-Reply-To: <20050220225201.GA4339@VARK.MIT.EDU>
References:  <200406012251.i51MpkkU024224@VARK.homeunix.com> <20040602172105.T23521@gamplex.bde.org> <20050204215913.GA44598@VARK.MIT.EDU> <20050205181808.J10966@delplex.bde.org> <20050209051401.GA18775@VARK.MIT.EDU> <20050209232758.F3249@epsplex.bde.org> <20050210072314.GA26713@VARK.MIT.EDU> <20050214000320.U1866@epsplex.bde.org> <20050220202844.R5075@epsplex.bde.org> <20050220225201.GA4339@VARK.MIT.EDU>

On Sun, 20 Feb 2005, David Schultz wrote:

> On Sun, Feb 20, 2005, Bruce Evans wrote:
> > I would adjust the following due to these results:
> >   Think about deleting the exp and
> >   log i387 float functions.
>
> I didn't add NetBSD's e_expf.S in the first place because my tests
> showed that it was slower.  :-P  As for log{,b,10}f, your tests show
> that the asm versions are faster on my Pentium 4:
>
> asmlogf: nsec per call:  40 41 40 40 40 40 40 40 40 40 40 40 40 40 40 40
> fdllogf: nsec per call:  76 77 77 78 76 78 78 78 77 75 78 78 78 78 78 78
> asmlogbf: nsec per call:  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
> fdllogbf: nsec per call:  18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
> asmlog10f: nsec per call:  40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
> fdllog10f: nsec per call:  80 80 71 88 71 71 95 84 71 71 71 71 72 112 96 71

I get similar results for logf on all old machines, but fdllogf is faster
on my Athlon XP:

to.axpb-2223:
asmlogf: nsec per call:  60 60 58 61 60 57 60 62 62 58 57 57 57 62 62 62
fdllogf: nsec per call:  46 45 45 45 46 45 45 45 45 46 45 45 45 45 45 45
asmlogbf: nsec per call:  6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
fdllogbf: nsec per call:  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
asmlog10f: nsec per call:  60 60 58 61 60 57 60 62 62 58 57 57 57 62 62 62
fdllog10f: nsec per call:  87 87 83 90 88 79 86 94 94 81 78 78 79 94 94 95

asmlog ties with fdllog on the axp (65-68 nsec for both).

logb is quite different from the other functions so it doesn't really
belong in this benchmark (I got it using grep :-).
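
For readers who want to produce "nsec per call" figures of this kind, a
minimal timing loop might look like the sketch below.  It is not the
benchmark program used in this thread: nsec_per_call() and identityf()
are hypothetical names, and the standard C clock() (CPU time) is assumed
as the timer.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical baseline: costs only the call overhead of the loop below. */
static float
identityf(float x)
{
	return (x);
}

/*
 * Hypothetical helper, not the harness used in this thread.  Calls fn()
 * ncalls times on evenly spaced arguments in [lo, hi) and returns the
 * average CPU time per call in nanoseconds.
 */
static double
nsec_per_call(float (*fn)(float), float lo, float hi, long ncalls)
{
	volatile float sink = 0;	/* keeps the calls from being optimized away */
	clock_t t0, t1;
	float step;
	long i;

	assert(ncalls > 0);
	step = (hi - lo) / (float)ncalls;
	t0 = clock();
	for (i = 0; i < ncalls; i++)
		sink += fn(lo + step * (float)i);
	t1 = clock();
	(void)sink;
	return ((double)(t1 - t0) * 1e9 / (double)CLOCKS_PER_SEC /
	    (double)ncalls);
}
```

Passing logf (from <math.h>, linked with -lm) as fn and subtracting the
identityf() result gives a rough per-call cost.  The result depends
heavily on the argument range and on the CPU, as the tables in this
message show.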

> > - delete all inverse trig i387 functions.
>
> This is a clear win for asin() and acos().  It's not so clear for
> atan() or atan2():
>
> asmatan: nsec per call:  68 68 68 68 68 68 68 69 69 68 68 68 68 68 68 68
> fdlatan: nsec per call:  92 92 92 92 92 94 97 70 70 97 95 92 92 92 92 92
> fdlatanf: nsec per call:  70 70 70 70 70 71 72 58 58 72 70 69 70 75 71 69
>
> This is for the same Pentium 4 as above.  Do you get different
> results for a saner processor, like an Athlon?  IIRC, atan2f() was
> faster in assembly according to my naive tests, or I wouldn't have
> imported it.  I don't remember what inputs I tried or why I left out
> atanf().

I didn't test atanf or atan2*.  fdlatan was faster on a K6-1, an old
Celeron, a P3 and an AXP, but not on a 486:

to.486dx2-66
asmatan: nsec per call:  5518 5522 5527 5530 5474 5473 5674 5440 5433 5703 5625 5628 5554 5545 5554 5557
fdlatan: nsec per call:  8128 8126 8127 8132 7990 8352 8910 7667 7557 8723 8272 7929 7913 7926 7915 7921

to.axpb-2223
asmatan: nsec per call:  87 87 87 87 87 87 87 78 78 87 87 87 87 87 87 87
fdlatan: nsec per call:  65 65 65 65 65 66 68 51 51 68 66 65 65 65 65 65

to.cel366
asmatan: nsec per call:  444 444 444 444 444 444 444 424 424 444 444 444 444 444 444 444
fdlatan: nsec per call:  370 370 370 370 370 382 397 323 323 397 382 370 370 370 370 370

to.k6-233
asmatan: nsec per call:  827 827 827 827 827 827 857 838 833 853 823 823 823 823 823 823
fdlatan: nsec per call:  771 771 771 771 772 801 834 712 707 826 793 763 763 763 763 763

to.p3-800
asmatan: nsec per call:  209 209 205 209 209 209 209 200 200 209 209 209 209 209 209 209
fdlatan: nsec per call:  175 175 175 176 176 181 179 150 149 178 174 172 171 171 172 172

so asmatanf can only beat fdlatanf if the latter is doing something much
worse than the double version.

I tested with an almost unchanged version of -current's lib/msun.  I forgot
to mention that I added arg reduction to the asm cosf, sinf and tanf.
ucbtest noticed that the asm versions were broken, but after adding the
range reduction, ucbtest didn't report any significant changes since last
June.

> [...]
> > - think about optimizing the trig fdlibm double functions until they are
> >   faster than the trig i387 double functions on a larger range than
> >   [-pi/4, pi/4].  They could use extended precision, but only on
> [...]

> It's impossible to use a polynomial approximation on [0,pi/2] or a
> larger range for tan(), since tan() grows faster than any
> polynomial as it approaches pi/2.  There may be a rational
> approximation that works well, but I doubt it.  It is possible to
> find polynomial approximations for sin() and cos() on [0,pi/2],
> but assuming that Dr. Ng knows what he's doing (a reasonable
> assumption), the degree of any such polynomial would likely be
> very large.

I was only thinking of cos() and sin().  tan() has a good (local)
rational approximation everywhere since it is the quotient of 2 functions
that are analytic everywhere, but fdlibm already uses this via range
reduction (tan() on [pi/4, 3pi/4] is like -1/tan() on [-pi/4, pi/4]).

> By the way, the CEPHES library (netlib/cephes or
> http://www.moshier.net/) has different versions of many of these
> routines.  The trig functions are also approximated on [0,pi/4],
> but accurate argument reduction is not used.  I have the licensing
> issues worked out with the author and core@ if we want to use any
> of these.  However, my experience with exp2() and expl() from
> CEPHES showed that there are some significant inaccuracies, places
> where the approximating polynomial can overflow, etc.

Good work.  I only asked the author about licensing and found that
there would be few problems.

Bruce


