Date:      Sun, 20 Feb 2005 23:00:36 GMT
From:      David Schultz <das@FreeBSD.ORG>
To:        freebsd-i386@FreeBSD.org
Subject:   Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs
Message-ID:  <200502202300.j1KN0a5q005345@freefall.freebsd.org>

The following reply was made to PR i386/67469; it has been noted by GNATS.

From: David Schultz <das@FreeBSD.ORG>
To: Bruce Evans <bde@zeta.org.au>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG, freebsd-i386@FreeBSD.ORG
Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs
Date: Sun, 20 Feb 2005 17:52:01 -0500

 DONE:
 - Remove i387 float trig functions.
 - Remove i387 acos() and asin().
 
 TODO:
 - Figure out what to do about logf(), logbf(), and log10f().
 - Figure out what to do about atan(), atan2(), and atan2f().
 - Figure out what to do with sin(), cos(), and tan().
   (I suggest removing the i387 double trig functions now and worrying
   about the speed of fdlibm routines later.)
 
 On Sun, Feb 20, 2005, Bruce Evans wrote:
 > I would adjust the following due to these results:
 >   Think about deleting the exp and
 >   log i387 float functions.
 
 I didn't add NetBSD's e_expf.S in the first place because my tests
 showed that it was slower.  :-P  As for log{,b,10}f, your tests show
 that the asm versions are faster on my Pentium 4:
 
 asmlogf: nsec per call:  40 41 40 40 40 40 40 40 40 40 40 40 40 40 40 40
 fdllogf: nsec per call:  76 77 77 78 76 78 78 78 77 75 78 78 78 78 78 78
 asmlogbf: nsec per call:  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
 fdllogbf: nsec per call:  18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
 asmlog10f: nsec per call:  40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
 fdllog10f: nsec per call:  80 80 71 88 71 71 95 84 71 71 71 71 72 112 96 71
 
 > - delete all inverse trig i387 functions.
 
 This is a clear win for asin() and acos().  It's not so clear for
 atan() or atan2():
 
 asmatan: nsec per call:  68 68 68 68 68 68 68 69 69 68 68 68 68 68 68 68
 fdlatan: nsec per call:  92 92 92 92 92 94 97 70 70 97 95 92 92 92 92 92
 fdlatanf: nsec per call:  70 70 70 70 70 71 72 58 58 72 70 69 70 75 71 69
 
 This is for the same Pentium 4 as above.  Do you get different
 results for a saner processor, like an Athlon?  IIRC, atan2f() was
 faster in assembly according to my naive tests, or I wouldn't have
 imported it.  I don't remember what inputs I tried or why I left out
 atanf().
 
 >   Think about optimizing the fdlibm versions.
 >   They could use double or extended precision, and then they might not
 >   need range reduction for a much larger range (hopefully [-2pi, 2pi])
 >   and/or might not need a correction term for a much larger range.
 [...]
 > - think about optimizing the trig fdlibm double functions until they are
 >   faster than the trig i387 double functions on a larger range than
 >   [-pi/4, pi/4].  They could use extended precision, but only on
 >   some arches so there would be a negative reduction in complications
 >   for replacing the MD functions by "MI" ones optimized in this way.
 >   Does the polynomial approximation for sin start to fail between pi/4
 >   and pi/2, or are there other technical reasons to reduce to the current
 >   ranges?
 
 Yeah, the float versions of just about everything would benefit
 from storing the extra bits in a double.  Replacing two split
 doubles with a long double is harder, as you mention; it works
 (a) never, for 64-bit long doubles, (b) sometimes, for 80-bit long
 doubles, and (c) always, for 128-bit long doubles.  Some of the
 lookup tables for the float routines are a bit braindead, too.
 
 It's impossible to use a polynomial approximation on [0,pi/2] or a
 larger range for tan(), since tan() grows faster than any
 polynomial as it approaches pi/2.  There may be a rational
 approximation that works well, but I doubt it.  It is possible to
 find polynomial approximations for sin() and cos() on [0,pi/2],
 but assuming that Dr. Ng knows what he's doing (a reasonable
 assumption), the degree of any such polynomial would likely be
 very large.
 
 By the way, Dr. Ng's paper on argument reduction mentions that the
 not-so-freely-distributable libm uses table lookup for medium size
 arguments that would otherwise require reduction:
 http://www.ucbtest.org/arg.ps
 
 So in summary, there's a lot of low-hanging fruit with the float
 routines, but I think that doing a better job for double requires
 multiple versions that use the appropriate kind of extended
 precision, table lookup, or consulting a numerical analyst.
 
 By the way, the CEPHES library (netlib/cephes or
 http://www.moshier.net/) has different versions of many of these
 routines.  The trig functions are also approximated on [0,pi/4],
 but accurate argument reduction is not used.  I have the licensing
 issues worked out with the author and core@ if we want to use any
 of these.  However, my experience with exp2() and expl() from
 CEPHES showed that there are some significant inaccuracies, places
 where the approximating polynomial can overflow, etc.


