Date:      Mon, 28 Nov 2005 11:46:20 +0000 (UTC)
From:      Bruce Evans <bde@FreeBSD.org>
To:        src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   cvs commit: src/lib/msun/src k_tanf.c
Message-ID:  <200511281146.jASBkKti074003@repoman.freebsd.org>

bde         2005-11-28 11:46:20 UTC

  FreeBSD src repository

  Modified files:
    lib/msun/src         k_tanf.c 
  Log:
  Rearranged the polynomial evaluation some more to reduce dependencies.
  Instead of echoing the code in a comment, try to describe why we split
  up the evaluation in a special way.
  
  The new optimization is mostly to move the evaluation of w = z*z later
  so that everything else (except z = x*x) doesn't have to wait for w.
  On Athlons, FP multiplication has a latency of 4 cycles so this
  optimization saves 4 cycles per call provided no new dependencies are
  introduced.  Tweaking the other terms to reduce dependencies saves
  a couple more cycles in some cases (more on AXP than on A64; up to 8
  cycles out of 56 altogether in some cases).  The previous version had
  a similar optimization for s = z*x.  Special optimizations like these
  probably have a larger effect than the simple 2-way vectorization
  permitted (but not activated by gcc) in the old version, since 2-way
  vectorization is not enough and the polynomial's degree is so small
  in the float case that non-vectorizable dependencies dominate.
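
  As a rough illustration of the rearrangement described above (not the
  committed code), the sketch below evaluates the same odd polynomial two
  ways: a plain Horner chain, and a split form where w = z*z is computed
  late so that only the final combination waits for it.  The function
  names are hypothetical, double is used instead of float for simplicity,
  and the coefficients are the Taylor coefficients of tan(x), assumed
  here for illustration only; k_tanf.c uses its own fitted coefficients.

```c
/* Illustration only: Taylor coefficients of tan(x), NOT the ones in k_tanf.c. */
static const double T[] = {
    1.0 / 3.0,            /* x^3  */
    2.0 / 15.0,           /* x^5  */
    17.0 / 315.0,         /* x^7  */
    62.0 / 2835.0,        /* x^9  */
    1382.0 / 155925.0,    /* x^11 */
    21844.0 / 6081075.0,  /* x^13 */
};

/* Horner form: each multiply depends on the add before it, so the
 * whole evaluation is one long serial dependency chain. */
double tan_poly_horner(double x)
{
    double z = x * x;
    double p = T[0] + z * (T[1] + z * (T[2] + z * (T[3] + z * (T[4] + z * T[5]))));

    return x + x * z * p;
}

/* Split form: r, t, u and s depend only on z (and x), so they can
 * overlap in the FPU pipeline.  w = z*z is issued last among the
 * independent products, so nothing but the final combination waits
 * for it. */
double tan_poly_split(double x)
{
    double z = x * x;
    double r = T[4] + z * T[5];
    double t = T[2] + z * T[3];
    double u = T[0] + z * T[1];
    double s = z * x;
    double w = z * z;

    /* x + s*u + s*w*(t + w*r) is the same polynomial as the Horner form. */
    return (x + s * u) + (s * w) * (t + w * r);
}
```

  Both functions compute the same polynomial up to rounding; whether the
  split form is actually faster depends on the compiler and on the FP
  latencies of the target CPU, so it should be measured rather than
  assumed.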
  
  On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
  takes 34-55 cycles (was 39-59 cycles).
  
  Revision  Changes    Path
  1.20      +20 -8     src/lib/msun/src/k_tanf.c


