From: Bruce Evans
Date: Sat, 19 Nov 2005 02:38:27 +0000 (UTC)
To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Subject: cvs commit: src/lib/msun/src e_rem_pio2f.c s_cosf.c s_sinf.c s_tanf.c
Message-Id: <200511190238.jAJ2cRgV059480@repoman.freebsd.org>
X-FreeBSD-CVS-Branch: HEAD

bde         2005-11-19 02:38:27 UTC

  FreeBSD src repository

  Modified files:
    lib/msun/src         e_rem_pio2f.c s_cosf.c s_sinf.c s_tanf.c
  Log:
  Moved all the optimizations for |x| <= 9pi/2 from __ieee754_rem_pio2f()
  to its 3 callers and manually inlined them.  On Athlons, with favourable
  compiler flags and optimizations and favourable pipeline conditions,
  this gives a speedup of 30-40 cycles for cosf(), sinf() and tanf() on
  the range pi/4 < |x| <= 9pi/4, so these functions are now significantly
  faster than the hardware trig functions in many cases.
  E.g., in a benchmark with uniformly distributed x in [-2pi, 2pi], A64
  hardware fcos took 72-129 cycles and cosf() took 37-55 cycles.
  Out-of-order execution is needed to get both of these times.  The
  optimizations in this commit apparently work more by removing 1
  serialization point than by reducing latency.

  Revision  Changes    Path
  1.17      +0 -55     src/lib/msun/src/e_rem_pio2f.c
  1.10      +33 -2     src/lib/msun/src/s_cosf.c
  1.10      +41 -4     src/lib/msun/src/s_sinf.c
  1.10      +31 -6     src/lib/msun/src/s_tanf.c