Date: Fri, 28 Jun 2013 10:42:49 +1000 (EST)
From: Bruce Evans
To: enh
Cc: freebsd-numerics@freebsd.org, Steve Kargl
Subject: Re: sincos?
Message-ID: <20130628103209.H1008@besplex.bde.org>
References: <20130627013502.GA37295@troutmask.apl.washington.edu>

On Thu, 27 Jun 2013, enh wrote:

> well, that was Intel and the code's not been accepted, but yes --- that's
> another reason for me not to accept their patch!
>
> Intel claimed "The reason for this fix [beside workaround for O0 switch] -
> it helps to remove some sin[f]+cos[f] code duplication (which is the whole
> reason for introduction of such function at all), which results in
> 1.58-1.81x performance gain on intervals |x|<100." i've not seen their
> benchmark code, so i don't know what their distribution of values was, and
> i don't understand why they covered a range as large as +/- 100.

+-2*Pi may be a bit too small, but most uses won't require very large
angles.

> when looking at i7 performance though, remember that x86 Android will
> usually be running on Atom (and most Android devices are actually ARM, not
> x86).

Hardware trig may actually be best for Atom (like on x86 before about
PPro for float precision and AthlonXP for double precision).  Some of my
optimizations in software libm depend on out-of-order execution, so they
will be pessimizations (hopefully small) on Atom and other in-order
execution CPUs.

Bruce