From: David Schultz
To: Bruce Evans
Cc: FreeBSD-gnats-submit@freebsd.org, freebsd-i386@freebsd.org, bde@freebsd.org
Date: Thu, 10 Feb 2005 02:23:14 -0500
Message-ID: <20050210072314.GA26713@VARK.MIT.EDU>
Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs

On Thu, Feb 10, 2005, Bruce Evans wrote:
> > I used the following sets
> > of inputs:
> >
> > tbl1: small numbers
> > ...
> > tbl2: numbers on [-8pi,8pi] greater in magnitude than 2^-18
> > ...
> > tbl3: large numbers
> > ...
> > tbl4: special cases
>
> This data may be too unusual.  Maybe the NaNs are slower.  Denormals
> would probably be slower.

The data in tbl2 are pretty usual, I think, and I measured all of the data points independently. But yes, NaNs are slower, as the results for tbl4 indicate.

Looking back, though, I did notice that very few of my inputs in tbl2 require argument reduction. In your tests on [0..10], on the other hand, 92% of the inputs require argument reduction in fdlibm. It would be interesting to see for which of your tests fdlibm is faster, and for which it is slower. One possibility is that fdlibm is slower most of the time; another is that it is far slower for the close-to-pi/2 cases that the i387 gets wrong, and that messes up the averages.

> The synchronising cpuid here is responsible for a factor of 3 difference
> for me.  Moving the rdtsc out of the loop gives the following changes
> in cycle counts:
>
> 2000 -> [944..1420]
> 1000 -> 431
> 400 -> 132
>
> Each rdtsc() in the loop costs 75 cycles for tbl1, and actually using
> the results costs another 120 cycles.
>
> I think the cpuid is disturbing the timings too much.

I don't care so much about the rdtsc overhead since I'm only measuring relative performance. A null function is measured as taking 388 cycles on my Pentium 4, but some of that is due to gcc getting confused by the volatile variable and generating extra code at -O0.

However, it is true that I am basically measuring latency and not throughput. Ordinarily, it is possible to execute FPU and CPU instructions simultaneously, and the FPU may even have more than one FU available for executing fptan. The cpuid instructions clear out the pipeline and destroy any parallelism that might have been possible. Your version does a better job of measuring throughput. You're also right that fdlibm tan() blows out about 512 bytes of instruction cache.

Anyway, I unfortunately don't have time for all this.
Do you want the assembly versions of these to stay or not? If so, it would be great if you could fix them and make sure that the result isn't obviously slower than fdlibm. If not, I'll be happy to spend two minutes making all those pesky bugs in them go away. ;-)