From owner-freebsd-numerics@FreeBSD.ORG Mon Jun 24 11:06:50 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B31AF151 for ; Mon, 24 Jun 2013 11:06:50 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id A1E521DD1 for ; Mon, 24 Jun 2013 11:06:50 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r5OB6oWa001070 for ; Mon, 24 Jun 2013 11:06:50 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r5OB6oUN001068 for freebsd-numerics@FreeBSD.org; Mon, 24 Jun 2013 11:06:50 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 24 Jun 2013 11:06:50 GMT Message-Id: <201306241106.r5OB6oUN001068@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-numerics@FreeBSD.org Subject: Current problem reports assigned to freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Jun 2013 11:06:50 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o stand/175811 numerics libstdc++ needs complex support in order use C99 o bin/170206 numerics [msun] [patch] complex arcsinh, log, etc. o stand/82654 numerics C99 long double math functions are missing 3 problems total. From owner-freebsd-numerics@FreeBSD.ORG Wed Jun 26 23:45:55 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AC8ADFFA for ; Wed, 26 Jun 2013 23:45:55 +0000 (UTC) (envelope-from enh@google.com) Received: from mail-wg0-x235.google.com (mail-wg0-x235.google.com [IPv6:2a00:1450:400c:c00::235]) by mx1.freebsd.org (Postfix) with ESMTP id 43AAC1726 for ; Wed, 26 Jun 2013 23:45:55 +0000 (UTC) Received: by mail-wg0-f53.google.com with SMTP id y10so56808wgg.20 for ; Wed, 26 Jun 2013 16:45:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type; bh=tXjXL/AsXYU9vx0HK9AaryOlV5bGy4YrjFrO51FaQx0=; b=YJ/Rm0lDqb0lVv66pOBQ+Zy8D33fBovDS3DrV5iiV/TTYdh2p65PkLAtLT7K5zUh7w fUQsvgHNseLhexXblQTc+WQYNLu2brt5xDnBSwoLjrXzAG2FTRNdyIRYgaNxtFmeMA6v q83vdUGNatEv/T2c2l8ihjkruxLrJupdu+XcozG/Y6lqa3CZ4T3kOG6doXlp8SlZJbzC sZR6yVl7Gd10iOPd9tkG7yBzr+6jaML/NzDGDMvA1+h4cUCjPOLjR6jmjhSsnolXOdDc pao6pHG1xwKMtyPB+HncbppEM3DHq3fyy0c0YymVMNCJyJY1oQlJZxOYdB/9pM7KFyff xO8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type :x-gm-message-state; bh=tXjXL/AsXYU9vx0HK9AaryOlV5bGy4YrjFrO51FaQx0=; b=QPI6e6s9OjmnJfFrEF5PbJ5maqAEL8GKNdlP2vdzoh6iHEwDF3EYl8ydf/mgYSudPA +rbL+ADsHhR8XafeNNRuzJsjJlIUi5htMrpKL/OSoaavVXF0v3YuZWTaIh6Jc8eFleas aiVwuiMpPvfyvjlK4DOjI01QsxvfRXQs4fdCXHyzxxZ/A3XReWacmLrep1UdRg72lFT2 oG5kptYgb7s0yfFVddWr7iYDVeayFS0a2eR/HZ4YdwNMb5F5UHvYk4Nyvhv3cCA7d/LM 
hW8yIoDf52IMIWPNWAtuYhq5TIXYeot2SVm9ptLfsok5tURa+Lm5J+DZqzIuvB5/gEEn ILBw== X-Received: by 10.180.7.164 with SMTP id k4mr4153966wia.40.1372290354426; Wed, 26 Jun 2013 16:45:54 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.137.244 with HTTP; Wed, 26 Jun 2013 16:45:34 -0700 (PDT) From: enh Date: Wed, 26 Jun 2013 16:45:34 -0700 Message-ID: Subject: sincos? To: freebsd-numerics@freebsd.org X-Gm-Message-State: ALoCoQkVgLKFtpOcsV/gzjJLBSite6V1kI48ALfd6g459IEQmy5u+MK/HjnRsYiG2n6ycqTpTNg1ff5oEzX9rOCrlGdk8x+49/NLaaNNvu0NPlLegKotNM510MIFPWV9V+qhDlXm+b0QOi5eZYYZuZkOUkpb5gZ7DxWFshqJ3Rmg29pSs3gSK7jGC4rrFbC12tkQkezXuyJfPNDFiiQffnp943Jn9PWISA== Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Jun 2013 23:45:55 -0000 i'm a recent lurker on this list; i've inherited Android's C library, and among other things i'm trying to track FreeBSD's lib/msun much more closely than we have traditionally. i was just reminded of the existence of a change submitted to us (Android) a while back that adds a sincos/sincosf implementation cobbled together from your s_sin.c/s_sinf.c and s_cos.c/s_cosf.c implementations: https://android-review.googlesource.com/#/c/47585/ the submitter (Intel) rightly points out that at the moment GCC carefully optimizes paired sin/cos calls into a sincos call which we deoptimize back into separate sin/cos calls. i personally don't want to take on maintenance of this, but i would be happy to include you guys' sincos implementation if you had one. is there a reason you don't have one? what's the clang story with this optimization (it's my understanding you're moving away from GCC in favor of clang)? 
--elliott From owner-freebsd-numerics@FreeBSD.ORG Thu Jun 27 01:38:34 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4929578A for ; Thu, 27 Jun 2013 01:38:34 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 1278D1BCA for ; Thu, 27 Jun 2013 01:38:34 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r5R1Z2vg037390; Wed, 26 Jun 2013 18:35:02 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r5R1Z2me037389; Wed, 26 Jun 2013 18:35:02 -0700 (PDT) (envelope-from sgk) Date: Wed, 26 Jun 2013 18:35:02 -0700 From: Steve Kargl To: enh Subject: Re: sincos? Message-ID: <20130627013502.GA37295@troutmask.apl.washington.edu> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Jun 2013 01:38:34 -0000 On Wed, Jun 26, 2013 at 04:45:34PM -0700, enh wrote: > i'm a recent lurker on this list; i've inherited Android's C library, and > among other things i'm trying to track FreeBSD's lib/msun much more closely > than we have traditionally. 
> > i was just reminded of the existence of a change submitted to us (Android) > a while back that adds a sincos/sincosf implementation cobbled together > from your s_sin.c/s_sinf.c and s_cos.c/s_cosf.c implementations: > https://android-review.googlesource.com/#/c/47585/ > A quick glance at the code shows that the android project has slapped its Copyright on fdlibm code. I suspect that you'll want to restore proper attribution to Sun Microsystems. > the submitter (Intel) rightly points out that at the moment GCC carefully > optimizes paired sin/cos calls into a sincos call which we deoptimize back > into separate sin/cos calls. i personally don't want to take on maintenance > of this, but i would be happy to include you guys' sincos implementation if > you had one. is there a reason you don't have one? I haven't submitted the versions of sincos[fl], which I've developed over the last year or so, yet. First, I need to redo some testing. Second, I need to convince Bruce that the implementation would be a nice addition to libm. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Thu Jun 27 02:35:09 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 255228AB for ; Thu, 27 Jun 2013 02:35:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id BD2381E2B for ; Thu, 27 Jun 2013 02:35:08 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 8EA2C7816D9; Thu, 27 Jun 2013 12:35:00 +1000 (EST) Date: Thu, 27 Jun 2013 12:34:59 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: enh Subject: Re: sincos? 
In-Reply-To: Message-ID: <20130627112404.T1215@besplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Q6eKePKa c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Oh2cFVv5AAAA:8 a=hUWrwKZOu6MgEUimIW4A:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Jun 2013 02:35:09 -0000 On Wed, 26 Jun 2013, enh wrote: > i'm a recent lurker on this list; i've inherited Android's C library, and > among other things i'm trying to track FreeBSD's lib/msun much more closely > than we have traditionally. We haven't bothered with sincos because there are more important optimizations to do first. > i was just reminded of the existence of a change submitted to us (Android) > a while back that adds a sincos/sincosf implementation cobbled together > from your s_sin.c/s_sinf.c and s_cos.c/s_cosf.c implementations: > https://android-review.googlesource.com/#/c/47585/ I couldn't read it due to a javascript problem. > the submitter (Intel) rightly points out that at the moment GCC carefully > optimizes paired sin/cos calls into a sincos call which we deoptimize back > into separate sin/cos calls. i personally don't want to take on maintenance > of this, but i would be happy to include you guys' sincos implementation if > you had one. is there a reason you don't have one? what's the clang story > with this optimization (it's my understanding you're moving away from GCC > in favor of clang)? A quick check of current speeds shows that separate sin/cos calls are fairly efficient on corei7.
They get pipelined and run in parallel, and you can only avoid parameter passing and arg reduction overheads by using a single call.

% #include <math.h>
%
% #define	FREQ	2010168339	/* sysctl -n machdep.tsc_freq */
%
% int
% main(void)
% {
% 	volatile double c, s, x;
% 	int i;
%
% #if 0
% 	/* 106 cycles on Athlon64 (i386): */
% 	/* 102 cycles on corei7 (amd64): */
% 	for (i = 0; i < FREQ / 10; i++)
% 		asm("fld1; fsincos; fstp %st(0); fstp %st(0)");

The i387 sincos instruction is very slow (just like all i387 instructions except addition and multiplication). FreeBSD still uses the slow sin and cos instructions on i386 (except in my version), to get their slowness and huge inaccuracy. Fixing this is more important.

Note that the above is not a full sincos implementation, and isn't a C implementation. It is missing support for large args, and cheats by not passing args or using the results or accessing memory. However, for the test arg of 1, the arg is not large so no special arg reduction is needed. Also the test arg is not very near a multiple of pi/2, so the i387 accuracy is more than good enough for double precision (it is good enough for long double precision).

% #else
% 	/* 255 cycles on Athlon64 (i386) :-(: */
% 	/* 74 cycles on corei7 (amd64): */
% 	for (i = 0; i < FREQ / 10; i++) {
% 		x = 1;
% 		c = cos(x);
% 		s = sin(x);
% 	}

The library implementation is complete, and the test does full parameter passing. However, the arg reduction is trivial for the test arg, so this tests a case where repeating the arg reduction for sin and cos doesn't take very long. For medium-sized args, the library sin and cos are about twice as slow and the i387 is broken near multiples of pi/2. For huge args, the library sin and cos are 10-20 times slower and the i387 is broken for all args. It is in the unimportant huge-arg case that combining sin and cos is most beneficial.
Most of the 10-20 times slowness factor is for the arg reduction, so avoiding one of the two reductions would make sincos twice as fast as sin+cos.

On corei7, the library implementation easily beats the i387 for all args between -2*Pi and 2*Pi. The library does special optimizations for this range. The i387 is also faster for a smaller range (between -Pi/4 and Pi/4 IIRC). More careful tests than the above give the following times on corei7: cos: 28 cycles; sin: 24 cycles. So the combined time of 74 cycles is not very good.

The slowness of the library implementation on Athlon64 (i386) is strange. More careful tests than the above give the following times: cos: 51 cycles; sin: 68 cycles for args between -2*Pi and 2*Pi. I forgot that although my libm doesn't use i387 sin or cos, it is not optimized for Athlon64 (it is optimized for i386 and tuned for athlon-xp). The more careful tests optimize it using -march=athlon*. I thought that the Athlon64-specific optimizations (using SSE for some things) were only important for medium-sized args.

After changing sin and cos to sinf and cosf, the test runs at the expected speed (72 cycles on Athlon64 (i386, with library not using Athlon64 features) and 43 cycles on corei7 (amd64)). Optimizing double precision on i386/Athlon64 is more important. On newer CPUs, double precision doesn't have the extra penalties relative to float precision that it has on Athlon64, at least when the library is optimized for the newer CPU, so i386 libm runs at about the same speed as amd64 libm in all precisions.

Note that i387 sincos delivers long double precision for some args, while library sinl and cosl deliver long double precision for all args, but are quite slow. In particular, the library isn't optimized for args between -2*Pi and 2*Pi, but only for args between -Pi/4 and Pi/4. The test arg of 1 is outside of the smaller range. After modifying the test program to use long doubles, it takes 591 cycles for cos and sin on Athlon64 (i386) :-(.
Optimizing this is more important.

Oops, I forgot to change the default rounding precision to 64 bits. 591 cycles is with every call to cos and every call to sin switching the rounding precision back and forth. After fixing this, the test program only takes 472 cycles. This is still larger than expected. In more careful tests, cosl takes 122 cycles and sinl takes 180 cycles for args in the range -2*Pi to 2*Pi. My cosl has minor optimizations that aren't in the committed version. 472 is interestingly more than 122+180.

On corei7, the penalties for long doubles relative to doubles are smaller, so cosl takes only 60 cycles and sinl only 56, and the test program only 178. The rounding mode doesn't need switching on amd64. 178 is still interestingly more than 60+56. Apparently, overheads outside of the functions are larger than the time taken by each function. Probably this is only apparent and the overheads are really for the separate calls messing up each other's scheduling. Combining the calls can give better scheduling, but optimizations related to scheduling are hard to get right.
% #endif % return (0); % } Bruce From owner-freebsd-numerics@FreeBSD.ORG Thu Jun 27 16:12:25 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C8B18EF9 for ; Thu, 27 Jun 2013 16:12:25 +0000 (UTC) (envelope-from enh@google.com) Received: from mail-wi0-x232.google.com (mail-wi0-x232.google.com [IPv6:2a00:1450:400c:c05::232]) by mx1.freebsd.org (Postfix) with ESMTP id 5B8441013 for ; Thu, 27 Jun 2013 16:12:25 +0000 (UTC) Received: by mail-wi0-f178.google.com with SMTP id k10so800514wiv.17 for ; Thu, 27 Jun 2013 09:12:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=C9hMPcUMo8lel3UZ7+wlZUABE/w8ogauq2SU8e02aqs=; b=VOYC+Evsbf5YJUI9dz7e8kqmv3z7txzuuWyRV7j90QXxIWYhpkTDowahUP8ObOucPy sfDX/3INz9jswi3Nqm4hIlcG+rRtTznMPwHpr2a9FJahHr+opVZZLNc0mJWPlPLT+uHA USqGMGRsdCi4JhegNsYVDqp2btZz59bzZj6eFtsz49+y4vjWnsO9T649WDeXEASzXz1X bV1f3FiNkRORnEf+4VFguM7KEHpB4lvvIeuD32wnBa4L1laKDcusuExHHy5WSDloG3k0 +YIPhzEPp9V0vIdplj5lll0qQvhRwn3xpFxmFmyWZ28xgbXLciwZmU24a6wY27CARgHR Mm2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:x-gm-message-state; bh=C9hMPcUMo8lel3UZ7+wlZUABE/w8ogauq2SU8e02aqs=; b=BaWDxyKocY4wbT1NsvdDi1jwZVLj1uhHXXwx2tjYeolTVqhNQeHEFOcg4EDkqnO83G 3EKBGd4/hTB8vAL6TZes4hmRd1Vp7cEyQhZkvwfFuSdLW+qL7MStH3F40M6hICHYR+ln wBk57H/fdC6oOMRXQoRK5+3j3Qn8KmMcrE83BYUITZNVa1UUQb3+t8zDy0WIFEzIz9km 3pukIH8EWleJa4f/MGDEGz21X/ut0TYZ4YtJhX0lD2BeblKCyG1xQAtHObQVaJiOIZy3 EEbdVH+CV9frFtf2yBiglQmHlHxUTi+fZGIZeHobUmQe1Ldao2gOTYIn5zb3X16M7FiA LQ2g== X-Received: by 10.180.81.169 with SMTP id b9mr16236369wiy.40.1372349544511; Thu, 27 Jun 2013 09:12:24 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.137.244 
with HTTP; Thu, 27 Jun 2013 09:12:04 -0700 (PDT) In-Reply-To: <20130627013502.GA37295@troutmask.apl.washington.edu> References: <20130627013502.GA37295@troutmask.apl.washington.edu> From: enh Date: Thu, 27 Jun 2013 09:12:04 -0700 Message-ID: Subject: Re: sincos? To: Steve Kargl X-Gm-Message-State: ALoCoQkrw/7OlskkY3QTfAfgWMVMFYmggltAMEg/csESopNliNKLI4lxYtCwQYe6NcqcWqoSc1lVBQUvylp9Qii/QxykTNucPccEQEKfa/HNnOwRB2n7keRbTijN1rzFD1d8mQh9ZN/SFuQxcFcW0zueTGF5lsFOsKs/4Tui2VXHKA3Qv5liNwcdHGZc75JguB0Bb8eyPF7agJ/wrEr8XCTomMZCxHWWHw== Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Jun 2013 16:12:25 -0000 well, that was Intel and the code's not been accepted, but yes --- that's another reason for me not to accept their patch! Intel claimed "The reason for this fix [beside workaround for O0 switch] - it helps to remove some sin[f]+cos[f] code duplication (which is the whole reason for introduction of such function at all), which results in 1.58-1.81x performance gain on intervals |x|<100." i've not seen their benchmark code, so i don't know what their distribution of values was, and i don't understand why they covered a range as large as +/- 100. when looking at i7 performance though, remember that x86 Android will usually be running on Atom (and most Android devices are actually ARM, not x86). On Wed, Jun 26, 2013 at 6:35 PM, Steve Kargl < sgk@troutmask.apl.washington.edu> wrote: > On Wed, Jun 26, 2013 at 04:45:34PM -0700, enh wrote: > > i'm a recent lurker on this list; i've inherited Android's C library, and > > among other things i'm trying to track FreeBSD's lib/msun much more > closely > > than we have traditionally. 
> > > > i was just reminded of the existence of a change submitted to us > (Android) > > a while back that adds a sincos/sincosf implementation cobbled together > > from your s_sin.c/s_sinf.c and s_cos.c/s_cosf.c implementations: > > https://android-review.googlesource.com/#/c/47585/< > https://android-review.googlesource.com/#/c/47585/1> > > > > A quick glance at the code shows that the android project has > slapped its Copyright on fdlibm code. I suspect that you'll > want to restore proper attribution to Sun Microsystems. > > > the submitter (Intel) rightly points out that at the moment GCC carefully > > optimizes paired sin/cos calls into a sincos call which we deoptimize > back > > into separate sin/cos calls. i personally don't want to take on > maintenance > > of this, but i would be happy to include you guys' sincos implementation > if > > you had one. is there a reason you don't have one? > > I haven't submitted the versions of sincos[fl], which I've > developed over the last year or so, yet. First, I need to > redo some testing. Second, I need to convince Bruce that > the implementation would be a nice addition to libm. 
> > -- > Steve > From owner-freebsd-numerics@FreeBSD.ORG Thu Jun 27 19:25:39 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 09F5FE26 for ; Thu, 27 Jun 2013 19:25:39 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id C88BA1AA8 for ; Thu, 27 Jun 2013 19:25:38 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.6/8.14.6) with ESMTP id r5RJPchE041812; Thu, 27 Jun 2013 12:25:38 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.6/8.14.6/Submit) id r5RJPcGS041811; Thu, 27 Jun 2013 12:25:38 -0700 (PDT) (envelope-from sgk) Date: Thu, 27 Jun 2013 12:25:38 -0700 From: Steve Kargl To: enh Subject: Re: sincos? Message-ID: <20130627192538.GA41760@troutmask.apl.washington.edu> References: <20130627013502.GA37295@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Jun 2013 19:25:39 -0000 On Thu, Jun 27, 2013 at 09:12:04AM -0700, enh wrote: > well, that was Intel and the code's not been accepted, but yes --- that's > another reason for me not to accept their patch! 
> > Intel claimed "The reason for this fix [beside workaround for O0 switch] - > it helps to remove some sin[f]+cos[f] code duplication (which is the whole > reason for introduction of such function at all), which results in > 1.58-1.81x performance gain on intervals |x|<100." i've not seen their > benchmark code, so i don't know what their distribution of values was, and > i don't understand why they covered a range as large as +/- 100. > The code duplication, which is removed, is the argument reduction for values |x| > pi / 4 for sin and cos. If you have void sincos(double x, double *s, double *c) { *s = sin(x); *c = cos(x); } then both sin and cos call rem_pio2 (or whatever the function is called) if |x| > pi/4. The code in question removes one of the argument reduction calls, and so you get a speed improvement of 1.5 to 2. As Bruce noted, he would like to see some additional optimizations for -2*pi < x < 2*pi (I may have the range incorrect here) integrated into sin, cos, sinl, and cosl before we worry about sincos[fl]. I'll get to those hopefully in August, but coshl, sinhl, and tanhl are on my plate. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Fri Jun 28 00:42:59 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 76E49B6E for ; Fri, 28 Jun 2013 00:42:59 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 3F00518B4 for ; Fri, 28 Jun 2013 00:42:59 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 3B5E5781EBD; Fri, 28 Jun 2013 10:42:54 +1000 (EST) Date: Fri, 28 Jun 2013 10:42:49 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: enh Subject: Re: sincos?
In-Reply-To: Message-ID: <20130628103209.H1008@besplex.bde.org> References: <20130627013502.GA37295@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=eqSHVfVX c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=KkbArD1GNvbIUGj--mIA:9 a=CjuIK1q_8ugA:10 a=iOpTMNq0JQTl-TBF:21 a=osw8StZwuhklUwHI:21 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-numerics@freebsd.org, Steve Kargl X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Jun 2013 00:42:59 -0000 On Thu, 27 Jun 2013, enh wrote: > well, that was Intel and the code's not been accepted, but yes --- that's > another reason for me not to accept their patch! > > Intel claimed "The reason for this fix [beside workaround for O0 switch] - > it helps to remove some sin[f]+cos[f] code duplication (which is the whole > reason for introduction of such function at all), which results in > 1.58-1.81x performance gain on intervals |x|<100." i've not seen their > benchmark code, so i don't know what their distribution of values was, and > i don't understand why they covered a range as large as +/- 100. +-2*Pi may be a bit too small, but most uses won't require very large angles. > when looking at i7 performance though, remember that x86 Android will > usually be running on Atom (and most Android devices are actually ARM, not > x86). Hardware trig may actually be best for Atom (like on x86 before about PPro for float precision and AthlonXP for double precision). Some of my optimizations in software libm depend on out of order execution so they will be pessimizations (hopefully small) on Atom and other in order execution CPUs. 
Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri Jun 28 05:12:56 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2FEBA7CF for ; Fri, 28 Jun 2013 05:12:56 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net [50.196.151.174]) by mx1.freebsd.org (Postfix) with ESMTP id 13AB71270 for ; Fri, 28 Jun 2013 05:12:56 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r5S5CsSc004253; Thu, 27 Jun 2013 22:12:54 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r5S5CrwZ004252; Thu, 27 Jun 2013 22:12:53 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Date: Thu, 27 Jun 2013 22:12:53 -0700 From: David Schultz To: Bruce Evans Subject: Re: sincos? Message-ID: <20130628051253.GB3590@zim.MIT.EDU> References: <20130627013502.GA37295@troutmask.apl.washington.edu> <20130628103209.H1008@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130628103209.H1008@besplex.bde.org> Cc: enh , Steve Kargl , freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Jun 2013 05:12:56 -0000 Steve had some patches for this that could have been committed years ago if people would stop quibbling about it. :) I put it on the TODO list on the wiki for a reason! Merging sin() and cos() is basically all there is to it. The advantages to having it are: 1. It's faster when computing the sine and cosine of large angles, because the arg reduction only has to be done once. 
In particular, functions like cexp() can benefit. 2. Most math libraries have it, even though it's not standardized. The disadvantage is that someone has to spend 15 minutes writing it, plus do some testing... From owner-freebsd-numerics@FreeBSD.ORG Fri Jun 28 05:23:48 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1F90FAD4 for ; Fri, 28 Jun 2013 05:23:48 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (50-196-151-174-static.hfc.comcastbusiness.net [50.196.151.174]) by mx1.freebsd.org (Postfix) with ESMTP id AF80212AC for ; Fri, 28 Jun 2013 05:23:47 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.7/8.14.2) with ESMTP id r5S4wUZ6004172; Thu, 27 Jun 2013 21:58:30 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.7/8.14.2/Submit) id r5S4wTwC004171; Thu, 27 Jun 2013 21:58:29 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Date: Thu, 27 Jun 2013 21:58:29 -0700 From: David Schultz To: Eitan Adler Subject: Re: operation precedence bug: lib/msun/src Message-ID: <20130628045829.GA3590@zim.MIT.EDU> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Jun 2013 05:23:48 -0000 On Wed, Jun 19, 2013, Eitan Adler wrote: > Does the following look correct? [...] > diff --git a/lib/msun/src/s_fma.c b/lib/msun/src/s_fma.c [...] 
> - if (bits_lost != 1 ^ (int)(hibits & 1)) { > + if (bits_lost != (1 ^ (int)(hibits & 1))) { No, logical xor is intended: If we lost one bit due to denormalization, we need to adjust if the low bit is 1, and if we lost more than one bit, we need to adjust if the low bit is 0. I apologize for writing 6 lines of code that would take pages to explain. The relevant background (guard and sticky bits) is covered in the following book chapter if you're interested: http://uenics.evansville.edu/~mr56/ece752/AppendixH.pdf
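
The precedence point can be checked with a small sketch. In C, != binds more tightly than ^, so the condition as written parses as (bits_lost != 1) ^ (int)(hibits & 1), which is the logical xor described above. The function names below are hypothetical and are not from s_fma.c, and the uint64_t type for hibits is an assumption; only the two expressions are taken from the diff.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the condition as it stands in the source.
 * Because != has higher precedence than ^, the unparenthesized form
 * "bits_lost != 1 ^ (int)(hibits & 1)" means exactly this.
 */
int
adjust_original(int bits_lost, uint64_t hibits)
{
	return ((bits_lost != 1) ^ (int)(hibits & 1));
}

/*
 * Hypothetical helper: the condition from the proposed patch.
 * Here the xor is applied to the constant 1 before the comparison.
 */
int
adjust_patched(int bits_lost, uint64_t hibits)
{
	return (bits_lost != (1 ^ (int)(hibits & 1)));
}
```

With one bit lost, both forms agree: they adjust only when the low bit is 1. With two bits lost and the low bit set, the original form does not adjust, while the patched form does, so the patch would change behavior in the "lost more than one bit" case.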