From owner-freebsd-numerics@FreeBSD.ORG Fri Mar 13 03:58:33 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5E2F7A3A for ; Fri, 13 Mar 2015 03:58:33 +0000 (UTC) Received: from mail-ie0-x233.google.com (mail-ie0-x233.google.com [IPv6:2607:f8b0:4001:c03::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 23457F2 for ; Fri, 13 Mar 2015 03:58:30 +0000 (UTC) Received: by iegc3 with SMTP id c3so81887854ieg.3 for ; Thu, 12 Mar 2015 20:58:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type; bh=wPCQFP2mAsAka5YWCdyecJXJN2M2UnRqKSVa+k/reAw=; b=GG43+ofDjD8MQ2zdp/f0EeTrYTfDYGqKIDxE7wt4fXo9ubbsP3FaLpRyye2m5yMb03 rua1L4j1cuX+qnCn8mDTOLFI4HwAUPfCn083JoHnr9wt4RmBwryB/Iy5GiWImEmeeHVT vGogDS8ltaqU8E9XD3CZIRo2w/DSj7QOKxMikCr78LRHn57yD2oFKns6HZnQxslgxXbh V63GZrOouBTdpTJh3Kz3j72Em5iVrE/qWf4MgifVXVHjCj1Hc2grqy1v8wYooKD8xUYS 2m0DmSG8fFVvmUZrgHZ+1qsQxfYCE4abicLlEx/MSzZJmuZFwP9nXgvwak6w/DlGjDtv RcyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:from:date:message-id:subject:to :content-type; bh=wPCQFP2mAsAka5YWCdyecJXJN2M2UnRqKSVa+k/reAw=; b=hBew13VzRm/RByR+Fg7b5Yb1VZEDNDeLxmaAJ11pe6+6TG0nIxaSBsjPLJzcIzjiTv F5lKXSIzV53Kle0Dd8aEWdkpTaEF2Zaz+EY6rfvBOvmlJPfL4Xs6I9N0R9R88iqQ5QZB mQf298164qNRgGSN+TuxWirx9aDg4AhkyfdmjXPp3q3ZQmHiHWoM5GIxF58qsPyWZlh7 +5dgbBX3R+Fobs2CigCEmHu1BxXFuPz0I0e875p39TbceMIjpImeqjMrWXP2Yn/f2xRc lADYAr+CyxYv+payOp5h92w2o4g/Do63jUILSAlHbSMbTprqrH8SU8Seu3HqUHLO4t5H NCxA== X-Gm-Message-State: ALoCoQkn67M/HqMe5/epwLFwUEgs+rakBTNiG2yyxkIs4ODIzFnrue7PZufqXwmD45coUYo6C4WK X-Received: by 10.107.135.212 with SMTP id r81mr56011877ioi.38.1426219108157; Thu, 12 Mar 2015 20:58:28 -0700 (PDT) MIME-Version: 1.0 Received: by 10.36.25.129 with HTTP; Thu, 12 Mar 2015 20:58:06 -0700 (PDT) From: enh Date: Thu, 12 Mar 2015 20:58:06 -0700 Message-ID: Subject: libm functions sensitive to current rounding mode To: freebsd-numerics@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Mar 2015 03:58:33 -0000 various SoC vendors have been sending us (Android) assembler implementations of various libm functions, and we noticed when looking at one change that it was slower on certain inputs than the FreeBSD C implementation we were already using. the response was that the cost was coping with other rounding modes, and they gave this example: #include #include #include int main(){ double x = -0xb.6365e22ee46fp4; fesetround(FE_UPWARD); printf("sin(%f) = %.16e\n", x, sin(x)); fesetround(FE_DOWNWARD); printf("sin(%f) = %.16e\n", x, sin(x)); x = 0x1.921fb54442d16p0; fesetround(FE_UPWARD); printf("sin(%f) = %.16e\n", x, sin(x)); fesetround(FE_DOWNWARD); printf("sin(%f) = %.16e\n", x, sin(x)); return 0; } see https://android-review.googlesource.com/112700 for the original change and related conversation. glibc 2.19: sin(-182.212373) = -2.4759225463534308e-18 sin(-182.212374) = -2.4759225463534309e-18 sin(1.570797) = 1.0000000000000000e+00 sin(1.570796) = 1.0000000000000000e+00 (glibc's fixed bugs here in the past. see https://sourceware.org/bugzilla/show_bug.cgi?id=3976 for example.) the FreeBSD libm: sin(-182.212374) = -2.4759225463534304e-18 sin(-182.212374) = 2.2575413760606011e-11 sin(1.570796) = 1.0000000000000000e+00 sin(1.570796) = 1.0000000002522276e+00 it looks like e_sqrtl.c, s_fmal.c, s_fma.c, and s_fmaf.c save/restore the rounding mode but nothing else does. let me know if you'd like me to send trivial patches for any of the affected functions. (do all the libm functions need this?) -- Elliott Hughes - http://who/enh - http://jessies.org/~enh/ Android native code/tools questions? Mail me/drop by/add me as a reviewer. From owner-freebsd-numerics@FreeBSD.ORG Fri Mar 13 05:34:41 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A6D574A2 for ; Fri, 13 Mar 2015 05:34:41 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 88DF1C0D for ; Fri, 13 Mar 2015 05:34:41 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.14.9/8.14.9) with ESMTP id t2D5Ho8A063320 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 12 Mar 2015 22:17:50 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.9/8.14.9/Submit) id t2D5HocW063319; Thu, 12 Mar 2015 22:17:50 -0700 (PDT) (envelope-from sgk) Date: Thu, 12 Mar 2015 22:17:50 -0700 From: Steve Kargl To: enh Subject: Re: libm functions sensitive to current rounding mode Message-ID: <20150313051749.GA63174@troutmask.apl.washington.edu> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Mar 2015 05:34:41 -0000 On Thu, Mar 12, 2015 at 08:58:06PM -0700, enh via freebsd-numerics wrote: > various SoC vendors have been sending us (Android) assembler > implementations of various libm functions, and we noticed when looking > at one change that it was slower on certain inputs than the FreeBSD C > implementation we were already using. the response was that the cost > was coping with other rounding modes, and they gave this example: > > #include > #include > #include > > int main(){ > double x = -0xb.6365e22ee46fp4; > fesetround(FE_UPWARD); > printf("sin(%f) = %.16e\n", x, sin(x)); > fesetround(FE_DOWNWARD); > printf("sin(%f) = %.16e\n", x, sin(x)); > > x = 0x1.921fb54442d16p0; > fesetround(FE_UPWARD); > printf("sin(%f) = %.16e\n", x, sin(x)); > fesetround(FE_DOWNWARD); > printf("sin(%f) = %.16e\n", x, sin(x)); > return 0; > } > > see https://android-review.googlesource.com/112700 for the original > change and related conversation. > > glibc 2.19: > > sin(-182.212373) = -2.4759225463534308e-18 > sin(-182.212374) = -2.4759225463534309e-18 > sin(1.570797) = 1.0000000000000000e+00 > sin(1.570796) = 1.0000000000000000e+00 > > (glibc's fixed bugs here in the past. see > https://sourceware.org/bugzilla/show_bug.cgi?id=3976 for example.) > > the FreeBSD libm: > > sin(-182.212374) = -2.4759225463534304e-18 > sin(-182.212374) = 2.2575413760606011e-11 > sin(1.570796) = 1.0000000000000000e+00 > sin(1.570796) = 1.0000000002522276e+00 > > it looks like e_sqrtl.c, s_fmal.c, s_fma.c, and s_fmaf.c save/restore > the rounding mode but nothing else does. > > let me know if you'd like me to send trivial patches for any of the > affected functions. (do all the libm functions need this?) > http://wiki.freebsd.org/Numerics documents the progress on implementing missing libm functions. It's somewhat out of date as only powl and tgammal as required by C99/11 are missing for the long double functions. I don't know the status of complex.h functions. I don't do wiki, so haven't even tried to update the page. At the bottom of the page, you'll see a list of future directions. Dealing with rounding modes is on the list. As of now, FreeBSD libm should work with round-to-nearest; other modes probably need to be tested. e_sqrtl.c was implemented to handle the rounding mode for 2 reasons: (1) IEEE-754 mandates that square root is correctly rounded in all rounding modes; and (2), fdlibm's src/e_sqrt.c (might be wrong filename) has a long comment that explains an algorithm that can be used to implement software sqrt() in all rounding modes. That's what I implemented and then Bruce Evans and David Schultz fixed what I did. I suspect all other functions, which are implemented in software, need to be tested and patches developed. If you have trivial patches, the best thing to do is submit a bug report via FreeBSD's bugzilla so that the patch doesn't get lost. Someday after I get powl and tgammal written, and I become independently wealthy, I may have time to work on libm. BTW, n1785.pdf is adding a number new functions to the C library (e.g., sinpi(x), etc). I suspect I would work on those before dealing with rounding modes; as all of my personal codes are expecting round-to-nearest. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Fri Mar 13 11:14:05 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 660A47F9 for ; Fri, 13 Mar 2015 11:14:05 +0000 (UTC) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2665A88A for ; Fri, 13 Mar 2015 11:14:04 +0000 (UTC) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 55FD13C6020; Fri, 13 Mar 2015 21:46:17 +1100 (AEDT) Date: Fri, 13 Mar 2015 21:46:14 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: enh Subject: Re: libm functions sensitive to current rounding mode In-Reply-To: Message-ID: <20150313192625.B885@besplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=ZuzUdbLG c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=Oh2cFVv5AAAA:8 a=CCpqsmhAAAAA:8 a=UWV0wBGDR1em69PW5csA:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Mar 2015 11:14:05 -0000 On Thu, 12 Mar 2015, enh via freebsd-numerics wrote: > various SoC vendors have been sending us (Android) assembler > implementations of various libm functions, and we noticed when looking > at one change that it was slower on certain inputs than the FreeBSD C > implementation we were already using. the response was that the cost > was coping with other rounding modes, and they gave this example: This seems to be due to bad fixes. Supporting rounding in the current rounding mode is only very useful if the rounding is perfect, but perfect rounding is difficult for complicated functions. Switching the rounding mode is incredibly slow on x86 (it takes as long or longer than sin()), and doing it usually gives worse results. The result _should_ depend on the rounding mode. It is specified to do so for sqrt* and probably for many other simple functions. This has FreeBSD has only been tested to do perfect rounding in all rounding modes for the following functions: ceil*, fabs*, floor*, logb*, rint*, round*, significan*, sqrt*, trunc* ceil*, fabs*, floor*, logb*, rint*, round*, significan*, sqrt*, trunc* Implementing this for the software sqrt* makes sqrt* unusably slow, but most arches use hardware sqrt*. > #include > #include > #include > > int main(){ > double x = -0xb.6365e22ee46fp4; > fesetround(FE_UPWARD); > printf("sin(%f) = %.16e\n", x, sin(x)); > fesetround(FE_DOWNWARD); > printf("sin(%f) = %.16e\n", x, sin(x)); > > x = 0x1.921fb54442d16p0; > fesetround(FE_UPWARD); > printf("sin(%f) = %.16e\n", x, sin(x)); > fesetround(FE_DOWNWARD); > printf("sin(%f) = %.16e\n", x, sin(x)); > return 0; > } > > see https://android-review.googlesource.com/112700 for the original > change and related conversation. I couldn't read this with my old browsers. > glibc 2.19: > > sin(-182.212373) = -2.4759225463534308e-18 > sin(-182.212374) = -2.4759225463534309e-18 > sin(1.570797) = 1.0000000000000000e+00 > sin(1.570796) = 1.0000000000000000e+00 sin() is a good-bad example. If you "optimize" it by writing it in asm, then you get enormous errors (multiple tera-ulps) even near small even multiples of Pi/2. The 1-ulp difference made by the rounding mode is in the noise. Actual results for i387 sin() on i386, with a middle result for FE_TONEAREST added: sin(-182.212373) = -2.7105054312137606e-18 sin(-182.212374) = -2.7105054312137611e-18 sin(-182.212374) = -2.7105054312137611e-18 sin(1.570797) = 1.0000000000000000e+00 sin(1.570796) = 1.0000000000000000e+00 sin(1.570796) = 9.9999999999999988e-01 Results should be printed in %a format so that the bits can be seen. i387 values: sin(-182.212373) = -0x1.8ffffffffffffp-59 sin(-182.212374) = -0x1.9p-59 sin(-182.212374) = -0x1.9p-59 sin(1.570797) = 0x1p+0 sin(1.570796) = 0x1p+0 sin(1.570796) = 0x1.fffffffffffffp-1 We see that the rounding mode indeed makes a 1-bit difference. software values on i386 (almost the same as below): sin(-182.212373) = -2.4759225463534304e-18 sin(-182.212374) = -2.4759225463534308e-18 sin(-182.212374) = 2.2575413760606011e-11 sin(1.570797) = 1.0000000000000000e+00 sin(1.570796) = 1.0000000000000000e+00 sin(1.570796) = 1.0000000002522273e+00 sin(-182.212373) = -0x1.6d61b58c99c42p-59 sin(-182.212374) = -0x1.6d61b58c99c43p-59 sin(-182.212374) = 0x1.8d26ap-36 sin(1.570797) = 0x1p+0 sin(1.570796) = 0x1p+0 sin(1.570796) = 0x1.000000011553bp+0 We see much more from this: - somehow TONEAREST gives almost the same results as UPWARDS. It appears to work correctly at -182.212373 in the FreeBSD software case only - enormous errors at -182.212373 from the i387, as predicted (the software result is believed to be correct, while the i387 result shows evidence of canceling all except the top 5 bits). All bits except the top bit are wrong, so the error is about 2**52 = 4 Pulps (1 Pulp = 1 peta-ulp). DOWNWARDS makes a 1 ulp adjustment to this. - glibc apparently uses software, since it is correct in all cases, except it apparently forces a common rounding for UPWARDS and NEAREST and thus is certain to be off by 1 ulp for 1 of them. - the 1.57079x are was obviously chosen to be as close as possible to Pi/2 (it should be printed in %a format too, so that we can see its bits (these must be more accurate than shown by the decimal format) and see that the bits don't depend on the rounding mode). Since sin() has a maximum at this value, and the infinitely-precise result at exactly Pi/2 is exactly 1, the approximate internal result must be slightly less than 1, and UPWARDS and TONEAREST should round it to 1, while DOWNWARDS should round it to 1 ulp less than 1. Hardware sin does this. glibc sin apparently does extra work to round in the wrong direction... - ... but FreeBSD software sin has an enormous error for DOWNWARDS on 1.570796 -- it rounds in the wrong direction by 0x11553 ulps :-(. About 1 Mulp (1 Mulp = 1 mega-ulp). > (glibc's fixed bugs here in the past. see > https://sourceware.org/bugzilla/show_bug.cgi?id=3976 for example.) This is interesting. Vincent L certainly knows what he is doing. As stated in the thread, the the library can do anything it wants in non-default rounding modes provided it documents this. It apparently uses the bad fix of changing the rounding mode to FE_TONEAREST. This is slow and guarantees an error of 1 ulp in most cases, so to avoid much larger errors in hopefully most cases. Long ago, I all of the most common functions for such errors, exhaustively for float functions, and didn't find any of more than an ulp or 2. I might have only done this on i386. i386 protects against such errors in float precision by evaluating intermediate results in higher precision. In non-FreeBSD, the default rounding precision is not double, so i386 should also protect agains such errors in double precision. Special cases like multiples of Pi/2 are too hard to find without special tests except in float precision. > the FreeBSD libm: > > sin(-182.212374) = -2.4759225463534304e-18 > sin(-182.212374) = 2.2575413760606011e-11 > sin(1.570796) = 1.0000000000000000e+00 > sin(1.570796) = 1.0000000002522276e+00 Verified above. > it looks like e_sqrtl.c, s_fmal.c, s_fma.c, and s_fmaf.c save/restore > the rounding mode but nothing else does. This makes these functions incredibly slow. > let me know if you'd like me to send trivial patches for any of the > affected functions. (do all the libm functions need this?) This cannot be fixed using trivial patches. Almost all transcendental functions are probably affected. Actually, it is possible to fix many long double functions without making them more than twice as slow in the usual case where the rounding mode is not the default. The ENTERI() macros does a conditional mode switch to change the rounding precision from 53 bits to 64 bits, so that long doubles work properly. It avoids the switch when it would be null. (This reminds me that modern x86 does the same in hardware for at least switching the rounding mode, so unconditional switching of the rounding mode might not be so slow when the switch is null. Also, fpset*() is pessimized in space and time to avoid traps, while feset*() never bothered with this. To switch the rounding mode, you should use fesetround(), but fesetprec() doesn't exist so ENTERI() has to use fpsetprec().) You could try adding unconditional rounding mode switches to ENTERI() to test the long double case. However, testing long doubles is harder. exp() is mentioned in the glibc thread. It has some details of interest in FreeBSD. On i386 only, it written in "optimized" asm that is broken as follows: X /* X * Extended precision is needed to reduce the maximum error from X * hundreds of ulps to less than 1 ulp. Switch to it if necessary. X * We may as well set the rounding mode to to-nearest and mask traps X * if we switch. X */ I thought that this rounding mode switch was a good idea when I wrote this 20+ years ago. Now I know better, and haven't used this version for 3 years. The software version never did this. I now use the software version on i386 as well as on amd64, after optimizing it a bit so that it is a little more accurate than this asm (hardware i387) version. The software version has been faster for many years on x86. The precision mode switch here is part of what makes it slow. glibc probably assumes that the default rounding precision is 64 bits, so that it doesn't need this mode switch. It turns out that 64 bits is just enough to reduce the error in double precision to below 1 ulp. expl() cannot be written like this in asm, since there is no mode switch to get more than long double precision. Not switching would give the same large errors in long double precision that this mode switch fixes for double precision. The rounding mode switch accidentally "fixes" any algorithm instability in i386 exp() in the same way as in glibc. But I think the extra precision gives enough stablity. Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri Mar 13 12:16:57 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5EFBF5BE for ; Fri, 13 Mar 2015 12:16:57 +0000 (UTC) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id E4B5DECC for ; Fri, 13 Mar 2015 12:16:56 +0000 (UTC) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 30DF2D6372E; Fri, 13 Mar 2015 23:16:42 +1100 (AEDT) Date: Fri, 13 Mar 2015 23:16:41 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans Subject: Re: libm functions sensitive to current rounding mode In-Reply-To: <20150313192625.B885@besplex.bde.org> Message-ID: <20150313224213.W2994@besplex.bde.org> References: <20150313192625.B885@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=Za4kaKlA c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=CCpqsmhAAAAA:8 a=04n_wW2SkaRwE1C0LGgA:9 a=CjuIK1q_8ugA:10 Cc: enh , freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Mar 2015 12:16:57 -0000 On Fri, 13 Mar 2015, Bruce Evans wrote: > On Thu, 12 Mar 2015, enh via freebsd-numerics wrote: > ... > - ... but FreeBSD software sin has an enormous error for DOWNWARDS on > 1.570796 -- it rounds in the wrong direction by 0x11553 ulps :-(. > About 1 Mulp (1 Mulp = 1 mega-ulp). > > >> (glibc's fixed bugs here in the past. see >> https://sourceware.org/bugzilla/show_bug.cgi?id=3976 for example.) > > This is interesting. Vincent L certainly knows what he is doing. As > stated in the thread, the the library can do anything it wants in > non-default rounding modes provided it documents this. It apparently > uses the bad fix of changing the rounding mode to FE_TONEAREST. This > is slow and guarantees an error of 1 ulp in most cases, so to avoid > much larger errors in hopefully most cases. > > Long ago, I all of the most common functions for such errors, > exhaustively for float functions, and didn't find any of more than an > ulp or 2. I might have only done this on i386. i386 protects against > such errors in float precision by evaluating intermediate results in > higher precision. In non-FreeBSD, the default rounding precision is > not double, so i386 should also protect agains such errors in double > precision. Special cases like multiples of Pi/2 are too hard to find > without special tests except in float precision. A quick rerun of the test finds problems in much the same areas as the glibc thread. I must have put off fixing this since most functions work OK and there are larger bugs to fix (like the multi-Pulp errors given by using the i387). Test of most float functions relative to double, on i386, testing 1/256 of possible values. X acos: max_er = 0x129b24c8 0.5814, avg_er = 0.170, #>=1:0.5 = 0:2724 X acos: max_er = 0x228da264 1.0797, avg_er = 0.299, #>=1:0.5 = 1334:7453322 X acos: max_er = 0x2266570c 1.0750, avg_er = 0.197, #>=1:0.5 = 1324:869716 X acos: max_er = 0x228da264 1.0797, avg_er = 0.299, #>=1:0.5 = 1334:7453322 The 4 cases are FP_RN, FP_RZ, RP_RP, FP_RM (ieeefp has better names). The errors are given in bits, then in ulps. With perfect rounding, max_er would be at most 0x0fffffff (0.5- ulps) for FP_RN and 0x1fffffff (1.0- ulps) for the other rounding modes. This is acheived mainly by sqrtf. Less than 0.5 ulps more than that is OK. A few functions (e.g., cos) try to get less than 0.01 or 0.1 ulps more, and some functions should be exact. X acosh: max_er = 0x2bf776b9 1.3740, avg_er = 0.063, #>=1:0.5 = 62:29183 X acosh: max_er = 0x517afdff 2.5462, avg_er = 0.128, #>=1:0.5 = 57310:2149989 X acosh: max_er = 0x619b49f6 3.0503, avg_er = 0.129, #>=1:0.5 = 60042:2153013 X acosh: max_er = 0x517afdff 2.5462, avg_er = 0.128, #>=1:0.5 = 57310:2149989 X asin: max_er = 0x1359931a 0.6047, avg_er = 0.012, #>=1:0.5 = 0:12924 X asin: max_er = 0x22938df5 1.0805, avg_er = 0.022, #>=1:0.5 = 3592:367188 X asin: max_er = 0x2459bbfe 1.1360, avg_er = 0.024, #>=1:0.5 = 6373:393217 X asin: max_er = 0x23dc2d74 1.1206, avg_er = 0.023, #>=1:0.5 = 6325:393217 X asinh: max_er = 0x2f6f84c3 1.4824, avg_er = 0.141, #>=1:0.5 = 2644:244920 X asinh: max_er = 0x5d1b55c0 2.9095, avg_er = 0.301, #>=1:0.5 = 464816:4941040 X asinh: max_er = 0x5ee3fd1a 2.9654, avg_er = 0.301, #>=1:0.5 = 464616:4938660 X asinh: max_er = 0x5d1b55c0 2.9095, avg_er = 0.301, #>=1:0.5 = 464816:4941040 X atan: max_er = 0x1089602a 0.5168, avg_er = 0.185, #>=1:0.5 = 0:2956 X atan: max_er = 0x208b8696 1.0170, avg_er = 0.325, #>=1:0.5 = 936:7876354 X atan: max_er = 0x208b8696 1.0171, avg_er = 0.275, #>=1:0.5 = 1439:4587521 X atan: max_er = 0x208b8696 1.0170, avg_er = 0.274, #>=1:0.5 = 1439:4587521 X atanh: max_er = 0x2e56025d 1.4480, avg_er = 0.016, #>=1:0.5 = 2584:180474 X atanh: max_er = 0x5e75b4c4 2.9518, avg_er = 0.045, #>=1:0.5 = 359898:667576 X atanh: max_er = 0x5e325a52 2.9437, avg_er = 0.032, #>=1:0.5 = 180535:437158 X atanh: max_er = 0x5e75b4c4 2.9518, avg_er = 0.031, #>=1:0.5 = 182422:434622 X cbrt: max_er = 0xffffec5 0.5000, avg_er = 0.250, #>=1:0.5 = 0:0 X cbrt: max_er = 0x1fffffff 0.9999, avg_er = 0.497, #>=1:0.5 = 1724:8326990 X cbrt: max_er = 0x1fffffff 1.0000, avg_er = 0.499, #>=1:0.5 = 1636:8355750 X cbrt: max_er = 0x1fffffff 0.9999, avg_er = 0.498, #>=1:0.5 = 1380:8355494 X ceil: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X ceil: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X ceil: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X ceil: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 The above errors are a bit larger than for the default FP_RN, but not too bad. X cos: max_er = 0x10074700 0.5009, avg_er = 0.138, #>=1:0.5 = 0:2512 X cos: max_er = 0x1bb65457418000 14529186.7267, avg_er = 43.479, #>=1:0.5 = 458712:4761870 X cos: max_er = 0x1bb65477410000 14529187.7267, avg_er = 43.508, #>=1:0.5 = 525970:4867436 X cos: max_er = 0x1bb65457418000 14529186.7267, avg_er = 43.486, #>=1:0.5 = 488650:4793876 The algorithm breaks down for cosf. This is with my i386 system not using the i387 for sin like -current still does (neither uses it for sinf). Otherwise these errors of a mere 14 Mulps would be masked by errors of 17 Gulps. That is 17 Gulps of error in 24-bit precision from the reference function. Multiply by 2**29 for the error in double precision. X cosh: max_er = 0x104af206 0.5091, avg_er = 0.019, #>=1:0.5 = 0:2180 X cosh: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 809.671, #>=1:0.5 = 8088592:8619926 X cosh: max_er = 0x1d9252a8f 14.7858, avg_er = 0.516, #>=1:0.5 = 6750688:8175486 X cosh: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 809.671, #>=1:0.5 = 8088592:8619926 The algorithm breaks down for cosh, presumably due to its breakdown for exp. This is with my i386 system not using the i387 for exp like -current still does (neither uses it for expf). The glibc thread says that the 32-bit exp worked but the 64-bit exp didn't work. This would be explained by algorithmic errors in the non-i387 version only, and glibc using the i387 in 32-bit mode only for the same bad/historical reasons as FreeBSD. X erf: max_er = 0x14faf158 0.6556, avg_er = 0.126, #>=1:0.5 = 0:109523 X erf: max_er = 0x4c3e1ab7 2.3825, avg_er = 0.260, #>=1:0.5 = 153763:4375088 X erf: max_er = 0x4041d2d3 2.0081, avg_er = 0.253, #>=1:0.5 = 67182:4227700 X erf: max_er = 0x4a70f434 2.3262, avg_er = 0.252, #>=1:0.5 = 78494:4227994 X exp: max_er = 0x1047a00a 0.5087, avg_er = 0.033, #>=1:0.5 = 0:2032 X exp: max_er = 0x3800000020000000 7516192769.0000, avg_er = 1343.574, #>=1:0.5 = 7259405:8747670 X exp: max_er = 0x39a138016 28.8149, avg_er = 1.162, #>=1:0.5 = 7256189:8776222 X exp: max_er = 0x3800000020000000 7516192769.0000, avg_er = 1343.574, #>=1:0.5 = 7259405:8747670 X exp2: max_er = 0x1006a936 0.5008, avg_er = 0.033, #>=1:0.5 = 0:581 X exp2: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1056.147, #>=1:0.5 = 6412147:8351665 X exp2: max_er = 0x20669100 1.0126, avg_er = 0.500, #>=1:0.5 = 6429089:8371868 X exp2: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1056.147, #>=1:0.5 = 6412147:8351665 X expm1: max_er = 0x105e998d 0.5115, avg_er = 0.031, #>=1:0.5 = 0:508 X expm1: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1342.318, #>=1:0.5 = 3985247:4976979 X expm1: max_er = 0x231f0351 1.0976, avg_er = 0.061, #>=1:0.5 = 1888:1014283 X expm1: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1342.317, #>=1:0.5 = 3986102:4957995 All the functions that are exp*f or use exp*f have problems, except exp*f in RP mode. coshf has lesser problems in RP mode. Its errors are then large but not enormous. Probably a small error in expf turns into a larger one after addition. (My coshf uses an expf kernel like coshl uses an expl kernel in -current.) X fabs: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X fabs: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X fabs: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X fabs: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X floor: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X floor: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X floor: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X floor: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X j0: max_er = 0x2f51e6f4f1c 6056.9511, avg_er = 0.390, #>=1:0.5 = 2401634:5079654 X j0: max_er = 0x109227b28a731a 8687933.5794, avg_er = 17.443, #>=1:0.5 = 6602082:9050730 X j0: max_er = 0x80108bbf4ba8d7e8 17188543994.3644, avg_er = 29.127, #>=1:0.5 = 5773002:7488366 X j0: max_er = 0x109227b28a731a 8687933.5794, avg_er = 17.492, #>=1:0.5 = 6521176:9046266 X j1: max_er = 0x1b08a6a7544 3460.3255, avg_er = 0.395, #>=1:0.5 = 2415186:5077432 X j1: max_er = 0x800eac41530e07cc 17187561994.5954, avg_er = 28.930, #>=1:0.5 = 4387856:7844188 X j1: max_er = 0x64e89d1202af4 3306574.5352, avg_er = 16.048, #>=1:0.5 = 6010996:7994846 X j1: max_er = 0x800eac41530e07ca 17187561994.5954, avg_er = 29.129, #>=1:0.5 = 5858116:8158408 X lgamma:max_er = 0x1d957c2a8000 60587.8802, avg_er = 0.232, #>=1:0.5 = 140016:899042 X lgamma:max_er = 0x65ee2cf0b75af4 53440871.5223, avg_er = 1682.939, #>=1:0.5 = 1398802:7250814 X lgamma:max_er = 0xd6b9b6e60000 439757.7156, avg_er = 0.476, #>=1:0.5 = 920256:6777199 X lgamma:max_er = 0x65ee2cf0b75af4 53440871.5223, avg_er = 1682.914, #>=1:0.5 = 1097301:6940713 Besel and gamma functions have large errors even in RN. These get worse in other rounding modes. X log: max_er = 0xfffffe6 0.5000, avg_er = 0.124, #>=1:0.5 = 0:0 X log: max_er = 0x1fffffdd 0.9999, avg_er = 0.249, #>=1:0.5 = 0:4178615 X log: max_er = 0x1fffffdd 1.0000, avg_er = 0.249, #>=1:0.5 = 0:4177376 X log: max_er = 0x1fffff83 0.9999, avg_er = 0.249, #>=1:0.5 = 0:4178462 X log10: max_er = 0xfffff6d 0.5000, avg_er = 0.125, #>=1:0.5 = 0:0 X log10: max_er = 0x1fffffff 0.9999, avg_er = 0.249, #>=1:0.5 = 6:4177256 X log10: max_er = 0x1fffffff 1.0000, avg_er = 0.250, #>=1:0.5 = 6:4177785 X log10: max_er = 0x1fffffff 0.9999, avg_er = 0.249, #>=1:0.5 = 6:4178059 X log1p: max_er = 0x10288a73 0.5049, avg_er = 0.088, #>=1:0.5 = 0:604 X log1p: max_er = 0x2028aaef 1.0049, avg_er = 0.174, #>=1:0.5 = 341:2883620 X log1p: max_er = 0x2025c234 1.0047, avg_er = 0.175, #>=1:0.5 = 251:2896319 X log1p: max_er = 0x2028aaef 1.0049, avg_er = 0.173, #>=1:0.5 = 300:2870804 X log2: max_er = 0x100128e0 0.5001, avg_er = 0.125, #>=1:0.5 = 0:17 X log2: max_er = 0x20002e79 1.0000, avg_er = 0.249, #>=1:0.5 = 5:4177477 X log2: max_er = 0x20005cf1 1.0001, avg_er = 0.250, #>=1:0.5 = 10:4187131 X log2: max_er = 0x20011f0c 1.0001, avg_er = 0.248, #>=1:0.5 = 1:4168439 log and log10 still use the i387, for the test and reference function, so are deceptively consistent/accurate. The others use software that calls the hardware log. Apparently the software has no algorithmic problems. X logb: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X logb: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X logb: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X logb: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X rint: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X rint: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X rint: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X rint: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X round: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X round: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X round: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X round: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X significand:max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X significand:max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X significand:max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X significand:max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X sin: max_er = 0x10073f96 0.5009, avg_er = 0.137, #>=1:0.5 = 0:2436 X sin: max_er = 0x117c97fcf3c000 9168063.9047, avg_er = 39.254, #>=1:0.5 = 457300:4795627 X sin: max_er = 0x117c981cf34000 9168064.9047, avg_er = 39.281, #>=1:0.5 = 524127:4853666 X sin: max_er = 0x117c97fcf3c000 9168063.9047, avg_er = 39.260, #>=1:0.5 = 486966:4806289 X sinh: max_er = 0x10431a04 0.5082, avg_er = 0.019, #>=1:0.5 = 0:1734 X sinh: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 809.674, #>=1:0.5 = 8089378:8598420 X sinh: max_er = 0x380000001fffffff 7516192769.0000, avg_er = 1428.864, #>=1:0.5 = 4108592:4646473 X sinh: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1428.893, #>=1:0.5 = 4123299:4654026 X sqrt: max_er = 0xffffc47 0.5000, avg_er = 0.124, #>=1:0.5 = 0:0 X sqrt: max_er = 0x1ffffcca 0.9999, avg_er = 0.247, #>=1:0.5 = 0:4147423 X sqrt: max_er = 0x1fffd9d7 1.0000, avg_er = 0.250, #>=1:0.5 = 0:4192033 X sqrt: max_er = 0x1ffffcca 0.9999, avg_er = 0.247, #>=1:0.5 = 0:4147423 X tan: max_er = 0x1997ff78 0.7998, avg_er = 0.150, #>=1:0.5 = 0:1188178 X tan: max_er = 0xe0f2b5b350a325 117937581.6035, avg_er = 1715.855, #>=1:0.5 = 1378705:5016746 X tan: max_er = 0xe0f2b5b35198ca 117937581.6038, avg_er = 1715.881, #>=1:0.5 = 1385644:5030977 X tan: max_er = 0xe0f2b5b35479ad 117937581.6040, avg_er = 1715.880, #>=1:0.5 = 1385644:5030977 X tanh: max_er = 0x11710b6d 0.5450, avg_er = 0.016, #>=1:0.5 = 0:5702 X tanh: max_er = 0x349ef98c7 26.3104, avg_er = 0.981, #>=1:0.5 = 14807030:16230146 X tanh: max_er = 0x367cb0c6c 27.2436, avg_er = 0.272, #>=1:0.5 = 3378729:4287978 X tanh: max_er = 0x369ef98c7 27.3104, avg_er = 0.516, #>=1:0.5 = 7436992:8380189 X trunc: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X trunc: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X trunc: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X trunc: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0 X y0: max_er = 0x830232be000 16769.0991, avg_er = 0.315, #>=1:0.5 = 1356271:4560040 X y0: max_er = 0x1a77ca8c125e1f 13876820.3772, avg_er = 10.160, #>=1:0.5 = 5678376:7876031 X y0: max_er = 0x800c3393437ab82d 17186266266.1088, avg_er = 1038.278, #>=1:0.5 = 4674471:7301111 X y0: max_er = 0x1a77cacc125e17 13876822.3772, avg_er = 9.955, #>=1:0.5 = 3116737:4399584 X y1: max_er = 0x859fa2773502b 4378577.2328, avg_er = 1.307, #>=1:0.5 = 1472099:4660542 X y1: max_er = 0xc45f308dc9c883 102955396.4308, avg_er = 1577.846, #>=1:0.5 = 4247855:7239448 X y1: max_er = 0xc45f308dc9c883 102955396.4309, avg_er = 1572.738, #>=1:0.5 = 4781943:7380419 X y1: max_er = 0x19ddad63a68667 13561195.1140, avg_er = 14.622, #>=1:0.5 = 2746030:3875201 The other trig, hyperbolic and Bessel functions are consistently broken. pow() is probably broken like exp(). My quick test didn't reach it. It is mentioned in the glibc thread. The glibc fixes for exp seem to be more subtle than clobbering the rounding mode. Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri Mar 13 13:44:00 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0446ABE4 for ; Fri, 13 Mar 2015 13:44:00 +0000 (UTC) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 603A9AD9 for ; Fri, 13 Mar 2015 13:43:59 +0000 (UTC) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 1972C7819C2; Sat, 14 Mar 2015 00:43:48 +1100 (AEDT) Date: Sat, 14 Mar 2015 00:43:47 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans Subject: Re: libm functions sensitive to current rounding mode In-Reply-To: <20150313224213.W2994@besplex.bde.org> Message-ID: <20150313234029.N6922@besplex.bde.org> References: <20150313192625.B885@besplex.bde.org> <20150313224213.W2994@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=Za4kaKlA c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=pijS6xPEOuVcagq0fuIA:9 a=gQ6FDVexZt2O46Ez:21 a=Q-miSVZZwoN2xv3S:21 a=CjuIK1q_8ugA:10 Cc: enh , freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Mar 2015 13:44:00 -0000 On Fri, 13 Mar 2015, Bruce Evans wrote: > On Fri, 13 Mar 2015, Bruce Evans wrote: > >> On Thu, 12 Mar 2015, enh via freebsd-numerics wrote: >> ... >> - ... but FreeBSD software sin has an enormous error for DOWNWARDS on >> 1.570796 -- it rounds in the wrong direction by 0x11553 ulps :-(. >> About 1 Mulp (1 Mulp = 1 mega-ulp). >> ... > A quick rerun of the test finds problems in much the same areas as the > glibc thread. I must have put off fixing this since most functions work > OK and there are larger bugs to fix (like the multi-Pulp errors given > by using the i387). > X exp: max_er = 0x1047a00a 0.5087, avg_er = 0.033, #>=1:0.5 = 0:2032 > X exp: max_er = 0x3800000020000000 7516192769.0000, avg_er = 1343.574, > #>=1:0.5 = 7259405:8747670 > X exp: max_er = 0x39a138016 28.8149, avg_er = 1.162, #>=1:0.5 = > 7256189:8776222 > X exp: max_er = 0x3800000020000000 7516192769.0000, avg_er = 1343.574, > #>=1:0.5 = 7259405:8747670 > X exp2: max_er = 0x1006a936 0.5008, avg_er = 0.033, #>=1:0.5 = 0:581 > X exp2: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1056.147, > #>=1:0.5 = 6412147:8351665 > X exp2: max_er = 0x20669100 1.0126, avg_er = 0.500, #>=1:0.5 = > 6429089:8371868 > X exp2: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1056.147, > #>=1:0.5 = 6412147:8351665 > X expm1: max_er = 0x105e998d 0.5115, avg_er = 0.031, #>=1:0.5 = 0:508 > X expm1: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1342.318, > #>=1:0.5 = 3985247:4976979 > X expm1: max_er = 0x231f0351 1.0976, avg_er = 0.061, #>=1:0.5 = 1888:1014283 > X expm1: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1342.317, > #>=1:0.5 = 3986102:4957995 > > All the functions that are exp*f or use exp*f have problems, except exp*f > in RP mode. coshf has lesser problems in RP mode. Its errors are then > large but not enormous. Probably a small error in expf turns into a larger > one after addition. (My coshf uses an expf kernel like coshl uses an > expl kernel in -current.) Bah, I just remember that exp and trig functions intentionally depend on round-to-nearest mode, since other modes aren't really supported and efficiency is gained by assuming this mode. The optimizations started in the range reduction for exp2*. I didn't like exp2* being so much faster than more important functions, and copied some of its methods to trig and plain exp. Poorly documented copies of the optimization replicated. My version now uses inline functions in math_private.h for this. There is still a problem choosing the algorithm. To work, calculations like double x; ... x + 0x1p52 - 0x1p52 need not only round-to-nearest, but also no extra precision. With extra precision, a different adjustment must be used. But if the caller can change the precision, this adjustment varies at runtime. On i386, it assumed that the precision is normally 53 bits, and ENTERI() does the necessary switch to 64-bit precision which matches the variation of the adjustment to 0x1p63. Here is my current centralized version: X /* X * The rnint() family rounds to the nearest integer for a restricted range X * range of args (up to about 2**MANT_DIG). We assume that the current ^^^^^^^^^^^^^^^^^^^^^^^^^^ X * rounding mode is FE_TONEAREST so that this can be done efficiently. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ X * Extra precision causes more problems in practice, and we only centralize X * this here to reduce those problems, and have not solved the efficiency X * problems. The exp2() family uses a more delicate version of this that X * requires extracting bits from the intermediate value, so it is not X * centralized here and should copy any solution of the efficiency problems. X */ X X static inline double X rnint(__double_t x) X { X /* X * This casts to double to kill any extra precision. This depends X * on the cast being applied to a double_t to avoid compiler bugs X * (this is a cleaner version of STRICT_ASSIGN()). This is X * inefficient if there actually is extra precision, but is hard X * to improve on. We use double_t in the API to minimise conversions X * for just calling here. Note that we cannot easily change the X * magic number to the one that works directly with double_t, since X * the rounding precision is variable at runtime on x86 so the X * magic number would need to be variable. Assuming that the X * rounding precision is always the default is too fragile. This X * and many other complications will move when the default is X * changed to FP_PE. X */ X return ((double)(x + 0x1.8p52) - 0x1.8p52); X } X X static inline float X rnintf(__float_t x) X { X /* X * As for rnint(), except we could just call that to handle the X * extra precision case, usually without losing efficiency. X */ X return ((float)(x + 0x1.8p23F) - 0x1.8p23F); Switching the rounding mode to FP_RN here fixed the large error in expf(). I don't actually use this, since the cast to float is slow. I actually use X } X X #ifdef LDBL_MANT_DIG X /* X * The complications for extra precision are smaller for rnintl() since it X * can safely assume that the rounding precision has been increased from X * its default to FP_PE on x86. We don't exploit that here to get small X * optimizations from limiting the rangle to double. We just need it for X * the magic number to work with long doubles. ld128 callers should use X * rnint() instead of this if possible. ld80 callers should prefer X * rnintl() since for amd64 this avoids swapping the register set, while X * for i386 it makes no difference (assuming FP_PE), and for other arches X * it makes little difference. X */ X static inline long double X rnintl(long double x) X { X return (x + __CONCAT(0x1.8p, LDBL_MANT_DIG) / 2 - X __CONCAT(0x1.8p, LDBL_MANT_DIG) / 2); X } X #endif /* LDBL_MANT_DIG */ X X /* X * irint() and irint64() give the same result as casting to their integer X * return type provided their arg is a floating point integer. They can X * sometimes be more efficient because no rounding is required. X */ X #if (defined(amd64) || defined(__i386__)) && defined(__GNUCLIKE_ASM) X #define irint(x) \ X (sizeof(x) == sizeof(float) && \ X sizeof(__float_t) == sizeof(long double) ? irintf(x) : \ X sizeof(x) == sizeof(double) && \ X sizeof(__double_t) == sizeof(long double) ? irintd(x) : \ X sizeof(x) == sizeof(long double) ? irintl(x) : (int)(x)) X #else X #define irint(x) ((int)(x)) X #endif X X #define i64rint(x) ((int64_t)(x)) /* only needed for ld128 so not opt. */ X X #if defined(__i386__) && defined(__GNUCLIKE_ASM) X static __inline int X irintf(float x) X { X int n; X X asm("fistl %0" : "=m" (n) : "t" (x)); X return (n); X } X X static __inline int X irintd(double x) X { X int n; X X asm("fistl %0" : "=m" (n) : "t" (x)); X return (n); X } X #endif X X #if (defined(__amd64__) || defined(__i386__)) && defined(__GNUCLIKE_ASM) X static __inline int X irintl(long double x) X { X int n; X X asm("fistl %0" : "=m" (n) : "t" (x)); X return (n); X } X #endif amd64 doesn't need irintf() or irintd() since it has SSE instructions that work better than fistl since they don't depend on the rounding mode and don't require using a different register set, and compilers use these automatically so the above optimizations are not useful. fistl uses the current rounding mode, but the rounding mode doesn't matter so much for irint*() because its arg is already an integer for its intended uses. The above is used for range reduction in trig functions too. See e_rem_pio2*.[ch]. Old versions of these used the following very slow code so they worked in all rounding modes: first, a diff to use the above: X Index: e_rem_pio2.c X =================================================================== X RCS file: /home/ncvs/src/lib/msun/src/e_rem_pio2.c,v X retrieving revision 1.25 X diff -u -2 -r1.25 e_rem_pio2.c X --- e_rem_pio2.c 17 Nov 2012 01:50:07 -0000 1.25 X +++ e_rem_pio2.c 11 Nov 2013 09:17:08 -0000 X @@ -128,12 +128,6 @@ X if(ix<0x413921fb) { /* |x| ~< 2^20*(pi/2), medium size */ X medium: X - /* Use a specialized rint() to get fn. Assume round-to-nearest. */ X - STRICT_ASSIGN(double,fn,x*invpio2+0x1.8p52); X - fn = fn-0x1.8p52; X -#ifdef HAVE_EFFICIENT_IRINT X + fn = rnint((double_t)x*invpio2); X n = irint(fn); X -#else X - n = (int32_t)fn; X -#endif X r = x-fn*pio2_1; X w = fn*pio2_1t; /* 1st round good to 85 bit */ (The STRICT_ASSIGN() here is very slow on i386. Using double_t does it it in essentially the same way. Note that rnint() returns double_t, as is best for better-written functions. The slow conversion occurs on assignment to 'double fn' here. fn and most other variables here should have type double_t to avoid conversions.) X Index: e_rem_pio2.c X =================================================================== X RCS file: /home/ncvs/src/lib/msun/src/e_rem_pio2.c,v X retrieving revision 1.1 X diff -u -2 -r1.1 e_rem_pio2.c X --- e_rem_pio2.c 19 Aug 1994 09:39:43 -0000 1.1 X +++ e_rem_pio2.c 11 Nov 2013 09:17:08 -0000 X @@ -98,15 +62,75 @@ X GET_HIGH_WORD(hx,x); /* high word of x */ X ix = hx&0x7fffffff; X +#if 0 /* Must be handled in caller. */ X if(ix<=0x3fe921fb) /* |x| ~<= pi/4 , no need for reduction */ X {y[0] = x; y[1] = 0; return 0;} X - if(ix<=0x413921fb) { /* |x| ~<= 2^19*(pi/2), medium size */ X - t = fabs(x); X - n = (int32_t) (t*invpio2+half); X - fn = (double)n; X - r = t-fn*pio2_1; X - w = fn*pio2_1t; /* 1st round good to 85 bit */ X ... X + fn = rnint((double_t)x*invpio2); X + n = irint(fn); X + r = x-fn*pio2_1; X + w = fn*pio2_1t; /* 1st round good to 85 bit */ X ... The old method is: - work with t = fabs(x) so that we know which way the rounding goes. Elswhere, working with |x| often requires another slow step to adjust things based on the sign of x. - add half. This is part of manual rounding to nearest - assign to 'int n'. The rounding of this is specified to be towards 0. This tends to be the slowest step. It may involve changing the register step. On i387 and early SSE, there is only an instruction that uses the current rounding mode, which is usually FP_RN and not known to the compiler, so the compiler must switch the rounding mode. Inefficiencies here cost at least 20 cycles (throughput) on modern x86. This is a lot when you are trying to make the whole function run in 50 cycles. The algorithm depends critically on round-to-nearest. Another algorithm is to extract the bits of x*invpio2 and adjust them in some way. The equivalent of rounding to nearest is to add a bias before right-shifting the bits to get n. Then assign n to fn. This tends to be slower because the FPU has to wait for the integer ALU to calculate n. The above produces fn as soon as possible. The integer ALU has to wait. This wait can be a bottleneck, but it is usually not as slow as making the FPU wait. exp has a chance of rounding results in the correct direction if it just does arg reduction in that direction, since it is monotic. At least for FP_RP and FP_RM. Not for FP_RZ -- rounding of the result towards zero requires rounding of the arg towards minus infinity. Trig functions have no similar chances, since they are not monotonic. Arg reduction almost always gives an arg that is too large or too small. Then the result may be too large or too small, respectively, depending on the slope of the function and the rounding mode. I don't plan to fix this. Just keep the error down to 1.5 ulps. Bruce