From owner-freebsd-standards@FreeBSD.ORG  Sun Nov  4 05:26:39 2007
Date: Sun, 4 Nov 2007 16:26:03 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl
Cc: freebsd-standards@freebsd.org
Subject: Re: long double broken on i386?
Message-ID: <20071104142220.G15933@delplex.bde.org>
In-Reply-To: <20071103213000.GA92030@troutmask.apl.washington.edu>
References: <20070928152227.GA39233@troutmask.apl.washington.edu>
 <20071001173736.U1985@besplex.bde.org>
 <20071002001154.GA3782@troutmask.apl.washington.edu>
 <20071002172317.GA95181@VARK.MIT.EDU>
 <20071002173237.GA12586@troutmask.apl.washington.edu>
 <20071003103519.X14175@delplex.bde.org>
 <20071010204249.GA7446@troutmask.apl.washington.edu>
 <20071103144151.U671@besplex.bde.org>
 <20071103213000.GA92030@troutmask.apl.washington.edu>

On Sat, 3 Nov 2007, Steve Kargl wrote:

> On Sat, Nov 03, 2007 at 03:53:19PM +1100, Bruce Evans wrote:
>> On Wed, 10 Oct 2007, Steve Kargl wrote:
>>> I've given up on i386.  All of my attempts to make my C implementations
>>> work on both amd64 and i386 have failed.  The code when run on i386
>>> results in ULP's that exceed 10^5.  Yes, that is 1E5.  Results for amd64
>>> are reported below.  As far as assembly, someone else will need to write
>>> those routines because I do not know assembly.
>>
>> Bad errors are expressed in GULPs (Giga-ULPs) :-).  Maybe you are hitting
>> the compiler bugs that break double-precision arithmetic on i386's when
>> the rounding precision is changed to long double.  The rounding precision
>> ...

> I did some limited testing of the output from k_rem_pio2.c
> against argument reduction using MPFR.  I found no long
> double values that gave results which did not agree.  I'm fairly
> certain that the GULPs are due to improper polynomial approximations
> for the 53-bit precision long doubles on i386.

Rounding errors in polynomial coefficients can't possibly have that much
effect.  An error of 1 53-bit ULP in a coefficient is only 2^11 63-bit
ULPs.  Since the first 2 coefficients of cos() and the first coefficient
of sin() are exact, the relative errors are further reduced by a factor
of x^4/4! for cos(x) and x^3/3! for sin(x).

Actual testing of messed-up Taylor coefficients using pari-gp gives the
following error bounds:

 17 ULPs      cos(x)    up to x^18/18! term with 1/18! rounded to 53 bits
 0.06 ULPs                                             ... 64 bits
 0.07 ULPs                                             ... not rounded
105 ULPs      sin(x)/x  up to x^18/19! term with 1/19! rounded to 53 bits
 0.06 ULPs                                             ... 64 bits
 0.003 ULPs                                            ... not rounded

All arithmetic was in ~1000-bit precision.  The interval was [-Pi/4, Pi/4].
The errors are much larger than I said they should be above, but still much
less than 1 GULP.  Not rounding for cos(x) really does give a larger max
error than rounding to 64 bits.
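As a cross-check that coefficient rounding stays in the tens-of-ULPs range
(nowhere near GULPs), here is a rough standalone sketch along the same
lines -- not the pari-gp setup; the degree-18 Taylor polynomial and the
sampling are only illustrative, and an x86 64-bit long double is assumed.
It evaluates the polynomial for cos(x) with the coefficients rounded to 53
bits and to 64 bits and reports the worst difference in 64-bit ULPs over
[-Pi/4, Pi/4]:

#include <float.h>
#include <math.h>
#include <stdio.h>

/* (2k)! for k = 0..9; every value is exact in both double and long double. */
static const long double fact[] = { 1, 2, 24, 720, 40320, 3628800,
	479001600, 87178291200.0L, 20922789888000.0L, 6402373705728000.0L };

/*
 * Degree-18 Taylor polynomial for cos(x), with the coefficients
 * (-1)^k/(2k)! rounded to 53 bits (prec53 != 0) or to 64 bits.
 */
static long double
taylor_cos(long double x, int prec53)
{
	long double c, s = 0, z = x * x;
	int k;

	for (k = 9; k >= 0; k--) {
		c = prec53 ? 1.0 / (double)fact[k] : 1.0L / fact[k];
		s = (k & 1 ? -c : c) + z * s;
	}
	return (s);
}

int
main(void)
{
	const long double pi4 = 0.78539816339744830962L;
	long double d, maxd = 0, x;
	int i;

	for (i = 0; i <= 200000; i++) {
		x = -pi4 + 2 * pi4 * i / 200000;
		d = fabsl(taylor_cos(x, 1) - taylor_cos(x, 0));
		if (d > maxd)
			maxd = d;
	}
	/* Results lie in [0.5, 1], so 1 ULP there is 2^-LDBL_MANT_DIG. */
	printf("max effect of 53-bit coefficients: %.2Lf ULPs\n",
	    maxd / scalbnl(1.0L, -LDBL_MANT_DIG));
	return (0);
}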
My Remez polynomial constructor rounds the coefficients during its search
for the best coefficients.  It is possible to compensate to some extent
for unfortunate rounding of early coefficients by tweaking higher
coefficients, but this is not very important and currently causes
stability problems in the search.  With 1 less term, errors in the Taylor
approximation dominate errors in (53-bit) coefficients, and with 1 more
term, errors in the 64-bit coefficients dominate errors in the Taylor
approximation.

>>>>> PS: There is also the minor possibility of giving bde either
>>>>> a heartache or death via laughter when he finally sees what I
>>>>> have done. :)
>>>>
>>>> I would be unhappy with any versions that aren't either:
>>>> - as identical as possible in algorithm and style to the double precision
>>>>   versions, in the same way that the float versions are as identical as
>>>>   possible, or
>>>> - much better in algorithm and style and efficiency and accuracy than the
>>>>   double precision versions.  I don't want to maintain different versions.
>>>
>>> I'll go with your second option; see comments below and attached
>>> code.  BTW, if I add
>>
>> So I'm unhappy that the first point hasn't been addressed at all.  This
>> gives quality < the current float and double precision versions.
>
> You're moving the goal.  You gave me an "either/or" option.  I took
> the "or", so the first point is irrelevant.  Well, your observation
> on quality is subjective because I'd claim my (almost) KNF commented
> code is much better than the fdlibm code.

The second option is "MUCH better in ALL of algorithm, style, efficiency
and accuracy".  That is impossible.  Change the emphasis to MUCH better in
efficiency OR accuracy OR (simplicity without loss of the others).  It is
difficult to be both more efficient and more accurate, and you will need a
better algorithm to be either more efficient or more accurate.

MUCH better is not very subjective for efficiency or accuracy.  It means
not rewriting everything to gain only 10% efficiency.  For accuracy, it
means:
- all errors < 2-3 ULPs in cases where errors were sometimes > hundreds/
  thousands/millions/billions of ULPs.  Note that this is still not
  implemented for the i387 hardware trig functions -- the hardware gives
  errors of many GULPs near multiples of Pi/2, and the software doesn't do
  anything to avoid the bad cases, since avoiding the bad cases without
  losing efficiency is not easy.
- all errors < 1 ULP in cases where errors were sometimes > 1 ULP
- almost perfect rounding in cases where errors were always < 1 ULP
  (the max error will be about 0.51 ULPs instead of up to 0.99 ULPs)
- perfect rounding in cases where there was almost perfect rounding
  (the max error will be 0.499... ULPs instead of 0.51 ULPs).

>> The argument reduction seems to begin correctly, but then the extra
>> precision which is carefully provided by k_rem_pio2.c is thrown away,
>> unlike in the float and double versions which pass it to the __kernel
>> functions.  I would expect a loss of up to a full ULP from this, giving
>> a maximum error of 1.5 ULPs (0.5 from the __kernel and 1 from the arg
>> reduction).
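For comparison, the double version's dispatch keeps both words of the
reduced argument.  The following is only a structural sketch by analogy
with fdlibm's s_sin.c -- the long double kernel signatures, the reduction
function and the Pi/4 cutoff are placeholders, not existing code:

#include <math.h>

/* Prototypes only: these stand in for whatever k_sinl.c/k_cosl.c and the
 * long double reduction end up providing. */
long double __kernel_sinl(long double, long double, int);
long double __kernel_cosl(long double, long double);
int __rem_pio2l(long double, long double *);

long double
sinl(long double x)
{
	long double y[2];
	int n;

	if (fabsl(x) < 0.785398163397448309616L)	/* |x| < Pi/4 */
		return (__kernel_sinl(x, 0, 0));
	if (isnan(x) || isinf(x))
		return (x - x);				/* NaN */

	n = __rem_pio2l(x, y);		/* x ~= n*Pi/2 + y[0] + y[1] */
	switch (n & 3) {
	case 0:
		return (__kernel_sinl(y[0], y[1], 1));
	case 1:
		return (__kernel_cosl(y[0], y[1]));
	case 2:
		return (-__kernel_sinl(y[0], y[1], 1));
	default:
		return (-__kernel_cosl(y[0], y[1]));
	}
}

The point is only the plumbing: the low word y[1] goes all the way into
the kernels instead of being folded into y[0] and rounded away.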
> I had __kernel_cosl and __kernel_sinl that retained both a high
> and low portion of the reduced arguments, but in the range tested
> I did not find any advantage to doing the extra precision arithmetic.
> These may have been discarded during my i386 GULP experience, so
> I revisited the issue on amd64.

Test near multiples of Pi/2.

>>> My timings on the basic routines using gprof are:
>>>
>>>   %   cumulative   self              self     total
>>>  time   seconds   seconds    calls  ms/call  ms/call  name
>>>   0.6    804.42      5.67 40000000     0.00     0.00  sinl [18]
>>>   0.1    934.41      0.74 19998751     0.00     0.00  __kernel_sinl [95]
>>>   0.1    935.82      0.66 20001249     0.00     0.00  __kernel_cosl [96]
>>>
>>> where the system contains a dual-cpu opteron tyan motherboard running
>>> at 2.2 GHz.
>>
>> That seems to be about 175 nS/call.  Not too bad.  (My current times on
>> a 2.2 GHz A64*1 in i386 mode on small args ([-2Pi,2Pi]) are 70 nS for
>> the C double version, 43 nS for the asm double version, and 15 nS for
>> the C float version).
>
> Well, I've been more concerned about correctness over speed.  I'd
> need to read the gprof output again, but I recall that argument
> reduction was the major time consumer.

Yes, even for small and medium quadrant numbers.  k_rem_pio2.c optimizes
for these but is still slow.  My float trig functions optimize better for
very small quadrant numbers, but even with double precision the
corresponding optimizations don't work so well since they take much more
code than the float case.

>> % Index: msun/src/k_cosl.c
>> % ===================================================================
>> % RCS file: msun/src/k_cosl.c
>> % diff -N msun/src/k_cosl.c
>> % --- /dev/null	1 Jan 1970 00:00:00 -0000
>> % +++ msun/src/k_cosl.c	9 Oct 2007 21:52:32 -0000
>> % @@ -0,0 +1,62 @@
>> % +/*
>> % + * Compute cos(x) on the interval [-1:1] with a polynomial approximation.
>>
>> Interval is larger than necessary.  This might result in unnecessary
>> coefficients.  10 of them seem to be too many for 64 bits of mantissa
>> but not enough for 113 bits (sparc64).
>
> Yes, I know.  My original Chebyshev interpolation code assumed a function

I found that 10 is right with Taylor coefficients.  Chebyshev and Remez
tend to need 1 fewer.

>> % Index: msun/src/k_tanl.c
>> % ===================================================================
>> % RCS file: msun/src/k_tanl.c
>> % diff -N msun/src/k_tanl.c
>> % ...
>> % +#define C33	(long double) 0x1.8f939ed1883f595p-32L
>>
>> This has far too many terms for a 64-bit mantissa, but not enough for a
>> 113-bit mantissa.  IIRC, the former requires about 25 terms.
>
> This is partly due to the range being larger than required.

Ah.  tan() will be adversely affected by the larger interval much more
than cos() and sin().  I found that terms up to about the x^35 one are
needed for a Remez approximation in 64 bits, and up to x^61 (?) in 113
bits.

>> Much more extra care is needed for an accuracy of < 1 ULP for tan*() than
>> for sin*() and cos*().
>
> Well, I did code both steps 3 and 4 from k_tan.c, but I saw no
> statistical improvement in my testing.

The problems are mainly on corner cases.

>> % +long double
>> % +cosl(long double x)
>> % +{
>> % +	union IEEEl2bits z;
>>
>> I want to avoid using this union...
>>
>> % +	z.e = x;
>> % +	z.bits.sign = 0;
>>
>> ... since, while code like this is easy to write, it hard-codes the
>> assumption that things are laid out in bit-fields.  Access macros
>> are more flexible and don't require so much compiler intelligence
>> to optimize away the memory accesses that are logically required for
>> bit-fields.  gcc does a mediocre job with this optimization,
>> especially for 80-bit long doubles on amd64 and i386 since the bits
>> don't fit evenly in registers.
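A temporary set of accessors can simply wrap the union.  A standalone
sketch of the shape such macros could take -- the GET_LDBL_EXPSIGN()/
SET_LDBL_EXPSIGN() names are not existing math_private.h macros, and the
union here is just the little-endian x86 80-bit layout written out so the
example compiles on its own (the real, per-arch definition lives in
_fpmath.h):

#include <stdio.h>

union IEEEl2bits {
	long double e;
	struct {
		unsigned int manl : 32;
		unsigned int manh : 32;
		unsigned int exp  : 15;
		unsigned int sign : 1;
	} bits;
};

/* Hypothetical accessors in the spirit of GET_HIGH_WORD()/SET_HIGH_WORD(). */
#define	GET_LDBL_EXPSIGN(es, x) do {				\
	union IEEEl2bits u;					\
	u.e = (x);						\
	(es) = (u.bits.sign << 15) | u.bits.exp;		\
} while (0)

#define	SET_LDBL_EXPSIGN(x, es) do {				\
	union IEEEl2bits u;					\
	u.e = (x);						\
	u.bits.exp = (es) & 0x7fff;				\
	u.bits.sign = ((es) >> 15) & 1;				\
	(x) = u.e;						\
} while (0)

int
main(void)
{
	long double x = -2.5L;
	int es;

	GET_LDBL_EXPSIGN(es, x);
	SET_LDBL_EXPSIGN(x, es & 0x7fff);	/* clear the sign: x = |x| */
	printf("%Lg\n", x);			/* prints 2.5 */
	return (0);
}

The point of hiding the bit-fields is that the implementation behind the
macros can later switch to raw word accesses without touching any caller.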
> math_private.h doesn't have GET_* or SET_* macros for use with
> 64-bit long double (not to mention 128-bit long double), and
> the macros in math_private.h use similar unions.  AFAICT, all
> supported archs have an _fpmath.h header.

Adding efficient versions of them is not easy, but temporary inefficient
versions can easily use the bit-fields in union IEEEl2bits instead of
direct accesses to these bit-fields.  I have local versions for a couple
of the macros only.  I needed to add a 64-bit mantissa field in _fpmath.h
for efficiency on amd64.  Unfortunately, the mantissa field must be split
since it has more than 64 bits on some machines, so code and/or macros
have to deal with manh and manl or have separate macro args for the split.

Another problem with long doubles is that unsupported formats may be
passed.  These should be treated as NaNs or Infs but require extra care
for classifying.

>> I see parametrizing the bit-field sizes and other magic numbers (not
>> quoted) for various formats of long doubles as the main difficulty
>> here (trig functions are easy except for this and k_rem_pio2.c).
>
> Which explains why GET_* and SET_* don't exist.  Although these
> might be present in bdeBSD. :)

GET_* and SET_* are in glibc, in the same style as the float and double
versions.  glibc handles parametrization mainly by having zillions of
different versions.

>> Note that the fdlibm __kernel code for trig functions is generic except
>> for:
>>
>> - my optimizations in the float case
>> - the magic number for the threshold near Pi/4 for tan()
>> - the number and values of terms in the polynomial, and the scheduling
>>   (best scheduling depends on the number of terms, and we don't have this
>>   automated).
>>
>> Thus it is easier to clone the correct code than to rewrite it incorrectly.
>
> It may be easier to clone fdlibm, but I'm not sure I'd call it correct
> code due to
> - magic numbers
> - questionable source formatting (it's some of the worst code I've read)
> - lack of comments

- magic numbers are needed for efficiency.
- fdlibm has its own style, which is not too bad when you get used to it.
  For better/worse style numeric code, look at pari or djb*.  I haven't
  looked so much at these, so they look worse :-).
- fdlibm has more and better comments than most code.
Bruce

From owner-freebsd-standards@FreeBSD.ORG  Mon Nov  5 11:07:06 2007
Date: Mon, 5 Nov 2007 11:07:06 GMT
From: FreeBSD bugmaster
To: freebsd-standards@FreeBSD.org
Subject: Current problem reports assigned to freebsd-standards@FreeBSD.org
Message-Id: <200711051107.lA5B76bB026451@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker       Resp.      Description
--------------------------------------------------------------------------------
o bin/25542     standards  /bin/sh: null char in quoted string
o kern/46239    standards  posix semaphore implementation errors
o stand/54410   standards  one-true-awk not POSIX compliant (no extended REs)
o stand/82654   standards  C99 long double math functions are missing
o stand/94729   standards  [libc] fcntl() throws undocumented ENOTTY

5 problems total.

Non-critical problems

S Tracker       Resp.      Description
--------------------------------------------------------------------------------
o bin/21519     standards  sys/dir.h should be deprecated some more
o bin/24390     standards  Replacing old dir-symlinks when using /bin/ln
s stand/24590   standards  timezone function not compatible witn Single Unix Spec
s kern/28260    standards  UIO_MAXIOV needs to be made public
s stand/36076   standards  Implementation of POSIX fuser command
o stand/39256   standards  snprintf/vsnprintf aren't POSIX-conformant for strings
p stand/41576   standards  POSIX compliance of ln(1)
o stand/44425   standards  getcwd() succeeds even if current dir has perm 000.
o stand/46119   standards  Priority problems for SCHED_OTHER using pthreads
o stand/54833   standards  [pcvt] more pcvt deficits
o stand/54839   standards  [pcvt] pcvt deficits
p stand/55112   standards  glob.h, glob_t's gl_pathc should be "size_t", not "int
o stand/56476   standards  cd9660 unicode support simple hack
o stand/58676   standards  grantpt(3) alters storage used by ptsname(3)
s stand/62858   standards  malloc(0) not C99 compliant
s kern/64875    standards  [libc] [patch] [feature request] add a system call: fd
o stand/66357   standards  make POSIX conformance problem ('sh -e' & '+' command-
o stand/66531   standards  _gettemp uses a far smaller set of filenames than docu
o stand/70813   standards  [PATCH] ls(1) not Posix compliant
o stand/72006   standards  floating point formating in non-C locales
o stand/79056   standards  regex(3) regression tests
a stand/80293   standards  sysconf() does not support well-defined unistd values
o stand/81287   standards  [PATCH]: fingerd(8) might send a line not ending in CR
o stand/83845   standards  [libm] [patch] add log2() and log2f() support for libm
o stand/85080   standards  output of long double subnormals (with printf) is wron
o stand/92360   standards  [headers] [patch] Missing TAB3 in kernel headers
o stand/92362   standards  [headers] [patch] Missing SIGPOLL in kernel headers
o kern/93705    standards  [headers] [patch] ENODATA and EGREGIOUS (for glibc com
o stand/96016   standards  [headers] clock_getres et al should be in
o stand/96236   standards  [PATCH] [POSIX] sed.1 incorrectly describes a function
p stand/99517   standards  Missing SIGRTMIN and SIGRTMAX signals
o stand/99960   standards  [Patch] make(1): Add -p flag
o stand/100017  standards  [Patch] Add fuser(1) functionality to fstat(1)
o stand/104743  standards  [headers] [patch] Wrong values for _POSIX_ minimal lim
o stand/104841  standards  [libm] [patch] C99 long double square root.
o stand/107561  standards  [patch] Missing SUS function tcgetsid()
o kern/114578   standards  [libc] wide character printing using swprintf(dst, n,
o stand/114633  standards  /etc/rc.subr: line 511: omits a quotation mark: "force
o stand/116081  standards  make does not work with the directive sinclude
o stand/116221  standards  SUS issue -- FreeBSD has not flag WNOWAIT for wait*()
o stand/116826  standards  [PATCH] sh support for POSIX character classes

41 problems total.
From owner-freebsd-standards@FreeBSD.ORG  Mon Nov  5 12:35:23 2007
Date: Mon, 5 Nov 2007 23:35:05 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl
Cc: freebsd-standards@freebsd.org
Subject: Re: [PATCH] hypotl, cabsl, and code removal in cabs
Message-ID: <20071105220251.C19770@delplex.bde.org>
In-Reply-To: <20071103195952.GA91682@troutmask.apl.washington.edu>
References: <20071012180959.GA36345@troutmask.apl.washington.edu>
 <20071103155721.K671@besplex.bde.org>
 <20071103195952.GA91682@troutmask.apl.washington.edu>

On Sat, 3 Nov 2007, Steve Kargl wrote:

> On Sat, Nov 03, 2007 at 04:15:27PM +1100, Bruce Evans wrote:
>>
>> On Fri, 12 Oct 2007, Steve Kargl wrote:
>>
>>> The attached patch implements hypotl(), cabsl(), and removes
>>> z_abs() from w_cabs.c.  z_abs() is undocumented namespace
>>> pollution in libm, and no header file under /usr/include
>>> provides a prototype.  The documentation has been updated,
>>> and in particular an incorrect paragraph in cabs.3 has been
>>> removed.
>>
>> I fear that that paragraph is correct again, since the long double
>> version only seems to take enough care with the overflow cases.

I verified that the > 1.2 ULP error is back using the same algorithm for
a float-precision version.  Bugs were also easier to find in lower
precision.

> Ah, no.  Underflow is prevented by the if() statement.
>
>	a.e *= a.e;
>	m -= n;
>	if (m > - PREC) {
>		b.bits.exp += m;
>		a.e += b.e * b.e;
>	}
>	a.e = sqrtl(a.e);
>	a.bits.exp += n;
>	return (a.e);
>
> If m < -PREC, then |b|/|a| < 2**(-32), so it skips the "a.e += b.e * b.e;"
> "a.e *= a.e" can't underflow because a.e is in [0.5,1), and
> "a.e = sqrtl(a.e);" can't underflow because, here, a.e is in [0.25,2).

The problem has nothing to do with underflow (I don't even see underflow
as a problem; avoiding underflow is just an optimization).  The problem is
to limit roundoff error in intermediate calculations so that the final
error is < 1 ULP.  More below.

BTW, PREC = 32 seems a little low.  I would have expected PREC = 33 to be
needed to give 2 guard bits.  However, 32 for 64-bit long double precision
corresponds to 12 for 24-bit float precision, but even 6 seems to work for
float precision.
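The float-precision check mentioned above was nothing special.  A rough
standalone sketch of that kind of brute-force test (my own scaffolding,
not the actual test program: random args in [1, 2), a double-precision
hypot() as the reference, errors reported in float ULPs):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	double err, maxerr = 0, ref;
	float x, y;
	volatile float xx, yy, s;	/* volatile: keep the evaluation in
					 * true float precision, even on i387 */
	int i;

	srandom(1);
	for (i = 0; i < 10000000; i++) {
		/* args in [1, 2): no over/underflow, only roundoff matters */
		x = 1.0F + (float)random() / 0x7fffffff;
		y = 1.0F + (float)random() / 0x7fffffff;
		xx = x * x;
		yy = y * y;
		s = xx + yy;
		ref = hypot((double)x, (double)y);
		/* error in ULPs of the reference's binade */
		err = fabs((double)sqrtf(s) - ref) /
		    ldexp(1.0, ilogb(ref) - 23);
		if (err > maxerr)
			maxerr = err;
	}
	printf("naive sqrtf(x*x + y*y): max error %.3f ULPs\n", maxerr);
	return (0);
}

With strict float evaluation this turns up errors above 1 ULP and
approaching the 1.2 figure discussed below; dropping the volatiles on an
i387, so the expression is evaluated in extended precision, makes the
naive expression look nearly correctly rounded.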
>> % Index: man/hypot.3
>> % ...
>> % @@ -82,11 +90,6 @@ Consequently
>> %  exactly;
>> %  in general, hypot and cabs return an integer whenever an
>> %  integer might be expected.
>> % -.Pp
>> % -The same cannot be said for the shorter and faster version of hypot
>> % -and cabs that is provided in the comments in cabs.c; its error can
>> % -exceed 1.2
>> % -.Em ulps .
>
> This paragraph is useless.  There is no cabs.c in the source tree,
> and in particular, there is no cabs.c in an installed system.  Surely,
> this paragraph isn't referring to src/contrib/libf2c/libF77/cabs.c.
> It has one comment with a total of one word.

cabs.c is only in cvs history (libm/ieee/cabs.c).  It is instructive since
the ifdefed-out version (not exactly the commented-out version) gets the
1.2 ULP error in much the same way as your version, while the primary
version (by Ng) has to work hard to limit the error to 0.959 ULPs.  The
version in fdlibm works slightly harder to reduce the error a bit more.

Here is the main part of the ifdefed-out version:

%	if(finite(x))
%	    if(finite(y))
%	    {
%		x=copysign(x,one);
%		y=copysign(y,one);
%		if(y > x)
%			{ temp=x; x=y; y=temp; }
%		if(x == zero) return(zero);
%		if(y == zero) return(x);
%		exp= logb(x);
%		x=scalb(x,-exp);
%		if(exp-(int)logb(y) > ibig )
%			/* raise inexact flag and return |x| */
%			{ one+small; return(scalb(x,exp)); }
%		else y=scalb(y,-exp);
%		return(scalb(sqrt(x*x+y*y),exp));
%	    }

It does the necessary range reduction, which takes the bulk of the code,
but then just returns sqrt(x*x+y*y) (scaled).  This expression gives a max
error of 1.2+ ULPs for almost all implementations of sqrt() and FP
arithmetic.

>> It's silly that on i386's, naive double precision code would work if
>> the default precision were extended, but the double precision code
>> isn't naive; OTOH, long double precision code can never depend on extra
>> precision, but is naive.
>
> I don't understand to what the above comment is directed.

With extra precision in hardware, the naive expression sqrt(x*x+y*y) has
an error of < 1 ULP, but you propose to use the naive expression in
precisely the case where there is no extra precision in hardware.

>> % Index: src/e_hypotl.c
>> % ...
>> % +	if (n >= m) {
>> % +		a.e *= a.e;
>> % +		m -= n;
>> % +		if (m > - PREC) {
>> % +			b.bits.exp += m;
>> % +			a.e += b.e * b.e;
>> % +		}
>> % +		a.e = sqrtl(a.e);
>> % +		a.bits.exp += n;
>> % +		return (a.e);
>>
>> The usual case reduces to just sqrtl(x*x + y*y).
>
> See my comment above.  The usual case isn't the important case.
> It's the unusual case that is of concern.  The above code prevents
> the unusual, yet provides acceptable performance in the usual case.

Well, even the ifdefed-out version handled the unusual cases.

>> fdlibm is more careful in this case (but similar in others):
>>
>> %  * So, compute sqrt(x*x+y*y) with some care as
>> %  * follows to get the error below 1 ulp:
>> %  *
>> %  * Assume x>y>0;
>> %  * (if possible, set rounding to round-to-nearest)
>> %  * 1. if x > 2y  use
>> %  *	x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
>> %  * where x1 = x with lower 32 bits cleared, x2 = x-x1; else
>> %  * 2. if x <= 2y use
>> %  *	t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
>> %  * where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
>> %  * y1= y with lower 32 bits chopped, y2 = y-y1.
>>
>> I don't understand the details of this, and couldn't find any cases
>> where this makes a difference.  Testing of 2-arg functions is hard
>> since exhaustive testing is impossible even for floats, and the
>> cases where this matters for hypot() seem to be sparse.
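Case 1 is easier to see in isolation.  A small standalone demo of the
splitting trick in double precision -- the value of x is arbitrary, and
the long double product only serves as a reference, so it wants amd64 (or
an i386 with the FPU switched to extended precision):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	double x = 1.2345678901234567, x1, x2, hi, lo;
	volatile double p;
	long double ref;
	uint64_t bits;

	/* x1 = x with the lower 32 bits cleared; x2 = x - x1 is exact,
	 * and x*x == x1*x1 + x2*(x + x1) holds as a real identity. */
	memcpy(&bits, &x, sizeof(bits));
	bits &= 0xffffffff00000000ULL;
	memcpy(&x1, &bits, sizeof(x1));
	x2 = x - x1;

	hi = x1 * x1;		/* exact: x1 has at most 21 significant bits */
	lo = x2 * (x + x1);	/* small, so its rounding error is tiny */

	p = x * x;			/* naive product: one full rounding */
	ref = (long double)x * x;	/* ~64-bit reference on x86 */

	printf("error of x*x     : %Lg\n", fabsl(p - ref));
	printf("error of hi + lo : %Lg\n", fabsl((long double)hi + lo - ref));
	return (0);
}

Because hi is exact and lo is roughly 2^-20 of it, the pair carries x*x to
well beyond double precision before the final additions, which is what
lets the sum x*x + y*y stay under the sqrt(2)/2 ULP budget.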
> The above seems more complicated than needed when one considers
> that x and y can be easily split into |x|=a*2**n and |y|=b*2**m
> where a,b are in [0.5,1).  One then has the comment that is
> found in the code:

Now I understand the details.  I left out the most important part of the
comment:

%  * Method :
%  *	If (assume round-to-nearest) z=x*x+y*y
%  *	has error less than sqrt(2)/2 ulp, than
%  *	sqrt(z) has error less than 1 ulp (exercise).

Do this exercise.  Hints:
o reduce to x in [1, 4) and sqrt(x) in [1, 2)
o all estimates can be made in infinite precision
o the relative and absolute sizes of 1 ULP depend on x and sqrt(x)
o the worst case is x slightly larger than 2

If we compute x*x+y*y naively, then we should expect a max error of 1.5
ULPs (0.5 for each operation).  The actual max error seems to be 1.0 ULPs
(I don't understand why yet).  Even 1 ULP is too much, since it is larger
than the value of sqrt(2)/2 given by the exercise.  The final error is:

	0.5 + N / sqrt(2)

where 0.5 ULPs of error result from rounding of sqrt(x*x+y*y) and N ULPs
of error in x*x+y*y contribute less than N ULPs of error to the final
result (according to the scale factor of 1/sqrt(2), provided N is not very
large -- note that linear scaling here and in the exercise gives an upper
bound for the error, essentially via the linear term in a Taylor
approximation -- we can simply drop the higher terms without breaking the
estimate since log() is convex).  N = 1 gives the magic number 1.2:
0.5 + 1/sqrt(2) ~= 0.5 + 0.7071... = 1.2071...

I observed an error of 1.202... in practice for float precision.  The
cases with a large error are sparse, so it would be hard to find them by
testing in long double precision.  For a final error of > 1 ULP, x and y
must be such that:
- x*x+y*y has an error of > sqrt(2)/2 ULPs
- x*x+y*y must be fairly near the problem value of 2+ (else the relative
  size of an ULP shrinks in the domain and/or range, so sqrt() compresses
  errors more and there is no problem)
- x*x+y*y must be very near a half-way case for sqrt().

This analysis also explains why missing guard bits for x*x+y*y don't seem
to be a problem.  We only ignore y*y (or x*x) if this term is <=
2**-(2*PREC) relative to the other term.  Then we may have an error of
more than 0.5 ULPs for the addition, but we only have a tiny error for the
ignored term, so the combined error in x*x+y*y is < 1 ULP, which is no
worse than other worst cases.

I found only 1 serious bug in your hypotl(): adding n or m back to the
exponent gives garbage if the resulting exponent doesn't fit.  Fix for the
float precision case:

%	if (n >= m) {
%		a.e *= a.e;
%		m -= n;
%		if (m > - PREC) {
%			b.bits.exp += m;
%			a.e += b.e * b.e;
%		}
%		a.e = sqrtf(a.e);
%		if (a.bits.exp + n <= 0 || a.bits.exp + n >= 0xff) {
%			a.bits.exp += n - n / 2;
%			SET_FLOAT_WORD(t, 0x3f800000 + ((n / 2) << 23));
%			a.e *= t;
%		} else
%			a.bits.exp += n;
%		return (a.e);
%	} else {
%		b.e *= b.e;
%		n -= m;
%		if (n > - PREC) {
%			a.bits.exp += n;
%			b.e += a.e * a.e;
%		}
%		b.e = sqrtf(b.e);
%		if (b.bits.exp + m <= 0 || b.bits.exp + m >= 0xff) {
%			b.bits.exp += m - m / 2;
%			SET_FLOAT_WORD(t, 0x3f800000 + ((m / 2) << 23));
%			b.e *= t;
%		} else
%			b.bits.exp += m;
%		return (b.e);
%	}
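As an aside, the SET_FLOAT_WORD() rescaling above leans on exponent-field
arithmetic on the bits of 1.0F producing exact powers of 2.  A throwaway
check of that identity (memcpy() stands in for the math_private.h macro so
it compiles on its own):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build 2.0F**k by bumping the exponent field of 1.0F; only valid while
 * the resulting biased exponent stays inside (0, 0xff). */
static float
exp2i_float(int k)
{
	uint32_t w = 0x3f800000 + ((uint32_t)k << 23);
	float t;

	memcpy(&t, &w, sizeof(t));
	return (t);
}

int
main(void)
{
	int k, bad = 0;

	for (k = -126; k <= 127; k++)
		if (exp2i_float(k) != ldexpf(1.0F, k))
			bad++;
	printf("mismatches for k in [-126, 127]: %d\n", bad);	/* expect 0 */
	return (0);
}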
Simpler and slower code can use a multiplier of powf(2.0F, n), etc.
fdlibm hypot() and hypotf() use dynamically constructed multipliers like
the SET_FLOAT_WORD()s in the above, but require less code than the above
for various reasons:
- a and b are swapped to make a >= b so as to reduce duplication
- scaling is different.  In the above, n is decomposed into halves since
  the full n sometimes doesn't fit in the exponent either.  fdlibm always
  ends up with a medium-sized n (spelled k) so that rescaling is easier.
  The different scaling may be responsible for fdlibm being significantly
  faster (though not in the above -- the no-fit case is rare).

Bruce

From owner-freebsd-standards@FreeBSD.ORG  Mon Nov  5 20:14:54 2007
Date: Tue, 6 Nov 2007 00:04:04 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Bruce Evans
Cc: freebsd-standards@freebsd.org, Steve Kargl
Subject: Re: [PATCH] hypotl, cabsl, and code removal in cabs
Message-ID: <20071105234202.T19976@delplex.bde.org>
In-Reply-To: <20071105220251.C19770@delplex.bde.org>
References: <20071012180959.GA36345@troutmask.apl.washington.edu>
 <20071103155721.K671@besplex.bde.org>
 <20071103195952.GA91682@troutmask.apl.washington.edu>
 <20071105220251.C19770@delplex.bde.org>

On Mon, 5 Nov 2007, Bruce Evans wrote:

> I found only 1 serious bug in your hypotl(): adding n or m back to the
> exponent gives garbage if the resulting exponent doesn't fit.  Fix for
> the float precision case:
>
> %	if (n >= m) {
> ...

Full patch to convert to float precision and fix some bugs:

% --- z~	Mon Nov  5 23:40:51 2007
% +++ z	Mon Nov  5 23:41:32 2007
% @@ -27,6 +27,6 @@
%   /*
%    * Compute hypotl(x, y) = sqrt(x**2 + y**2) such that underflow and
%  - * overflow are avoided.  To accomplishe the task, write x = a * 2**n
%  - * and y = b * 2**m, one then has
%  + * overflow are avoided.  To accomplish the task, write x = a * 2**n
%  + * and y = b * 2**m; one then has
%    *
%    * hypotl(x, y) = 2**n * sqrt(a**2 + b**(2*(m - n))) for n >= m

I didn't convert the comments to float precision.

% @@ -42,9 +42,10 @@
%    * hypotl(NaN, NaN) = hypotl(NaN, NaN) = Inf
%    */
%  +
%   #include "math.h"
%   #include "math_private.h"
%   #include "fpmath.h"
%
%  -#define ZERO	0.L
%  +static const float ZERO = 0.0;
%
%   /*

ZERO must be a non-literal constant to prevent evaluation of 1.0/0.0 at
compile time.  But due to compiler bugs/lack of support for C99/fenv, the
above is probably no different.
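A throwaway check of what the non-literal ZERO is meant to buy (my own
scaffolding; volatile is added here only so the demo survives
optimization -- as noted above, a plain static const may still be folded):

#include <fenv.h>
#include <stdio.h>

/* volatile is the heavy hammer; the patch's plain static const relies on
 * the compiler not folding the division at compile time. */
static volatile const float ZERO = 0.0;

int
main(void)
{
	volatile float r;

	feclearexcept(FE_ALL_EXCEPT);
	r = 1.0F / ZERO;		/* +Inf; should raise FE_DIVBYZERO */
	printf("r = %g, FE_DIVBYZERO raised: %s\n", r,
	    fetestexcept(FE_DIVBYZERO) ? "yes" : "no");
	return (0);
}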
% @@ -53,12 +54,21 @@
%    * function sqrtl() function exists.
%    */
%  -#define PREC	32
%  +#define PREC	12	/* XXX 6 works, but why isn't > 12 needed? */
%  +
%  +union IEEEfbits {
%  +	float	e;
%  +	struct {
%  +		unsigned int	man	:23;
%  +		unsigned int	exp	:8;
%  +		unsigned int	sign	:1;
%  +	} bits;
%  +};

I forgot that there was a suitable struct IEEEf2bits already defined in
fpmath.h.

%
%  -long double
%  -hypotl(long double x, long double y)
%  +float
%  +myhypotf(float x, float y)
%   {
%  -	union IEEEl2bits a, b;
%  +	union IEEEfbits a, b;
%  +	float t;
%   	int m, n;
%  -	long double val;
%
%   	a.e = x;

val is not used.

% @@ -68,6 +78,8 @@
%
%   	/* If x = 0 or y = 0, then hypotl(x, y) = |x| + |y|. */
%  -	if (!(a.bits.manh | a.bits.manl) || !(b.bits.manh | b.bits.manl))
%  +	if ((a.bits.exp == 0 && a.bits.man == 0) ||
%  +	    (b.bits.exp == 0 && b.bits.man == 0))
%   		return (a.e + b.e);
%  +
%   	/*
%   	 * If x = +-Inf or y = +- Inf, then hypotl(x, y) = Inf (even if the
% @@ -75,11 +87,31 @@
%   	 * argument is not +- Inf, then hypotl(x, y) = NaN.
%   	 */
%  -	if (a.bits.exp == 32767 || b.bits.exp == 32767) {
%  -		mask_nbit_l(a);
%  -		mask_nbit_l(b);
%  -		if (!(a.bits.manh | a.bits.manl) ||
%  -		    !(b.bits.manh | b.bits.manl))
%  -			return (1.L / ZERO);
%  -		return ((x - x) / (x - x));
%  +	if (a.bits.exp == 0xff || b.bits.exp == 0xff) {
%  +		if ((a.bits.exp == 0xff && a.bits.man == 0) ||
%  +		    (b.bits.exp == 0xff && b.bits.man == 0))
%  +			return (1.0F / ZERO);
%  +		/*
%  +		 * Here we do extra work to return the same NaN as fdlibm
%  +		 * on i386's, so that simple binary comparisons work.  If
%  +		 * both a.e and b.e are NaNs, we want to return the one
%  +		 * that is highest as an integer.  The height depends on
%  +		 * whether the hardware has set the qNaN bit, and whether
%  +		 * the hardware gets a chance to do that depends on the
%  +		 * optimization level and on whether our comparison function
%  +		 * has higher precision, since passing floats in integer
%  +		 * registers prevents quieting and converting floats to
%  +		 * higher precision uses the hardware.
%  +		 */
%  +		if (a.bits.exp == 0xff && b.bits.exp == 0xff) {
%  +#if 1	/* if call to comparison function will quieten */
%  +			if (a.bits.man != 0)
%  +				a.bits.man |= 0x00400000;	/* quieten */
%  +			if (b.bits.man != 0)
%  +				b.bits.man |= 0x00400000;
%  +#endif
%  +			if (a.bits.man < b.bits.man)
%  +				return (b.e + a.e);
%  +		}
%  +		return (a.e + b.e);
%   	}
%

Return the same value as fdlibm for testing.  We should at least return
some combination of both x and y in the NaN case, not just
((x - x) / (x - x)) when only y is a NaN, since when x is not a NaN the
latter is unrelated to the only NaN arg.

fdlibm sets the signs to 0 early similarly, so it tends to convert
negative NaNs to positive ones in a way that the hardware would not do.
Hacking on values using bit-fields or SET_*() tends to allow changes to
NaNs that the hardware would not do.

The above was written mainly on an A64 in i386 mode.  In amd64 mode, I
think loading sNaNs doesn't quieten them, so there would be even more
binary differences.

% @@ -90,16 +122,15 @@
%   	 */
%   	if (a.bits.exp == 0) {
%  -		a.e *= 0x1.0p514;
%  -		n = a.bits.exp - 0x4200;
%  +		a.e *= 0x1.0p25;
%  +		n = a.bits.exp - 0x97;
%   	} else
%  -		n = a.bits.exp - 0x3ffe;
%  -	a.bits.exp = 0x3ffe;
%  -
%  +		n = a.bits.exp - 0x7e;
%  +	a.bits.exp = 0x7e;
%   	if (b.bits.exp == 0) {
%  -		b.e *= 0x1.0p514;
%  -		m = b.bits.exp - 0x4200;
%  +		b.e *= 0x1.0p25;
%  +		m = b.bits.exp - 0x97;
%   	} else
%  -		m = b.bits.exp - 0x3ffe;
%  -	b.bits.exp = 0x3ffe;
%  +		m = b.bits.exp - 0x7e;
%  +	b.bits.exp = 0x7e;
%
%   	if (n >= m) {

I'm not sure why this uses 514.  I first tried converting 514 to 34, to
get a nice round number instead of 0x97.  This didn't work, but the
problems may have been just the rescaling bugs later.

fdlibm normalizes to a non-constant exponent.  This would be more
complicated here but simpler later.
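A quick standalone check of the float normalization constants above (my
own throwaway test, with memcpy() on the raw word instead of the union):
for nonzero finite positive f it should print f = a * 2**n with a in
[0.5, 1) and n equal to ilogbf(f) + 1, for normals and subnormals alike:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void
normalize(float f, float *a, int *n)
{
	uint32_t w;

	memcpy(&w, &f, sizeof(w));
	if ((w >> 23 & 0xff) == 0) {		/* subnormal */
		f *= 0x1.0p25F;
		memcpy(&w, &f, sizeof(w));
		*n = (int)(w >> 23 & 0xff) - 0x97;
	} else
		*n = (int)(w >> 23 & 0xff) - 0x7e;
	w = (w & ~(0xffU << 23)) | (0x7eU << 23);	/* force exp = 0x7e */
	memcpy(a, &w, sizeof(*a));
}

int
main(void)
{
	float a, f[] = { 3.0F, 0.7F, 1e30F, 0x1.8p-130F, 1e-45F };
	int i, n;

	for (i = 0; i < (int)(sizeof(f) / sizeof(f[0])); i++) {
		normalize(f[i], &a, &n);
		printf("%-14g = %g * 2**%d (ilogbf(f)+1 = %d)\n",
		    f[i], a, n, ilogbf(f[i]) + 1);
	}
	return (0);
}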
% @@ -110,6 +141,11 @@
%   		a.e += b.e * b.e;
%   	}
%  -	a.e = sqrtl(a.e);
%  -	a.bits.exp += n;
%  +	a.e = sqrtf(a.e);
%  +	if (a.bits.exp + n <= 0 || a.bits.exp + n >= 0xff) {
%  +		a.bits.exp += n - n / 2;
%  +		SET_FLOAT_WORD(t, 0x3f800000 + ((n / 2) << 23));
%  +		a.e *= t;
%  +	} else
%  +		a.bits.exp += n;
%   	return (a.e);
%   } else {
% @@ -120,6 +156,11 @@
%   		b.e += a.e * a.e;
%   	}
%  -	b.e = sqrtl(b.e);
%  -	b.bits.exp += m;
%  +	b.e = sqrtf(b.e);
%  +	if (b.bits.exp + m <= 0 || b.bits.exp + m >= 0xff) {
%  +		b.bits.exp += m - m / 2;
%  +		SET_FLOAT_WORD(t, 0x3f800000 + ((m / 2) << 23));
%  +		b.e *= t;
%  +	} else
%  +		b.bits.exp += m;
%   	return (b.e);
%   }

Fix rescaling.

Bruce