From owner-freebsd-numerics@FreeBSD.ORG Mon Dec 2 11:06:51 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 00AB1B7C for ; Mon, 2 Dec 2013 11:06:50 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D93311955 for ; Mon, 2 Dec 2013 11:06:50 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id rB2B6o5I007807 for ; Mon, 2 Dec 2013 11:06:50 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id rB2B6oma007805 for freebsd-numerics@FreeBSD.org; Mon, 2 Dec 2013 11:06:50 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 2 Dec 2013 11:06:50 GMT Message-Id: <201312021106.rB2B6oma007805@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-numerics@FreeBSD.org Subject: Current problem reports assigned to freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Dec 2013 11:06:51 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o stand/175811 numerics libstdc++ needs complex support in order use C99 o bin/170206 numerics [msun] [patch] complex arcsinh, log, etc. o stand/82654 numerics C99 long double math functions are missing 3 problems total. From owner-freebsd-numerics@FreeBSD.ORG Mon Dec 2 21:50:39 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 67D73E4A for ; Mon, 2 Dec 2013 21:50:39 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 37F151B40 for ; Mon, 2 Dec 2013 21:50:39 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB2LgehS044225 for ; Mon, 2 Dec 2013 13:42:40 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB2LgejX044224 for freebsd-numerics@freebsd.org; Mon, 2 Dec 2013 13:42:40 -0800 (PST) (envelope-from sgk) Date: Mon, 2 Dec 2013 13:42:40 -0800 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: erff, erfl, and erfcl patch Message-ID: <20131202214240.GA44209@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Dec 2013 21:50:39 -0000 I intend to commit the patch that follows by sig in the next day or two. In particular, I want to get the ld80 and ld128 versions of erfl and erfcl into tree as the current hack provided by imprecise.c is ugly. * msun/Makefile: . Add s_erfl.c. . Remove nearby extra space. . Add MLINKS for erfl.3 and erfcl.3. * msun/Symbol.map: . Sort erfl and erfcl into proper order. * msun/ld128/s_erfl.c: . Implementations for erfl and erfcl in IEEE 128-bit long double format. * msun/ld80/s_erfl.c: . Implementations for erfl and erfcl in Intel 80-bit long double format. * msun/imprecise.c: . Remove hack that maps erfl to erf and erfcl to erfc. * msun/src/s_erf.c: . Fix whitespace issues. . Add weak references for erfl and erfcl on targets where long double has 53 bits of precisions. * msun/src/s_erff.c: . Consistently use lower case in hex constants. . Update the rational approximations. . Fix descriptions of the the domain and range. . Remove leading zeros in exponents, e.g., 1.28379166e-01 becomes 1.28379166e-1. -- Steve Index: Makefile =================================================================== --- Makefile (revision 258855) +++ Makefile (working copy) @@ -99,9 +99,9 @@ e_fmodl.c e_hypotl.c e_remainderl.c e_sqrtl.c \ invtrig.c k_cosl.c k_sinl.c k_tanl.c \ s_asinhl.c s_atanl.c s_cbrtl.c s_ceill.c s_cosl.c s_cprojl.c \ - s_csqrtl.c s_exp2l.c s_expl.c s_floorl.c s_fmal.c \ + s_csqrtl.c s_erfl.c s_exp2l.c s_expl.c s_floorl.c s_fmal.c \ s_frexpl.c s_logbl.c s_logl.c s_nanl.c s_nextafterl.c \ - s_nexttoward.c s_remquol.c s_rintl.c s_roundl.c s_scalbnl.c \ + s_nexttoward.c s_remquol.c s_rintl.c s_roundl.c s_scalbnl.c \ s_sinl.c s_tanl.c s_truncl.c w_cabsl.c .endif @@ -162,7 +162,8 @@ MLINKS+=cos.3 cosf.3 cos.3 cosl.3 MLINKS+=cosh.3 coshf.3 MLINKS+=csqrt.3 csqrtf.3 csqrt.3 csqrtl.3 -MLINKS+=erf.3 erfc.3 erf.3 erff.3 erf.3 erfcf.3 +MLINKS+=erf.3 erfc.3 erf.3 erff.3 erf.3 erfcf.3 \ + erf.3 erfl.3 erf.3 erfcl.3 MLINKS+=exp.3 expm1.3 exp.3 expm1f.3 exp.3 expm1l.3 exp.3 pow.3 exp.3 powf.3 \ exp.3 exp2.3 exp.3 exp2f.3 exp.3 exp2l.3 exp.3 expf.3 exp.3 expl.3 MLINKS+=fabs.3 fabsf.3 fabs.3 fabsl.3 Index: Symbol.map =================================================================== --- Symbol.map (revision 258855) +++ Symbol.map (working copy) @@ -264,6 +264,8 @@ ctanf; ctanh; ctanhf; + erfcl; + erfl; expl; expm1l; log10l; @@ -272,8 +274,6 @@ logl; /* Implemented as weak aliases for imprecise versions */ coshl; - erfcl; - erfl; lgammal; powl; sinhl; Index: ld128/s_erfl.c =================================================================== --- ld128/s_erfl.c (revision 0) +++ ld128/s_erfl.c (working copy) @@ -0,0 +1,347 @@ +/* @(#)s_erf.c 5.1 93/09/24 */ +/* + * ==================================================== + * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. + * + * Developed at SunPro, a Sun Microsystems, Inc. business. + * Permission to use, copy, modify, and distribute this + * software is freely granted, provided that this notice + * is preserved. + * ==================================================== + */ + +#include +__FBSDID("$FreeBSD"); + +/* + * See s_erf.c for complete comments. + * + * Converted to long double by Steven G. Kargl. + */ +#include +#ifdef __i386__ +#include +#endif + +#include "fpmath.h" +#include "math.h" +#include "math_private.h" + +/* Sloppy limits. */ +#define LIM0 2.85715 /* ~ 1. / 0.35 */ +#define LIM1 9 /* x**2 = - log(sqrt(pi) * x) - (1 - p) * log(2) */ +#define LIM2 108 /* x**2 = - log(sqrt(pi) * x) - emin */ + +/* XXX Prevent compilers from erroneously constant folding these: */ +static const volatile long double tiny = 0x1p-10000L; + +static const long double +half= 0.5L, +one = 1, +two = 2; +/* + * Domain [0, 0.84375], range ~[-2.076e-38, 2.074e-38]: + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-126 + */ +static const long double +efx = 1.28379167095512573896158903121545167e-1L, /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06fc3f */ +efx8= 1.02703333676410059116927122497236133L, /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06ff3f */ +pp0 = 1.28379167095512573896158903121545167e-1L, /* 0x3ffc06eb, 0xa8214db6, 0x88d71d48, 0xa7f6bfec */ +pp1 = -3.14930453779220199897729762882897733e-1L, /* 0xbffd427d, 0x20fdfc19, 0x5395ba59, 0x26b231dc */ +pp2 = -5.27541911766414675560926940708709654e-2L, /* 0xbffab029, 0x8eba9454, 0xd7a92b48, 0x80023341 */ +pp3 = -1.13207381741395209251907161443436107e-2L, /* 0xbff872f5, 0x3c1cbcad, 0x90c8a6f3, 0x2b227613 */ +pp4 = -9.18720900388670035443046669196822569e-4L, /* 0xbff4e1ac, 0xa1c543f3, 0x522d6f94, 0x4ebd756e */ +pp5 = -7.87594031651048602426379053816416599e-5L, /* 0xbff14a57, 0x43e4e927, 0xd8815b94, 0xc2cd92da */ +pp6 = -3.42417294394664810489445209450592648e-6L, /* 0xbfeccb95, 0xafbd3483, 0x5b58d3e7, 0x712e6f9f */ +pp7 = -1.37340867367364131528690800390865353e-7L, /* 0xbfe826ef, 0xf0b8c986, 0xe445805b, 0x60654151 */ +pp8 = -2.71188885190546726082043702148966261e-9L, /* 0xbfe274b8, 0x1b3eee47, 0x1739ceb3, 0x2217b257 */ +pp9 = -3.38021083561888762328182153549042469e-11L, /* 0xbfdc2953, 0x94cffcc1, 0x249ea8a2, 0x2152a05a */ +qq1 = 4.76681198648751759882307858251781949e-1L, /* 0x3ffde81f, 0x1dbb4203, 0xf3a919ed, 0x9d4721ed */ +qq2 = 1.06717237624212542329851335448543918e-1L, /* 0x3ffbb51d, 0x22583f4c, 0x480ed405, 0x2b999881 */ +qq3 = 1.47756708414471579803258620664775360e-2L, /* 0x3ff8e42b, 0x4f852e54, 0x30ab6e4d, 0x5ffcefe3 */ +qq4 = 1.39951717291319836305643111144723543e-3L, /* 0x3ff56ee0, 0x01f38f9f, 0x5ee018dc, 0x355c6b48 */ +qq5 = 9.44415276581511130343426049763432505e-5L, /* 0x3ff18c1d, 0xd1738632, 0xf03254be, 0x6a6e2071 */ +qq6 = 4.56270013341289983962750459103111817e-6L, /* 0x3fed3232, 0x97672910, 0xc334df35, 0xfe01507b */ +qq7 = 1.53049464020884224321556705278110797e-7L, /* 0x3fe848ab, 0xd52911d5, 0xabe051cc, 0x8abf1cd2 */ +qq8 = 3.25623515051818985116653695841960050e-9L, /* 0x3fe2bf88, 0x96d22da3, 0x0207a0ee, 0x17463bb2 */ +qq9 = 3.37510289528866336760710014443284362e-11L; /* 0x3fdc28e0, 0x8f8a8bcc, 0x6cc0a227, 0x6fce7fe7 */ +/* + * Domain [0.84375, 1.25], range ~[-2.372e-36, 2.373e-36]: + * |(erf(x) - erx) - p(x)/q(x)| < 2**-120 + */ +static const long double +erx = 8.42700792949714894142232424201210961e-1L, /* 0x3ffeaf76, 0x7a741088, 0xb0000000, 0x00000000 */ +pa0 = -2.48010117891186017025877408964183854e-17L, /* 0xbfc7c97f, 0x77812279, 0x6c87b01b, 0xec38d68e */ +pa1 = 4.15107497420594680851927951552407784e-1L, /* 0x3ffda911, 0xf096fbc2, 0x5562ff1b, 0x3b544e2b */ +pa2 = -3.87083915464088350525375137436107590e-2L, /* 0xbffa3d19, 0x6178b90c, 0x2c86df33, 0x5319973e */ +pa3 = 4.44954959590143486203056389208894541e-2L, /* 0x3ffa6c81, 0xd17ed314, 0xdcacac2f, 0xc58a3516 */ +pa4 = 8.05166892893185510859535256028511853e-2L, /* 0x3ffb49cb, 0xde347a21, 0x08774372, 0xabfb5f6a */ +pa5 = -1.03051186773465597646984141058502267e-2L, /* 0xbff851ad, 0x99d9ad55, 0xa4289298, 0xc77d3e5a */ +pa6 = 5.72604838819243779197220502236739196e-3L, /* 0x3ff77743, 0x2690068e, 0x45e8dce5, 0x9ca78cd1 */ +pa7 = 1.22625964697528485542108776707051120e-3L, /* 0x3ff54174, 0xe4521657, 0x9b318890, 0x918d9b04 */ +pa8 = 5.39388406655659100369616014543606682e-4L, /* 0x3ff41acb, 0x7c880b32, 0x296affe7, 0xed6f9a01 */ +pa9 = -1.98335880680615762490364057251014162e-4L, /* 0xbff29ff0, 0xc3e06ed7, 0x3227ca50, 0x5db44d09 */ +pa10= 6.20889395425360375062078472544482827e-5L, /* 0x3ff1046b, 0x7dbeee60, 0xd2792787, 0x19c9aa7b */ +pa11= -5.40647780776317096102550495545612307e-6L, /* 0xbfed6ad2, 0x94dc1b75, 0xedf4970c, 0x9f2fcd17 */ +qa1 = 9.06750921660206127928124875031300980e-1L, /* 0x3ffed041, 0xa8244c00, 0xc4182e08, 0x27bd05f1 */ +qa2 = 6.80607885383663092673769524543423470e-1L, /* 0x3ffe5c78, 0xa3023ebe, 0xcb5d7d74, 0x6465c592 */ +qa3 = 4.05656787299304601494351680677926251e-1L, /* 0x3ffd9f64, 0x7e2b675d, 0x307e6db8, 0x612204de */ +qa4 = 1.69503824422159751714827838146178306e-1L, /* 0x3ffc5b24, 0xd2338554, 0x5192bb8c, 0x0e186eff */ +qa5 = 7.46584241233528910281045656042597495e-2L, /* 0x3ffb31cd, 0x081fb0f0, 0x878f1e11, 0xce245503 */ +qa6 = 2.03534629098115442535903549876232764e-2L, /* 0x3ff94d78, 0x9c63b619, 0xed805e2c, 0x9467d788 */ +qa7 = 6.96756479371300019285814024610131574e-3L, /* 0x3ff7c8a0, 0x56ebf85a, 0x78d461d0, 0x5ac7c4fe */ +qa8 = 1.13424099180080891902336864678392922e-3L, /* 0x3ff52955, 0x9fdcbd9d, 0xc035851a, 0xfa944ed7 */ +qa9 = 3.15259605619527194057660981542654259e-4L, /* 0x3ff34a92, 0xdb225912, 0x7b9010de, 0x3fe39023 */ +qa10= 1.18798605497472051761873112336132064e-5L, /* 0x3fee8e9f, 0x399f21c2, 0x4e53cc38, 0x5e6e1c31 */ +qa11= 4.65215947978206806357776624485104779e-6L, /* 0x3fed3833, 0x7dc4e756, 0x4c73bce3, 0x43a896cc */ +qa12= -1.02073578608782110388686097137679971e-6L; /* 0xbfeb1200, 0x6dd9de6c, 0x647bcf5b, 0xec577463 */ +/* + * Domain [1.25,2.85715], range ~[-2.932e-37,2.932e-37]: + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-121 + */ +static const long double +ra0 = -9.86494292470069026041027804612925719e-3L, /* 0xbff84341, 0x239e8709, 0xeac6f011, 0x288a805d */ +ra1 = -1.13579789657313713897975924115138941L, /* 0xbfff22c3, 0xa6a4a5af, 0xd9327434, 0x30d5c80e */ +ra2 = -4.89737449616464986815099118405280362e+1L, /* 0xc00487ca, 0x3acc6754, 0xf6b0cba0, 0x968544a8 */ +ra3 = -1.10763525821733005976688912399857907e+3L, /* 0xc00914e8, 0xa81214fc, 0xa9daec12, 0x04a27e27 */ +ra4 = -1.49985906868704004330240577932725368e+4L, /* 0xc00cd4b4, 0xb9ba09b4, 0x5f2150b4, 0xe9d43451 */ +ra5 = -1.29799869868909118595022585186399503e+5L, /* 0xc00ffb07, 0xdeafba94, 0x78b8464d, 0x2243a38a */ +ra6 = -7.42786170818211875353524648807020262e+5L, /* 0xc0126ab0, 0x457757c1, 0x321b6122, 0xfbf31ed7 */ +ra7 = -2.85617550342537589290678185465422207e+6L, /* 0xc0145ca7, 0x7c0703e2, 0x2b7e096e, 0x6f5a3f5d */ +ra8 = -7.40614267250515630140845919774792706e+6L, /* 0xc015c408, 0xfab0a531, 0x12d2dc13, 0xa2b8704b */ +ra9 = -1.28641241327410554128542575661432164e+7L, /* 0xc0168894, 0xf843f6a2, 0xb7ab6219, 0x744c300c */ +ra10= -1.47182276924276357440551219889149080e+7L, /* 0xc016c12a, 0x276285e0, 0x04bc1105, 0xe4e10e3c */ +ra11= -1.07799899573272478359709008914120358e+7L, /* 0xc01648fa, 0xabea26cc, 0x0a0ce4a0, 0xb7ec8778 */ +ra12= -4.83480214872222791837505030943085341e+6L, /* 0xc0152717, 0xc8984aa3, 0xc46405f6, 0xa0f6f640 */ +ra13= -1.23938051019472227666411457770036806e+6L, /* 0xc0132e95, 0x4829c1f0, 0xec523a33, 0x776ccfd3 */ +ra14= -1.62262886680556039283486133222303718e+5L, /* 0xc0103ceb, 0x717ebf9b, 0x180a6a70, 0x34417b99 */ +ra15= -8.82734130841920962111981230171892874e+3L, /* 0xc00c13da, 0xbaffe892, 0xd6975b67, 0x05b7a992 */ +ra16= -1.22568436072739969460925172985178212e+2L, /* 0xc005ea46, 0x141b1923, 0xaea7e61c, 0x7207a8df */ +sa1 = 6.44502356908054050802853406380055114e+1L, /* 0x400501cd, 0x0a95be01, 0x360afc70, 0x2d5f256b */ +sa2 = 1.76114826500718135746343166847984357e+3L, /* 0x4009b849, 0x7d2c833f, 0x0c758b06, 0x07d73e83 */ +sa3 = 2.69439831921639091939469234421147772e+4L, /* 0x400da4ff, 0xeec9ed36, 0xb19f5f0d, 0x6e5ff0b1 */ +sa4 = 2.56815637017014469128294345646768041e+5L, /* 0x4010f597, 0xd189c606, 0x11fac05c, 0x9dda4b30 */ +sa5 = 1.60638682704348688266145179445842807e+6L, /* 0x4013882f, 0x2d3b91f3, 0x887e4dcf, 0x8f990fab */ +sa6 = 6.76918282227476395975704195544831096e+6L, /* 0x40159d28, 0x7b4a0265, 0x4e221317, 0x8f19f36b */ +sa7 = 1.94280485423922835890016575512146130e+7L, /* 0x40172872, 0xd08ada38, 0x7f9e15c4, 0x1fbf1deb */ +sa8 = 3.79740375883744863135748410413555680e+7L, /* 0x401821b8, 0x0acb4fda, 0xec4240d5, 0x38ffcf2c */ +sa9 = 5.00608194437408776705107810210078804e+7L, /* 0x40187def, 0x09b8cc80, 0x46bf2089, 0x9ec255ed */ +sa10= 4.36435715828395309369777169953877366e+7L, /* 0x40184cf9, 0x59ca9a7c, 0x5a185bbc, 0x2c480858 */ +sa11= 2.43748298478043040399488401407107134e+7L, /* 0x401773ee, 0x2dd909b3, 0xef427684, 0xb0127a72 */ +sa12= 8.30614655253558095273324523461136470e+6L, /* 0x4015faf7, 0x8a35cbe3, 0x2845fd7e, 0x1c2d314d */ +sa13= 1.60136068063256826643328510848675710e+6L, /* 0x401386f5, 0x0ae3def9, 0xd4bfc49f, 0x9a3fcbe4 */ +sa14= 1.54229628674911158232902134264852520e+5L, /* 0x40102d3a, 0xd0786b63, 0x9ec2ab01, 0xdffa186c */ +sa15= 5.87842946700351712242359416526459677e+3L, /* 0x400b6f66, 0xdf18caed, 0x13140abd, 0x0b881c24 */ +sa16= 4.97176688590085185068333226860548624e+1L; /* 0x40048dbd, 0xc92bb664, 0xce7d48ce, 0x6380778e */ +/* + * Domain [2.85715,9], range ~[-7.886e-37,7.918e-37]: + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-120 + */ +static const long double +rb0 = -9.86494292470008707171371994479162369e-3L, /* 0xbff84341, 0x239e86f4, 0x2f57e561, 0xf4469360 */ +rb1 = -1.57047326624110727986326503729442830L, /* 0xbfff920a, 0x8935bf73, 0x8803b894, 0x4656482d */ +rb2 = -1.03228196364885474342132255440317065e+2L, /* 0xc0059ce9, 0xac4ed0ff, 0x2cff0ff7, 0x5e70d1ab */ +rb3 = -3.74000570653418227179358710865224376e+3L, /* 0xc00ad380, 0x2ebf7835, 0xf6b07ed2, 0x861242f7 */ +rb4 = -8.35435477739098044190860390632813956e+4L, /* 0xc00f4657, 0x8c3ae934, 0x3647d7b3, 0x80e76fb7 */ +rb5 = -1.21398672055223642118716640216747152e+6L, /* 0xc0132862, 0x2b8761c8, 0x27d18c0f, 0x137c9463 */ +rb6 = -1.17669175877248796101665344873273970e+7L, /* 0xc0166719, 0x0b2cea46, 0x81f14174, 0x11602ea5 */ +rb7 = -7.66108006086998253606773064264599615e+7L, /* 0xc019243f, 0x3c26f4f0, 0x1cc05241, 0x3b953728 */ +rb8 = -3.32547117558141845968704725353130804e+8L, /* 0xc01b3d24, 0x42d8ee26, 0x24ef6f3b, 0x604a8c65 */ +rb9 = -9.41561252426350696802167711221739746e+8L, /* 0xc01cc0f8, 0xad23692a, 0x8ddb2310, 0xe9937145 */ +rb10= -1.67157110805390944549427329626281063e+9L, /* 0xc01d8e88, 0x9a903734, 0x09a55fa3, 0xd205c903 */ +rb11= -1.74339631004410841337645931421427373e+9L, /* 0xc01d9fa8, 0x77582d2a, 0xc183b8ab, 0x7e00cb05 */ +rb12= -9.57655233596934915727573141357471703e+8L, /* 0xc01cc8a5, 0x460cc685, 0xd0271fa0, 0x6a70e3da */ +rb13= -2.26320062731339353035254704082495066e+8L, /* 0xc01aafab, 0xd7d76721, 0xc9720e11, 0x6a8bd489 */ +rb14= -1.42777302996263256686002973851837039e+7L, /* 0xc016b3b8, 0xc499689f, 0x2b88d965, 0xc32414f9 */ +sb1 = 1.08512869705594540211033733976348506e+2L, /* 0x4005b20d, 0x2db7528d, 0x00d20dcb, 0x858f6191 */ +sb2 = 5.02757713761390460534494530537572834e+3L, /* 0x400b3a39, 0x3bf4a690, 0x3025d28d, 0xfd40a891 */ +sb3 = 1.31019107205412870059331647078328430e+5L, /* 0x400fffcb, 0x1b71d05e, 0x3b28361d, 0x2a3c3690 */ +sb4 = 2.13021555152296846166736757455018030e+6L, /* 0x40140409, 0x3c6984df, 0xc4491d7c, 0xb04aa08d */ +sb5 = 2.26649105281820861953868568619768286e+7L, /* 0x401759d6, 0xce8736f0, 0xf28ad037, 0x2a901e0c */ +sb6 = 1.61071939490875921812318684143076081e+8L, /* 0x401a3338, 0x686fb541, 0x6bd27d06, 0x4f95c9ac */ +sb7 = 7.66895673844301852676056750497991966e+8L, /* 0x401c6daf, 0x31cec121, 0x54699126, 0x4bd9bf9e */ +sb8 = 2.41884450436101936436023058196042526e+9L, /* 0x401e2059, 0x46b0b8d7, 0x87b64cbf, 0x78bc296d */ +sb9 = 4.92403055884071695093305291535107666e+9L, /* 0x401f257e, 0xbe5ed739, 0x39e17346, 0xcadd2e55 */ +sb10= 6.18627786365587486459633615573786416e+9L, /* 0x401f70bb, 0x1be7a7e7, 0x6a45b5ae, 0x607c70f0 */ +sb11= 4.45898013426501378097430226324743199e+9L, /* 0x401f09c6, 0xa32643d7, 0xf1724620, 0x9ea46c32 */ +sb12= 1.63006115763329848117160344854224975e+9L, /* 0x401d84a3, 0x0996887f, 0x65a4f43b, 0x978c1d74 */ +sb13= 2.39216717012421697446304015847567721e+8L, /* 0x401ac845, 0x09a065c2, 0x30095da7, 0x9d72d6ae */ +sb14= 7.84837329009278694937250358810225609e+6L; /* 0x4015df06, 0xd5290e15, 0x63031fac, 0x4d9c894c */ +/* + * Domain [9,108], range ~[-5.324e-38,5.340e-38]: + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-124 + */ +static const long double +rc0 = -9.86494292470008707171367567652935673e-3L, /* 0xbff84341, 0x239e86f4, 0x2f57e55b, 0x1aa10fd3 */ +rc1 = -1.26229447747315096406518846411562266L, /* 0xbfff4325, 0xbb1aab28, 0xda395cd9, 0xfb861c15 */ +rc2 = -6.13742634438922591780742637728666162e+1L, /* 0xc004eafe, 0x7dd51cd8, 0x3c7c5928, 0x751e50cf */ +rc3 = -1.50455835478908280402912854338421517e+3L, /* 0xc0097823, 0xbc15b9ab, 0x3d60745c, 0x523e80a5 */ +rc4 = -2.04415631865861549920184039902945685e+4L, /* 0xc00d3f66, 0x40b3fc04, 0x5388f2ec, 0xb009e1f0 */ +rc5 = -1.57625662981714582753490610560037638e+5L, /* 0xc01033dc, 0xd4dc95b6, 0xfd4da93b, 0xf355b4a9 */ +rc6 = -6.73473451616752528402917538033283794e+5L, /* 0xc01248d8, 0x2e73a4f9, 0xcded49c5, 0xfa3bfeb7 */ +rc7 = -1.47433165421387483167186683764364857e+6L, /* 0xc01367f1, 0xba77a8f7, 0xcfdd0dbb, 0x25d554b3 */ +rc8 = -1.38811981807868828563794929997744139e+6L, /* 0xc01352e5, 0x7d16d9ad, 0xbbdcbf38, 0x38fbc5ea */ +rc9 = -3.59659700530831825640766479698155060e+5L, /* 0xc0115f3a, 0xecd57f45, 0x21f8ad6c, 0x910a5958 */ +sc1 = 7.72730753022908298637508998072635696e+1L, /* 0x40053517, 0xa10d52bc, 0xdabb55b6, 0xbd0328cd */ +sc2 = 2.36825757341694050500333261769082182e+3L, /* 0x400a2808, 0x3e0a9b42, 0x82977842, 0x9c5de29e */ +sc3 = 3.72210540173034735352888847134073099e+4L, /* 0x400e22ca, 0x1ba827ef, 0xac8390d7, 0x1fc39a41 */ +sc4 = 3.24136032646418336712461033591393412e+5L, /* 0x40113c8a, 0x0216e100, 0xc59d1e44, 0xf0e68d9d */ +sc5 = 1.57836135851134393802505823370009175e+6L, /* 0x40138157, 0x95bc7664, 0x17575961, 0xdbe58eeb */ +sc6 = 4.12881981392063738026679089714182355e+6L, /* 0x4014f801, 0x9e82e8d2, 0xb8b3a70e, 0xfd84185d */ +sc7 = 5.24438427289213488410596395361544142e+6L, /* 0x40154017, 0x81177109, 0x2aa6c3b0, 0x1f106625 */ +sc8 = 2.59909544563616121735963429710382149e+6L, /* 0x40143d45, 0xbb90a9b1, 0x12bf9390, 0xa827a700 */ +sc9 = 2.80930665169282501639651995082335693e+5L; /* 0x40111258, 0xaa92222e, 0xa97e3216, 0xa237fa6c */ + +long double +erfl(long double x) +{ + int32_t i; + long double ax,R,S,P,Q,s,y,z,r; + uint64_t lx, llx; + uint16_t hx; + + EXTRACT_LDBL128_WORDS(hx, lx, llx, x); + + /* erf(+-0) = +-0 */ + if ((hx == 0 || hx == 0x8000) && (lx | llx) == 0) + return (x); + + /* x is INF or NaN */ + if (hx >= 0x7fff) { + if (hx == 0xffff) + return (-1); + if ((lx | llx) == 0) + return (1); + return (x * x); + } + + ax = fabsl(x); + if(ax < 0.84375) { + if(ax < 0x1.p-34L) { + if (ax < 0x1.p-16373L) + return (8*x+efx8*x)/8; /* avoid spurious underflow */ + return x + efx*x; + } + z = x*x; + r = pp0+z*(pp1+z*(pp2+z*(pp3+z*(pp4+z*(pp5+z*(pp6+z*(pp7+ + z*(pp8+z*pp9)))))))); + s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*(qq5+z*(qq6+z*(qq7+ + z*(qq8+z*qq9)))))))); + y = r/s; + return x + x*y; + } + + if(ax < 1.25) { + s = ax-one; + P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*(pa6+s*(pa7+ + s*(pa8+s*(pa9+s*(pa10+s*pa11)))))))))); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*(qa6+s*(qa7+ + s*(qa8+s*(qa9+s*(qa10+s*(qa11+s*qa12))))))))))); + if(x>=0) return (erx + P/Q); else return (-erx - P/Q); + } + + if (ax < LIM1) { + s = one/(ax*ax); + if(ax < LIM0) { /* |x| < 2.85715 */ + R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(ra5+s*(ra6+s*(ra7+ + s*(ra8+s*(ra9+s*(ra10+s*(ra11+s*(ra12+s*(ra13+s*(ra14+ + s*(ra15+s*ra16))))))))))))))); + S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(sa5+s*(sa6+s*(sa7+ + s*(sa8+s*(sa9+s*(sa10+s*(sa11+s*(sa12+s*(sa13+s*(sa14+ + s*(sa15+s*sa16))))))))))))))); + } else { /* |x| >= 2.85715 */ + R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(rb5+s*(rb6+s*(rb7+ + s*(rb8+s*(rb9+s*(rb10+s*(rb11+s*(rb12+s*(rb13+ + s*rb14))))))))))))); + S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(sb5+s*(sb6+s*(sb7+ + s*(sb8+s*(sb9+s*(sb10+s*(sb11+s*(sb12+s*(sb13+ + s*sb14))))))))))))); + } + z = (float)ax; + r = expl(-z*z-0.5625)*expl((z-ax)*(z+ax)+R/S); + if(x>=0) return (one-r/ax); else return (r/ax-one); + } else { + if(x>=0) return (one-tiny); else return (tiny-one); + } +} + +long double +erfcl(long double x) +{ + long double ax,R,S,P,Q,s,y,z,r; + uint64_t lx, llx; + uint16_t hx; + + EXTRACT_LDBL128_WORDS(hx, lx, llx, x); + + /* x is INF or NaN */ + if (hx >= 0x7fff) { + if (hx == 0xffff) + return (2); + if ((lx | llx) == 0) + return (0); + return (x * x); + } + + ax = fabsl(x); + if(ax < 0.84375) { + if(ax < 0x1.p-34L) + return one-x; + z = x*x; + r = pp0+z*(pp1+z*(pp2+z*(pp3+z*(pp4+z*(pp5+z*(pp6+z*(pp7+ + z*(pp8+z*pp9)))))))); + s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*(qq5+z*(qq6+z*(qq7+ + z*(qq8+z*qq9)))))))); + y = r/s; + if(ax < 0.25) { /* x<1/4 */ + return one-(x+x*y); + } else { + r = x*y; + r += (x-half); + return half - r; + } + } + + if(ax < 1.25) { + s = ax-one; + P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*(pa6+s*(pa7+ + s*(pa8+s*(pa9+s*(pa10+s*pa11)))))))))); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*(qa6+s*(qa7+ + s*(qa8+s*(qa9+s*(qa10+s*(qa11+s*qa12))))))))))); + if(x>=0) { + z = one-erx; return z - P/Q; + } else { + z = erx+P/Q; return one+z; + } + } + + if (ax < LIM2) { /* |x| < 108 */ + s = one/(ax*ax); + if(ax < LIM0) { /* |x| < 2.85715*/ + R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(ra5+s*(ra6+s*(ra7+ + s*(ra8+s*(ra9+s*(ra10+s*(ra11+s*(ra12+s*(ra13+s*(ra14+ + s*(ra15+s*ra16))))))))))))))); + S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(sa5+s*(sa6+s*(sa7+ + s*(sa8+s*(sa9+s*(sa10+s*(sa11+s*(sa12+s*(sa13+s*(sa14+ + s*(sa15+s*sa16))))))))))))))); + } else if (ax < LIM1) { + R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(rb5+s*(rb6+s*(rb7+ + s*(rb8+s*(rb9+s*(rb10+s*(rb11+s*(rb12+s*(rb13+ + s*rb14))))))))))))); + S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(sb5+s*(sb6+s*(sb7+ + s*(sb8+s*(sb9+s*(sb10+s*(sb11+s*(sb12+s*(sb13+ + s*sb14))))))))))))); + } else { + if(x < -LIM1) return two-tiny; /* x < -9 */ + R=rc0+s*(rc1+s*(rc2+s*(rc3+s*(rc4+s*(rc5+s*(rc6+s*(rc7+ + s*(rc8+s*rc9)))))))); + S=one+s*(sc1+s*(sc2+s*(sc3+s*(sc4+s*(sc5+s*(sc6+s*(sc7+ + s*(sc8+s*sc9)))))))); + } + z = (float)ax; + r = expl(-z*z-0.5625)*expl((z-ax)*(z+ax)+R/S); + if(x>0) return r/ax; else return two-r/ax; + } else { + if(x>0) return tiny*tiny; else return two-tiny; + } +} Property changes on: ld128/s_erfl.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Index: ld80/s_erfl.c =================================================================== --- ld80/s_erfl.c (revision 0) +++ ld80/s_erfl.c (working copy) @@ -0,0 +1,353 @@ +/* @(#)s_erf.c 5.1 93/09/24 */ +/* + * ==================================================== + * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. + * + * Developed at SunPro, a Sun Microsystems, Inc. business. + * Permission to use, copy, modify, and distribute this + * software is freely granted, provided that this notice + * is preserved. + * ==================================================== + */ + +#include +__FBSDID("$FreeBSD"); + +/* + * See s_erf.c for complete comments. + * + * Converted to long double by Steven G. Kargl. + */ +#include +#ifdef __i386__ +#include +#endif + +#include "fpmath.h" +#include "math.h" +#include "math_private.h" + +/* Sloppy limits. */ +#define LIM0 2.85715 /* ~ 1. / 0.35 */ +#define LIM1 7 /* x**2 = - log(sqrt(pi) * x) - (1 - p) * log(2) */ +#define LIM2 108 /* x**2 = - log(sqrt(pi) * x) - emin */ + +/* XXX Prevent compilers from erroneously constant folding these: */ +static const volatile long double tiny = 0x1p-10000L; + +static const double +half= 0.5, +one = 1, +two = 2; + +/* + * Domain [0, 0.84375], range ~[-1.573e-22, 1.524e-22]: + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-73 + */ +static const union IEEEl2bits +efxu = LD80C(0x8375d410a6db446c, -3, 1.28379167095512573902e-1L), +efx8u= LD80C(0x8375d410a6db446c, 0, 1.02703333676410059122L), +pp0u = LD80C(0x8375d410a6db446c, -3, 1.28379167095512573902e-1L), +pp1u = LD80C(0xa46c3b45612b6a1e, 32766, -3.21138241023240692798e-1L), +pp2u = LD80C(0x9b3719d3c4ac4895, 32763, -3.78943451762934832069e-2L), +pp3u = LD80C(0x804ca11f3c7b884e, 32761, -7.83076986858308357277e-3L), +pp4u = LD80C(0x9f4ea36c9af744be, 32756, -3.03854334175109160953e-4L), +pp5u = LD80C(0x9ec5abb84f9e646d, 32752, -1.89271153794195656738e-5L), +qq1u = LD80C(0xdb4d8f02145058e4, -2, 4.28326100353076006451e-1L), +qq2u = LD80C(0xa5783a50a8d83c35, -4, 8.07957225671139745698e-2L), +qq3u = LD80C(0x8b8a0942511cd01a, -7, 8.51679710582846710136e-3L), +qq4u = LD80C(0x873879d9502010a8, -11, 5.15825688424454091100e-4L), +qq5u = LD80C(0x834af3bb90a69cc0, -16, 1.56513192135679610005e-5L), +qq6u = LD80C(0xf59f11a3404946fb, -24, 1.14376360028953741942e-7L); +#define efx (efxu.e) +#define efx8 (efx8u.e) +#define pp0 (pp0u.e) +#define pp1 (pp1u.e) +#define pp2 (pp2u.e) +#define pp3 (pp3u.e) +#define pp4 (pp4u.e) +#define pp5 (pp5u.e) +#define qq1 (qq1u.e) +#define qq2 (qq2u.e) +#define qq3 (qq3u.e) +#define qq4 (qq4u.e) +#define qq5 (qq5u.e) +#define qq6 (qq6u.e) +/* + * Domain [0.84375, 1.25], range ~[-8.396e-22, 8.417e-22]: + * |(erf(x) - erx) - p(x)/q(x)| < 2**-71 + */ +static const union IEEEl2bits +erxu = LD80C(0xd7bb3d0000000000, -1, 8.42700779438018798828e-1L), +pa0u = LD80C(0xe8211158d9ff1d17, -27, 1.35116960705129358047e-8L), +pa1u = LD80C(0xd488f89f5a81a191, -2, 4.15107507183395851624e-1L), +pa2u = LD80C(0xebe046a54d22eaa5, 32764, -1.15173866195293202610e-1L), +pa3u = LD80C(0xc7e7f3955bcda99b, -4, 9.76103811221723127828e-2L), +pa4u = LD80C(0x99ef8f412193ce21, -5, 3.75819774509169158152e-2L), +pa5u = LD80C(0xaa117d1eca526d19, 32760, -5.19007310284054296686e-3L), +pa6u = LD80C(0x872c6dc079ba8e90, -8, 4.12516936280645110337e-3L), +pa7u = LD80C(0xdb1858d6cf405001, -13, 2.08945375039182583868e-4L), +qa1u = LD80C(0xb8f8ac7beb11603e, -1, 7.22544460538363523291e-1L), +qa2u = LD80C(0x9fd5ca8e1b934583, -1, 6.24355945295588098283e-1L), +qa3u = LD80C(0x9d6078f0c864ecd2, -2, 3.07376651184492511034e-1L), +qa4u = LD80C(0x883087d5c1cfe6f2, -3, 1.32997629567810717548e-1L), +qa5u = LD80C(0xbb4f5c04cb521d1a, -5, 4.57299799424686600875e-2L), +qa6u = LD80C(0xa706ad813a9b5ab9, -7, 1.01944631941621374419e-2L), +qa7u = LD80C(0x90985f65f597dbcf, -9, 2.20634774264849368479e-3L); +#define erx (erxu.e) +#define pa0 (pa0u.e) +#define pa1 (pa1u.e) +#define pa2 (pa2u.e) +#define pa3 (pa3u.e) +#define pa4 (pa4u.e) +#define pa5 (pa5u.e) +#define pa6 (pa6u.e) +#define pa7 (pa7u.e) +#define qa1 (qa1u.e) +#define qa2 (qa2u.e) +#define qa3 (qa3u.e) +#define qa4 (qa4u.e) +#define qa5 (qa5u.e) +#define qa6 (qa6u.e) +#define qa7 (qa7u.e) +/* + * Domain [1.25,2.85715], range ~[-2.469e-22,2.475e-22]: + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-71 + */ +static const union IEEEl2bits +ra0u = LD80C(0xa1a091e11c233978, 32761, -9.86494298962466012389e-3L), +ra1u = LD80C(0xc2aa33978c3ef213, 32767, -7.60409569262488622607e-1L), +ra2u = LD80C(0xf2b156cdd7197a15, 32771, -1.51682956734308320894e+1L), +ra3u = LD80C(0x811f905104c4f471, 32775, -1.29123295844711678773e+2L), +ra4u = LD80C(0x87097aae3b0dd531, 32777, -5.40148112828893618931e+2L), +ra5u = LD80C(0x8fbad259dd4823ef, 32778, -1.14983817761630154830e+3L), +ra6u = LD80C(0x97d998d331edd1a6, 32778, -1.21479990539314035369e+3L), +ra7u = LD80C(0x923a52bb507131c5, 32777, -5.84911299542004396834e+2L), +ra8u = LD80C(0xd4ee8a37aee580b4, 32774, -1.06465898265933675587e+2L), +ra9u = LD80C(0x92b266b05a914543, 32770, -4.58427748149964800614L), +sa1u = LD80C(0xd32e06f57a5beb78, 4, 2.63974742105451605084e+1L), +sa2u = LD80C(0x838103b3fd116e2e, 8, 2.63007925509041215972e+2L), +sa3u = LD80C(0x9f5a7c5381e040e7, 10, 1.27482767653814701425e+3L), +sa4u = LD80C(0xca50d95a53553c99, 11, 3.23705306465675292871e+3L), +sa5u = LD80C(0x874bbb386365a447, 12, 4.32946641614583907964e+3L), +sa6u = LD80C(0xb673625fae9ceb9b, 11, 2.91921151703079406192e+3L), +sa7u = LD80C(0xdf7599ee57dd0b93, 9, 8.93837520204357297959e+2L), +sa8u = LD80C(0xc76f0ea852bba805, 6, 9.97169087029449912399e+1L), +sa9u = LD80C(0x92523c9ccba4ec38, 1, 2.28626933395442112455L); +#define ra0 (ra0u.e) +#define ra1 (ra1u.e) +#define ra2 (ra2u.e) +#define ra3 (ra3u.e) +#define ra4 (ra4u.e) +#define ra5 (ra5u.e) +#define ra6 (ra6u.e) +#define ra7 (ra7u.e) +#define ra8 (ra8u.e) +#define ra9 (ra9u.e) +#define sa1 (sa1u.e) +#define sa2 (sa2u.e) +#define sa3 (sa3u.e) +#define sa4 (sa4u.e) +#define sa5 (sa5u.e) +#define sa6 (sa6u.e) +#define sa7 (sa7u.e) +#define sa8 (sa8u.e) +#define sa9 (sa9u.e) +/* + * Domain [2.85715,11], range ~[-8.323e-22,8.390e-22]: + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-70 + */ +static const union IEEEl2bits +rb0u = LD80C(0xa1a091cf43aed35d, 32761, -9.86494292470301435071e-3L), +rb1u = LD80C(0xd17dedaf18eda123, 32767, -8.18327765701659009211e-1L), +rb2u = LD80C(0x9a04d7d229bec743, 32772, -1.92523647707929260794e+1L), +rb3u = LD80C(0xbf4b6037d8d723c7, 32775, -1.91294436922492550143e+2L), +rb4u = LD80C(0xdcc0f0c2a30e05bc, 32777, -8.83014694842560938559e+2L), +rb5u = LD80C(0xe5fb64aebeda2e08, 32778, -1.83985604035640557274e+3L), +rb6u = LD80C(0xb946be95d55f5988, 32778, -1.48221076480554874966e+3L), +rb7u = LD80C(0x941fb05d9911fb04, 32776, -2.96247569751492811574e+2L), +sb1u = LD80C(0x81130861eae89202, 5, 3.22685866641642338307e+1L), +sb2u = LD80C(0xbdb73fddefb0a7a7, 8, 3.79431636564289246766e+2L), +sb3u = LD80C(0x80024bbf0b6ea3b3, 11, 2.04814349274125764278e+3L), +sb4u = LD80C(0xa444b434a0a8549b, 12, 5.25658799100412086380e+3L), +sb5u = LD80C(0xbb6ff428247dfadc, 12, 5.99799421719083271221e+3L), +sb6u = LD80C(0x9b76cbd808a6f2e2, 11, 2.48742476657275909036e+3L), +sb7u = LD80C(0xd3e4fe61373ce96a, 7, 2.11894506526933502549e+2L); +#define rb0 (rb0u.e) +#define rb1 (rb1u.e) +#define rb2 (rb2u.e) +#define rb3 (rb3u.e) +#define rb4 (rb4u.e) +#define rb5 (rb5u.e) +#define rb6 (rb6u.e) +#define rb7 (rb7u.e) +#define sb1 (sb1u.e) +#define sb2 (sb2u.e) +#define sb3 (sb3u.e) +#define sb4 (sb4u.e) +#define sb5 (sb5u.e) +#define sb6 (sb6u.e) +#define sb7 (sb7u.e) +/* + * Domain [7,108], range ~[-4.880e-22,4.903e-22]: + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-71 + */ +static const union IEEEl2bits +rc0u = LD80C(0xa1a091cf437a17ad, 32761, -9.86494292470008707260e-3L), +rc1u = LD80C(0xbe73398d6aa320bf, 32767, -7.43945691133046437622e-1L), +rc2u = LD80C(0xdb0b9c6c7c98cf95, 32771, -1.36903347242128353893e+1L), +rc3u = LD80C(0xb5cc0cecf47b4b20, 32774, -9.08985361145438643948e+1L), +rc4u = LD80C(0xd749186e6457c66b, 32775, -2.15285529040811320620e+2L), +rc5u = LD80C(0xfdd1925f754a9031, 32774, -1.26909319861476689424e+2L), +sc1u = LD80C(0xc5d40fb999d75384, 4, 2.47285456180642684368e+1L), +sc2u = LD80C(0xc5c6fe0e300c822b, 7, 1.97777314078074149137e+2L), +sc3u = LD80C(0x961dc355db85da06, 9, 6.00465047325494566954e+2L), +sc4u = LD80C(0x997ad30640d6bb4d, 9, 6.13919129908866058554e+2L), +sc5u = LD80C(0xf7836d4d36940f8a, 6, 1.23756693280136487692e+2L); +#define rc0 (rc0u.e) +#define rc1 (rc1u.e) +#define rc2 (rc2u.e) +#define rc3 (rc3u.e) +#define rc4 (rc4u.e) +#define rc5 (rc5u.e) +#define sc1 (sc1u.e) +#define sc2 (sc2u.e) +#define sc3 (sc3u.e) +#define sc4 (sc4u.e) +#define sc5 (sc5u.e) + +long double +erfl(long double x) +{ + int32_t i; + long double ax,R,S,P,Q,s,y,z,r; + uint64_t lx; + uint16_t hx; + + EXTRACT_LDBL80_WORDS(hx, lx, x); + + /* erf(+-0) = +-0 */ + if ((hx == 0 || hx == 0x8000) && lx == 0) + return (x); + + /* x is INF or NaN */ + if (hx >= 0x7fff) { + if (hx == 0xffff) + return (-1); + if (lx == 0x8000000000000000) + return (1); + return (x * x); + } + + ENTERI(); + + ax = fabsl(x); + if(ax < 0.84375) { + if(ax < 0x1.p-34L) { + if (ax < 0x1.p-16373L) + RETURNI((8*x+efx8*x)/8); /* avoid spurious underflow */ + RETURNI(x + efx*x); + } + z = x*x; + r = pp0+z*(pp1+z*(pp2+z*(pp3+z*(pp4+z*pp5)))); + s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*(qq5+z*qq6))))); + y = r/s; + RETURNI(x + x*y); + } + + if(ax < 1.25) { + s = ax-one; + P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*(pa6+s*pa7)))))); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*(qa6+s*qa7)))))); + if(x>=0) RETURNI(erx + P/Q); else RETURNI(-erx - P/Q); + } + + if (ax < LIM1) { /* |x| < 7 */ + s = one/(ax*ax); + if(ax < LIM0) { /* |x| < 2.85715 */ + R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(ra5+s*(ra6+s*(ra7+ + s*(ra8+s*ra9)))))))); + S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(sa5+s*(sa6+s*(sa7+ + s*(sa8+s*sa9)))))))); + } else { + R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(rb5+s*(rb6+s*rb7)))))); + S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(sb5+s*(sb6+s*sb7)))))); + } + z=(float)ax; + r=expl(-z*z-0.5625)*expl((z-ax)*(z+ax)+R/S); + if(x>=0) RETURNI(one-r/ax); else RETURNI(r/ax-one); + } else { + if(x>=0) RETURNI(one-tiny); else RETURNI(tiny-one); + } +} + +long double +erfcl(long double x) +{ + long double ax,R,S,P,Q,s,y,z,r; + uint64_t lx; + uint16_t hx; + + EXTRACT_LDBL80_WORDS(hx, lx, x); + + /* x is INF or NaN */ + if (hx >= 0x7fff) { + if (hx == 0xffff) + return (2); + if (lx == 0x8000000000000000) + return (0); + return (x * x); + } + + ENTERI(); + + ax = fabsl(x); + if(ax < 0.84375L) { + if(ax < 0x1.p-34L) + RETURNI(one-x); + z = x*x; + r = pp0+z*(pp1+z*(pp2+z*(pp3+z*(pp4+z*pp5)))); + s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*(qq5+z*qq6))))); + y = r/s; + if(ax < 0.25L) { /* x<1/4 */ + RETURNI(one-(x+x*y)); + } else { + r = x*y; + r += (x-half); + RETURNI(half - r); + } + } + + if(ax < 1.25L) { + s = ax-one; + P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*(pa6+s*pa7)))))); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*(qa6+s*qa7)))))); + if(x>=0) { + z = one-erx; RETURNI(z - P/Q); + } else { + z = (erx+P/Q); RETURNI(one+z); + } + } + + if (ax < LIM2) { /* |x|<108 */ + s = one/(ax*ax); + if(ax < LIM0) { /* |x| < 2.85715 */ + R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(ra5+s*(ra6+s*(ra7+ + s*(ra8+s*ra9)))))))); + S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(sa5+s*(sa6+s*(sa7+ + s*(sa8+s*sa9)))))))); + } else if (ax < LIM1) { /* | |x| < 7 */ + R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(rb5+s*(rb6+s*rb7)))))); + S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(sb5+s*(sb6+s*sb7)))))); + } else { + if(x < -LIM1) RETURNI(two-tiny);/* x < -7 */ + R=rc0+s*(rc1+s*(rc2+s*(rc3+s*(rc4+s*rc5)))); + S=one+s*(sc1+s*(sc2+s*(sc3+s*(sc4+s*sc5)))); + } + z=(float)ax; + r=expl(-z*z-0.5625)*expl((z-ax)*(z+ax)+R/S); + if(x>0) RETURNI(r/ax); else RETURNI(two-r/ax); + } else { + if(x>0) RETURNI(tiny*tiny); else RETURNI(two-tiny); + } +} Property changes on: ld80/s_erfl.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Index: src/imprecise.c =================================================================== --- src/imprecise.c (revision 258855) +++ src/imprecise.c (working copy) @@ -61,8 +61,6 @@ DECLARE_WEAK(f ## l) DECLARE_IMPRECISE(cosh); -DECLARE_IMPRECISE(erfc); -DECLARE_IMPRECISE(erf); DECLARE_IMPRECISE(lgamma); DECLARE_IMPRECISE(sinh); DECLARE_IMPRECISE(tanh); Index: src/s_erf.c =================================================================== --- src/s_erf.c (revision 258855) +++ src/s_erf.c (working copy) @@ -121,8 +121,8 @@ /* * Coefficients for approximation to erf on [0,0.84375] */ -efx = 1.28379167095512586316e-01, /* 0x3FC06EBA, 0x8214DB69 */ -efx8= 1.02703333676410069053e+00, /* 0x3FF06EBA, 0x8214DB69 */ +efx = 1.28379167095512586316e-01, /* 0x3FC06EBA, 0x8214DB69 */ +efx8= 1.02703333676410069053e+00, /* 0x3FF06EBA, 0x8214DB69 */ pp0 = 1.28379167095512558561e-01, /* 0x3FC06EBA, 0x8214DB68 */ pp1 = -3.25042107247001499370e-01, /* 0xBFD4CD7D, 0x691CB913 */ pp2 = -2.84817495755985104766e-02, /* 0xBF9D2A51, 0xDBD7194F */ @@ -201,7 +201,7 @@ if(ix < 0x3feb0000) { /* |x|<0.84375 */ if(ix < 0x3e300000) { /* |x|<2**-28 */ if (ix < 0x00800000) - return (8*x+efx8*x)/8; /* avoid spurious underflow */ + return (8*x+efx8*x)/8; /* avoid spurious underflow */ return x + efx*x; } z = x*x; @@ -238,6 +238,10 @@ if(hx>=0) return one-r/x; else return r/x-one; } +#if (LDBL_MANT_DIG == 53) +__weak_reference(erf, erfl); +#endif + double erfc(double x) { @@ -299,3 +303,7 @@ if(hx>0) return tiny*tiny; else return two-tiny; } } + +#if (LDBL_MANT_DIG == 53) +__weak_reference(erfc, erfcl); +#endif Index: src/s_erff.c =================================================================== --- src/s_erff.c (revision 258855) +++ src/s_erff.c (working copy) @@ -21,62 +21,54 @@ static const float tiny = 1e-30, -half= 5.0000000000e-01, /* 0x3F000000 */ -one = 1.0000000000e+00, /* 0x3F800000 */ -two = 2.0000000000e+00, /* 0x40000000 */ +half= 5.0000000000e-1, /* 0x3f000000 */ +one = 1.0000000000e+0, /* 0x3f800000 */ +two = 2.0000000000e+0, /* 0x40000000 */ /* - * Coefficients for approximation to erf on [0,0.84375] + * Domain [0, 0.84375], range ~[-5.4419e-10,5.5179e-10]: + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31 */ -efx = 1.2837916613e-01, /* 0x3e0375d4 */ -efx8= 1.0270333290e+00, /* 0x3f8375d4 */ +efx = 1.28379166e-1, /* 0x3e0375d4 */ +efx8 = 1.02703333, /* 0x3f8375d4 */ +pp0 = 1.28379166e-1, /* 0x3e0375d4 */ +pp1 = -3.36030394e-1, /* 0xbeac0c2d */ +pp2 = -1.86261395e-3, /* 0xbaf422f4 */ +qq1 = 3.12324315e-1, /* 0x3e9fe8f9 */ +qq2 = 2.16070414e-2, /* 0x3cb10140 */ +qq3 = -1.98859372e-3, /* 0xbb025311 */ /* - * Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]: - * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31. + * Domain [0.84375, 1.25], range ~[-1.023e-9,1.023e-9]: + * |(erf(x) - erx) - p(x)/q(x)| < 2**-31 */ -pp0 = 1.28379166e-01F, /* 0x1.06eba8p-3 */ -pp1 = -3.36030394e-01F, /* -0x1.58185ap-2 */ -pp2 = -1.86260219e-03F, /* -0x1.e8451ep-10 */ -qq1 = 3.12324286e-01F, /* 0x1.3fd1f0p-2 */ -qq2 = 2.16070302e-02F, /* 0x1.620274p-6 */ -qq3 = -1.98859419e-03F, /* -0x1.04a626p-9 */ +erx = 8.42697144e-1, /* 0x3f57bb00 */ +pa0 = 3.65041046e-6, /* 0x3674f993 */ +pa1 = 4.15109307e-1, /* 0x3ed48935 */ +pa2 = -2.09395722e-1, /* 0xbe566bd5 */ +pa3 = 8.67677554e-2, /* 0x3db1b34b */ +qa1 = 4.95560974e-1, /* 0x3efdba2b */ +qa2 = 3.71248513e-1, /* 0x3ebe1449 */ +qa3 = 3.92478965e-2, /* 0x3d20c267 */ /* - * Domain [0.84375, 1.25], range ~[-1.953e-11,1.940e-11]: - * |(erf(x) - erx) - p(x)/q(x)| < 2**-36. + * Domain [1.25,1/0.35], range ~[-4.821e-9,4.927e-9]: + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-28 */ -erx = 8.42697144e-01F, /* 0x1.af7600p-1. erf(1) rounded to 16 bits. */ -pa0 = 3.64939137e-06F, /* 0x1.e9d022p-19 */ -pa1 = 4.15109694e-01F, /* 0x1.a91284p-2 */ -pa2 = -1.65179938e-01F, /* -0x1.5249dcp-3 */ -pa3 = 1.10914491e-01F, /* 0x1.c64e46p-4 */ -qa1 = 6.02074385e-01F, /* 0x1.344318p-1 */ -qa2 = 5.35934687e-01F, /* 0x1.126608p-1 */ -qa3 = 1.68576106e-01F, /* 0x1.593e6ep-3 */ -qa4 = 5.62181212e-02F, /* 0x1.cc89f2p-5 */ +ra0 = -9.88156721e-3, /* 0xbc21e64c */ +ra1 = -5.43658376e-1, /* 0xbf0b2d32 */ +ra2 = -1.66828310, /* 0xbfd58a4d */ +ra3 = -6.91554189e-1, /* 0xbf3109b2 */ +sa1 = 4.48581553, /* 0x408f8bcd */ +sa2 = 4.10799170, /* 0x408374ab */ +sa3 = 5.53855181e-1, /* 0x3f0dc974 */ /* - * Domain [1.25,1/0.35], range ~[-7.043e-10,7.457e-10]: + * Domain [2.85715, 11], range ~[-1.484e-9,1.505e-9]: * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30 */ -ra0 = -9.87132732e-03F, /* -0x1.4376b2p-7 */ -ra1 = -5.53605914e-01F, /* -0x1.1b723cp-1 */ -ra2 = -2.17589188e+00F, /* -0x1.1683a0p+1 */ -ra3 = -1.43268085e+00F, /* -0x1.6ec42cp+0 */ -sa1 = 5.45995426e+00F, /* 0x1.5d6fe4p+2 */ -sa2 = 6.69798088e+00F, /* 0x1.acabb8p+2 */ -sa3 = 1.43113089e+00F, /* 0x1.6e5e98p+0 */ -sa4 = -5.77397496e-02F, /* -0x1.d90108p-5 */ -/* - * Domain [1/0.35, 11], range ~[-2.264e-13,2.336e-13]: - * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-42 - */ -rb0 = -9.86494310e-03F, /* -0x1.434124p-7 */ -rb1 = -6.25171244e-01F, /* -0x1.401672p-1 */ -rb2 = -6.16498327e+00F, /* -0x1.8a8f16p+2 */ -rb3 = -1.66696873e+01F, /* -0x1.0ab70ap+4 */ -rb4 = -9.53764343e+00F, /* -0x1.313460p+3 */ -sb1 = 1.26884899e+01F, /* 0x1.96081cp+3 */ -sb2 = 4.51839523e+01F, /* 0x1.6978bcp+5 */ -sb3 = 4.72810211e+01F, /* 0x1.7a3f88p+5 */ -sb4 = 8.93033314e+00F; /* 0x1.1dc54ap+3 */ +rb0 = -9.86496918e-3, /* 0xbc21a0ae */ +rb1 = -5.48049808e-1, /* 0xbf0c4cfe */ +rb2 = -1.84115684, /* 0xbfebab07 */ +sb1 = 4.87132740, /* 0x409be1ea */ +sb2 = 3.04982710, /* 0x4043305e */ +sb3 = -7.61900663e-1; /* 0xbf430bec */ float erff(float x) @@ -105,7 +97,7 @@ if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = fabsf(x)-one; P = pa0+s*(pa1+s*(pa2+s*pa3)); - Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4))); + Q = one+s*(qa1+s*(qa2+s*qa3)); if(hx>=0) return erx + P/Q; else return -erx - P/Q; } if (ix >= 0x40800000) { /* inf>|x|>=4 */ @@ -113,12 +105,12 @@ } x = fabsf(x); s = one/(x*x); - if(ix< 0x4036DB6E) { /* |x| < 1/0.35 */ + if(ix< 0x4036db8c) { /* |x| < 2.85715 ~ 1/0.35 */ R=ra0+s*(ra1+s*(ra2+s*ra3)); - S=one+s*(sa1+s*(sa2+s*(sa3+s*sa4))); - } else { /* |x| >= 1/0.35 */ - R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4))); - S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4))); + S=one+s*(sa1+s*(sa2+s*sa3)); + } else { /* |x| >= 2.85715 ~ 1/0.35 */ + R=rb0+s*(rb1+s*rb2); + S=one+s*(sb1+s*(sb2+s*sb3)); } SET_FLOAT_WORD(z,hx&0xffffe000); r = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S); @@ -155,7 +147,7 @@ if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = fabsf(x)-one; P = pa0+s*(pa1+s*(pa2+s*pa3)); - Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4))); + Q = one+s*(qa1+s*(qa2+s*qa3)); if(hx>=0) { z = one-erx; return z - P/Q; } else { @@ -165,13 +157,13 @@ if (ix < 0x41300000) { /* |x|<11 */ x = fabsf(x); s = one/(x*x); - if(ix< 0x4036DB6D) { /* |x| < 1/.35 ~ 2.857143*/ + if(ix< 0x4036db8c) { /* |x| < 2.85715 ~ 1/.35 */ R=ra0+s*(ra1+s*(ra2+s*ra3)); - S=one+s*(sa1+s*(sa2+s*(sa3+s*sa4))); - } else { /* |x| >= 1/.35 ~ 2.857143 */ + S=one+s*(sa1+s*(sa2+s*sa3)); + } else { /* |x| >= 2.85715 ~ 1/.35 */ if(hx<0&&ix>=0x40a00000) return two-tiny;/* x < -5 */ - R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4))); - S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4))); + R=rb0+s*(rb1+s*rb2); + S=one+s*(sb1+s*(sb2+s*sb3)); } SET_FLOAT_WORD(z,hx&0xffffe000); r = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S); From owner-freebsd-numerics@FreeBSD.ORG Tue Dec 3 00:57:01 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 60257475 for ; Tue, 3 Dec 2013 00:57:01 +0000 (UTC) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id C3676187C for ; Tue, 3 Dec 2013 00:57:00 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 4B2AD1043FFD; Tue, 3 Dec 2013 11:56:51 +1100 (EST) Date: Tue, 3 Dec 2013 11:56:49 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: erff, erfl, and erfcl patch In-Reply-To: <20131202214240.GA44209@troutmask.apl.washington.edu> Message-ID: <20131203095936.G1230@besplex.bde.org> References: <20131202214240.GA44209@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=bpB1Wiqi c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=pwry36zkC_oA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=Y1DDuN7oUWAA:10 a=qWm97dYoU6-eoFHfUM0A:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Dec 2013 00:57:01 -0000 On Mon, 2 Dec 2013, Steve Kargl wrote: > I intend to commit the patch that follows by sig in the next day or > two. In particular, I want to get the ld80 and ld128 versions of > erfl and erfcl into tree as the current hack provided by imprecise.c > is ugly. Looks good, and a lot of work. I don' have time to look at it closely for a week or 2. Did you find translating the fdlibm code, including keeping its style bugs and algorithms to a fault, easier than using a new method like you did for expl()? > ... > * msun/src/s_erf.c: > . Fix whitespace issues. > . Add weak references for erfl and erfcl on targets where long double > has 53 bits of precisions. > > * msun/src/s_erff.c: > . Consistently use lower case in hex constants. > . Update the rational approximations. > . Fix descriptions of the the domain and range. > . Remove leading zeros in exponents, e.g., 1.28379166e-01 becomes > 1.28379166e-1. Some style is still inconsistent. Better not change it without making it fully consistent. > Index: ld128/s_erfl.c > =================================================================== > --- ld128/s_erfl.c (revision 0) > +++ ld128/s_erfl.c (working copy) > @@ -0,0 +1,347 @@ > ... > +/* > + * Domain [0, 0.84375], range ~[-2.076e-38, 2.074e-38]: > + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-126 > + */ > +static const long double > +efx = 1.28379167095512573896158903121545167e-1L, /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06fc3f */ > +efx8= 1.02703333676410059116927122497236133L, /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06ff3f */ > +pp0 = 1.28379167095512573896158903121545167e-1L, /* 0x3ffc06eb, 0xa8214db6, 0x88d71d48, 0xa7f6bfec */ There are really 2 subranges, one near 0 covered by a linear approximation using efx and efx8, and the expanded comment doesn't really apply to this range (though the comment might be true, p(x)/q(x) is not actually used for this subrange). Elsewhere, we often write similar comments, but we don't put the coefficients for the approximation near 0 under the scope of the comment. This approximation near 0 is especially interesting so a separate comment for it would be good. > +pp1 = -3.14930453779220199897729762882897733e-1L, /* 0xbffd427d, 0x20fdfc19, 0x5395ba59, 0x26b231dc */ > ... > +qa12= -1.02073578608782110388686097137679971e-6L; /* 0xbfeb1200, 0x6dd9de6c, 0x647bcf5b, 0xec577463 */ > ... s_erf.c uses more spaces, so that there is always at least 1 space before '=' (except for efx8, which is misformatted). In fact, there are always at least 2. There, the coeff numbers don't go above 9, so there is more space than necessary (except for lining up with the efx8 line if that were not misformatted). Here, using the same column for '=' as there is just enough to leave 1 space after the wider coeffs. > +long double > +erfl(long double x) > +{ > + int32_t i; > + long double ax,R,S,P,Q,s,y,z,r; > + uint64_t lx, llx; > + uint16_t hx; > + > + EXTRACT_LDBL128_WORDS(hx, lx, llx, x); > + > + /* erf(+-0) = +-0 */ > + if ((hx == 0 || hx == 0x8000) && (lx | llx) == 0) > + return (x); Why a special case for +-0? s_erf.c doesn't have one. I think it is correct too. erf(-0) is evaluated as 0.125*(8*-0+efx8*-0) = 0.125*(-0+-0) = -0, hopefully in all rounding modes. > + > + /* x is INF or NaN */ > + if (hx >= 0x7fff) { > + if (hx == 0xffff) > + return (-1); > + if ((lx | llx) == 0) > + return (1); > + return (x * x); > + } s_erf.c has better comments. x is _not_ INF or NaN before it is classified as such... This seems to be quite broken. hx is unsigned, so (hx >= 0x7fff) is true for all negative numbers, as well as for Infs and NaNs. This gives the bug that erfl(negative) = square(negative). (hx == 0xffff) is true for negative NaNs as well as for -Inf. This gives the bug that erfl(negative NaN) = -1. s_erf.c avoids these bugs using the following methods: - first mask out the sign bit: (ix = hx & MASK). - be more careful testing for NaNs. In the above, the (lx | llx) test would probably work if if it were not broken by being after the (hx == 0xffff) test; the latter is apparently to detect -Inf in a not so good way after not masking out the sign bit initially. Then it uses a more magic expression than x*x so as to avoid the 3 cases for +-1 and x*x. In expl.c, we eventually found an even more magic expression that avoids the (lx | llx) test. This test tends to be slow (although it is in a rarely executed case, just looking at llx sometimes confuses the compiler to load llx for all cases). The test is even more complicated for ld80, since you have to handle the normalization bit and should detected unsupported formats. So it is good to avoid testing this in bits. For the magic expression, the one in s_erf.c is already good. We don't need to look at either lx or llx. hx already consists of just the exponent (and sign) bits. This corresponds to masking off the mantissa bits in s_erf.c. (hx & 0x7fff) == 0x7fff classifies Infs and NaNs. Divide 1 by x to turn +-Inf into +-0 and not change NaNs except to quieten them. This result is not used for +-Inf, but it keeps NaNs unchanged except for quietening. Then add the integer +-1 to get the result +-1 for +-Inf and keep NaNs again unchanged except for quietening. s_erf.c calculates the +-1 as an integer. I don't know of any better way to turn +-Inf into +-1. Expressions like x/Inf don't quite work. Similarly elsewhere. > + > + ax = fabsl(x); > + if(ax < 0.84375) { I agree with not using the bit tests here, at least if you can avoid loading lx and llx at all. (The way the above is written, lx and llx are always loaded and there is no fast path for the usual case of finite x where they are not used.) > + if(ax < 0x1.p-34L) { I like to omit the "point" if possible, and there is no need to use an L suffix for small powers of 2. > + if (ax < LIM1) { Not consistently misformatted as 'if(foo)'. Why #defined constants for this and not for the others? This one varies between ld80 and ld128, so using it makes the diffs between those easier to read, but it is a literal constant for f32 and d64, so using it makes the diffs from ld* to the reference d64 harder to read. > + s = one/(ax*ax); > + if(ax < LIM0) { /* |x| < 2.85715 */ Repeating magic numbers in comments defeats the point of using #define to make them less magic. LIM0 seems to be the same for all precisions, and s_erf.c spells it less magically as 1/0.35. > Index: ld80/s_erfl.c > =================================================================== > --- ld80/s_erfl.c (revision 0) > +++ ld80/s_erfl.c (working copy) > @@ -0,0 +1,353 @@ > ... > +long double > +erfl(long double x) > ... > + /* x is INF or NaN */ > + if (hx >= 0x7fff) { > + if (hx == 0xffff) > + return (-1); > + if (lx == 0x8000000000000000) > + return (1); > + return (x * x); > + } After changing the first hx test to ((hx & 0x7fff) == 0x7fff) and removing the second hx test as above, this seems to be a correct but slow way to classify Infs and NaNs. Oops, we still need a second hx test to classify -Inf. This must be done after testing the normalization bit: /* Wrong comment intentionally left out. */ if ((hx & 0x7fff) == 0x7fff) { if (lx == 0x8000000000000000ULL) return (hx & 0x8000 ? -1 : 1); /* erf(+-Inf) = +-1 */ return (x + x); /* erf(NaN/pseudo{Inf,NaN}) = qNaN */ } I don't like the ULL suffix, but it is needed to fix warnings for non-C99 compilers, and I used it in s_logl.c. The above may be slow, but it is close to what I used in s_logl.c. There it is impossible to avoid loading lx. s_logl.c is a little more complicated, both to handle denormals and to classify unnormals as NaNs so that they get handled by the x+x return. It is better to avoid these complications by using the method in s_erf.c. "Finite" unnormals are not classified by the hx test. I think they are hanled correctly by falling though, as in expl(), since there will by an operation involving them. > Index: src/s_erf.c > =================================================================== > --- src/s_erf.c (revision 258855) > +++ src/s_erf.c (working copy) > ... > @@ -121,8 +121,8 @@ > /* > * Coefficients for approximation to erf on [0,0.84375] > */ > -efx = 1.28379167095512586316e-01, /* 0x3FC06EBA, 0x8214DB69 */ > -efx8= 1.02703333676410069053e+00, /* 0x3FF06EBA, 0x8214DB69 */ > +efx = 1.28379167095512586316e-01, /* 0x3FC06EBA, 0x8214DB69 */ > +efx8= 1.02703333676410069053e+00, /* 0x3FF06EBA, 0x8214DB69 */ > pp0 = 1.28379167095512558561e-01, /* 0x3FC06EBA, 0x8214DB68 */ > pp1 = -3.25042107247001499370e-01, /* 0xBFD4CD7D, 0x691CB913 */ > pp2 = -2.84817495755985104766e-02, /* 0xBF9D2A51, 0xDBD7194F */ This lines up the comments, but the misformatting was not lining up either the '=' or the first digit. > Index: src/s_erff.c > =================================================================== > --- src/s_erff.c (revision 258855) > +++ src/s_erff.c (working copy) > @@ -21,62 +21,54 @@ > > static const float > tiny = 1e-30, > -half= 5.0000000000e-01, /* 0x3F000000 */ > -one = 1.0000000000e+00, /* 0x3F800000 */ > -two = 2.0000000000e+00, /* 0x40000000 */ > +half= 5.0000000000e-1, /* 0x3f000000 */ > +one = 1.0000000000e+0, /* 0x3f800000 */ > +two = 2.0000000000e+0, /* 0x40000000 */ If changing these, fix the '=' and first digit to line up with later sections. > /* > - * Coefficients for approximation to erf on [0,0.84375] > + * Domain [0, 0.84375], range ~[-5.4419e-10,5.5179e-10]: > + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31 > */ > -efx = 1.2837916613e-01, /* 0x3e0375d4 */ > -efx8= 1.0270333290e+00, /* 0x3f8375d4 */ > +efx = 1.28379166e-1, /* 0x3e0375d4 */ > +efx8 = 1.02703333, /* 0x3f8375d4 */ > +pp0 = 1.28379166e-1, /* 0x3e0375d4 */ This does line up the '=' and first digit. So the formatting seems consistent in this file except for half/one/two, but may be too different from s_erfc.c > /* > - * Domain [0.84375, 1.25], range ~[-1.953e-11,1.940e-11]: > - * |(erf(x) - erx) - p(x)/q(x)| < 2**-36. > + * Domain [1.25,1/0.35], range ~[-4.821e-9,4.927e-9]: > + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-28 > */ > -erx = 8.42697144e-01F, /* 0x1.af7600p-1. erf(1) rounded to 16 bits. */ > -pa0 = 3.64939137e-06F, /* 0x1.e9d022p-19 */ > -pa1 = 4.15109694e-01F, /* 0x1.a91284p-2 */ > -pa2 = -1.65179938e-01F, /* -0x1.5249dcp-3 */ > -pa3 = 1.10914491e-01F, /* 0x1.c64e46p-4 */ > -qa1 = 6.02074385e-01F, /* 0x1.344318p-1 */ > -qa2 = 5.35934687e-01F, /* 0x1.126608p-1 */ > -qa3 = 1.68576106e-01F, /* 0x1.593e6ep-3 */ > -qa4 = 5.62181212e-02F, /* 0x1.cc89f2p-5 */ > +ra0 = -9.88156721e-3, /* 0xbc21e64c */ > +ra1 = -5.43658376e-1, /* 0xbf0b2d32 */ Nah, this becomes inconsistent again by removing the space bnefore the '=' that is needed to like up with the longer variable name efx8 and a space after that before the '='. > @@ -113,12 +105,12 @@ > } > x = fabsf(x); > s = one/(x*x); > - if(ix< 0x4036DB6E) { /* |x| < 1/0.35 */ > + if(ix< 0x4036db8c) { /* |x| < 2.85715 ~ 1/0.35 */ Don't give more detail than s_erf.c. We are only keeping these comments (that should be identical) because removing them would make the diffs harder to read). This also changes the style. s_erf.c still uses an upper case hex constant here. Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue Dec 3 04:01:29 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 59CD1E7F for ; Tue, 3 Dec 2013 04:01:29 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2FD74127A for ; Tue, 3 Dec 2013 04:01:29 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB341Sdv046511; Mon, 2 Dec 2013 20:01:28 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB341R1r046510; Mon, 2 Dec 2013 20:01:27 -0800 (PST) (envelope-from sgk) Date: Mon, 2 Dec 2013 20:01:27 -0800 From: Steve Kargl To: Bruce Evans Subject: Re: erff, erfl, and erfcl patch Message-ID: <20131203040127.GA46357@troutmask.apl.washington.edu> References: <20131202214240.GA44209@troutmask.apl.washington.edu> <20131203095936.G1230@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131203095936.G1230@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Dec 2013 04:01:29 -0000 On Tue, Dec 03, 2013 at 11:56:49AM +1100, Bruce Evans wrote: > On Mon, 2 Dec 2013, Steve Kargl wrote: > > > I intend to commit the patch that follows by sig in the next day or > > two. In particular, I want to get the ld80 and ld128 versions of > > erfl and erfcl into tree as the current hack provided by imprecise.c > > is ugly. > > Looks good, and a lot of work. I don' have time to look at it closely > for a week or 2. For not having time to look over the diff closely, you've pointed out a few things I can fix/investigate. > Did you find translating the fdlibm code, including keeping its style > bugs and algorithms to a fault, easier than using a new method like > you did for expl()? Interesting question. I have 2 more years of experience with fdlibm code since I went looking for a different method for expl() (and logl()). I find that I have to go through fdlibm code and add comments and trace the code see what is happen. I did look at replacing the rational approximations with simple minimax polynomials over different domains. minimax polynomials were faster than the rational approximations, but I couldn't get the max ULP below 1.4 for erff. > > ... > > * msun/src/s_erf.c: > > . Fix whitespace issues. > > . Add weak references for erfl and erfcl on targets where long double > > has 53 bits of precisions. > > > > * msun/src/s_erff.c: > > . Consistently use lower case in hex constants. > > . Update the rational approximations. > > . Fix descriptions of the the domain and range. > > . Remove leading zeros in exponents, e.g., 1.28379166e-01 becomes > > 1.28379166e-1. > > Some style is still inconsistent. Better not change it without making > it fully consistent. I can leave s_erff.c for a later commit. I'm much more interested in getting ld80 and ld128 committed. > > Index: ld128/s_erfl.c > > =================================================================== > > --- ld128/s_erfl.c (revision 0) > > +++ ld128/s_erfl.c (working copy) > > @@ -0,0 +1,347 @@ > > ... > > +/* > > + * Domain [0, 0.84375], range ~[-2.076e-38, 2.074e-38]: > > + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-126 > > + */ > > +static const long double > > +efx = 1.28379167095512573896158903121545167e-1L, /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06fc3f */ > > +efx8= 1.02703333676410059116927122497236133L, /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06ff3f */ > > +pp0 = 1.28379167095512573896158903121545167e-1L, /* 0x3ffc06eb, 0xa8214db6, 0x88d71d48, 0xa7f6bfec */ > > There are really 2 subranges, one near 0 covered by a linear approximation > using efx and efx8, and the expanded comment doesn't really apply to this > range (though the comment might be true, p(x)/q(x) is not actually used for > this subrange). Elsewhere, we often write similar comments, but we don't > put the coefficients for the approximation near 0 under the scope of the > comment. This approximation near 0 is especially interesting so a separate > comment for it would be good. Yes, of course, you're right. I'll expand the comment to note the subrange is not included in the error estimate, and I'll update the comment to use pp(x)/qq(x). > > +pp1 = -3.14930453779220199897729762882897733e-1L, /* 0xbffd427d, 0x20fdfc19, 0x5395ba59, 0x26b231dc */ > > ... > > +qa12= -1.02073578608782110388686097137679971e-6L; /* 0xbfeb1200, 0x6dd9de6c, 0x647bcf5b, 0xec577463 */ > > ... > > s_erf.c uses more spaces, so that there is always at least 1 space before > '=' (except for efx8, which is misformatted). In fact, there are always > at least 2. There, the coeff numbers don't go above 9, so there is more > space than necessary (except for lining up with the efx8 line if that were > not misformatted). Here, using the same column for '=' as there is just > enough to leave 1 space after the wider coeffs. OK. I'll align the '=' and ensure at least one space precedes it. > > > +long double > > +erfl(long double x) > > +{ > > + int32_t i; > > + long double ax,R,S,P,Q,s,y,z,r; > > + uint64_t lx, llx; > > + uint16_t hx; > > + > > + EXTRACT_LDBL128_WORDS(hx, lx, llx, x); > > + > > + /* erf(+-0) = +-0 */ > > + if ((hx == 0 || hx == 0x8000) && (lx | llx) == 0) > > + return (x); > > Why a special case for +-0? s_erf.c doesn't have one. I think it is correct > too. erf(-0) is evaluated as 0.125*(8*-0+efx8*-0) = 0.125*(-0+-0) = -0, > hopefully in all rounding modes. I'll check again, but I recall that n1256.pdf says erf(+-0) = +-0 is exact. Does falling through avoid raising any exception? > > + > > + /* x is INF or NaN */ > > + if (hx >= 0x7fff) { > > + if (hx == 0xffff) > > + return (-1); > > + if ((lx | llx) == 0) > > + return (1); > > + return (x * x); > > + } > > s_erf.c has better comments. x is _not_ INF or NaN before it is classified > as such... The comment isn't for s_erf.c. It for me. :-) I can move inside the 'if (hx >= 0x7fff)'. > This seems to be quite broken. hx is unsigned, so (hx >= 0x7fff) is true > for all negative numbers, as well as for Infs and NaNs. This gives the > bug that erfl(negative) = square(negative). (hx == 0xffff) is true for > negative NaNs as well as for -Inf. This gives the bug that > erfl(negative NaN) = -1. Possibly, it broken. I changed the previous method that I used to the above, because you weren't happy with it either. > > s_erf.c avoids these bugs using the following methods: > - first mask out the sign bit: (ix = hx & MASK). > - be more careful testing for NaNs. In the above, the (lx | llx) test > would probably work if if it were not broken by being after the > (hx == 0xffff) test; the latter is apparently to detect -Inf in a > not so good way after not masking out the sign bit initially. > Then it uses a more magic expression than x*x so as to avoid the 3 cases > for +-1 and x*x. I don't like magic fdlibm expressions. :) > In expl.c, we eventually found an even more magic expression that > avoids the (lx | llx) test. This test tends to be slow (although it > is in a rarely executed case, just looking at llx sometimes confuses > the compiler to load llx for all cases). The test is even more > complicated for ld80, since you have to handle the normalization bit > and should detected unsupported formats. So it is good to avoid testing > this in bits. > > For the magic expression, the one in s_erf.c is already good. We don't > need to look at either lx or llx. hx already consists of just the > exponent (and sign) bits. This corresponds to masking off the mantissa > bits in s_erf.c. (hx & 0x7fff) == 0x7fff classifies Infs and NaNs. > Divide 1 by x to turn +-Inf into +-0 and not change NaNs except to > quieten them. This result is not used for +-Inf, but it keeps NaNs > unchanged except for quietening. Then add the integer +-1 to get the > result +-1 for +-Inf and keep NaNs again unchanged except for > quietening. s_erf.c calculates the +-1 as an integer. I don't know > of any better way to turn +-Inf into +-1. Expressions like x/Inf > don't quite work. I'll look at it again, but I found these magic expressions to be a tad too ugly for my taste. > > + if (ax < LIM1) { > > Not consistently misformatted as 'if(foo)'. > > Why #defined constants for this and not for the others? This one varies > between ld80 and ld128, so using it makes the diffs between those easier > to read, but it is a literal constant for f32 and d64, so using it makes > the diffs from ld* to the reference d64 harder to read. I can't remember why I used the #defines other than I flipped between using 1/0.35 and 2.85715. I probably got tired of editing the different locations. > > + s = one/(ax*ax); > > + if(ax < LIM0) { /* |x| < 2.85715 */ > > Repeating magic numbers in comments defeats the point of using #define to > make them less magic. LIM0 seems to be the same for all precisions, and > s_erf.c spells it less magically as 1/0.35. 1/0.35 is exactly representable. 2.85715 is exactly representable. > > Index: ld80/s_erfl.c > > =================================================================== > > --- ld80/s_erfl.c (revision 0) > > +++ ld80/s_erfl.c (working copy) > > @@ -0,0 +1,353 @@ > > ... > > +long double > > +erfl(long double x) > > ... > > + /* x is INF or NaN */ > > + if (hx >= 0x7fff) { > > + if (hx == 0xffff) > > + return (-1); > > + if (lx == 0x8000000000000000) > > + return (1); > > + return (x * x); > > + } > > After changing the first hx test to ((hx & 0x7fff) == 0x7fff) and removing > the second hx test as above, this seems to be a correct but slow way to > classify Infs and NaNs. Oops, we still need a second hx test to classify > -Inf. This must be done after testing the normalization bit: > > /* Wrong comment intentionally left out. */ > if ((hx & 0x7fff) == 0x7fff) { > if (lx == 0x8000000000000000ULL) > return (hx & 0x8000 ? -1 : 1); /* erf(+-Inf) = +-1 */ > return (x + x); /* erf(NaN/pseudo{Inf,NaN}) = qNaN */ > } > > I don't like the ULL suffix, but it is needed to fix warnings for > non-C99 compilers, and I used it in s_logl.c. The above may be slow, > but it is close to what I used in s_logl.c. There it is impossible > to avoid loading lx. s_logl.c is a little more complicated, both to > handle denormals and to classify unnormals as NaNs so that they get > handled by the x+x return. I vaguely recall using a "xxx == 0x7ff" type of test. I may have it in a different msun tree or worse lost it. I go over the special case handling. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Thu Dec 5 17:37:06 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A23E648A for ; Thu, 5 Dec 2013 17:37:06 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 825A91E69 for ; Thu, 5 Dec 2013 17:37:06 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB5Hb02c064655 for ; Thu, 5 Dec 2013 09:37:00 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB5Hb0xQ064654 for freebsd-numerics@freebsd.org; Thu, 5 Dec 2013 09:37:00 -0800 (PST) (envelope-from sgk) Date: Thu, 5 Dec 2013 09:37:00 -0800 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: dead code in lgamma_r[f]? Message-ID: <20131205173700.GA64575@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 17:37:06 -0000 In reviewing the implementation details for lgamma_r[f], I noticed that in src/e_lgamma.c (and similar code in e_lgammaf.c): static double sin_pi(double x) { ... } else { if(ix>=0x43400000) { y = zero; n = 0; /* y must be even */ } else { if(ix<0x43300000) z = y+two52; /* exact */ ... } double __ieee754_lgamma_r(double x, int *signgamp) { ... if(hx<0) { if(ix>=0x43300000) /* |x|>=2**52, must be -integer */ return one/zero; t = sin_pi(x); ... } sin_pi is only called as shown above. It is appears that the 'if(ix>=0x43400000)' block is dead code. It also appears that (if I'm reading the code correctly) the 'if(ix<0x43300000)' is always true. Thus, if seems that following patch (note cut-n-paste whitespace munging) should be appropriate: Index: src/e_lgamma_r.c =================================================================== --- src/e_lgamma_r.c (revision 1427) +++ src/e_lgamma_r.c (working copy) @@ -177,15 +177,11 @@ y = 2.0*(y - floor(y)); /* y = |x| mod 2.0 */ n = (int) (y*4.0); } else { - if(ix>=0x43400000) { - y = zero; n = 0; /* y must be even */ - } else { - if(ix<0x43300000) z = y+two52; /* exact */ - GET_LOW_WORD(n,z); - n &= 1; - y = n; - n<<= 2; - } + z = y+two52; /* exact */ + GET_LOW_WORD(n,z); + n &= 1; + y = n; + n<<= 2; } switch (n) { case 0: y = __kernel_sin(pi*y,zero,0); break; Am I mssing something here? -- Steve From owner-freebsd-numerics@FreeBSD.ORG Thu Dec 5 18:23:25 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B39195B0 for ; Thu, 5 Dec 2013 18:23:25 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 777C3119A for ; Thu, 5 Dec 2013 18:23:25 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB5INOwT065173 for ; Thu, 5 Dec 2013 10:23:24 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB5INOmH065172 for freebsd-numerics@freebsd.org; Thu, 5 Dec 2013 10:23:24 -0800 (PST) (envelope-from sgk) Date: Thu, 5 Dec 2013 10:23:24 -0800 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: Re: dead code in lgamma_r[f]? Message-ID: <20131205182324.GA65135@troutmask.apl.washington.edu> References: <20131205173700.GA64575@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131205173700.GA64575@troutmask.apl.washington.edu> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 18:23:25 -0000 On Thu, Dec 05, 2013 at 09:37:00AM -0800, Steve Kargl wrote: > In reviewing the implementation details for lgamma_r[f], > I noticed that in src/e_lgamma.c (and similar code in > e_lgammaf.c): These should be e_lgamma_r.c and e_lgammaf_r.c. I've only done some limited testing, but the patch following my sig seems to work. Note the first part of the diff allows us to remove the cast to float in e_lgamma_r.c, which minimizes the diff between e_lgamma_r.c and e_lgammaf_r.c (and e_lgammal_r.c :). -- Steve Index: src/e_lgamma_r.c =================================================================== --- src/e_lgamma_r.c (revision 1427) +++ src/e_lgamma_r.c (working copy) @@ -173,19 +173,15 @@ */ z = floor(y); if(z!=y) { /* inexact anyway */ - y *= 0.5; - y = 2.0*(y - floor(y)); /* y = |x| mod 2.0 */ - n = (int) (y*4.0); + y /= 2; + y = 2*(y - floor(y)); /* y = |x| mod 2.0 */ + n = (int)(y*4); } else { - if(ix>=0x43400000) { - y = zero; n = 0; /* y must be even */ - } else { - if(ix<0x43300000) z = y+two52; /* exact */ - GET_LOW_WORD(n,z); - n &= 1; - y = n; - n<<= 2; - } + z = y+two52; /* exact */ + GET_LOW_WORD(n,z); + n &= 1; + y = n; + n<<= 2; } switch (n) { case 0: y = __kernel_sin(pi*y,zero,0); break; Index: src/e_lgammaf_r.c =================================================================== --- src/e_lgammaf_r.c (revision 1427) +++ src/e_lgammaf_r.c (working copy) @@ -106,19 +106,15 @@ */ z = floorf(y); if(z!=y) { /* inexact anyway */ - y *= (float)0.5; - y = (float)2.0*(y - floorf(y)); /* y = |x| mod 2.0 */ - n = (int) (y*(float)4.0); + y /= 2; + y = 2*(y - floorf(y)); /* y = |x| mod 2.0 */ + n = (int)(y*4); } else { - if(ix>=0x4b800000) { - y = zero; n = 0; /* y must be even */ - } else { - if(ix<0x4b000000) z = y+two23; /* exact */ - GET_FLOAT_WORD(n,z); - n &= 1; - y = n; - n<<= 2; - } + z = y+two23; /* exact */ + GET_FLOAT_WORD(n,z); + n &= 1; + y = n; + n<<= 2; } switch (n) { case 0: y = __kernel_sindf(pi*y); break; From owner-freebsd-numerics@FreeBSD.ORG Thu Dec 5 19:07:03 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6E02B58D for ; Thu, 5 Dec 2013 19:07:03 +0000 (UTC) Received: from mail-pb0-x22a.google.com (mail-pb0-x22a.google.com [IPv6:2607:f8b0:400e:c01::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 43A061436 for ; Thu, 5 Dec 2013 19:07:03 +0000 (UTC) Received: by mail-pb0-f42.google.com with SMTP id uo5so26716675pbc.1 for ; Thu, 05 Dec 2013 11:07:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=8MMGfgT+QboKsBWjYJ69xdpKAI/ADW4b2OcY13Mou5E=; b=Z2mDWR2Lwj+Cm+ME+x+bA1crJ78WWffrRcACRUYZbJcYit+rWbam4u7aPaSk9o9c2+ a86btcRQx4blkeeRHE8OGIzPivxgfdy2cKynUCBEyqeeUpcHkensK3XnJwGkcLpMQ+y5 3MxHGJqpyLtfNnNykq8TzjHhk18FMQIdS7xIaCxKPVv+1inI+//OtkQkW9BcHoaVE6Uf 0NfPVdcuClwvYz17f/eIFUMjESbHAquX2JKm6crNh8PS43CmpJhRNR5W/l5AvFMSl5BL GV1GBGO2yncoAUeMHWag3jBlZeK203vT1Bz7eUDDWazwSf9wRFGkyz0mdGUfJbXgEq+0 e6hg== X-Received: by 10.66.254.200 with SMTP id ak8mr33801804pad.24.1386270422708; Thu, 05 Dec 2013 11:07:02 -0800 (PST) MIME-Version: 1.0 Received: by 10.70.34.235 with HTTP; Thu, 5 Dec 2013 11:06:22 -0800 (PST) In-Reply-To: <20131205182324.GA65135@troutmask.apl.washington.edu> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> From: Filipe Maia Date: Thu, 5 Dec 2013 20:06:22 +0100 Message-ID: Subject: Re: dead code in lgamma_r[f]? To: Steve Kargl , freebsd-numerics@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.17 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 19:07:03 -0000 What if ix == 0x433xxxxx? Wouldn't the if clause be false? On Thu, Dec 5, 2013 at 7:23 PM, Steve Kargl < sgk@troutmask.apl.washington.edu> wrote: > On Thu, Dec 05, 2013 at 09:37:00AM -0800, Steve Kargl wrote: > > In reviewing the implementation details for lgamma_r[f], > > I noticed that in src/e_lgamma.c (and similar code in > > e_lgammaf.c): > > These should be e_lgamma_r.c and e_lgammaf_r.c. > I've only done some limited testing, but the > patch following my sig seems to work. > > Note the first part of the diff allows us to remove the > cast to float in e_lgamma_r.c, which minimizes the diff > between e_lgamma_r.c and e_lgammaf_r.c (and e_lgammal_r.c :). > > -- > Steve > > Index: src/e_lgamma_r.c > =================================================================== > --- src/e_lgamma_r.c (revision 1427) > +++ src/e_lgamma_r.c (working copy) > @@ -173,19 +173,15 @@ > */ > z = floor(y); > if(z!=y) { /* inexact anyway */ > - y *= 0.5; > - y = 2.0*(y - floor(y)); /* y = |x| mod 2.0 */ > - n = (int) (y*4.0); > + y /= 2; > + y = 2*(y - floor(y)); /* y = |x| mod 2.0 */ > + n = (int)(y*4); > } else { > - if(ix>=0x43400000) { > - y = zero; n = 0; /* y must be even */ > - } else { > - if(ix<0x43300000) z = y+two52; /* exact */ > - GET_LOW_WORD(n,z); > - n &= 1; > - y = n; > - n<<= 2; > - } > + z = y+two52; /* exact */ > + GET_LOW_WORD(n,z); > + n &= 1; > + y = n; > + n<<= 2; > } > switch (n) { > case 0: y = __kernel_sin(pi*y,zero,0); break; > Index: src/e_lgammaf_r.c > =================================================================== > --- src/e_lgammaf_r.c (revision 1427) > +++ src/e_lgammaf_r.c (working copy) > @@ -106,19 +106,15 @@ > */ > z = floorf(y); > if(z!=y) { /* inexact anyway */ > - y *= (float)0.5; > - y = (float)2.0*(y - floorf(y)); /* y = |x| mod 2.0 */ > - n = (int) (y*(float)4.0); > + y /= 2; > + y = 2*(y - floorf(y)); /* y = |x| mod 2.0 */ > + n = (int)(y*4); > } else { > - if(ix>=0x4b800000) { > - y = zero; n = 0; /* y must be even */ > - } else { > - if(ix<0x4b000000) z = y+two23; /* exact */ > - GET_FLOAT_WORD(n,z); > - n &= 1; > - y = n; > - n<<= 2; > - } > + z = y+two23; /* exact */ > + GET_FLOAT_WORD(n,z); > + n &= 1; > + y = n; > + n<<= 2; > } > switch (n) { > case 0: y = __kernel_sindf(pi*y); break; > _______________________________________________ > freebsd-numerics@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-numerics > To unsubscribe, send any mail to "freebsd-numerics-unsubscribe@freebsd.org > " > From owner-freebsd-numerics@FreeBSD.ORG Thu Dec 5 19:52:29 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BB0401F2 for ; Thu, 5 Dec 2013 19:52:29 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7D98016FC for ; Thu, 5 Dec 2013 19:52:29 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB5JqPMj065813; Thu, 5 Dec 2013 11:52:25 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB5JqPqw065812; Thu, 5 Dec 2013 11:52:25 -0800 (PST) (envelope-from sgk) Date: Thu, 5 Dec 2013 11:52:25 -0800 From: Steve Kargl To: Filipe Maia Subject: Re: dead code in lgamma_r[f]? Message-ID: <20131205195225.GA65732@troutmask.apl.washington.edu> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 19:52:29 -0000 On Thu, Dec 05, 2013 at 08:06:22PM +0100, Filipe Maia wrote: > What if ix == 0x433xxxxx? See below. > Wouldn't the if clause be false? See below. > > Index: src/e_lgamma_r.c > > =================================================================== > > --- src/e_lgamma_r.c (revision 1427) > > +++ src/e_lgamma_r.c (working copy) > > @@ -173,19 +173,15 @@ > > */ > > z = floor(y); > > if(z!=y) { /* inexact anyway */ > > - y *= 0.5; > > - y = 2.0*(y - floor(y)); /* y = |x| mod 2.0 */ > > - n = (int) (y*4.0); > > + y /= 2; > > + y = 2*(y - floor(y)); /* y = |x| mod 2.0 */ > > + n = (int)(y*4); If you have a y value that corresponds to ix=0x433xxxxx where one of the x is nonzero, I believe that you end up going through this portion of code. > > } else { You end up here if and only if z==y, which means y is integral. As noted in the original email, the 'if(ix>=0x43400000)' below is dead code because this code in __ieee754_lgamma_r(): if(ix>=0x43300000) /* |x|>=2**52, must be -integer */ return one/zero; t = sin_pi(x); prevents the call to sin_pi(x). > > - if(ix>=0x43400000) { > > - y = zero; n = 0; /* y must be even */ > > - } else { > > - if(ix<0x43300000) z = y+two52; /* exact */ If we again look at the code from __ieee754_lgamma_r(), we see that sin_pi() is called if ix < 0x43300000, so by the time we arrive at the 'if(ix<0x43300000)' statement we already know that the condition is true. > > - GET_LOW_WORD(n,z); > > - n &= 1; > > - y = n; > > - n<<= 2; > > - } > > + z = y+two52; /* exact */ > > + GET_LOW_WORD(n,z); > > + n &= 1; > > + y = n; > > + n<<= 2; > > } > > switch (n) { > > case 0: y = __kernel_sin(pi*y,zero,0); break; -- Steve From owner-freebsd-numerics@FreeBSD.ORG Fri Dec 6 00:37:04 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 75750D04 for ; Fri, 6 Dec 2013 00:37:04 +0000 (UTC) Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id 2437A1862 for ; Fri, 6 Dec 2013 00:37:03 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id 2C8131A4774; Fri, 6 Dec 2013 11:36:53 +1100 (EST) Date: Fri, 6 Dec 2013 11:36:52 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: dead code in lgamma_r[f]? In-Reply-To: <20131205195225.GA65732@troutmask.apl.washington.edu> Message-ID: <20131206102724.W1033@besplex.bde.org> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=m8NKqXhiB1EA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=ElNGO9NIuDsA:10 a=xXsPXvZvRc-c0by79MAA:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@FreeBSD.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 00:37:04 -0000 On Thu, 5 Dec 2013, Steve Kargl wrote: > On Thu, Dec 05, 2013 at 08:06:22PM +0100, Filipe Maia wrote: >> What if ix == 0x433xxxxx? There are also 8 more x's that aren't explicitly tested for the double case, and more for the long double case. > See below. > >> Wouldn't the if clause be false? > > See below. > >>> Index: src/e_lgamma_r.c >>> =================================================================== >>> --- src/e_lgamma_r.c (revision 1427) >>> +++ src/e_lgamma_r.c (working copy) >>> @@ -173,19 +173,15 @@ >>> */ >>> z = floor(y); BTW, floor() is a fairly pessimal way to convert to an integer. It doesn't even avoid overflow by rounding towards zero (so callers must limit the arg a little, and I think they do for at least the problematic negative args as a side effect of underflowing early for such args). rint() is probably better. On i386, both floor() and rint() are in asm, but floor() has to do a slow switch of the rounding mode to force rounding towards minus infinity. On amd64, both are in C and take about the same amount of code. On amd64, this not optimal for at least rint(). >>> if(z!=y) { /* inexact anyway */ >>> - y *= 0.5; >>> - y = 2.0*(y - floor(y)); /* y = |x| mod 2.0 */ >>> - n = (int) (y*4.0); >>> + y /= 2; >>> + y = 2*(y - floor(y)); /* y = |x| mod 2.0 */ >>> + n = (int)(y*4); > > If you have a y value that corresponds to ix=0x433xxxxx where > one of the x is nonzero, I believe that you end up going through > this portion of code. Perhaps for the float case where all the bits are in the ix word, but not in the double case assuming that 0x43400000 is the correct threshold for detecting even numbers. Indeed, the double with bits 0x4340000000000001 is odd. It is x = 0x1.0000000000001p+52 x = 4503599627370497.00 We see that the 0x43400000 is the correct threshold for evenness, and 1 below that for the exponent gives oddness by the least significant bit. > >>> } else { > > You end up here if and only if z==y, which means y is integral. > As noted in the original email, the 'if(ix>=0x43400000)' below > is dead code because this code in __ieee754_lgamma_r(): > > if(ix>=0x43300000) /* |x|>=2**52, must be -integer */ > return one/zero; > t = sin_pi(x); > > prevents the call to sin_pi(x). Only for negative integers. > >>> - if(ix>=0x43400000) { >>> - y = zero; n = 0; /* y must be even */ This case might be just an optimization, but it is necessary to avoid overflow somewhere. Adding two52 would preserve evenness, but it might give spurious overflow. >>> - } else { >>> - if(ix<0x43300000) z = y+two52; /* exact */ This case is needed for putting the oddness bit in the least significant bit. When ix == 0x43300000, it is already there, as the above example shows. OTOH, when ix == 0x43300000, two52 must not be added, since it moves the oddness bit up by 1 place. E.g., adding 0x1p52 in the above example gives: x = 0x1p+53 x = 9007199254740992.00 I just noticed that this addition of two52 barely escapes problems with extra precision. The same method is used elswhere and requires more care. It might still not take enough care with double rounding. > If we again look at the code from __ieee754_lgamma_r(), we see > that sin_pi() is called if ix < 0x43300000, so by the time we > arrive at the 'if(ix<0x43300000)' statement we already know that > the condition is true. No, only for negative integers. hx<0 classifies negative values, and then ix>=0x43300000 classifies numbers that are so large negative that they must be integers, and the result is a sort of pole error. We are just filtering out this case, perhaps as an optimization. > >>> - GET_LOW_WORD(n,z); >>> - n &= 1; >>> - y = n; >>> - n<<= 2; >>> - } >>> + z = y+two52; /* exact */ The main bug that this introduces is misclassifying all odd (positive) intgers with ix = 0x433xxxxx. Similarly in float precision, except the misclassified odd integers are easier to describe. They are ones with ix between 0x4b000000 and 0x4b7fffff with low bit 1. >>> + GET_LOW_WORD(n,z); >>> + n &= 1; >>> + y = n; >>> + n<<= 2; >>> } >>> switch (n) { >>> case 0: y = __kernel_sin(pi*y,zero,0); break; >From a previous reply: > Note the first part of the diff allows us to remove the > cast to float in e_lgamma_r.c, which minimizes the diff > between e_lgamma_r.c and e_lgammaf_r.c (and e_lgammal_r.c :). The gamma functions are not good candidates for conversion to long double, since they are very inaccurate (especially near zeros for lgamma()) in float and double precision. Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri Dec 6 01:55:28 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 68E124FC for ; Fri, 6 Dec 2013 01:55:28 +0000 (UTC) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 28A211E44 for ; Fri, 6 Dec 2013 01:55:27 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 62215783839; Fri, 6 Dec 2013 12:55:19 +1100 (EST) Date: Fri, 6 Dec 2013 12:55:16 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans Subject: Re: dead code in lgamma_r[f]? In-Reply-To: <20131206102724.W1033@besplex.bde.org> Message-ID: <20131206114426.I1329@besplex.bde.org> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=m8NKqXhiB1EA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=ElNGO9NIuDsA:10 a=QbVCLMTJzu0JbE78RuEA:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@FreeBSD.org, Filipe Maia , Steve Kargl X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 01:55:28 -0000 On Fri, 6 Dec 2013, Bruce Evans wrote: > On Thu, 5 Dec 2013, Steve Kargl wrote: > ... >> If we again look at the code from __ieee754_lgamma_r(), we see >> that sin_pi() is called if ix < 0x43300000, so by the time we >> arrive at the 'if(ix<0x43300000)' statement we already know that >> the condition is true. > > No, only for negative integers. hx<0 classifies negative values, and > then ix>=0x43300000 classifies numbers that are so large negative that > they must be integers, and the result is a sort of pole error. We > are just filtering out this case, perhaps as an optimization. Oops, sin_pi() is only called for negative integers, so your change seems to be correct. Just add a comment about the limited domain of sin_pi() (it already has one saying that "x is assumed negative". With the limited range, we can improve more things. floor() can be replaced by adding and subtracting 0x1p52 or 0x1.8p52 like we do for trig and exp functions, followed by an adjustment to get floor() if necessary. We can use irint(z) instead of bit magic or (int)z to extract the parity bit of z z == y case. Note that fdlibm can't use (int)z since this can overflow, and overflow is often not benign in practice. irint(z) can overflow too, but it is easy to specify irint() as having benign behaviour. Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri Dec 6 02:08:08 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 645498E5 for ; Fri, 6 Dec 2013 02:08:08 +0000 (UTC) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 186DE1F01 for ; Fri, 6 Dec 2013 02:08:07 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id F02A0D4053E; Fri, 6 Dec 2013 13:07:58 +1100 (EST) Date: Fri, 6 Dec 2013 13:07:57 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans Subject: Re: dead code in lgamma_r[f]? In-Reply-To: <20131206114426.I1329@besplex.bde.org> Message-ID: <20131206130353.U1549@besplex.bde.org> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=bpB1Wiqi c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=m8NKqXhiB1EA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=ElNGO9NIuDsA:10 a=NULW6ebW8qo2wbMJ_24A:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@FreeBSD.org, Filipe Maia , Steve Kargl X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Dec 2013 02:08:08 -0000 On Fri, 6 Dec 2013, Bruce Evans wrote: > On Fri, 6 Dec 2013, Bruce Evans wrote: > >> On Thu, 5 Dec 2013, Steve Kargl wrote: >> ... >>> If we again look at the code from __ieee754_lgamma_r(), we see >>> that sin_pi() is called if ix < 0x43300000, so by the time we >>> arrive at the 'if(ix<0x43300000)' statement we already know that >>> the condition is true. >> >> No, only for negative integers. hx<0 classifies negative values, and >> then ix>=0x43300000 classifies numbers that are so large negative that >> they must be integers, and the result is a sort of pole error. We >> are just filtering out this case, perhaps as an optimization. > > Oops, sin_pi() is only called for negative integers, so your change > seems to be correct. Just add a comment about the limited domain > of sin_pi() (it already has one saying that "x is assumed negative". > > With the limited range, we can improve more things. floor() can be > replaced by adding and subtracting 0x1p52 or 0x1.8p52 like we do for > trig and exp functions, followed by an adjustment to get floor() if > necessary. We can use irint(z) instead of bit magic or (int)z to > extract the parity bit of z z == y case. Note that fdlibm can't use > (int)z since this can overflow, and overflow is often not benign in > practice. irint(z) can overflow too, but it is easy to specify irint() > as having benign behaviour. Oops^2. s/easy/hard/. When irint(z) overflows on i386, it gives an overflow exception and a result of INT_MIN instead of benign truncation. To avoid these problems, it would have to extract the low bits like lgamma() does now, but that would slow down the usual case of a severely limited range. Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Dec 7 06:46:09 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8DBC288A for ; Sat, 7 Dec 2013 06:46:09 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 675651839 for ; Sat, 7 Dec 2013 06:46:09 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB76k3OR076097; Fri, 6 Dec 2013 22:46:03 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB76k2Z7076096; Fri, 6 Dec 2013 22:46:02 -0800 (PST) (envelope-from sgk) Date: Fri, 6 Dec 2013 22:46:02 -0800 From: Steve Kargl To: Bruce Evans Subject: Re: dead code in lgamma_r[f]? Message-ID: <20131207064602.GA76042@troutmask.apl.washington.edu> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131206114426.I1329@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 07 Dec 2013 06:46:09 -0000 On Fri, Dec 06, 2013 at 12:55:16PM +1100, Bruce Evans wrote: > On Fri, 6 Dec 2013, Bruce Evans wrote: > > > On Thu, 5 Dec 2013, Steve Kargl wrote: > > ... > >> If we again look at the code from __ieee754_lgamma_r(), we see > >> that sin_pi() is called if ix < 0x43300000, so by the time we > >> arrive at the 'if(ix<0x43300000)' statement we already know that > >> the condition is true. > > > > No, only for negative integers. hx<0 classifies negative values, and > > then ix>=0x43300000 classifies numbers that are so large negative that > > they must be integers, and the result is a sort of pole error. We > > are just filtering out this case, perhaps as an optimization. > > Oops, sin_pi() is only called for negative integers, so your change > seems to be correct. Just add a comment about the limited domain > of sin_pi() (it already has one saying that "x is assumed negative". > I wish to retract my earlier statement that after 2 additional years of reading fdlibm code that it was easier to work with. I spent the better part of Friday giving myself a headache trying to understand the algorithm for lgamma_r(). The code for x in the interval (0,2) does not match any comment in lgamma_r(). I also think, but can't prove yet, that like erff() the polynomial and rational approximations in lgammaf_r() have too many terms. -- Steve