From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 19:53:27 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 73817106564A for ; Sun, 16 Sep 2012 19:53:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id E9EDC8FC0A for ; Sun, 16 Sep 2012 19:53:26 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8GJrIne020374 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 17 Sep 2012 05:53:19 +1000 Date: Mon, 17 Sep 2012 05:53:18 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20120917022614.R2943@besplex.bde.org> Message-ID: <20120917041848.F3504@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Stephen Montgomery-Smith , freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 19:53:27 -0000 On Mon, 17 Sep 2012, Bruce Evans wrote: > On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote: > >> On 09/16/2012 12:14 AM, Bruce Evans wrote: >>> On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote: >>> >>>> One more thing I would like an opinion on. >>>> >>>> In my code I check for |z| being small, and then use the approximations: >>>> casinh(z) = z >>>> cacos(z) = Pi - z >>> >>> Actually Pi/2 - z. >>> >>>> catanh(z) = z >>>> >>>> However these approximations are not used in the papers by Hull et al, >>>> and the code works just fine if I don't include these in the code. >>> >>> Probably a bug in the papers. >> >> It is not a bug in the papers. The algorithms they provide really do work >> when |z| is small. In fact, you have to deal separately with the cases |x| >> is small and |y| is small (z=x+I*y), so dealing with both of them being >> small is not any additional problem. >> >> And now I see your other post, that using PI/2 is problematic especially >> when rounding is not to nearest. (Then the problem of rounding PI/2 >> properly is relegated to the acos function, and so it is someone else's >> problem.) >> >> So all things being said and done, I am going to remove the use of these >> approximations. > > I don't like that. It will be much slower on almost 1/4 of arg space. > The only reason to consider not doing it is that the args that it > applies to are not very likely, and optimizing for them may pessimize > the usual case. It gives the expected pessimizations, and unexpected accuracy improvements and unimprovements. On amd64: % 9,10c9,10 % < rcacos:max_er = 0x6947ecac 3.2900, avg_er = 0.317, #>=1:0.5 = 30489:303638 % < 5.47 real 5.47 user 0.00 sys % --- % > rcacos:max_er = 0x6947ecac 3.2900, avg_er = 0.317, #>=1:0.5 = 30489:268862 % > 5.87 real 5.86 user 0.00 sys Only float functions were updated for this test, and results are only shown for float functions (comparing them with double functions). '<' in the diff is for an old result and '>' for a new result. The above shows: - accuracy improvement. Apparently the thresholds were too large. - slowdown of 0.39 seconds. The test program mainly calls cacos() and cacosf(), but has some overheads. Say 1 1.47 seconds for the overheads and 2 seconds for each of the functions. 0.39/2 is almost 20%. % 21,22c21,22 % < rcacosh:max_er = 0x51e70742 2.5595, avg_er = 0.257, #>=1:0.5 = 25766:3286888 % < 5.81 real 5.79 user 0.00 sys % --- % > rcacosh:max_er = 0x51e70742 2.5595, avg_er = 0.258, #>=1:0.5 = 26034:3313256 % > 6.06 real 6.05 user 0.00 sys Similar slowdown, but now the old version os more accurate. Apparently the general code doesn't reduce to simply Pi/2. % 34c34 % < 5.98 real 5.98 user 0.00 sys % --- % > 6.30 real 6.28 user 0.00 sys This is for rcasin. Similar slowdown, but no change in values. % 45,46c45,46 % < rcasinh:max_er = 0x51e70742 2.5595, avg_er = 0.257, #>=1:0.5 = 25766:3286888 % < 5.57 real 5.56 user 0.00 sys % --- % > rcasinh:max_er = 0x51e70742 2.5595, avg_er = 0.258, #>=1:0.5 = 26034:3313256 % > 5.82 real 5.81 user 0.00 sys Like rcacosh (lose speed and accuracy). % 57,58c57,58 % < rcatan:max_er = 0x51d7c47a 2.5576, avg_er = 0.295, #>=1:0.5 = 77670:443246 % < 3.69 real 3.68 user 0.00 sys % --- % > rcatan:max_er = 0x51d7c47a 2.5576, avg_er = 0.296, #>=1:0.5 = 77874:469678 % > 3.64 real 3.64 user 0.00 sys No slowdown, but accuracy loss. % 69,70c69,70 % < rcatanh:max_er = 0x5304b263 2.5943, avg_er = 0.201, #>=1:0.5 = 185298:1337156 % < 3.88 real 3.86 user 0.00 sys % --- % > rcatanh:max_er = 0x5304b263 2.5943, avg_er = 0.203, #>=1:0.5 = 204986:1370276 % > 3.84 real 3.83 user 0.00 sys Like rcatan, but the accuracy loss is smaller. % [... unrelated functions] % [... imaginary parts have similar behaviour (various symmetries)] On i386: % 9,10c9,10 % < rcacos:max_er = 0x4517ee94 2.1592, avg_er = 0.315, #>=1:0.5 = 4607:246852 % < 8.18 real 7.48 user 0.02 sys % --- % > rcacos:max_er = 0x4517ee94 2.1592, avg_er = 0.314, #>=1:0.5 = 4607:212076 % > 8.48 real 7.82 user 0.03 sys % ... % 201,202c201,202 % < icacosh:max_er = 0x4517ee94 2.1592, avg_er = 0.315, #>=1:0.5 = 4607:246852 % < 8.62 real 8.50 user 0.01 sys % --- % > icacosh:max_er = 0x4517ee94 2.1592, avg_er = 0.314, #>=1:0.5 = 4607:212076 % > 9.88 real 9.05 user 0.03 sys Similar slowdowns (only look at the user time), but only changes in accuracy for these two. Now the accuracy is never reduced. I think you just have to reduce the threshold a little in the old version to get this improvement. Indeed, the following works well for me (only edited the float version): @ --- /home/stephen/public_html/catrigf.c 2012-09-16 15:14:05.000000000 +0000 @ +++ catrigf.c 2012-09-16 19:23:18.559723000 +0000 @ @@ -165,4 +165,9 @@ @ } @ @ + /* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < 2048 * FLT_EPSILON && ay < 2048 *FLT_EPSILON) @ + if ((int)ax==0 && (int)ay==0) @ + return (z); @ + @ do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y); @ if (B_is_usable) @ @@ -200,5 +205,6 @@ @ if (isinf(y)) @ return (cpackf(x+x, -y)); @ - if (x == 0) return (cpackf(m_pi_2, y+y)); @ + if (x == 0) @ + return (cpackf(m_pi_2+tiny, y+y)); @ return (cpackf(x+0.0L+(y+0), x+0.0L+(y+0))); @ } Also fix NaN cases. @ @@ -214,4 +220,8 @@ @ } @ @ + /* XXX the number for ay is related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < FLT_EPSILON / 8 && ay < 2048 * FLT_EPSILON) @ + return (cpackf(m_pi_2 + tiny, -y)); @ + @ do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x); @ if (B_is_usable) { The thresholds should be asymmetric since the real part uses the approximation Pi/2 + O(x) while the imaginary part uses the approximation -y + O(y**3). If we used the approximation Pi/2 - x for the real part as before, then the thresholds could be more symmetric and more cases could be handled here. But then the expression with m_pi_2 would be (m_pi_2 - x) again, so it wouldn't necessarily set inexact, and the special code for setting inexact would be needed again. The old thresholds were too conservative. I'm being sloppy with the xy product terms and nonstandard rounding modes. The magic numbers were whatever minimised the number of incorrectly rounded cases in my tests. sqrt(6 * FLT_EPSILON) is obviously not conservative enough. The divisor of 8 gives 3 guard digits which would fix some cases (iff the cases go through here). The magic 2048 gives about 4.5 guard "digits". A couple more guard digits make little difference, by 1 fewer gives observably more errors. @ @@ -313,5 +323,6 @@ @ return (cpackf(copysignf(0, x), y+y)); @ if (isinf(y)) @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2, y))); @ + return (cpackf(copysignf(0, x), @ + copysignf(m_pi_2 + tiny, y))); @ if (x == 0) @ return (cpackf(x, y+y)); @ @@ -320,9 +331,16 @@ @ @ if (isinf(x) || isinf(y)) @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2, y))); @ + return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y))); @ @ if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) @ if ((int)(1+tiny)==1) @ - return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2, y))); @ + return (cpackf( @ + copysignf(real_part_reciprocal(ax, ay), x), @ + copysignf(m_pi_2 + tiny, y))); @ + @ + /* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON) @ + if ((int)ax==0 && (int)ay==0) @ + return (z); @ @ if (ax == 1 && ay < FLT_EPSILON) { Bruce