From owner-freebsd-numerics@FreeBSD.ORG  Sun Sep 16 19:53:27 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 73817106564A
	for <freebsd-numerics@freebsd.org>;
	Sun, 16 Sep 2012 19:53:27 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au
	[211.29.132.183])
	by mx1.freebsd.org (Postfix) with ESMTP id E9EDC8FC0A
	for <freebsd-numerics@freebsd.org>;
	Sun, 16 Sep 2012 19:53:26 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8GJrIne020374
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 17 Sep 2012 05:53:19 +1000
Date: Mon, 17 Sep 2012 05:53:18 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20120917022614.R2943@besplex.bde.org>
Message-ID: <20120917041848.F3504@besplex.bde.org>
References: <5017111E.6060003@missouri.edu>
	<20120814201105.T934@besplex.bde.org>
	<502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org>
	<502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org>
	<5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu>
	<504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org>
	<20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu>
	<20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu>
	<20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu>
	<20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu>
	<20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu>
	<5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu>
	<20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu>
	<20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu>
	<20120917022614.R2943@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>,
	freebsd-numerics@freebsd.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Sep 2012 19:53:27 -0000

On Mon, 17 Sep 2012, Bruce Evans wrote:

> On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote:
>
>> On 09/16/2012 12:14 AM, Bruce Evans wrote:
>>> On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote:
>>> 
>>>> One more thing I would like an opinion on.
>>>> 
>>>> In my code I check for |z| being small, and then use the approximations:
>>>> casinh(z) = z
>>>> cacos(z) = Pi - z
>>> 
>>> Actually Pi/2 - z.
>>> 
>>>> catanh(z) = z
>>>> 
>>>> However these approximations are not used in the papers by Hull et al,
>>>> and the code works just fine if I don't include these in the code.
>>> 
>>> Probably a bug in the papers.
>> 
>> It is not a bug in the papers.  The algorithms they provide really do work 
>> when |z| is small.  In fact, you have to deal separately with the cases |x| 
>> is small and |y| is small (z=x+I*y), so dealing with both of them being 
>> small is not any additional problem.
>> 
>> And now I see your other post, that using PI/2 is problematic especially 
>> when rounding is not to nearest.  (Then the problem of rounding PI/2 
>> properly is relegated to the acos function, and so it is someone else's 
>> problem.)
>> 
>> So all things being said and done, I am going to remove the use of these 
>> approximations.
>
> I don't like that.  It will be much slower on almost 1/4 of arg space.
> The only reason to consider not doing it is that the args that it
> applies to are not very likely, and optimizing for them may pessimize
> the usual case.

It gives the expected pessimizations, and unexpected accuracy improvements
and unimprovements.  On amd64:

% 9,10c9,10
% < rcacos:max_er = 0x6947ecac 3.2900, avg_er = 0.317, #>=1:0.5 = 30489:303638
% <         5.47 real         5.47 user         0.00 sys
% ---
% > rcacos:max_er = 0x6947ecac 3.2900, avg_er = 0.317, #>=1:0.5 = 30489:268862
% >         5.87 real         5.86 user         0.00 sys

Only float functions were updated for this test, and results are only
shown for float functions (comparing them with double functions).
'<' in the diff is for an old result and '>' for a new result.  The above
shows:

- accuracy improvement.  Apparently the thresholds were too large.
- slowdown of 0.39 seconds.  The test program mainly calls cacos()
   and cacosf(), but has some overheads.  Say 1 1.47 seconds for the
   overheads and 2 seconds for each of the functions.  0.39/2 is
   almost 20%.

% 21,22c21,22
% < rcacosh:max_er = 0x51e70742 2.5595, avg_er = 0.257, #>=1:0.5 = 25766:3286888
% <         5.81 real         5.79 user         0.00 sys
% ---
% > rcacosh:max_er = 0x51e70742 2.5595, avg_er = 0.258, #>=1:0.5 = 26034:3313256
% >         6.06 real         6.05 user         0.00 sys

Similar slowdown, but now the old version os more accurate.  Apparently
the general code doesn't reduce to simply Pi/2.

% 34c34
% <         5.98 real         5.98 user         0.00 sys
% ---
% >         6.30 real         6.28 user         0.00 sys

This is for rcasin.  Similar slowdown, but no change in values.

% 45,46c45,46
% < rcasinh:max_er = 0x51e70742 2.5595, avg_er = 0.257, #>=1:0.5 = 25766:3286888
% <         5.57 real         5.56 user         0.00 sys
% ---
% > rcasinh:max_er = 0x51e70742 2.5595, avg_er = 0.258, #>=1:0.5 = 26034:3313256
% >         5.82 real         5.81 user         0.00 sys

Like rcacosh (lose speed and accuracy).

% 57,58c57,58
% < rcatan:max_er = 0x51d7c47a 2.5576, avg_er = 0.295, #>=1:0.5 = 77670:443246
% <         3.69 real         3.68 user         0.00 sys
% ---
% > rcatan:max_er = 0x51d7c47a 2.5576, avg_er = 0.296, #>=1:0.5 = 77874:469678
% >         3.64 real         3.64 user         0.00 sys

No slowdown, but accuracy loss.

% 69,70c69,70
% < rcatanh:max_er = 0x5304b263 2.5943, avg_er = 0.201, #>=1:0.5 = 185298:1337156
% <         3.88 real         3.86 user         0.00 sys
% ---
% > rcatanh:max_er = 0x5304b263 2.5943, avg_er = 0.203, #>=1:0.5 = 204986:1370276
% >         3.84 real         3.83 user         0.00 sys

Like rcatan, but the accuracy loss is smaller.

% [... unrelated functions]

% [... imaginary parts have similar behaviour (various symmetries)]

On i386:

% 9,10c9,10
% < rcacos:max_er = 0x4517ee94 2.1592, avg_er = 0.315, #>=1:0.5 = 4607:246852
% <         8.18 real         7.48 user         0.02 sys
% ---
% > rcacos:max_er = 0x4517ee94 2.1592, avg_er = 0.314, #>=1:0.5 = 4607:212076
% >         8.48 real         7.82 user         0.03 sys
% ...
% 201,202c201,202
% < icacosh:max_er = 0x4517ee94 2.1592, avg_er = 0.315, #>=1:0.5 = 4607:246852
% <         8.62 real         8.50 user         0.01 sys
% ---
% > icacosh:max_er = 0x4517ee94 2.1592, avg_er = 0.314, #>=1:0.5 = 4607:212076
% >         9.88 real         9.05 user         0.03 sys

Similar slowdowns (only look at the user time), but only changes in
accuracy for these two.  Now the accuracy is never reduced.  I think
you just have to reduce the threshold a little in the old version to
get this improvement.  Indeed, the following works well for me (only
edited the float version):

@ --- /home/stephen/public_html/catrigf.c	2012-09-16 15:14:05.000000000 +0000
@ +++ catrigf.c	2012-09-16 19:23:18.559723000 +0000
@ @@ -165,4 +165,9 @@
@  		}
@ 
@ +	/* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */
@ +	if (ax < 2048 * FLT_EPSILON && ay < 2048 *FLT_EPSILON)
@ +		if ((int)ax==0 && (int)ay==0)
@ +			return (z);
@ +
@  	do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y);
@  	if (B_is_usable)
@ @@ -200,5 +205,6 @@
@  		if (isinf(y))
@  			return (cpackf(x+x, -y));
@ -		if (x == 0) return (cpackf(m_pi_2, y+y));
@ +		if (x == 0)
@ +			return (cpackf(m_pi_2+tiny, y+y));
@  		return (cpackf(x+0.0L+(y+0), x+0.0L+(y+0)));
@  	}

Also fix NaN cases.

@ @@ -214,4 +220,8 @@
@  		}
@ 
@ +	/* XXX the number for ay is related to sqrt(6 * FLT_EPSILON). */
@ +	if (ax < FLT_EPSILON / 8 && ay < 2048 * FLT_EPSILON)
@ +		return (cpackf(m_pi_2 + tiny, -y));
@ +
@  	do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x);
@  	if (B_is_usable) {

The thresholds should be asymmetric since the real part uses the
approximation Pi/2 + O(x) while the imaginary part uses the approximation
-y + O(y**3).  If we used the approximation Pi/2 - x for the real part
as before, then the thresholds could be more symmetric and more cases
could be handled here.  But then the expression with m_pi_2 would be
(m_pi_2 - x) again, so it wouldn't necessarily set inexact, and the
special code for setting inexact would be needed again.  The old
thresholds were too conservative.  I'm being sloppy with the xy product
terms and nonstandard rounding modes.

The magic numbers were whatever minimised the number of incorrectly
rounded cases in my tests.  sqrt(6 * FLT_EPSILON) is obviously not
conservative enough.  The divisor of 8 gives 3 guard digits which would
fix some cases (iff the cases go through here).  The magic 2048 gives
about 4.5 guard "digits".  A couple more guard digits make little
difference, by 1 fewer gives observably more errors.

@ @@ -313,5 +323,6 @@
@  			return (cpackf(copysignf(0, x), y+y));
@  		if (isinf(y))
@ -			return (cpackf(copysignf(0, x), copysignf(m_pi_2, y)));
@ +			return (cpackf(copysignf(0, x),
@ +			    copysignf(m_pi_2 + tiny, y)));
@  		if (x == 0)
@  			return (cpackf(x, y+y));
@ @@ -320,9 +331,16 @@
@ 
@  	if (isinf(x) || isinf(y))
@ -		return (cpackf(copysignf(0, x), copysignf(m_pi_2, y)));
@ +		return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y)));
@ 
@  	if (ax > RECIP_EPSILON || ay > RECIP_EPSILON)
@  		if ((int)(1+tiny)==1)
@ -			return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2, y)));
@ +			return (cpackf(
@ +			    copysignf(real_part_reciprocal(ax, ay), x),
@ +			    copysignf(m_pi_2 + tiny, y)));
@ +
@ +	/* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */
@ +	if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON)
@ +		if ((int)ax==0 && (int)ay==0)
@ +			return (z);
@ 
@  	if (ax == 1 && ay < FLT_EPSILON) {

Bruce