Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 17 Sep 2012 05:53:18 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@freebsd.org
Subject:   Re: Complex arg-trig functions
Message-ID:  <20120917041848.F3504@besplex.bde.org>
In-Reply-To: <20120917022614.R2943@besplex.bde.org>
References:  <5017111E.6060003@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 17 Sep 2012, Bruce Evans wrote:

> On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote:
>
>> On 09/16/2012 12:14 AM, Bruce Evans wrote:
>>> On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote:
>>> 
>>>> One more thing I would like an opinion on.
>>>> 
>>>> In my code I check for |z| being small, and then use the approximations:
>>>> casinh(z) = z
>>>> cacos(z) = Pi - z
>>> 
>>> Actually Pi/2 - z.
>>> 
>>>> catanh(z) = z
>>>> 
>>>> However these approximations are not used in the papers by Hull et al,
>>>> and the code works just fine if I don't include these in the code.
>>> 
>>> Probably a bug in the papers.
>> 
>> It is not a bug in the papers.  The algorithms they provide really do work 
>> when |z| is small.  In fact, you have to deal separately with the cases |x| 
>> is small and |y| is small (z=x+I*y), so dealing with both of them being 
>> small is not any additional problem.
>> 
>> And now I see your other post, that using PI/2 is problematic especially 
>> when rounding is not to nearest.  (Then the problem of rounding PI/2 
>> properly is relegated to the acos function, and so it is someone else's 
>> problem.)
>> 
>> So all things being said and done, I am going to remove the use of these 
>> approximations.
>
> I don't like that.  It will be much slower on almost 1/4 of arg space.
> The only reason to consider not doing it is that the args that it
> applies to are not very likely, and optimizing for them may pessimize
> the usual case.

It gives the expected pessimizations, and unexpected accuracy improvements
and unimprovements.  On amd64:

% 9,10c9,10
% < rcacos:max_er = 0x6947ecac 3.2900, avg_er = 0.317, #>=1:0.5 = 30489:303638
% <         5.47 real         5.47 user         0.00 sys
% ---
% > rcacos:max_er = 0x6947ecac 3.2900, avg_er = 0.317, #>=1:0.5 = 30489:268862
% >         5.87 real         5.86 user         0.00 sys

Only float functions were updated for this test, and results are only
shown for float functions (comparing them with double functions).
'<' in the diff is for an old result and '>' for a new result.  The above
shows:

- accuracy improvement.  Apparently the thresholds were too large.
- slowdown of 0.39 seconds.  The test program mainly calls cacos()
   and cacosf(), but has some overheads.  Say 1 1.47 seconds for the
   overheads and 2 seconds for each of the functions.  0.39/2 is
   almost 20%.

% 21,22c21,22
% < rcacosh:max_er = 0x51e70742 2.5595, avg_er = 0.257, #>=1:0.5 = 25766:3286888
% <         5.81 real         5.79 user         0.00 sys
% ---
% > rcacosh:max_er = 0x51e70742 2.5595, avg_er = 0.258, #>=1:0.5 = 26034:3313256
% >         6.06 real         6.05 user         0.00 sys

Similar slowdown, but now the old version os more accurate.  Apparently
the general code doesn't reduce to simply Pi/2.

% 34c34
% <         5.98 real         5.98 user         0.00 sys
% ---
% >         6.30 real         6.28 user         0.00 sys

This is for rcasin.  Similar slowdown, but no change in values.

% 45,46c45,46
% < rcasinh:max_er = 0x51e70742 2.5595, avg_er = 0.257, #>=1:0.5 = 25766:3286888
% <         5.57 real         5.56 user         0.00 sys
% ---
% > rcasinh:max_er = 0x51e70742 2.5595, avg_er = 0.258, #>=1:0.5 = 26034:3313256
% >         5.82 real         5.81 user         0.00 sys

Like rcacosh (lose speed and accuracy).

% 57,58c57,58
% < rcatan:max_er = 0x51d7c47a 2.5576, avg_er = 0.295, #>=1:0.5 = 77670:443246
% <         3.69 real         3.68 user         0.00 sys
% ---
% > rcatan:max_er = 0x51d7c47a 2.5576, avg_er = 0.296, #>=1:0.5 = 77874:469678
% >         3.64 real         3.64 user         0.00 sys

No slowdown, but accuracy loss.

% 69,70c69,70
% < rcatanh:max_er = 0x5304b263 2.5943, avg_er = 0.201, #>=1:0.5 = 185298:1337156
% <         3.88 real         3.86 user         0.00 sys
% ---
% > rcatanh:max_er = 0x5304b263 2.5943, avg_er = 0.203, #>=1:0.5 = 204986:1370276
% >         3.84 real         3.83 user         0.00 sys

Like rcatan, but the accuracy loss is smaller.

% [... unrelated functions]

% [... imaginary parts have similar behaviour (various symmetries)]

On i386:

% 9,10c9,10
% < rcacos:max_er = 0x4517ee94 2.1592, avg_er = 0.315, #>=1:0.5 = 4607:246852
% <         8.18 real         7.48 user         0.02 sys
% ---
% > rcacos:max_er = 0x4517ee94 2.1592, avg_er = 0.314, #>=1:0.5 = 4607:212076
% >         8.48 real         7.82 user         0.03 sys
% ...
% 201,202c201,202
% < icacosh:max_er = 0x4517ee94 2.1592, avg_er = 0.315, #>=1:0.5 = 4607:246852
% <         8.62 real         8.50 user         0.01 sys
% ---
% > icacosh:max_er = 0x4517ee94 2.1592, avg_er = 0.314, #>=1:0.5 = 4607:212076
% >         9.88 real         9.05 user         0.03 sys

Similar slowdowns (only look at the user time), but only changes in
accuracy for these two.  Now the accuracy is never reduced.  I think
you just have to reduce the threshold a little in the old version to
get this improvement.  Indeed, the following works well for me (only
edited the float version):

@ --- /home/stephen/public_html/catrigf.c	2012-09-16 15:14:05.000000000 +0000
@ +++ catrigf.c	2012-09-16 19:23:18.559723000 +0000
@ @@ -165,4 +165,9 @@
@  		}
@ 
@ +	/* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */
@ +	if (ax < 2048 * FLT_EPSILON && ay < 2048 *FLT_EPSILON)
@ +		if ((int)ax==0 && (int)ay==0)
@ +			return (z);
@ +
@  	do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y);
@  	if (B_is_usable)
@ @@ -200,5 +205,6 @@
@  		if (isinf(y))
@  			return (cpackf(x+x, -y));
@ -		if (x == 0) return (cpackf(m_pi_2, y+y));
@ +		if (x == 0)
@ +			return (cpackf(m_pi_2+tiny, y+y));
@  		return (cpackf(x+0.0L+(y+0), x+0.0L+(y+0)));
@  	}

Also fix NaN cases.

@ @@ -214,4 +220,8 @@
@  		}
@ 
@ +	/* XXX the number for ay is related to sqrt(6 * FLT_EPSILON). */
@ +	if (ax < FLT_EPSILON / 8 && ay < 2048 * FLT_EPSILON)
@ +		return (cpackf(m_pi_2 + tiny, -y));
@ +
@  	do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x);
@  	if (B_is_usable) {

The thresholds should be asymmetric since the real part uses the
approximation Pi/2 + O(x) while the imaginary part uses the approximation
-y + O(y**3).  If we used the approximation Pi/2 - x for the real part
as before, then the thresholds could be more symmetric and more cases
could be handled here.  But then the expression with m_pi_2 would be
(m_pi_2 - x) again, so it wouldn't necessarily set inexact, and the
special code for setting inexact would be needed again.  The old
thresholds were too conservative.  I'm being sloppy with the xy product
terms and nonstandard rounding modes.

The magic numbers were whatever minimised the number of incorrectly
rounded cases in my tests.  sqrt(6 * FLT_EPSILON) is obviously not
conservative enough.  The divisor of 8 gives 3 guard digits which would
fix some cases (iff the cases go through here).  The magic 2048 gives
about 4.5 guard "digits".  A couple more guard digits make little
difference, by 1 fewer gives observably more errors.

@ @@ -313,5 +323,6 @@
@  			return (cpackf(copysignf(0, x), y+y));
@  		if (isinf(y))
@ -			return (cpackf(copysignf(0, x), copysignf(m_pi_2, y)));
@ +			return (cpackf(copysignf(0, x),
@ +			    copysignf(m_pi_2 + tiny, y)));
@  		if (x == 0)
@  			return (cpackf(x, y+y));
@ @@ -320,9 +331,16 @@
@ 
@  	if (isinf(x) || isinf(y))
@ -		return (cpackf(copysignf(0, x), copysignf(m_pi_2, y)));
@ +		return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y)));
@ 
@  	if (ax > RECIP_EPSILON || ay > RECIP_EPSILON)
@  		if ((int)(1+tiny)==1)
@ -			return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2, y)));
@ +			return (cpackf(
@ +			    copysignf(real_part_reciprocal(ax, ay), x),
@ +			    copysignf(m_pi_2 + tiny, y)));
@ +
@ +	/* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */
@ +	if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON)
@ +		if ((int)ax==0 && (int)ay==0)
@ +			return (z);
@ 
@  	if (ax == 1 && ay < FLT_EPSILON) {

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120917041848.F3504>