From owner-freebsd-numerics@FreeBSD.ORG  Tue Sep 18 14:15:49 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id AD6ED106566B
	for <freebsd-numerics@FreeBSD.org>;
	Tue, 18 Sep 2012 14:15:49 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au
	[211.29.132.194])
	by mx1.freebsd.org (Postfix) with ESMTP id 09B938FC19
	for <freebsd-numerics@FreeBSD.org>;
	Tue, 18 Sep 2012 14:15:48 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8IEFeHX012910
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 19 Sep 2012 00:15:46 +1000
Date: Wed, 19 Sep 2012 00:15:40 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20120918162105.U991@besplex.bde.org>
Message-ID: <20120918232850.N2144@besplex.bde.org>
References: <5017111E.6060003@missouri.edu> <504FF726.9060001@missouri.edu>
	<20120912191556.F1078@besplex.bde.org>
	<20120912225847.J1771@besplex.bde.org>
	<50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org>
	<5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org>
	<50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org>
	<50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org>
	<50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu>
	<5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org>
	<50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org>
	<5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org>
	<50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org>
	<50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org>
	<5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu>
	<20120918162105.U991@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Stephen Montgomery-Smith <stephen@missouri.edu>,
	freebsd-numerics@FreeBSD.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 14:15:50 -0000

On Tue, 18 Sep 2012, Bruce Evans wrote:

> On Mon, 17 Sep 2012, Stephen Montgomery-Smith wrote:
> ...
>> But I could put something like
>> 
>> if ((x == 0 && y == 0) || (x == 0 && y == 1) || (int)(1+tiny) == 1) {
>> ........
>> at the beginning of do_hard_work and catanh.
>
> I put without (x == 0 && y == 1) in catanh().  (x == 0 && y == 1) in it
> is a bug, since catanh(I) = I*Pi/2 with inexact.  However, I seemed to
> have missed (x == 1 && y == 0) -> catanh(1) = +Inf without inexact.

I also broke cases with infinities...

>> In the case A < A_crossover, a threshold like DBL_EPSILON*DBL_EPSILON/128 
>> is required.  I think the one you set is too large.  It is important that 
>> sqrt(x) + x/2 is sqrt(x).  (Again I don't think your tests would pick this 
>> up, because you need to do a lot of tests where y is close to or equal to 
>> 1.)
>
> Well, there were 2**12 of them with y = 1+denormal, with 7 different
> denormals, but none with y = 1.  Will test some more.  (I'm testing

... and many cases with ax or ay precisely 1, due to not testing these.

Fixing these and finding a few more simplifications and optimizations
gives:

@ diff -u2 catrig.c~ catrig.c
@ --- catrig.c~	2012-09-18 03:42:32.000000000 +0000
@ +++ catrig.c	2012-09-18 11:53:28.017331000 +0000
@ @@ -261,4 +261,6 @@
@ 
@  /*
@ + * casinh(z) = z + O(|z|^3)   as z -> 0
@ + *
@   * casinh(z) = sign(x)*clog(sign(x)*z) + O(1/|z|^2)   as z -> infinity
@   * The above formula works for the imaginary part as well, because

Part of restoring your old optimization -- fix the comments.

@ @@ -297,4 +299,5 @@
@ 
@  	if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) {
@ +		/* clog...() will raise inexact unless x or y is infinite. */
@  		if (signbit(x) == 0)
@  			w = clog_for_large_values(z) + m_ln2;

Further minimal changes for the double precision case -- try to document
all magic for setting inexact.

@ @@ -304,4 +307,8 @@
@  	}
@ 
@ +	if (ax < DBL_EPSILON && ay < DBL_EPSILON)
@ +		if ((int)ax==0 && (int)ay==0) /* raise inexact */
@ +			return (z);
@ +
@  	do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y);
@  	if (B_is_usable)

Your old optimization.  Not done as completely as in float precision.

@ @@ -328,4 +335,6 @@
@   * close to 1.
@   *
@ + * cacos(z) = PI/2 - z + O(|z|^3)   as z -> 0
@ + *
@   * cacos(z) = -sign(y)*I*clog(z) + O(1/|z|^2)   as z -> infinity
@   * The above formula works for the real part as well, because
@ @@ -355,6 +364,6 @@
@  		if (isinf(y))
@  			return (cpack(x+x, -y));
@ -		/* cacos(0 + I*NaN) = PI/2 + I*NaN */
@ -		if (x == 0) return (cpack(m_pi_2 + tiny, y+y)); /* raise inexact */
@ +		/* cacos(0 + I*NaN) = PI/2 + I*NaN with inexact */
@ +		if (x == 0) return (cpack(m_pi_2 + tiny, y+y));
@  		/*
@  		 * All other cases involving NaN return NaN + I*NaN.

Comments about exceptions raised should be together with comments about
values returned, at least if we can't attach them closely to the magic
that raises them.

@ @@ -366,4 +375,5 @@
@ 
@  	if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) {
@ +		/* clog...() will raise inexact unless x or y is infinite. */
@  		w = clog_for_large_values(z);
@  		rx = fabs(cimag(w));
@ @@ -374,4 +384,7 @@
@  	}
@ 
@ +	if (ax < DBL_EPSILON && ay < DBL_EPSILON)
@ +		return (cpack(m_pi_2 + tiny - x, -y));	/* raise inexact */
@ +
@  	do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x);
@  	if (B_is_usable) {

Your old optimization, updated to raise inexact by adding tiny.  Not
updated to avoid subtracting x -- see the float precision code for that.
The fixed comment above goes with this subtraction -- without it, the
approximation would be Pi/2 + O(z).

@ @@ -517,4 +530,6 @@
@   *             + I * atan2(2*y, (1-x)*(1+x)-y*y) / 2
@   *
@ + * catanh(z) = z + O(|z|^3)   as z -> 0
@ + *
@   * catanh(z) = 1/z + sign(y)*I*PI/2 + O(1/|z|^3)   as z -> infinity
@   * The above formula works for the real part as well, because
@ @@ -536,7 +551,7 @@
@  		if (isinf(x))
@  			return (cpack(copysign(0, x), y+y));
@ -		/* catanh(NaN + I*+-Inf) = sign(NaN)0 + I*+-PI/2 */
@ +		/* catanh(NaN + I*+-Inf) = sign(NaN)0 + I*+-PI/2 with inexact */
@  		if (isinf(y))
@ -			return (cpack(copysign(0, x), copysign(m_pi_2 + tiny, y))); /* raise inexact */
@ +			return (cpack(copysign(0, x), copysign(m_pi_2 + tiny, y)));
@  		/* catanh(+-0 + I*NaN) = +-0 + I*NaN */
@  		if (x == 0)
@ @@ -550,4 +565,5 @@
@  	}
@ 
@ +	/* XXX should improve following comments. */
@  	/* If x or y is inf, then catanh(x + I*y) = 0 + I*sign(y)*PI/2 */
@  	if (isinf(x) || isinf(y))

Here there was no space for commenting about the exceptions.  The sign of
the 0 is not documented, but there is no space for that either.  It is
in the code as a copysign().  So is the sign for PI/2, but that is in the
comment too.

@ @@ -557,6 +573,10 @@
@  		return (cpack(copysign(real_part_reciprocal(ax, ay), x), copysign(m_pi_2 + tiny, y))); /* raise inexact */
@ 
@ +	if (ax < DBL_EPSILON && ay < DBL_EPSILON)
@ +		if ((int)ax==0 && (int)ay==0) /* raise inexact */
@ +			return (z);
@ +

Your old optimization.  It also improves accuacy significantly -- see the
float precision comment.

@  	if (ax == 1 && ay < DBL_EPSILON) {
@ -		if ((int)ay==0) { /* raise inexact */
@ +		if (1) { /* inexact will be raised by log() */
@  			/*
@  			 * If ay == 0, divide-by-zero will be (correctly)

I didn't re-indent this.

@ diff -u2 catrigf.c~ catrigf.c
@ --- catrigf.c~	2012-09-18 03:42:35.000000000 +0000
@ +++ catrigf.c	2012-09-18 13:23:20.972740000 +0000
@ @@ -165,4 +165,9 @@
@  	}
@ 
@ +	/* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */
@ +	if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON)
@ +		if ((int)ax==0 && (int)ay==0)
@ +			return (z);
@ +
@  	do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y);
@  	if (B_is_usable)

Old optimization refined.

@ @@ -213,4 +218,8 @@
@  	}
@ 
@ +	/* XXX the number for ay is related to sqrt(6 * FLT_EPSILON). */
@ +	if (ax < FLT_EPSILON / 8 && ay < 2048 * FLT_EPSILON)
@ +		return (cpackf(m_pi_2 + tiny, -y));
@ +
@  	do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x);
@  	if (B_is_usable) {

Old optimization refined.

@ @@ -277,7 +286,5 @@
@  {
@  	if (y < SQRT_MIN)
@ -		if ((int)y==0)
@ -			return (x*x);
@ -
@ +		return (x*x);
@  	return (x*x + y*y);
@  }

Depend on the up-front setting of inexact here in sum_squares().

@ @@ -288,4 +295,6 @@
@  	int ex, ey;
@ 
@ +	if (isinf(x) || isinf(y))
@ +		return (0);
@  	if (y == 0) return (1/x);
@  	if (x == 0) return (x/y/y);

Handle special case for infinities here in real_part_reciprocal() instead
of the general code.

@ @@ -319,29 +328,60 @@
@  	}
@ 
@ -	if (isinf(x) || isinf(y))
@ -		return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y)));

Move this into real_part_reciprocal(), so that the classification of
infinities is only done if x or y is large.  This is a minor optimization.
My previous version removed this, but was broken since
real_part_reciprocal() somehow doesn't naturally return 0 for infinities.
It does rather slow scaling steps.

@ +	/*
@ +	 * Handle the annoying special case +-1 + I*+-0, and collaterally
@ +	 * handle the not-so-special case y == 0.  C99 specifies that
@ +	 * catanh(+-1 + I*+-0) = +-Inf + I*+-0 instead of the limiting
@ +	 * value +-Inf + I*+-PI/2 since it wants y == 0 to give the same
@ +	 * result as the real atanh() (at least for y == +0).  The special
@ +	 * behaviour for +-1 + I*+-0 begins with classifying it to avoid
@ +	 * raising inexact for it.  Make the classification as simple and
@ +	 * short as possible (except for this comment about it) and ensure
@ +	 * identical results by calling the real atanh() for all non-NaN x
@ +	 * when y == 0.  This turns out to be significantly more accurate.
@ +	 *
@ +	 * TODO: move this before the NaN classification and let atanh()
@ +	 * handle NaN x too.  Make a similar special case for x == 0 to
@ +	 * improve accuracy; this takes no extra lines of code since it
@ +	 * removes the need to handle x == 0 under the NaN classification.
@ +	 */
@ +	if (y == 0)
@ +		return (cpackf(atanh(x), y));

See the comment.

@ +
@ +	/* Raise inexact unless z == 0; return for z == 0 as a side effect. */
@ +	if ((x == 0 && y == 0) || (int)(1 + tiny) != 1)
@ +		return (z);

z == 0 is the only remaining case that shouldn't raise inexact.

@ 
@  	if (ax > RECIP_EPSILON || ay > RECIP_EPSILON)
@ -		return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2 + tiny, y)));
@ +		return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2, y)));

Inexact was raised up-front.

@ 
@ -	if (ax == 1 && ay < FLT_EPSILON) {
@ -		if ((int)ay==0) {
@ -			if ( ilogbf(ay) > FLT_MIN_EXP)
@ -				rx = - logf(ay/2) / 2;
@ -			else
@ -				rx = - (logf(ay) - m_ln2) / 2;
@ -		}
@ -	} else

Inexact was raised up front, and will also be raised by logf().

ilogbf() is rather slow, though it is now a builtin.  There is no need
to use it here:
- the condition can be written as (ay > 2 * FLT_MIN_EXP)
- the expression with m_ln2 is accurate enough, so there is no need for
   the condition.  I thought I tested this assertion and found no difference
   at all in accuracy, but now I can't see why it is true.  This case is
   fundamentally quite accurate -- within about 1 ulp for the logf() part --
   and subtracting m_ln2 will only lose about 0.5 ulps pf accuracy (since
   |logf(FLT_EPSILON)| dominates m_ln2), so it is not near the worse case
   for accuracy, but the loss of accuracy is not null.

@ +	/* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */
@ +	if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON)
@ +		return (z);

Old optimization.

Not just an optimization -- see below about accuracy.

@ +
@ +	if (ax == 1 && ay < FLT_EPSILON)
@ +		rx = - (logf(ay) - m_ln2) / 2;

Above with the extra code for accuracy removed.

@ +	else
@ +		/*
@ +		 * If we didn't handle y == 0 earlier, the following for
@ +		 * y == 0 would reduce to log1pf(4*ax/(ax-1)**2)) / 4.
@ +		 * This is significantly less accurate than the expression
@ +		 * log1pf(ax+ax+(ax*ax)*x/(1-ax)) / 2 used by atanhf() for
@ +		 * ax < 0.5, though not much less accurate than the expr
@ +		 * log1pf(ax+ax/(1-ax)) / 2 used by atanhf() for 0.5 <=
@ +		 * ax <= 1.  Can we do better with ay mixed in?
@ +		 *
@ +		 * This is also significantly less accurate than the
@ +		 * expression (z) used above when ax < 2048 * FLT_EPSILON
@ +		 * and y == 0.  Presumably similarly when y is small but
@ +		 * nonzero.  This explains why the above optimization also
@ +		 * improves accuracy.
@ +		 */
@  		rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4;

See the comment.

@ 
@ -	if (ax == 1) {
@ -		if (ay == 0)
@ -			ry = 0;
@ -		else
@ -			ry = atan2f(2, -ay) / 2;
@ -	} else if (ay < FOUR_SQRT_MIN) {
@ -		if ((int)ay==0)
@ -			ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2;
@ -	} else
@ +	if (ax == 1)
@ +		ry = atan2(2, -ay) / 2;
@ +	else if (ay < FLT_EPSILON * 128)
@ +		ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2;
@ +	else
@  		ry = atan2f(2*ay, (1-ax)*(1+ax) - ay*ay) / 2;
@

This is the part that I completely broke before.  Now it does:
- special case for ay == 0 moved above
- don't remove the minus sign in -ay
- use up-front setting of inexact
- the expanded threshold still works for me.

@ diff -u2 catrigl.c~ catrigl.c
@ --- catrigl.c~	2012-09-18 03:42:37.000000000 +0000
@ +++ catrigl.c	2012-09-18 11:50:35.362160000 +0000
@ @@ -180,4 +180,8 @@
@  	}
@ 
@ +	if (ax < LDBL_EPSILON && ay < LDBL_EPSILON)
@ +		if ((int)ax==0 && (int)ay==0)
@ +			return (z);
@ +
@  	do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y);
@  	if (B_is_usable)
@ @@ -228,4 +232,7 @@
@  	}
@ 
@ +	if (ax < LDBL_EPSILON && ay < LDBL_EPSILON)
@ +		return (cpackl(m_pi_2 + tiny - x, -y));
@ +
@  	do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x);
@  	if (B_is_usable) {
@ @@ -340,4 +347,8 @@
@  		return (cpackl(copysignl(real_part_reciprocal(ax, ay), x), copysignl(m_pi_2 + tiny, y)));
@ 
@ +	if (ax < LDBL_EPSILON && ay < LDBL_EPSILON)
@ +		if ((int)ax==0 && (int)ay==0)
@ +			return (z);
@ +
@  	if (ax == 1 && ay < LDBL_EPSILON) {
@  		if ((int)ay==0) {

catrigl.c only has changes to restore the old optimizations.

Bruce