From owner-freebsd-numerics@FreeBSD.ORG  Tue Sep 18 04:02:21 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 935B41065672
	for <freebsd-numerics@freebsd.org>;
	Tue, 18 Sep 2012 04:02:21 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 3335E8FC12
	for <freebsd-numerics@freebsd.org>;
	Tue, 18 Sep 2012 04:02:20 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q8I42JSt005318 for <freebsd-numerics@freebsd.org>;
	Mon, 17 Sep 2012 23:02:19 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <5057F24B.7020605@missouri.edu>
Date: Mon, 17 Sep 2012 23:02:19 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:15.0) Gecko/20120827 Thunderbird/15.0
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
References: <5017111E.6060003@missouri.edu>
	<20120906221028.O1542@besplex.bde.org>
	<5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu>
	<504FF726.9060001@missouri.edu>
	<20120912191556.F1078@besplex.bde.org>
	<20120912225847.J1771@besplex.bde.org>
	<50511B40.3070009@missouri.edu>
	<20120913204808.T1964@besplex.bde.org>
	<5051F59C.6000603@missouri.edu>
	<20120914014208.I2862@besplex.bde.org>
	<50526050.2070303@missouri.edu>
	<20120914212403.H1983@besplex.bde.org>
	<50538E28.6050400@missouri.edu>
	<20120915231032.C2669@besplex.bde.org>
	<50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu>
	<5054C200.7090307@missouri.edu>
	<20120916041132.D6344@besplex.bde.org>
	<50553424.2080902@missouri.edu>
	<20120916134730.Y957@besplex.bde.org>
	<5055ECA8.2080008@missouri.edu>
	<20120917022614.R2943@besplex.bde.org>
	<50562213.9020400@missouri.edu>
	<20120917060116.G3825@besplex.bde.org>
	<50563C57.60806@missouri.edu>
	<20120918012459.V5094@besplex.bde.org>
	<5057A932.3000603@missouri.edu>
In-Reply-To: <5057A932.3000603@missouri.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 04:02:21 -0000

On 09/17/2012 05:50 PM, Stephen Montgomery-Smith wrote:

>> cacos*() and casin*() should benefit even more from an up-front raising
>> of inexact, since do_hard_work() has 7 magic statements to raise inexact
>> where sum_squares has only 1.
>
> Where is the code that raises inexact up-front?

I don't see why having code upfront will make it much more efficient. 
Out of these 7 magic statements, at most two of them will be called.

But I could put something like

if ((x == 0 && y == 0) || (x == 0 && y == 1) || (int)(1+tiny) == 1) {
........
at the beginning of do_hard_work and catanh.


>> ...  I now understand what the threshold should be.  You have
>> filtered out ax == 1.  This makes 1 - ax*ax at least ~2*EPSILON, so
>> ay*ay can be dropped if ay is less than sqrt(2*EPSILON*EPSILON) *
>> 2**-GUARD_DIGITS = EPSILON * 2**-5 say.  SQRT_MIN is way smaller
>> than that, so FOUR_SQRT_MIN works too.  We should use a larger
>> threshold for efficiency, or avoid the special case for ax == 1.
>> Testing shows that this analysis is off by a factor of about
>> sqrt(EPSILON), since a threshold of EPSILON * 2**7 is optimal.
>> The optimization made no difference to speed; it is just an
>> optimization for understanding.  Maybe the special case for ax == 1
>> can be avoided, or folded together with the same special case for
>> evaluation of the real part.  This special case is similar to the
>> one in clog(), but easier.

OK, I think I made changes more or less according to your suggestions.

In the case A < A_crossover, a threshold like 
DBL_EPSILON*DBL_EPSILON/128 is required.  I think the one you set is too 
large.  It is important that sqrt(x) + x/2 is sqrt(x).  (Again I don't 
think your tests would pick this up, because you need to do a lot of 
tests where y is close to or equal to 1.)