From owner-freebsd-standards@FreeBSD.ORG  Sun Nov  4 05:26:39 2007
Date: Sun, 4 Nov 2007 16:26:03 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl
Cc: freebsd-standards@freebsd.org
Subject: Re: long double broken on i386?
Message-ID: <20071104142220.G15933@delplex.bde.org>
In-Reply-To: <20071103213000.GA92030@troutmask.apl.washington.edu>
References: <20070928152227.GA39233@troutmask.apl.washington.edu>
 <20071001173736.U1985@besplex.bde.org>
 <20071002001154.GA3782@troutmask.apl.washington.edu>
 <20071002172317.GA95181@VARK.MIT.EDU>
 <20071002173237.GA12586@troutmask.apl.washington.edu>
 <20071003103519.X14175@delplex.bde.org>
 <20071010204249.GA7446@troutmask.apl.washington.edu>
 <20071103144151.U671@besplex.bde.org>
 <20071103213000.GA92030@troutmask.apl.washington.edu>

On Sat, 3 Nov 2007, Steve Kargl wrote:

> On Sat, Nov 03, 2007 at 03:53:19PM +1100, Bruce Evans wrote:
>> On Wed, 10 Oct 2007, Steve Kargl wrote:
>>> I've given up on i386.  All of my attempts to make my C implementations
>>> work on both amd64 and i386 have failed.  The code when run on i386
>>> results in ULP's that exceed 10^5.  Yes, that is 1E5.  Results for amd64
>>> are reported below.  As far as assembly, someone else will need to write
>>> those routines because I do not know assembly.
>>
>> Bad errors are expressed in GULPs (Giga-ULPs) :-).  Maybe you are hitting
>> the compiler bugs that break double-precision arithmetic on i386's when
>> the rounding precision is changed to long double.  The rounding precision
>> ...

> I did some limited testing of the output from k_rem_pio2.c
> against argument reduction using MPFR.  I found no long
> double values that gave results which did not agree.  I'm fairly
> certain that the GULPs are due to improper polynomial approximations
> for the 53-bit precision long doubles on i386.

Rounding errors in polynomial coefficients can't possibly have that much
effect.  An error of 1 53-bit ULP in a coefficient is only 2^11 63-bit
ULPs.  Since the first 2 coefficients of cos() and the first coefficient
of sin() are exact, the relative errors are further reduced by a factor
of x^4/4! for cos(x) and x^3/3! for sin(x).

Actual testing of messed-up Taylor coefficients using pari-gp gives the
following error bounds:

 17 ULPs      cos(x)    up to x^18/18! term with 1/18! rounded to 53 bits
 0.06 ULPs                                             ... 64 bits
 0.07 ULPs                                             ... not rounded
105 ULPs      sin(x)/x  up to x^18/19! term with 1/19! rounded to 53 bits
 0.06 ULPs                                             ... 64 bits
 0.003 ULPs                                            ... not rounded

All arithmetic was in ~1000-bit precision.  The interval was [-Pi/4, Pi/4].
The errors are much larger than I said they should be above, but still much
less than 1 GULP.  Not rounding for cos(x) really does give a larger max
error than rounding to 64 bits.
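As a cross-check that coefficient rounding stays in the tens-of-ULPs range
(nowhere near GULPs), here is a rough standalone sketch along the same
lines -- not the pari-gp setup; the degree-18 Taylor polynomial and the
sampling are only illustrative, and an x86 64-bit long double is assumed.
It evaluates the polynomial for cos(x) with the coefficients rounded to 53
bits and to 64 bits and reports the worst difference in 64-bit ULPs over
[-Pi/4, Pi/4]:

#include <float.h>
#include <math.h>
#include <stdio.h>

/* (2k)! for k = 0..9; every value is exact in both double and long double. */
static const long double fact[] = { 1, 2, 24, 720, 40320, 3628800,
	479001600, 87178291200.0L, 20922789888000.0L, 6402373705728000.0L };

/*
 * Degree-18 Taylor polynomial for cos(x), with the coefficients
 * (-1)^k/(2k)! rounded to 53 bits (prec53 != 0) or to 64 bits.
 */
static long double
taylor_cos(long double x, int prec53)
{
	long double c, s = 0, z = x * x;
	int k;

	for (k = 9; k >= 0; k--) {
		c = prec53 ? 1.0 / (double)fact[k] : 1.0L / fact[k];
		s = (k & 1 ? -c : c) + z * s;
	}
	return (s);
}

int
main(void)
{
	const long double pi4 = 0.78539816339744830962L;
	long double d, maxd = 0, x;
	int i;

	for (i = 0; i <= 200000; i++) {
		x = -pi4 + 2 * pi4 * i / 200000;
		d = fabsl(taylor_cos(x, 1) - taylor_cos(x, 0));
		if (d > maxd)
			maxd = d;
	}
	/* Results lie in [0.5, 1], so 1 ULP there is 2^-LDBL_MANT_DIG. */
	printf("max effect of 53-bit coefficients: %.2Lf ULPs\n",
	    maxd / scalbnl(1.0L, -LDBL_MANT_DIG));
	return (0);
}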
My Remez polynomial constructor rounds the coefficients during its search
for the best coefficients.  It is possible to compensate to some extent
for unfortunate rounding of early coefficients by tweaking higher
coefficients, but this is not very important and currently causes
stability problems in the search.  With 1 less term, errors in the Taylor
approximation dominate errors in (53-bit) coefficients, and with 1 more
term, errors in the 64-bit coefficients dominate errors in the Taylor
approximation.

>>>>> PS: There is also the minor possibility of giving bde either
>>>>> a heartache or death via laughter when he finally sees what I
>>>>> have done. :)
>>>>
>>>> I would be unhappy with any versions that aren't either:
>>>> - as identical as possible in algorithm and style to the double precision
>>>>   versions, in the same way that the float versions are as identical as
>>>>   possible, or
>>>> - much better in algorithm and style and efficiency and accuracy than the
>>>>   double precision versions.  I don't want to maintain different versions.
>>>
>>> I'll go with your second option; see comments below and attached
>>> code.  BTW, if I add
>>
>> So I'm unhappy that the first point hasn't been addressed at all.  This
>> gives quality < the current float and double precision versions.
>
> You're moving the goal.  You gave me an "either/or" option.  I took
> the "or", so the first point is irrelevant.  Well, your observation
> on quality is subjective because I'd claim my (almost) KNF commented
> code is much better than the fdlibm code.

The second option is "MUCH better in ALL of algorithm, style, efficiency
and accuracy".  That is impossible.  Change the emphasis to MUCH better in
efficiency OR accuracy OR (simplicity without loss of the others).  It is
difficult to be both more efficient and more accurate, and you will need a
better algorithm to be either more efficient or more accurate.

MUCH better is not very subjective for efficiency or accuracy.  It means
not rewriting everything to gain only 10% efficiency.  For accuracy, it
means:
- all errors < 2-3 ULPs in cases where errors were sometimes > hundreds/
  thousands/millions/billions of ULPs.  Note that this is still not
  implemented for the i387 hardware trig functions -- the hardware gives
  errors of many GULPs near multiples of Pi/2, and the software doesn't do
  anything to avoid the bad cases, since avoiding the bad cases without
  losing efficiency is not easy.
- all errors < 1 ULP in cases where errors were sometimes > 1 ULP
- almost perfect rounding in cases where errors were always < 1 ULP
  (the max error will be about 0.51 ULPs instead of up to 0.99 ULPs)
- perfect rounding in cases where there was almost perfect rounding
  (the max error will be 0.499... ULPs instead of 0.51 ULPs).

>> The argument reduction seems to begin correctly, but then the extra
>> precision which is carefully provided by k_rem_pio2.c is thrown away,
>> unlike in the float and double versions which pass it to the __kernel
>> functions.  I would expect a loss of up to a full ULP from this, giving
>> a maximum error of 1.5 ULPs (0.5 from the __kernel and 1 from the arg
>> reduction).
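For comparison, the double version's dispatch keeps both words of the
reduced argument.  The following is only a structural sketch by analogy
with fdlibm's s_sin.c -- the long double kernel signatures, the reduction
function and the Pi/4 cutoff are placeholders, not existing code:

#include <math.h>

/* Prototypes only: these stand in for whatever k_sinl.c/k_cosl.c and the
 * long double reduction end up providing. */
long double __kernel_sinl(long double, long double, int);
long double __kernel_cosl(long double, long double);
int __rem_pio2l(long double, long double *);

long double
sinl(long double x)
{
	long double y[2];
	int n;

	if (fabsl(x) < 0.785398163397448309616L)	/* |x| < Pi/4 */
		return (__kernel_sinl(x, 0, 0));
	if (isnan(x) || isinf(x))
		return (x - x);				/* NaN */

	n = __rem_pio2l(x, y);		/* x ~= n*Pi/2 + y[0] + y[1] */
	switch (n & 3) {
	case 0:
		return (__kernel_sinl(y[0], y[1], 1));
	case 1:
		return (__kernel_cosl(y[0], y[1]));
	case 2:
		return (-__kernel_sinl(y[0], y[1], 1));
	default:
		return (-__kernel_cosl(y[0], y[1]));
	}
}

The point is only the plumbing: the low word y[1] goes all the way into
the kernels instead of being folded into y[0] and rounded away.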
> I had __kernel_cosl and __kernel_sinl that retained both a high
> and low portion of the reduced arguments, but in the range tested
> I did not find any advantage to doing the extra precision arithmetic.
> These may have been discarded during my i386 GULP experience, so
> I revisited the issue on amd64.

Test near multiples of Pi/2.

>>> My timings on the basic routines using gprof are:
>>>
>>>   %   cumulative   self              self     total
>>>  time   seconds   seconds    calls  ms/call  ms/call  name
>>>   0.6    804.42      5.67 40000000     0.00     0.00  sinl [18]
>>>   0.1    934.41      0.74 19998751     0.00     0.00  __kernel_sinl [95]
>>>   0.1    935.82      0.66 20001249     0.00     0.00  __kernel_cosl [96]
>>>
>>> where the system contains a dual-cpu opteron tyan motherboard running
>>> at 2.2 GHz.
>>
>> That seems to be about 175 nS/call.  Not too bad.  (My current times on
>> a 2.2 GHz A64*1 in i386 mode on small args ([-2Pi,2Pi]) are 70 nS for
>> the C double version, 43 nS for the asm double version, and 15 nS for
>> the C float version).
>
> Well, I've been more concerned about correctness over speed.  I'd
> need to read the gprof output again, but I recall that argument
> reduction was the major time consumer.

Yes, even for small and medium quadrant numbers.  k_rem_pio2.c optimizes
for these but is still slow.  My float trig functions optimize better for
very small quadrant numbers, but even with double precision the
corresponding optimizations don't work so well since they take much more
code than the float case.

>> % Index: msun/src/k_cosl.c
>> % ===================================================================
>> % RCS file: msun/src/k_cosl.c
>> % diff -N msun/src/k_cosl.c
>> % --- /dev/null	1 Jan 1970 00:00:00 -0000
>> % +++ msun/src/k_cosl.c	9 Oct 2007 21:52:32 -0000
>> % @@ -0,0 +1,62 @@
>> % +/*
>> % + * Compute cos(x) on the interval [-1:1] with a polynomial approximation.
>>
>> Interval is larger than necessary.  This might result in unnecessary
>> coefficients.  10 of them seem to be too many for 64 bits of mantissa
>> but not enough for 113 bits (sparc64).
>
> Yes, I know.  My original Chebyshev interpolation code assumed a function

I found that 10 is right with Taylor coefficients.  Chebyshev and Remez
tend to need 1 fewer.

>> % Index: msun/src/k_tanl.c
>> % ===================================================================
>> % RCS file: msun/src/k_tanl.c
>> % diff -N msun/src/k_tanl.c
>> % ...
>> % +#define C33	(long double) 0x1.8f939ed1883f595p-32L
>>
>> This has far too many terms for a 64-bit mantissa, but not enough for a
>> 113-bit mantissa.  IIRC, the former requires about 25 terms.
>
> This is partly due to the range being larger than required.

Ah.  tan() will be adversely affected by the larger interval much more
than cos() and sin().  I found that terms up to about the x^35 one are
needed for a Remez approximation in 64 bits, and up to x^61 (?) in 113
bits.

>> Much more extra care is needed for an accuracy of < 1 ULP for tan*() than
>> for sin*() and cos*().
>
> Well, I did code both steps 3 and 4 from k_tan.c, but I saw no
> statistical improvement in my testing.

The problems are mainly on corner cases.

>> % +long double
>> % +cosl(long double x)
>> % +{
>> % +	union IEEEl2bits z;
>>
>> I want to avoid using this union...
>>
>> % +	z.e = x;
>> % +	z.bits.sign = 0;
>>
>> ... since, while code like this is easy to write, it hard-codes the
>> assumption that things are laid out in bit-fields.  Access macros
>> are more flexible and don't require so much compiler intelligence
>> to optimize away the memory accesses that are logically required for
>> bit-fields.  gcc does a mediocre job with this optimization,
>> especially for 80-bit long doubles on amd64 and i386 since the bits
>> don't fit evenly in registers.
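A temporary set of accessors can simply wrap the union.  A standalone
sketch of the shape such macros could take -- the GET_LDBL_EXPSIGN()/
SET_LDBL_EXPSIGN() names are not existing math_private.h macros, and the
union here is just the little-endian x86 80-bit layout written out so the
example compiles on its own (the real, per-arch definition lives in
_fpmath.h):

#include <stdio.h>

union IEEEl2bits {
	long double e;
	struct {
		unsigned int manl : 32;
		unsigned int manh : 32;
		unsigned int exp  : 15;
		unsigned int sign : 1;
	} bits;
};

/* Hypothetical accessors in the spirit of GET_HIGH_WORD()/SET_HIGH_WORD(). */
#define	GET_LDBL_EXPSIGN(es, x) do {				\
	union IEEEl2bits u;					\
	u.e = (x);						\
	(es) = (u.bits.sign << 15) | u.bits.exp;		\
} while (0)

#define	SET_LDBL_EXPSIGN(x, es) do {				\
	union IEEEl2bits u;					\
	u.e = (x);						\
	u.bits.exp = (es) & 0x7fff;				\
	u.bits.sign = ((es) >> 15) & 1;				\
	(x) = u.e;						\
} while (0)

int
main(void)
{
	long double x = -2.5L;
	int es;

	GET_LDBL_EXPSIGN(es, x);
	SET_LDBL_EXPSIGN(x, es & 0x7fff);	/* clear the sign: x = |x| */
	printf("%Lg\n", x);			/* prints 2.5 */
	return (0);
}

The point of hiding the bit-fields is that the implementation behind the
macros can later switch to raw word accesses without touching any caller.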
> math_private.h doesn't have GET_* or SET_* macros for use with
> 64-bit long double (not to mention 128-bit long double), and
> the macros in math_private.h use similar unions.  AFAICT, all
> supported archs have an _fpmath.h header.

Adding efficient versions of them is not easy, but temporary inefficient
versions can easily use the bit-fields in union IEEEl2bits instead of
direct accesses to these bit-fields.  I have local versions for a couple
of the macros only.  I needed to add a 64-bit mantissa field in _fpmath.h
for efficiency on amd64.  Unfortunately, the mantissa field must be split
since it has more than 64 bits on some machines, so code and/or macros
have to deal with manh and manl or have separate macro args for the split.

Another problem with long doubles is that unsupported formats may be
passed.  These should be treated as NaNs or Infs but require extra care
for classifying.

>> I see parametrizing the bit-field sizes and other magic numbers (not
>> quoted) for various formats of long doubles as the main difficulty
>> here (trig functions are easy except for this and k_rem_pio2.c).
>
> Which explains why GET_* and SET_* don't exist.  Although these
> might be present in bdeBSD. :)

GET_* and SET_* are in glibc, in the same style as the float and double
versions.  glibc handles parametrization mainly by having zillions of
different versions.

>> Note that the fdlibm __kernel code for trig functions is generic except
>> for:
>>
>> - my optimizations in the float case
>> - the magic number for the threshold near Pi/4 for tan()
>> - the number and values of terms in the polynomial, and the scheduling
>>   (best scheduling depends on the number of terms, and we don't have this
>>   automated).
>>
>> Thus it is easier to clone the correct code than to rewrite it incorrectly.
>
> It may be easier to clone fdlibm, but I'm not sure I'd call it correct
> code due to
> - magic numbers
> - questionable source formatting (it's some of the worst code I've read)
> - lack of comments

- magic numbers are needed for efficiency.
- fdlibm has its own style, which is not too bad when you get used to it.
  For better/worse style numeric code, look at pari or djb*.  I haven't
  looked so much at these, so they look worse :-).
- fdlibm has more and better comments than most code.
Bruce

From owner-freebsd-standards@FreeBSD.ORG  Mon Nov  5 11:07:06 2007
Date: Mon, 5 Nov 2007 11:07:06 GMT
From: FreeBSD bugmaster
To: freebsd-standards@FreeBSD.org
Subject: Current problem reports assigned to freebsd-standards@FreeBSD.org
Message-Id: <200711051107.lA5B76bB026451@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker       Resp.      Description
--------------------------------------------------------------------------------
o bin/25542     standards  /bin/sh: null char in quoted string
o kern/46239    standards  posix semaphore implementation errors
o stand/54410   standards  one-true-awk not POSIX compliant (no extended REs)
o stand/82654   standards  C99 long double math functions are missing
o stand/94729   standards  [libc] fcntl() throws undocumented ENOTTY

5 problems total.

Non-critical problems

S Tracker       Resp.      Description
--------------------------------------------------------------------------------
o bin/21519     standards  sys/dir.h should be deprecated some more
o bin/24390     standards  Replacing old dir-symlinks when using /bin/ln
s stand/24590   standards  timezone function not compatible witn Single Unix Spec
s kern/28260    standards  UIO_MAXIOV needs to be made public
s stand/36076   standards  Implementation of POSIX fuser command
o stand/39256   standards  snprintf/vsnprintf aren't POSIX-conformant for strings
p stand/41576   standards  POSIX compliance of ln(1)
o stand/44425   standards  getcwd() succeeds even if current dir has perm 000.
o stand/46119   standards  Priority problems for SCHED_OTHER using pthreads
o stand/54833   standards  [pcvt] more pcvt deficits
o stand/54839   standards  [pcvt] pcvt deficits
p stand/55112   standards  glob.h, glob_t's gl_pathc should be "size_t", not "int
o stand/56476   standards  cd9660 unicode support simple hack
o stand/58676   standards  grantpt(3) alters storage used by ptsname(3)
s stand/62858   standards  malloc(0) not C99 compliant
s kern/64875    standards  [libc] [patch] [feature request] add a system call: fd
o stand/66357   standards  make POSIX conformance problem ('sh -e' & '+' command-
o stand/66531   standards  _gettemp uses a far smaller set of filenames than docu
o stand/70813   standards  [PATCH] ls(1) not Posix compliant
o stand/72006   standards  floating point formating in non-C locales
o stand/79056   standards  regex(3) regression tests
a stand/80293   standards  sysconf() does not support well-defined unistd values
o stand/81287   standards  [PATCH]: fingerd(8) might send a line not ending in CR
o stand/83845   standards  [libm] [patch] add log2() and log2f() support for libm
o stand/85080   standards  output of long double subnormals (with printf) is wron
o stand/92360   standards  [headers] [patch] Missing TAB3 in kernel headers
o stand/92362   standards  [headers] [patch] Missing SIGPOLL in kernel headers
o kern/93705    standards  [headers] [patch] ENODATA and EGREGIOUS (for glibc com
o stand/96016   standards  [headers] clock_getres et al should be in
o stand/96236   standards  [PATCH] [POSIX] sed.1 incorrectly describes a function
p stand/99517   standards  Missing SIGRTMIN and SIGRTMAX signals
o stand/99960   standards  [Patch] make(1): Add -p flag
o stand/100017  standards  [Patch] Add fuser(1) functionality to fstat(1)
o stand/104743  standards  [headers] [patch] Wrong values for _POSIX_ minimal lim
o stand/104841  standards  [libm] [patch] C99 long double square root.
o stand/107561  standards  [patch] Missing SUS function tcgetsid()
o kern/114578   standards  [libc] wide character printing using swprintf(dst, n,
o stand/114633  standards  /etc/rc.subr: line 511: omits a quotation mark: "force
o stand/116081  standards  make does not work with the directive sinclude
o stand/116221  standards  SUS issue -- FreeBSD has not flag WNOWAIT for wait*()
o stand/116826  standards  [PATCH] sh support for POSIX character classes

41 problems total.
From owner-freebsd-standards@FreeBSD.ORG  Mon Nov  5 12:35:23 2007
Date: Mon, 5 Nov 2007 23:35:05 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl
Cc: freebsd-standards@freebsd.org
Subject: Re: [PATCH] hypotl, cabsl, and code removal in cabs
Message-ID: <20071105220251.C19770@delplex.bde.org>
In-Reply-To: <20071103195952.GA91682@troutmask.apl.washington.edu>
References: <20071012180959.GA36345@troutmask.apl.washington.edu>
 <20071103155721.K671@besplex.bde.org>
 <20071103195952.GA91682@troutmask.apl.washington.edu>

On Sat, 3 Nov 2007, Steve Kargl wrote:

> On Sat, Nov 03, 2007 at 04:15:27PM +1100, Bruce Evans wrote:
>>
>> On Fri, 12 Oct 2007, Steve Kargl wrote:
>>
>>> The attached patch implements hypotl(), cabsl(), and removes
>>> z_abs() from w_cabs.c.  z_abs() is undocumented namespace
>>> pollution in libm, and no header file under /usr/include
>>> provides a prototype.  The documentation has been updated,
>>> and in particular an incorrect paragraph in cabs.3 has been
>>> removed.
>>
>> I fear that that paragraph is correct again, since the long double
>> version only seems to take enough care with the overflow cases.

I verified that the > 1.2 ULP error is back using the same algorithm for
a float-precision version.  Bugs were also easier to find in lower
precision.

> Ah, no.  Underflow is prevented by the if() statement.
>
>	a.e *= a.e;
>	m -= n;
>	if (m > - PREC) {
>		b.bits.exp += m;
>		a.e += b.e * b.e;
>	}
>	a.e = sqrtl(a.e);
>	a.bits.exp += n;
>	return (a.e);
>
> If m < -PREC, then |b|/|a| < 2**(-32), so it skips the "a.e += b.e * b.e;"
> "a.e *= a.e" can't underflow because a.e is in [0.5,1), and
> "a.e = sqrtl(a.e);" can't underflow because, here, a.e is in [0.25,2).

The problem has nothing to do with underflow (I don't even see underflow
as a problem; avoiding underflow is just an optimization).  The problem is
to limit roundoff error in intermediate calculations so that the final
error is < 1 ULP.  More below.

BTW, PREC = 32 seems a little low.  I would have expected PREC = 33 to be
needed to give 2 guard bits.  However, 32 for 64-bit long double precision
corresponds to 12 for 24-bit float precision, but even 6 seems to work for
float precision.
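The float-precision check mentioned above was nothing special.  A rough
standalone sketch of that kind of brute-force test (my own scaffolding,
not the actual test program: random args in [1, 2), a double-precision
hypot() as the reference, errors reported in float ULPs):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	double err, maxerr = 0, ref;
	float x, y;
	volatile float xx, yy, s;	/* volatile: keep the evaluation in
					 * true float precision, even on i387 */
	int i;

	srandom(1);
	for (i = 0; i < 10000000; i++) {
		/* args in [1, 2): no over/underflow, only roundoff matters */
		x = 1.0F + (float)random() / 0x7fffffff;
		y = 1.0F + (float)random() / 0x7fffffff;
		xx = x * x;
		yy = y * y;
		s = xx + yy;
		ref = hypot((double)x, (double)y);
		/* error in ULPs of the reference's binade */
		err = fabs((double)sqrtf(s) - ref) /
		    ldexp(1.0, ilogb(ref) - 23);
		if (err > maxerr)
			maxerr = err;
	}
	printf("naive sqrtf(x*x + y*y): max error %.3f ULPs\n", maxerr);
	return (0);
}

With strict float evaluation this turns up errors above 1 ULP and
approaching the 1.2 figure discussed below; dropping the volatiles on an
i387, so the expression is evaluated in extended precision, makes the
naive expression look nearly correctly rounded.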
>> % Index: man/hypot.3
>> % ...
>> % @@ -82,11 +90,6 @@ Consequently
>> %  exactly;
>> %  in general, hypot and cabs return an integer whenever an
>> %  integer might be expected.
>> % -.Pp
>> % -The same cannot be said for the shorter and faster version of hypot
>> % -and cabs that is provided in the comments in cabs.c; its error can
>> % -exceed 1.2
>> % -.Em ulps .
>
> This paragraph is useless.  There is no cabs.c in the source tree,
> and in particular, there is no cabs.c in an installed system.  Surely,
> this paragraph isn't referring to src/contrib/libf2c/libF77/cabs.c.
> It has one comment with a total of one word.

cabs.c is only in cvs history (libm/ieee/cabs.c).  It is instructive since
the ifdefed-out version (not exactly the commented-out version) gets the
1.2 ULP error in much the same way as your version, while the primary
version (by Ng) has to work hard to limit the error to 0.959 ULPs.  The
version in fdlibm works slightly harder to reduce the error a bit more.

Here is the main part of the ifdefed-out version:

%	if(finite(x))
%	    if(finite(y))
%	    {
%		x=copysign(x,one);
%		y=copysign(y,one);
%		if(y > x)
%			{ temp=x; x=y; y=temp; }
%		if(x == zero) return(zero);
%		if(y == zero) return(x);
%		exp= logb(x);
%		x=scalb(x,-exp);
%		if(exp-(int)logb(y) > ibig )
%			/* raise inexact flag and return |x| */
%			{ one+small; return(scalb(x,exp)); }
%		else y=scalb(y,-exp);
%		return(scalb(sqrt(x*x+y*y),exp));
%	    }

It does the necessary range reduction, which takes the bulk of the code,
but then just returns sqrt(x*x+y*y) (scaled).  This expression gives a max
error of 1.2+ ULPs for almost all implementations of sqrt() and FP
arithmetic.

>> It's silly that on i386's, naive double precision code would work if
>> the default precision were extended, but the double precision code
>> isn't naive; OTOH, long double precision code can never depend on extra
>> precision, but is naive.
>
> I don't understand to what the above comment is directed.

With extra precision in hardware, the naive expression sqrt(x*x+y*y) has
an error of < 1 ULP, but you propose to use the naive expression in
precisely the case where there is no extra precision in hardware.

>> % Index: src/e_hypotl.c
>> % ...
>> % +	if (n >= m) {
>> % +		a.e *= a.e;
>> % +		m -= n;
>> % +		if (m > - PREC) {
>> % +			b.bits.exp += m;
>> % +			a.e += b.e * b.e;
>> % +		}
>> % +		a.e = sqrtl(a.e);
>> % +		a.bits.exp += n;
>> % +		return (a.e);
>>
>> The usual case reduces to just sqrtl(x*x + y*y).
>
> See my comment above.  The usual case isn't the important case.
> It's the unusual case that is of concern.  The above code prevents
> the unusual, yet provides acceptable performance in the usual case.

Well, even the ifdefed-out version handled the unusual cases.

>> fdlibm is more careful in this case (but similar in others):
>>
>> %  * So, compute sqrt(x*x+y*y) with some care as
>> %  * follows to get the error below 1 ulp:
>> %  *
>> %  * Assume x>y>0;
>> %  * (if possible, set rounding to round-to-nearest)
>> %  * 1. if x > 2y  use
>> %  *	x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
>> %  * where x1 = x with lower 32 bits cleared, x2 = x-x1; else
>> %  * 2. if x <= 2y use
>> %  *	t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
>> %  * where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
>> %  * y1= y with lower 32 bits chopped, y2 = y-y1.
>>
>> I don't understand the details of this, and couldn't find any cases
>> where this makes a difference.  Testing of 2-arg functions is hard
>> since exhaustive testing is impossible even for floats, and the
>> cases where this matters for hypot() seem to be sparse.
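Case 1 is easier to see in isolation.  A small standalone demo of the
splitting trick in double precision -- the value of x is arbitrary, and
the long double product only serves as a reference, so it wants amd64 (or
an i386 with the FPU switched to extended precision):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	double x = 1.2345678901234567, x1, x2, hi, lo;
	volatile double p;
	long double ref;
	uint64_t bits;

	/* x1 = x with the lower 32 bits cleared; x2 = x - x1 is exact,
	 * and x*x == x1*x1 + x2*(x + x1) holds as a real identity. */
	memcpy(&bits, &x, sizeof(bits));
	bits &= 0xffffffff00000000ULL;
	memcpy(&x1, &bits, sizeof(x1));
	x2 = x - x1;

	hi = x1 * x1;		/* exact: x1 has at most 21 significant bits */
	lo = x2 * (x + x1);	/* small, so its rounding error is tiny */

	p = x * x;			/* naive product: one full rounding */
	ref = (long double)x * x;	/* ~64-bit reference on x86 */

	printf("error of x*x     : %Lg\n", fabsl(p - ref));
	printf("error of hi + lo : %Lg\n", fabsl((long double)hi + lo - ref));
	return (0);
}

Because hi is exact and lo is roughly 2^-20 of it, the pair carries x*x to
well beyond double precision before the final additions, which is what
lets the sum x*x + y*y stay under the sqrt(2)/2 ULP budget.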
> The above seems more complicated than needed when one considers
> that x and y can be easily split into |x|=a*2**n and |y|=b*2**m
> where a,b are in [0.5,1).  One then has the comment that is
> found in the code:

Now I understand the details.  I left out the most important part of the
comment:

%  * Method :
%  *	If (assume round-to-nearest) z=x*x+y*y
%  *	has error less than sqrt(2)/2 ulp, than
%  *	sqrt(z) has error less than 1 ulp (exercise).

Do this exercise.  Hints:
o reduce to x in [1, 4) and sqrt(x) in [1, 2)
o all estimates can be made in infinite precision
o the relative and absolute sizes of 1 ULP depend on x and sqrt(x)
o the worst case is x slightly larger than 2

If we compute x*x+y*y naively, then we should expect a max error of 1.5
ULPs (0.5 for each operation).  The actual max error seems to be 1.0 ULPs
(I don't understand why yet).  Even 1 ULP is too much, since it is larger
than the value of sqrt(2)/2 given by the exercise.  The final error is:

	0.5 + N / sqrt(2)

where 0.5 ULPs of error result from rounding of sqrt(x*x+y*y) and N ULPs
of error in x*x+y*y contribute less than N ULPs of error to the final
result (according to the scale factor of 1/sqrt(2), provided N is not very
large -- note that linear scaling here and in the exercise gives an upper
bound for the error, essentially via the linear term in a Taylor
approximation -- we can simply drop the higher terms without breaking the
estimate since log() is convex).  N = 1 gives the magic number 1.2:
0.5 + 1/sqrt(2) ~= 0.5 + 0.7071... = 1.2071...

I observed an error of 1.202... in practice for float precision.  The
cases with a large error are sparse, so it would be hard to find them by
testing in long double precision.  For a final error of > 1 ULP, x and y
must be such that:
- x*x+y*y has an error of > sqrt(2)/2 ULPs
- x*x+y*y must be fairly near the problem value of 2+ (else the relative
  size of an ULP shrinks in the domain and/or range, so sqrt() compresses
  errors more and there is no problem)
- x*x+y*y must be very near a half-way case for sqrt().

This analysis also explains why missing guard bits for x*x+y*y don't seem
to be a problem.  We only ignore y*y (or x*x) if this term is <=
2**-(2*PREC) relative to the other term.  Then we may have an error of
more than 0.5 ULPs for the addition, but we only have a tiny error for the
ignored term, so the combined error in x*x+y*y is < 1 ULP, which is no
worse than other worst cases.

I found only 1 serious bug in your hypotl(): adding n or m back to the
exponent gives garbage if the resulting exponent doesn't fit.  Fix for the
float precision case:

%	if (n >= m) {
%		a.e *= a.e;
%		m -= n;
%		if (m > - PREC) {
%			b.bits.exp += m;
%			a.e += b.e * b.e;
%		}
%		a.e = sqrtf(a.e);
%		if (a.bits.exp + n <= 0 || a.bits.exp + n >= 0xff) {
%			a.bits.exp += n - n / 2;
%			SET_FLOAT_WORD(t, 0x3f800000 + ((n / 2) << 23));
%			a.e *= t;
%		} else
%			a.bits.exp += n;
%		return (a.e);
%	} else {
%		b.e *= b.e;
%		n -= m;
%		if (n > - PREC) {
%			a.bits.exp += n;
%			b.e += a.e * a.e;
%		}
%		b.e = sqrtf(b.e);
%		if (b.bits.exp + m <= 0 || b.bits.exp + m >= 0xff) {
%			b.bits.exp += m - m / 2;
%			SET_FLOAT_WORD(t, 0x3f800000 + ((m / 2) << 23));
%			b.e *= t;
%		} else
%			b.bits.exp += m;
%		return (b.e);
%	}
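As an aside, the SET_FLOAT_WORD() rescaling above leans on exponent-field
arithmetic on the bits of 1.0F producing exact powers of 2.  A throwaway
check of that identity (memcpy() stands in for the math_private.h macro so
it compiles on its own):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build 2.0F**k by bumping the exponent field of 1.0F; only valid while
 * the resulting biased exponent stays inside (0, 0xff). */
static float
exp2i_float(int k)
{
	uint32_t w = 0x3f800000 + ((uint32_t)k << 23);
	float t;

	memcpy(&t, &w, sizeof(t));
	return (t);
}

int
main(void)
{
	int k, bad = 0;

	for (k = -126; k <= 127; k++)
		if (exp2i_float(k) != ldexpf(1.0F, k))
			bad++;
	printf("mismatches for k in [-126, 127]: %d\n", bad);	/* expect 0 */
	return (0);
}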
Simpler and slower code can use a multiplier of powf(2.0F, n), etc.
fdlibm hypot() and hypotf() use dynamically constructed multipliers like
the SET_FLOAT_WORD()s in the above, but require less code than the above
for various reasons:
- a and b are swapped to make a >= b so as to reduce duplication
- scaling is different.  In the above, n is decomposed into halves since
  the full n sometimes doesn't fit in the exponent either.  fdlibm always
  ends up with a medium-sized n (spelled k) so that rescaling is easier.
  The different scaling may be responsible for fdlibm being significantly
  faster (though not in the above -- the no-fit case is rare).

Bruce

From owner-freebsd-standards@FreeBSD.ORG  Mon Nov  5 20:14:54 2007
Date: Tue, 6 Nov 2007 00:04:04 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Bruce Evans
Cc: freebsd-standards@freebsd.org, Steve Kargl
Subject: Re: [PATCH] hypotl, cabsl, and code removal in cabs
Message-ID: <20071105234202.T19976@delplex.bde.org>
In-Reply-To: <20071105220251.C19770@delplex.bde.org>
References: <20071012180959.GA36345@troutmask.apl.washington.edu>
 <20071103155721.K671@besplex.bde.org>
 <20071103195952.GA91682@troutmask.apl.washington.edu>
 <20071105220251.C19770@delplex.bde.org>

On Mon, 5 Nov 2007, Bruce Evans wrote:

> I found only 1 serious bug in your hypotl(): adding n or m back to the
> exponent gives garbage if the resulting exponent doesn't fit.  Fix for
> the float precision case:
>
> %	if (n >= m) {
> ...

Full patch to convert to float precision and fix some bugs:

% --- z~	Mon Nov  5 23:40:51 2007
% +++ z	Mon Nov  5 23:41:32 2007
% @@ -27,6 +27,6 @@
%   /*
%    * Compute hypotl(x, y) = sqrt(x**2 + y**2) such that underflow and
%  - * overflow are avoided.  To accomplishe the task, write x = a * 2**n
%  - * and y = b * 2**m, one then has
%  + * overflow are avoided.  To accomplish the task, write x = a * 2**n
%  + * and y = b * 2**m; one then has
%    *
%    * hypotl(x, y) = 2**n * sqrt(a**2 + b**(2*(m - n))) for n >= m

I didn't convert the comments to float precision.

% @@ -42,9 +42,10 @@
%    * hypotl(NaN, NaN) = hypotl(NaN, NaN) = Inf
%    */
%  +
%   #include "math.h"
%   #include "math_private.h"
%   #include "fpmath.h"
%
%  -#define ZERO	0.L
%  +static const float ZERO = 0.0;
%
%   /*

ZERO must be a non-literal constant to prevent evaluation of 1.0/0.0 at
compile time.  But due to compiler bugs/lack of support for C99/fenv, the
above is probably no different.
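A throwaway check of what the non-literal ZERO is meant to buy (my own
scaffolding; volatile is added here only so the demo survives
optimization -- as noted above, a plain static const may still be folded):

#include <fenv.h>
#include <stdio.h>

/* volatile is the heavy hammer; the patch's plain static const relies on
 * the compiler not folding the division at compile time. */
static volatile const float ZERO = 0.0;

int
main(void)
{
	volatile float r;

	feclearexcept(FE_ALL_EXCEPT);
	r = 1.0F / ZERO;		/* +Inf; should raise FE_DIVBYZERO */
	printf("r = %g, FE_DIVBYZERO raised: %s\n", r,
	    fetestexcept(FE_DIVBYZERO) ? "yes" : "no");
	return (0);
}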
% @@ -53,12 +54,21 @@
%    * function sqrtl() function exists.
%    */
%  -#define PREC	32
%  +#define PREC	12	/* XXX 6 works, but why isn't > 12 needed? */
%  +
%  +union IEEEfbits {
%  +	float	e;
%  +	struct {
%  +		unsigned int	man	:23;
%  +		unsigned int	exp	:8;
%  +		unsigned int	sign	:1;
%  +	} bits;
%  +};

I forgot that there was a suitable struct IEEEf2bits already defined in
fpmath.h.

%
%  -long double
%  -hypotl(long double x, long double y)
%  +float
%  +myhypotf(float x, float y)
%   {
%  -	union IEEEl2bits a, b;
%  +	union IEEEfbits a, b;
%  +	float t;
%   	int m, n;
%  -	long double val;
%
%   	a.e = x;

val is not used.

% @@ -68,6 +78,8 @@
%
%   	/* If x = 0 or y = 0, then hypotl(x, y) = |x| + |y|. */
%  -	if (!(a.bits.manh | a.bits.manl) || !(b.bits.manh | b.bits.manl))
%  +	if ((a.bits.exp == 0 && a.bits.man == 0) ||
%  +	    (b.bits.exp == 0 && b.bits.man == 0))
%   		return (a.e + b.e);
%  +
%   	/*
%   	 * If x = +-Inf or y = +- Inf, then hypotl(x, y) = Inf (even if the
% @@ -75,11 +87,31 @@
%   	 * argument is not +- Inf, then hypotl(x, y) = NaN.
%   	 */
%  -	if (a.bits.exp == 32767 || b.bits.exp == 32767) {
%  -		mask_nbit_l(a);
%  -		mask_nbit_l(b);
%  -		if (!(a.bits.manh | a.bits.manl) ||
%  -		    !(b.bits.manh | b.bits.manl))
%  -			return (1.L / ZERO);
%  -		return ((x - x) / (x - x));
%  +	if (a.bits.exp == 0xff || b.bits.exp == 0xff) {
%  +		if ((a.bits.exp == 0xff && a.bits.man == 0) ||
%  +		    (b.bits.exp == 0xff && b.bits.man == 0))
%  +			return (1.0F / ZERO);
%  +		/*
%  +		 * Here we do extra work to return the same NaN as fdlibm
%  +		 * on i386's, so that simple binary comparisons work.  If
%  +		 * both a.e and b.e are NaNs, we want to return the one
%  +		 * that is highest as an integer.  The height depends on
%  +		 * whether the hardware has set the qNaN bit, and whether
%  +		 * the hardware gets a chance to do that depends on the
%  +		 * optimization level and on whether our comparison function
%  +		 * has higher precision, since passing floats in integer
%  +		 * registers prevents quieting and converting floats to
%  +		 * higher precision uses the hardware.
%  +		 */
%  +		if (a.bits.exp == 0xff && b.bits.exp == 0xff) {
%  +#if 1	/* if call to comparison function will quieten */
%  +			if (a.bits.man != 0)
%  +				a.bits.man |= 0x00400000;	/* quieten */
%  +			if (b.bits.man != 0)
%  +				b.bits.man |= 0x00400000;
%  +#endif
%  +			if (a.bits.man < b.bits.man)
%  +				return (b.e + a.e);
%  +		}
%  +		return (a.e + b.e);
%   	}
%

Return the same value as fdlibm for testing.  We should at least return
some combination of both x and y in the NaN case, not just
((x - x) / (x - x)) when only y is a NaN, since when x is not a NaN the
latter is unrelated to the only NaN arg.

fdlibm sets the signs to 0 early similarly, so it tends to convert
negative NaNs to positive ones in a way that the hardware would not do.
Hacking on values using bit-fields or SET_*() tends to allow changes to
NaNs that the hardware would not do.

The above was written mainly on an A64 in i386 mode.  In amd64 mode, I
think loading sNaNs doesn't quieten them, so there would be even more
binary differences.

% @@ -90,16 +122,15 @@
%   	 */
%   	if (a.bits.exp == 0) {
%  -		a.e *= 0x1.0p514;
%  -		n = a.bits.exp - 0x4200;
%  +		a.e *= 0x1.0p25;
%  +		n = a.bits.exp - 0x97;
%   	} else
%  -		n = a.bits.exp - 0x3ffe;
%  -	a.bits.exp = 0x3ffe;
%  -
%  +		n = a.bits.exp - 0x7e;
%  +	a.bits.exp = 0x7e;
%   	if (b.bits.exp == 0) {
%  -		b.e *= 0x1.0p514;
%  -		m = b.bits.exp - 0x4200;
%  +		b.e *= 0x1.0p25;
%  +		m = b.bits.exp - 0x97;
%   	} else
%  -		m = b.bits.exp - 0x3ffe;
%  -	b.bits.exp = 0x3ffe;
%  +		m = b.bits.exp - 0x7e;
%  +	b.bits.exp = 0x7e;
%
%   	if (n >= m) {

I'm not sure why this uses 514.  I first tried converting 514 to 34, to
get a nice round number instead of 0x97.  This didn't work, but the
problems may have been just the rescaling bugs later.

fdlibm normalizes to a non-constant exponent.  This would be more
complicated here but simpler later.
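A quick standalone check of the float normalization constants above (my
own throwaway test, with memcpy() on the raw word instead of the union):
for nonzero finite positive f it should print f = a * 2**n with a in
[0.5, 1) and n equal to ilogbf(f) + 1, for normals and subnormals alike:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void
normalize(float f, float *a, int *n)
{
	uint32_t w;

	memcpy(&w, &f, sizeof(w));
	if ((w >> 23 & 0xff) == 0) {		/* subnormal */
		f *= 0x1.0p25F;
		memcpy(&w, &f, sizeof(w));
		*n = (int)(w >> 23 & 0xff) - 0x97;
	} else
		*n = (int)(w >> 23 & 0xff) - 0x7e;
	w = (w & ~(0xffU << 23)) | (0x7eU << 23);	/* force exp = 0x7e */
	memcpy(a, &w, sizeof(*a));
}

int
main(void)
{
	float a, f[] = { 3.0F, 0.7F, 1e30F, 0x1.8p-130F, 1e-45F };
	int i, n;

	for (i = 0; i < (int)(sizeof(f) / sizeof(f[0])); i++) {
		normalize(f[i], &a, &n);
		printf("%-14g = %g * 2**%d (ilogbf(f)+1 = %d)\n",
		    f[i], a, n, ilogbf(f[i]) + 1);
	}
	return (0);
}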
% @@ -110,6 +141,11 @@
%   		a.e += b.e * b.e;
%   	}
%  -	a.e = sqrtl(a.e);
%  -	a.bits.exp += n;
%  +	a.e = sqrtf(a.e);
%  +	if (a.bits.exp + n <= 0 || a.bits.exp + n >= 0xff) {
%  +		a.bits.exp += n - n / 2;
%  +		SET_FLOAT_WORD(t, 0x3f800000 + ((n / 2) << 23));
%  +		a.e *= t;
%  +	} else
%  +		a.bits.exp += n;
%   	return (a.e);
%   } else {
% @@ -120,6 +156,11 @@
%   		b.e += a.e * a.e;
%   	}
%  -	b.e = sqrtl(b.e);
%  -	b.bits.exp += m;
%  +	b.e = sqrtf(b.e);
%  +	if (b.bits.exp + m <= 0 || b.bits.exp + m >= 0xff) {
%  +		b.bits.exp += m - m / 2;
%  +		SET_FLOAT_WORD(t, 0x3f800000 + ((m / 2) << 23));
%  +		b.e *= t;
%  +	} else
%  +		b.bits.exp += m;
%   	return (b.e);
%   }

Fix rescaling.

Bruce