Date:Fri, 23 Aug 2013 21:12:33 +1000 (EST)From:Bruce Evans <brde@optusnet.com.au>To:Steve Kargl <sgk@troutmask.apl.washington.edu>Cc:freebsd-numerics@FreeBSD.orgSubject:Re: (2nd time) tweaks to erff() threshold valuesMessage-ID:<20130823202257.Q1593@besplex.bde.org>In-Reply-To:<20130822213315.GA6708@troutmask.apl.washington.edu>References:<20130822213315.GA6708@troutmask.apl.washington.edu>

Next in thread | Previous in thread | Raw E-Mail | Index | Archive | Help

On Thu, 22 Aug 2013, Steve Kargl wrote: > (Having problems with freebsd-numerics list. Hopefully, not > a duplicate.) It was a duplicate :-). > I would like to apply the patch, which follows my .sig, > to msun/src/s_erff.c. It tweaks the threshold values > to those appropriate for erff(). > ... > Index: src/s_erff.c > =================================================================== > --- src/s_erff.c (revision 1358) > +++ src/s_erff.c (working copy) > @@ -107,10 +107,10 @@ > } > > if(ix < 0x3f580000) { /* |x|<0.84375 */ > - if(ix < 0x31800000) { /* |x|<2**-28 */ > - if (ix < 0x04000000) > + if(ix < 0x39000000) { /* |x|<2**-13 */ This increases the number of incorrectly rounded cases slightly on i386, but 2**-14 decreases it slightly. This is because i386 has extra precision for floats which the nonlinear return expression can actually use for approx. 2**-14 < |x| < 2**-13. > + if (ix < 0x04000000) /* |x|<0x1p-119 */ > /*avoid underflow */ > - return (float)0.125*((float)8.0*x+efx8*x); > + return (8*x+efx8*x)/8; Also put the comment back on the same line as the return statement now that it fits again (see the double version). Also add "spurious" to the comment. Underflow still occurs for most denormal x. This is non-spurious. We just avoid it for |x| larger than about 8 times the largest denormal, by multiplying efx by 8 so that it is larger than 1 (efx8 = 8(efx)). A recent discussion in comp.std.c made me more aware that underflow is mishandled for most functions in libm. They need to use something like the above to handle it correctly. For example, sin(x) returns x with inexact for denormal x, but it should return x with inexact AND UNDERFLOW for denormal x, since x is only an approximation to sin(x) -- the exact result is slightly smaller in magnitude in x, so is "tiny" when x is, and underflow means tiny plus inexact. The above still fails to raise underflow if 8*x, efx8*x, and their sum are all exact. 8*x is exact if x has 3 lower bits 0, which is quite often, and the others are rarely exact, except on i386 without clang everything is always exact since it is in extra precision. In nonstandard rounding modes, sin(x) shouldn't be rounded to x, but fdlibm and FreeBSD don't try to support such rounding modes for most functions including sin(). If they were supported, then they would increase the complications for underflow. IEEE x54 allows 2 alternatives for underflow. One is when the result is tiny after rounding, and inexact, and the other is when the result is tiny before rounding, and inexact. This causes complications even in round-to-nearest mode. Consider sin(x) on the largest denormal. The result of x is tiny both before and after rounding, and inexact, so underflow should be rasied. Consider sin(x) on the smallest normal. The result of x is normal after rounding, so is not tiny, but before rounding it is smaller than the smallest normal so should probably be considered as "tiny", so underflow should be raised iff it depends on tinyness before rounding. The discussion on comp.std.c was mainly about how extra precision complicates things further. When an expression like (8*x+efx8*x)/8 is evaluated in extra precision, nothing is tiny, so underflow is not raised. Inexact is probably not raised either for this particular expression. Underflow should be raised iff the extra precision is discarded, but the information for raising it has been partly lost. It will only be raised on conversion if the result is "tiny" then (before or after rounding) and is inexact THEN. It is inexact then iff any of the low bits that are discarded is nonzero. But sometimes the extra-precision result makes all these bits zero and discards nonzero bits even further out. The delayed conversion cannot do the right thing since it doesn't know about the bits already discarded. C standards get this wrong by having strict requirements for raising underflow. It is very difficult to raise it exactly when specified. C standards don't have strict requirements for raising inexact because they know about this problem for inexact. But underflow is even harded to decide than inexact. Implementations that support trapping of overflows have additional complications. IEEE x54 specifies that if traps for underflow are enabled, then underflow traps occur iff the result is tiny, while the underflow exception is raised iff the result is tiny and inexact. Thus, underflow traps, if enabled, occur more often than underflow exceptions. The trapped case is simpler but different (if the trap handler handler just returns). C standards don't support trapping of FP exceptions, and are even further from supporting returning from trap handlers. > return x + efx*x; > } > z = x*x; > ... Some modified lines: @ if(ix < 0x3f580000) { /* |x|<0.84375 */ @ if(ix < 0x38800000) { /* |x|<2**-14 */ @ if (ix < 0x04000000) /* |x|<0x1p-119 */ @ return (8*x+efx8*x)/8; /*avoid spurious underflow */ @ } @ return x + efx*x; @ } The whole erf implementation is probably not the best way for float precision, but is good for understanding higher precisions. Bruce

Want to link to this message? Use this URL: <http://docs.FreeBSD.org/cgi/mid.cgi?20130823202257.Q1593>