Date:      Sun, 12 Oct 1997 12:23:13 +1000 (EST)
From:      Andrew Reilly <reilly@zeta.org.au>
To:        tlambert@primenet.com
Cc:        reilly@zeta.org.au, freebsd-hackers@FreeBSD.ORG
Subject:   Re: Floating point exceptions
Message-ID:  <199710120223.MAA00970@gurney.reilly.home>
In-Reply-To: <199710110715.AAA17411@usr04.primenet.com>

On 11 Oct, Terry Lambert wrote:
>> > Fix:		Correct the code to not generate exceptions
>> 
>> This is just plain rude.  There are a bunch of exceptions that are

Sorry about that.  I was out of line.  In my defence, it was over 32
degrees C in my office yesterday afternoon. Scorcher.

> Actually, I find this strange.  It kind of assumes that all hardware has
> equivalent precision.  I can guarantee you that code that works fine
> on UniCOS will have problems on an Intel-based PC if it expects 128 bit
> precision.  8-(.

Well, in this case the code was written with the deliberate
understanding that the precision would vary between implementations.
Continual scaling ensures that we are getting the maximum dynamic
range where it counts, and differences in round-off characteristics
result in recognition accuracy variations of a small fraction of a
percent across the architectures that it has been run on so far.  I
take your point, though.  It would be easy for this to be a real
error, and it would be good to know about it.  In this case, the
correct fix was to ignore the exception, because that was the original
intent of the maths.  What caught me off guard was that FreeBSD was
the first of about six platforms that signalled this particular
exception.  The DSP platforms saturate or underflow to zero, and the
other Unix platforms must have had this exception masked by default.
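
As a rough illustration of the "continual scaling" idea (not the original
recogniser code, whose scaling scheme is not shown in this thread), one
common approach is to renormalise a vector of small probabilities at each
step so that it sums to 1, keeping the values near the middle of the
float range. The function name rescale() and the choice of normalising by
the sum are assumptions made for this sketch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: divide every element of v by the sum of the
 * vector, so the elements sum to 1 afterwards.  The sum is returned so
 * that the caller can keep track of the scale factor that was removed.
 * A zero (or negative) sum leaves the vector untouched.
 */
float rescale(float *v, size_t n)
{
    float sum = 0.0f;

    for (size_t i = 0; i < n; i++)
        sum += v[i];

    if (sum > 0.0f) {
        for (size_t i = 0; i < n; i++)
            v[i] /= sum;
    }
    return sum;
}
```

Calling rescale() after each update keeps repeated products of small
numbers away from the underflow threshold, at the cost of round-off that
differs slightly from one architecture to the next.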

> To me, this was not a rude response.  An exception where an exception
> was not an intended result of the calculation is an exception that is
> not masked, to my mind, and a useful indicator that all is not right with
> the code.  It was certainly not intended as a rude remark.

I think that it is my problem that I take exception to some of the
IEEE floating point semantics: perhaps it is a good thing to try to
make a countable, auto-scaling (within a limited range) numeric
representation behave more like the set of reals in some cases.  I
prefer to think of floats as scaled integers, and I get caught out
with some of the modern twists.

> I think it's better to get an error than to get non-obviously erroneous
> results (the alternative).  But I am a physics geek at heart, so maybe
> I am biased toward useful answers and ugly exceptions vs. useless answers
> and no exceptions... depends on your idea of "useless", I suppose...

Most of my audio DSP work takes place on fixed-point processors, where
the notion of "full scale" and the associated noise floor are ever
present. I expect that if I multiply two small, non-zero numbers
together the result will sometimes be zero.  To me, this is not a
useless, or even a wrong result, in the context of a known dynamic
range.
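
A minimal sketch of that fixed-point behaviour, assuming the common Q15
format (a 16-bit integer whose value is raw/32768). The name q15_mul() is
made up for the example, and saturation of the -1 * -1 case is
deliberately left out to keep it short.

```c
#include <stdint.h>

/*
 * Truncating Q15 fractional multiply.  The 32-bit product is in Q30;
 * shifting right by 15 returns it to Q15 and discards everything below
 * one LSB.  Two small non-zero inputs can therefore give exactly zero.
 * (Assumes arithmetic right shift of negative values, and does not
 * saturate the single overflow case -32768 * -32768.)
 */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* Q30 intermediate */

    return (int16_t)(product >> 15);             /* back to Q15 */
}
```

For instance, q15_mul(100, 100) is zero because the true product, about
9.3e-6, lies below the Q15 resolution of 1/32768, while
q15_mul(16384, 16384) (0.5 * 0.5) gives 8192 (0.25).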

>> > fpsetmask( 0);
>> 
>> Is there
>> a pointer to fpsetmask in any other manual page?
> 
> To be honest, I knew the general name of the function off the top of
> my head (I do a lot of event simulation), and I used man -k setmask to
> find the specific name.  But it is referenced in floatingpoint.h by
> prototype... and that header is referenced by most of the FP functions.

Given that FreeBSD's behaviour is different from other systems in this
regard, perhaps this warrants a pointer in the handbook or FAQ?

>> > Worst fix:	signal( SIGFPE, SIG_IGN);
>> 
>> Very bad fix, because when I tried it, it just didn't work.  I assume
>> that the trap handler does not correctly restore the floating point
>> state.  The program ran to completion, but IEEE error values
>> of some sort propagated from the exception point and ruined the results.
> 
> The point of the handler is to localise the errors.  Mask those which are
> intentional, and fix those that aren't on a case-by-case basis.  I had a
> number of precision fixes to 21 year old FORTRAN code that resulted from
> getting exceptions thrown like this.

This is a good strategy.  I just didn't know the mechanism for the
masking when the error occurred.  I do think that it is unfortunate
that ignoring the SIGFPE, as described above, does /not/ have the same
effect as masking the exception.

> An interesting application; you'll note it falls into my "signal processing"
> bucket which I designated as a bad thing to need to fix because of the
> need for repeatability...

I'm not sure what you mean by this comment.  Certainly there are audio
DSP applications where you would hope for complete repeatability, but
speech recognition with HMMs is a stochastic process, and rounding
errors in the calculations are not significantly different from noise
in the input signal.

How about this for an example of non-repeatability: One of the first
ports of this code was to a DSP card that used AT&T (now Lucent) DSP32C
processors.  The recogniser ran as a background (non/soft real time)
process, while the signal was buffered in real time, in response to the
frame interrupt.  The DSP32C has 40-bit floating point accumulators
(8 guard bits on the mantissa) and 32-bit memory, and no mechanism to
save or restore those guard bits in the interrupt service routine... 
Talk about noise injection.  We couldn't even get the same answer on
consecutive runs on test files!  Nevertheless, this did not affect
the measured performance of the recogniser by more than a few tenths
of a percent.

> Look to the calculation immediately before the compare.

That would be it.  The previous instruction stored s as a 32-bit float,
which would generate an underflow exception if not masked.  I guess
that if the '87 did not have extended precision floating point
registers, then the exception would have occurred some time earlier,
when the over-precise result was generated.

[description of lazy reporting of FP exceptions, and implications for
SMP]

>> > Continuing from SIGFPE handlers is much harder than masking FP exceptions,
>> > at least on i386's.
>> 
>> Yes.  I tried doing a signal(SIGFPE,SIG_IGN) at the top of
>> main, but that just made it produce totally incorrect results.
> 
> The FPU registers are not saved (or restored) by signal handlers, which
> are not expected to execute FPU instructions.  If you will look at the
> man page, there is actually a *lot* of calls which are not "safe" to use
> from a signal handler, according to POSIX.

Which man page?  I just looked at kill(2), signal(3) and
sigaction(2), and did not see a reference to this, although
I do not doubt that such restrictions exist.

Where in SIG_IGN are floating point instructions used?  If there are
none, why doesn't it work (i.e., why is the floating point state
changed)?

On the subject of saving registers on context switches, are there
really so many Unix applications that do no floating point at all that
it is worth differentiating them?  Is it a characteristic of the Intel
processors that you can set them to trap on the use of _any_ floating
point instruction?

-- 
Andrew

"The steady state of disks is full."
				-- Ken Thompson