From owner-svn-src-all@FreeBSD.ORG  Sat Mar  3 14:00:10 2012
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id E03A2106564A;
	Sat,  3 Mar 2012 14:00:09 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 283758FC0C;
	Sat,  3 Mar 2012 14:00:08 +0000 (UTC)
Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au
	(c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q23DxxMh003421
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 4 Mar 2012 01:00:00 +1100
Date: Sun, 4 Mar 2012 00:59:59 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
In-Reply-To: <20120303091426.GS75778@deviant.kiev.zoral.com.ua>
Message-ID: <20120303221614.G5236@besplex.bde.org>
References: <201202282217.q1SMHrIk094780@svn.freebsd.org>
	<201203012347.32984.tijl@freebsd.org>
	<20120302132403.P929@besplex.bde.org>
	<201203022231.43186.tijl@freebsd.org>
	<20120303110551.Q1494@besplex.bde.org>
	<20120303091426.GS75778@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: svn-src-head@FreeBSD.org, Tijl Coosemans <tijl@FreeBSD.org>,
	src-committers@FreeBSD.org, svn-src-all@FreeBSD.org,
	Bruce Evans <brde@optusnet.com.au>
Subject: Re: svn commit: r232275 - in head/sys: amd64/include i386/include
 pc98/include x86/include
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
	user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 03 Mar 2012 14:00:10 -0000

On Sat, 3 Mar 2012, Konstantin Belousov wrote:

> On Sat, Mar 03, 2012 at 12:02:23PM +1100, Bruce Evans wrote:
>> On Fri, 2 Mar 2012, Tijl Coosemans wrote:
>>
>> So the interesting points for signal handlers move to:
>> - should signal handlers have to initialize their own state if they want
>>   to use FP explicitly?  I think they should.
> Might be, they should if talking about abstract C implementation,
> but any useful Unix (U*x, probably) implementation gives much more
> guarantees there.

They don't document it of course.

>> - should signal handlers have to initialize their own state if they want
>>   to use FP or shared registers implicitly (because the compiler wants to)?
>>   No.  The kernel must handle this transparently, much like it does now,
>>   and I think this makes the previous case work transparently too.  The
>>   kernel tries to do this lazily, but it doesn't do this very well (it
>>   copies the state several times in sendsig() and sigreturn()).
>> - when the signal handler wants to modify the interrupted state, how does
>>   it do this?  There is minimal support for this.  The easiest way to
>>   modify it is to modify the current state and then longjmp() instead of
>>   returning.
> I disagree. The most correct way is to modify ucontext_t supplied to the
> handler, and then return normally. There may be state grown in next
> generations of architecture which signal handler author is not aware.
> Also, on some architectures some parts of the ucontext/sigcontext
> can only be restored by kernel. This is true even for x86.

So you want an average SIGINT handler that doesn't want to do any FP,
to understand the complications of FP better than longjmp() does, so
as to do what longjmp() doesn't know how to do, for future arches.
It is true that some parts of contexts can only be restored by the
kernel.  Even the signal mask requires sigreturn(2) for restoral
without races.  FreeBSD's [sig]longjmp() doesn't know how to do this,
and neither does an average user of [sig]longjmp().  Returning from
signal handlers works better because it uses sigreturn() automatically.
However, it is difficult to _modify_ delicate FP or other state that
you don't understand in a signal handler so that sigreturn() restores
what you want.  It is easiest to prepare the state before calling
setjmp() and have longjmp() simply restore it.  The normal preparation
is to do nothing -- the program knows nothing of FP, and is happily
running with a usable FP state.  setjmp() simply saves this state,
and longjmp() should restore it (except for exception flags).  Note
that it doesn't work to require the program to fix up the state after
setjmp() returns 1, since the program knows nothing of FP so it won't
know how to fix up the state then any more than it knows how to fix
up the state before longjmp(), although the fixup is much easier.

Apart from no program knowing that it should be done, something like
the following would work: for _every_ call to setjmp:

 	/* Save FP env, because some setjmp()s are too broken to do it. */
 	fegetenv(&env);
 	if (setjmp(jb) != 0) {
 		/*
 		 * Restore FP env, since some longjmp()s are too broken...
 		 *
 		 * But first, if we are actually an FP program that wants
 		 * to use fenv, then try to recover the current exception
 		 * flags.  Most longjmp()s from signal handlers lose these,
 		 * but this is harder to fix so we just hope that we don't
 		 * have to.
 		 */
 		fegetexceptflag(&ex, FE_ALL_EXCEPT);
 		fesetenv(&env);
 		fesetexceptflag(&ex, FE_ALL_EXCEPT);
 		/*
 		 * XXX what about raising any exceptions that we just
 		 * unmasked?
 		 *
 		 * In a signal handler (before longjmp()) the code to
 		 * not lose the exception flags (assuming that the signal
 		 * handler is passed a clean state) would be something like:
 		 * - use fenv to mask all exceptions
 		 * - read the exception flags from ucontext_t.  The MI
 		 *   API fegetexceptflag() is usually unavailable for this
 		 * - store the exception flags into the hardware.  Since
 		 *   the MI API fesetexceptflag() is not available either
 		 *   this seems easier than converting the hardware
 		 *   representation that is probably in ucontext_t into an
 		 *   fexcept_t.  We masked all exceptions so that new ones
 		 *   don't bite us.
 		 * - we can now use longjmp(), provided longjmp() doesn't
 		 *   change any FP state and the caller of setjmp() has
 		 *   the above complications to replace the rest of the
 		 *   signal handler's unknown FP state with a good one,
 		 *   including unmasking any exceptions that we masked.
 		 */
 	}

The above complications belong in setjmp() and longjmp(), not in every
program.  setjmp() and longjmp() can do them much more efficiently.  For
example, on amd64 it is not necessary to save the full FP environment
(which is a very slow operation).

>> - how can signal handlers and debuggers even see the interrupted state?
>>   gdb has less clue about this than it did 20 years ago.  Users can
>>   probably use debuggers to follow various pointers to the saved state
>>   if they know more about this than signal handlers and debuggers.
> Signal handlers should examine ucontext_t.
>
> ptrace(2) interface on FreeBSD allows to fully examine and modify the
> thread CPU state. gdb indeed was not upgraded to be aware of recent
> FreeBSD features (and not very recent features, too).

Yes, it is difficult.

I didn't even mention portability before.  For standard C, there is no
ucontext_t.  For POSIX, ucontext_t is essentially opaque.  You can
save and restore it but you can't modify it without doing unportable
things.  But FP changes require doing very unportable OS- and CPU-
dependent things.  Depending on longjmp() to work right is of course
very OS-dependent, but longjmp() can very easily handle some CPU-
dependent things provided it has non-broken semantics.

>>> If longjmp is not supposed to change the FP env then, when called from
>>> a signal handler, either the signal handler must install a proper FP
>>> env before calling longjmp or a proper FP env must be installed after
>>> the target setjmp call. Otherwise the FP env is unspecified.
>>
>> Better handle the usual case right like it used to be, without the
>> signal handler having to do anything, by always saving a minimal
>> environment in setjmp(), but now only restoring it for longjmp() in
>> signal handlers.  The minimal environment doesn't include any normal
>> register on at least amd64 and i386 (except for i387 it includes the
>> stack and the tags -- these must be empty on return from any function
>> call).
>>
>> Again there is a problem with transparent use of FP or SSE by the
>> compiler.  An average SIGINT handler that doesn't want to do any
>> explicit FP and just wants to longjmp() back to the main loop can't
>> be expected to understand this stuff better than standards, kernels
>> and compilers and have the complications necessary to fix up the FP
>> state after the compiler has transparently (it thinks) used FP or SSE.
>
> longjmp() from a signal handler has very high chance of providing
> wrong CPU state for anything except basic integer registers.

Only if longjmp() is broken.

A slightly different way to look at this is that without fenv support,
restoring the entire FP state (or all of it that matters) to that of the
setjmp() works perfectly, because conforming programs just can't see
any fenv state.  FP exception flags correspond to the overflow flag in
integer arithmetic, and average programs know nothing of either.
Support for fenv must not be allowed to break this.

Here is my old program for testing that some of this works on i386.
It has rotted a bit (last edit 25 Oct 1994).  It assumes that the
division by 0 exception and the invalid operand exception are unmasked,
as in FreeBSD-[1-~2].

% #undef TEST_CW_PRESERVED_ACROSS_SIGFPE
% #define TEST_LONGJMP_RESTORES_FP
% 
% #define _POSIX_SOURCE	1
% 
% #include <setjmp.h>
% #include <signal.h>
% #include <unistd.h>
% 
% static sigjmp_buf sjb;
% 
% static void catch(int sig)
% {
%     write(1, "1", 1);
%     siglongjmp(sjb, 1);
% }
% 
% int main(void)
% {
%     struct sigaction action;
% 
%     action.sa_handler = catch;
%     sigemptyset(&action.sa_mask);
%     action.sa_flags = 0;
% #ifdef TEST_CW_PRESERVED_ACROSS_SIGFPE
%     sigaction(SIGFPE, &action, (struct sigaction *) NULL);
% #endif
% #ifdef TEST_LONGJMP_RESTORES_FP
%     sigaction(SIGINT, &action, (struct sigaction *) NULL);
% #endif
% 
%     while (1)
%     {
% 	if (sigsetjmp(sjb, 1))
% 	    write(1, "2", 1);
% 	else
% 	{
% #ifdef TEST_CW_PRESERVED_ACROSS_SIGFPE
% 	    __asm("fldz; fld1; fdiv %st(1),%st; fwait");
% 	    write(1, "?", 1);
% #endif
% #ifdef TEST_LONGJMP_RESTORES_FP
% 	    while (1)
% 		__asm("fld1; fstp %st");
% #endif
% 	}
%     }
% }

When TEST_LONGJMP_RESTORES_FP is defined, this tests that the FP stack
doesn't become corrupted by SIGINTs (a corrupt stack should give
a SIGFPE which is not caught; otherwise "12" should be printed after
every SIGINT).

When TEST_CW_PRESERVED_ACROSS_SIGFPE is defined, this tests that the
division by 0 exception is unmasked and remains unmasked after
longjmp() from a signal handler for the SIGFPE.  Now it also needs to
check that the exception bit for division by zero is not lost by any
of
- handling an unmasked SIGFPE or a SIGINT, with and without longjmp()ing
   from the handler
- looping without handling any signal
The exception bit is lost in most cases.

Similarly for SSE and mxcsr.

Similarly for other arches.  Division by 0 is fairly easy to arrange
without using asm, but the above uses asm so that it can control the
amount of FP used, and its placement.  The asms probably now need to
be volatile to prevent them moving.  Or just compile with -O0.

I have a much larger test that unmasked FP exceptions (mainly for
division by 0) work correctly with i486 and later with exception 16
and as well as possible for i386/i387 with IRQ13.  -current still
passes the former, but this is due to a bug in the test.  The test
assumes that the exception flags are clobbered by SIGFPE handling,
so that when the SIGFPE handler returns normally, the SIGFPE doesn't
repeat.  i387 FP exceptions aren't quite normal faults since the
signal trap is delayed until the next non-control FP instruction
after the one that caused the exception, especially with IRQ13, but
they behave similarly unless the exception flags are clobbered
(the fault repeats on the next non-control FP instruction).  SSE
FP exceptions are normal faults (the fault repeats on the instruction
that causes it, and the exception flags have no effect on this, and
FreeBSD's SIGFPE handler doesn't clobber them anyway).  FP exceptions
are rarely unmasked now, so the buggy behaviour is mostly moot now
even for i387.

Anyway, no one should expect to continue after a SIGFPE handler returns
normally without fixing up the problem completely.  That's much more
harmful than longjmp()ing from the handler (on i387, the stack is
certain to be corrupt, and the exception flags are clobbered to hide
some problems).  longjmp()ing from SIGFPE handlers needs to work as well
as longjmp() from SIGINT handlers to provide a reasonably easy way out
of them.

Bruce