Date:      Sat, 12 Jan 2002 16:21:11 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Peter Wemm <peter@wemm.org>
Cc:        Alfred Perlstein <bright@mu.org>, Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>, Terry Lambert <tlambert2@mindspring.com>, Daniel Eischen <eischen@pcnet1.pcnet.com>, Dan Eischen <eischen@vigrid.com>, Archie Cobbs <archie@dellroad.org>, <arch@FreeBSD.ORG>
Subject:   Re: Request for review: getcontext, setcontext, etc 
Message-ID:  <20020112152622.W4598-100000@gamplex.bde.org>
In-Reply-To: <20020112005212.5CB2038FF@overcee.netplex.com.au>

On Fri, 11 Jan 2002, Peter Wemm wrote:

> Alfred Perlstein wrote:
> > 1) Is atomicity required?  (looks like a "no")
>
> Question, why do we have a sigreturn(2) syscall if atomicity isn't required?
> setcontext() is supposed to be able to be used in place of sigreturn().

Possibly to restore state that can't be restored in user mode.  The signal
mask is usually such state because it is handled in the kernel.  Once a
syscall is required to restore some state, it may as well restore all state.

> sigreturn() atomically restores the signal mask and context so that
> unmasking the signal doesn't re-trigger a pending signal before we've
> finished restoring.

Loading %esp before calling sigprocmask() in siglongjmp() seems to
fix the atomicity problems.  Of course, it might not be much faster
than using sigreturn(), because sigprocmask() might be a syscall and
siglongjmp() uses the FPU.  I think it should always be faster except
for the FPU issue, since the siglongjmp() path might avoid a syscall
entirely if sigprocmask() is a non-syscall, and using sigreturn()
forces you to switch _all_ the state that is switched by sigreturn().
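
For concreteness, here is a rough userland sketch of that ordering (my
names, nothing to do with libc's siglongjmp; it only handles switching
between contexts that are already parked inside ctx_switch()):

    #include <setjmp.h>
    #include <signal.h>

    struct coctx {
        jmp_buf     jb;     /* registers, including %esp */
        sigset_t    mask;   /* signal mask at suspend time */
    };

    /* Suspend the running context 'self' and resume 'to'. */
    void
    ctx_switch(struct coctx *self, struct coctx *to)
    {
        sigprocmask(SIG_SETMASK, NULL, &self->mask);    /* save our mask */
        if (_setjmp(self->jb) == 0)
            _longjmp(to->jb, 1);        /* switch stacks; does not return */
        /*
         * Resumed by someone else's ctx_switch(): %esp already points at
         * our own stack, so a signal let in by the unmask below cannot
         * run on the stack we were switched away from.
         */
        sigprocmask(SIG_SETMASK, &self->mask, NULL);
    }

A threads package would call this as ctx_switch(&cur->ctx, &next->ctx),
with every suspended thread parked inside its own ctx_switch() frame on
its own stack; creating a brand-new context (makecontext-style) is not
covered by the sketch.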

> > 2) Are states like FP usage trackable from userspace?
> >    (looks like a "yes" with some kernel help)
>
> With kernel help, yes.  But if you are going to use the kernel to find out
> when to save/restore fp context then you may as well do it all in the
> kernel.

This is not completely clear.  You can look at the TS bit (in %msw, alias
%cr0) to see if executing fnsave would be especially wasteful because it
would trap.  Maybe save/restore of the context could be completely avoided
in this case.  The problem is that the state would need to be saved lazily
when another thread uses it.  The kernel trap for the fnsave in another
thread would have to be hooked into userland...
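
For example (a minimal sketch, i386 assumed; smsw is one unprivileged
way to read %msw from ring 3, and the function name is mine):

    /*
     * Nonzero if the FPU currently holds this thread's live state, i.e.
     * the TS bit (bit 3 of %cr0/%msw) is clear and fnsave would not trap.
     */
    static int
    fpu_state_is_live(void)
    {
        unsigned short msw;

        __asm __volatile("smsw %0" : "=r" (msw));
        return ((msw & 0x08) == 0);     /* 0x08 == CR0_TS */
    }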

> The biggest problem on the x86 implementation is that once you touch the
> fpu at all, you now own a fpu context forever.  When we context switch

Not forever; only until the next (kernel) context switch.  This touching
is what can be avoided easily by checking the TS bit.

> a process, we save its active FPU state if [it has an active one] into
> the pcb.  When we return to the process, we do *not* load the fpu state
> into the FPU until the process touches it again.

And this is good (*).  The problem is that it is defeated if there are
userland threads and there are a lot of userland thread switches that
touch the FPU before the next kernel context switch.  The first touch
gives the kernel extra work to do on the next kernel context switch
(if userland would not have touched the FPU otherwise).  It also clears
the TS bit, so subsequent userland thread switches will find it harder
to avoid doing useless work to switch the FPU state.
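
Roughly, a userland switcher could use the fpu_state_is_live() check
sketched above to skip the useless work (hypothetical names; only the
plain 108-byte i387 fnsave/frstor state is handled, no SSE):

    struct fpu_save {
        unsigned char   bytes[108];     /* i387 fnsave/frstor area */
    };

    struct uthread {
        struct fpu_save fpu;
        int             fpu_valid;      /* is 'fpu' worth restoring? */
        /* ... stack pointer, general registers, etc. */
    };

    static void
    uthread_save_fpu(struct uthread *ut)
    {
        if (fpu_state_is_live()) {
            /* fnsave also reinitializes the FPU, which suits a switcher */
            __asm __volatile("fnsave %0" : "=m" (ut->fpu));
            ut->fpu_valid = 1;
        } else {
            /*
             * Don't touch the FPU: that would trap to the kernel and
             * drag in state we were never going to use.
             */
            ut->fpu_valid = 0;
        }
    }

    static void
    uthread_restore_fpu(struct uthread *ut)
    {
        if (ut->fpu_valid)
            __asm __volatile("frstor %0" : : "m" (ut->fpu));
    }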

(*) It may not be all that good.  It was good on old machines when 108
bytes was a lot of memory and moving the state in and out of the FPU
was slow too.  It is possible that the logic to avoid doing the switch
takes longer than always doing it, but not all that likely because logic
speed is increasing faster than memory speed and new machines have more
state to save (512 (?) bytes for SSE).

> For a userland application to do a swapcontext(), it would have to look
> at the present fpu state (causing a kernel trap, which loads the fpu state
> into the fpu), dump out the registers, switch contexts and load the
> fpu state from the new context into the active fpu registers.  If the old
> context hadn't used the FPU and the new context doesn't actually use it before
> switching out to another process, then we've wasted a kernel trap, two
> fpu state loads and two fpu state saves.
>
> Specifically:
> 0: cpu_switch() to new process. fpu state not loaded (lazy)

Only half-lazy :-).  FPU state for old process is saved to pcb for old
process here (if necessary).  Full laziness would leave the state in
the FPU, in the hope that the next process to access the FPU is the one
that already owns it, and not trap for the access in that case.
Unconditionally touching the FPU in the userland context switcher is
even more of a pessimization for full-lazy.
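
A toy model of the difference (nothing like the real cpu_switch(); all
names are mine, and plain struct copies stand in for fnsave/frstor):

    struct fpu_model { int regs[27]; };     /* stand-in for the 108-byte i387 state */

    struct kthr {
        struct fpu_model pcb_fpu;           /* saved copy in the pcb */
        int              pcb_fpu_valid;
    };

    static struct fpu_model hw_fpu;         /* stands for the FPU itself */
    static struct kthr *fpu_owner;          /* whose state is in hw_fpu */

    /* Half-lazy (the current scheme): save eagerly, load lazily on trap. */
    static void
    switch_half_lazy(struct kthr *old)
    {
        if (fpu_owner == old) {
            old->pcb_fpu = hw_fpu;          /* the eager fnsave */
            old->pcb_fpu_valid = 1;
            fpu_owner = NULL;               /* set TS; next FPU use traps */
        }
    }

    /* Fully lazy: do nothing here; the state stays in the FPU. */
    static void
    switch_full_lazy(struct kthr *old)
    {
        (void)old;
    }

    /* #NM trap: first FPU access after a switch, in either scheme. */
    static void
    fpu_trap(struct kthr *cur)
    {
        if (fpu_owner == cur)
            return;                         /* full-lazy win: just clear TS */
        if (fpu_owner != NULL) {
            fpu_owner->pcb_fpu = hw_fpu;    /* the late save full-lazy risks */
            fpu_owner->pcb_fpu_valid = 1;
        }
        if (cur->pcb_fpu_valid)
            hw_fpu = cur->pcb_fpu;          /* the lazy frstor */
        fpu_owner = cur;
        /* clear TS and restart the faulting instruction */
    }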

The rest of the details seem to be correct.

Bruce


