FreeBSD Mail Archives

Date:      Fri, 11 Jan 2002 16:52:12 -0800
From:      Peter Wemm <peter@wemm.org>
To:        Alfred Perlstein <bright@mu.org>
Cc:        Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>, Terry Lambert <tlambert2@mindspring.com>, Daniel Eischen <eischen@pcnet1.pcnet.com>, Dan Eischen <eischen@vigrid.com>, Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject:   Re: Request for review: getcontext, setcontext, etc 
Message-ID:  <20020112005212.5CB2038FF@overcee.netplex.com.au>
In-Reply-To: <20020110135217.M7984@elvis.mu.org>

Alfred Perlstein wrote:
> * Kelly Yancey <kbyanc@posi.net> [020110 13:14] wrote:
> > On Thu, 10 Jan 2002, Nate Williams wrote:
> > 
> > > See above.  Even in 5.0, we're going to have some threads being switched
> > > in userland context, while others are switched in the kernel.  (KSE is a
> > > hybrid approach that attempts to gain both the efficiency of userland
> > > threads with the ability to parallelize the efficiency gains of multiple
> > > CPU && I/O processing from kernel threads.)
> > > 
> > 
> >   OK, I'm going to stick my head in and show my ignorance. If {get,set}context
> > have to be implemented as system calls, then doesn't that eliminate much, if
> > not all, of the gains assumed by having a separate userland scheduler? I mean if
> > we've got to go to the kernel to switch thread contexts, why not just have the
> > kernel track all of the threads and restore context once, just for the current
> > thread, rather than twice (once for the scheduler and another for the
> > scheduler to switch to the current thread context)?
> 
> That's the point of this discussion, we're trying to figure out
> why and if possible how to avoid them being system calls. :)
> 
> Basically what it seems to come down to are two points:
> 
> 1) Is atomicity required?  (looks like a "no")

Question: why do we have a sigreturn(2) syscall if atomicity isn't required?
setcontext() is supposed to be able to be used in place of sigreturn().

sigreturn() atomically restores the signal mask and context so that
unmasking the signal doesn't re-trigger a pending signal before we've
finished restoring.

> 2) Are states like FP usage trackable from userspace?
>    (looks like a "yes" with some kernel help)

With kernel help, yes.  But if you are going to use the kernel to find out
when to save/restore fp context then you may as well do it all in the
kernel.

The biggest problem on the x86 implementation is that once you touch the
fpu at all, you now own an fpu context forever.  When we context switch
a process, we save its active FPU state (if it has one) into
the pcb.  When we return to the process, we do *not* load the fpu state
into the FPU until the process touches it again.

For a userland application to do a swapcontext(), it would have to look
at the present fpu state (causing a kernel trap, which loads the fpu state
into the fpu), dump out the registers, switch contexts and load the
fpu state from the new context into the active fpu registers.  If the old
context hadn't used the FPU and the new context doesn't actually use it before
switching out to another process, then we've wasted a kernel trap, two
fpu state loads and two fpu state saves.

Specifically:
0: cpu_switch() to new process. fpu state not loaded (lazy)
[no fpu activity at all, so the fpu state is still sitting in the pcb]
1: user does swapcontext()
[process does a sigprocmask(2) syscall when being used outside of libc_r]
2: userland swapcontext blindly attempts to save fpu state
3: kernel traps, and loads fpu context from pcb into fpu registers
4: userland swapcontext blindly copies fpu registers to old ucontext_t
[process does a sigprocmask(2) syscall when being used outside of libc_r]
5: userland swapcontext blindly copies new ucontext fpu state into fpu regs
6: new context is running...
[no more fpu activity until timeslice ends]
7: cpu_switch copies the active fpu regs into the pcb

So, for no actual fpu activity, we had one kernel trap (stage 3), one
fpu load context (stage 3), one fpu save context (stage 4), another fpu
load context (stage 5) and yet another fpu save context (stage 7).
And when being used outside of libc_r, there are also two system calls!

And all this with not one FPU operation in userland!

Contrast this to a kernel getsetcontext(2) call:
0: cpu_switch() to new process, fpu state is not loaded (lazy)
[no fpu activity at all, so the fpu state is still sitting in the pcb]
1: user does swapcontext()
2: system call getsetcontext(SWAPCONTEXT, ucontext_t *ocp, ucontext_t *ncp)
3: kernel copies old registers into ocp
4: kernel copies fpu state from *pcb* into ocp
[kernel saves sigprocmask if told to via ocp flags, libc_r saves it itself]
5: kernel copies new registers from ncp
6: kernel copies new fpu state from ncp into *pcb*
[kernel restores sigprocmask if told to via ncp flags, libc_r restores it itself]
[return to user in new context]
[no fpu activity at all, so the fpu state is still sitting in the pcb]
7: cpu_switch notices the fpu state is still lazily sitting in the pcb

This time through we don't waste a kernel trap or four fpu context
loads/saves, and we enter the kernel only once, versus 1 or 3 times
depending on whether we're in libc_r or not.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5

