Date:      Sun, 13 Jan 2002 18:28:29 -0500 (EST)
From:      Daniel Eischen <eischen@pcnet1.pcnet.com>
To:        Peter Jeremy <peter.jeremy@alcatel.com.au>
Cc:        Bruce Evans <bde@zeta.org.au>, Terry Lambert <tlambert2@mindspring.com>, Peter Wemm <peter@wemm.org>, Alfred Perlstein <bright@mu.org>, Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>, Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject:   Re: Request for review: getcontext, setcontext, etc
Message-ID:  <Pine.SUN.3.91.1020113182433.16242B-100000@pcnet1.pcnet.com>
In-Reply-To: <20020114074238.S561@gsmx07.alcatel.com.au>

On Mon, 14 Jan 2002, Peter Jeremy wrote:
> On 2002-Jan-12 21:40:20 +1100, Bruce Evans <bde@zeta.org.au> wrote:
> >On Sat, 12 Jan 2002, Terry Lambert wrote:
> ...
> >> Assuming a 64 bit data path, then we are talking a minimum of
> >> 3 * 512/(64/8) * (16:1) or 3k (3072) clocks to save the damn FPU
> >> state off to main memory (a store in a loop is 3 clocks ignoring
> >> the setup and crap, right?).  Add another 3k clocks to bring it
> >> back.
> >>
> >> Best case, God loves us, and we spill and restore from L1
> >> without an IPI or an invalidation, and without starting the
> >> thread on a CPU other than the one where it was suspended, and
> >> all spills are to cacheable write-through pages.  That's a 16
> >> times speed increase because we get to ignore the bus speed
> >> differential, or 3 * 512/(64/8) * 2 = (6k/16) = 384 clocks.
> >
> >This seems to be off by a bit.  Actual timing on an Athlon1600
> >overclocked a little gives the following times for some critical
> >parts of context switching for each iteration of instructions in
> >a loop (not counting 2 cycles of loop overhead):
> >
> >pushal; popal:             9 cycles
> >pushl %ds; popl %ds:      21 cycles
> >fxsave; fxrstor:         105 cycles
> >fnsave; frstor:          264 cycles
> 
> I can think of a possible reason: The FPU knows when it has been used
> vs just having executed fninit.  In the latter case, all it needs to
> save is "I've been initialised".  Also the FPU architecture includes
> "used" flags associated with each register - possibly the f*save
> instructions don't flush unused registers.  Do the above numbers
> change when you push real data into the FP registers?
> 
> Also, how expensive is a DNA trap?  Would it be cheaper overall to
> always load FPU context on a switch - this is more expensive for
> processes that don't use FP, but saves a DNA trap per context switch
> (assuming they use FP in that slice) for those that do.

Or perhaps keep some sort of heuristic (per-process) on whether or
not it needs the FPU?  If you've got N traps per M quanta, perhaps
you could always load the FPU context for that process.
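
Something along these lines is what I have in mind.  This is only a
rough sketch, not code from any tree: struct fpu_hint, the function
names, and the values chosen for N (FPU_TRAP_THRESHOLD) and M
(FPU_WINDOW_QUANTA) are all made up for illustration.  It keeps a
per-process bit history of which recent quanta used the FPU and
asks for an eager restore once enough of them did.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables: the "N traps per M quanta" from the text. */
#define FPU_WINDOW_QUANTA   16  /* M: quanta remembered per process */
#define FPU_TRAP_THRESHOLD  8   /* N: FPU-using quanta that trigger eager load */

/* Hypothetical per-process bookkeeping; not an existing kernel structure. */
struct fpu_hint {
    uint32_t history;   /* bit i set => the FPU was used i+1 quanta ago */
};

/* Count the FPU-using quanta inside the most recent M-quanta window. */
static unsigned
fpu_hint_traps(const struct fpu_hint *h)
{
    uint32_t v = h->history & ((1u << FPU_WINDOW_QUANTA) - 1);
    unsigned n = 0;

    while (v != 0) {
        v &= v - 1;     /* clear the lowest set bit */
        n++;
    }
    return (n);
}

/*
 * Record the outcome of the quantum that just ended.  With lazy
 * switching, used_fpu means "a DNA trap was taken".  How the caller
 * learns this after an eager load (for example by occasionally
 * falling back to lazy mode) is left out of this sketch.
 */
static void
fpu_hint_end_quantum(struct fpu_hint *h, bool used_fpu)
{
    h->history = (h->history << 1) | (used_fpu ? 1u : 0u);
}

/* At switch-in: true => restore FPU state now, false => stay lazy. */
static bool
fpu_hint_should_preload(const struct fpu_hint *h)
{
    return (fpu_hint_traps(h) >= FPU_TRAP_THRESHOLD);
}
```

The switch code would call fpu_hint_end_quantum() when a process is
switched out and fpu_hint_should_preload() when it is switched back
in.  A shift register ages out old behaviour automatically, so a
process that stops using FP drifts back to lazy handling.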

-- 
Dan Eischen
