Date: Sun, 13 Jan 2002 18:28:29 -0500 (EST)
From: Daniel Eischen <eischen@pcnet1.pcnet.com>
To: Peter Jeremy <peter.jeremy@alcatel.com.au>
Cc: Bruce Evans <bde@zeta.org.au>, Terry Lambert <tlambert2@mindspring.com>,
    Peter Wemm <peter@wemm.org>, Alfred Perlstein <bright@mu.org>,
    Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>,
    Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject: Re: Request for review: getcontext, setcontext, etc
Message-ID: <Pine.SUN.3.91.1020113182433.16242B-100000@pcnet1.pcnet.com>
In-Reply-To: <20020114074238.S561@gsmx07.alcatel.com.au>
On Mon, 14 Jan 2002, Peter Jeremy wrote:
> On 2002-Jan-12 21:40:20 +1100, Bruce Evans <bde@zeta.org.au> wrote:
> >On Sat, 12 Jan 2002, Terry Lambert wrote:
> ...
> >> Assuming a 64 bit data path, then we are talking a minimum of
> >> 3 * 512/(64/8) * (16:1) or 3k (3076) clocks to save the damn FPU
> >> state off to main memory (a store in a loop is 3 clocks ignoring
> >> the setup and crap, right?). Add another 3k clocks to bring it
> >> back.
> >>
> >> Best case, God loves us, and we spill and restore from L1
> >> without an IPI or an invalidation, and without starting the
> >> thread on a CPU other than the one where it was suspended, and
> >> all spills are to cacheable write-through pages. That's a 16
> >> times speed increase because we get to ignore the bus speed
> >> differential, or 3 * 512/(65/8) * 2 = (6k/16) = 384 clocks.
> >
> >This seems to be off by a bit. Actual timing on an Athlon1600
> >overclocked a little gives the following times for some critical
> >parts of context switching for each iteration of instructions in
> >a loop (not counting 2 cycles of loop overhead):
> >
> >pushal; popal: 9 cycles
> >pushl %ds; popl %ds: 21 cycles
> >fxsave; fxrstor: 105 cycles
> >fnsave; frstor: 264 cycles
>
> I can think of a possible reason: The FPU knows when it has been used
> vs just having executed fninit. In the latter case, all it needs to
> save is "I've been initialised". Also the FPU architecture includes
> "used" flags associated with each register - possibly the f*save
> instructions don't flush unused registers. Do the above numbers
> change when you push real data into the FP registers?
>
> Also, how expensive is a DNA trap? Would it be cheaper overall to
> always load FPU context on a switch - this is more expensive for
> processes that don't use FP, but saves a DNA trap per context switch
> (assuming they use FP in that slice) for those that do.

Or perhaps keep some sort of heuristic (per-process) on whether
or not it needs the FPU?
If you've got N traps per M quanta, perhaps you could always
load the FPU context for that process.

-- 
Dan Eischen

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
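
For illustration only, the "N traps per M quanta" heuristic suggested in the message could be sketched roughly as below. Nothing here is taken from the FreeBSD source: the struct, the function names, and the values chosen for N and M are all hypothetical. One assumption worth flagging is that while the FPU context is loaded eagerly no DNA traps occur, so this sketch drops back to lazy loading after one window in order to re-sample.

```c
#include <stdbool.h>

/* Hypothetical tuning knobs; the message names N and M but gives no values. */
#define FPU_WINDOW_QUANTA  8   /* M: quanta per observation window (assumed) */
#define FPU_TRAP_THRESHOLD 4   /* N: DNA traps per window that trigger eager loading (assumed) */

/* Hypothetical per-process bookkeeping, not a real kernel structure. */
struct fpu_hint {
    unsigned quanta;   /* quanta observed in the current window */
    unsigned traps;    /* DNA traps observed in the current window */
    bool     eager;    /* true: load FPU context at switch-in */
};

/* Should the context switch code restore FPU state up front? */
static bool
fpu_should_preload(const struct fpu_hint *h)
{
    return h->eager;
}

/*
 * Account for one finished quantum. 'trapped' says whether the process
 * took a DNA trap (first FPU use) during that quantum.
 */
static void
fpu_hint_quantum_end(struct fpu_hint *h, bool trapped)
{
    if (trapped)
        h->traps++;
    if (++h->quanta < FPU_WINDOW_QUANTA)
        return;                 /* window not complete yet */

    if (h->eager) {
        /* No traps are visible while eager, so go lazy to re-sample. */
        h->eager = false;
    } else {
        /* N or more traps in M quanta: preload from now on. */
        h->eager = (h->traps >= FPU_TRAP_THRESHOLD);
    }
    h->quanta = 0;
    h->traps = 0;
}
```

A scheduler would call fpu_hint_quantum_end() once per quantum and consult fpu_should_preload() at switch-in. Whether this pays off depends on the relative cost of a DNA trap versus an unneeded restore, which is the open question in the thread.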