From owner-freebsd-arch Sun Jan 13 15:29:44 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
    by hub.freebsd.org (Postfix) with ESMTP id 3D5A737B417;
    Sun, 13 Jan 2002 15:29:41 -0800 (PST)
Received: (from eischen@localhost)
    by pcnet1.pcnet.com (8.12.1/8.12.1) id g0DNSTvN018698;
    Sun, 13 Jan 2002 18:28:29 -0500 (EST)
Date: Sun, 13 Jan 2002 18:28:29 -0500 (EST)
From: Daniel Eischen
To: Peter Jeremy
Cc: Bruce Evans, Terry Lambert, Peter Wemm, Alfred Perlstein,
    Kelly Yancey, Nate Williams, Archie Cobbs, arch@FreeBSD.ORG
Subject: Re: Request for review: getcontext, setcontext, etc
In-Reply-To: <20020114074238.S561@gsmx07.alcatel.com.au>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, 14 Jan 2002, Peter Jeremy wrote:

> On 2002-Jan-12 21:40:20 +1100, Bruce Evans wrote:
> >On Sat, 12 Jan 2002, Terry Lambert wrote:
> ...
> >> Assuming a 64 bit data path, then we are talking a minimum of
> >> 3 * 512/(64/8) * (16:1) or 3k (3076) clocks to save the damn FPU
> >> state off to main memory (a store in a loop is 3 clocks ignoring
> >> the setup and crap, right?). Add another 3k clocks to bring it
> >> back.
> >>
> >> Best case, God loves us, and we spill and restore from L1
> >> without an IPI or an invalidation, and without starting the
> >> thread on a CPU other than the one where it was suspended, and
> >> all spills are to cacheable write-through pages. That's a 16
> >> times speed increase because we get to ignore the bus speed
> >> differential, or 3 * 512/(65/8) * 2 = (6k/16) = 384 clocks.
> >
> >This seems to be off by a bit.
> >Actual timing on an Athlon1600
> >overclocked a little gives the following times for some critical
> >parts of context switching for each iteration of instructions in
> >a loop (not counting 2 cycles of loop overhead):
> >
> >pushal; popal:             9 cycles
> >pushl %ds; popl %ds:      21 cycles
> >fxsave; fxrstor:         105 cycles
> >fnsave; frstor:          264 cycles
>
> I can think of a possible reason: The FPU knows when it has been used
> vs just having executed fninit. In the latter case, all it needs to
> save is "I've been initialised". Also the FPU architecture includes
> "used" flags associated with each register - possibly the f*save
> instructions don't flush unused registers. Do the above numbers
> change when you push real data into the FP registers?
>
> Also, how expensive is a DNA trap? Would it be cheaper overall to
> always load FPU context on a switch - this is more expensive for
> processes that don't use FP, but saves a DNA trap per context switch
> (assuming they use FP in that slice) for those that do.

Or perhaps keep some sort of heuristic (per-process) on whether
or not it needs the FPU? If you've got N traps per M quanta,
perhaps you could always load the FPU context for that process.

--
Dan Eischen
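
A rough sketch of the "N traps per M quanta" idea, in case it helps the discussion. This is not existing FreeBSD code: the struct, the function names, and the values chosen for N and M are all hypothetical. It keeps one bit per recent quantum in a per-process word and preloads the FPU context once at least N of the last M quanta took a DNA trap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables: preload when N of the last M quanta trapped. */
#define FPU_WINDOW_QUANTA   16  /* M, must be < 32 for the mask below */
#define FPU_TRAP_THRESHOLD   8  /* N */

/* Hypothetical per-process history; bit 0 is the most recent quantum. */
struct fpu_hist {
    uint32_t window;
};

/* Record one finished quantum; `trapped` says whether it took a DNA trap. */
static void
fpu_hist_record(struct fpu_hist *h, bool trapped)
{
    uint32_t mask = (1u << FPU_WINDOW_QUANTA) - 1u;

    h->window = ((h->window << 1) | (trapped ? 1u : 0u)) & mask;
}

/* Count set bits (portable, no compiler builtins assumed). */
static unsigned
fpu_popcount32(uint32_t v)
{
    unsigned n = 0;

    while (v != 0) {
        v &= v - 1;     /* clear the lowest set bit */
        n++;
    }
    return (n);
}

/* True if the switch code should load the FPU context eagerly. */
static bool
fpu_should_preload(const struct fpu_hist *h)
{
    return (fpu_popcount32(h->window) >= FPU_TRAP_THRESHOLD);
}
```

One wrinkle the sketch does not solve: once the context is preloaded, no DNA trap fires, so `trapped` would need some other source of "used the FPU in this quantum" for the history to decay again.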