From owner-freebsd-arch  Sun Jan 13 15:29:44 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id 3D5A737B417
	for <arch@FreeBSD.ORG>; Sun, 13 Jan 2002 15:29:41 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.12.1/8.12.1) id g0DNSTvN018698;
	Sun, 13 Jan 2002 18:28:29 -0500 (EST)
Date: Sun, 13 Jan 2002 18:28:29 -0500 (EST)
From: Daniel Eischen <eischen@pcnet1.pcnet.com>
To: Peter Jeremy <peter.jeremy@alcatel.com.au>
Cc: Bruce Evans <bde@zeta.org.au>,
	Terry Lambert <tlambert2@mindspring.com>,
	Peter Wemm <peter@wemm.org>, Alfred Perlstein <bright@mu.org>,
	Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>,
	Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject: Re: Request for review: getcontext, setcontext, etc
In-Reply-To: <20020114074238.S561@gsmx07.alcatel.com.au>
Message-ID: <Pine.SUN.3.91.1020113182433.16242B-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk

On Mon, 14 Jan 2002, Peter Jeremy wrote:
> On 2002-Jan-12 21:40:20 +1100, Bruce Evans <bde@zeta.org.au> wrote:
> >On Sat, 12 Jan 2002, Terry Lambert wrote:
> ...
> >> Assuming a 64 bit data path, then we are talking a minimum of
> >> 3 * 512/(64/8) * (16:1) or 3k (3072) clocks to save the damn FPU
> >> state off to main memory (a store in a loop is 3 clocks ignoring
> >> the setup and crap, right?).  Add another 3k clocks to bring it
> >> back.
> >>
> >> Best case, God loves us, and we spill and restore from L1
> >> without an IPI or an invalidation, and without starting the
> >> thread on a CPU other than the one where it was suspended, and
> >> all spills are to cacheable write-through pages.  That's a 16
> >> times speed increase because we get to ignore the bus speed
> >> differential, or 3 * 512/(64/8) * 2 = (6k/16) = 384 clocks.
> >
> >This seems to be off by a bit.  Actual timing on an Athlon1600
> >overclocked a little gives the following times for some critical
> >parts of context switching for each iteration of instructions in
> >a loop (not counting 2 cycles of loop overhead):
> >
> >pushal; popal:             9 cycles
> >pushl %ds; popl %ds:      21 cycles
> >fxsave; fxrstor:         105 cycles
> >fnsave; frstor:          264 cycles
> 
> I can think of a possible reason: The FPU knows when it has been used
> vs just having executed fninit.  In the latter case, all it needs to
> save is "I've been initialised".  Also the FPU architecture includes
> "used" flags associated with each register - possibly the f*save
> instructions don't flush unused registers.  Do the above numbers
> change when you push real data into the FP registers?
> 
> Also, how expensive is a DNA trap?  Would it be cheaper overall to
> always load FPU context on a switch - this is more expensive for
> processes that don't use FP, but saves a DNA trap per context switch
> (assuming they use FP in that slice) for those that do.

Or perhaps keep some sort of per-process heuristic on whether or
not it needs the FPU?  If a process takes N DNA traps in M quanta,
perhaps you could always load the FPU context for that process on
a switch.
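
A minimal sketch of what such a heuristic might look like, in C.
Everything here is hypothetical: the struct, the function names, and
the values chosen for N and M are illustrations, not existing kernel
code.  It keeps one bit per quantum in a small shift register and
preloads once at least N of the last M quanta used the FPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning values: preload if N of the last M quanta trapped. */
#define FPU_WINDOW_QUANTA   8   /* M; must match the width of 'bits' */
#define FPU_TRAP_THRESHOLD  4   /* N */

/* Per-process history; bit i set means the FPU was used i quanta ago. */
struct fpu_hist {
	uint8_t bits;
};

/*
 * Call once at the end of each quantum.  Shifting an 8-bit value
 * drops the oldest quantum automatically, so the window stays at M.
 */
static void
fpu_hist_record(struct fpu_hist *h, bool used_fpu)
{
	h->bits = (uint8_t)((h->bits << 1) | (used_fpu ? 1u : 0u));
}

/* Number of quanta in the window in which the FPU was used. */
static unsigned
fpu_hist_count(const struct fpu_hist *h)
{
	unsigned n = 0;

	/* Clear the lowest set bit on each pass. */
	for (uint8_t b = h->bits; b != 0; b &= (uint8_t)(b - 1))
		n++;
	return (n);
}

/* True if the context switch should load the FPU state eagerly. */
static bool
fpu_should_preload(const struct fpu_hist *h)
{
	return (fpu_hist_count(h) >= FPU_TRAP_THRESHOLD);
}
```

One catch: once the context is preloaded no DNA trap fires, so the
switch code no longer learns whether the process really used the FPU
in that quantum.  The sketch leaves that decision to the caller of
fpu_hist_record(); recording a miss for preloaded quanta would let
the history decay and fall back to lazy loading.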

-- 
Dan Eischen
