Date:      Mon, 14 Jan 2002 07:42:39 +1100
From:      Peter Jeremy <peter.jeremy@alcatel.com.au>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Terry Lambert <tlambert2@mindspring.com>, Peter Wemm <peter@wemm.org>, Alfred Perlstein <bright@mu.org>, Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>, Daniel Eischen <eischen@pcnet1.pcnet.com>, Dan Eischen <eischen@vigrid.com>, Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject:   Re: Request for review: getcontext, setcontext, etc
Message-ID:  <20020114074238.S561@gsmx07.alcatel.com.au>
In-Reply-To: <20020112205919.E5372-100000@gamplex.bde.org>; from bde@zeta.org.au on Sat, Jan 12, 2002 at 09:40:20PM +1100
References:  <3C4001A3.5ECCAEB9@mindspring.com> <20020112205919.E5372-100000@gamplex.bde.org>

On 2002-Jan-12 21:40:20 +1100, Bruce Evans <bde@zeta.org.au> wrote:
>On Sat, 12 Jan 2002, Terry Lambert wrote:
...
>> Assuming a 64 bit data path, then we are talking a minimum of
>> 3 * 512/(64/8) * (16:1) or 3k (3072) clocks to save the damn FPU
>> state off to main memory (a store in a loop is 3 clocks ignoring
>> the setup and crap, right?).  Add another 3k clocks to bring it
>> back.
>>
>> Best case, God loves us, and we spill and restore from L1
>> without an IPI or an invalidation, and without starting the
>> thread on a CPU other than the one where it was suspended, and
>> all spills are to cacheable write-through pages.  That's a 16
>> times speed increase because we get to ignore the bus speed
>> differential, or 3 * 512/(64/8) * 2 = (6k/16) = 384 clocks.
>
>This seems to be off by a bit.  Actual timing on an Athlon1600
>overclocked a little gives the following times for some critical
>parts of context switching for each iteration of instructions in
>a loop (not counting 2 cycles of loop overhead):
>
>pushal; popal:             9 cycles
>pushl %ds; popl %ds:      21 cycles
>fxsave; fxrstor:         105 cycles
>fnsave; frstor:          264 cycles

I can think of a possible reason: the FPU knows when it has actually
been used versus merely having executed fninit.  In the latter case,
all it needs to save is "I've been initialised".  The FPU architecture
also includes tag ("used") flags associated with each register -
possibly the f*save instructions don't flush unused registers.  Do the
above numbers change when you push real data into the FP registers?

Also, how expensive is a DNA trap?  Would it be cheaper overall to
always load FPU context on a switch?  This is more expensive for
processes that don't use FP, but for those that do it saves one DNA
trap per context switch (assuming they use FP in every slice).

To add some further numbers, in December 1999, I did some measurements
on FP switching by patching npx.c.  This was on a PII-266 running then
-current.  (The original e-mail was sent to -arch on Mon, 20 Dec 1999
07:34:06 +1100 in a thread titled "Concrete plans for ucontext/
mcontext changes around 4.0" - I don't have the message-id available).

  ctxt     DNA    FP
 swtch    traps  swtch
1754982  281557  59753  build world and a few CVS operations [1]
  79044   18811  10341  gnuplot and xv in parallel [2]
    800     138    130  parallel FP-intensive progs [3].

In the above, `ctxt swtch' is the number of context switches counted
via vm.stats.sys.v_swtch, `DNA traps' is the number of device-not-
available (DNA) traps registered, and `FP swtch' is the number of DNA
traps where the FP context loaded is different from that saved on the
preceding context switch.

Notes:
[1] Boot to single user, run 'make buildworld' inside script(1).  The
    buildworld had a few hiccups along the way which I patched around
    and then re-ran 'make everything'.

[2] I ran the gnuplot demos and the xv visual schnauzer updating a
    large directory of pictures in parallel.  (Multi-user X11).

[3] This was four parallel copies of a circuit analysis program I
    wrote.  It spent most of its time solving a complex 26x26 matrix
    using Gaussian elimination.  (Multi-user console).

Peter
