To: Bruce Evans
Cc: Terry Lambert, Alfred Perlstein, Kelly Yancey, Nate Williams,
    Daniel Eischen, Dan Eischen, Archie Cobbs, arch@FreeBSD.ORG
Subject: Re: Request for review: getcontext, setcontext, etc
In-Reply-To: <20020112205919.E5372-100000@gamplex.bde.org>
Date: Sat, 12 Jan 2002 12:34:29 -0800
From: Peter Wemm
Message-Id: <20020112203429.EE98738CC@overcee.netplex.com.au>

Bruce Evans wrote:
> On Sat, 12 Jan 2002, Terry Lambert wrote:
>
> > Bruce Evans wrote:
> > > (*) It may not be all that good. It was good on old machines when 108
> > > bytes was a lot of memory and moving the state in and out of the FPU
> > > was slow too. It is possible that the logic to avoid doing the switch
> > > takes longer than always doing it, but not all that likely because logic
> > > speed is increasing faster than memory speed and new machines have more
> > > state to save (512 (?)
> > > bytes for SSE).
> >
> > Correct me if my math is wrong, but let's run with this...
> >
> > If I have a 2GHz CPU and 133MHz memory, then we are talking a 16:1
> > slowdown for a transfer of 512 bytes from register to L1 to L2
> > to main memory for an FPU state spill.
> >
> > Assuming a 64 bit data path, then we are talking a minimum of
> > 3 * 512/(64/8) * (16:1) or 3k (3076) clocks to save the damn FPU
> > state off to main memory (a store in a loop is 3 clocks ignoring
> > the setup and crap, right?). Add another 3k clocks to bring it
> > back.
> >
> > Best case, God loves us, and we spill and restore from L1
> > without an IPI or an invalidation, and without starting the
> > thread on a CPU other than the one where it was suspended, and
> > all spills are to cacheable write-through pages. That's a 16
> > times speed increase because we get to ignore the bus speed
> > differential, or 3 * 512/(64/8) * 2 = (6k/16) = 384 clocks.
>
> This seems to be off by a bit. Actual timing on an Athlon1600
> overclocked a little gives the following times for some critical
> parts of context switching for each iteration of instructions in
> a loop (not counting 2 cycles of loop overhead):
>
> pushal; popal: 9 cycles
> pushl %ds; popl %ds: 21 cycles
> fxsave; fxrstor: 105 cycles
> fnsave; frstor: 264 cycles
>
> This certainly hits the L1 cache almost every time. So the 512-byte L1
> case "only" takes 105 cycles, not 384, but the 108-byte L1 case takes
> much longer. fxsave/fxrstor is so fast that I don't quite believe the
> times -- it saves 16 times as much state as pushal/popal in less than
> 12 times as much time.

Well, fxsave/fxrstor were specifically designed so that this could all be
done with burst transfers. fxsave/fxrstor are possibly doing 256 bit wide
transfers to/from the L1 cache.
Also don't forget that the fast save/restore operations were designed with
strict alignment requirements so that a whole bunch of checks can be
skipped at runtime that fnsave/frstor still have to deal with.

> > So it seems to me that it is *incredibly* expensive to do the
> > FPU save and restore, considering what *else* I could be doing
> > with those clock cycles.
>
> I agree that fnsave/frstor are still incredibly expensive if the
> above times are correct. fxsave/fxrstor is only credibly expensive.
> However, the overheads for fnsave/frstor are small compared with
> the overheads for the !*#*$% segment registers. We switch 3 segment
> registers explicitly and 2 implicitly on every switch to the kernel.
> According to the above, this has the same overhead as 1 fxsave/fxrstor.
> It gets done much more often than context switches. I hoped to get
> rid of the 2 explicit segment register switches, but couldn't keep
> up with the forces of bloat that added a 3rd. Now I don't notice
> this bloat unless I count cycles and forget that a billion of them
> is a lot :-).

Heh. That reminds me, I need to talk over some IPI vector tweaks with you.
I had forgotten that segment register operations were so bad. Hmm. What
are they again? I see %ds, %es and %fs. I assume the two implicit ones
were %cs and %ss. Which had you hoped to remove? What *is* %es used for
anyway?

> > With an average instruction time of 6 clocks (erring on the
> > side of caution), the question is "can we perform the logic
> > for the avoidance in 64 or less instructions?" I think the
> > answer is "yes", even if we throw in half a dozen uncached
> > memory references to main memory as part of the process and
> > take the 16:1 hit on each of them (that would be 96 clocks
> > in memory references, leaving us 288/6 = 38 instructions to
> > massage whatever we got back from those references).
>
> The Xdna trap to load the state if we guessed wrong about the
> next timeslice not using the FPU takes about 200 instructions
> including several slow ones like iret, so we don't get near 38
> instructions in all cases although we could (Xdna can be written
> in about 10 instructions if it doesn't go through trap() and
> other general routines).

Hmm, that is good to know too.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5