Date: Tue, 25 Apr 95 16:22:40 MDT
From: terry@cs.weber.edu (Terry Lambert)
To: bakul@netcom.com (Bakul Shah)
Cc: mycroft@ai.mit.edu, hackers@FreeBSD.org
Subject: Re: benchmark hell..
Message-ID: <9504252222.AA02359@cs.weber.edu>
In-Reply-To: <199504251951.MAA27349@netcom9.netcom.com> from "Bakul Shah" at Apr 25, 95 12:50:59 pm

> > The idea here is that by default, processes are assumed to not use the
> > FPU, and thus by default the FPU trap results in code fixup rather than
> > FPU operations.  Thus it does not matter which FPU operation is used
> > first.
>
> Isn't it easier to just assume that everyone may
> _eventually_ use FP and simply initialize the FP state on
> exec?  If they never use FP there is no loss.  One time
> initialization hit is not worth worrying about.  There are
> no FPU operations involved (you just copy the initial state
> from somewhere and set the `stale' bit).

No.  The difference is whether you have to save/restore state on
context switch or not.

If you know the process doesn't use the FPU, you can leave it alone,
hoping that the process that does use it will be the next FPU using
process to run.  This is typically the case, and it takes the FPU
overhead out of each of the processes not actually using the FPU.

Putting a stale state in there means you have to switch because you
expect that the process will use the FPU.

The difference is the difference between a startup initialization that
may not be necessary plus a context switch for every process that runs
vs. a one time runtime fixup (that is no more expensive than the startup
hit) which is only paid by processes actually using the FPU.

Here's the lowdown:

o	The FPU has a "last used by" state variable.

o	When the first FPU using process is run, this variable is set
	to its PID.

o	When a context switch occurs, the FPU state is left unchanged
	if the process being switched to is a non FPU using process,
	or if the process being switched to is the one that "last
	used" the FPU.

o	A process has a flag that tells if it has ever used the FPU.
	If it has, the flag is set; if it hasn't, the flag is not set.

o	The flag is set by the linker based on FPU instructions being
	used (flagged by the object files, or by linking the math lib,
	or whatever).  The flag would be set on the process at exec
	time.

o	Alternately, the flag is only set on the first FPU use attempt.
	You cheat to get this information: for processes without the
	flag set, an FPU access traps to an "emulator" that isn't
	really an emulator.  This means the trap vector needs switching
	between FPU using and non FPU using programs.

o	It's probably a tossup whether it is more expensive to
	incorrectly predict that a program containing FPU instructions
	will execute one of them in an average code path, or whether
	the extra switching from presetting the "uses the FPU" flag on
	the process results in more overhead.

o	When a switch is done to an FPU using program that is not the
	last FPU using program, the last program's state is saved in
	the per process struct for that program, and the FPU state of
	the program being switched to is restored.

o	The cost is two additional compares on context switch for FPU
	using programs, and one additional compare instead of an FPU
	state save and restore for each non-FPU using program.

o	If you use the preflagging method at link time (my preference),
	then there is no extra overhead of altering the trap vector
	on transition between FPU using and non-FPU using programs;
	otherwise there is, though that overhead does not occur in
	normal usage (which is typically non-FPU using program to
	non-FPU using program context switching).

o	The benefit in defeating benchmarks (make no bones about it,
	that's what's going on) is that a single process benchmark
	that uses FPU instructions results in no FPU context switches,
	because there is a single FPU using program.

o	The overall benefit in benchmarks not using the FPU and
	sensitive to the context switch overhead is that you pay a
	single compare instead of the overhead of a per process FPU
	context save and restore.

If you're worried about benchmarks, you should do this sort of thing.
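
The scheme in the list above can be sketched in C roughly as follows.
This is a minimal user-space simulation under stated assumptions, not
kernel code: struct proc, struct fpu_state, fpu_owner, fpu_switch() and
fpu_first_use_trap() are hypothetical names invented for illustration,
the FPU is modeled as a plain struct, and the "last used by" variable
is kept as a pointer rather than a PID.  Process exit (which would have
to clear the owner) is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hardware FPU register file. */
struct fpu_state {
    double regs[8];
};

/* Hypothetical per process struct; field names are illustrative only. */
struct proc {
    int pid;
    bool uses_fpu;              /* preset at exec time, or set on first trap */
    struct fpu_state fpu_save;  /* where this process's FPU state is parked */
};

static struct fpu_state fpu_hw;        /* simulated FPU hardware state */
static struct proc *fpu_owner = NULL;  /* the "last used by" variable */
static int fpu_saves;                  /* how many state saves were done */
static int fpu_restores;               /* how many state restores were done */

/* Called on every context switch to `next`. */
void fpu_switch(struct proc *next)
{
    assert(next != NULL);

    /* Compare 1: a non-FPU using process leaves the FPU untouched. */
    if (!next->uses_fpu)
        return;

    /* Compare 2: the FPU already holds this process's state. */
    if (fpu_owner == next)
        return;

    /* Park the previous owner's state in its per process struct. */
    if (fpu_owner != NULL) {
        fpu_owner->fpu_save = fpu_hw;
        fpu_saves++;
    }

    /* Load the incoming process's state and record the new owner. */
    fpu_hw = next->fpu_save;
    fpu_restores++;
    fpu_owner = next;
}

/*
 * Alternative to link-time preflagging: the first FPU access by an
 * unflagged process traps here, the flag is set, and the process
 * takes ownership of the FPU before the instruction is retried.
 */
void fpu_first_use_trap(struct proc *p)
{
    p->uses_fpu = true;
    fpu_switch(p);
}
```

With one FPU using process and any number of non-FPU using ones, the
save and restore counters stop moving after the first switch, which is
the single process benchmark case described above.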

For non FPU using installations, the overall benefit will be there,
but for average FPU using installations, the benefit will be reduced
by a single additional compare.

> This can even be considered a `cleaner" solution!  When
> someone else is using the FPU, _you_ don't have it!  So your
> choices are to either take it away from this someone else or
> emulate!!  Emulation may even be cheaper than save/restore
> of FPU state for some simple FP operation (once or twice)!
> Though, probably not worth microoptimizing like this.

I'd agree; I'd also say that this type of splitting, while possible
in Sequent hardware, is a *bad* precedent for SMP, which largely
assumes processor homogeneity.  I'd say that SMP is a goal and that
ASMP is not, and therefore unequal processor resources in a
multiprocessor box are a case that probably should not be handled by
default.


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.