Date: Mon, 20 Dec 1999 07:34:06 +1100
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
To: Martin Cracauer <cracauer@cons.org>
Cc: arch@freebsd.org
Subject: Re: Concrete plans for ucontext/mcontext changes around 4.0
Message-ID: <99Dec20.072529est.40328@border.alcanet.com.au>
In-Reply-To: <19991213091915.D13197@cons.org>; from cracauer@cons.org on Mon, Dec 13, 1999 at 07:19:16PM +1100
References: <19991212172602.A10611@cons.org> <19991213091915.D13197@cons.org>

On 1999-Dec-13 19:19:16 +1100, Martin Cracauer <cracauer@cons.org> wrote:
>Forgot about lazy FPU context switching (should have finished reading
>my mailbox).  FPU context is not always there.
>
>The Linux people claim that lazy FPU switching is not worth the effort
>anymore on modern machines.  I didn't see any proof or numbers.  Anyone
>of you?

Currently, the i386 half-implements lazy FPU switching[1].  Based on
some experimenting over the weekend, I don't believe it is worthwhile
implementing full lazy FPU switching, but our semi-lazy switching is
a definite win.

I patched npx.c (patches at end) and extracted the following
statistics:

    ctxt      DNA     FP
   swtch    traps  swtch
 1754982   281557  59753  build world and a few CVS operations [2]
   79044    18811  10341  gnuplot and xv in parallel [3]
     800      138    130  parallel FP-intensive progs [4]

In the above, `ctxt swtch' is the number of context switches counted
via vm.stats.sys.v_swtch.  `DNA traps' is the number of device not
available traps registered and `FP swtch' is the number of DNA traps
where the FP context loaded is different to that saved on the
preceding context switch.

Moving to full lazy FPU switching would save (DNA traps - FP swtch)
fsave/frstor pairs and the same number of traps[5].  Whilst the real
savings can't be derived directly from the above figures, the real
time taken by these runs, together with the estimated cost of a DNA
trap + fsave, suggests a saving of much less than 0.1% - which is
getting towards the unmeasurable level.  The best case would be a
single, low priority FP-intensive process combined with lots of I/O
bound integer-only processes (eg setiathome as an idle task) - which
I don't have figures for, but I expect the overheads (for the FP
process only) would be <1%.
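
To make the arithmetic behind the table explicit, here is a minimal C
sketch that recomputes, for each row, the (DNA traps - FP swtch)
difference and the share of context switches that were followed by a
DNA trap.  The names fp_sample, avoidable_pairs, dna_percent and
print_report are illustrative only and are not part of the patch; the
numbers are simply the three rows of the table.

```c
#include <stddef.h>
#include <stdio.h>

/* One row of the statistics table (hypothetical type, for illustration). */
struct fp_sample {
	const char *workload;
	long ctxt_swtch;	/* context switches (vm.stats.sys.v_swtch) */
	long dna_traps;		/* device not available traps */
	long fp_swtch;		/* DNA traps that loaded a different FP context */
};

static const struct fp_sample samples[] = {
	{ "build world and a few CVS operations", 1754982, 281557, 59753 },
	{ "gnuplot and xv in parallel",             79044,  18811, 10341 },
	{ "parallel FP-intensive progs",              800,    138,   130 },
};

/* fsave/frstor pairs (and traps) that full lazy switching would avoid. */
long
avoidable_pairs(const struct fp_sample *s)
{
	return (s->dna_traps - s->fp_swtch);
}

/* Percentage of context switches that were followed by a DNA trap. */
double
dna_percent(const struct fp_sample *s)
{
	return (100.0 * (double)s->dna_traps / (double)s->ctxt_swtch);
}

/* Print one summary line per workload. */
void
print_report(void)
{
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%-40s avoidable=%ld  DNA/ctxt=%.1f%%\n",
		    samples[i].workload, avoidable_pairs(&samples[i]),
		    dna_percent(&samples[i]));
}
```

Calling print_report() from a main() prints one line per workload.
For the buildworld row the avoidable count is 281557 - 59753 = 221804.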

The above figures do suggest that moving from the semi-lazy approach
to one where the FPU context was saved/restored on each context switch
would be wasteful - FP is not used after about 80% of context switches
and fsave/frstor are expensive instructions.

Notes:

[1] Currently, on the i386, the FP (NPX) registers are saved when a
    context switch occurs and the FPU has been used.  The NPX is then
    flagged as `not equipped', causing a Device Not Available (DNA)
    trap when the next FP instruction is executed.  At that point the
    appropriate FPU context is restored.  Full lazy switching would
    postpone the register save until an FP instruction was executed
    by a different process.

[2] Boot to single user, run 'make buildworld' inside script(1).  The
    buildworld had a few hiccups along the way which I patched around
    and then re-ran 'make everything'.

[3] I ran the gnuplot demos and the xv visual schnauzer updating a
    large directory of pictures in parallel.  (Multi-user X11).

[4] This was four parallel copies of a circuit analysis program I
    wrote.  It spent most of its time solving a complex 26x26 matrix
    using Gaussian elimination.  (Multi-user console).

[5] The trap saving would occur if the FPU enabled bit was set
    according to the contents of the FPU (ie the FPU is left as
    `enabled' when a context switch occurred into the process that
    last used the FPU, and `not enabled' otherwise).
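
The three strategies discussed above (save/restore on every switch,
the current semi-lazy scheme of note [1], and full lazy switching with
the enable-bit handling of note [5]) can be compared with a toy model.
This is only a sketch under simplifying assumptions: each array element
is one scheduling quantum that ends in a context switch, a quantum
either touches the FPU or it doesn't, and pids are non-negative.  All
names (fp_policy, quantum, fp_cost, simulate) are made up for the
illustration and do not appear in the kernel.

```c
#include <stddef.h>

enum fp_policy { FP_EAGER, FP_SEMI_LAZY, FP_FULL_LAZY };

/* One scheduling quantum: who ran, and whether it executed FP code. */
struct quantum {
	int pid;	/* must be >= 0 */
	int uses_fp;
};

struct fp_cost {
	long saves;	/* fsave count */
	long restores;	/* frstor count */
	long traps;	/* DNA trap count */
};

struct fp_cost
simulate(enum fp_policy policy, const struct quantum *q, size_t n)
{
	struct fp_cost c = { 0, 0, 0 };
	int owner = -1;	/* pid whose state is live in the FPU (full lazy) */
	size_t i;

	for (i = 0; i < n; i++) {
		switch (policy) {
		case FP_EAGER:
			/* Restore on switch in, save on switch out, always. */
			c.restores++;
			c.saves++;
			break;
		case FP_SEMI_LAZY:
			/* First FP use traps and restores; save on switch out. */
			if (q[i].uses_fp) {
				c.traps++;
				c.restores++;
				c.saves++;
			}
			break;
		case FP_FULL_LAZY:
			/* Trap only when a different process touches the FPU. */
			if (q[i].uses_fp && q[i].pid != owner) {
				c.traps++;
				if (owner != -1)
					c.saves++;	/* save previous owner */
				c.restores++;
				owner = q[i].pid;
			}
			break;
		}
	}
	return (c);
}
```

In this model the semi-lazy trap count plays the role of `DNA traps'
and the full lazy trap count plays the role of `FP swtch', so their
difference corresponds to the (DNA traps - FP swtch) figure.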

Index: npx.c
===================================================================
RCS file: /home/peter/cvs/src/sys/i386/isa/npx.c,v
retrieving revision 1.78
diff -u -r1.78 npx.c
--- npx.c	1999/09/21 10:51:47	1.78
+++ npx.c	1999/12/17 09:53:02
@@ -779,6 +779,15 @@
 	}
 }
 
+static int fp_dna;		/* number of DNA traps */
+static int fp_swtch;		/* Number of real FP context switches */
+static struct proc *fpuproc;	/* Last proc to use FPU */
+
+SYSCTL_INT(_hw, OID_AUTO, fp_dna, CTLFLAG_RW, &fp_dna, 0,
+	"Number of NPX DNA traps");
+SYSCTL_INT(_hw, OID_AUTO, fp_swtch, CTLFLAG_RW, &fp_swtch, 0,
+	"Number of NPX context switches");
+
 /*
  * Implement device not available (DNA) exception
  *
@@ -797,6 +806,11 @@
 		panic("npxdna");
 	}
 	stop_emulating();
+	fp_dna++;
+	if (curproc != fpuproc) {
+		fpuproc = curproc;
+		fp_swtch++;
+	}
 	/*
 	 * Record new context early in case frstor causes an IRQ13.
 	 */

Peter
-- 
Peter Jeremy (VK2PJ)                    peter.jeremy@alcatel.com.au
Alcatel Australia Limited
41 Mandible St                          Phone: +61 2 9690 5019
ALEXANDRIA NSW 2015                     Fax:   +61 2 9690 5982

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message