Date:      Thu, 31 Mar 2005 21:40:40 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Peter Jeremy <PeterJeremy@optushome.com.au>
Cc:        bde@freebsd.org
Subject:   Re: Fwd: 5-STABLE kernel build with icc broken
Message-ID:  <20050331210931.S2670@epsplex.bde.org>
In-Reply-To: <20050331104635.GH71384@cirb503493.alcatel.com.au>
References:  <20050327133059.3d68a78c@Magellan.Leidinger.net> <20050327162839.2fafa6aa@Magellan.Leidinger.net> <5bbfe7d405032823232103d537@mail.gmail.com> <424A23A8.5040109@ec.rr.com><20050330130051.GA4416@VARK.MIT.EDU> <20050331104635.GH71384@cirb503493.alcatel.com.au>

On Thu, 31 Mar 2005, Peter Jeremy wrote:

> On Thu, 2005-Mar-31 17:17:58 +1000, Bruce Evans wrote:
>>  I still
>> think fully lazy switching (c2) is the best general method.
>
> I think it depends on the FP workload.  It's a definite win if there
> is exactly one FP thread - in this case the FPU state never needs to
> be saved (and you could even optimise away the DNA trap by clearing
> the TS and EM bits if the switched-to curthread is fputhread).

I think stopping the trap would be the usual method (not sure what
Linux did), but to collect statistics for determining affinity you
would want to take the trap anyway.
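
The two variants above (take the DNA trap every time, or clear TS when
the switched-to thread is already fputhread) can be compared with a
small user-space model of lazy FPU switching.  This is a sketch under
stated assumptions, not FreeBSD's npx code: `struct lazy_fpu`,
`lazy_switch_to()` and `lazy_fp_insn()` are hypothetical names, the `ts`
flag stands in for CR0.TS, and `skip_trap_if_owner` is the optimisation
of not trapping when the owner is switched back in.  The trap counter in
`lazy_fp_insn()` marks where affinity statistics could be gathered,
which is lost when the trap is skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of lazy FPU context switching.
 * Names are illustrative; this is not FreeBSD's npx/fpu code. */

#define NO_OWNER (-1)

struct lazy_fpu {
	int fputhread;            /* thread whose state is live in the FPU */
	bool ts;                  /* models CR0.TS: set => next FP insn traps */
	bool skip_trap_if_owner;  /* clear TS when the owner is switched back in */
	unsigned long dna_traps;  /* device-not-available traps taken */
	unsigned long saves;      /* FPU state written out to a save area */
	unsigned long restores;   /* FPU state loaded (or initialised) */
};

static void lazy_fpu_init(struct lazy_fpu *f, bool skip_trap_if_owner)
{
	f->fputhread = NO_OWNER;
	f->ts = true;             /* nobody owns the FPU yet */
	f->skip_trap_if_owner = skip_trap_if_owner;
	f->dna_traps = f->saves = f->restores = 0;
}

/* Context switch to thread `next`: no FPU state is moved here. */
static void lazy_switch_to(struct lazy_fpu *f, int next)
{
	assert(next >= 0);
	if (f->skip_trap_if_owner && next == f->fputhread)
		f->ts = false;    /* registers already hold next's state */
	else
		f->ts = true;     /* defer the work to the DNA trap */
}

/* Thread `cur` executes an FP instruction. */
static void lazy_fp_insn(struct lazy_fpu *f, int cur)
{
	assert(cur >= 0);
	if (!f->ts)
		return;           /* FPU usable, no trap */
	f->dna_traps++;           /* trap: affinity statistics could go here */
	f->ts = false;
	if (f->fputhread == cur)
		return;           /* state still live, just re-enable the FPU */
	if (f->fputhread != NO_OWNER)
		f->saves++;       /* spill the previous owner's state */
	f->restores++;            /* load (or initialise) cur's state */
	f->fputhread = cur;
}
```

With one FP thread (0) alternating with an integer-only thread (1) for
10 rounds, the plain model takes 10 DNA traps and the optimised one
takes 1; in both, the FPU state is loaded once and never saved.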

> The worst case is two (or more) FP-intensive threads - in this case,
> lazy switching is of no benefit.  The DNA trap overheads mean that
> the performance is worse than just saving/restoring the FP state
> during a context switch.
>
> My guess is that the current generation workstation is closer to the
> second case - current generation graphical bloatware uses a lot of
> FP for rendering, not to mention that the idle task has a reasonable
> chance of being an FP-intensive distributed computing task (setiathome
> or similar).  It's probably time to do some more measuring (I'm not
> offering just now, I have lots of other things on my TODO list).

Bloatware might be so hoggish that it rarely makes context switches :-).
Context switches for interrupts increase the problem though, as would
using FP more in the kernel.
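
Peter's worst case can be made concrete with a toy cost model.  This is
a sketch, not a measurement: the cost units in `struct fpu_costs` are
made up, `eager_cost()` and `lazy_cost()` are hypothetical names, the
eager policy saves and restores on every switch whether or not the
threads use the FPU, and the lazy policy omits the TS-clearing
optimisation.  `sched[i]` is the thread run in time slice i, and
`uses_fp[t]` says whether thread t touches the FPU in each of its slices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up relative costs; real numbers depend on the CPU. */
struct fpu_costs {
	unsigned long save;     /* write FPU state to the PCB */
	unsigned long restore;  /* load FPU state from the PCB */
	unsigned long trap;     /* take and return from a DNA trap */
};

/* Eager switching: save the outgoing state and load the incoming state
 * on every context switch (slice 0 is taken as already loaded). */
static unsigned long eager_cost(const int *sched, size_t n, struct fpu_costs c)
{
	unsigned long total = 0;

	for (size_t i = 1; i < n; i++)
		if (sched[i] != sched[i - 1])
			total += c.save + c.restore;
	return total;
}

/* Lazy switching: each switch sets TS; a switched-in thread that uses
 * the FPU takes a DNA trap, and the handler saves/loads state only when
 * FPU ownership actually changes. */
static unsigned long lazy_cost(const int *sched, size_t n,
    const bool *uses_fp, struct fpu_costs c)
{
	unsigned long total = 0;
	int owner;

	assert(n > 0);
	owner = uses_fp[sched[0]] ? sched[0] : -1;
	for (size_t i = 1; i < n; i++) {
		int t = sched[i];

		if (t == sched[i - 1] || !uses_fp[t])
			continue;       /* no switch, or no FP use: no trap */
		total += c.trap;
		if (owner != t) {
			if (owner >= 0)
				total += c.save;
			total += c.restore;
			owner = t;
		}
	}
	return total;
}
```

With costs {10, 10, 5} and the schedule 0,1,0,1,0, two FP threads cost
80 eagerly and 100 lazily (the extra 5 per switch is the trap), while one
FP thread alternating with an integer-only thread costs only 10 lazily.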

>> BTW, David and I recently found a bug in the context switching in the
>> fxsr case, at least on Athlon XPs and AMD64s.
>
> I gather this is not noticeable unless the application is doing its
> own FPU save/restore.  Is there a solution or work-around?

It's most noticeable when debugging, or if you worry about leaking
thread context.  Fortunately, the last-instruction pointers won't
have real user data in them unless the application encodes it there
intentionally.  I can't see any efficient solution or workaround.
The kernel should do a full save/restore for processes being debugged.
For applications, the bug seems to be larger.  Even if they know about
the AMD behaviour and do a full save/restore because they need it, it
won't work because the kernel doesn't preserve the state across
context switches.  Applications like VMware might care more than most.
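
The leak can be modelled in a few lines of C.  This sketch assumes the
AMD behaviour as AMD's documentation describes it: FXSAVE and FXRSTOR
transfer the x87 last-instruction and last-data pointers only when the
exception-summary bit ES (bit 7 of the status word) is set.  The struct
and function names are hypothetical, the model tracks only the status
word and the two pointers, and leaving the save area's pointer fields
untouched when ES is clear is a modelling simplification.  `full` stands
for the full (FNSAVE/FRSTOR-style) save/restore suggested above for
debugged processes; a real kernel would also have to save the SSE state,
which FNSAVE does not cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FSW_ES 0x0080u  /* x87 status word: exception summary bit */

/* Only the parts of the x87 state that matter for this model. */
struct x87_model {
	uint16_t fsw;   /* status word */
	uint64_t fip;   /* last FP instruction pointer */
	uint64_t fdp;   /* last FP data (operand) pointer */
};

/* Assumed AMD FXSAVE: pointers are stored only if ES is set.
 * (Leaving them untouched otherwise is a modelling simplification.) */
static void model_fxsave(const struct x87_model *hw, struct x87_model *area)
{
	area->fsw = hw->fsw;
	if (hw->fsw & FSW_ES) {
		area->fip = hw->fip;
		area->fdp = hw->fdp;
	}
}

/* Assumed AMD FXRSTOR: pointers are loaded only if ES is set, so the
 * previous thread's values otherwise stay in the hardware. */
static void model_fxrstor(struct x87_model *hw, const struct x87_model *area)
{
	hw->fsw = area->fsw;
	if (area->fsw & FSW_ES) {
		hw->fip = area->fip;
		hw->fdp = area->fdp;
	}
}

/* Context switch of the FPU from `from` to `to`.  With `full` set, the
 * pointers are always transferred, as a full save/restore would do. */
static void model_switch(struct x87_model *hw, struct x87_model *from,
    const struct x87_model *to, bool full)
{
	assert(hw != NULL && from != NULL && to != NULL);
	if (full) {
		*from = *hw;
		*hw = *to;
	} else {
		model_fxsave(hw, from);
		model_fxrstor(hw, to);
	}
}
```

In the model, after a plain switch from thread A (which just executed
FP code at 0x401000) to thread B with no exception pending, the
hardware still holds A's instruction pointer and A's save area never
receives it; with `full` set, B gets its own pointer back and A's is
preserved.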

I forgot to mention that we couldn't find anything in Intel manuals
about this behaviour, so it might be completely AMD-specific.  Also,
the instruction pointers are fundamentally broken for 64-bit CPUs:
although they are 64 bits, they have the segment selector encoded
in their top 32 bits, so they are not really different from the 32:32
selector:pointer format used in the non-fxsr case.  Their format is
specified by SSE2, so 64-bit extensions would have to be elsewhere, but
AMD64 doesn't seem to extend them.
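
To illustrate the format point: in the legacy FXSAVE layout, the 8
bytes at offset 8 hold a 32-bit instruction offset followed by the
16-bit code-segment selector (the data pointer at offset 16 is laid out
the same way with the data selector), so reading the field as one
little-endian 64-bit value just yields selector:offset.  The helper
names below are hypothetical; this is a sketch of the decoding, not
kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A legacy FXSAVE pointer field split into its two parts. */
struct fx_ptr {
	uint32_t offset;    /* bytes 0-3 of the field: 32-bit offset */
	uint16_t selector;  /* bytes 4-5: segment selector (CS or DS) */
};

/* Decode the raw little-endian bytes of the field (FIP at byte 8 of the
 * save area, FDP at byte 16). */
static struct fx_ptr fx_ptr_from_bytes(const unsigned char field[8])
{
	struct fx_ptr p;

	assert(field != NULL);
	p.offset = (uint32_t)field[0] | (uint32_t)field[1] << 8 |
	    (uint32_t)field[2] << 16 | (uint32_t)field[3] << 24;
	p.selector = (uint16_t)(field[4] | field[5] << 8);
	return p;
}

/* The same field viewed as a 64-bit value: the selector sits above bit 31. */
static struct fx_ptr fx_ptr_from_u64(uint64_t v)
{
	struct fx_ptr p = { (uint32_t)v, (uint16_t)(v >> 32) };
	return p;
}

/* A 64-bit instruction address survives only if it fits in 32 bits. */
static bool fx_ptr_can_hold(uint64_t addr)
{
	return addr <= UINT32_MAX;
}
```

Both views give the same selector and offset, and any instruction
address above 4 GB cannot be represented in the offset part.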

Bruce


