Date:      Sat, 10 May 2008 22:28:53 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Juergen Lock <nox@jelal.kn-bremen.de>
Cc:        freebsd-emulation@freebsd.org
Subject:   Re: seems I finally found what upset kqemu on amd64 SMP... shared gdt! (please test patch :)
Message-ID:  <20080510213519.P3083@besplex.bde.org>
In-Reply-To: <20080509220922.GA13480@saturn.kn-bremen.de>
References:  <20080507162713.73A3A5B47@mail.bitblocks.com> <20080508195843.G17500@delplex.bde.org> <20080509220922.GA13480@saturn.kn-bremen.de>

On Sat, 10 May 2008, Juergen Lock wrote:

> On Thu, May 08, 2008 at 09:59:57PM +1000, Bruce Evans wrote:

>> The message in npx.c is actually about violation of an even more
>> fundamental invariant -- the invariant that owning the FPU includes
>> having the TS flag clear so that DNA traps cannot occur.  The bug in
>> kqemu seems to be mismanagement of the TS flag related to this.  I
>> forget if it is the host or the target TS flag that seems to be mismanaged.
>> For the target, it would take a bug in the virtualization of the TS flag
>> to break this invariant (assuming no related bugs in the target kernel).
>>
> Well the `fpcurthread == curthread' bug has been fixed quite a while
> ago already, or do you mean another one?

I didn't know what was already fixed.

>> The message in amd64/machdep.c is about violation of the invariant
>> that the kernel cannot cause DNA traps.  Spurious DNA traps in the
>> ...
>>
> Okay I _think_ I know a little more about this now...  kqemu itself
> doesn't use the fpu, but the guest code it runs can, and in that case the
> DNA trap is just used for (host) lazy fpu context switching like as if the
> code was running in userland regularly.  And I just tested the following
> patch that should get rid of the message by calling fpudna/npxdna directly
> (files/patch-fpucontext is the interesting part:)

This seems reasonable.  Is the following summary of my understanding of
kqemu's implementation of this and your change correct?:
- kqemu runs in kernel mode on the host and needs to have exactly the
   same effect as a DNA exception on the target.
- having exactly the same effect requires calling the host DNA exception
   handler.
- currently it uses a software int $7 (DNA) to implement the above, but this
   is not permitted in kernel mode (the software int could be permitted, but
   it is hard to distinguish from a hardware exception caused by unintentional
   FPU use in the kernel).
- your change makes it call the DNA trap handler directly.  This gives the
   same effect as a permitted software int $7.  It is also faster.
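
The points above can be illustrated with a small userland model of lazy FPU
context switching.  This is only a sketch under simplifying assumptions: every
name in it (model_cpu, model_dna() and so on) is hypothetical and does not
exist in FreeBSD or kqemu, and the real fpudna()/npxdna() do considerably
more.  It shows why reaching the DNA handler through a trap and calling it
directly leave the same state behind, and where the "FPU must not be owned
already" precondition comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names exist in FreeBSD. */
struct model_thread {
    int id;
    int fpu_state;                 /* saved FPU contents for this thread */
};

struct model_cpu {
    bool ts;                       /* models the TS flag in CR0 */
    struct model_thread *fpowner;  /* models fpcurthread */
    struct model_thread *cur;      /* models curthread */
    int fpu_regs;                  /* models the live FPU registers */
    int dna_count;                 /* how often the handler has run */
};

/* Invariant from the thread: owning the FPU implies TS is clear. */
bool model_invariant(const struct model_cpu *c)
{
    return c->fpowner == NULL || !c->ts;
}

/* Context switch: save the owner's state, drop ownership, set TS. */
void model_switch(struct model_cpu *c, struct model_thread *next)
{
    if (c->fpowner != NULL) {
        c->fpowner->fpu_state = c->fpu_regs;
        c->fpowner = NULL;
    }
    c->ts = true;
    c->cur = next;
}

/* Model of the DNA handler: clear TS, restore state, take ownership. */
void model_dna(struct model_cpu *c)
{
    assert(c->fpowner == NULL);    /* the FPU must not be owned already */
    c->ts = false;
    c->fpu_regs = c->cur->fpu_state;
    c->fpowner = c->cur;
    c->dna_count++;
}

/* Trap path: an FPU instruction executed with TS set raises DNA first. */
void model_fpu_use_via_trap(struct model_cpu *c, int value)
{
    if (c->ts)
        model_dna(c);
    c->fpu_regs = value;
}

/* Direct path: call the handler without going through a trap. */
void model_loadfpucontext(struct model_cpu *c)
{
    model_dna(c);
}
```

In this model, after a model_switch() both model_fpu_use_via_trap() and
model_loadfpucontext() end with TS clear and the current thread owning the
FPU, so the invariant holds on either path.  The direct path merely skips
the trap.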

It would be better to use an official API for this, but none exists.

> ...
> +Index: kqemu-freebsd.c
> +@@ -33,6 +33,11 @@
> +
> + #include <machine/vmparam.h>
> + #include <machine/stdarg.h>
> ++#ifdef __x86_64__
> ++#include <machine/fpu.h>
> ++#else
> ++#include <machine/npx.h>
> ++#endif
> +
> + #include "kqemu-kernel.h"
> +
> +@@ -172,6 +177,15 @@
> + {
> + }
> +
> ++void CDECL kqemu_loadfpucontext(unsigned long cpl)
> ++{
> ++#ifdef __x86_64__
> ++    fpudna();
> ++#else
> ++    npxdna();
> ++#endif
> ++}

Just be sure that the system state is not too different from that of
trap() (directly below a syscall or trap from userland) when this is
called.  Better not have any interrupts disabled or locks held, though
I think npxdna() doesn't care.  The FPU must not be owned already at
this point.

> ++
> + #if __FreeBSD_version < 500000
> + static int
> + curpriority_cmp(struct proc *p)

I guess kqemu duplicates this old mistake instead of calling it because it
is static.  npxdna() is already public so it can be abused easily :-).

Bruce


