Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 17 Jun 2006 16:13:37 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Juergen Lock <nox@jelal.kn-bremen.de>
Cc:        freebsd-emulation@FreeBSD.org, nikolas.britton@gmail.com
Subject:   Re: QEMU: npxdna: fpcurthread == curthread 640962 times
Message-ID:  <20060617150847.W38142@delplex.bde.org>
In-Reply-To: <200606161855.k5GItlP1001526@saturn.kn-bremen.de>
References:  <200606161855.k5GItlP1001526@saturn.kn-bremen.de>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 16 Jun 2006, Juergen Lock wrote:

> In article <ef10de9a0606131730m7ec33522q2b2273eb3d44d603@mail.gmail.com> you write:
>> I'm getting these messages on the console when I run QEMU with kqemu
>> kernel module:
>> npxdna: fpcurthread == curthread 640957 times
>> ...
>> What are these and how do I make them stop?
>> [...]
>
> Ok since the previous thread on this,
>    http://lists.freebsd.org/pipermail/freebsd-emulation/2006-May/002081.html
> didnt end with a consensus on how to fix it, I've made a band-aid
> patch that simply disables the message for the kqemu case: (which
> _appears_ to be harmless there, maybe because qemu doesnt seem to
> be using the fpu while calling kqemu)
>
> Index: sys/i386/isa/npx.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v
> retrieving revision 1.162.2.1.2.2
> diff -u -r1.162.2.1.2.2 npx.c
> --- sys/i386/isa/npx.c	30 Apr 2006 05:17:59 -0000	1.162.2.1.2.2
> +++ sys/i386/isa/npx.c	16 Jun 2006 18:03:23 -0000
> @@ -756,6 +756,9 @@
> 	if (!npx_exists)
> 		return (0);
> 	if (PCPU_GET(fpcurthread) == curthread) {
> +#if 1
> +		if (strcmp(curthread->td_proc->p_comm, "qemu"))
> +#endif
> 		printf("npxdna: fpcurthread == curthread %d times\n",
> 		    ++err_count);
> 		stop_emulating();
>
> Of course I don't ask for something like this be committed, it is just
> for ppl for who the messages cause trouble.  Similar patch for amd64
> (untested):

This message should be a panic like it used so that the problem gets
fixed.  An invariant has been violated.  See the other other thread.

There should be another panic for this, one which occurs if npxdna() is
even called in kernel mode.  amd64 has a message instead, and i386 has
nothing, except in my version it has a Debugger() call:

%%%
Index: trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.265
diff -u -2 -r1.265 trap.c
--- trap.c	2 Jun 2004 07:52:33 -0000	1.265
+++ trap.c	17 Jun 2004 05:38:48 -0000
@@ -428,4 +478,5 @@
  			 * registered such use.
  			 */
+			Debugger("T_DNA in kernel mode");
  			if (npxdna())
  				goto out;
%%%

Note that Debugger() has rotted in -current.  It is now misnamed
kdb_enter() and no longer prints a message when no debugger is configured.
Most uses of it had very little to do with [dgk]db, but (were like the
above one) replacements for panic() in cases you didn't really want
to panic or wanted to debug if a debugger is configured.  Such uses
should not exist in production kernels, and it is important for a
message to be printed when no debugger is configured so that the
replacement for panic() doesn't become not even a printf().

T_DNA in kernel mode used to cause an unconditional panic too, but in
rev.1.78 of i386/trap.c the panic was made conditional on npxdna()
failing.  This was to support using the FPU for optimizing copying and
zeroing on original Pentiums.  amd64 never supported this use of the
FPU but still has the above vestige of it.  The code in rev.1.78 should
have been

 			if (curpcb->pcb_flags & PCB_KERNEL_USING_NPX) {
 				if (npxdna())
 					goto out;
 			}
 			break;		/* To trap_fatal(). */

PCB_KERNEL_USING_NPX should only be set when the kernel is actually using
npx.  Your strcmp() acts similarly to a flags test but not so well.

> And finally a question for the experts: could this be fixed for real
> once the dfly fpu patch mentioned in the `Adding optimized kernel
> copying support' thread,
>    http://lists.freebsd.org/pipermail/freebsd-arch/2006-June/005267.html
> is committed?

No.  That patch still depends on the buggy rev.1.78 of trap.c to avoid
a panic in trap().  It should use a flags test like the above.  With
such a test, qemu whould panic because it has not claimed the FPU
like the dfly patch does.  If qemu claimed the FPU, npxdna() should
still panic due to the violated invariant, but qemu shouldn't claim
the FPU in the same way as the kernel copying support since it wants
to load the user state and not use the FPU itself.

I thought about the npxdna() in trap() mainly in connection with
changing the optimized kernel copying support.  I just thought of
another complication from using kqemu.  I think kqemu does an "int
$7" to cause a T_DNA to load the user FPU state.  If the kernel were
using the FPU, then the kernel FPU state would actually be loaded.
This shouldn't happen in practice since kqemu shoudn't be using the
FPU at the point where it does the "int $7".

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060617150847.W38142>