Date:      Mon, 05 Aug 2002 11:50:59 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        "John S. Bucy" <bucy@ece.cmu.edu>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: weird npxintr
Message-ID:  <3D4EC913.528452C2@mindspring.com>
References:  <20020805182753.GD494@catalepsy.pdl.cmu.edu>

"John S. Bucy" wrote:
> We're playing with disk request scheduling as part of a research
> project; we've introduced a lot of new code to 4.4 and are now getting
> a weird npxintr that's killing us.  My understanding is that npxintr
> has to do with the x87 fpu interface for ia32s and that you get it
> when fp instructions issued from the kernel are interrupted and then
> restarted.
> 
> We are pretty sure that all of our code is fp free and are trying to
> figure out what's going on.  We're using long long a lot and I've
> heard that gcc generates buggy code for long long sometimes.  But I'd
> expect an integer arithmetic exception instead for a problem there.

The "multimedia" instructions also use the FPU registers,
because they overlay their registers on top of the FPU's.  If you
are using the CPU-specific bcopy code, this could be the
source of your problem.

On a hunch: are you using an AMD K6 or similar and enabling
the CPU specific options within the config file?

Copies occurring at interrupt time can result in this behaviour:
when the FPU state is switched out via late binding, there is no
way to obtain a process context that belongs to the real current
process.
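
To make the late-binding failure concrete, here is a toy userland
model of it.  It is a sketch under stated assumptions, not the kernel's
npx code: the variable names are borrowed from the panic message, and
struct proc and fpu_lazy_bind() are invented for illustration.

```c
#include <stddef.h>

/* Toy model only: names echo the panic message, not the real kernel code. */
struct proc {
	const char *name;
	int fpu_saved;		/* 1 once this process's FPU state was saved */
};

struct proc *curproc;		/* running process; NULL models "no context" */
struct proc *npxproc;		/* process whose state is live in the FPU */

/*
 * Hypothetical handler for the first FPU use after a switch.
 * Returns 0 on success, -1 when there is no process to bind to.
 */
int
fpu_lazy_bind(void)
{
	if (curproc == NULL)
		return (-1);	/* nobody can own the state: "from nowhere" */
	if (npxproc != NULL && npxproc != curproc)
		npxproc->fpu_saved = 1;	/* save the previous owner's state */
	npxproc = curproc;	/* the FPU now belongs to curproc */
	return (0);
}
```

With curproc pointing at a process the bind succeeds and the previous
owner's state is saved; with curproc set to NULL, as in the reported
panic, the bind has nothing to attach to and fails.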


> We mask some interrupts for a relatively long period of time doing
> some computation; could that cause this?  I don't own the piece of the
> code that manipulates interrupts; is there some way to misuse
> splx/... that might cause this?
> 
> We're getting
> 
> npxintr: npxproc = 0, curproc = 0, npx_exists = 1
> panic: npxintr from nowhere
> 
> right after we do an splbio() (I think)

The copy you are doing at that point is attempting a lazy bind
without a process context (because it's happening at interrupt time).

If you can, move the large data manipulation, etc., out of the
interrupt handler itself, and do it via pullup instead.  That
type of thing should only ever be in the upper level interrupt
handler (e.g. via software interrupt, or in the user process
context on behalf of which the work is being done, after the
wakeup of the user process which is waiting on an operation).
It's a bad idea to do a lot of work in the interrupt handler,
in any case, unless there is a technical reason for it, like
quenching interrupts on purpose for network cards to avoid
receiver livelock.

An example (pseudocode) would be:

bad:
	user process makes request
	sleep user process
	...
	take interrupt
		copy data from card memory to user memory
	ack interrupt
	wake user process
	user process request complete

good:
	user process makes request
	sleep user process
	...
	take interrupt
	ack interrupt
	wake user process
		copy data from card memory to user memory
	user process request complete

Not always possible, but the best bet, if the card doesn't support
proper DMA, like God intended (most hardware designers are heretics).
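
The "good" ordering can be sketched in C roughly as follows.  This is
a userland model under stated assumptions, not a real driver:
struct softc, card_intr() and card_read() are hypothetical names, and
a flag stands in for the sleep/wakeup pair.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical device state; illustrative only, not a real driver. */
struct softc {
	char card_mem[BUF_SIZE];	/* stands in for memory on the card */
	size_t len;			/* bytes the card has ready */
	volatile int data_ready;	/* set by handler, cleared by reader */
	int irq_pending;		/* stands in for the interrupt line */
};

/* Interrupt side: acknowledge and flag only, no bulk copy. */
void
card_intr(struct softc *sc)
{
	sc->irq_pending = 0;	/* ack interrupt */
	sc->data_ready = 1;	/* real code would wakeup() the sleeper */
}

/* Process side: runs after the wakeup, so a process context exists. */
size_t
card_read(struct softc *sc, char *dst, size_t dstlen)
{
	size_t n;

	if (!sc->data_ready)
		return (0);	/* real code would tsleep() until woken */
	n = sc->len < dstlen ? sc->len : dstlen;
	memcpy(dst, sc->card_mem, n);	/* the copy happens here */
	sc->data_ready = 0;
	return (n);
}
```

The handler touches only a couple of words of state; the copy, which
is where CPU-specific bcopy code might use the FPU, runs only in the
caller's context.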

-- Terry
