Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 28 May 2006 21:15:29 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        David Xu <davidxu@freebsd.org>
Cc:        cvs-src@freebsd.org, src-committers@freebsd.org, cvs-all@freebsd.org
Subject:   Re: cvs commit: src/sys/i386/isa npx.c
Message-ID:  <20060528201544.L20679@delplex.bde.org>
In-Reply-To: <200605281741.52752.davidxu@freebsd.org>
References:  <200605280440.k4S4ej96064322@repoman.freebsd.org> <20060528163018.F1344@epsplex.bde.org> <200605281741.52752.davidxu@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 28 May 2006, David Xu wrote:

> On Sunday 28 May 2006 15:25, Bruce Evans wrote:
>> On Sun, 28 May 2006, David Xu wrote:
>>> davidxu     2006-05-28 04:40:45 UTC
>>>
>>>  FreeBSD src repository
>>>
>>>  Modified files:
>>>    sys/i386/isa         npx.c
>>>  Log:
>>>  If parent thread never used FPU, the only work is to clear flag
>>
>> PCB_NPXINITDONE doesn't mean that a thread never used the FPU.  It
>> means that the FPU state is not the default.
>>
> PCB_NPXINITDONE does mean thread used FPU, I don't know where you got
> the information.

>From discussions with deischen when we designed and he implemented it.
PCB_NPXINITDONE is cleared by npxdrop().  npxdrop() is called by
npxgetregs() (this gives another bug in the new code -- see below)
and by fpstate_drop().  fpstate_drop() is called from several places,
notably by sendsig() to give a clean state for signal handlers, and by
exec_setregs().  So it is normal for a single thread to have used the
FPU many times despite PCB_NPXINITDONE being clear -- after the first
use of the FPU, this happens in every signal handler and after every
exec().

Another bug in new code: npxgetregs() drops (resets to the default)
the npx state of fpcurthread when the state is in the npx and cpu_fxsr
is clear.  It does this because that's what the hardware (fnsave) does
and reloading the state would be a pessimization for most callers.  It
is designed to be called mainly from sendsig() where resetting the
state is exactly what is wanted.  Calling it from the new code clobbers
the parent's state in this case.

>
>>>  PCB_NPXINITDONE for new thread and let trap code initialize it.
>>>
>>>  Revision  Changes    Path
>>>  1.168     +6 -1      src/sys/i386/isa/npx.c
>>
>> Why do so much?  If PCB_NPXINITDONE is clear in the parent, then it
>> is already clear in the child, since it has just been copied (the whole
>> pcb has been copied).  It doesn't take a function call to check if
>> PCB_INITDONE is clear in the parent.
>>
> I have already checked PCB_NPXINITDONE for parent thread, if it is not set,
> npx_fork_thread will return from fast path.

Yes, why do so much?  Check the parent's flag in the caller and clear the
child's flag when it is known to be already clear.

>> cpu_fork() calls npxsave() to force the FPU state to the pcb so that
>> it is automatically copied at no extra cost when the whole pcb is
>> copied.  This causes the entire FPU state to be inherited.  Why is
>> cpu_set_upcall() different?  I the old difference was just an invalid
>> "optimization" and the new difference is not good.  POSIX requires
>> fork() to duplicate the process _exactly_ except for some things not
>> including any FPU or even integer state, and FreeBSD implements this
>> except for clobbering a few integer registers.  POSIX places fewer
>> requirments on pthread_create() by listing things that are inherited
>> and not requiring things not listed to be preserved; old versions of
>> POSIX apparently didn't even require the FPU state to be preserved,
>> but that was a bug and was fixed by aligning with SUS.  Since there
>> is considerable experience that duplicating the whole FPU state in
>> fork() doesn't cause problems, I think it wouldn't cause problems in
>> pthread_create() either.  Anyway, POSIX must really mean that the
>> whole FP environment must be inherited, so all that pthread_create()
>> is permitted to do differently than fork() is scrubbing the data
>> registers.
>>
>
> fork and pthread_create are different, cpu_set_upcall is for new thread when
> the process is threaded application, pthread_create executes code from
> different point, while fork continues from the syscall, the new thread
> entry function DOES NOT understand pending exceptions, the new thread
> should be started with clean FPU state, it is not needed to mess new thread's
> FPU state.

Starting with a clean FP state like the old code does is safest, but POSIX
explicitly requires copying the environment, and bugs in the new code
result in half of the most dangerous part of the environment (the SSE
half of the exception flags) being copied anyway.

Pending exceptions are not the only problem here.  Pending exceptions
are an i387 thing.  Most exceptions are non-pending ones for IEEE inexact.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060528201544.L20679>