Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 28 May 2006 23:58:21 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        David Xu <davidxu@freebsd.org>
Cc:        cvs-src@freebsd.org, src-committers@freebsd.org, cvs-all@freebsd.org
Subject:   Re: cvs commit: src/sys/i386/isa npx.c
Message-ID:  <20060528223012.H2592@epsplex.bde.org>
In-Reply-To: <200605282006.26364.davidxu@freebsd.org>
References:  <200605280440.k4S4ej96064322@repoman.freebsd.org> <200605281741.52752.davidxu@freebsd.org> <20060528201544.L20679@delplex.bde.org> <200605282006.26364.davidxu@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 28 May 2006, David Xu wrote:

> On Sunday 28 May 2006 19:15, Bruce Evans wrote:
>
>> Starting with a clean FP state like the old code does is safest, but POSIX
>> explicitly requires copying the environment, and bugs in the new code
>> result in half of the most dangerous part of the environment (the SSE
>> half of the exception flags) being copied anyway.
>>
>> Pending exceptions are not the only problem here.  Pending exceptions
>> are an i387 thing.  Most exceptions are non-pending ones for IEEE inexact.
>
> how about following patch ?

I want one that removes about 30 lines, not one that adds about 20 lines.

> Index: npx.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v
> retrieving revision 1.168
> diff -u -u -r1.168 npx.c
> --- npx.c	28 May 2006 04:40:45 -0000	1.168
> +++ npx.c	28 May 2006 11:40:58 -0000
> @@ -99,6 +99,7 @@
> #define	fxrstor(addr)		__asm("fxrstor %0" : : "m" (*(addr)))
> #define	fxsave(addr)		__asm __volatile("fxsave %0" : "=m" (*(addr)))
> #define	ldmxcsr(__csr)		__asm __volatile("ldmxcsr %0" : : "m" (__csr))
> +#define	stmxcsr(addr)		__asm __volatile("stmxcsr %0" : "=m" (*(addr)))
> #endif
> #define	start_emulating()	__asm("smsw %%ax; orb %0,%%al; lmsw %%ax" \
> 				      : : "n" (CR0_TS) : "ax")
> @@ -950,7 +951,8 @@
> {
> 	union savefpu	*state;
> 	u_int32_t	mxcsr;
> -	u_int32_t	cw;
> +	u_int16_t	cw;
> +	register_t	s;
>
> 	if (!(td->td_pcb->pcb_flags & PCB_NPXINITDONE)) {
> 		newtd->td_pcb->pcb_flags &= ~PCB_NPXINITDONE;
> @@ -958,21 +960,40 @@
> 	}
>
> 	state = &newtd->td_pcb->pcb_save;
> -	/* get control word */
> -	if (npxgetregs(td, state))
> -		return;
> -	if (cpu_fxsr) {
> -		mxcsr = state->sv_xmm.sv_env.en_mxcsr;
> -		cw = state->sv_xmm.sv_env.en_cw;
> +	s = intr_disable();
> +	if (curthread == PCPU_GET(fpcurthread)) {
> +#ifdef CPU_ENABLE_SSE
> +		if (cpu_fxsr) {
> +			stmxcsr(&mxcsr);
> +			fnstcw(&cw);
> +		}
> +		else
> +#endif
> +		{
> +			mxcsr = 0;
> +			fnstcw(&cw);
> +		}
> 	} else {
> -		cw = state->sv_87.sv_env.en_cw;
> -		mxcsr = 0;
> +#ifdef CPU_ENABLE_SSE
> +		if (cpu_fxsr) {
> +			mxcsr = td->td_pcb->pcb_save.sv_xmm.sv_env.en_mxcsr;
> +			cw = td->td_pcb->pcb_save.sv_87.sv_env.en_cw;
> +		}
> +		else
> +#endif
> +		{
> +			mxcsr = 0;
> +			cw = td->td_pcb->pcb_save.sv_87.sv_env.en_cw;
> +		}
> 	}
> +	intr_restore(s);

Save the parent's state using npxsave() as in cpu_fork().  This is much
cleaner and and only slightly less efficient.  I still think the entire
environment in that state needs to be copied.

> 	bcopy(&npx_cleanstate, state, sizeof(*state));
> +#ifdef CPU_ENABLE_SSE
> 	if (cpu_fxsr) {
> 		state->sv_xmm.sv_env.en_cw = cw;
> -		state->sv_xmm.sv_env.en_mxcsr = mxcsr;
> +		state->sv_xmm.sv_env.en_mxcsr = mxcsr & ~0x3F;

OK if it is correct to not copy the environment.

BTW, your other patch that masks the top 16 bits in mxcsr needs to be
extended to sometimes mask 0x40 (DAZ).  The amd64 arch manual says to
mask with mxcsr_mask which is in the fxsave image.  Cleared bits in
mxcsr_mask indicate reserved bits in a future-proof way.  Unfortunately,
this isn't past-proof so the following complications are needed: if
the CPU stores 0 in mxcsr_mask, it means that mxcsr_mask is not
supported; old CPUs (at least amd64 ones) that don't support mxcsr_mask
also don't support DAZ, so the correct mask for them is 0xFFBF.  These
complications are implemented in Linux's i386 so I suppose they are
no-ops at worst for non-amd64 CPUs.  Linux stores the mask in a global
variable.  I think this breaks the asymmetric SMP case more than
necessary.  FreeBSD's amd64 uses mxcsr_mask without fixing up the case
where it is 0.  So does Linux-2.6.10's x86_64 (it Initializes the
global variable but doesn't use it).

Behaviour of mxcsr and DAZ on some CPUs:

AXP2600 (Barton): mxcsr_mask = 0; attempting to set DAZ traps
P3-800 (freefall): same as AXP2600
A64 (Id0xf48 stepping 8): mxcsr_mask = 0xFFFF; setting DAZ works

> 	} else
> +#endif

vm_machdep.c is missing a DEV_NPX ifdef around the call here.  Of course,
lots of things would break if the option to not use npx were exercised.
CPU_ENABLE_SSE is a silly option too.

> 		state->sv_87.sv_env.en_cw = cw;
> 	newtd->td_pcb->pcb_flags |= PCB_NPXINITDONE;
> }
>

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060528223012.H2592>