From owner-freebsd-arch@FreeBSD.ORG Sat Mar 12 18:52:12 2011
Date: Sat, 12 Mar 2011 20:36:37 +0200
From: Kostik Belousov <kostikbel@gmail.com>
To: Tim Bird
Cc: freebsd-amd64@freebsd.org, freebsd-arch@freebsd.org
Message-ID: <20110312183637.GY78089@deviant.kiev.zoral.com.ua>
In-Reply-To: <4D7AA4E4.3010508@am.sony.com>
Subject: Re: [PATCH] Add support for AVX
On Fri, Mar 11, 2011 at 02:40:36PM -0800, Tim Bird wrote:
> Hello all,
>
> I am representing a group in Sony that wishes to submit the patch
> below, adding AVX support to FreeBSD. I know this is somewhat
> unconventional, but I am not the original author of the patch, nor
> do I have a build or run environment for BSD. However, I have
> been asked to be a proxy for our BSD developer and to help shepherd
> this patch into FreeBSD. If you have questions about the patch,
> or issues with it, I will relay the information to the original
> author and try to resolve any problems.
>
> As I am new to FreeBSD, I'm not sure which individual or
> group to direct this patch to, but hopefully someone can provide
> me some pointers. I am starting with freebsd-amd64, but maybe this
> should be sent to freebsd-hackers?

The list is fine, but freebsd-arch@ or freebsd-current@ would give your
contribution bigger exposure. I added arch@ to the Cc: due to some
issues that I think are critical for the patch.

>
> Thanks for any assistance you can provide.
> -- Tim Bird, Principal Software Engineer
> Sony Network Entertainment

Thank you for taking an interest in FreeBSD and providing a
contribution!

>
> ---------------------------
> This is a patch that provides support for AVX in FreeBSD. After it is
> applied to FreeBSD 9.0-CURRENT, user programs can use AVX fully,
> and the kernel may use it as well by calling fpu_kern_enter().
> AVX is the newest SIMD extension for x86 CPUs, introduced with Intel's
> Sandy Bridge and also coming in AMD's next-generation Bulldozer
> processors.
>
> This patch applies to FreeBSD 9.0-CURRENT r217297. (r217297 is a
> Subversion revision in the repository http://svn.freebsd.org/base.)
> You can build a kernel with it using clang as well as gcc. The same
> kernel runs on amd64 processors with or without AVX, with no extra
> configuration.
>
> The strategy of the patch is the same as for FPU(MMX)/SSE. Namely,
> the code that initializes FPU(MMX)/SSE also initializes AVX; when it
> saves that state it saves the AVX state as well, and likewise
> restores it, but only if the processor has AVX. If the processor
> doesn't have AVX, the patch adds nothing but a test instruction and a
> conditional branch.
>
> In detail, it does the following:
> - When a processor boots, if the processor has AVX, enable its use.
>   at sys/amd64/amd64/initcpu.c
> - Save the AVX context when a thread switch occurs.
>   at sys/amd64/amd64/cpu_switch.S
> - Change fxsave()/fxrstor() to xsave()/xrstor() in order to
>   save/restore the AVX state whenever the FPU(MMX)/SSE state is
>   saved/restored.
>   at sys/amd64/amd64/fpu.c
> - Initialize AVX.
>   fpuinit() at sys/amd64/amd64/fpu.c
> - Adjust the alignment of xsave/xrstor's target address in order to
>   avoid a #GP exception due to misalignment. The xsave/xrstor
>   instructions require the target address to be on a 64-byte
>   boundary, but the alignment of the stack is only 32 bytes. So
>   fpugetregs()/fpusetregs() adjust it via a bounce buffer before they
>   issue the xsave/xrstor instruction.
>   fpugetregs()/fpusetregs() at sys/amd64/amd64/fpu.c
>
> There are two issues in the patch:
> 1. Machine code is embedded
>    The default assembler for building a kernel, as of binutils 2.15,
>    doesn't know the instructions added for AVX. So the patch embeds
>    machine code for the xsetbv/xsave/xrstor instructions.
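[Editor's note: the bounce-buffer rounding described in item 5 above can be sketched in plain C. All names here are hypothetical (the kernel code operates on struct savefpu and uses bcopy()); this only illustrates rounding a pointer up to the 64-byte boundary xsave/xrstor require.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size of the save area; the real struct savefpu in the
 * patch is 832 bytes.  Illustration only. */
#define SAVE_SIZE 512

/* Round an arbitrary pointer up to the next 64-byte boundary, the
 * alignment the xsave/xrstor instructions require. */
static void *
xsave_align(void *buf)
{
	return ((void *)(((uintptr_t)buf + 63) & ~(uintptr_t)63));
}

/* Sketch of the fpugetregs() pattern: save into an aligned scratch
 * area on the stack, then copy to the (possibly unaligned) target. */
static void
save_to_unaligned(void *target)
{
	char tmp[SAVE_SIZE + 63];	/* worst-case slack for alignment */
	void *p = xsave_align(tmp);

	memset(p, 0, SAVE_SIZE);	/* stands in for xsave(p) */
	memcpy(target, p, SAVE_SIZE);
}
```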
This is fine; currently there is no other way to handle our old
binutils. The issue is with licensing, and it is outside the project.

>
> 2. Extra data copying
>    If the alignment of a stack address is less than 64 bytes then, in
>    order to avoid a #GP exception, the patch copies the contents
>    to/from a bounce buffer. If the callers of
>    fpugetregs()/fpusetregs() could be changed, this extra data
>    copying could be removed. Because the callers are MI, I hesitated
>    to change them.

I think it is reasonable to require that the struct fpu_kern_ctx,
supplied to fpu_kern_enter(), be properly aligned. In fact, the kernel
malloc will provide aligned memory silently. I do not think that
allocating struct fpu_kern_ctx on the stack is good practice, and I
would definitely object to such code entering our tree. The reason is
that the structure is large, and kernel stack size is limited.

The only remaining place which might require the copying with your
patch is the struct pcb itself; pcb_user_save probably ends up
unaligned for XSAVE use. This should be fixed in
amd64/amd64/vm_machdep.c:cpu_fork() when calculating the pcb location
in the memory block allocated for the stack. Also, perhaps the kernel
stack size should be increased for amd64, to account for the larger
pcb.

>
> Thanks,
>
> Index: sys/amd64/include/ucontext.h
> ===================================================================
> --- sys/amd64/include/ucontext.h	(revision 217297)
> +++ sys/amd64/include/ucontext.h	(working copy)
> @@ -87,7 +87,7 @@
>  	/*
>  	 * See <machine/fpu.h> for the internals of mc_fpstate[].
>  	 */
> -	long	mc_fpstate[64] __aligned(16);
> +	long	mc_fpstate[104] __aligned(64);
>
>  	__register_t	mc_fsbase;
>  	__register_t	mc_gsbase;

And this part is the most problematic for the whole patch.
The cause is that you changed the ABI of signal delivery and of the
sigreturn(2), getcontext(2), swapcontext(2) and setcontext(2)
syscalls. Either some measures should be implemented to provide
compatibility syscalls, renumbering the listed ones, or the existing
syscalls and signal frame layout should be changed to be
backward-compatible while still providing a place to store the XSAVE
header and the YMM register context. I would prefer to see the second
option implemented.

I have thought about adding a new flag to mc_flags in mcontext_t,
let's call it _MC_XSAVE, the setting of which would cause two words at
the end of mc_spare to be used. The words would contain a length and a
pointer to the XSAVE area, added at the end of mcontext_t. The only
issue with this scheme is the getcontext(2) syscall, which probably
requires much more complex treatment.

The lesser problem with the patch, compared with the ABI breakage, is
Intel's promise to add further bits to XCR0, requiring further
breakage of the signal and context ABIs when more architectural cpu
context state is added. I would prefer to have this handled once and
for all, especially if ABI changes are required.

Also, it would be nice to have i386 taken care of, both in the native
variant and in the COMPAT_FREEBSD32 case on amd64. The issues with
i386 ABI breakage are similar to those of amd64.

I wanted to work on the YMM stuff, but I do not own a Sandy Bridge
system yet. I am very glad to see Sony (wow! :) contributing here.
Will you work further on the items I listed? If not, I intend to find
the time after I have an SB machine in my possession.
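[Editor's note: a rough sketch of the _MC_XSAVE scheme proposed above. All names, sizes, and the flag value are purely illustrative; the real mcontext_t lives in sys/amd64/include/ucontext.h and the real spare area and flag numbering would be chosen there.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: set in mc_flags when the two trailing spare
 * words carry the length of, and offset to, an appended XSAVE area. */
#define _MC_HASXSAVE	0x4	/* name and value are illustrative */

struct mcontext_sketch {
	long	mc_flags;
	long	mc_fpstate[64];	/* legacy FXSAVE-sized area, unchanged */
	long	mc_spare[8];	/* last two words reused by the scheme */
	/* A variable-sized XSAVE extension area would follow here. */
};

/* Record that an XSAVE extension of xlen bytes follows the struct;
 * old consumers that ignore the flag still see the old layout. */
static void
mc_set_xsave(struct mcontext_sketch *mc, long xlen)
{
	mc->mc_flags |= _MC_HASXSAVE;
	mc->mc_spare[6] = xlen;			/* length word */
	mc->mc_spare[7] = (long)sizeof(*mc);	/* offset to the area */
}
```

The point of the design is that the structure's existing size and field offsets never change, so old binaries keep working while new kernels can append state.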
> Index: sys/amd64/include/fpu.h
> ===================================================================
> --- sys/amd64/include/fpu.h	(revision 217297)
> +++ sys/amd64/include/fpu.h	(working copy)
> @@ -51,6 +51,11 @@
>  	u_char	xmm_bytes[16];
>  };
>
> +/* Contents of each AVX extended accumulator */
> +struct ymmacc {
> +	u_char	ymm_bytes[16];
> +};
> +
>  struct envxmm {
>  	u_int16_t	en_cw;		/* control word (16bits) */
>  	u_int16_t	en_sw;		/* status word (16bits) */
> @@ -71,7 +76,9 @@
>  	} sv_fp[8];
>  	struct xmmacc	sv_xmm[16];
>  	u_char	sv_pad[96];
> -} __aligned(16);
> +	u_char	xsv_hd[64];
> +	struct ymmacc	sv_ymm[16];
> +} __aligned(64);
>
>  #ifdef _KERNEL
>  struct fpu_kern_ctx {
> Index: sys/amd64/include/signal.h
> ===================================================================
> --- sys/amd64/include/signal.h	(revision 217297)
> +++ sys/amd64/include/signal.h	(working copy)
> @@ -98,7 +98,7 @@
>  	 */
>  	long	sc_fpformat;
>  	long	sc_ownedfp;
> -	long	sc_fpstate[64] __aligned(16);
> +	long	sc_fpstate[104] __aligned(64);
>
>  	long	sc_fsbase;
>  	long	sc_gsbase;
> Index: sys/amd64/include/specialreg.h
> ===================================================================
> --- sys/amd64/include/specialreg.h	(revision 217297)
> +++ sys/amd64/include/specialreg.h	(working copy)
> @@ -66,6 +66,7 @@
>  #define	CR4_PCE	0x00000100	/* Performance monitoring counter enable */
>  #define	CR4_FXSR 0x00000200	/* Fast FPU save/restore used by OS */
>  #define	CR4_XMM	0x00000400	/* enable SIMD/MMX2 to use except 16 */
> +#define	CR4_XSAVE 0x00040000	/* enable XSETBV/XGETBV */
>
>  /*
>   * Bits in AMD64 special registers.  EFER is 64 bits wide.
> @@ -134,6 +135,9 @@
>  #define	CPUID2_MOVBE	0x00400000
>  #define	CPUID2_POPCNT	0x00800000
>  #define	CPUID2_AESNI	0x02000000
> +#define	CPUID2_XSAVE	0x04000000
> +#define	CPUID2_OSXSAVE	0x08000000
> +#define	CPUID2_AVX	0x10000000
>
>  /*
>   * Important bits in the Thermal and Power Management flags
> Index: sys/amd64/amd64/initcpu.c
> ===================================================================
> --- sys/amd64/amd64/initcpu.c	(revision 217297)
> +++ sys/amd64/amd64/initcpu.c	(working copy)
> @@ -142,6 +142,19 @@
>  	}
>  }
>
> +
> +static void
> +xsetbv(unsigned int index, unsigned long value)
> +{
> +	register_t eax = value;
> +	register_t edx = value >> 32;
> +
> +	/* xsetbv */
> +	__asm __volatile(".byte 0x0f,0x01,0xd1" ::
> +	    "a"(eax), "d"(edx), "c"(index));
> +}
> +
> +
>  /*
>   * Initialize CPU control registers
>   */
> @@ -154,6 +167,11 @@
>  		load_cr4(rcr4() | CR4_FXSR | CR4_XMM);
>  		cpu_fxsr = hw_instruction_sse = 1;
>  	}
> +	if (cpu_feature2 & CPUID2_XSAVE) {
> +		load_cr4(rcr4() | CR4_XSAVE);
> +		/* enable use of xsave/xrstor */
> +		xsetbv(0, 7);
> +	}
>  	if ((amd_feature & AMDID_NX) != 0) {
>  		msr = rdmsr(MSR_EFER) | EFER_NXE;
>  		wrmsr(MSR_EFER, msr);
> Index: sys/amd64/amd64/cpu_switch.S
> ===================================================================
> --- sys/amd64/amd64/cpu_switch.S	(revision 217297)
> +++ sys/amd64/amd64/cpu_switch.S	(working copy)
> @@ -115,7 +115,20 @@
>  	jne	1f
>  	movq	PCB_SAVEFPU(%r8),%r8
>  	clts
> +
> +	/* if the cpu has AVX, use xsave instead of fxsave */
> +	testl	$CPUID2_XSAVE,cpu_feature2
> +	jne	2f
>  	fxsave	(%r8)
> +	jmp	3f
> +2:
> +	mov	$0x7, %eax
> +	mov	%rdx, %rbx
> +	mov	$0x0, %edx
> +	/* xsave (%r8); fix me when xsave is available in the assembler */
> +	.byte	0x41, 0x0f, 0xae, 0x20
> +	mov	%rbx, %rdx
> +3:
>  	smsw	%ax
>  	orb	$CR0_TS,%al
>  	lmsw	%ax
> @@ -302,8 +315,80 @@
>   * Update pcb, saving current processor state.
>   */
>  ENTRY(savectx)
> +	/* Fetch PCB. */
> +	movq	%rdi,%rcx
> +
>  	/* Save caller's return address. */
>  	movq	(%rsp),%rax
> +	movq	%rax,PCB_RIP(%rcx)
> +
> +	movq	%cr3,%rax
> +	movq	%rax,PCB_CR3(%rcx)
> +
> +	movq	%rbx,PCB_RBX(%rcx)
> +	movq	%rsp,PCB_RSP(%rcx)
> +	movq	%rbp,PCB_RBP(%rcx)
> +	movq	%r12,PCB_R12(%rcx)
> +	movq	%r13,PCB_R13(%rcx)
> +	movq	%r14,PCB_R14(%rcx)
> +	movq	%r15,PCB_R15(%rcx)
> +
> +	/*
> +	 * If fpcurthread == NULL, then the fpu h/w state is irrelevant and the
> +	 * state had better already be in the pcb.  This is true for forks
> +	 * but not for dumps (the old book-keeping with FP flags in the pcb
> +	 * always lost for dumps because the dump pcb has 0 flags).
> +	 *
> +	 * If fpcurthread != NULL, then we have to save the fpu h/w state to
> +	 * fpcurthread's pcb and copy it to the requested pcb, or save to the
> +	 * requested pcb and reload.  Copying is easier because we would
> +	 * have to handle h/w bugs for reloading.  We used to lose the
> +	 * parent's fpu state for forks by forgetting to reload.
> +	 */
> +	pushfq
> +	cli
> +	movq	PCPU(FPCURTHREAD),%rax
> +	testq	%rax,%rax
> +	je	1f
> +
> +	movq	TD_PCB(%rax),%rdi
> +	leaq	PCB_SAVEFPU(%rdi),%rdi
> +	clts
> +
> +	/* if the cpu has AVX, use xsave instead of fxsave */
> +	testl	$CPUID2_XSAVE,cpu_feature2
> +	jne	2f
> +	fxsave	(%rdi)
> +	jmp	3f
> +2:
> +	mov	$0x7, %eax
> +	mov	$0x0, %edx
> +	/* xsave (%rdi); fix me when xsave is available in the assembler */
> +	.byte	0x0f, 0xae, 0x27
> +3:
> +	smsw	%ax
> +	orb	$CR0_TS,%al
> +	lmsw	%ax
> +
> +	movq	$PCB_SAVEFPU_SIZE,%rdx	/* arg 3 */
> +	leaq	PCB_SAVEFPU(%rcx),%rsi	/* arg 2 */
> +	/* arg 1 (%rdi) already loaded */
> +	call	bcopy
> +1:
> +	popfq
> +
> +	ret
> +END(savectx)
> +
> +/*
> + * savectx2(xpcb)
> + * Update xpcb, saving current processor state.
> + */
> +ENTRY(savectx2)
> +	/* Fetch XPCB. */
> +	movq	%rdi,%r8
> +	/* Save caller's return address. */
> +	movq	(%rsp),%rax
>  	movq	%rax,PCB_RIP(%rdi)
>
>  	movq	%rbx,PCB_RBX(%rdi)
> @@ -355,7 +440,17 @@
>  	str	PCB_TR(%rdi)
>
>  	clts
> +	/* if the cpu has AVX, use xsave instead of fxsave */
> +	testl	$CPUID2_XSAVE,cpu_feature2
> +	jne	1f
>  	fxsave	PCB_USERFPU(%rdi)
> +	jmp	2f
> +1:
> +	mov	$0x7, %eax
> +	mov	$0x0, %edx
> +	/* xsave PCB_USERFPU(%rdi); fix me when xsave is available */
> +	.byte	0x0f, 0xae, 0xa7, 0x00, 0x01, 0x00, 0x00
> +2:
>  	movq	%rsi,%cr0	/* The previous %cr0 is saved in %rsi. */
>
>  	movl	$1,%eax
> Index: sys/amd64/amd64/fpu.c
> ===================================================================
> --- sys/amd64/amd64/fpu.c	(revision 217297)
> +++ sys/amd64/amd64/fpu.c	(working copy)
> @@ -105,6 +105,40 @@
>
>  static struct savefpu	fpu_initialstate;
>
> +
> +static void
> +xsave(struct savefpu *addr)
> +{
> +	if (cpu_feature2 & CPUID2_XSAVE) {
> +		/* __asm("xsave %0" : "=m" (*(addr))) -> xsave (%rdi) */
> +		__asm __volatile("mov $0x7, %%eax\n"
> +		    "mov $0x0, %%edx\n"
> +		    ".byte 0x0f, 0xae, 0x27"
> +		    :: "D"(addr)
> +		    : "%eax", "%edx", "memory");
> +	} else {
> +		fxsave(addr);
> +	}
> +}
> +
> +
> +static void
> +xrstor(struct savefpu *addr)
> +{
> +	if (cpu_feature2 & CPUID2_XSAVE) {
> +		/* __asm("xrstor %0" : : "m" (*(addr))) -> xrstor (%rdi) */
> +		__asm __volatile("mov $0x7, %%eax\n"
> +		    "mov $0x0, %%edx\n"
> +		    ".byte 0x0f, 0xae, 0x2f"
> +		    :: "D"(addr)
> +		    : "%eax", "%edx");
> +	} else {
> +		fxrstor(addr);
> +	}
> +}
> +
> +
>  /*
>   * Initialize the floating point unit.  On the boot CPU we generate a
>   * clean state that is used to initialize the floating point unit when
> @@ -128,13 +162,14 @@
>  	mxcsr = __INITIAL_MXCSR__;
>  	ldmxcsr(mxcsr);
>  	if (PCPU_GET(cpuid) == 0) {
> -		fxsave(&fpu_initialstate);
> +		xsave(&fpu_initialstate);
>  		if (fpu_initialstate.sv_env.en_mxcsr_mask)
>  			cpu_mxcsr_mask = fpu_initialstate.sv_env.en_mxcsr_mask;
>  		else
>  			cpu_mxcsr_mask = 0xFFBF;
>  		bzero(fpu_initialstate.sv_fp, sizeof(fpu_initialstate.sv_fp));
>  		bzero(fpu_initialstate.sv_xmm, sizeof(fpu_initialstate.sv_xmm));
> +		bzero(fpu_initialstate.sv_ymm, sizeof(fpu_initialstate.sv_ymm));
>  	}
>  	start_emulating();
>  	intr_restore(saveintr);
> @@ -150,7 +185,7 @@
>  	critical_enter();
>  	if (curthread == PCPU_GET(fpcurthread)) {
>  		stop_emulating();
> -		fxsave(PCPU_GET(curpcb)->pcb_save);
> +		xsave(PCPU_GET(curpcb)->pcb_save);
>  		start_emulating();
>  		PCPU_SET(fpcurthread, 0);
>  	}
> @@ -423,7 +458,7 @@
>  	 * the PCB doesn't contain a clean FPU state.  Explicitly
>  	 * load an initial state.
>  	 */
> -	fxrstor(&fpu_initialstate);
> +	xrstor(&fpu_initialstate);
>  	if (pcb->pcb_initial_fpucw != __INITIAL_FPUCW__)
>  		fldcw(pcb->pcb_initial_fpucw);
>  	if (PCB_USER_FPU(pcb))
> @@ -432,7 +467,7 @@
>  	else
>  		set_pcb_flags(pcb, PCB_FPUINITDONE);
>  	} else
> -		fxrstor(pcb->pcb_save);
> +		xrstor(pcb->pcb_save);
>  	critical_exit();
>  }
>
> @@ -469,7 +504,26 @@
>  	}
>  	critical_enter();
>  	if (td == PCPU_GET(fpcurthread) && PCB_USER_FPU(pcb)) {
> -		fxsave(&pcb->pcb_user_save);
> +		/*
> +		 * FIXME:
> +		 * xsave requires the target address to be on a 64-byte
> +		 * boundary, and struct savefpu carries a 64-byte alignment
> +		 * attribute, but gcc does not keep a stack variable
> +		 * 64-byte aligned.  So use a bounce buffer to do xsave
> +		 * safely, and copy the result to the target when a
> +		 * misaligned address is passed.
> +		 */
> +		if ((unsigned long)&pcb->pcb_user_save % 64) {
> +			char tmp[sizeof(struct savefpu) + 63];
> +			void *p = (void *)((unsigned long)(tmp + 63) & ~0x3f);
> +
> +			bzero(((struct savefpu *)p)->xsv_hd, 64);
> +			xsave((struct savefpu *)p);
> +			bcopy(p, &pcb->pcb_user_save, sizeof(struct savefpu));
> +		} else {
> +			bzero(pcb->pcb_user_save.xsv_hd, 64);
> +			xsave(&pcb->pcb_user_save);
> +		}
>  		critical_exit();
>  		return (_MC_FPOWNED_FPU);
>  	} else {
> @@ -502,7 +556,24 @@
>  	pcb = td->td_pcb;
>  	critical_enter();
>  	if (td == PCPU_GET(fpcurthread) && PCB_USER_FPU(pcb)) {
> +
> +		/* In order to avoid #GP, initialize some of XSAVE.HEADER. */
> +		u_long *header = (u_long *)(addr->xsv_hd);
> +		header[0] = 7;
> +		header[1] = 0;
> +		header[2] = 0;
> +
> +		/*
> +		 * FIXME: See fpugetregs()
> +		 */
> +		if ((unsigned long)addr % 64) {
> +			char tmp[sizeof(struct savefpu) + 63];
> +			void *p = (void *)((unsigned long)(tmp + 63) & ~0x3f);
> +			bcopy(addr, p, sizeof(struct savefpu));
> +			xrstor((struct savefpu *)p);
> +		} else {
> +			xrstor(addr);
> +		}
>  		critical_exit();
>  		set_pcb_flags(pcb, PCB_FPUINITDONE | PCB_USERFPUINITDONE);
>  	} else {
> Index: sys/amd64/amd64/identcpu.c
> ===================================================================
> --- sys/amd64/amd64/identcpu.c	(revision 217297)
> +++ sys/amd64/amd64/identcpu.c	(working copy)
> @@ -275,7 +275,7 @@
>  	       "\012SSSE3"	/* SSSE3 */
>  	       "\013CNXT-ID"	/* L1 context ID available */
>  	       "\014"
> -	       "\015"
> +	       "\015FMA"
>  	       "\016CX16"	/* CMPXCHG16B Instruction */
>  	       "\017xTPR"	/* Send Task Priority Messages */
>  	       "\020PDCM"	/* Perf/Debug Capability MSR */
> @@ -291,7 +291,7 @@
>  	       "\032AESNI"	/* AES Crypto */
>  	       "\033XSAVE"
>  	       "\034OSXSAVE"
> -	       "\035"
> +	       "\035AVX"
>  	       "\036"
>  	       "\037"
>  	       "\040"
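[Editor's note: as a sanity check on the CPUID bits the patch adds (values taken from the specialreg.h hunk above), AVX state is usable only when the CPU advertises both XSAVE and AVX and the OS has enabled XSAVE, which CPUID reflects as OSXSAVE. A minimal sketch of that test over a given feature word:]

```c
#include <assert.h>

/* CPUID leaf 1 %ecx feature bits, as defined in the patch. */
#define CPUID2_XSAVE	0x04000000	/* XSAVE/XRSTOR supported */
#define CPUID2_OSXSAVE	0x08000000	/* OS has enabled XSAVE (CR4) */
#define CPUID2_AVX	0x10000000	/* AVX instructions present */

/* AVX state can be used only when all three bits are set: the CPU
 * implements XSAVE and AVX, and the OS has enabled XSAVE. */
static int
avx_usable(unsigned int feature2)
{
	unsigned int need = CPUID2_XSAVE | CPUID2_OSXSAVE | CPUID2_AVX;

	return ((feature2 & need) == need);
}
```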