Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 17 Oct 2015 16:09:07 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Garrett Cooper <ngie@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r289332 - head/tools/regression/lib/msun
Message-ID:  <20151017130907.GZ2257@kib.kiev.ua>
In-Reply-To: <20151015200157.I2174@besplex.bde.org>
References:  <201510142022.t9EKMC1C088993@repo.freebsd.org> <20151015072039.GY2257@kib.kiev.ua> <20151015200157.I2174@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Oct 15, 2015 at 10:12:03PM +1100, Bruce Evans wrote:
> On Thu, 15 Oct 2015, Konstantin Belousov wrote:
> 
> > On Wed, Oct 14, 2015 at 08:22:12PM +0000, Garrett Cooper wrote:
> >> Author: ngie
> >> Date: Wed Oct 14 20:22:12 2015
> >> New Revision: 289332
> >> URL: https://svnweb.freebsd.org/changeset/base/289332
> >>
> >> Log:
> >>   Fix test-fenv:test_dfl_env when run on some amd64 CPUs
> >>
> >>   Compare the fields that the AMD [1] and Intel [2] specs say will be
> >>   set once fnstenv returns.
> >>
> >>   Not all amd64 capable processors zero out the env.__x87.__other field
> >>   (example: AMD Opteron 6308). The AMD64/x64 specs aren't explicit on what the
> >>   env.__x87.__other field will contain after fnstenv is executed, so the values
> >>   in env.__x87.__other could be filled with arbitrary data depending on how the
> >>   CPU-specific implementation of fnstenv.
> > No Intel or AMD CPU write to __other field at all.
> 
> No, they all do.
> 
> Test on old i386 on old A64:
> 
> Initial state for fegetenv():
> X 00000000  7F 12 00 00 00 00 80 1F FF FF FF FF 52 3F 67 C0
>              --cw- -mxhi --sw- -mxlo --tw- -pad- ----fip----
>  					  -----------------
> X 00000010  08 00 05 01 48 5F 74 C0 10 00 FF FF
>              -fcs- -opc- ----foff--- -fds- -pad-
>  	    ----other[16]----------------------
> 
> This always fails the test:
> cw:     passes
> mxhi:   passes
> sw:     passes
> pad+tw: passes
> fip:    always fails.  The test wants 0, but the value is always the kernel
>          eip for the instruction that is used to avoid the security hole.
>  	(The A64 CPU gives the security hole, and the kernel code is
>  	pessimal so it apparently runs always on the first use of the
>  	NPX by a thread.)
> fcs:    always fails.  The test wants 0, but the value is always the kernel
>          cs, as above.
> opc:    always fails.  The value is the opcode for the kernel instruction
> foff:   always fails.  The value is the data address for the kernel instruction
> fds:    always fails.  The value is the kernel ds extended by padding with 1's
> 
> Modified state for fxsave:
> X 0000001C  7F 12 00 38 80 00 F0 3F EE EE EE EE EE EE EE EE
> X 0000002C  EE EE EE EE EE EE EE EE 80 1F 00 00 FF FF 00 00
> 
> Partial result of an fxsave instruction after fld1; fstl in the test
> program to try to change the kernel data.  A byte value of EE indicates
> that the byte was written by memset() and not overwritten by fxsave.
> There are 16 such bytes.  These correspond to __other[16].  You are
> probably thinking of fxsave not writing these, but fegetenv() doesn't
> use fxsave and always writes these.  I think fxsave on all Intel CPUs
> write these (freefall's Xeon does), and all AMD CPUs write these after
> an error.
No, I did not thought about fegetenv() as executing FXSAVE instruction.
I did knew that fegetenv() is a wrapper around FNSAVE, and I was completely
sure that FNSAVE in the long mode, when executed by a 64bit program,
never writes anything into the %eip/data offset portion of the FNSAVE
area.  I suspect that this was motivated by unavoidable 32-bitness of
the store format.

I was unable to find a reference in either Intel SDM or in AMD APM which
would support my statement.  The closest thing is the claim that the FOP
field is not filled, in the SDM.  Still, I wonder how things are really
arranged by hardware for FNSAVE in 64bit mode.

Are your experiments below were done for the 32bit programs, or for 64bit ?
Both FXSAVE and XSAVE area formats and rules would be irrelevant for the
FreeBSD ABI.

> 
> The format is slightly different.  All the other values seem to be the
> same as for fegetenv().
> 
> Modified state for fegetenv():
> X 0000005C  7F 12 00 00 00 00 80 1F FF FF FF FF 63 82 04 08
>              --cw- -mxhi --sw- -mxlo --tw- -pad- ----fip----
>  					  -----------------
> X 0000006C  1F 00 1D 05 A0 92 04 08 2F 00 FF FF
>              -fcs- -opc- ----foff--- -fds- -pad-
>  	    ----other[16]----------------------
> 
> This is from the same state as the fxsave.  It has the following
> modifications relative to the initial state:
> - all kernel data changed to user addresses, as intended
> 
> Modified state for fnsave:
> X 0000003C  7F 12 FF FF 00 00 FF FF FF FF FF FF 63 82 04 08
> X 0000004C  1F 00 1D 05 A0 92 04 08 2F 00 FF FF
> 
> This is from the same state as the fxsave.  It has the same format
> and values as the state saved by fegetenv() except 2 padding
> fields are padded with 1's and not repurposed for mxcsr.
> 
> Modified state for fnstenv:
> X 00000078  7F 12 FF FF 00 00 FF FF FF FF FF FF 63 82 04 08
> X 00000088  1F 00 1D 05 A0 92 04 08 2F 00 FF FF EE EE EE EE
> 
> This is from the same state as the fxsave.  It has the same format
> and values as the state saved by fnsave.
> 
> So last instruction/data values are always in fenv_t on i386.  They
> are undocumented, but might be used.  On amd64 in 64-bit mode, they
> don't fit in the hardware format used by fnstenv.  fenv_t extends
> this format only to append mxcsr.  amd64 should have used the env
> part of the better-designed 64-bit fxsave format.
> 
> Test on -current amd64 on Xeon (freefall):
> 
> early fegetenv():
> X 00000000  7f 03 ff ff 00 00 ff ff  ff ff ff ff 00 00 00 00
>              --cw- -pad- --sw- -pad-  --tw- -pad- -----------
> X 00000010  00 00 00 00 00 00 00 00  00 00 ff ff 80 1f 00 00
>              ---other[16]------------------------ ---mxcsr---
> 
> The test passes.  The 2 bytes at the end of other[20] are reserved
> and not part of the last instruction/data.  They are padded normally
> with 1's and not with 0's.  The default env hard-codes the assumption
> that these reserved bits are 1's.
> 
> Since Xeons are not AMD CPUs, the kernel doesn't execute the code that
> closes the security hole and all the last instruction/data values are
> 0.  These values are written.
> 
> later fxsave:
> X 00000020  7f 03 00 00 00 00 1c 05  75 08 40 00 43 00 00 00
>              --cw- --sw- tw 00 -opc-  ----fip---- ----fcs----
> X 00000030  20 62 60 00 3b 00 00 00  80 1f 00 00 ff ff 00 00
>              ----fdp---- ----fds----  ---mxcsr--- -mxcsrmask-
> 
> fxsave always stores the last instruction/data values on Intel CPUs.
> They have the nonzero user values.  The format is the 64-bit format.
> I now understand why it looked looked 32-bit format before -- it
> has segment registers instead of 64-bit offsets, but that is apparently
> from the memory model being small -- seg:offset32 is stored instead
> of rip and edp if the operand size is 32 bits.  fcs == %cs.  %ds is
> 0 but fds == %ss.
> 
> Later fegetenv():
> X 00000060  7f 03 ff ff 00 00 ff ff  ff ff ff ff 75 08 40 00
> X 00000070  43 00 1c 05 20 62 60 00  3b 00 ff ff 80 1f 00 00
> 
> Later fnsave:
> X 00000040  7f 03 ff ff 00 00 ff ff  ff ff ff ff 75 08 40 00
> X 00000050  43 00 1c 05 20 62 60 00  3b 00 ff ff 00 00 00 00
> 
> Later fnstenv:
> X 00000080  7f 03 ff ff 00 00 ff ff  ff ff ff ff 75 08 40 00
> X 00000090  43 00 1c 05 20 62 60 00  3b 00 ff ff ee ee ee ee
> 
> Everything works right and the values are the same as for fxsave
> except for rearrangement and padding, but only because 64-bit
> offsets are not needed.
> 
> This shows that:
> - it is the __other field that should be least MD, not most MD,
>    since it has no reserved padding bits and just doesn't
>    support 64-bit offsets
> - there seem to be no hardware differences for the __other field
>    as saved by fnstenv and fegetenv().  The AMD bug/optimization
>    is only for fxsave.
> - however, the FreeBSD fix for the AMD bug/optimization gives
>    different initial values in the __other field for AMD CPUs only.
>    I don't like it, but thought it was smarter than that.  Perhaps
>    it is, but I tested my version.  My version is more efficient,
>    and this involves doing the cleaning more unconditionally.  For
>    the first use of the NPX by a thread, it would be efficient
>    enough to clean using fstenv; clean; fldenv before the normal
>    fxrstor.  But cleaning is needed on every NPX context switch,
>    and we don't want to do fstenv; clean; fldenv on every switch.
> 
> The AMD bug/optimization breaks some debugging methods and some
> error handling in applications, and the FreeBSD cleaning makes
> this worse.  An application could try to use the last instruction/
> data info for its error handling and use methods that preserve
> it internally.  But the info is lost on any context switch.  Even
> if no other thread uses the NPX, cleaning guarantees clobbering
> of the info when a thread is switched back to.  It is lost even
> after an exception where AMD hardware doesn't lose it.  Its loss
> is visible to debuggers -- debuggers will usually see the result
> of the cleaning.
> 
> Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20151017130907.GZ2257>