Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 16 Mar 2017 12:06:04 -0700
From:      Mark Millard <markmi@dsl-only.net>
To:        Andrew Gierth <andrew@tao11.riddles.org.uk>, Sylvain Garrigues <sylvain@sylvaingarrigues.com>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>, mmel@freebsd.org
Subject:   Re: Is CPUTYPE=cortex-A7 supposed to work?
Message-ID:  <91BB5C78-B2F8-458E-B5F4-DE31068D5943@dsl-only.net>
In-Reply-To: <6D4C0F5B-6F13-44DE-A1C6-83A1D3608EB4@dsl-only.net>
References:  <871suc3nv8.fsf@news-spur.riddles.org.uk> <CANCZdfq4EwH%2B_9FVNai8s6Y-gdTjHJ8dNkJwSrnF%2BSAkdwvYdg@mail.gmail.com> <8737ely05c.fsf@news-spur.riddles.org.uk> <CANCZdfpftVHaPahTOP0vxB-FR%2BKtpqY9JMJr=F2DGifD0fhKMQ@mail.gmail.com> <87wpbxw3yd.fsf@news-spur.riddles.org.uk> <79EBD44B-2C2D-4394-A90C-DF494A049F20@dsl-only.net> <E6BC9F77-F65B-4918-9E22-3BFECA268E30@sylvaingarrigues.com> <674facba-68cd-8ce1-887a-1ef3c51520bc@freebsd.org> <ED8405B6-1150-4A9C-AD93-5C74306FB361@sylvaingarrigues.com> <87pohh2n3c.fsf@news-spur.riddles.org.uk> <6D4C0F5B-6F13-44DE-A1C6-83A1D3608EB4@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
[I accidentally included some fpregs32 material, so this corrects
the impression that might leave.]

On 2017-Mar-16, at 11:44 AM, Mark Millard <markmi at dsl-only.net> =
wrote:

> On 2017-Mar-16, at 10:27 AM, Andrew Gierth <andrew at =
tao11.riddles.org.uk> wrote:
>>=20
>>>>>>> "Sylvain" =3D=3D Sylvain Garrigues =
<sylvain@sylvaingarrigues.com> writes:
>>=20
>> Sylvain> Thank you so much for your quick feedback Michal. Good to =
know
>> Sylvain> this matter is into good hands. I=E2=80=99m afraid I'm still =
afraid
>> Sylvain> about `basic=E2=80=99 programs like git being still =
potentially broken
>> Sylvain> when kernel+world+ports have CPUTYPE=3Dcortex-a7 in =
make.conf -
>> Sylvain> Andrew said a simple `git clone' could fail, more precisely
>> Sylvain> (quoting him):
>>=20
>>>> I have determined that the sha1 failures occur only if the
>>>> NEON-enabled SHA1 block function is interrupted by a signal. This
>>>> explains why it fails in git (which is using SIGALRM to set a
>>>> "display progress" flag) but not in standalone SHA1 tests;
>>=20
>> Sylvain> Removing CPUTYPE apparently fixes things hence I=E2=80=99m =
not 100%
>> Sylvain> confident yet of keeping CPUTYPE=3Dcortex-a7 myself even if =
only
>> Sylvain> a few ports might be affected. Git is an important port, who
>> Sylvain> knows what other ports are broken :-)
>>=20
>> Let me clarify this.
>>=20
>> Without CPUTYPE, things _appear_ to work because only explicit use of
>> floating-point exposes the bug, and it's extremely rare for programs =
to
>> use floating-point in signal handlers, and even then the only result
>> would be incorrect floating-point calculations in the interrupted =
code.
>>=20
>> With CPUTYPE, the compiler can generate NEON instructions for integer
>> code; even without heavy optimization enabled, it might choose to use
>> NEON register load/store to copy small data structures for example. =
One
>> piece of code which is affected by this is the signal handling =
functions
>> in libthr, which wrap the program's own signal handler functions; so
>> now, _every_ signal handler in a program linked with libthr uses the
>> NEON registers, and the result isn't limited to corrupting
>> floating-point calculations but can corrupt data structures or copied
>> memory, or the results of vectorized code.
>>=20
>> The specific failures that I saw -- git failing, emacs crashing, =
errors
>> from openssl speed -- were all of this second kind and therefore not
>> directly reproducible without CPUTYPE; but the test program I gave =
for
>> the bug report demonstrates the problem by using explicit =
floating-point
>> (in a highly contrived way) and therefore reproduces the issue even
>> without CPUTYPE.
>>=20
>> So even though the bug is in the kernel and affects all armv6 targets
>> whether NEON is in use or not, the chances of actually hitting it are
>> pretty negligible if you built the world without CPUTYPE. But if you
>> build with CPUTYPE, then potentially any code that catches a signal =
is
>> affected; it's just that programs (like git) that combine signal
>> handling with vectorized crypto code, or programs (like emacs) that =
use
>> signal handling very heavily, have the greatest probability of =
failure.
>>=20
>> tl/dr: building without CPUTYPE is a workaround that simply reduces =
both
>> the chance and severity of failure; building with CPUTYPE currently
>> breaks almost everything, but with a probability that varies wildly
>> depending on what the application does.
>>=20
>> --=20
>> Andrew.
>=20
> As I understand there are also issues beyond the fix for signal
> delivery.
>=20
> On 2017-Mar-12, at 12:17 AM, Michal Meloun <melounmichal@gmail.com> =
wrote:
>=20
>> The struct fpreg is also wrong and I'm not sure if
>> or how we can to fix this in compatible way.
>=20
> Looking up some details shows sys/arm/include/reg.h with:
>=20
> struct fpreg {
>        unsigned int fpr_fpsr;
>        fp_reg_t fpr[8];
> };
>=20
> [Covers only 8 floating point registers?]
>=20
> and shows sys/arm/include/fp.h with:
>=20
> typedef struct fp_extended_precision {
>        u_int32_t fp_exponent;
>        u_int32_t fp_mantissa_hi;
>        u_int32_t fp_mantissa_lo;
> } fp_extended_precision_t;
>=20
> typedef struct fp_extended_precision fp_reg_t;
> . . .
> /*
> * Type for a saved FP context, if we want to translate the context to =
a
> * user-readable form
> */
>=20
> typedef struct {
>        u_int32_t fpsr;
>        fp_extended_precision_t regs[8];
> } fp_state_t;
>=20
>=20
> So each of:
>=20
> struct fpreg
> fp_state_t
>=20
> has room for 8 instances of 96 bits (beyond fpsr), not sufficient
> for 32 double precision (i.e., 64-bit) registers.
>=20
> The arm code also has:
>=20
> # grep -r "\<fpreg\>" /usr/src/sys/arm/ | more
> /usr/src/sys/arm/arm/machdep.c:fill_fpregs(struct thread *td, struct =
fpreg *regs)
> /usr/src/sys/arm/arm/machdep.c:set_fpregs(struct thread *td, struct =
fpreg *regs)
> /usr/src/sys/arm/include/reg.h:struct fpreg {
> /usr/src/sys/arm/include/reg.h:int     fill_fpregs(struct thread *, =
struct fpreg *);
> /usr/src/sys/arm/include/reg.h:int     set_fpregs(struct thread *, =
struct fpreg *);
>=20
> And the system has:
>=20
> /usr/src/sys/sys/procfs.h:typedef struct fpreg fpregset_t;
> /usr/src/sys/sys/procfs.h:    size_t    pr_fpregsetsz;  /* =
sizeof(fpregset_t) (1) */
> /usr/src/sys/sys/procfs.h:typedef fpregset_t prfpregset_t;
> /usr/src/sys/sys/ptrace.h:struct fpreg;
> /usr/src/sys/sys/ptrace.h:int   proc_read_fpregs(struct thread *_td, =
struct fpreg *_fpreg);
> /usr/src/sys/sys/ptrace.h:int   proc_write_fpregs(struct thread *_td, =
struct fpreg *_fpreg);
>=20
> and:
>=20
> /usr/src/sys/kern/sys_process.c: * Ptrace doesn't support fpregs at =
all, and there are no security holes
> /usr/src/sys/kern/sys_process.c: * or translations for fpregs, so we =
can just copy them.
> /usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, =
struct fpreg *fpregs)
> /usr/src/sys/kern/sys_process.c:        PROC_ACTION(fill_fpregs(td, =
fpregs));
> /usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, =
struct fpreg *fpregs)
> /usr/src/sys/kern/sys_process.c:        PROC_ACTION(set_fpregs(td, =
fpregs));
> /usr/src/sys/kern/sys_process.c:                struct fpreg fpreg;
> /usr/src/sys/kern/sys_process.c:                error =3D =
COPYIN(uap->addr, &r.fpreg, sizeof r.fpreg);
> /usr/src/sys/kern/sys_process.c:                error =3D =
COPYOUT(&r.fpreg, uap->addr, sizeof r.fpreg);
> /usr/src/sys/kern/sys_process.c:                error =3D =
PROC_WRITE(fpregs, td2, addr);
> /usr/src/sys/kern/sys_process.c:                error =3D =
PROC_READ(fpregs, td2, addr);
>=20
> and there is use of some of the above in:
>=20
> /usr/src/sys/kern/sys_process.c: * proc_read_fpregs, proc_write_fpregs
> /usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, =
struct fpreg *fpregs)
> /usr/src/sys/kern/sys_process.c:        PROC_ACTION(fill_fpregs(td, =
fpregs));
> /usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, =
struct fpreg *fpregs)
> /usr/src/sys/kern/sys_process.c:        PROC_ACTION(set_fpregs(td, =
fpregs));

Sorry: I did not intend to include the "fpregs32" material:

> /usr/src/sys/kern/sys_process.c:proc_read_fpregs32(struct thread *td, =
struct fpreg32 *fpregs32)
> /usr/src/sys/kern/sys_process.c:        PROC_ACTION(fill_fpregs32(td, =
fpregs32));
> /usr/src/sys/kern/sys_process.c:proc_write_fpregs32(struct thread *td, =
struct fpreg32 *fpregs32)
> /usr/src/sys/kern/sys_process.c:        PROC_ACTION(set_fpregs32(td, =
fpregs32));

I have not looked into fpregs32 at all. As far as I
saw looking at fpregs material fpregs32 seemed to
be separate.

My guess would be fpregs32 is for amd64, powerpc64, sparc64,
mips and their supporting 32 bit processes as well as 64-bit
on 64-bit processors.

So far as I know FreeBSD does not have such support for
32-bit code in arm64/aarch64 contexts.

> I may not have found everything relevant.
>=20
> It appears that fp_state_t is unused in /usr/src/sys/ .
>=20
>=20
> Note: I was looking at /usr/src/sys/ for. . .
>=20
> # uname -paKU
> FreeBSD FreeBSDx64 12.0-CURRENT FreeBSD 12.0-CURRENT  r314687M  amd64 =
amd64 1200023 1200023


=3D=3D=3D
Mark Millard
markmi at dsl-only.net




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?91BB5C78-B2F8-458E-B5F4-DE31068D5943>