Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 26 May 2016 11:46:19 -0700
From:      Mark Millard <markmi@dsl-only.net>
To:        Ian Lepore <ian@freebsd.org>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>
Subject:   Re: xorg broken on Beaglebone black revision 300438 [some armv7 instructions do not match new kernel code?]
Message-ID:  <D6F5538A-26DE-479D-86EC-0FBCF2D15981@dsl-only.net>
In-Reply-To: <C3ADFDB0-9AD1-4FB1-8F32-30F0C35879BE@dsl-only.net>
References:  <AE62E2F1-1D9F-418F-97E8-6D7F0E6B4B87@dsl-only.net> <1464127156.1204.10.camel@freebsd.org> <E268D55F-7E4D-4FF7-B38E-0912F275BF0C@dsl-only.net> <1464130548.1204.25.camel@freebsd.org> <C3ADFDB0-9AD1-4FB1-8F32-30F0C35879BE@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
[I'm just fixing some typos that I've noticed and adding a couple of =
short notes for ease of interpretation.]

On 2016-May-26, at 9:59 AM, Mark Millard <markmi@dsl-only.net> wrote:

> It may be that the following just means that more would have to be =
described to be complete about the expectations for the usage-context =
for the new alignment code in the kernel. But I ask anyway.
>=20
> It appears to me that the following code in the 11.0-CURRENT -r300701 =
check in:
>=20
>> +#if __ARM_ARCH >=3D 6
>> +#define	_ALIGNBYTES	(sizeof(int) - 1)
>> +#else
>> #define	_ALIGNBYTES	(sizeof(long long) - 1)
>> +#endif
>> #define	_ALIGN(p)	(((unsigned)(p) + _ALIGNBYTES) & =
~_ALIGNBYTES)
>=20
> and:
>=20
>> + * armv4 and v5 require alignment to the type's size.  armv6 and =
later require
>> + * that an 8-byte type be aligned to at least a 4-byte boundary; =
access to
>> + * smaller types can be unaligned.
>>  */
>>=20
>> +#if __ARM_ARCH >=3D 6
>> +#define	ALIGNED_POINTER(p, t)	(((sizeof(t) !=3D 8) || =
((unsigned)(p) & 3) =3D=3D 0))
>> +#else
>=20
>=20
> has the implicit assumption that some armv7 instructions will not be =
used: those that violate the claimed and implicit alignment requirements =
even for SCTRLR.A=3D0.
>=20
> The "ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition" =
Table A3-1 "Alignment requirements of load/store instructions" has a =
"SCTRL.A is 0" column that still lists various examples of "Alignment =
fault" instead of "Unaligned access" for some instructions. Even =
"Halfword" has a couple of instructions that still get "Alignment fault" =
for armv7, not just 8 byte wide operations.
>=20
> The instructions showing "Alignment fault" under the "SCTLR.A is 0" =
column are (or at least include):

The sizes below, such as Halfword, are the "Alignment check" sizes =
listed in Table A3-1. The "SCTRL.A is 0" columns lists "Result if check =
fails when" SCTRL.A is 0.

> Halfword: LDREXH, STREXH
>=20
> Word: LDREX, STREX, all forms of "LDM, STM, PDRD, RFE, SRS, STRD, =
SWP", PUSH (except for T3 and A2 encodings), POP (except for T3 and A2 =
encodings), LDC, LDC12, STC, STC2, VLDM, VLDR, VPOP, VPUSH, VSTM, VSTR
>=20
> Doubleword: LRDEXD, STREXD

LDREXD (not LRD. . .).

> :<align> specified: VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4
>=20
>=20
> Is the updated FreeBSD kernel supporting armv7 making the requirement =
that none of these be used? (Thus clang and gcc would need to never add =
use of them under FreeBSD-allowed compiler options.)
>=20
>=20
> The specific potential mismatches that I noticed are:
>=20
> (Below I'll use "Halfword" and the like as if they were C/C++ type =
names directly.)
>=20
> Expanding the definition for ALIGNED_POINTER(p,Halfword) is:
>=20
> (sizeof(Halfword) !=3D 8) || ((unsigned)(p) & 3) =3D=3D 0)
>=20
> which evaluates to TRUE even for odd addresses used in LDREXH and =
STREXH instructions. Yet such would get the "Alignment fault" under =
"SCTLR.A is 0".
>=20
> A similar point goes for ALIGNED_POINTER(p,Word) and its list of =
instructions showing "Alignment fault" under "SCTLR.A is 0".

For the ALIGNED_POINTER(p,Doubleword) I instead just listed the =
_ALIGN(p) form of the issue below. The ALIGNED_POINTER(p,Doubleword) =
evaluates to TRUE in cases where two instructions would get an =
"Alignment fault" under "SCTLR.A is 0". See below.

> Presuming sizeof(int)=3D=3D4: _ALIGNBYTES =3D=3D (sizeof(int) - 1) and =
in turn (sizeof(int) - 1) =3D=3D 3.
> Presuming also: sizeof(Doubleword) =3D=3D 8 .
>=20
> Then expanding _ALIGN(p) gives:
>=20
> ((unsigned)(p)+3) & ~3)
>=20
> but for sizeof(*p)=3D=3D8 the result can be an odd multiple of 4 which =
would get an "Alignment fault" under "SCTLR.A is 0" for LRDEXD and/or =
STREXD instructions.

LDREXD (not LRD. . .).

> (I've not worded the above to be explicit about :<align> (or @<align>) =
notation use that would have similar issues.)
>=20
> =3D=3D=3D
> Mark Millard
> markmi at dsl-only.net

=3D=3D=3D
Mark Millard
markmi at dsl-only.net

On 2016-May-24, at 3:55 PM, Ian Lepore <ian@freebsd.org> wrote:

On Tue, 2016-05-24 at 15:19 -0700, Mark Millard wrote:
> On 2016-May-24, at 2:59 PM, Ian Lepore <ian at freebsd.org> wrote:
>=20
>> On Tue, 2016-05-24 at 14:35 -0700, Mark Millard wrote:
>>> Quoting from Otac=EDlio Tue May 24 00:06:10 UTC 2016 and its locore
>>> -v6.S changes:
>>>=20
>>>> -    orr    r7, #CPU_CONTROL_UNAL_ENABLE
>>>> -    orr    r7, #CPU_CONTROL_AFLT_ENABLE
>>>> +    bic    r7, #CPU_CONTROL_UNAL_ENABLE
>>>> +    bic    r7, #CPU_CONTROL_AFLT_ENABLE
>>>=20
>>> -r295256 (2016-Feb-14) changed from:
>>>=20
>>> bic     r7, #CPU_CONTROL_UNAL_ENABLE
>>>=20
>>> to:
>>>=20
>>> orr     r7, #CPU_CONTROL_UNAL_ENABLE
>>>=20
>>> in two places (moving it a few lines down for each example as
>>> well).
>>> So this much of the proposed changes would be reverting the=20
>>> -r295256
>>> change. The check in comment indicates the bit is RAO/SBOP for
>>> armv7.
>>> For armv6 the check in comment claims it controls armv5
>>> compatible
>>> alignment support.
>>>=20
>>> But:
>>>=20
>>> orr     r7, #CPU_CONTROL_AFLT_ENABLE
>>>=20
>>> has been in locore-v6.S since the file's first checkin. So this
>>> change to bic here be new.
>>>=20
>>> What is the FreeBSD intent for each of the two new settings for
>>> armv7? armv6?
>>>=20
>>=20
>> It was always wrong to clear CPU_CONTROL_UNAL_ENABLE on armv7 (it's
>> documented as RAO/SBOP).  Setting it on armv6 makes the v6 (which
>> is
>> only the RPi in our world) behave the same as v7.  So that change
>> was
>> just a bugfix.
>>=20
>> I think FreeBSD is the only major OS left that is enforcing strict
>> alignment on armv6/v7 and it causes a lot of trouble for ports and
>> other 3rd party software, and prevents us from enabling certain
>> compiler options and optimizations.  I'm very close to a commit to
>> stop
>> enforcing strict alignment (clear rather than
>> CPU_CONTROL_AFLT_ENABLE).
>> I've been testing it yesterday and today, and haven't run into any
>> trouble at all.
>>=20
>> -- Ian
>=20
> Good to know. I had submitted at least one port bug report that will
> likely need to be canceled if this goes through. Effectively its an
> ABI change allowing a wider variety of code to be compliant.
>=20

It was partly all that testing you did a few months ago, and the PRs
and discussions coming out of that, which are driving these changes.=20
If I could get away with procrastinating a bit more, I probably would
(always too busy), but with the big hardfloat abi change and with a
code freeze coming up later this week, this seems like the last chance
to do some disruptive changes that are long overdue.

> Is the kernel involved in emulating access/instructions via some
> technique for misaligned access for armv6/armv7 for some types of
> instructions? Are there performance issues/tradeoffs that might
> contribute to sometimes choosing to be careful about alignment?
>=20

Nope, no emulation, the hardware is able to do this, we've just always
run with alignment faults enabled, partly because base freebsd already
has to work on other strict-alignment hardware anyway.  The driver of
this change is ports more than anything -- increasingly you run into
code that assumes #ifdef __arm__ is sufficient to mean "unaligned
access will work".

There are a few arm instructions that still require alignment, but (at
least in theory) the compiler knows about that and only emits those
instructions when it knows they're safe (such as it knowing that the
stack stays aligned to 8-byte boundaries in non-leaf functions).  We'll
see; everything seems okay in testing I've done so far.

Performance-wise, there is a cost for unaligned access.  The hardware
has to do more work so unaligned accesses take extra cycles.  On the
other hand, if the data is unaligned, you also have to use extra
cycles, potentially a lot of them, to copy-align the data or access it
a byte at a time and reassmble it in a register.  In theory this should
be faster than doing copy-align stuff.

-- Ian

> In one way I liked the strict alignment environment being around: It
> allowed easily testing if software was more portable for such issues
> vs. not. (Not that FreeBSD should use such criteria for its choices.)
>=20






Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?D6F5538A-26DE-479D-86EC-0FBCF2D15981>