Date:      Wed, 18 Sep 2013 23:13:26 +0200
From:      Dimitry Andric <dim@FreeBSD.org>
To:        Tijl Coosemans <tijl@freebsd.org>
Cc:        toolchain@FreeBSD.org
Subject:   Re: i386 clang optimisation problem with stack alignment
Message-ID:  <9893CCE3-C7EF-4B52-B32E-8F1A0CE022C8@FreeBSD.org>
In-Reply-To: <20130910183456.175162f7@kalimero.tijl.coosemans.org>
References:  <20130910181601.2e89af87@kalimero.tijl.coosemans.org> <20130910183456.175162f7@kalimero.tijl.coosemans.org>



On Sep 10, 2013, at 18:34, Tijl Coosemans <tijl@freebsd.org> wrote:
> On Tue, 10 Sep 2013 18:16:01 +0200 Tijl Coosemans wrote:
>> I've attached a small test program extracted from multimedia/gstreamer-ffmpeg
>> (libavcodec/h264_cabac.c:ff_h264_init_cabac_states(H264Context *h)).
>>
>> When you compile and run it like this on FreeBSD/i386, it results in a
>> SIGBUS:
>>
>> % cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer
>> % ./paddd
>> Bus error
>>
>> The reason is this instruction where %esp isn't 16-byte aligned:
>> paddd   (%esp), %xmm7

Hmm, as far as I can see, the problem is related to position independent
code, in combination with omitting the frame pointer:

$ cc -o paddd paddd.c -O3 -msse2 -fomit-frame-pointer
$ ./paddd
$

$ cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer
$ ./paddd
Bus error (core dumped)
$

$ cc -o paddd paddd.c -O3 -msse2 -fPIE -fno-omit-frame-pointer
$ ./paddd
$


>> Is this an upstream bug or is this because of local changes (to make the
>> stack 4 byte aligned by default or something)?

The changes to 4 byte stack alignment on i386 are from upstream, but we
initiated them after a bit of discussion (see
http://llvm.org/viewvc/llvm-project?view=revision&revision=167632 ).

Note the problem only occurs at -O3, which enables the vectorizer, so
there might be an issue with it in combination with position independent
code generation and omitting frame pointers.  If you check what clang
passes to its cc1 stage with your original command line, it gives:

"/usr/bin/cc" -cc1 -triple i386-unknown-freebsd10.0 -emit-obj
-disable-free -main-file-name paddd.c -mrelocation-model pic -pic-level 2
-pie-level 2 -masm-verbose -mconstructor-aliases -target-cpu i486
-target-feature +sse2 -v -resource-dir /usr/bin/../lib/clang/3.3 -O3
-fdebug-compilation-dir /home/dim/bugs/paddd -ferror-limit 19
-fmessage-length 130 -mstackrealign -fobjc-runtime=gnustep
-fobjc-default-synthesize-properties -fdiagnostics-show-option
-fcolor-diagnostics -backend-option -vectorize-loops -o
/tmp/paddd-zdRbKM.o -x c paddd.c

So it does pass -mstackrealign, but for some reason it isn't always
effective.  For the -fPIE -fomit-frame-pointer case, the prolog for
init_states() becomes:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        subl    $28, %esp
        calll   .L0$pb
.L0$pb:
        popl    %edx

If you remove -fPIE, the data is directly accessed via its (properly 16
byte aligned) symbol, so there is no alignment problem:

        paddd   .LCPI0_0, %xmm7

but the stack is not realigned in the prolog either:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        movd    16(%esp), %xmm0
...

Then, if you use -fPIE, but add -fno-omit-frame-pointer:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        andl    $-16, %esp
        subl    $48, %esp
        calll   .L0$pb
.L0$pb:
        popl    %edx
.Ltmp0:

I.e., here the stack is properly realigned, and the function works fine.

In any case: yes, I think this is a bug, and we should report it
upstream.  This is a very nice test case for doing so.

-Dimitry




