Date:      Sat, 17 Sep 2011 11:31:57 +0200
From:      "Hartmann, O." <ohartman@zedat.fu-berlin.de>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        Jason Harmening <jason.harmening@gmail.com>, freebsd-current@freebsd.org
Subject:   Re: Crashes in world built w/ clang: FP registers?
Message-ID:  <4E74690D.5070500@zedat.fu-berlin.de>
In-Reply-To: <20110917090239.GM1511@deviant.kiev.zoral.com.ua>
References:  <CAM=8qan5K6025J5oBT25s4fz9YgT15mp5SpNsEdzR0Fw%2BHVwig@mail.gmail.com> <20110917090239.GM1511@deviant.kiev.zoral.com.ua>

On 09/17/11 11:02, Kostik Belousov wrote:
> On Fri, Sep 16, 2011 at 10:34:40PM -0500, Jason Harmening wrote:
>> Hi everyone,
>>
>> Using clang as the default compiler, the kernel and drivers will work
>> fine, but a lot of programs in the base system and ports will crash w/
>> SIGBUS.  In fact, so much of the stuff in the chroot'ed world will
>> crash (everything from csh to gcc) that it's basically unusable.  I
>> finally got around to building w/ debug symbols, and ran gdb on a
>> coredump generated while I was trying to use tab completion in csh:

I saw a similar phenomenon with "sh" and "bash": pressing Tab terminated
the shell as if I had typed "exit". The box (a Dell Latitude E6510) still
ran the kernel, but compiling was no longer possible; nearly every program
crashed with SIGBUS.

My kernel and world were compiled with CLANG, either with "-march=native"
or with no -march setting at all. The CPU of the Latitude E6510 is a
Core-i7 based Core-5 notebook CPU, "Lynnfield".

At this very moment the notebook is running again on a fresh install of
the most recent FreeBSD 9.0/amd64, but I explicitly set "-march=core2",
since judging from the postings here it seems that CLANG miscompiles for
Core-i7 architectures on FreeBSD. (I never had problems using the most
recent CLANG on Ubuntu Linux 11.04, running on a XEON 5670, a "Westmere"
six-core Core-i7 architecture, with -march=core-i7 explicitly enabled.)

I have also noticed that even on a Core2, with -march=core2, some
programs compiled with CLANG sporadically die with SIGBUS, Firefox 6 for
example.

I cannot provide more details, since I haven't run into the problem for a
couple of days now; the notebook seems to work properly with -march=core2
set for CLANG, and I need the machines for work at the moment.

But I hope this provides some additional hints.

By the way, I can reproduce the above-mentioned behaviour of the broken
OS when it is compiled with CLANG and -march is unset or set to native or
core-i7. I did this twice, and it was always the same phenomenon with
different shells.
The funny thing was: in multiuser mode the shells crashed 100% of the
time. In single-user mode the problem wasn't there, but compilation still
failed with a "cc1" compiler error, although I had set up make.conf as
recommended in the updated wiki.

>>
>> (gdb) bt
>> #0  tw_collect (command=dwarf2_read_address: Corrupted DWARF expression.)
>>     at /usr/src/bin/csh/../../contrib/tcsh/tw.parse.c:1308
>> #1  0x000000000042777b in t_search (word=Unhandled dwarf expression opcode 0x0)
>>     at /usr/src/bin/csh/../../contrib/tcsh/tw.parse.c:1725
>> #2  0x0000000000426829 in tenematch (inputline=Variable "inputline" is not available.)
>>     at /usr/src/bin/csh/../../contrib/tcsh/tw.parse.c:301
>> #3  0x000000000043545d in Inputl ()
>>     at /usr/src/bin/csh/../../contrib/tcsh/ed.inputl.c:415
>> #4  0x0000000000417a90 in readc (wanteof=Variable "wanteof" is not available.)
>>     at /usr/src/bin/csh/../../contrib/tcsh/sh.lex.c:1653
>> #5  0x0000000000416f37 in lex (hp=Variable "hp" is not available.)
>>     at /usr/src/bin/csh/../../contrib/tcsh/sh.lex.c:162
>> #6  0x0000000000405afb in process (catch=Unhandled dwarf expression opcode 0x0)
>>     at /usr/src/bin/csh/../../contrib/tcsh/sh.c:1922
>> #7  0x0000000000404b51 in main (argc=Variable "argc" is not available.)
>>     at /usr/src/bin/csh/../../contrib/tcsh/sh.c:1289
>>
>> (gdb) disas
>> Dump of assembler code for function tw_collect:
>> 0x00000000004288b0 <tw_collect+0>:      push   %rbp
>> 0x00000000004288b1 <tw_collect+1>:      mov    %rsp,%rbp
>> 0x00000000004288b4 <tw_collect+4>:      push   %r15
>> 0x00000000004288b6 <tw_collect+6>:      push   %r14
>> 0x00000000004288b8 <tw_collect+8>:      push   %r13
>> 0x00000000004288ba <tw_collect+10>:     push   %r12
>> 0x00000000004288bc <tw_collect+12>:     push   %rbx
>> 0x00000000004288bd <tw_collect+13>:     sub    $0x2e8,%rsp
>> 0x00000000004288c4 <tw_collect+20>:     mov    %r9,-0x308(%rbp)
>> 0x00000000004288cb <tw_collect+27>:     mov    %r8,-0x300(%rbp)
>> 0x00000000004288d2 <tw_collect+34>:     mov    %rcx,-0x2f8(%rbp)
>> 0x00000000004288d9 <tw_collect+41>:     mov    %rdx,-0x2f0(%rbp)
>> 0x00000000004288e0 <tw_collect+48>:     mov    %esi,-0x2e8(%rbp)
>> 0x00000000004288e6 <tw_collect+54>:     mov    %edi,-0x2e4(%rbp)
>> 0x00000000004288ec <tw_collect+60>:     movl   $0x0,-0x1d4(%rbp)
>> 0x00000000004288f6 <tw_collect+70>:     movaps 0x23115b(%rip),%xmm0
>>     # 0x6                                                   59a58
>> <reslab+48>
> This is actually 0x659a58 <reslab+48>
> movaps tried to load %xmm0 from the unaligned address, which is forbidden
> and causes #GP.
>
> I have no idea why clang generates unaligned loads.
>> 0x00000000004288fd <tw_collect+77>:     lea    -0x2(%rdi),%eax
>> 0x0000000000428900 <tw_collect+80>:     mov    %eax,-0x2e0(%rbp)
>> 0x0000000000428906 <tw_collect+86>:     test   %edi,%edi
>> 0x0000000000428908 <tw_collect+88>:     movaps %xmm0,-0x210(%rbp)
>> 0x000000000042890f <tw_collect+95>:     sete   %al
>> ---Type <return> to continue, or q <return> to quit---q
>> Quit
>> (gdb) info line tw.parse.c:1308
>> Line 1308 of "/usr/src/bin/csh/../../contrib/tcsh/tw.parse.c"
>>    starts at address 0x4288f6 <tw_collect+70>
>>    and ends at 0x4288fd <tw_collect+77>.
>>
>>
>> Looks like it's crashing as soon as it tries to use the XMM registers.
>>  I'm not sure if all of the crashes I'm getting are like this one, but
>> I was surprised to see FP registers in code like this.
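
As a cross-check of the address arithmetic in the disassembly above (a
sketch only, not something from the original posts; the function names are
made up for illustration): a RIP-relative operand is resolved against the
address of the following instruction, so the movaps at tw_collect+70 with
displacement 0x23115b, followed by the instruction at 0x4288fd, should
point at 0x659a58. movaps with a memory operand needs a 16-byte aligned
address, and this one is not:

```python
# Sketch: recompute the RIP-relative operand of the faulting movaps and
# check it against the 16-byte alignment that movaps requires.

MOVAPS_ALIGNMENT = 16  # required alignment for a movaps memory operand


def rip_relative_target(next_insn_addr: int, displacement: int) -> int:
    """RIP-relative operands are relative to the address of the next instruction."""
    return next_insn_addr + displacement


def misalignment(addr: int, alignment: int = MOVAPS_ALIGNMENT) -> int:
    """Bytes by which addr sits past the previous aligned boundary (0 = aligned)."""
    return addr % alignment


# Values copied from the gdb output: the instruction after the movaps
# starts at 0x4288fd, and the displacement is 0x23115b.
target = rip_relative_target(0x4288FD, 0x23115B)
offset = misalignment(target)

print(hex(target))  # 0x659a58
print(offset)       # 8
```

By the same arithmetic reslab itself would start at 0x659a28, which is
also 8 bytes past a 16-byte boundary, so it may be worth looking at how
that object gets aligned rather than at the offset into it.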
>>
>> I'm using march=corei7 and -O2 for both world and kernel, but using
>> march=nocona or just leaving out CPUTYPE has no effect (actual CPU is
>> Nehalem Xeon 5520)
>> Here's the relevant part of make.conf for completeness:
>>
>> .if !defined(CC) || ${CC} == "cc"
>> CC=clang
>> .endif
>> .if !defined(CXX) || ${CXX} == "c++"
>> CXX=clang++
>> .endif
>> .if !defined(CPP) || ${CPP} == "cpp"
>> CPP=clang -E
>> .endif
>> NO_WERROR=
>> WERROR=
>> NO_FSCHG=
>> CPUTYPE?=corei7
>> CFLAGS= -O2 -pipe
>> COPTFLAGS= -O2 -pipe
>>
>> Any thoughts? Is there some simple fix for this I'm missing?
>>
>> Thanks,
>> Jason
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"



