Date:      Tue, 31 Jan 2017 18:39:05 -0800
From:      Mark Millard <markmigm@gmail.com>
To:        Tom Vijlbrief <tvijlbrief@gmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: Arm64 stack issues (was Re: FreeBSD status for/on ODroid-C2?)
Message-ID:  <EB1D79C2-CF5E-4C21-BA1B-EC9F34BB737E@gmail.com>
In-Reply-To: <54642E5C-D5D6-45B7-BB74-2407CFB351C2@dsl-only.net>
References:  <CAOQrpVfK-Dw_rSo_YVY5MT1wbc6Ah-Pj%2BWv8UGjeiUQ1b3%2B-mg@mail.gmail.com> <20170124191357.0ec0abfd@zapp> <20170128010138.iublazyrhhqycn37@mutt-hardenedbsd> <20170128010223.tjivldnh7pyenbg6@mutt-hardenedbsd> <CAOQrpVfxKvSR5PoahnqEsYspHhjjOGJ8iCBUetKxRV57oX_aUg@mail.gmail.com> <009857E3-35BB-4DE4-B3BB-5EC5DDBB5B06@dsl-only.net> <CAOQrpVdKyP2T0V77sfpuKbNP3ARoD1EcwtH6E9o7p5KF%2B=A56A@mail.gmail.com> <CB36F13F-85E9-41D2-A7F3-DA183BE5985A@dsl-only.net> <890B7D8A-27FF-41AC-8291-1858393EC7B1@gmail.com> <54642E5C-D5D6-45B7-BB74-2407CFB351C2@dsl-only.net>

[Show .core file creation times instead.]

On 2017-Jan-31, at 6:30 PM, Mark Millard <markmi at dsl-only.net> wrote:

> [Just adding more accurate/precise times for the .core files.]
> [The original was accidentally sent from the "wrong" E-mail account
> but I've adjusted that here.]
>
> On 2017-Jan-31, at 12:35 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> [More notes on what I observe on a pine64 from head -r312982 .]
>>
>> On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:
>>
>>> Note that on the pine64 the network interface hangs from time to
>>> time and I get a core dump with very low frequency from long running
>>> processes, eg the shell that invokes "make world".
>>
>> I got sh crashes (multiple processes in the same time frame) from
>> just trying to build pkg:
>>
>> make[5]: stopped in /usr/obj/portswork/usr/ports/ports-mgmt/pkg/work/pkg-1.9.4/libpkg
>> *** [all-recursive] Error code 1
>>
>> # ls -lt /var/crash/
>> total 41764
>> -rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.13676.core
>> -rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.13511.core
>> -rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.13499.core
>> -rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.12095.core
>> -rw-r--r--  1 root  wheel        5 Nov  3 10:18 minfree
>>
>> In all the crashes lldb on the .core shows that the pc was no longer
>> pointing at memory with code in it. It is interesting that all
>> 4 sh instances died at about the same time.
>
> More time detail (using -T):
>
> -rw-------  1 root  wheel  4702208 Jan 31 03:15:44 2017 sh.13676.core
> -rw-------  1 root  wheel  4702208 Jan 31 03:15:43 2017 sh.13511.core
> -rw-------  1 root  wheel  4702208 Jan 31 03:15:42 2017 sh.13499.core
> -rw-------  1 root  wheel  4702208 Jan 31 03:15:32 2017 sh.12095.core

I should have used creation times:

# ls -UTlt /var/crash/
. . .
-rw-------  1 root  wheel  4702208 Jan 31 03:15:42 2017 sh.13676.core
-rw-------  1 root  wheel  4702208 Jan 31 03:15:41 2017 sh.13511.core
-rw-------  1 root  wheel  4702208 Jan 31 03:15:41 2017 sh.13499.core
-rw-------  1 root  wheel  4702208 Jan 31 03:15:30 2017 sh.12095.core
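
A quick way to check the "about the same time" observation is to compute the
spread between the creation times listed above. A minimal Python sketch,
assuming the four timestamps are copied by hand from that listing and that
English month abbreviations parse under the current locale; the function name
is made up for illustration:

```python
from datetime import datetime

# Creation times copied from the `ls -UTlt /var/crash/` listing above.
created = {
    "sh.13676.core": "Jan 31 03:15:42 2017",
    "sh.13511.core": "Jan 31 03:15:41 2017",
    "sh.13499.core": "Jan 31 03:15:41 2017",
    "sh.12095.core": "Jan 31 03:15:30 2017",
}

def spread_seconds(stamps):
    """Seconds between the earliest and latest timestamp in `stamps`."""
    times = [datetime.strptime(s, "%b %d %H:%M:%S %Y") for s in stamps]
    return (max(times) - min(times)).total_seconds()

print(spread_seconds(created.values()))  # 12.0
```

By creation time all four cores fall within 12 seconds, and the last three
within about one second of each other.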


>> SIGILL, SIGSEGV, SIGBUS, and SIGILL (again) from the non-code
>> consequences.
>>
>> The two SIGILL's have some interesting similarities to each other.
>> So I list them first below. x0-x3, x8-x9, x13, x17, x27, and cpsr
>> all match in these two. x1=ld-elf.so.1`_rtld_tlsdesc,
>> x17=libc.so.7`__free at jemalloc_jemalloc.c:2007,
>> x23=ld-elf.so.1`symlook_global + 124 at rtld.c:3916,
>> x27=sh..bss + 6336.
>>
>> The other two have the following in common:
>> x10-x12, x16-x17. x17=libc.so.7`close at close.c:48 .
>>
>> x18 = 0xaaaaaaaaaaaaaaab is common between one SIGILL and one not.
>>
>> Only one does not have x27=sh..bss + 6336. It instead has:
>> x28=sh..bss + 6336 .
>>
>> (lldb) bt
>> * thread #1: tid = 100142, 0x000000004044f800, name = 'sh', stop reason = signal SIGILL
>> * frame #0: 0x000000004044f800
>> (lldb) register read
>> General Purpose Registers:
>>       x0 = 0x0000000000000000
>>       x1 = 0x00000000404346e8  ld-elf.so.1`_rtld_tlsdesc
>>       x2 = 0x0000000040a00000
>>       x3 = 0x0000000000000002
>>       x4 = 0x0000000000000050
>>       x5 = 0x0000000040a4c9c0
>>       x6 = 0x2e2e2f2e2e2f2e2e
>>       x7 = 0x6c6f6f7462696c2f
>>       x8 = 0x0000000000000001
>>       x9 = 0x0000000000000000
>>      x10 = 0x00000000000000df
>>      x11 = 0x000000000000002f
>>      x12 = 0x0000000040a0e690
>>      x13 = 0x0000000000000427
>>      x14 = 0x0000000000000001
>>      x15 = 0x0000000000000000
>>      x16 = 0x0000000000432340
>>      x17 = 0x000000004054cd00  libc.so.7`__free at jemalloc_jemalloc.c:2007
>>      x18 = 0x0000000000000000
>>      x19 = 0x000000004044e330
>>      x20 = 0x000000001c93deed
>>      x21 = 0x0000000007ab9b5c
>>      x22 = 0x00000000404ba7b0
>>      x23 = 0x000000004043c4b0  ld-elf.so.1`symlook_global + 124 at rtld.c:3916
>>      x24 = 0x0000ffffffffd2d0
>>      x25 = 0x0000ffffffffd370
>>      x26 = 0x0000ffffffffd340
>>      x27 = 0x0000000000434000  sh..bss + 6336
>>      x28 = 0x0000000040a4c1b0
>>       fp = 0x0000ffff00000001
>>       lr = 0x000000004044f800
>>       sp = 0x0000ffffffffd2a0
>>       pc = 0x000000004044f800
>>     cpsr = 0x60000000
>> (lldb) disass
>> ->  0x4044f800: .long  0xd550b87a                ; unknown opcode
>>   0x4044f804: .long  0x00000000                ; unknown opcode
>>   0x4044f808: .long  0x00000001                ; unknown opcode
>>   0x4044f80c: .long  0x00000000                ; unknown opcode
>>   0x4044f810: .long  0x4044fc00                ; unknown opcode
>>   0x4044f814: .long  0x00000000                ; unknown opcode
>>   0x4044f818: .long  0x4044f410                ; unknown opcode
>>   0x4044f81c: .long  0x00000000                ; unknown opcode
>>
>> (lldb) thread list
>> Process 0 stopped
>> * thread #1: tid = 100161, 0x0000ffffffffee68, name = 'sh', stop reason = signal SIGILL
>> (lldb) register read
>> General Purpose Registers:
>>       x0 = 0x0000000000000000
>>       x1 = 0x00000000404346e8  ld-elf.so.1`_rtld_tlsdesc
>>       x2 = 0x0000000040a00000
>>       x3 = 0x0000000000000002
>>       x4 = 0x0000000000000017
>>       x5 = 0x00080002a0290a00
>>       x6 = 0x0000000000434c28  sh..bss + 9448
>>       x7 = 0x000000000005e1cd
>>       x8 = 0x0000000000000001
>>       x9 = 0x0000000000000000
>>      x10 = 0x0000000000000000
>>      x11 = 0x0000000040a5c000
>>      x12 = 0x0000000040a0e670
>>      x13 = 0x0000000000000427
>>      x14 = 0x000000000000000d
>>      x15 = 0x0000000000432740  sh..bss + 0
>>      x16 = 0x0000000000432340
>>      x17 = 0x000000004054cd00  libc.so.7`__free at jemalloc_jemalloc.c:2007
>>      x18 = 0xaaaaaaaaaaaaaaab
>>      x19 = 0x0000ffffffffee18
>>      x20 = 0x0000ffffffffedb4
>>      x21 = 0x0000ffffffffed80
>>      x22 = 0x0000ffffffffed59
>>      x23 = 0x0000ffffffffed47
>>      x24 = 0x0000ffffffffed38
>>      x25 = 0x0000ffffffffed28
>>      x26 = 0x0000ffffffffed20
>>      x27 = 0x0000000000434000  sh..bss + 6336
>>      x28 = 0x0000000040a803a0
>>       fp = 0x0000ffffffffee59
>>       lr = 0x0000ffffffffee68
>>       sp = 0x0000ffffffffe1a0
>>       pc = 0x0000ffffffffee68
>>     cpsr = 0x60000000
>> (lldb) disass
>> ->  0xffffffffee68: .long  0x44504d54                ; unknown opcode
>>   0xffffffffee6c: .long  0x2f3d5249                ; unknown opcode
>>   0xffffffffee70: .long  0x00706d74                ; unknown opcode
>>   0xffffffffee74: .long  0x4c454853                ; unknown opcode
>>   0xffffffffee78: .long  0x622f3d4c                ; unknown opcode
>>   0xffffffffee7c: .long  0x732f6e69                ; unknown opcode
>>   0xffffffffee80: .long  0x4f430068                ; unknown opcode
>>   0xffffffffee84: .long  0x4749464e                ; unknown opcode
>>
>> (lldb) bt
>> * thread #1: tid = 100088, 0x356c7265702f676e, name = 'sh', stop reason = signal SIGBUS
>> * frame #0: 0x356c7265702f676e
>> (lldb) register read
>> General Purpose Registers:
>>       x0 = 0x0000000000000000
>>       x1 = 0x0000000000000000
>>       x2 = 0x0000000040a00000
>>       x3 = 0x0000000000000005
>>       x4 = 0x0000000000000038
>>       x5 = 0x0000000040a754e5
>>       x6 = 0x584946455250442d
>>       x7 = 0x6c2f7273752f223d
>>       x8 = 0x0000000000000000
>>       x9 = 0x0000000000000000
>>      x10 = 0x0000000000434000  sh..bss + 6336
>>      x11 = 0x0000000000000000
>>      x12 = 0x0000000000434217  sh..bss + 6871
>>      x13 = 0x0000000000434000  sh..bss + 6336
>>      x14 = 0x0000000000432000  sh`__frame_dummy_init_array_entry
>>      x15 = 0x000000000000003d
>>      x16 = 0x00000000004322b0
>>      x17 = 0x000000004050d090  libc.so.7`close at close.c:48
>>      x18 = 0xaaaaaaaaaaaaaaab
>>      x19 = 0x766564206f666e69
>>      x20 = 0x7865646e692f746e
>>      x21 = 0x69727020676b702f
>>      x22 = 0x746d676d2d737472
>>      x23 = 0x6f7020656d69746e
>>      x24 = 0x75722d7478657474
>>      x25 = 0x65672f6c65766564
>>      x26 = 0x206e6f7369622f6c
>>      x27 = 0x0000000040a53716
>>      x28 = 0x0000000000434000  sh..bss + 6336
>>       fp = 0x616c20346d2f6c65
>>       lr = 0x356c7265702f676e
>>       sp = 0x0000ffffffffe740
>>       pc = 0x356c7265702f676e
>>     cpsr = 0x20000000
>>
>> (lldb) disass
>> error: core file does not contain 0x356c7265702f676e
>> error: Failed to disassemble memory at 0xffffffffffffffff.
>>
>>
>>
>> (lldb) bt
>> * thread #1: tid = 100186, 0x0000000000000000, name = 'sh', stop reason = signal SIGSEGV
>> * frame #0: 0x0000000000000000
>> (lldb) disass
>> error: core file does not contain 0x0
>> error: Failed to disassemble memory at 0xffffffffffffffff.
>> (lldb) register read
>> General Purpose Registers:
>>       x0 = 0x0000000000000000
>>       x1 = 0x0000000000000000
>>       x2 = 0x0000000000000002
>>       x3 = 0x0000000000006c6f
>>       x4 = 0x0000000040a50bb3
>>       x5 = 0x0000000040a499ba
>>       x6 = 0x6f7462696c2f2e2e
>>       x7 = 0x6c6f6f7462696c2f
>>       x8 = 0x0000000000000000
>>       x9 = 0x0000000000000000
>>      x10 = 0x0000000000434000  sh..bss + 6336
>>      x11 = 0x0000000000000000
>>      x12 = 0x0000000040a499f8
>>      x13 = 0x0000000000434000  sh..bss + 6336
>>      x14 = 0x0000000000000001
>>      x15 = 0x0000000000000000
>>      x16 = 0x00000000004322b0
>>      x17 = 0x000000004050d090  libc.so.7`close at close.c:48
>>      x18 = 0x0000000000000000
>>      x19 = 0x0000000000000065
>>      x20 = 0x0000000000000065
>>      x21 = 0x00000000004168f0  sh`readtoken1 + 5212 at parser.c:1602
>>      x22 = 0x0000ffffffffda90
>>      x23 = 0x0000000040a498c0
>>      x24 = 0x000000000000000a
>>      x25 = 0x0000000000000000
>>      x26 = 0x0000000000000000
>>      x27 = 0x0000000040a49258
>>      x28 = 0x0000000000434000  sh..bss + 6336
>>       fp = 0x0000ffffffffda08
>>       lr = 0x0000000000000000
>>       sp = 0x0000ffffffffd970
>>       pc = 0x0000000000000000
>>     cpsr = 0x20000000
>>
>>
>> Looks to me like something major is wrong.
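
Several of the values in the dumps above look like printable ASCII when read
as little-endian bytes: the words lldb labels "unknown opcode" in the second
SIGILL core, the pc/lr value in the SIGBUS core, and x6 in the first SIGILL
core. A minimal Python sketch for checking that, assuming little-endian byte
order (the usual configuration for FreeBSD/arm64); the helper names are made
up for illustration:

```python
import struct

def _printable(raw):
    """Render bytes as ASCII, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

def words_to_text(words):
    """Pack 32-bit words little-endian and show them as text."""
    return _printable(b"".join(struct.pack("<I", w) for w in words))

def reg_to_text(value):
    """Pack one 64-bit register value little-endian and show it as text."""
    return _printable(struct.pack("<Q", value))

# Words lldb printed at pc = 0xffffffffee68 in the second SIGILL core.
second_sigill_words = [
    0x44504d54, 0x2f3d5249, 0x00706d74, 0x4c454853,
    0x622f3d4c, 0x732f6e69, 0x4f430068, 0x4749464e,
]
print(words_to_text(second_sigill_words))  # TMPDIR=/tmp.SHELL=/bin/sh.CONFIG

# pc (and lr) from the SIGBUS core.
print(reg_to_text(0x356c7265702f676e))     # ng/perl5

# x6 from the first SIGILL core.
print(reg_to_text(0x2e2e2f2e2e2f2e2e))     # ../../..
```

If the byte-order assumption holds, the pc in the second SIGILL core was
pointing at what looks like environment strings (TMPDIR=/tmp, SHELL=/bin/sh)
on the stack, and the SIGBUS pc was itself a fragment of path text. That would
be consistent with return addresses having been replaced by string data, but
it does not by itself say how that happened.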


===
Mark Millard
markmi at dsl-only.net

On 2017-Jan-30, at 11:57 PM, Mark Millard <markmi at dsl-only.net> wrote:

> I updated to head -r312982 on the pine64 that I have access to:
>
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT  r312982M  arm64 aarch64 1200020 1200020
>
> after several months of not using the pine64.
> ( -mcpu=cortex-a53 used for buildworld buildkernel;
> non-debug variant of GENERIC [GENERIC included
> then overridden]; usb SSD root file system)
>
> I find that any time some of the cores are busy I get thousands
> of the gic0 spurious interrupt messages in fairly short order.
> (This is not new: it is unchanged.)
>
> For example during either of:
>
> openssl speed
>
> or:
>
> cp /dev/zero /dev/null
> (similarly for copying actual files around,
> local or nfs involved)
>
> Once the cores are no longer busy the gic0 messages stop.
>
> The "on CPU<?>" varies. The "last irq: <?>" varies.
> (But 27 is the most common by far.)
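
To put numbers on which "last irq" and CPU values dominate, the notices can
be tallied from saved console or dmesg output. A minimal Python sketch,
assuming the pine64 messages use the same format as the armv6 notice quoted
later in this message ("gic0: Spurious interrupt detected: last irq: 29 on
CPU1"); the function name is made up for illustration:

```python
import re
from collections import Counter

# Assumed message format, taken from the one notice quoted in this thread.
GIC_RE = re.compile(
    r"gic0: Spurious interrupt detected: last irq: (\d+) on CPU(\d+)"
)

def tally_spurious(lines):
    """Count spurious-interrupt notices by last irq and by CPU number."""
    by_irq, by_cpu = Counter(), Counter()
    for line in lines:
        m = GIC_RE.search(line)
        if m:
            by_irq[int(m.group(1))] += 1
            by_cpu[int(m.group(2))] += 1
    return by_irq, by_cpu

# The single notice actually quoted in this thread (from the bpim3).
sample = ["gic0: Spurious interrupt detected: last irq: 29 on CPU1"]
by_irq, by_cpu = tally_spurious(sample)
print(by_irq.most_common(), by_cpu.most_common())  # [(29, 1)] [(1, 1)]
```

Any iterable of text lines works as input, for example an open file holding
saved dmesg output.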


===
Mark Millard
markmi at dsl-only.net

On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

Note that on the pine64 the network interface hangs from time to time
and I get a core dump with very low frequency from long running
processes, eg the shell that invokes "make world". Note that I had
similar issues on the ODroid-C2.

Currently rebuilding world without MALLOC_PRODUCTION.

The arm64 port is getting close to working 100%, just a last few
glitches.


On Sat 28 Jan 2017 at 22:03, Mark Millard <markmi at dsl-only.net> wrote:
[About: "gic0: Spurious interrupt detected" on armv6 as well.]

On 2017-Jan-28, at 6:43 AM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

> Did a build/install world/kernel with r312916 and MALLOC_PRODUCTION=YES on
> a pine64, removed /etc/malloc.conf, rebooted
>
> and I am now rebuilding the python2 port without problems so far (except
> the "gic0: Spurious interrupt detected" messages which reappeared shortly
> after my previous post)

While very rare, I have seen the gic0 notices on armv6 (e.g., a bpim3)
during large builds (with -j 4). Recently I got a:

gic0: Spurious interrupt detected: last irq: 29 on CPU1

on:

# uname -apKU
FreeBSD bpim3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r312726M: Tue Jan 24 20:57:48 PST 2017     markmi@FreeBSDx64:/usr/obj/bpim3_clang/arm.armv6/usr/src/sys/BPIM3-NODBG  arm armv6 1200020 1200020

while building devel/gcc6 (via a full bootstrap) via -j 4 .

This is from a non-debug buildworld buildkernel context and has
MALLOC_PRODUCTION= in /etc/make.conf . No /etc/malloc.conf present.
I do use -mcpu=cortex-a7 .



Details if you care:

# more /usr/src/sys/arm/conf/BPIM3-NODBG
#
# BPIM3 -- Custom configuration for the Banana Pi M3
#

include "GENERIC"

ident   BPIM3-NODBG

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

options         ALT_BREAK_TO_DEBUGGER

options         KDB                     # Enable kernel debugger support

# For minimum debugger support (stable branch) use:
options         KDB_TRACE               # Print a stack trace for a panic
options         DDB                     # Enable the kernel debugger

# Extra stuff:
#options        VERBOSE_SYSINIT         # Enable verbose sysinit messages
#options        BOOTVERBOSE=1
#options        BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP
##options       KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# Disable any extra checking for. . .
nooptions       DEADLKRES               # Enable the deadlock resolver
nooptions       INVARIANTS              # Enable calls of extra sanity checking
nooptions       INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
nooptions       WITNESS                 # Enable checks to detect deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
nooptions       DIAGNOSTIC


It was from a cross build for buildworld buildkernel :
(I've not checked on lldb builds linking recently.)

# more ~/src.configs/src.conf.bpim3-clang-bootstrap.amd64-host
TO_TYPE=armv6
#
KERNCONF=BPIM3-NODBG
TARGET=arm
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_CROSS_COMPILER=
WITHOUT_SYSTEM_COMPILER=
#
#CPUTYPE=soft
WITH_LIBCPLUSPLUS=
WITH_BINUTILS_BOOTSTRAP=
WITH_CLANG_BOOTSTRAP=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
#
# Linking lldb fails for armv6(/v7)
WITHOUT_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
WITHOUT_LIBSOFT=
#
WITHOUT_ELFTOOLCHAIN_BOOTSTRAP=
WITHOUT_GCC_BOOTSTRAP=
WITHOUT_GCC=
WITHOUT_GCC_IS_CC=
WITHOUT_GNUCXX=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
XCFLAGS+= -mcpu=cortex-a7
XCXXFLAGS+= -mcpu=cortex-a7
# There is no XCPPFLAGS but XCPP gets XCFLAGS content.


Used for buildworld buildkernel :

# more ~/src.configs/make.conf
#MALLOC_PRODUCTION=
#NO_WERROR=
#WERROR=
CFLAGS.gcc+= -v


Used for port builds:

# more /etc/make.conf
WANT_QT_VERBOSE_CONFIGURE=1
#
DEFAULT_VERSIONS+=perl5=5.24
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
WITH_DEBUG_FILES=
MALLOC_PRODUCTION=


# svnlite status /usr/src/ | sort
?       /usr/src/sys/amd64/conf/GENERIC-DBG
?       /usr/src/sys/amd64/conf/GENERIC-NODBG
?       /usr/src/sys/arm/conf/BPIM3-DBG
?       /usr/src/sys/arm/conf/BPIM3-NODBG
?       /usr/src/sys/arm/conf/RPI2-DBG
?       /usr/src/sys/arm/conf/RPI2-NODBG
?       /usr/src/sys/arm64/conf/GENERIC-DBG
?       /usr/src/sys/arm64/conf/GENERIC-NODBG
?       /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG
?       /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
?       /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
?       /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M       /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
M       /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
M       /usr/src/lib/csu/powerpc64/Makefile
M       /usr/src/libexec/rtld-elf/Makefile
M       /usr/src/sys/boot/ofw/Makefile.inc
M       /usr/src/sys/boot/powerpc/Makefile.inc
M       /usr/src/sys/boot/powerpc/kboot/Makefile
M       /usr/src/sys/boot/uboot/Makefile.inc
M       /usr/src/sys/conf/kern.mk
M       /usr/src/sys/conf/kmod.mk
M       /usr/src/sys/ddb/db_main.c
M       /usr/src/sys/ddb/db_script.c
M       /usr/src/sys/modules/zfs/Makefile
M       /usr/src/sys/powerpc/ofw/ofw_machdep.c

The M's are generally tied to powerpc64 and powerpc
explorations. I tend to use the same source for all
the TARGET_ARCH's that I build.


===
Mark Millard
markmi at dsl-only.net


_______________________________________________
freebsd-arm@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"



