Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 7 Nov 2016 22:42:08 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: FYI: a Pine64+ 2G aarch64 head -r308247 crash and some information about it
Message-ID:  <7561E656-3EFC-4338-A419-C35BBA40D07A@dsl-only.net>
In-Reply-To: <B2C3CC2A-B60C-46A4-A359-1FAD0B15862E@dsl-only.net>
References:  <B2C3CC2A-B60C-46A4-A359-1FAD0B15862E@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On 2016-Nov-7, at 8:54 PM, Mark Millard <markmi at dsl-only.net> wrote:

> This is just in case any of the information happens to prove =
useful/interesting.
> I'm not expecting any assistance.
>=20
> Note: After the crash ddb was not responding to input so this is all =
that I have.
>=20
> Note: This was an experiment with head -r308247 but was built like =
stable for
>      performance issues (GENERIC included and then overridden, not =
GENERIC-UP
>      based).
>=20
> The below was found via dmesg and/or /var/log/messages content while =
the Pine64
> was busy building lang/gcc6 and its prerequisites (as a stability =
test).
>=20
> It got lots of spurious interrupt notices per second, such as:
>=20
>> gic0: Spurious interrupt detected: last irq: 27 on CPU0
>> gic0: Spurious interrupt detected: last irq: 27 on CPU2
>> gic0: Spurious interrupt detected: last irq: 106 on CPU3
>> gic0: Spurious interrupt detected: last irq: 27 on CPU1
>> gic0: Spurious interrupt detected: last irq: 27 on CPU1
>> gic0: gic0: Spurious interrupt detected: last irq: 27 on CPU1
>> Spurious interrupt detected: last irq: 27 on CPU3
>> gic0: gic0: Spurious interrupt detected: last irq: 27 on CPU0
>> Spurious interrupt detected: last irq: 27 on CPU1
>> gic0: Spurious interrupt detected: last irq: 27 on CPU1
>> gic0: Spurious interrupt detected: last irq: 27 on CPU2
>> gic0: Spurious interrupt detected: last irq: 27 on CPU3
>=20
> 27 happened the most by far. 106 was fairly rare. I'd not noticed any =
other
> figures. =46rom what I saw all were "gic0".
>=20
> sh had a few signal 11's and one signal 4 as of when I had last =
checked:
>=20
>> pid 13900 (sh), uid 0: exited on signal 11 (core dumped)
>> pid 19325 (sh), uid 0: exited on signal 11 (core dumped)
>> pid 49697 (sh), uid 0: exited on signal 11 (core dumped)
>> pid 68390 (sh), uid 0: exited on signal 4 (core dumped)
>> pid 81149 (sh), uid 0: exited on signal 11 (core dumped)
>=20
> I did not notice any other core dumps.
>=20
> And the following happened once that I noticed:
>=20
>> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 16 a3 a4 80 00 00 40 00=20=

>> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an =
error
>> (da0:umass-sim0:0:0:0): Retrying command
>=20
> The root filesystem was on a USB SSD.
>=20
> The above was all from a ssh session history. The below is from the =
serial
> console. . .
>=20
> Later it got the crash:
>=20
>> panic: ARM64TODO: reclaim_pv_chunk
>> cpuid =3D 2
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>>        pc =3D 0xffff00000068b430  lr =3D 0xffff0000000631dc
>>        sp =3D 0xffff000083758080  fp =3D 0xffff000083758290
>>=20
>> db_trace_self_wrapper() at vpanic+0x170
>>        pc =3D 0xffff0000000631dc  lr =3D 0xffff000000335f10
>>        sp =3D 0xffff0000837582a0  fp =3D 0xffff000083758320
>>=20
>> vpanic() at panic+0x4c
>>        pc =3D 0xffff000000335f10  lr =3D 0xffff000000335d9c
>>        sp =3D 0xffff000083758330  fp =3D 0xffff0000837583b0
>>=20
>> panic() at reclaim_pv_chunk+0x10
>>        pc =3D 0xffff000000335d9c  lr =3D 0xffff0000006a13d4
>>        sp =3D 0xffff0000837583c0  fp =3D 0xffff0000837583c0
>>=20
>> reclaim_pv_chunk() at get_pv_entry+0x2bc
>>        pc =3D 0xffff0000006a13d4  lr =3D 0xffff00000069c270
>>        sp =3D 0xffff0000837583d0  fp =3D 0xffff000083758400
>>=20
>> get_pv_entry() at pmap_enter+0x68c
>>        pc =3D 0xffff00000069c270  lr =3D 0xffff00000069b41c
>>        sp =3D 0xffff000083758410  fp =3D 0xffff0000837584b0
>>=20
>> pmap_enter() at vm_fault_hold+0x2f0
>>        pc =3D 0xffff00000069b41c  lr =3D 0xffff000000641eb8
>>        sp =3D 0xffff0000837584c0  fp =3D 0xffff000083758600
>>=20
>> vm_fault_hold() at vm_fault_quick_hold_pages+0x120
>>        pc =3D 0xffff000000641eb8  lr =3D 0xffff000000645004
>>        sp =3D 0xffff000083758610  fp =3D 0xffff000083758670
>>=20
>> vm_fault_quick_hold_pages() at vn_io_fault1+0x250
>>        pc =3D 0xffff000000645004  lr =3D 0xffff00000042b788
>>        sp =3D 0xffff000083758680  fp =3D 0xffff0000837587c0
>>=20
>> vn_io_fault1() at vn_io_fault+0x170
>>        pc =3D 0xffff00000042b788  lr =3D 0xffff0000004297a4
>>        sp =3D 0xffff0000837587d0  fp =3D 0xffff000083758840
>>=20
>> vn_io_fault() at dofilewrite+0xbc
>>        pc =3D 0xffff0000004297a4  lr =3D 0xffff0000003a35e4
>>        sp =3D 0xffff000083758850  fp =3D 0xffff000083758890
>>=20
>> dofilewrite() at kern_writev+0x6c
>>        pc =3D 0xffff0000003a35e4  lr =3D 0xffff0000003a32d4
>>        sp =3D 0xffff0000837588a0  fp =3D 0xffff0000837588e0
>>=20
>> kern_writev() at sys_write+0x7c
>>        pc =3D 0xffff0000003a32d4  lr =3D 0xffff0000003a3258
>>        sp =3D 0xffff0000837588f0  fp =3D 0xffff000083758930
>>=20
>> sys_write() at do_el0_sync+0x6fc
>>        pc =3D 0xffff0000003a3258  lr =3D 0xffff0000006a2778
>>        sp =3D 0xffff000083758940  fp =3D 0xffff000083758a70
>>=20
>> do_el0_sync() at handle_el0_sync+0x64
>>        pc =3D 0xffff0000006a2778  lr =3D 0xffff00000068d1d0
>>        sp =3D 0xffff000083758a80  fp =3D 0xffff000083758b90
>>=20
>> handle_el0_sync() at 0x696ff0
>>        pc =3D 0xffff00000068d1d0  lr =3D 0x0000000000696ff0
>>        sp =3D 0xffff000083758ba0  fp =3D 0x0000ffffffffc610
>>=20
>> KDB: enter: panic
>> [ thread pid 850 tid 100149 ]
>> Stopped at      kdb_enter+0x40: undefined       d4200000
>=20
>=20
> Side notes:
>=20
> To do the lang/gcc6 test I adjusted lang/gcc6/Makefile to (hopefully) =
allow
> an aarch64 build attempt:
>=20
>> # svnlite diff /usr/ports/lang/gcc6
>> Index: /usr/ports/lang/gcc6/Makefile
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- /usr/ports/lang/gcc6/Makefile	(revision 424540)
>> +++ /usr/ports/lang/gcc6/Makefile	(working copy)
>> @@ -35,7 +35,7 @@
>> DISTVERSION=3D	${PORTVERSION:C/([0-9]+).*\.s([0-9]+)/\1-\2/}
>> GCC_VERSION=3D	${PORTVERSION:C/(.+)\.s[0-9]{8}/\1/}
>> SUFFIX=3D		${PORTVERSION:C/([0-9]+).*/\1/}
>> -ONLY_FOR_ARCHS=3D	amd64 i386 powerpc powerpc64 sparc64 arm armv6 =
armv6hf
>> +ONLY_FOR_ARCHS=3D	amd64 i386 powerpc powerpc64 sparc64 arm armv6 =
armv6hf aarch64
>> USES=3D		compiler cpe gmake iconv libtool makeinfo perl5 =
tar:bzip2
>> USE_BINUTILS=3D	yes
>> USE_PERL5=3D	build
>> @@ -73,7 +73,7 @@
>> USE_GCC=3D	4.9+
>> .endif
>>=20
>> -.if ${ARCH} =3D=3D "armv6" || ${ARCH} =3D=3D "armv6hf"
>> +.if ${ARCH} =3D=3D "armv6" || ${ARCH} =3D=3D "armv6hf" || ${ARCH} =3D=3D=
 "aarch64"
>> . if ${COMPILER_TYPE} =3D=3D clang
>> .  if empty(PORT_OPTIONS:MBOOTSTRAP)
>> MAKE_ARGS+=3DCXXFLAGS=3D-fbracket-depth=3D512
>=20
> It did not make it to lang/gcc6 itself but did build several things.
>=20
> (I did not care if lang/gcc6 really worked or not: Environment =
stability
> test.)
>=20
> =3D=3D=3D
> Mark Millard
> markmi at dsl-only.net

There was fsck activity after cutting and applying power but luckily the =
SSD
content (largely) survived.

The compressed message log shows the following as an example of when the =
"gic0"
Spurious interrupts start:

> Nov  7 18:54:28 pine64 kernel: ugen3.2: <OWC Envoy Pro mini> at usbus3
> Nov  7 18:54:28 pine64 kernel: umass0 on uhub3
> Nov  7 18:54:28 pine64 kernel: umass0: <OWC Envoy Pro mini, class 0/0, =
rev 2.10/1.00, addr 2> on usbus3
> Nov  7 18:54:28 pine64 kernel: umass0:  SCSI over Bulk-Only; quirks =3D =
0x0100
> Nov  7 18:54:28 pine64 kernel: umass0:0:0: Attached to scbus0
> Nov  7 18:54:28 pine64 kernel: mountroot: waiting for device =
/dev/ufs/PINE642Grootfs...
> Nov  7 18:54:28 pine64 kernel: da0 at umass-sim0 bus 0 scbus0 target 0 =
lun 0
> Nov  7 18:54:28 pine64 kernel: da0: <OWC Envoy Pro mini 0> Fixed =
Direct Access SPC-4 SCSI device
> Nov  7 18:54:28 pine64 kernel: da0: Serial Number <REMOVED>
> Nov  7 18:54:28 pine64 kernel: da0: 40.000MB/s transfers
> Nov  7 18:54:28 pine64 kernel: da0: 228936MB (468862128 512 byte =
sectors)
> Nov  7 18:54:28 pine64 kernel: da0: quirks=3D0x2<NO_6_BYTE>
> Nov  7 18:54:28 pine64 kernel: awg0: link state changed to DOWN
> Nov  7 18:54:28 pine64 kernel: gic0: Spurious interrupt detected: last =
irq: 27 on CPU2
> Nov  7 18:54:28 pine64 kernel: awg0: link state changed to UP
> Nov  7 18:54:28 pine64 kernel: gic0: Spurious interrupt detected: last =
irq: 27 on CPU3
> Nov  7 18:54:31 pine64 kernel: .
> Nov  7 18:54:31 pine64 kernel: gic0: Spurious interrupt detected: last =
irq: 27 on CPU2
> Nov  7 18:54:31 pine64 ntpd[602]: ntpd 4.2.8p8-a (1): Starting
> Nov  7 18:54:31 pine64 ntpd[603]: leapsecond file =
('/var/db/ntpd.leap-seconds.list'): good hash signature
> Nov  7 18:54:31 pine64 ntpd[603]: leapsecond file =
('/var/db/ntpd.leap-seconds.list'): loaded, expire=3D2017-06-01T00:00:00Z =
last=3D2017-01-01T00:00:00Z ofs=3D37
> Nov  7 19:36:17 pine64 kernel: gic0: gic0: Spurious interrupt =
detected: last irq: 27 on CPU1
> Nov  7 19:36:17 pine64 kernel: Spurious interrupt detected: last irq: =
106 on CPU3
> Nov  7 19:36:17 pine64 kernel: gic0: Spurious interrupt detected: last =
irq: 27 on CPU1
> Nov  7 19:36:17 pine64 kernel: gic0: Spurious interrupt detected: last =
irq: 27 on CPU2
> Nov  7 19:36:17 pine64 kernel: gic0: Spurious interrupt detected: last =
irq: 27 on CPU0
> Nov  7 19:36:17 pine64 last message repeated 2 times
> Nov  7 19:36:17 pine64 kernel: gic0: gic0: Spurious interrupt =
detected: last irq: 27 on CPU0
> Nov  7 19:36:17 pine64 kernel: Spurious interrupt detected: last irq: =
27 on CPU3

Always shortly after awg0's link state changed to down after the da0 =
information.


In looking at the logs I've seen a 32 and some 114's and 92's as well as =
the 27's and 106's.
There were a few brief periods over which non-27's were fairly frequent.

There are a few of the CAM status message blocks, about 7 over =
everything recorded,
including a few prior boots of the Pine64.


An interesting point is that somewhat before and while the lang/gcc6 =
related build activity
was active the "gic0" notices stopped:

> Nov  8 02:44:40 pine64 kernel: gic0: Spurious interrupt detected: last =
irq: 27 on CPU3
> Nov  8 03:01:15 pine64 kernel: pid 13900 (sh), uid 0: exited on signal =
11 (core dumped)
> Nov  8 03:01:39 pine64 kernel: pid 19325 (sh), uid 0: exited on signal =
11 (core dumped)
> Nov  8 03:02:02 pine64 pkg-static: indexinfo-0.2.5 installed
> Nov  8 03:02:04 pine64 pkg-static: gettext-runtime-0.19.8.1 installed
> Nov  8 03:04:42 pine64 kernel: pid 49697 (sh), uid 0: exited on signal =
11 (core dumped)
> Nov  8 03:05:26 pine64 kernel: pid 68390 (sh), uid 0: exited on signal =
4 (core dumped)
> Nov  8 03:06:10 pine64 kernel: pid 81149 (sh), uid 0: exited on signal =
11 (core dumped)
> Nov  8 03:08:36 pine64 pkg-static: gettext-tools-0.19.8.1 installed
> Nov  8 03:09:07 pine64 pkg-static: gmake-4.2.1_1 installed
> Nov  8 03:27:01 pine64 pkg-static: perl5-5.24.1.r4 installed
> Nov  8 03:27:33 pine64 pkg-static: p5-Locale-gettext-1.06 installed
> Nov  8 03:27:42 pine64 pkg-static: help2man-1.43.3_1 installed
> Nov  8 03:30:22 pine64 pkg-static: texinfo-6.1.20160425,1 installed
> Nov  8 03:31:59 pine64 pkg-static: m4-1.4.17_1,1 installed
> Nov  8 03:34:04 pine64 pkg-static: bison-2.7.1,1 installed
> Nov  8 03:37:44 pine64 pkg-static: gmp-5.1.3_3 installed
> Nov  8 03:39:14 pine64 pkg-static: mpfr-3.1.5 installed
> Nov  8 03:44:53 pine64 kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: =
2a 00 16 bd 19 c0 00 00 80 00=20
> Nov  8 03:44:53 pine64 kernel: (da0:umass-sim0:0:0:0): CAM status: CCB =
request completed with an error
> Nov  8 03:44:53 pine64 kernel: (da0:umass-sim0:0:0:0): Retrying =
command
> Nov  8 03:44:55 pine64 kernel: gic0: Spurious interrupt detected: last =
irq: 106 on CPU3
> Nov  8 03:44:55 pine64 kernel: gic0: Spurious interrupt detected: last =
irq: 106 on CPU3

Somewhat after the mpfr installation above the "gic0" notices return.

devel/binutils started to build but did not complete.



=3D=3D=3D
Mark Millard
markmi at dsl-only.net




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?7561E656-3EFC-4338-A419-C35BBA40D07A>