From: Mark Millard
To: andrew@freebsd.org, Shawn Webb, Tom Vijlbrief
Cc: freebsd-arm
Date: Mon, 13 Feb 2017 21:24:49 -0800
Subject: Re: pine64 (an A64 Cortex-A53 context, e.g. -r312982): sh`forkshell child-process path after fork sometimes has a bad stack pointer value
Message-Id: <93064627-5F72-4167-90B1-0A98ABF4C99C@dsl-only.net>
In-Reply-To: <2D04FF37-DEC8-42CE-961D-AE8CD58A0EAA@dsl-only.net>
References: <2D04FF37-DEC8-42CE-961D-AE8CD58A0EAA@dsl-only.net>
List-Id: "Porting FreeBSD to ARM processors."

[A top post of 7 summaries of more stack pointer changes for before
fork vs. after fork.]

Unless I figure out a way to well-track what is happening internal
to fork's activity this is likely my last post of evidence for
the issue.

Based on the new test that detects both directions and checks the
parent and child cases for fork's return value. . .

All the examples of core dumps report:

(lldb) print pid
(pid_t) $0 = 0

and so go down the child-process code path in sh`forkshell.
There are no examples of the parent code path cases having any
such issue.

I do not show the "print commandname" output but it varies greatly.

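A minimal standalone sketch of the same kind of check, outside of sh, might look like the following. It is not the patch used for these tests: the names probe_stack_address and fork_preserves_stack_address are hypothetical, and the child reports its result through its exit status instead of calling abort(), so that one caller can see both the parent and the child outcome.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, modeled on example_stack_address() from the
 * patch: report the address of a local as a stand-in for "where the
 * stack pointer is" at the time of the call.
 */
__attribute__((noinline))
uintptr_t probe_stack_address(void)
{
	volatile uintptr_t probe = 0;
	return (uintptr_t)(void *)&probe;
}

/*
 * Returns 1 if the probed address is the same before and after fork()
 * in both the parent and the child, 0 if either side saw a change,
 * and -1 if fork() or waitpid() failed.
 */
int fork_preserves_stack_address(void)
{
	uintptr_t before = probe_stack_address();
	pid_t pid = fork();
	uintptr_t after = probe_stack_address();
	int status = 0;

	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(after == before ? 0 : 1);	/* child: report via exit status */

	if (waitpid(pid, &status, 0) == -1)
		return -1;
	int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
	return (after == before) && child_ok;
}
```

Calling fork_preserves_stack_address() in a loop from a small main() would be one way to look for the intermittent case; a single call returning 1 says nothing about whether a later fork would misbehave.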
I have 3 examples of the stack pointer getting smaller instead of
staying at the same value. Each landed in the potential active
stack area and so did not trash other memory areas:

(lldb) up 3
frame #3: 0x000000000040f56c sh`forkshell(jp=, n=, mode=) + 1080 at jobs.c:859
   856          stack_address_before_fork = example_stack_address();
   857          pid = fork();
   858          stack_address_after_fork = example_stack_address();
-> 859          if (stack_address_after_fork != stack_address_before_fork) abort();
   860          if (pid == -1) {
   861                  TRACE(("Fork failed, errno=%d\n", errno));
   862                  INTON;
(lldb) register read
General Purpose Registers:
       x19 = 0x0000000000000000
       x20 = 0x0000000040a33180
       x21 = 0x0000000000000000
       x22 = 0x0000000040abce68
       x23 = 0x0000000000434000  sh..bss + 6336
       x24 = 0x0000000000434000  sh..bss + 6336
       x25 = 0x0000000000000003
       x26 = 0x0000000040a33180
       x27 = 0x0000ffffffffcb08
       x28 = 0x0000000000000000
        fp = 0x0000ffffffffca70
        lr = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
        sp = 0x0000ffffffffc960
        pc = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
20 registers were unavailable.
(lldb) print/x stack_address_before_fork
(uintptr_t) $0 = 0x0000ffffffffca10
(lldb) print/x stack_address_after_fork
(uintptr_t) $1 = 0x0000ffffffffc940

0x0000ffffffffca10-0x0000ffffffffc940 = 0xD0


(lldb) up 3
frame #3: 0x000000000040f56c sh`forkshell(jp=, n=, mode=) + 1080 at jobs.c:859
   856          stack_address_before_fork = example_stack_address();
   857          pid = fork();
   858          stack_address_after_fork = example_stack_address();
-> 859          if (stack_address_after_fork != stack_address_before_fork) abort();
   860          if (pid == -1) {
   861                  TRACE(("Fork failed, errno=%d\n", errno));
   862                  INTON;
(lldb) register read
General Purpose Registers:
       x19 = 0x0000000000000000
       x20 = 0x0000000040a23300
       x21 = 0x0000000000000000
       x22 = 0x0000000040a50510
       x23 = 0x0000000000434000  sh..bss + 6336
       x24 = 0x0000000000434000  sh..bss + 6336
       x25 = 0x0000000040a23300
       x26 = 0x00000000ffffffff
       x27 = 0x0000000000434000  sh..bss + 6336
       x28 = 0x0000000040a50538
        fp = 0x0000ffffffffbdc0
        lr = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
        sp = 0x0000ffffffffbcb0
        pc = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
20 registers were unavailable.
(lldb) print/x stack_address_before_fork
(uintptr_t) $0 = 0x0000ffffffffbd60
(lldb) print/x stack_address_after_fork
(uintptr_t) $1 = 0x0000ffffffffbc90

0x0000ffffffffbd60-0x0000ffffffffbc90 = 0xD0

(No more examples of 0xD0 though.)

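The subtractions for all seven cases in this post can be checked mechanically. The sketch below simply tabulates the before/after, fp, and sp values copied from the lldb output in this message; the struct and function names (fork_sample, sp_delta, print_sample_deltas) are hypothetical. For these seven cases the arithmetic also gives sp - stack_address_after_fork = 0x20 and fp - stack_address_before_fork = 0x60 every time, which would be consistent with fp still matching the pre-fork frame while only sp differs. That is an observation about these numbers, not an established cause.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One core-dump summary: values copied from the lldb output in this post. */
struct fork_sample {
	uint64_t before;	/* stack_address_before_fork */
	uint64_t after;		/* stack_address_after_fork */
	uint64_t fp;		/* frame pointer in sh`forkshell */
	uint64_t sp;		/* stack pointer in sh`forkshell */
};

const struct fork_sample samples[] = {
	{ 0x0000ffffffffca10, 0x0000ffffffffc940, 0x0000ffffffffca70, 0x0000ffffffffc960 },
	{ 0x0000ffffffffbd60, 0x0000ffffffffbc90, 0x0000ffffffffbdc0, 0x0000ffffffffbcb0 },
	{ 0x0000ffffffffcf30, 0x0000ffffffffb520, 0x0000ffffffffcf90, 0x0000ffffffffb540 },
	{ 0x0000ffffffffce80, 0x0000ffffffffd160, 0x0000ffffffffcee0, 0x0000ffffffffd180 },
	{ 0x0000ffffffffca80, 0x0000ffffffffd580, 0x0000ffffffffcae0, 0x0000ffffffffd5a0 },
	{ 0x0000ffffffffc6b0, 0x0000ffffffffce20, 0x0000ffffffffc710, 0x0000ffffffffce40 },
	{ 0x0000ffffffffc9f0, 0x0000ffffffffd880, 0x0000ffffffffca50, 0x0000ffffffffd8a0 },
};

#define NSAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Signed change across fork(): negative means the address got smaller. */
int64_t sp_delta(const struct fork_sample *s)
{
	if (s->after >= s->before)
		return (int64_t)(s->after - s->before);
	return -(int64_t)(s->before - s->after);
}

/* Print one line per sample: direction, magnitude, and the two offsets. */
void print_sample_deltas(void)
{
	for (size_t i = 0; i < NSAMPLES; i++) {
		int64_t d = sp_delta(&samples[i]);
		printf("%zu: %s 0x%" PRIX64 "  sp-after=0x%" PRIX64
		    "  fp-before=0x%" PRIX64 "\n",
		    i, d < 0 ? "smaller by" : "larger by",
		    (uint64_t)(d < 0 ? -d : d),
		    samples[i].sp - samples[i].after,
		    samples[i].fp - samples[i].before);
	}
}
```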
(lldb) up 3
frame #3: 0x000000000040f56c sh`forkshell(jp=, n=, mode=) + 1080 at jobs.c:859
   856          stack_address_before_fork = example_stack_address();
   857          pid = fork();
   858          stack_address_after_fork = example_stack_address();
-> 859          if (stack_address_after_fork != stack_address_before_fork) abort();
   860          if (pid == -1) {
   861                  TRACE(("Fork failed, errno=%d\n", errno));
   862                  INTON;
(lldb) register read
General Purpose Registers:
       x19 = 0x0000000000000000
       x20 = 0x0000000040a36180
       x21 = 0x0000000000000002
       x22 = 0x0000000040a53150
       x23 = 0x0000000000434000  sh..bss + 6336
       x24 = 0x0000000000434000  sh..bss + 6336
       x25 = 0x00000000000000fa
       x26 = 0x0000000040a53e40
       x27 = 0x0000000040a53226
       x28 = 0x0000000000434000  sh..bss + 6336
        fp = 0x0000ffffffffcf90
        lr = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
        sp = 0x0000ffffffffb540
        pc = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
20 registers were unavailable.
(lldb) print/x stack_address_before_fork
(uintptr_t) $0 = 0x0000ffffffffcf30
(lldb) print/x stack_address_after_fork
(uintptr_t) $1 = 0x0000ffffffffb520

0x0000ffffffffcf30-0x0000ffffffffb520 = 0x1A10


For the rest of the examples the stack pointer gets larger,
but by widely variable amounts:

(lldb) up 3
frame #3: 0x000000000040f56c sh`forkshell(jp=, n=, mode=) + 1080 at jobs.c:859
   856          stack_address_before_fork = example_stack_address();
   857          pid = fork();
   858          stack_address_after_fork = example_stack_address();
-> 859          if (stack_address_after_fork != stack_address_before_fork) abort();
   860          if (pid == -1) {
   861                  TRACE(("Fork failed, errno=%d\n", errno));
   862                  INTON;
(lldb) register read
General Purpose Registers:
       x19 = 0x0000000000000000
       x20 = 0x0000000040a350c0
       x21 = 0x0000000000000002
       x22 = 0x0000000040a49898
       x23 = 0x0000000000434000  sh..bss + 6336
       x24 = 0x0000000000434000  sh..bss + 6336
       x25 = 0x00000000000000fb
       x26 = 0x0000000040a49b90
       x27 = 0x0000000040a49918
       x28 = 0x0000000000434000  sh..bss + 6336
        fp = 0x0000ffffffffcee0
        lr = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
        sp = 0x0000ffffffffd180
        pc = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
20 registers were unavailable.
(lldb) print/x stack_address_before_fork
(uintptr_t) $0 = 0x0000ffffffffce80
(lldb) print/x stack_address_after_fork
(uintptr_t) $1 = 0x0000ffffffffd160

0x0000ffffffffd160-0x0000ffffffffce80 = 0x2E0


(lldb) up 3
frame #3: 0x000000000040f56c sh`forkshell(jp=, n=, mode=) + 1080 at jobs.c:859
   856          stack_address_before_fork = example_stack_address();
   857          pid = fork();
   858          stack_address_after_fork = example_stack_address();
-> 859          if (stack_address_after_fork != stack_address_before_fork) abort();
   860          if (pid == -1) {
   861                  TRACE(("Fork failed, errno=%d\n", errno));
   862                  INTON;
(lldb) register read
General Purpose Registers:
       x19 = 0x0000000000000000
       x20 = 0x0000000040a33180
       x21 = 0x0000000000000000
       x22 = 0x0000000040a53748
       x23 = 0x0000000000434000  sh..bss + 6336
       x24 = 0x0000000000434000  sh..bss + 6336
       x25 = 0x0000000040a33180
       x26 = 0x00000000ffffffff
       x27 = 0x0000000000434000  sh..bss + 6336
       x28 = 0x0000000040a53770
        fp = 0x0000ffffffffcae0
        lr = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
        sp = 0x0000ffffffffd5a0
        pc = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
20 registers were unavailable.
(lldb) print/x stack_address_before_fork
(uintptr_t) $0 = 0x0000ffffffffca80
(lldb) print/x stack_address_after_fork
(uintptr_t) $1 = 0x0000ffffffffd580

0x0000ffffffffd580-0x0000ffffffffca80 = 0xB00


(lldb) up 3
frame #3: 0x000000000040f56c sh`forkshell(jp=, n=, mode=) + 1080 at jobs.c:859
   856          stack_address_before_fork = example_stack_address();
   857          pid = fork();
   858          stack_address_after_fork = example_stack_address();
-> 859          if (stack_address_after_fork != stack_address_before_fork) abort();
   860          if (pid == -1) {
   861                  TRACE(("Fork failed, errno=%d\n", errno));
   862                  INTON;
(lldb) register read
General Purpose Registers:
       x19 = 0x0000000000000000
       x20 = 0x0000000040a33180
       x21 = 0x0000000000000000
       x22 = 0x0000000040a9f6b8
       x23 = 0x0000000000434000  sh..bss + 6336
       x24 = 0x0000000000434000  sh..bss + 6336
       x25 = 0x0000000040a33180
       x26 = 0x0000000000000003
       x27 = 0x0000000000434000  sh..bss + 6336
       x28 = 0x0000000040a9f6a8
        fp = 0x0000ffffffffc710
        lr = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
        sp = 0x0000ffffffffce40
        pc = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
20 registers were unavailable.
(lldb) print/x stack_address_before_fork
(uintptr_t) $0 = 0x0000ffffffffc6b0
(lldb) print/x stack_address_after_fork
(uintptr_t) $1 = 0x0000ffffffffce20

0x0000ffffffffce20-0x0000ffffffffc6b0 = 0x770


(lldb) up 3
frame #3: 0x000000000040f56c sh`forkshell(jp=, n=, mode=) + 1080 at jobs.c:859
   856          stack_address_before_fork = example_stack_address();
   857          pid = fork();
   858          stack_address_after_fork = example_stack_address();
-> 859          if (stack_address_after_fork != stack_address_before_fork) abort();
   860          if (pid == -1) {
   861                  TRACE(("Fork failed, errno=%d\n", errno));
   862                  INTON;
(lldb) register read
General Purpose Registers:
       x19 = 0x0000000000000000
       x20 = 0x0000000040a33180
       x21 = 0x0000000000000000
       x22 = 0x0000000040a9be28
       x23 = 0x0000000000434000  sh..bss + 6336
       x24 = 0x0000000000434000  sh..bss + 6336
       x25 = 0x0000000040a33180
       x26 = 0x0000000000000003
       x27 = 0x0000000000434000  sh..bss + 6336
       x28 = 0x0000000040a9bdb0
        fp = 0x0000ffffffffca50
        lr = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
        sp = 0x0000ffffffffd8a0
        pc = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
20 registers were unavailable.
(lldb) print/x stack_address_before_fork
(uintptr_t) $2 = 0x0000ffffffffc9f0
(lldb) print/x stack_address_after_fork
(uintptr_t) $3 = 0x0000ffffffffd880

0x0000ffffffffd880-0x0000ffffffffc9f0 = 0xE90


That is it for evidence, at least for now.

===
Mark Millard
markmi at dsl-only.net

On 2017-Feb-12, at 6:46 PM, Mark Millard wrote:

> On 2017-Feb-12, at 2:52 AM, Mark Millard wrote:
>
>> On pine64 (an A64 Cortex-A53 context) multiple people on the lists
>> including me have reported sh getting occasional core dumps.
>>
>> I've analyzed a bunch of the sh core dumps and all failed in the
>> child-process path of forkshell when forkshell tried to return.
>> I've since done experiments with code to detect some forms of
>> odd stack pointer values so that the adjusted code calls abort
>> for such a detection before such a return would happen. [This
>> gives a nicer context to look at in core dumps (before things
>> are very messed up if the sp is bad).]
>>
>> In sh`forkshell, just after the fork returns, on the child-process
>> path there is sometimes a messed up sp value, judging by what
>> direction it is from the prior frame-pointers on the stack --and on
>> occasion the value difference is very large, such as:
>> (from: lldb register read on the frame with the pc in sh`forkshell )
>>
>>         fp = 0x0000ffffffffce90
>>         sp = 0x0000ffffffffe980
>>
>> This has the sp with a larger address than what sh`__start
>> stored as the frame-pointer back-link when it is put to use via
>> ld-elf.so.1`.rtld_start (more like 0x0000ffffffffde10 as I
>> remember): outside the active stack region.
>>
>> [Note: my experiments so far would not establish if the sp
>> might sometimes have an unexpectedly large distance toward
>> lower memory addresses, especially if it was still in the
>> potential stack-region. It may be that both directions
>> happen.]
>>
>> The distance when it fails is very variable across examples.
>> I just picked an example where stack frames would be written
>> over the top of other material when sh`forkshell makes other
>> calls on the child-process path, material that would be
>> outside what should be the active stack region.
>>
>> # uname -apKU
>> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r312982M arm64 aarch64 1200020 1200020
>>
>> (I've frozen at that version for this exploration.
>> It has taken me a while.)
>>
>> Looking around I see what might be a few possibilities. . .
>> (I'm no expert so some might be trivially eliminated.)
>>
>>
>> Possibility #0 (possibilities in no particular order):
>>
>> sys/arm64/arm64/vm_machdep.c :
>>
>> In cpu_fork what if the bcopy of td1->td_frame might not
>> always have access to the latest updated values, needing
>> some form of memory "fence" to be sure that such values are
>> accessible? :
>>
>>         tf = (struct trapframe *)STACKALIGN((struct trapframe *)pcb2 - 1);
>>         bcopy(td1->td_frame, tf, sizeof(*tf));
>>         tf->tf_x[0] = 0;
>>         tf->tf_x[1] = 0;
>>         tf->tf_spsr = 0;
>>
>>         td2->td_frame = tf;
>>
>>         /* Set the return value registers for fork() */
>>         td2->td_pcb->pcb_x[8] = (uintptr_t)fork_return;
>>         td2->td_pcb->pcb_x[9] = (uintptr_t)td2;
>>         td2->td_pcb->pcb_x[PCB_LR] = (uintptr_t)fork_trampoline;
>>         td2->td_pcb->pcb_sp = (uintptr_t)td2->td_frame;
>>         td2->td_pcb->pcb_fpusaved = &td2->td_pcb->pcb_fpustate;
>>         td2->td_pcb->pcb_vfpcpu = UINT_MAX;
>>
>>         /* Setup to release spin count in fork_exit(). */
>>         td2->td_md.md_spinlock_count = 1;
>>         td2->td_md.md_saved_daif = 0;
>>
>>
>> Possibility #1:
>>
>> sys/arm64/arm64/swtch.S :
>>
>> ENTRY(fork_trampoline)
>> . . .
>>         /* Restore sp and lr */
>>         ldp     x0, x1, [sp]
>>         msr     sp_el0, x0
>>         mov     lr, x1
>>
>> Similar point to #0 but for the ldp memory accesses
>> shown.
>>
>>
>> Possibility #3:
>>
>> sys/arm64/arm64/exception.S :
>>
>> Both of:
>>
>> handle_el0_sync
>> handle_el0_irq
>>
>> also update sp_el0 and so if any such can happen
>> during any part of fork_trampoline after its
>> "msr sp_el0, x0" but before its "msr daifset, #2"
>> (disabling interrupts), then the wrong sp_el0 value
>> would be in place at fork_trampoline's eret .
>>
>>
>> It will be interesting to see what the problem actually
>> was once it has been fixed.
>
> I have updated to a test that should report any time
> fork changes the stack pointer from what it was before
> the fork, parent or child process:
>
> # svnlite diff /usr/src/bin/sh/miscbltin.c /usr/src/bin/sh/jobs.c
> Index: /usr/src/bin/sh/miscbltin.c
> ===================================================================
> --- /usr/src/bin/sh/miscbltin.c	(revision 312982)
> +++ /usr/src/bin/sh/miscbltin.c	(working copy)
> @@ -64,6 +64,15 @@
>
>  #undef eflag
>
> +
> +/* JUST FOR TESTING */
> +uintptr_t example_stack_address(void)
> +{
> +	volatile uintptr_t test = 0;
> +	return (uintptr_t)(void*)&test;
> +}
> +
> +
>  int readcmd(int, char **);
>  int umaskcmd(int, char **);
>  int ulimitcmd(int, char **);
> Index: /usr/src/bin/sh/jobs.c
> ===================================================================
> --- /usr/src/bin/sh/jobs.c	(revision 312982)
> +++ /usr/src/bin/sh/jobs.c	(working copy)
> @@ -51,6 +51,9 @@
>  #include
>  #include
>
> +/* JUST FOR TESTING */
> +#include
> +
>  #include "shell.h"
>  #if JOBS
>  #include
> @@ -833,6 +836,11 @@
>   * in a pipeline).
>   */
>
> +extern uintptr_t example_stack_address(void);
> +
> +uintptr_t stack_address_before_fork = 0;
> +uintptr_t stack_address_after_fork = 0;
> +
>  pid_t
>  forkshell(struct job *jp, union node *n, int mode)
>  {
> @@ -845,7 +853,10 @@
>  	if (mode == FORK_BG && (jp == NULL || jp->nprocs == 0))
>  		checkzombies();
>  	flushall();
> +	stack_address_before_fork = example_stack_address();
>  	pid = fork();
> +	stack_address_after_fork = example_stack_address();
> +	if (stack_address_after_fork != stack_address_before_fork) abort();
>  	if (pid == -1) {
>  		TRACE(("Fork failed, errno=%d\n", errno));
>  		INTON;
> @@ -946,7 +957,6 @@
>  	return pid;
>  }
>
> -
>  pid_t
>  vforkexecshell(struct job *jp, char **argv, char **envp, const char *path, int idx, int pip[2])
>  {
>
>
> I've been using repeated attempts to build devel/aarch64-none-elf-gcc
> for testing because the configure activity does lots of shell work
> that forks. (I first noticed the problem in such a context.)
>
> The first example core file from repeated attempts to build
> devel/aarch64-none-elf-gcc with this new test shows:
>
> (lldb) bt
> * thread #1: tid = 100188, 0x0000000040554e54 libc.so.7`_thr_kill + 8, name = 'sh', stop reason = signal SIGABRT
>   * frame #0: 0x0000000040554e54 libc.so.7`_thr_kill + 8
>     frame #1: 0x0000000040554e18 libc.so.7`__raise(s=6) + 64 at raise.c:52
>     frame #2: 0x0000000040554d8c libc.so.7`abort + 84 at abort.c:65
>     frame #3: 0x000000000040f56c sh`forkshell(jp=, n=, mode=) + 1080 at jobs.c:859
> (lldb) up 3
> frame #3: 0x000000000040f56c sh`forkshell(jp=, n=, mode=) + 1080 at jobs.c:859
>    856          stack_address_before_fork = example_stack_address();
>    857          pid = fork();
>    858          stack_address_after_fork = example_stack_address();
> -> 859          if (stack_address_after_fork != stack_address_before_fork) abort();
>    860          if (pid == -1) {
>    861                  TRACE(("Fork failed, errno=%d\n", errno));
>    862                  INTON;
>
> (lldb) register read
> General Purpose Registers:
>        x19 = 0x0000000000000000
>        x20 = 0x0000000040a350c0
>        x21 = 0x0000000000000002
>        x22 = 0x0000000040a49898
>        x23 = 0x0000000000434000  sh..bss + 6336
>        x24 = 0x0000000000434000  sh..bss + 6336
>        x25 = 0x00000000000000fb
>        x26 = 0x0000000040a49b90
>        x27 = 0x0000000040a49918
>        x28 = 0x0000000000434000  sh..bss + 6336
>         fp = 0x0000ffffffffcee0
>         lr = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
>         sp = 0x0000ffffffffd180
>         pc = 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
> 20 registers were unavailable.
>
> (lldb) print/x stack_address_before_fork
> (uintptr_t) $0 = 0x0000ffffffffce80
>
> (lldb) print/x stack_address_after_fork
> (uintptr_t) $1 = 0x0000ffffffffd160
>
> (lldb) print/x main_handler
> (jmploc) $2 = {
>   loc = {
>     [0] = {
>       _jb = {
>         [0] = 0x0000ffffffffdb60fb5d25837d7ff700
>         [1] = 0x00000000000000170000ffffffffdc08
>         [2] = 0x00000000004320380000000000434a10
>         [3] = 0x00000000000000000000000000000000
>         [4] = 0x00000000000000000000000000000000
>         [5] = 0x00000000000000000000000000000000
>         [6] = 0x0000000000410c800000ffffffffdbb0
> . . .
>
> (main_handler is from setjmp in main.)
>
> In this example of the context for sh`forkshell after the fork:
>
> fp = 0x0000ffffffffcee0 < sp = 0x0000ffffffffd180 < 0x0000ffffffffdbb0
>
> That explains why the bt stopped with frame #3: part of the stack
> had been trashed by the calls to example_stack_address and to abort
> (and what its internal call chain involves), unlike the prior example
> that trashed memory outside the active stack region.
>
> Compared to the prior example's sp = 0x0000ffffffffe980 the
> distance from the frame pointer in sh`forkshell is much smaller this
> time --but it is still in the same (wrong) direction.
>
> This illustrates the variability in the bad sp value from bad-case
> to bad-case.
>
> I plan to leave the repeated builds running for a time to accumulate
> some more core files based on this new test. If any examples happen
> of sp decreasing instead of increasing, the code should core dump for
> those as well, unlike my prior testing.
>
>
>
> Context details:
>
> # more /usr/src/sys/arm64/conf/GENERIC-NODBG
> #
> # GENERIC -- Custom configuration for the arm64/aarch64
> #
>
> include "GENERIC"
>
> ident   GENERIC-NODBG
>
> makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols
>
> options         ALT_BREAK_TO_DEBUGGER
>
> options         KDB                     # Enable kernel debugger support
>
> # For minimum debugger support (stable branch) use:
> #options        KDB_TRACE               # Print a stack trace for a panic
> options         DDB                     # Enable the kernel debugger
>
> # Extra stuff:
> #options        VERBOSE_SYSINIT         # Enable verbose sysinit messages
> #options        BOOTVERBOSE=1
> #options        BOOTHOWTO=RB_VERBOSE
> #options        KTR
> #options        KTR_MASK=KTR_TRAP
> ##options       KTR_CPUMASK=0xF
> #options        KTR_VERBOSE
>
> # Disable any extra checking for. . .
> nooptions       DEADLKRES               # Enable the deadlock resolver
> nooptions       INVARIANTS              # Enable calls of extra sanity checking
> nooptions       INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
> nooptions       WITNESS                 # Enable checks to detect deadlocks and cycles
> nooptions       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
> nooptions       DIAGNOSTIC
> nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones
>
>
> The -r312982 variant is a cross build with -mcpu=cortex-a53
> specified:
>
> # more ~/src.configs/src.conf.pine64-clang-bootstrap.amd64-host
> TO_TYPE=aarch64
> TOOLS_TO_TYPE=${TO_TYPE}
> #
> KERNCONF=GENERIC-NODBG
> TARGET=arm64
> .if ${.MAKE.LEVEL} == 0
> TARGET_ARCH=${TO_TYPE}
> .export TARGET_ARCH
> .endif
> #
> WITH_CROSS_COMPILER=
> WITHOUT_SYSTEM_COMPILER=
> #
> WITH_LIBCPLUSPLUS=
> WITHOUT_BINUTILS_BOOTSTRAP=
> WITHOUT_ELFTOOLCHAIN_BOOTSTRAP=
> WITH_CLANG_BOOTSTRAP=
> WITH_CLANG=
> WITH_CLANG_IS_CC=
> WITH_CLANG_FULL=
> WITH_CLANG_EXTRAS=
> WITH_LLD=
> WITH_LLDB=
> #
> WITH_BOOT=
> WITHOUT_LIB32=
> WITHOUT_LIBSOFT=
> #
> WITHOUT_GCC_BOOTSTRAP=
> WITHOUT_GCC=
> WITHOUT_GCC_IS_CC=
> WITHOUT_GNUCXX=
> #
> NO_WERROR=
> #WERROR=
> MALLOC_PRODUCTION=
> #
> WITH_REPRODUCIBLE_BUILD=
> WITH_DEBUG_FILES=
> #
> CROSS_BINUTILS_PREFIX=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/
> XCFLAGS+= -B${CROSS_BINUTILS_PREFIX}
> XCXXFLAGS+= -B${CROSS_BINUTILS_PREFIX}
> # There is no XCPPFLAGS but XCPP gets XCFLAGS content.
> #
> XCFLAGS+= -mcpu=cortex-a53
> XCXXFLAGS+= -mcpu=cortex-a53
> # There is no XCPPFLAGS but XCPP gets XCFLAGS content.
>
>
> In order to cut down on the Spurious interrupt notices on the
> console I eventually did the below personal adjustment specific
> to my context (and it is in use above):
>
> # svnlite diff /usr/src/sys/arm/arm/gic.c
> Index: /usr/src/sys/arm/arm/gic.c
> ===================================================================
> --- /usr/src/sys/arm/arm/gic.c	(revision 312982)
> +++ /usr/src/sys/arm/arm/gic.c	(working copy)
> @@ -672,9 +672,13 @@
>
>  	if (irq >= sc->nirqs) {
>  #ifdef GIC_DEBUG_SPURIOUS
> +#define EXPECTED_SPURIOUS_IRQ 1023
> +		if (irq != EXPECTED_SPURIOUS_IRQ) {
>  		device_printf(sc->gic_dev,
> -		    "Spurious interrupt detected: last irq: %d on CPU%d\n",
> +		    "Spurious interrupt %d detected of %d: last irq: %d on CPU%d\n",
> +		    irq, sc->nirqs,
>  		    sc->last_irq[PCPU_GET(cpuid)], PCPU_GET(cpuid));
> +		}
>  #endif
>  		return (FILTER_HANDLED);
>  	}
> @@ -720,6 +724,16 @@
>  	if (irq < sc->nirqs)
>  		goto dispatch_irq;
>
> +	if (irq != EXPECTED_SPURIOUS_IRQ) {
> +#undef EXPECTED_SPURIOUS_IRQ
> +#ifdef GIC_DEBUG_SPURIOUS
> +		device_printf(sc->gic_dev,
> +		    "Spurious end interrupt %d detected of %d: last irq: %d on CPU%d\n",
> +		    irq, sc->nirqs,
> +		    sc->last_irq[PCPU_GET(cpuid)], PCPU_GET(cpuid));
> +#endif
> +	}
> +
>  	return (FILTER_HANDLED);
>  }
>
> The added message block was to report unexpected values for the
> exit path. None have been reported from there.
>
> The gic.c changes eliminated all the Spurious interrupt notices --which
> made the console more reasonable to use.
>
> But so far I've not come up with a way to discover what leads
> to the irq==1023 reports for the original message context.
> [Interrupts handled by one core but sent to multiple cores?]
>
>
> Other notes:
>
> I switched to Andrew Turner's freebsd Email address. It might be more
> appropriate. I also fixed the mistyped "Contex-a53" in the Subject to
> say "Cortex-a53".
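
The fp / sp / upper-address comparisons made in the quoted messages can be written as a small helper. This is only a sketch of that reasoning: the enum and function names (sp_region, classify_sp) are hypothetical, and upper_ref stands for whatever known address higher up the stack is available, such as the roughly 0x0000ffffffffde10 back-link or the 0x0000ffffffffdbb0 value from main_handler mentioned above.

```c
#include <stdint.h>

/* Where a post-fork sp sits relative to the frame pointer and a reference. */
enum sp_region {
	SP_AT_OR_BELOW_FP,		/* usual direction; may still be too low */
	SP_INSIDE_OUTER_FRAMES,		/* fp < sp <= upper_ref: calls overwrite live frames */
	SP_ABOVE_UPPER_REF		/* sp > upper_ref: outside the active stack region */
};

/*
 * Classify sp using the frame pointer of the current frame and a known
 * address higher up the stack (hypothetical parameter upper_ref).
 */
enum sp_region classify_sp(uint64_t fp, uint64_t sp, uint64_t upper_ref)
{
	if (sp <= fp)
		return SP_AT_OR_BELOW_FP;
	if (sp <= upper_ref)
		return SP_INSIDE_OUTER_FRAMES;
	return SP_ABOVE_UPPER_REF;
}
```

With the values quoted above, fp = 0x0000ffffffffce90 and sp = 0x0000ffffffffe980 against 0x0000ffffffffde10 classifies as above the reference, while fp = 0x0000ffffffffcee0 and sp = 0x0000ffffffffd180 against 0x0000ffffffffdbb0 classifies as inside the outer frames, matching the two descriptions in the quoted text.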