Date:      Sun, 07 Feb 2016 03:06:21 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 206990] powerpc (32-bit), projects/clang380-import vs. 11.0-CURRENT's sendsig: need to avoid signal delivery trashing the stack and so causing SIGSEGV
Message-ID:  <bug-206990-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206990

            Bug ID: 206990
           Summary: powerpc (32-bit), projects/clang380-import vs.
                    11.0-CURRENT's sendsig: need to avoid signal delivery
                    trashing the stack and so causing SIGSEGV
           Product: Base System
           Version: 11.0-CURRENT
          Hardware: ppc
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: markmi@dsl-only.net

The observed problem:

For a TARGET_ARCH=powerpc clang 3.8.0 based buildworld installation, attempting
"make -j 6 buildworld" (run on 4 powerpc cores) eventually gets a segmentation
fault. (More details later.) "make buildworld" does not fault. (The example
hardware currently in use is a Quad Core PowerMac G5, but not with a 64-bit
buildworld.)

(This is with the content of sys/powerpc/powerpc/sigcode32.S -r295186 in place,
so that that part of the signal delivery maintains the modulo 16 byte
stack/frame alignment for the handler. clang 3.8.0 sometimes generates code
that depends on the alignment in ways gcc 4.2.1's code does not.)
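On a downward-growing stack, "modulo 16 byte alignment" amounts to rounding an address down to a multiple of 16. Rounding down means nothing the interrupted code stored at or above the original address is disturbed. The minimal C sketch below shows only that arithmetic. It is not the sigcode32.S change, which is assembly, and the helper names align_down16 and is_aligned16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round a 32-bit stack address down to the next
 * 16-byte boundary. Rounding down (toward lower addresses) leaves all
 * memory at or above the original address untouched.
 */
static uint32_t align_down16(uint32_t sp)
{
    return sp & ~(uint32_t)15;
}

/* Hypothetical helper: nonzero when an address is 16-byte aligned. */
static int is_aligned16(uint32_t sp)
{
    return (sp & 15u) == 0;
}
```

For example, align_down16(0xffffbb5c) yields 0xffffbb50, and an already aligned address is returned unchanged.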

I used ktrace/kdump commands of the form:

ktrace -di -f /usr/obj/make.out -t cs -p ???
kdump -E -f /usr/obj/make.out -p ??? > /var/tmp/make_ktrace_sigsegv_??.txt

to investigate the context of the SIGSEGVs. Here are example results. They show
the lines that always appear at the end for the failing process, apart from
address and timestamp variations:

 65158 make     0.205791 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
 65158 make     0.205822 CALL  write(0x3,0x189e914,0x1)
 65158 make     0.205847 RET   write 1
 65158 make     0.205869 CALL  sigreturn(0xffffbb50)
 65158 make     0.205923 RET   sigreturn JUSTRETURN
 65158 make     0.205962 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

   599 make     5.552305 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
   599 make     5.552323 CALL  write(0x3,0x189e914,0x1)
   599 make     5.552337 RET   write 1
   599 make     5.552347 CALL  sigreturn(0xffffbb30)
   599 make     5.552358 RET   sigreturn JUSTRETURN
   599 make     5.552381 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

 75728 make     4.141097 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
 75728 make     4.141116 CALL  write(0x3,0x189e914,0x1)
 75728 make     4.141154 RET   write 1
 75728 make     4.141349 CALL  sigreturn(0xffffbaa0)
 75728 make     4.141366 RET   sigreturn JUSTRETURN
 75728 make     4.141404 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

 12195 make     27.213277 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
 12195 make     27.213322 CALL  write(0x3,0x189e914,0x1)
 12195 make     27.213346 RET   write 1
 12195 make     27.213361 CALL  sigreturn(0xffffb1e0)
 12195 make     27.213383 RET   sigreturn JUSTRETURN
 12195 make     27.213418 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

 50545 make     80.255162 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
 50545 make     80.255192 CALL  write(0x3,0x189e914,0x1)
 50545 make     80.255219 RET   write 1
 50545 make     80.255241 CALL  sigreturn(0xffffafa0)
 50545 make     80.255265 RET   sigreturn JUSTRETURN
 50545 make     80.255317 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

Every example SIGSEGV from the "make -j 6 buildworld" attempts looked like that.

Which instance of make failed, and where in make it failed, varied. The "-E"
elapsed times give a solid clue that there is variability in when the fault
happens: it is not some local property of specific code.

I'll use some script log file sizes (in bytes) as another indication of
variability. I've sorted them:

2942664
3304207
3342660
3474585
3941983

So they span from about 2.9 MBytes to 3.9 MBytes. I've since gotten a few
failures with smaller logs and some with larger ones.



The cause:

Comparing clang 3.8.0 generated code for TARGET_ARCH=powerpc to gcc 4.2.1
generated code. . .

clang 3.8.0 based Str_Match preamble (from make):

0x181a4a8 <Str_Match>:     mflr    r0
0x181a4ac <Str_Match+4>:   stw     r31,-4(r1)   # clang's frame pointer (r31) saved
                                                # before the stack pointer changes.
0x181a4b0 <Str_Match+8>:   stw     r0,4(r1)     # lr saved before the stack pointer changes.
0x181a4b4 <Str_Match+12>:  stwu    r1,-32(r1)   # Stack pointer finally saved and changed.
0x181a4b8 <Str_Match+16>:  mr      r31,r1       # r31 is the frame pointer under clang.
0x181a4bc <Str_Match+20>:  stw     r30,24(r31)

gcc 4.2.1 based Str_Match preamble:

0x1819cb8 <Str_Match>:     mflr    r0
0x1819cbc <Str_Match+4>:   stwu    r1,-32(r1)   # Stack pointer saved and changed first.
0x1819cc0 <Str_Match+8>:   stw     r31,28(r1)   # r31 saved after the stack pointer changed.
0x1819cc4 <Str_Match+12>:  mr      r31,r3       # gcc 4.2.1 does not reserve r31 for use
                                                # as a frame pointer.
0x1819cc8 <Str_Match+16>:  stw     r30,24(r1)
0x1819ccc <Str_Match+20>:  stw     r0,36(r1)    # lr saved after the stack pointer changed.
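The ordering difference can be made concrete with a toy C model. This is not kernel code. It simulates a small stack of 4-byte words and a signal frame placed directly below the stack pointer minus an assumed red_zone. It then runs the two prologue orders shown above: clang's "stw r31,-4(r1)" before "stwu r1,-32(r1)", and gcc's "stwu" first. The struct, the function names and the constants are hypothetical. Only the instruction order and the offsets come from the listings, and the "stw r0,4(r1)" store is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS 128          /* hypothetical 512-byte toy stack */
#define SAVED_R31 0x11111111u  /* value the prologue saves for r31 */
#define JUNK      0xdeadbeefu  /* what the simulated signal frame writes */

/* Toy model: byte addresses, 4-byte words, stack grows downward. */
struct toy_stack {
    uint32_t mem[MEM_WORDS];
    uint32_t sp;               /* models r1, as a byte address */
};

static void store(struct toy_stack *s, uint32_t addr, uint32_t v)
{
    s->mem[addr / 4] = v;
}

static uint32_t load(const struct toy_stack *s, uint32_t addr)
{
    return s->mem[addr / 4];
}

/* Simulated signal delivery: overwrite frame_size bytes that end
 * red_zone bytes below the current stack pointer. */
static void deliver_signal(struct toy_stack *s, uint32_t red_zone,
                           uint32_t frame_size)
{
    uint32_t top = s->sp - red_zone;
    for (uint32_t a = top - frame_size; a < top; a += 4)
        store(s, a, JUNK);
}

/* Run a 32-byte-frame prologue with a signal arriving mid-sequence and
 * report whether the saved r31 survived. clang_order != 0 selects the
 * clang 3.8.0 order; 0 selects the gcc 4.2.1 order. */
static int saved_r31_survives(int clang_order, uint32_t red_zone,
                              uint32_t frame_size)
{
    struct toy_stack s;
    memset(&s, 0, sizeof s);
    s.sp = 4 * MEM_WORDS - 64;              /* leave room above sp */
    uint32_t old_sp = s.sp;

    if (clang_order) {
        store(&s, old_sp - 4, SAVED_R31);   /* stw  r31,-4(r1) */
        deliver_signal(&s, red_zone, frame_size);
        store(&s, old_sp - 32, old_sp);     /* stwu r1,-32(r1) */
        s.sp = old_sp - 32;
    } else {
        store(&s, old_sp - 32, old_sp);     /* stwu r1,-32(r1) */
        s.sp = old_sp - 32;
        deliver_signal(&s, red_zone, frame_size);
        store(&s, s.sp + 28, SAVED_R31);    /* stw  r31,28(r1) */
    }
    /* In both orders the slot is old_sp - 4, which equals new sp + 28. */
    return load(&s, old_sp - 4) == SAVED_R31;
}
```

With red_zone set to 0, the clang order loses the saved r31 and the gcc order keeps it. With red_zone set to 224, both orders keep it.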

Picking a different example for postamble code, showing just clang 3.8.0's
code:

0x1801b8c <Buf_AddBytes+104>:  lwz     r30,24(r31)
0x1801b90 <Buf_AddBytes+108>:  lwz     r29,20(r31)
0x1801b94 <Buf_AddBytes+112>:  lwz     r28,16(r31)
0x1801b98 <Buf_AddBytes+116>:  lwz     r27,12(r31)
0x1801b9c <Buf_AddBytes+120>:  lwz     r26,8(r31)
0x1801ba0 <Buf_AddBytes+124>:  addi    r1,r1,32     # Stack pointer adjusted first,
0x1801ba4 <Buf_AddBytes+128>:  lwz     r0,4(r1)
0x1801ba8 <Buf_AddBytes+132>:  lwz     r31,-4(r1)   # then the frame pointer load happens
                                                    # "outside" the new stack range.
0x1801bac <Buf_AddBytes+136>:  mtlr    r0
0x1801bb0 <Buf_AddBytes+140>:  blr

In other words: clang 3.8.0's generated 32-bit powerpc code relies on there
being a safe scratch area below the stack pointer ("below" by memory address).
That is similar to the 224 byte "red zone" area that 32-bit AIX powerpc and
32-bit Darwin powerpc use.
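The red-zone rule reduces to an interval-overlap test. The sketch below is a hypothetical helper, not kernel code. It assumes the signal frame occupies [sp - red_zone - frame_size, sp - red_zone) and asks whether that range overlaps a 4-byte slot stored `below` bytes beneath the interrupted stack pointer. For example, below == 4 models "stw r31,-4(r1)".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper. Frame: [sp - red_zone - frame_size, sp - red_zone)
 * Slot:  [sp - below, sp - below + 4)
 * Computed in 64 bits so realistic 32-bit stack addresses do not wrap.
 */
static int slot_clobbered(uint32_t sp, uint32_t below, uint32_t red_zone,
                          uint32_t frame_size)
{
    uint64_t frame_lo = (uint64_t)sp - red_zone - frame_size;
    uint64_t frame_hi = (uint64_t)sp - red_zone;   /* exclusive */
    uint64_t slot_lo  = (uint64_t)sp - below;
    uint64_t slot_hi  = slot_lo + 4;               /* exclusive */

    /* Half-open intervals overlap iff each starts before the other ends. */
    return frame_lo < slot_hi && slot_lo < frame_hi;
}
```

With no red zone, the slot at -4(r1) is always hit. With a 224-byte red zone, every slot within 224 bytes below the stack pointer is safe, and the first slot past that boundary is exposed again. gcc 4.2.1 style stores, which land at or above the already-moved stack pointer, are never at risk either way.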

But sendsig(sig_t, ksiginfo_t *, sigset_t *) in
sys/powerpc/powerpc/exec_machdep.c only maintains such a scratch area for
64-bit code contexts, where it uses the "288 byte scratch region below the
stack" that 64-bit Darwin and the like use.

So on 32-bit powerpc (and lib32?), sendsig sometimes overwrites the stored
frame pointer value before the matching "lwz r31,-4(r1)" happens. That leads
to segmentation faults later, after the "lwz r31,-4(r1)".

Note: Apart from temporarily "wasting" some bytes, a "red zone"-like scratch
area is also compatible with gcc 4.2.1 style code.


A fix?. . .

I'm testing "make -j 6 buildworld" on the G5 now, based on the following
proof-of-concept patch. It is still running and has gotten much farther than
all prior attempts, but it will be some time before the G5 and G4 tests are
complete.

Index: /usr/src/sys/powerpc/powerpc/exec_machdep.c
===================================================================
--- /usr/src/sys/powerpc/powerpc/exec_machdep.c (revision 295351)
+++ /usr/src/sys/powerpc/powerpc/exec_machdep.c (working copy)
@@ -155,6 +155,31 @@
        ksi->ksi_info.si_addr = (void *)((tf->exc == EXC_DSI) ?
            tf->dar : tf->srr0);

+/*
+ * clang 3.8.0+ for TARGET_ARCH=powerpc (32bit) generates the likes of
+ * "stw r31, -4(r1)", placing its frame pointer (r31) where the stack
+ * pointer does not yet reach. It may well at times put even more out
+ * there before adjusting the stack pointer.
+ *
+ * clang also generates "lwz r31, -4(r1)" after incrementing r1 during
+ * the return sequence: again there is a time during which the frame
+ * pointer storage is outside where the stack pointer reaches.
+ *
+ * Without a "scratch region below the stack" that is respected for
+ * signal delivery the frame pointer value is sometimes trashed and
+ * that leads to later segmentation faults. ("Below" by memory
+ * address viewpoint.)
+ *
+ * Using the AIX/Darwin 224 Byte "red-zone" rule for TARGET_ARCH=powerpc
+ * here is compatible with gcc 4.2.1's code generation that moves the stack
+ * pointer first. (It does then waste some bytes temporarily.) So
+ * have TARGET_ARCH=powerpc be similar to TARGET_ARCH=powerpc64 in its
+ * use of a "scratch region below the stack".
+ *
+ * 224 avoids changing the 16-byte alignment property.
+ */
+#define PPC32_SSCRATCH 224
+
        #ifdef COMPAT_FREEBSD32
        if (SV_PROC_FLAG(p, SV_ILP32)) {
                siginfo_to_siginfo32(&ksi->ksi_info, &siginfo32);
@@ -162,7 +187,7 @@
                code = siginfo32.si_code;
                sfp = (caddr_t)&sf32;
                sfpsize = sizeof(sf32);
-               rndfsize = ((sizeof(sf32) + 15) / 16) * 16;
+               rndfsize = PPC32_SSCRATCH + ((sizeof(sf32) + 15) / 16) * 16;

                /*
                 * Save user context
@@ -191,9 +216,11 @@
                 */
                rndfsize = 288 + ((sizeof(sf) + 47) / 48) * 48;
                #else
-               rndfsize = ((sizeof(sf) + 15) / 16) * 16;
+               rndfsize = PPC32_SSCRATCH + ((sizeof(sf) + 15) / 16) * 16;
                #endif

+#undef PPC32_SSCRATCH
+
                /*
                 * Save user context
                 */
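The size arithmetic in the patch can be checked in isolation. The sketch below copies the three rndfsize formulas into standalone functions: pre-patch 32-bit, patched 32-bit, and the existing 64-bit one. The function names and the signal-frame sizes passed to them are hypothetical; real code would use sizeof(sf32) or sizeof(sf). It is an assumption here that sendsig() reserves rndfsize bytes below the interrupted stack pointer for the frame, as the 64-bit branch's "288 byte scratch region" comment suggests. Under that assumption, the patched value leaves at least 224 bytes between the frame and that stack pointer.

```c
#include <assert.h>
#include <stddef.h>

#define PPC32_SSCRATCH 224   /* value chosen in the patch above */

/* Pre-patch 32-bit sizing: round the frame size up to a multiple of 16. */
static size_t rndfsize_old(size_t sfsize)
{
    return ((sfsize + 15) / 16) * 16;
}

/* Patched 32-bit sizing: same rounding plus the 224-byte scratch region.
 * 224 == 14 * 16, so the sum is still a multiple of 16. */
static size_t rndfsize_new(size_t sfsize)
{
    return PPC32_SSCRATCH + ((sfsize + 15) / 16) * 16;
}

/* Existing 64-bit sizing: 288-byte scratch region plus a multiple of 48.
 * 288 == 18 * 16 and 48 == 3 * 16, so this is also a multiple of 16. */
static size_t rndfsize_64(size_t sfsize)
{
    return 288 + ((sfsize + 47) / 48) * 48;
}
```

For a hypothetical 17-byte frame, the old size is 32, the patched size is 256, and the difference is exactly PPC32_SSCRATCH. Because 224 is a multiple of 16, subtracting it does not change the 16-byte alignment property, which is the point of the patch's last comment line.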

-- 
You are receiving this mail because:
You are the assignee for the bug.


