From: Mark Millard
Date: Tue, 2 Feb 2016 01:48:43 -0800
Subject: Re: 3 quick questions about stack alignment for powerpc (32-bit) signal handlers [the change that caused misaligned]
Cc: Roman Divacky, Nathan Whitehorn, FreeBSD Toolchain, FreeBSD PowerPC ML
Message-Id: <8D38E67E-B798-4EFD-951F-DADFDBAEDD8A@dsl-only.net>
To: Justin Hibbits

I tried the change to -32 and 32 (from -20 and 20) on/for the powerpc (32-bit) PowerMac that I use and the results were:

A) "info frame" in gdb shows signal handlers are now started with 16-byte aligned stack frames. (Applies to gcc 4.2.1 based contexts too, not just to the clang 3.8.0 ones with the __vfprintf-tied segmentation faults during signals.)

and. . .

B) The "clang 3.8.0 compiled __vfprintf" segmentation faults in libc/stdio library code during signal handlers that use such code no longer happen because the alignment matches the code requirements.

I've added this information to Bug 206810.

(Note: There are a couple of segmentation fault contexts that I've never tied down to any specific property: no discovered evidence of signal handler involvement or of __vfprintf involvement, for example. These are still a problem. But the cases where I had tied the faults to signal handlers using __vfprintf now work fine in my experimental clang 3.8.0 based builds.)

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-1, at 12:11 AM, Mark Millard wrote:

The -16/16 code below produced correct alignment but too little space.

The -20/20 code below produces enough space but misalignment.

To maintain 16-byte alignment while increasing the space would have required going from -16/16 to -32/32.
At least that is how I understand this code.

> Index: sys/powerpc/powerpc/sigcode32.S
> ===================================================================
> --- sys/powerpc/powerpc/sigcode32.S (.../head/sys/powerpc/powerpc/sigcode32.S) (revision 209975)
> +++ sys/powerpc/powerpc/sigcode32.S (.../projects/clang380-import/sys/powerpc/powerpc/sigcode32.S) (working copy)
> @@ -45,9 +45,9 @@
>   */
>          .globl  CNAME(sigcode32),CNAME(szsigcode32)
>  CNAME(sigcode32):
> -        addi    1,1,-16         /* reserved space for callee */
> +        addi    1,1,-20         /* reserved space for callee */
>          blrl
> -        addi    3,1,16+SF_UC    /* restore sp, and get &frame->sf_uc */
> +        addi    3,1,20+SF_UC    /* restore sp, and get &frame->sf_uc */
>          li      0,SYS_sigreturn
>          sc                      /* sigreturn(scp) */
>          li      0,SYS_exit

The "working copy" is -r266778 from 2014-May-27.
-r209975 is from 2010-Jul-13.

===
Mark Millard
markmi at dsl-only.net

On 2016-Jan-31, at 10:58 PM, Mark Millard wrote:

Just a correction to a sentence that I wrote. I had written:

> Frame at:            0x...90 vs. 0x...1c
> called by frame:     0x...b0 vs. 0x...1c
> Arglist at:          0x...70 vs. 0x...dc
> Locals at:           0x...70 vs. 0x...dc
> Previous frame's sp: 0x...90 vs. 0x...1c
>
> It looks like 4 additional pad bytes on the user/process stack are needed to get back to alignment.

Of course the figures on the right need to get smaller, not larger: The stack grows towards smaller addresses. So to get to 0x...0 on the right I should have said:

It looks like 12 additional pad bytes on the user/process stack are needed to get back to alignment. That would produce:

Frame at:            0x...90 vs. 0x...10
called by frame:     0x...b0 vs. 0x...10
Arglist at:          0x...70 vs. 0x...d0
Locals at:           0x...70 vs. 0x...d0
Previous frame's sp: 0x...90 vs. 0x...10

===
Mark Millard
markmi at dsl-only.net

On 2016-Jan-31, at 10:47 PM, Mark Millard wrote:

More evidence: By adding "break raise" and then using "info frame" to show the alignment at that point I can show that the later signal delivery changes the alignment on the user process stack compared to when raise was called. (Later I show the same for thr_kill.)

> Breakpoint 2, __raise (s=29) at /usr/src/lib/libc/gen/raise.c:50
> warning: Source file is more recent than executable.
> 50              if (__sys_thr_self(&id) == -1)
> (gdb) info frame
> Stack level 0, frame at 0xffffdc90:
>  pc = 0x41904630 in __raise (/usr/src/lib/libc/gen/raise.c:50); saved pc = 0x1800774
>  called by frame at 0xffffdcb0
>  source language c.
>  Arglist at 0xffffdc70, args: s=29
>  Locals at 0xffffdc70, Previous frame's sp is 0xffffdc90
>  Saved registers:
>   r29 at 0xffffdc84, r30 at 0xffffdc88, r31 at 0xffffdc8c, pc at 0xffffdc94, lr at 0xffffdc94
> (gdb) cont
> Continuing.
>
> Program received signal SIGINFO, Information request.
>
> Breakpoint 1, 0x018006d0 in handler ()
> (gdb) info frame
> Stack level 0, frame at 0xffffd71c:
>  pc = 0x18006d0 in handler; saved pc = 0xffffe008
>  called by frame at 0xffffd71c
>  Arglist at 0xffffd6dc, args:
>  Locals at 0xffffd6dc, Previous frame's sp is 0xffffd71c
>  Saved registers:
>   r31 at 0xffffd718, pc at 0xffffd720, lr at 0xffffd720

Note the difference (raise before delivery vs. handler via delivery):

Frame at:            0x...90 vs. 0x...1c
called by frame:     0x...b0 vs. 0x...1c
Arglist at:          0x...70 vs. 0x...dc
Locals at:           0x...70 vs. 0x...dc
Previous frame's sp: 0x...90 vs. 0x...1c

It looks like 4 additional pad bytes on the user/process stack are needed to get back to alignment.

[The span of addresses seems to be about: 0xffffdc90-0xffffd6dc==0x5B4==1460 (raise's "frame at" minus handler's "Locals at").]
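The arithmetic behind these "info frame" comparisons can be checked with a small C sketch. The helper names below (misalignment, align_down, reserve_keeps_alignment) are made up for illustration and are not FreeBSD code; the only inputs taken from this thread are the addresses gdb reported and the 16-byte alignment under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of FreeBSD. */

/* Number of bytes by which addr sits above the nearest lower multiple of
 * align. A result of 0 means addr is aligned. align must be a power of two. */
uint32_t misalignment(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & (align - 1);
}

/* The stack grows toward smaller addresses, so restoring alignment means
 * moving the address down to the next lower multiple of align. */
uint32_t align_down(uint32_t addr, uint32_t align)
{
    return addr - misalignment(addr, align);
}

/* Nonzero when subtracting `reserve` bytes from an already aligned stack
 * pointer leaves it aligned, as with the addi 1,1,-N adjustment in the diff. */
int reserve_keeps_alignment(uint32_t reserve, uint32_t align)
{
    return misalignment(reserve, align) == 0;
}
```

Under these definitions, misalignment(0xffffdc90, 16) is 0 for the raise frame and misalignment(0xffffd71c, 16) is 12 for the handler frame, which agrees with the corrected figure of 12 pad bytes given in the follow-up message. Likewise a reserve of 16 or 32 bytes keeps 16-byte alignment, while 20 does not.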
If I look at the frame for "break thr_kill" it also still shows an aligned user/process stack before the delivery:

> Breakpoint 3, 0x419046a0 in thr_kill () from /lib/libc.so.7
> (gdb) info frame
> Stack level 0, frame at 0xffffdc70:
>  pc = 0x419046a0 in thr_kill; saved pc = 0x41904650
>  called by frame at 0xffffdc90
>  Arglist at 0xffffdc70, args:
>  Locals at 0xffffdc70, Previous frame's sp is 0xffffdc70

(The relevant addresses are the same as raise showed.)

Reminder of the source program structure that uses the potentially frame/stack alignment sensitive libc/stdio library code:

> # more sig_snprintf_use_test.c
> #include <signal.h> // for signal, SIGINFO, SIG_ERR, raise.
> #include <stdio.h>  // for snprintf
>
> void handler(int sig)
> {
>     char buf[32];
>     snprintf(buf, sizeof buf, "%d", sig); // FreeBSD's world does such
>                                           // things in some of its handlers.
> }
>
> int main(void)
> {
>     handler(0); // handler gets aligned stack frame for this; snprintf works here.
>     if (signal(SIGINFO, handler) != SIG_ERR) raise(SIGINFO);
>             // raise gets aligned stack frame;
>             // handler gets misaligned stack frame;
>             // snprintf/__vfprintf/io_flush/__sfvwrite/memcpy:
>             //     when built by clang 3.8.0 are sensitive to
>             //     the misalignment.
>     return 0;
> }

===
Mark Millard
markmi at dsl-only.net

On 2016-Jan-31, at 9:12 PM, Mark Millard wrote:

A summary of the later finding details for what I've done so far:

It is system library code (__vfprintf and its inline io_flush call to __sfvwrite) that may produce and use a potentially bad &iop->uio address, depending on the mix of how the calculation works and the stack/frame alignment present in signal delivery. The gcc 4.2.1 vs. clang 3.8.0 program status makes no difference to whether it ends up with a segmentation fault or not.
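Why "how the calculation works" matters can be illustrated with a short C sketch. This is not the code that clang 3.8.0 or gcc 4.2.1 actually emits for &iop->uio. It only models the two ways of forming base+offset that this thread contrasts (addition versus OR-ing the offset bits into a base that is assumed aligned), and every function name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model for illustration only; not compiler output. */

/* Offset applied by addition: correct for any base address. */
uint32_t field_addr_add(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* Offset applied by OR-ing its bits into the base: equal to base + offset
 * only when the base has zeros in every bit position the offset uses. */
uint32_t field_addr_or(uint32_t base, uint32_t offset)
{
    return base | offset;
}

/* Nonzero when the OR form and the addition form agree for this base. */
int or_matches_add(uint32_t base, uint32_t offset)
{
    return field_addr_or(base, offset) == field_addr_add(base, offset);
}

/* Variant that refuses the OR form when its alignment precondition fails. */
uint32_t field_addr_or_checked(uint32_t base, uint32_t offset)
{
    assert((base & offset) == 0); /* base must be clear in the offset's bits */
    return base | offset;
}
```

With an offset of 4, a 16-byte aligned base such as 0xffffd710 gives 0xffffd714 either way. A base that is only 4-byte aligned, such as 0xffffd71c, gives 0xffffd720 by addition but stays 0xffffd71c under OR, so code using the OR form reads or writes the wrong address.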
When __vfprintf and its inline io_flush call to __sfvwrite are compiled by gcc 4.2.1 --which always uses addition for offsets, avoiding alignment assumptions-- no variant of the program gets a segmentation fault. gcc 4.2.1 does not create the dependency on the alignment that clang 3.8.0 does. Yet the misalignment is present. (See the details.)

When clang 3.8.0 compiles __vfprintf and its inline io_flush call to __sfvwrite --which uses masking for the offset in calculating &iop->uio, making alignment assumptions-- every variant of the program gets a segmentation fault. (The misalignment is still present.)

The details for the misalignment evidence follow.

For (C) "on a pure gcc 4.2.1 buildworld/buildkernel system". . .

C0) When gcc421-a.out gets signal delivery to its handler, "info frame" in this (C) context:

This *has* a misaligned signal delivery stack but there is no segmentation fault.

> Program received signal SIGINFO, Information request.
>
> Breakpoint 1, 0x018006e0 in handler ()
> (gdb) bt
> #0 0x018006e0 in handler ()
> #1
> #2 0x00000000 in ?? ()
> (gdb) info frame
> Stack level 0, frame at 0xffffd73c:
>  pc = 0x18006e0 in handler; saved pc = 0xffffe008
>  called by frame at 0xffffd73c
>  Arglist at 0xffffd6fc, args:
>  Locals at 0xffffd6fc, Previous frame's sp is 0xffffd73c
>  Saved registers:
>   r31 at 0xffffd738, pc at 0xffffd740, lr at 0xffffd740

So misaligned (multiple of 4 but of no higher power of 2) for "frame at", "called by frame at" (which is listed as the same as "frame at"), "Arglist", "Locals", and "Previous frame's sp" (which is listed as the same as "frame at").

In this case I also list __vfprintf's misalignment evidence for reference: (break __vfprintf used.)

> (gdb) info frame
> Stack level 0, frame at 0xffffd57c:
>  pc = 0x41930af8 in __vfprintf (/usr/src/lib/libc/stdio/vfprintf.c:452); saved pc = 0x41992e18
>  called by frame at 0xffffd6fc
>  source language c.
>  Arglist at 0xffffd29c, args: fp=0xffffd5dc, locale=0x419c41e0 <__xlocale_global_locale>, fmt0=0x1800a1c "%d", ap=0xffffd6cc
>  Locals at 0xffffd29c, Previous frame's sp is 0xffffd57c
>  Saved registers:
>   r30 at 0xffffd574, r31 at 0xffffd578, pc at 0xffffd580, lr at 0xffffd580

So misaligned (multiple of 4 but of no higher power of 2) for "frame at", "called by frame at", "Arglist", "Locals", and "Previous frame's sp" (which is listed as the same as "frame at").

Just to have one for reference, here is the "info frame" for the direct handler call --which gets a properly aligned frame/stack:

> (gdb) info frame
> Stack level 0, frame at 0xffffdcc0:
>  pc = 0x18006e0 in handler; saved pc = 0x1800734
>  called by frame at 0xffffdcd0
>  Arglist at 0xffffdc80, args:
>  Locals at 0xffffdc80, Previous frame's sp is 0xffffdcc0
>  Saved registers:
>   r31 at 0xffffdcbc, pc at 0xffffdcc4, lr at 0xffffdcc4

Only the signal delivery is creating non-aligned stack frames.

C1) When clang380-a.out gets signal delivery to its handler, "info frame" in this (C) context:

This *has* a misaligned signal delivery stack but there is no segmentation fault.

> (gdb) info frame
> Stack level 0, frame at 0xffffd70c:
>  pc = 0x18006d0 in handler; saved pc = 0xffffe008
>  called by frame at 0xffffd70c
>  Arglist at 0xffffd6cc, args:
>  Locals at 0xffffd6cc, Previous frame's sp is 0xffffd70c
>  Saved registers:
>   r31 at 0xffffd708, pc at 0xffffd710, lr at 0xffffd710

So misaligned (multiple of 4 but of no higher power of 2) for "frame at", "called by frame at", "Arglist", "Locals", and "Previous frame's sp" (which is listed as the same as "frame at").

For (B) "on a clang 3.8.0 buildworld and gcc 4.2.1 buildkernel mix". . .

B0) When gcc421-a.out gets signal delivery to its handler, "info frame" in this (B) context:

This *has* a misaligned signal delivery stack and there *is* a segmentation fault.

> Program received signal SIGINFO, Information request.
>
> Breakpoint 1, 0x018006e0 in handler ()
> (gdb) bt
> #0 0x018006e0 in handler ()
> #1
> #2 0x00000000 in ?? ()
> (gdb) info frame
> Stack level 0, frame at 0xffffd74c:
>  pc = 0x18006e0 in handler; saved pc = 0xffffe008
>  called by frame at 0xffffd74c
>  Arglist at 0xffffd70c, args:
>  Locals at 0xffffd70c, Previous frame's sp is 0xffffd74c
>  Saved registers:
>   r31 at 0xffffd748, pc at 0xffffd750, lr at 0xffffd750
> (gdb) cont
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x419a89c8 in memcpy (dst0=0xffffd714, src0=, length=) at /usr/src/lib/libc/string/bcopy.c:124
> warning: Source file is more recent than executable.
> 124             TLOOP1(*--dst = *--src);

B1) When clang380-a.out gets signal delivery to its handler, "info frame" in this (B) context:
(i.e., what I originally reported on and submitted a Bug report for)

This *has* a misaligned signal delivery stack and there *is* a segmentation fault.

> Program received signal SIGINFO, Information request.
>
> Breakpoint 1, 0x018006d0 in handler ()
> (gdb) info frame
> Stack level 0, frame at 0xffffd71c:
>  pc = 0x18006d0 in handler; saved pc = 0xffffe008
>  called by frame at 0xffffd71c
>  Arglist at 0xffffd6dc, args:
>  Locals at 0xffffd6dc, Previous frame's sp is 0xffffd71c
>  Saved registers:
>   r31 at 0xffffd718, pc at 0xffffd720, lr at 0xffffd720
> (gdb) cont
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x419a89c8 in memcpy (dst0=0xffffd6f4, src0=, length=) at /usr/src/lib/libc/string/bcopy.c:124
> warning: Source file is more recent than executable.
> 124             TLOOP1(*--dst = *--src);

So misaligned (multiple of 4 but of no higher power of 2) for "frame at", "called by frame at" (which is listed as the same as "frame at"), "Arglist", "Locals", and "Previous frame's sp" (which is listed as the same as "frame at").

More context notes. . .
The "pure gcc 4.2.1 buildworld/buildkernel system" has:

# freebsd-version -ku; uname -aKU
11.0-CURRENT
11.0-CURRENT
FreeBSD FBSDG4C0 11.0-CURRENT FreeBSD 11.0-CURRENT #5 r294960M: Wed Jan 27 18:25:04 PST 2016 root@FBSDG4C0:/usr/obj/gcc421/powerpc.powerpc/usr/src/sys/GENERICvtsc-NODEBUG powerpc 1100097 1100097

The "clang 3.8.0 buildworld and gcc 4.2.1 buildkernel mix" has:

# freebsd-version -ku; uname -aKU
11.0-CURRENT
11.0-CURRENT
FreeBSD FBSDG4C1 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r294962M: Fri Jan 29 18:28:17 PST 2016 markmi@FreeBSDx64:/usr/obj/clang_gcc421/powerpc.powerpc/usr/src/sys/GENERICvtsc-NODEBUG powerpc 1100097 1100097

(Same PowerMac, different SSD.)

[I have renamed a.out's to indicate compiler context as I've gone along.]
[I copied each a.out to the other SSD for use after compiling/linking.]
[I'm not generally showing the "direct call" properly aligned "info frame" texts.]
[handle SIGINFO nostop print pass; break handler used in gdb 7.10_5.]
[For gcc 4.2.1 I used: gcc -std=c99 -Wall sig_snprintf_use_test.c .]
[For clang 3.8.0 I used: clang -std=c11 -Wall -Wpedantic sig_snprintf_use_test.c .]

===
Mark Millard
markmi at dsl-only.net

On 2016-Jan-31, at 6:32 PM, Mark Millard wrote:

> [I've never noticed gcc 4.2.1 generating code that was based on presuming the alignment was present. For example: it always seems to use addition to deal with address offsets, never masking. So I'd not expect to see segmentation faults for that context even when the stack is aligned modulo only 4. Separately checking the alignment is appropriate for me to do.]
>
> A) The reported context:
>
> The kernel context here is a gcc 4.2.1 based buildkernel then installkernel.
> The world context here is a clang 3.8.0 based buildworld then installworld.
> The program context here is a clang 3.8.0 based:
>
>> # clang -std=c11 -Wall -Wpedantic sig_snprintf_use_test.c
>> # /usr/local/bin/gdb a.out
>
>
> Using "break handler" in gdb (7.10_5) and using "info frame" when it stops for the "raise" shows the misalignment of the frame that the handler was given by the signal delivery.
>
> By contrast the earlier direct call of the handler gets an "info frame" result that shows the expected sort of alignment.
>
> I find no evidence of frame/stack misalignment via gdb except for the one that is created by the signal delivery.
>
>
> B) I'll look at trying one or more of gcc 4.2.1, gcc49, gcc5 for the program context, still based on a clang 3.8.0 buildworld and gcc 4.2.1 buildkernel based on projects/clang380-import (-r294962).
>
> C) I will look at trying the same program builds on a pure gcc 4.2.1 buildworld/buildkernel context. (Likely 11.0-CURRENT -r294960.)
>
>
> I'll send more results when I have them.
>
>

===
Mark Millard
markmi at dsl-only.net

On 2016-Jan-31, at 5:50 PM, Justin Hibbits wrote:

Does this occur with gcc-built world and/or kernel? You could put some printf()s in sendsig(), and there are KTR tracepoints already present. The code assumes a fully aligned user stack, which should be correct, but may not be.

- Justin

On Jan 31, 2016, at 6:41 PM, Mark Millard wrote:

> I have submitted Bug 206810 for this 11.0-CURRENT/clang380-import stack alignment problem for TARGET_ARCH=powerpc signal delivery.
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
> On 2016-Jan-31, at 6:08 AM, Roman Divacky wrote:
>
> Fwiw, LLVM expects a 16B aligned stack on PowerPC.
>
> On Sun, Jan 31, 2016 at 05:55:20AM -0800, Mark Millard wrote:
>> 3 quick FreeBSD for powerpc (32-bit) questions:
>>
>>
>> A) For PowerPC (32-bit) what is the stack alignment requirement by the ABI(s) that FreeBSD targets?
>>
>> B) Are signal handlers supposed to be given that alignment?
>>
>>
>> I ask because signal handlers are at times being given just 4-byte alignment but clang 3.8.0 powerpc's code generation can depend on the alignment being more than 4.
>>
>> clang 3.8.0 can calculate addresses by, for example, masking in a 0x4 relative to what would need to be an aligned address with alignment 8 or more instead of adding 0x4 to a more arbitrary address.
>>
>> So far I've only seen less than 8 byte stack alignment via signal handler activity.
>>
>>
>> C) Which should be blamed for problems here: clang's code generation, FreeBSD's stack alignment handling for signals, or both?
>>
>> ===
>> Mark Millard
>> markmi at dsl-only.net