From owner-freebsd-bugs@FreeBSD.ORG Thu Apr 4 19:39:01 2013 Return-Path: Delivered-To: freebsd-bugs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D67E18E8; Thu, 4 Apr 2013 19:39:01 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au [211.29.133.51]) by mx1.freebsd.org (Postfix) with ESMTP id 5E49D9BB; Thu, 4 Apr 2013 19:39:00 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r34Jcpfb021283 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 5 Apr 2013 06:38:53 +1100 Date: Fri, 5 Apr 2013 06:38:51 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Brian Demsky Subject: Re: misc/177624: Swapcontext can get compiled incorrectly In-Reply-To: Message-ID: <20130405044424.Y2557@besplex.bde.org> References: <201304040232.r342WFTC020054@red.freebsd.org> <20130404232206.S1025@besplex.bde.org> <20130405011027.Y1350@besplex.bde.org> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="0-627049122-1365104331=:2557" X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Ov0XUFDt c=1 sm=1 a=HSHldBqzbsoA:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=cK7K0dqV01sA:10 a=GIPX8_9pfktIhx6qWUQA:9 a=45ClL6m2LaAA:10 a=5AVzPEotF6fAmyEr:21 a=YlQvQjP_rWJMK-Re:21 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: freebsd-bugs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org, Bruce Evans X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Apr 2013 19:39:01 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --0-627049122-1365104331=:2557 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Thu, 4 Apr 2013, Brian Demsky wrote: > On Apr 4, 2013, at 8:16 AM, Bruce Evans wrote: > >> On Fri, 5 Apr 2013, Bruce Evans wrote: >> >>> On Thu, 4 Apr 2013, Brian Demsky wrote: >>> >>>>> Description: >>>> Here is the code for swap context: >>>> int >>>> swapcontext(ucontext_t *oucp, const ucontext_t *ucp) >>>> { >>>> int ret; >>>> if ((oucp =3D=3D NULL) || (ucp =3D=3D NULL)) { >>>> errno =3D EINVAL; >>>> return (-1); >>>> } >>>> oucp->uc_flags &=3D ~UCF_SWAPPED; >>>> ret =3D getcontext(oucp); >>>> if ((ret =3D=3D 0) && !(oucp->uc_flags & UCF_SWAPPED)) { >>>> oucp->uc_flags |=3D UCF_SWAPPED; >>>> ret =3D setcontext(ucp); >>>> } >>>> return (ret); >>>> } >>> >>>> On the OS X port of libc in Mac OSX 10.7.5, this gets compiled as: >>> >>>> ... >>>> 0x00007fff901e870b : pop %rbx >>>> 0x00007fff901e870c : pop %r14 >>>> 0x00007fff901e870e : jmpq 0x7fff90262855 >>>> The problem is that rbx is callee saved by compiled version of swapcon= text and then reused before getcontext is called. Getcontext then stores t= he wrong value for rbx and setcontext later restores the wrong value for rb= x. If the caller had any value in rbx, it has been trashed at this point. >>> >>> Later you wrote: >>> >>>> The analysis is a little wrong about the problem. Ultimately, the tai= l call to set context trashes the copies of bx and r14 on the stack=85. >>> >>> The bug seems to be in setcontext(). It must preserve the callee-saved >>> registers, not restore them. This would happen automatically if more >>> were written in C. But setcontext() can't be written entirely in C, >>> since it must save all callee-saved registers including ones not used >>> and therefore not normally saved by any C function that it might be in, >>> and possibly also including callee-saved registers for nonstandard or >>> non-C ABIs. In FreeBSD, it is apparently always a syscall. >> >> This is more than a little wrong. When setcontext() succeeds, it >> doesn't return here. Then it acts like longjmp() and must restore all >> the callee-saved to whatever they were when getcontext() was called. >> Otherwise, it must not clobber any callee-saved registers (then it >> differs from longjmp(). longjmp() just can't fail). >> >> Now I don't see any bug here. If the saved state is returned to, then >> it is as if getcontext() returned, and the intermediately-saved %rbx >> is correct (we will restore the orginal %rbx if we return). If >> setcontext() fails, then it should preserve all callee-saved registers. >> In the tail-call case, we have already restored the orginal %rbx and >> the failing setcontext() should preserve that. >> >> Bruce > > Take at setcontext: > > (gdb) disassemble setcontext > Dump of assembler code for function setcontext: > 0x00007fff90262855 : push %rbx > 0x00007fff90262856 : lea 0x38(%rdi),%rbx > 0x00007fff9026285a : cmp 0x30(%rdi),%rbx > 0x00007fff9026285e : je 0x7fff90262864 > 0x00007fff90262860 : mov %rbx,0x30(%rdi) > 0x00007fff90262864 : mov 0x4(%rdi),%edi > 0x00007fff90262867 : callq 0x7fff90262998 > 0x00007fff9026286c : mov %rbx,%rdi > 0x00007fff9026286f : pop %rbx > 0x00007fff90262870 : jmpq 0x7fff90262875 <_setcontex= t> > End of assembler dump. > > The stack from swapcontext had rbx and r14 popped after getcontext stored= everything. Now we push rbx and then later call sigsetmask. Those two ac= tions guarantee that the memory locations where rbx and r14 were on the sta= ck have been overwritten. When we later return to the save context, it wil= l start up swapcontext and pop the wrong values off of the stack for rbx an= d r14. Ah, it is not really rbx and r14, but rsp and the whole stack frame of swapcontext() that are mishandled. Even returning from swapcontext() leaves the saved rsp pointing to garbage. The stack frame could have had almost anything on it before it became invalid, but here it has mainly the saved rbx and r14 (not rbp; however, when compiled by clang on FreeBSD, it also has the saved rbp, and when compiled with -O0 it also has the local variable). Now I think swapcontext() can't be written in C, for the same reason that setjmp() can't be written in C -- the stack frame cannot be controlled in C, and if it has anything at all on it (even the return address requires special handling), then the stack pointer saved in the context becomes invalid when the function returns, or even earlier for tail calls and other optimizations. Also, if the C function calls another function like the library getcontext(), then there are 2 stack frames below the saved stack pointer that are hard to control. In FreeBSD on at least x86, getcontext() is a special non-automatically generated asm function to control this. The automatically generated asm function would have only the return address in its stack frame, but even this is too much, and there is even more to control. The comment about this is incomplete/ partly wrong, but gives a good hint about the problem (*). This problem is avoided in setjmp() and longjmp() by not leaving anything on stack frames except the return address for setjmp() at the point where the stack pointer is saved. The saved stack pointer is still technically invalid, since it points to the return address which goes away when setjmp() returns. This is handled by not returning in the usual way in lonjmp(). Instead, setjmp() saves the return address and longjmp() restores it; the restored stack pointer becomes valid only at the end of setjmp() when the top word on it is restored. FreeBSD getcontext() uses a more hackish method (*). The saved stack pointer becomes more than technically invalid if the function that called setjmp() or swapcontext() returns. Then the caller's frame becomes invalid. Such returns are invalid. This restriction is clearly documented for setjmp() but not for swapcontext() in FreeBSD man pages and old C99 and POSIX specs. The C99 restriction is only that longjmp() must not be invoked with the saved state after the function that saved the state using setjmp() returns. Compilers must know about this and and not do optimizations (like tail calls?) that would invalidate the saved stack pointer before the function returns. (*) Here is the FreeBSD i386 getcontext(): @ /* @ * This has to be magic to handle the multiple returns. Multiple =3D just 2. @ * Otherwise, the setcontext() syscall will return here and we'll @ * pop off the return address and go to the *setcontext* call. @ */ Actually, we would pop off garbage and go to neverland, with a lower chance of going to setcontext than most places. @ =09.weak=09_getcontext @ =09.set=09_getcontext,__sys_getcontext @ =09.weak=09getcontext @ =09.set=09getcontext,__sys_getcontext @ ENTRY(__sys_getcontext) @ =09movl=09(%esp),%ecx=09/* save getcontext return address */ @ =09mov=09$SYS_getcontext,%eax @ =09KERNCALL @ =09jb=09HIDENAME(cerror) When getcontext() fails, the return address on the stack is still valid and cerror depends on that. @ =09addl=09$4,%esp=09=09/* remove stale (setcontext) return address */ Actually, remove the non-stale (our getcontext) return address if the return is not from setcontext, and remove stack garbage if the return is from setcontext. The stack contents is not related to setcontext in either case. In the garbage case, it started as the return address for another getcontext, but became garbage when that returned. @ =09jmp=09*%ecx=09=09/* restore return address */ We want our return address in both cases. We don't know which case applies and use same code for both. The comment is imprecise. We don't restore the return address. What we do is return. The code should be changed to match the comment (don't adjust %esp, but actually restore the return address to it). This method is used without comment by FreeBSD x86 longjmp()). Optimizing this for speed is unimportant, but this is probably faster as well as cleaner. The jmp may have been faster 20 years ago, but now it unbalances call/return branch prediction. The libc swapcontext() can probably be fixed by copying setjmp()/longjmp(). Except setjmp()/longjmp() has fundamentally broken stack handling too, at least when setjmp() is actually sigsetjmp(). Then it is necessary to restore the stack pointer atomically with restoring the signal mask, and this seems to be impossible with using a single syscall that does both. The 2 different orders of restoring them give different problems. In the above, you seem to have setcontext() in userland, with some signal unmasking, so I think it has the same races as setjmp()/longjmp(). Perhaps an atomic syscall for setcontext() is enough. Bruce --0-627049122-1365104331=:2557--