From owner-freebsd-bugs@FreeBSD.ORG  Thu Apr  4 19:39:01 2013
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
Delivered-To: freebsd-bugs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id D67E18E8;
 Thu,  4 Apr 2013 19:39:01 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au
 [211.29.133.51]) by mx1.freebsd.org (Postfix) with ESMTP id 5E49D9BB;
 Thu,  4 Apr 2013 19:39:00 +0000 (UTC)
Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au
 (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106])
 by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r34Jcpfb021283
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 5 Apr 2013 06:38:53 +1100
Date: Fri, 5 Apr 2013 06:38:51 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Brian Demsky <bdemsky@uci.edu>
Subject: Re: misc/177624: Swapcontext can get compiled incorrectly
In-Reply-To: <CF93B187-2005-46EA-B581-2FB10EFDD55E@uci.edu>
Message-ID: <20130405044424.Y2557@besplex.bde.org>
References: <201304040232.r342WFTC020054@red.freebsd.org>
 <20130404232206.S1025@besplex.bde.org> <20130405011027.Y1350@besplex.bde.org>
 <CF93B187-2005-46EA-B581-2FB10EFDD55E@uci.edu>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="0-627049122-1365104331=:2557"
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=Ov0XUFDt c=1 sm=1 a=HSHldBqzbsoA:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=cK7K0dqV01sA:10
 a=GIPX8_9pfktIhx6qWUQA:9 a=45ClL6m2LaAA:10 a=5AVzPEotF6fAmyEr:21
 a=YlQvQjP_rWJMK-Re:21 a=TEtd8y5WR3g2ypngnwZWYw==:117
Cc: freebsd-bugs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org,
 Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-bugs>,
 <mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
 <mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 04 Apr 2013 19:39:01 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--0-627049122-1365104331=:2557
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

On Thu, 4 Apr 2013, Brian Demsky wrote:

> On Apr 4, 2013, at 8:16 AM, Bruce Evans wrote:
>
>> On Fri, 5 Apr 2013, Bruce Evans wrote:
>>
>>> On Thu, 4 Apr 2013, Brian Demsky wrote:
>>>
>>>>> Description:
>>>> Here is the code for swap context:
>>>> int
>>>> swapcontext(ucontext_t *oucp, const ucontext_t *ucp)
>>>> {
>>>>     int ret;
>>>>     if ((oucp =3D=3D NULL) || (ucp =3D=3D NULL)) {
>>>>             errno =3D EINVAL;
>>>>             return (-1);
>>>>     }
>>>>     oucp->uc_flags &=3D ~UCF_SWAPPED;
>>>>     ret =3D getcontext(oucp);
>>>>     if ((ret =3D=3D 0) && !(oucp->uc_flags & UCF_SWAPPED)) {
>>>>             oucp->uc_flags |=3D UCF_SWAPPED;
>>>>             ret =3D setcontext(ucp);
>>>>     }
>>>>     return (ret);
>>>> }
>>>
>>>> On the OS X port of libc in Mac OSX 10.7.5, this gets compiled as:
>>>
>>>> ...
>>>> 0x00007fff901e870b <swapcontext+89>:    pop    %rbx
>>>> 0x00007fff901e870c <swapcontext+90>:    pop    %r14
>>>> 0x00007fff901e870e <swapcontext+92>:    jmpq   0x7fff90262855 <setcont=
ext>
>>>> The problem is that rbx is callee saved by compiled version of swapcon=
text and then reused before getcontext is called.  Getcontext then stores t=
he wrong value for rbx and setcontext later restores the wrong value for rb=
x. If the caller had any value in rbx, it has been trashed at this point.
>>>
>>> Later you wrote:
>>>
>>>> The analysis is a little wrong about the problem.  Ultimately, the tai=
l call to set context trashes the copies of bx and r14 on the stack=85.
>>>
>>> The bug seems to be in setcontext().  It must preserve the callee-saved
>>> registers, not restore them.  This would happen automatically if more
>>> were written in C.  But setcontext() can't be written entirely in C,
>>> since it must save all callee-saved registers including ones not used
>>> and therefore not normally saved by any C function that it might be in,
>>> and possibly also including callee-saved registers for nonstandard or
>>> non-C ABIs.  In FreeBSD, it is apparently always a syscall.
>>
>> This is more than a little wrong.  When setcontext() succeeds, it
>> doesn't return here.  Then it acts like longjmp() and must restore all
>> the callee-saved to whatever they were when getcontext() was called.
>> Otherwise, it must not clobber any callee-saved registers (then it
>> differs from longjmp().  longjmp() just can't fail).
>>
>> Now I don't see any bug here.  If the saved state is returned to, then
>> it is as if getcontext() returned, and the intermediately-saved %rbx
>> is correct (we will restore the orginal %rbx if we return).  If
>> setcontext() fails, then it should preserve all callee-saved registers.
>> In the tail-call case, we have already restored the orginal %rbx and
>> the failing setcontext() should preserve that.
>>
>> Bruce
>
> Take at setcontext:
>
> (gdb) disassemble setcontext
> Dump of assembler code for function setcontext:
> 0x00007fff90262855 <setcontext+0>:      push   %rbx
> 0x00007fff90262856 <setcontext+1>:      lea    0x38(%rdi),%rbx
> 0x00007fff9026285a <setcontext+5>:      cmp    0x30(%rdi),%rbx
> 0x00007fff9026285e <setcontext+9>:      je     0x7fff90262864 <setcontext=
+15>
> 0x00007fff90262860 <setcontext+11>:     mov    %rbx,0x30(%rdi)
> 0x00007fff90262864 <setcontext+15>:     mov    0x4(%rdi),%edi
> 0x00007fff90262867 <setcontext+18>:     callq  0x7fff90262998 <sigsetmask=
>
> 0x00007fff9026286c <setcontext+23>:     mov    %rbx,%rdi
> 0x00007fff9026286f <setcontext+26>:     pop    %rbx
> 0x00007fff90262870 <setcontext+27>:     jmpq   0x7fff90262875 <_setcontex=
t>
> End of assembler dump.
>
> The stack from swapcontext had rbx and r14 popped after getcontext stored=
 everything.  Now we push rbx and then later call sigsetmask.  Those two ac=
tions guarantee that the memory locations where rbx and r14 were on the sta=
ck have been overwritten.  When we later return to the save context, it wil=
l start up swapcontext and pop the wrong values off of the stack for rbx an=
d r14.

Ah, it is not really rbx and r14, but rsp and the whole stack frame of
swapcontext() that are mishandled.  Even returning from swapcontext()
leaves the saved rsp pointing to garbage.  The stack frame could have
had almost anything on it before it became invalid, but here it has mainly
the saved rbx and r14 (not rbp; however, when compiled by clang on FreeBSD,
it also has the saved rbp, and when compiled with -O0 it also has the
local variable).

Now I think swapcontext() can't be written in C, for the same reason that
setjmp() can't be written in C -- the stack frame cannot be controlled in
C, and if it has anything at all on it (even the return address requires
special handling), then the stack pointer saved in the context becomes
invalid when the function returns, or even earlier for tail calls and
other optimizations.  Also, if the C function calls another function like
the library getcontext(), then there are 2 stack frames below the saved
stack pointer that are hard to control.  In FreeBSD on at least x86,
getcontext() is a special non-automatically generated asm function to
control this.  The automatically generated asm function would have
only the return address in its stack frame, but even this is too much,
and there is even more to control.  The comment about this is incomplete/
partly wrong, but gives a good hint about the problem (*).

This problem is avoided in setjmp() and longjmp() by not leaving anything
on stack frames except the return address for setjmp() at the point where
the stack pointer is saved.  The saved stack pointer is still technically
invalid, since it points to the return address which goes away when
setjmp() returns.  This is handled by not returning in the usual way in
lonjmp().  Instead, setjmp() saves the return address and longjmp()
restores it; the restored stack pointer becomes valid only at the end
of setjmp() when the top word on it is restored.  FreeBSD getcontext()
uses a more hackish method (*).

The saved stack pointer becomes more than technically invalid if the
function that called setjmp() or swapcontext() returns.  Then the
caller's frame becomes invalid.  Such returns are invalid.  This
restriction is clearly documented for setjmp() but not for swapcontext()
in FreeBSD man pages and old C99 and POSIX specs.  The C99 restriction
is only that longjmp() must not be invoked with the saved state after
the function that saved the state using setjmp() returns.  Compilers
must know about this and and not do optimizations (like tail calls?)
that would invalidate the saved stack pointer before the function
returns.

(*) Here is the FreeBSD i386 getcontext():

@ /*
@  * This has to be magic to handle the multiple returns.

Multiple =3D just 2.

@  * Otherwise, the setcontext() syscall will return here and we'll
@  * pop off the return address and go to the *setcontext* call.
@  */

Actually, we would pop off garbage and go to neverland, with a lower
chance of going to setcontext than most places.

@ =09.weak=09_getcontext
@ =09.set=09_getcontext,__sys_getcontext
@ =09.weak=09getcontext
@ =09.set=09getcontext,__sys_getcontext
@ ENTRY(__sys_getcontext)
@ =09movl=09(%esp),%ecx=09/* save getcontext return address */
@ =09mov=09$SYS_getcontext,%eax
@ =09KERNCALL
@ =09jb=09HIDENAME(cerror)

When getcontext() fails, the return address on the stack is still valid
and cerror depends on that.

@ =09addl=09$4,%esp=09=09/* remove stale (setcontext) return address */

Actually, remove the non-stale (our getcontext) return address if the
return is not from setcontext, and remove stack garbage if the
return is from setcontext.  The stack contents is not related to
setcontext in either case.  In the garbage case, it started as the
return address for another getcontext, but became garbage when that
returned.

@ =09jmp=09*%ecx=09=09/* restore return address */

We want our return address in both cases.  We don't know which case
applies and use same code for both.  The comment is imprecise.  We
don't restore the return address.  What we do is return.

The code should be changed to match the comment (don't adjust %esp,
but actually restore the return address to it).  This method is used
without comment by FreeBSD x86 longjmp()).  Optimizing this for speed
is unimportant, but this is probably faster as well as cleaner.  The
jmp may have been faster 20 years ago, but now it unbalances call/return
branch prediction.

The libc swapcontext() can probably be fixed by copying
setjmp()/longjmp().  Except setjmp()/longjmp() has fundamentally broken
stack handling too, at least when setjmp() is actually sigsetjmp().
Then it is necessary to restore the stack pointer atomically with
restoring the signal mask, and this seems to be impossible with using
a single syscall that does both.  The 2 different orders of restoring
them give different problems.  In the above, you seem to have
setcontext() in userland, with some signal unmasking, so I think it
has the same races as setjmp()/longjmp().  Perhaps an atomic syscall
for setcontext() is enough.

Bruce
--0-627049122-1365104331=:2557--