Date:      Sat, 16 Dec 2000 12:13:32 -0800
From:      Bakul Shah <bakul@bitblocks.com>
To:        Marc Tardif <intmktg@CAM.ORG>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: syscall assembly 
Message-ID:  <200012162013.PAA14008@marlborough.cnchost.com>
In-Reply-To: Your message of "Fri, 15 Dec 2000 14:34:10 EST." <Pine.LNX.4.10.10012151418570.20060-100000@Gloria.CAM.ORG> 

Marc sent me this:
> > > > 	pushl %ebp
> > > > 	movl %esp,%ebp
> > > > 	subl $8,%esp
> > > > 
> > > This might not be of interest to the rest of the mailing list
> > > but what is the purpose of the subl instruction used before
> > > calling functions? Is that where the return value is retrieved
> > > from, instead of using the %eax register as would Linux?
> > 
> > This is to keep the stack alignment to 16 bytes.  Recall that
> > a call will push the return address on the stack and the
> > frame pointer (%ebp) pushed so now we have 8 bytes on the
> > stack.  If the stack was aligned before the call, we need to
> > further adjust it by 8 more bytes so that after the procedure
> > prolog it is once again aligned on a 16 byte boundary.
> > 
> [ snip ]
> 
> Consider the following code debugged with gdb:
> int func() {
>   return 1;
> }
> int main() {
>   return func();
> }
> 
> # gcc -g func.c
> # gdb a.out
> (gdb) display/x $sp
> (gdb) display/i $pc
> (gdb) break *&main + 3
> (gdb) run
> Breakpoint 1, 0x804848c in main () at func.c:3
> 3       }
> 2: x/i $eip  0x804848f <main+3>:        sub    $0x8,%esp
> 1: /x $esp = 0xbfbff820
> (gdb) si
> 5         return func();
> 2: x/i $eip  0x8048492 <main+6>:        call   0x804847c <func>
> 1: /x $esp = 0xbfbff818
> 
> Oddly, it seems to me the stack top (pointed to by %esp)
> is aligned _before_ the sub instruction. And then, this
> instruction unaligns the stack by $0x8. How does this
> make sense?

Maybe people who know more about gcc can explain this
better, but I will speculate anyway.  Assuming that 16-byte
alignment actually helps, it would make sense to have
either
    a) the local frame start at a 16-byte boundary, or
    b) the args start at a 16-byte boundary

The goal is to minimize the number of cache lines that
need to be fetched.  You want the *first free location*
to be on a 16 byte boundary (where either the args start
or the local frame starts).  What Marc observed seems to
point to a) -- the first free location is on a 16 byte
boundary _after_ the procedure prolog (push %ebp).  This
is where you start allocating locals.
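
As a minimal sketch, the arithmetic in Marc's trace can be
replayed in C.  The two $esp values are copied from the gdb
output above; the helper names are hypothetical, and the sketch
assumes that func's prolog also begins with push %ebp (the
trace does not show func's disassembly).

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when addr is a multiple of 16. */
static int aligned16(uint32_t addr) { return (addr & 0xfu) == 0; }

/* %esp after pushing one 4-byte word (return address or saved %ebp). */
static uint32_t push4(uint32_t esp) { return esp - 4; }

/* Replays the trace and returns %esp after the callee's push %ebp. */
static uint32_t replay_trace(void)
{
    uint32_t esp = 0xbfbff820u;  /* gdb: $esp at main+3, after main's prolog */
    assert(aligned16(esp));      /* case a): aligned after push %ebp */

    esp -= 8;                    /* sub $0x8,%esp */
    assert(esp == 0xbfbff818u);  /* matches the second gdb display */
    assert(!aligned16(esp));     /* "unaligned" by 8, as Marc noticed */

    esp = push4(esp);            /* call pushes the return address */
    esp = push4(esp);            /* ASSUMED: func starts with push %ebp */
    assert(aligned16(esp));      /* aligned again inside the callee */
    return esp;
}
```

Under that assumption replay_trace() ends at 0xbfbff810, so the
8 bytes from subl plus the return address and saved %ebp add up
to 16, and the callee's first free location is aligned again.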

gcc seems to impose the additional restriction that the args
also start at a 16-byte boundary.  This seems unnecessary.  It
should do either a) or b) but not both.  If you think of args
to a called function as belonging to the caller's frame then
a) is what makes sense.  But if you want tail call optimization
(like I do), you'd want args to be part of the callee's frame
since in this case the caller's frame is *replaced* by the
callee's (since you never return to the caller, you can throw
away its frame prior to the call, but the args to the callee must
remain).  In this case the frame pointer %ebp points into the
middle of a frame, and the frame starts with the args.
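
For illustration, here is a small C function whose recursive
call is in tail position, i.e. its result is returned
unchanged.  The function name is hypothetical.  C does not
require a compiler to turn this into a jump that reuses the
caller's frame; whether that happens depends on the compiler
and the optimization level.

```c
/*
 * Sums 1..n into acc.  The recursive call is the last thing the
 * function does, so nothing in this frame is needed afterwards;
 * only the args to the callee (n - 1 and acc + n) must survive.
 */
static unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);  /* tail call */
}
```

A call such as sum_to(10, 0) returns 55 either way; the only
difference with tail call optimization is that the stack does
not grow with n.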

But I still question this optimization.  Are there any stats
on whether this 16-byte alignment improves performance?  It
certainly increases space use!
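
To put a rough number on the space cost, here is a sketch under
a deliberately simple model: the padding is whatever rounds the
bytes already pushed up to the next multiple of 16.  The helper
name is hypothetical and this is not a description of what gcc
actually computes.

```c
#include <stddef.h>

/* Bytes of padding needed to round `bytes` up to a multiple of 16. */
static size_t pad_to_16(size_t bytes)
{
    return (16 - bytes % 16) % 16;
}
```

With a 4-byte return address and a 4-byte saved %ebp,
pad_to_16(8) is 8, which matches the subl $8,%esp seen above.
In this model a call costs up to 15 bytes of padding.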

-- bakul





