Date:      Tue, 06 Aug 2002 14:48:33 -0700
From:      Peter Wemm <peter@wemm.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Dan Nelson <dnelson@allantgroup.com>, Terry Lambert <tlambert2@mindspring.com>, Darren Pilgrim <dmp@pantherdragon.org>, Jason Andresen <jandrese@mitre.org>, Dmitry Morozovsky <marck@rinet.ru>, hackers@FreeBSD.ORG
Subject:   Re: -fomit-frame-pointer for the world build 
Message-ID:  <20020806214833.705D32A7D6@canning.wemm.org>
In-Reply-To: <200208062050.g76Ko0AO015075@apollo.backplane.com> 

Matthew Dillon wrote:
> :I thought the main thing you got out of -fomit-frame-pointer was a free
> :register, which is a scarce commodity on x86.
> :
> :-- 
> :	Dan Nelson
> :	dnelson@allantgroup.com
> 
>     I've done considerable testing of -fomit-frame-pointer and it really
>     only has an effect if the program makes lots (millions) of calls to tiny,
>     fast procedures.  This is because the push %ebp; movl %esp,%ebp is
>     removed from the beginning of the procedure and the 'leave' is removed
>     from the end of the procedure (though an addl to restore %esp has to be
>     added in).
> 
>     GCC does not seem to be able to make use of the extra register, or if
>     it does it does not seem to be able to use it to any great degree.
>     The IA32 architecture has 6 general registers available to it 
>     (eax, edx, ecx, ebx, esi, edi).  Throwing in ebp would not make a huge
>     difference, nor can 8 and 16 bit specifications (e.g. %al, %ah) be
>     mixed together safely without a severe performance penalty on higher
>     end cpus.
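
For anyone who wants to measure the call-overhead case described above,
here is a minimal sketch in C. The names add_one and call_many are made up
for illustration, and the noinline attribute is a GCC extension, used here
on the assumption that the compiler would otherwise inline the tiny
function and hide the prologue/epilogue cost entirely.

```c
#include <stdint.h>

/*
 * Tiny leaf function (hypothetical example). With a frame pointer, its
 * prologue and epilogue are a large share of the total work it does.
 * noinline (GCC extension) keeps the call from being optimized away.
 */
__attribute__((noinline))
static uint32_t add_one(uint32_t x)
{
    return x + 1u;
}

/* Call the tiny function n times and return the accumulated value. */
static uint32_t call_many(uint32_t n)
{
    uint32_t acc = 0;
    uint32_t i;

    for (i = 0; i < n; i++)
        acc = add_one(acc);
    return acc;
}
```

Build it twice, once with -fno-omit-frame-pointer and once with
-fomit-frame-pointer, and time a few million calls from a small driver.
Any difference you see is specific to your compiler version and CPU.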

The %al/%ah thing is a particular problem on Intel CPUs from the Pentium
Pro onwards (PPro, P2, P3), and it is especially bad on the Pentium 4.
This is the so-called 'partial register stall': writing part of a register
(e.g. %al) and then reading the full register (%eax) stalls the pipeline.
Athlon CPUs do not have this problem.

Far more speed benefit can be had from setting -mcpu/-march/-mtune
*correctly* than from things like -fomit-frame-pointer.  For example,
32-bit multiply is REALLY slow on i386 (our default target until recently),
so gcc will try to "optimize" out multiplies by converting them to
shift/add sequences.  Of course, this usually turns out to be slower on
Pentium and above. :-]
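
To make the shift/add point concrete, here is a minimal sketch in C. The
function names are made up for illustration and do not come from gcc or
the FreeBSD tree. Both functions compute x * 10. The second one spells out
by hand the kind of rewrite the compiler may apply to the first when it
believes multiply is expensive on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Plain multiply: the compiler decides how to implement it per target. */
static uint32_t mul10_multiply(uint32_t x)
{
    return x * 10u;
}

/* Hand-written shift/add form: 10*x == 8*x + 2*x. */
static uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Sanity check: both forms agree, including on unsigned wraparound. */
static void check_mul10(void)
{
    assert(mul10_multiply(7u) == 70u);
    assert(mul10_shift_add(7u) == 70u);
    assert(mul10_multiply(0xFFFFFFFFu) == mul10_shift_add(0xFFFFFFFFu));
}
```

To see which form you actually get, compile with "cc -O -S" under a
couple of different -march settings and compare the assembly. The
source-level shape tells you nothing about what the compiler emits.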

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

