Date:      27 Oct 2000 17:27:06 -0400
From:      Randell Jesup <rjesup@wgate.com>
To:        Michel Talon <michel@lpthe.jussieu.fr>
Cc:        "freebsd-stable@FreeBSD.ORG" <freebsd-stable@FreeBSD.ORG>
Subject:   Re: "Malloc type lacks magic" show-stopper solved
Message-ID:  <ybu66meotzp.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net>
In-Reply-To: Michel Talon's message of "Fri, 27 Oct 2000 09:28:41 +0200"
References:  <20001026231134.D9391@dragon.nuxi.com> <Pine.LNX.4.21.0010271913050.16125-100000@vimfuego.saarinen.org> <20001027092841.B394@lpthe.jussieu.fr>

Michel Talon <michel@lpthe.jussieu.fr> writes:
>> > WHY!?!?!?  Just what the heck do you think you're achieving with -O3 plus
>> > all those things?  Have you *ever* profiled anything you're compiling
>> > with these options?  Note that -O3 is not necessarily faster code than -O.
>> > 
>> > This seems Yet Another "I'm macho" compiler flags instance.
>> > Please correct me if I'm wrong.

>Kernel code is simple, with essentially no computation (except, of course,
>special areas like in-kernel crypto), so there is not much room for
>optimization.

        Sure there is, just not much for things like loop unrolling.
Admittedly the payoff is smaller on x86 than on processors with more
registers, but it's still real, especially from instruction scheduling for
today's superscalar CPU cores (in some ways it actually matters more for
chips like the PII than for the Athlon/Duron).  Removing frame pointers, for
example, can save a lot of memory traffic, as can letting the compiler
optimize away or merge locals.

        _Measuring_ the speed of a kernel is tougher, since many operations
are I/O-bound.  Also, high call overheads can swamp any apparent differences.

> Recently I timed a scientific program to see how my brand-new PC
>performs. Here is what I found:
>Without any optimization the program runs 2 times slower. With
>-O, -O2, -O3, or -Os the times are similar; the fastest was -O and the
>slowest was -Os. Since my PC is Duron-based I tried the -march options, and
>compared on a Pentium machine. Result: almost nothing, except that
>-march=pentium was slower than -march=k6 on the Duron, as could be expected.
>All differences are small, no more than 2s on a 30s computation. As you can
>see, nothing that outweighs the risk of bugs.

        This depends a lot on the program.  Many programs will show
improvement from -O2/-O3, but not all.  Adding some -fxxxx options can get
more.  I've seen >10% improvements.
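
        For flag comparisons like the ones above, a small harness helps keep
the measurement honest.  The C below is a sketch, not something from this
thread: the kernel (a repeated dot product) and the names kernel(),
now_seconds() and best_time() are hypothetical stand-ins for a real
compute-bound program.  It takes the best of several runs to filter out
noise, and stores the result through a pointer so the optimizer can't
discard the work as dead code.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <time.h>

/* Hypothetical compute-bound kernel standing in for "a scientific
   program": a naive dot product repeated reps times. */
double kernel(const double *a, const double *b, int n, int reps)
{
    double sum = 0.0;
    for (int r = 0; r < reps; r++)
        for (int i = 0; i < n; i++)
            sum += a[i] * b[i];
    return sum;
}

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run the kernel `trials` times and return the fastest elapsed time.
   Taking the minimum filters out interference from other processes.
   The summed results go to *sink so the work is not dead code. */
double best_time(const double *a, const double *b, int n, int reps,
                 int trials, double *sink)
{
    assert(trials > 0 && sink != NULL);
    double best = -1.0;
    double total = 0.0;
    for (int t = 0; t < trials; t++) {
        double t0 = now_seconds();
        total += kernel(a, b, n, reps);
        double elapsed = now_seconds() - t0;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    *sink = total;
    return best;
}
```

        One way to use it (an assumed workflow, not a prescription): call
best_time() from a small main(), build the same file with -O, -O2, -O3 and
-Os, and compare the numbers on the CPU you actually care about.  Expect
the ranking to change from program to program.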

        Bugs with optimizers are by far most common with code that's
banging HW registers (i.e. drivers and some kernel code).  I've rarely seen
userland programs harmed by aggressive optimization levels.  (They can make
source debugging hard.)
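
        A common way this bites driver code (a sketch, not taken from the
thread): polling a memory-mapped status register through a plain pointer.
Unoptimized builds typically re-read memory on every iteration, so the loop
appears to work; at -O2 the compiler may legally load the value once and
keep testing a stale copy.  Qualifying the access as volatile forces a
fresh read each time.  The bit layout (STATUS_READY) and wait_ready() are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status-register layout, for illustration only. */
#define STATUS_READY 0x1u

/* Poll a device status register until its READY bit is set, giving up
   after max_spins reads.  Returns 1 if the device became ready, 0 on
   timeout.  The volatile qualifier makes the compiler emit a real load
   on every iteration; without it, an optimizer may hoist the read out
   of the loop and test the same stale value each time. */
int wait_ready(const volatile uint32_t *status_reg, long max_spins)
{
    assert(status_reg != NULL);
    for (long i = 0; i < max_spins; i++) {
        if (*status_reg & STATUS_READY)
            return 1;   /* device reports ready */
    }
    return 0;           /* timed out */
}
```

        On real hardware status_reg would point at a mapped device address;
for a userland test an ordinary volatile variable can stand in for it.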

>To illustrate this: some years ago I tested a scientific program on an
>Alpha machine running Linux. Compiled with gcc and the best optimizations,
>it ran 7 times slower than when compiled with Digital's compiler. Draw your
>own conclusions.

        GCC isn't tuned for numeric code.  DEC's compiler most
certainly was.

-- 
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94)
rjesup@wgate.com


