Date:      Sun, 09 Feb 2003 18:12:13 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Marcin Dalecki <mdcki@gmx.net>
Cc:        David Schultz <dschultz@uclink.berkeley.edu>, Adrian Chadd <adrian@freebsd.org>, Ray Kohler <ataraxia@cox.net>, freebsd-current@freebsd.org
Subject:   Re: Compiling with high optimization?
Message-ID:  <3E470A7D.D7D1EAC3@mindspring.com>
References:  <20030208173756.GA56030@arkadia.nv.cox.net> <20030208232724.GA20435@HAL9000.homeunix.com> <3E459BF3.BB3FC381@mindspring.com> <20030209002542.GA20812@HAL9000.homeunix.com> <20030209141006.GB33928@skywalker.creative.net.au> <20030209150120.GA2263@HAL9000.homeunix.com> <3E4671E6.8090000@gmx.net>

Marcin Dalecki wrote:
> David Schultz wrote:
> > Strangely, gcc in FreeBSD 5.0 actually generates *slower* code
> > when compiling for more recent architectures than when compiling
> > for a 386.  I don't know whether that is a bug in gcc or whether
> > gcc is using some fancy feature like SSE that the kernel handles
> > poorly on context switches.  I think there was some discussion on
> > the lists about it earlier.
> 
> The reason is that the optimizations done by GCC are ill balanced.
> All the scheduling of instructions and whatnot, which would be
> fine at a microscopic level, is causing so much higher pressure
> on the CPU's caches that the code is actually losing.

That's not actually it, though there *are* instruction scheduling
issues that will impact the Pentium 4 code generation, and other
Intel processor-specific code generation.  Mostly, L1 caches have
been getting much, much larger relative to the size of main
memory.

Intel has written an article on "How to generate optimized code for
Pentium 4 processors".  It has been posted to these lists a couple
of times already, and you can search it out on Intel's site, if you
care to.

For the Pentium 4, the article identifies a shopping list of things
that you are "not supposed to do", which GCC does.

Actually, cache pressure is the least of them.

If FreeBSD would cache line align locks and mutexes, and not put
them in the same cache lines (very hard to do, for some structures),
most of the so-called "cache pressure" could be made to "go away".
IBM recently posted an article comparing performance numbers for
Linux with and without this change.  Realize, though, that FreeBSD
and Linux have somewhat different philosophies when it comes to SMP,
even if that's hard to tell from the lack of detailed implementation
plans being published by either camp.
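
As a minimal sketch of what "cache line aligning" locks means, the C11
fragment below contrasts two locks packed into one cache line with two
locks that each start on their own line.  The 64-byte line size, the
`demo_lock` type, and the helper `same_cache_line` are all assumptions
made for illustration; none of them is FreeBSD's actual mutex code, and
the real line size depends on the processor.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is common on x86 but not universal. */
#define CACHE_LINE_SIZE 64

/* Hypothetical lock type standing in for a kernel mutex. */
struct demo_lock {
    volatile uint32_t word;
};

/* Packed layout: both locks sit next to each other, so two CPUs
 * taking different locks still contend for the same cache line. */
struct packed_locks {
    struct demo_lock a;
    struct demo_lock b;
};

/* Padded layout: each lock is forced onto its own cache line, at the
 * cost of extra memory per lock. */
struct padded_locks {
    alignas(CACHE_LINE_SIZE) struct demo_lock a;
    alignas(CACHE_LINE_SIZE) struct demo_lock b;
};

/* Compile-time check that the second lock starts on a line boundary. */
static_assert(offsetof(struct padded_locks, b) % CACHE_LINE_SIZE == 0,
              "lock b must start on a cache line boundary");

/* Returns 1 if two byte offsets, measured from a line-aligned base,
 * fall in the same cache line, and 0 otherwise. */
static int same_cache_line(size_t off1, size_t off2)
{
    return off1 / CACHE_LINE_SIZE == off2 / CACHE_LINE_SIZE;
}
```

The trade-off is memory: the padded structure here takes two full
cache lines for two 4-byte locks, which is part of why this is hard to
do for structures that embed many locks.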

If the ability to optimize code for the Pentium 4 concerns you, then
you should become a contributor to the GCC project, which means you
need to execute a notarized assignment of rights statement with the
FSF before they will accept patches from you, and once that's done,
you can start going down Intel's optimization laundry list, sending
patches to the GCC folks.

-- Terry
