Date:      Thu, 26 Feb 2004 06:43:24 +1100
From:      Peter Jeremy <peter.jeremy@alcatel.com.au>
To:        Petri Helenius <pete@he.iki.fi>
Cc:        freebsd-alpha@freebsd.org
Subject:   Re: Bad performance on alpha? (make buildworld)
Message-ID:  <20040225194324.GI10121@gsmx07.alcatel.com.au>
In-Reply-To: <403C6A24.80804@he.iki.fi>
References:  <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040224202652.GA13675@diogenis.ceid.upatras.gr> <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com> <20040225025953.GH10121@gsmx07.alcatel.com.au> <403C6A24.80804@he.iki.fi>

On 2004-Feb-25 11:25:56 +0200, Petri Helenius <pete@he.iki.fi> wrote:
>This probably invites the question, what, if anything people like me who 
>are interested in getting the maximum performance out of any hardware 
>our things run on (maybe with the exception of the low-MHz embedded 
>stuff :-), is there any good tutorials/books on the subject what kind of 
>things to avoid when looking for optimal performance. The tightest loops 
>mostly do counter rolling, comparisons and pattern matching and we have 
>good mileage on getting performance gains by minimizing writing to 
>memory when there are other options like arithmetic on the fly.

Keep in mind several overriding rules:
1) Make sure the code is correct before worrying about performance.
2) Measure the performance and only worry about the slow bits.
3) A better algorithm will virtually always give the biggest performance gain.

I can't suggest any general books off-hand (I'm sure someone else in
-performance will know).  You will need the data sheet or programmer's
manual for the specific CPU you are aiming for, as well as the relevant
architecture manual.  Intel publish a 3-volume IA-32 architecture
manual that you can download from the web, and the Alpha AXP
architecture manual is also available online from the HP website.

The AXP manual includes two chapters describing general techniques for
AXP coding.  The individual CPU data sheets describe the number and
capabilities of execution units and how the instruction scheduling
works, as well as a matrix of instruction timings (how many clocks
you need to leave between a producer and a consumer instruction to
avoid a bubble).  These numbers and definitions need to be mapped into
the scheduling tables for your compiler.
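
To make the producer/consumer point concrete, here is a small sketch in
C (the function names are made up for illustration).  In the first loop
every add consumes the result of the previous add, so the loop forms one
long dependency chain.  The second loop keeps four independent sums, so
the compiler and CPU have independent instructions to place between each
producer and its consumer.  Whether this actually runs faster depends on
the CPU, the latency of the operation and what the compiler already
does, so measure it rather than assume.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add depends on the result of the previous one. */
uint64_t sum_single(const uint32_t *v, size_t n)
{
    uint64_t s = 0;

    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four accumulators: the four adds in each iteration are independent. */
uint64_t sum_four(const uint32_t *v, size_t n)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    /* Handle the 0-3 elements left over when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += v[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value for integer data.  The same
rewrite on floating-point data changes the order of the additions and
so can change the rounding of the result.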

Keep in mind that both the IA-32 and AXP CPUs have embedded performance
counters.  These are very useful for monitoring low-level details
like pipeline stalls, branch mispredictions and cache misses.

-- 
Peter Jeremy