Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 16 Feb 2004 10:54:46 -0800
From:      Tim Kientzle <kientzle@acm.org>
To:        Trent Nelson <trent@teleri.net>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Branch prediction
Message-ID:  <403111F6.2040505@acm.org>
In-Reply-To: <20040215164231.GA54709@teleri.net>
References:  <20040215164231.GA54709@teleri.net>

next in thread | previous in thread | raw e-mail | index | archive | help
Trent Nelson wrote:
>     For as long as I've been programming, I've always been under the
>     impression that CPUs will always predict a branch as being false 
>     the first time they come across it.

The state of the art has advanced considerably since then.

>     Many, many years ago, I came across a DEC programming guide that
>     said the same thing.  It suggested using 'do { } while()' over
>     'while() {}' for loops as this ensured that the loop block was 
>     correctly pre-fetched (rather than whatever followed the loop) 
>     the first time the branch was encountered.

That must have been a long time ago.  while() {} loops routinely get
compiled as:

    bra condition_check
looptop:
    <... body of loop ...>
condition:
    <... test condition ...>
    bcc looptop:

A 'do' loop differs only in omitting the initial forward
branch.  Since this is an unconditional forward branch, it's
always correctly predicted. ;-)    Thus, as a practical matter,
there's no performance difference between do/while and
while.

>     ...  have either CPU architectures or compilers improved
>     over time such that this isn't an issue anymore?  Are CPUs able
>     to intelligently predict a branch the first time they encounter
>     it?  Do compilers re-order blocks to optimise pre-fetching and
>     minimise branch mis-prediction?

Yes, compilers re-order blocks.  I once spent a couple of weeks
trying to hand-assemble some code to run faster than the compiler.
I was only able to accomplish it if there were particular processor
features (SIMD instructions or explicit prefetch) that the
compiler was unaware of.  For serial C code, the compilers are
pretty smart these days.

In particular, modern instruction sets allow the compiler to
specify whether the branch should be default-taken or default-not-taken.

Processors contain "branch history" that can detect and record
certain common patterns (e.g., "branch is not taken every third
time") to improve prediction.  (This history is maintained as
long as the branch remains in the instruction cache, and can be
very effective for tight inner loops.)

If there are no compiler hints and no branch history, I think
Colin was right:  default taken for backward branches (assume
loops happen) default not taken for forward branches (assume if
tests succeed).

Practically speaking, I find it very rarely worth the trouble
of trying to predict this sort of thing unless you're dealing
with unusually performance-critical code.

Tim Kientzle



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?403111F6.2040505>