Date:      Tue, 25 Jan 2000 13:31:20 -0700
From:      Brett Glass <brett@lariat.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Warner Losh <imp@village.org>, security@FreeBSD.ORG
Subject:   Re: Merged patches 
Message-ID:  <4.2.2.20000125132053.01a5edb0@localhost>
In-Reply-To: <200001251855.KAA05770@apollo.backplane.com>
References:  <4.2.2.20000125095042.01a5aba0@localhost> <200001251722.KAA04527@harmony.village.org> <4.2.2.20000125113518.01a59100@localhost>

At 11:55 AM 1/25/2000, Matthew Dillon wrote:
   
>:>    So we do multiple tests, so what?  Not only will GCC potentially
>:>     optimize the code, 
>:
>:I have never seen GCC optimize tests of the individual bits of a word
>:into a switch.
>
>     Because a 'switch' table lookup is often more expensive than a sequence
>     of conditionals.

No -- because it doesn't know how.

If you cast the series of tests as a switch, the compiler is capable of 
rendering the code either as a sequence of conditionals or as a jump table. 
(In most cases, it will recognize the efficiency of a jump table and
use one, but if there is some reason why a jump table would NOT be more
efficient, it will also recognize that.) If you use individual tests,
the compiler won't be able to pick the best method for you.
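
To make the contrast concrete, here is a minimal sketch of the two shapes.
This is my own illustration, not the actual tcp_input() code: the TH_* values
match <netinet/tcp.h>, but the classification itself is made up purely to
show the structure.

#include <stdint.h>

#define TH_SYN  0x02    /* same bit values as <netinet/tcp.h> */
#define TH_RST  0x04
#define TH_ACK  0x10

/* Shape 1: individual bit tests -- a chain of conditional branches. */
static int
classify_tests(uint8_t flags)
{
        if (flags & TH_RST)
                return (0);     /* reset */
        if (flags & TH_SYN)
                return (1);     /* connection setup */
        if (flags & TH_ACK)
                return (2);     /* established traffic */
        return (3);             /* anything else */
}

/*
 * Shape 2: the same decision cast as a switch on the masked word.  GCC
 * is free to emit either a jump table or a branch sequence here,
 * whichever it judges cheaper for the target.
 */
static int
classify_switch(uint8_t flags)
{
        switch (flags & (TH_RST | TH_SYN | TH_ACK)) {
        case TH_SYN:
        case TH_SYN | TH_ACK:
                return (1);
        case TH_ACK:
                return (2);
        case 0:
                return (3);
        default:                /* every combination with TH_RST set */
                return (0);
        }
}

Compile both with "gcc -O2 -S" for a couple of targets and compare the
output; you can see which representation the compiler picked in each case.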

>:Caching isn't the issue. Conditional jumps trigger pipeline interlocks
>:and stalls. A bunch of them in a row is a worst case. It locks up even the 
>:best superscalar CPUs because the pipelines are tied in knots and you can only
>:do so much speculative execution. Doing a switch eliminates the pipeline
>:"train wreck" and at the same time parallelizes the tests in a completely 
>:portable way. As an ASM programmer, I see MASSIVE speedups when I do this --
>:usually an order of magnitude at least. 
>
>     This is not true if the conditionals are ordered for the critical path.
>     This is especially not true on the i386 architecture which implements 
>     nearly 0-cost branches in the branch-cache case.

This is a general TCP input routine. ALL packets enter here. The critical
path is situational, so it pays to make all of the paths fast. Due to the 
way modern processors are architected, an indexed jump is as fast as a 
conditional jump and does not create dependencies or stall pipelines. 
So, in most cases the critical path is still optimal with a jump table. And
if, on some specific architecture, it is not, the compiler can render the
switch as a series of conditional jumps instead.

>     This is not necessarily true.  A jump table usually destroys the
>     branch-cache for the main branch-to-jump-table and can be slower
>     than a sequence of conditionals that have been ordered for the critical
>     path.

Again, you cannot order conditionals for "the" critical path here. And
references to a jump table, because they do not rely on condition flags, can
be pipelined and executed in parallel with other instructions. And the jump
table itself enters the cache if used frequently. The net result: the code
incurs less overhead than a chain of conditional jumps. I invite you to
experiment with this yourself; as someone who writes optimized assembly
language, I have. Or see Abrash, who notes the same result in his book
"Zen of Assembly Language."
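
In that spirit, here is a rough micro-benchmark you could start from. It is
my own sketch, nothing from the tree: the TH_* values again match
<netinet/tcp.h>, the buffer size and iteration counts are arbitrary, and it
times only the two code shapes, not tcp_input() itself.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define TH_SYN  0x02    /* same bit values as <netinet/tcp.h> */
#define TH_RST  0x04
#define TH_ACK  0x10

#define NFLAGS  (1 << 20)
#define ROUNDS  200

int
main(void)
{
        static uint8_t flags[NFLAGS];
        unsigned long sum1 = 0, sum2 = 0;
        clock_t t0, t1;
        int i, r;

        srandom(1);
        for (i = 0; i < NFLAGS; i++)
                flags[i] = random() & 0xff;

        /* Shape 1: a chain of individual bit tests. */
        t0 = clock();
        for (r = 0; r < ROUNDS; r++)
                for (i = 0; i < NFLAGS; i++) {
                        uint8_t f = flags[i];

                        if (f & TH_RST)
                                sum1 += 0;
                        else if (f & TH_SYN)
                                sum1 += 1;
                        else if (f & TH_ACK)
                                sum1 += 2;
                        else
                                sum1 += 3;
                }
        t1 = clock();
        printf("if-chain: %.2fs (sum %lu)\n",
            (double)(t1 - t0) / CLOCKS_PER_SEC, sum1);

        /* Shape 2: the same decision as a switch on the masked word. */
        t0 = clock();
        for (r = 0; r < ROUNDS; r++)
                for (i = 0; i < NFLAGS; i++) {
                        switch (flags[i] & (TH_RST | TH_SYN | TH_ACK)) {
                        case TH_SYN:
                        case TH_SYN | TH_ACK:
                                sum2 += 1;
                                break;
                        case TH_ACK:
                                sum2 += 2;
                                break;
                        case 0:
                                sum2 += 3;
                                break;
                        default:        /* any combination with TH_RST */
                                sum2 += 0;
                                break;
                        }
                }
        t1 = clock();
        printf("switch:   %.2fs (sum %lu)\n",
            (double)(t1 - t0) / CLOCKS_PER_SEC, sum2);

        return (sum1 != sum2);  /* the two shapes should agree */
}

Printing the sums keeps the compiler from throwing the loops away; the
numbers will of course depend on the CPU and on what the optimizer does with
each shape, which is exactly the point.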

--Brett Glass



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-security" in the body of the message



