Date:      Sat, 28 Jun 2008 15:22:41 +0200
From:      Marius Strobl <marius@alchemy.franken.de>
To:        Christoph Mallon <christoph.mallon@gmx.de>
Cc:        cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/sparc64/include in_cksum.h
Message-ID:  <20080628132241.GO1215@alchemy.franken.de>
In-Reply-To: <486629AA.1050409@gmx.de>
References:  <200806252105.m5PL5AUp064418@repoman.freebsd.org> <48654667.1040401@gmx.de> <20080627222404.GJ1215@alchemy.franken.de> <48657058.6020102@gmx.de> <20080628114417.GL1215@alchemy.franken.de> <486629AA.1050409@gmx.de>

On Sat, Jun 28, 2008 at 02:08:10PM +0200, Christoph Mallon wrote:
> Marius Strobl wrote:
> >>On a related note: Is inline assembler really necessary here? For 
> >>example couldn't in_addword() be written as
> >>static __inline u_short
> >>in_addword(u_short const sum, u_short const b)
> >>{
> >>    u_int const t = sum + b;
> >>    return t + (t >> 16);
> >>} ?
> >>This should at least produce equally good code and because the compiler 
> >>has more knowledge about it than an assembler block, it potentially 
> >>leads to better code. I have no SPARC compiler at hand, though.
> >
> >With GCC 4.2.1 at -O2 the code generated for the above C version
> >takes one more instruction than the inline assembler, so if one
> 
> On SPARC? What code does it produce? I have no SPARC compiler at hand.
> Even if it is one more instruction, I think the reduced register
> pressure more than makes up for it.

Correct, it only uses two registers:

0000000000000000 <in_addword>:
   0:   92 02 00 09     add  %o0, %o1, %o1
   4:   91 32 60 10     srl  %o1, 0x10, %o0
   8:   90 02 00 09     add  %o0, %o1, %o0
   c:   91 2a 20 10     sll  %o0, 0x10, %o0
  10:   91 32 20 10     srl  %o0, 0x10, %o0
  14:   81 c3 e0 08     retl 
  18:   91 3a 20 00     sra  %o0, 0, %o0
  1c:   01 00 00 00     nop 

> 
> >wants to micro-optimize, one should certainly prefer the
> >inline assembler version.
> 
> As a compiler writer, I can tell you that regarding optimisation
> there is no such thing as "certainty".
> The worst part about inline assembler is that the compiler knows
> nothing about the instructions in there and has to copy them verbatim.
> For example, it cannot do anything clever with the two shifts at the
> beginning of the inline assembler block of in_addword().

That's why my statement regarding micro-optimizing was actually
meant sarcastically. In order to decide whether it's still worth
implementing certain code in inline assembler rather than C, one
would have to re-check, i.e. re-benchmark, every time the
compiler or the consumers change. Obviously that's not doable.
So the bottom line is that the best we can do is to investigate
once and, if we come to the conclusion that doing something in
inline assembler is generally worth it most of the time, stick
with this (though not necessarily forever).

> 
> >>In fact the in/out specification for this asm block looks rather bad:
> >>"=&r" (__ret), "=&r" (__tmp) : "r" (sum), "r" (b) : "cc");
> >>The "&"-modifiers (do not use the same registers as for any input 
> >>operand value) force the compiler to use 4 (!) registers in total for
> >>this asm block. It could be done with 2 registers if a proper in/out 
> >>specification was used. At the very least the in/out specification can 
> >>be improved, but I suspect using plain C is the better choice.
> >>
> >
> >The "&"-modifiers are necessary as the inline assembler in
> >question writes its output operands before all input operands
> >are consumed. Omitting them caused GCC to generate broken
> >code in the past.
> 
> This should work fine and only use two registers (though the compiler 
> can choose to use three, if it deems it beneficial):
> 
> static __inline u_short
> in_addword(u_short const sum, u_short const b)
> {
>   u_long const sum16 = sum << 16;
>   u_long const b16   = b   << 16;
>   u_long       ret;
> 
>   __asm(
>     "addcc %1, %2, %0\n\t"
>     "srl   %0, 16, %0\n\t"
>     "addc  %0,  0, %0\n"
>     : "=r" (ret) : "r" (sum16), "r" (b16) : "cc");
> 
>   return (ret);
> }

This is ten instructions with two registers. Where is the
break even regarding instructions vs. registers for sparc64? :)
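In the sequence proposed above, both inputs are read by the first
instruction (addcc) before %0 is written, which is why a plain "=r"
output without "&" suffices there. The arithmetic itself can be checked
with the hypothetical userland model below (not kernel code; the name
in_addword_model() is made up for illustration). Pre-shifting both
operands into bits 31..16 makes the 32-bit carry flag icc.C, which addc
consumes, equal to the carry out of the 16-bit addition.

```c
#include <stdint.h>

/*
 * Hypothetical C model of the quoted SPARC sequence:
 *   addcc %1, %2, %0   32-bit add, carry out of bit 31 -> icc.C
 *   srl   %0, 16, %0   keep the upper 16 bits of the 32-bit sum
 *   addc  %0,  0, %0   add icc.C back in (end-around carry)
 */
static inline uint16_t
in_addword_model(uint16_t sum, uint16_t b)
{
	uint32_t sum16 = (uint32_t)sum << 16;	/* operands in bits 31..16 */
	uint32_t b16 = (uint32_t)b << 16;
	uint32_t r = sum16 + b16;		/* addcc: wraps modulo 2^32 */
	uint32_t carry = r < sum16;		/* icc.C: unsigned overflow */

	r >>= 16;				/* srl */
	return (uint16_t)(r + carry);		/* addc */
}
```

For instance, in_addword_model(0xffff, 0xffff) yields 0xffff, matching
the one's-complement identity -0 + -0 = -0, the same result the plain-C
in_addword() gives.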

> 
> But I still prefer the C version.
> 

And I prefer not to rewrite otherwise working code for
micro-optimizations; there are enough unfixed real bugs
to deal with. Similarly, we should not waste time discussing
how to possibly optimize MD versions even more but rather
spend the time improving the MI version so it's good enough
that using MD versions isn't worth the effort.

Marius
