Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 1 Jul 2008 19:58:13 +0200
From:      Marius Strobl <marius@alchemy.franken.de>
To:        Christoph Mallon <christoph.mallon@gmx.de>
Cc:        cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/sparc64/include in_cksum.h
Message-ID:  <20080701175813.GR1215@alchemy.franken.de>
In-Reply-To: <48664210.80204@gmx.de>
References:  <200806252105.m5PL5AUp064418@repoman.freebsd.org> <48654667.1040401@gmx.de> <20080627222404.GJ1215@alchemy.franken.de> <48657058.6020102@gmx.de> <20080628114417.GL1215@alchemy.franken.de> <486629AA.1050409@gmx.de> <20080628132241.GO1215@alchemy.franken.de> <48664210.80204@gmx.de>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Jun 28, 2008 at 03:52:16PM +0200, Christoph Mallon wrote:
> Marius Strobl wrote:
> >On Sat, Jun 28, 2008 at 02:08:10PM +0200, Christoph Mallon wrote:
> >>Marius Strobl wrote:
> >>>>On a related note: Is inline assembler really necessary here? For 
> >>>>example couldn't in_addword() be written as
> >>>>static __inline u_short
> >>>>in_addword(u_short const sum, u_short const b)
> >>>>{
> >>>>   u_int const t = sum + b;
> >>>>   return t + (t >> 16);
> >>>>} ?
> >>>>This should at least produce equally good code and because the compiler 
> >>>>has more knowledge about it than an assembler block, it potentially 
> >>>>leads to better code. I have no SPARC compiler at hand, though.
> >>>With GCC 4.2.1 at -O2 the code generated for the above C version
> >>>takes on more instruction than the inline assembler so if one 
> >>On SPARC?  What code does it produce? I have not SPARC compiler at hand.
> >>Even if it is one more instruction, I think the reduced register 
> >>pressure makes more than up for it.
> >
> >Correct, it only uses two registers:
> >
> >0000000000000000 <in_addword>:
> >   0:   92 02 00 09     add  %o0, %o1, %o1
> >   4:   91 32 60 10     srl  %o1, 0x10, %o0
> >   8:   90 02 00 09     add  %o0, %o1, %o0
> >   c:   91 2a 20 10     sll  %o0, 0x10, %o0
> >  10:   91 32 20 10     srl  %o0, 0x10, %o0
> >  14:   81 c3 e0 08     retl 
> >  18:   91 3a 20 00     sra  %o0, 0, %o0
> >  1c:   01 00 00 00     nop 
> 
> One more instruction? That's five instructions for the actual 
> calculation afaict, just like the inline assembler version. The sra in 
> the delay slot should be present in the inline assembler version, too.

Yes, when compiled as a stand-alone function. Actually, in that
case GCC adds two additional shifts, which it also does for your
mixed version below in that case. It doesn't do this when using
it inline in sk(4) for example, though in case of the mixed
version the two 'sra <reg>, 0, <reg>' (see below) are still
present. This probably means that comparing the code generated
for the different versions when compiled as stand-alone functions
isn't a good approach for comparing the different versions.

> 
> >>This should work fine and only use two registers (though the compiler 
> >>can choose to use three, if it deems it beneficial):
> >>
> >>static __inline u_short
> >>in_addword(u_short const sum, u_short const b)
> >>{
> >>  u_long const sum16 = sum << 16;
> >>  u_long const b16   = b   << 16;
> >>  u_long       ret;
> >>
> >>  __asm(
> >>    "addcc %1, %2, %0\n\t"
> >>    "srl   %0, 16, %0\n\t"
> >>    "addc  %0,  0, %0\n"
> >>    : "=r" (ret) : "r" (sum16), "r" (b16) : "cc");
> >>
> >>  return (ret);
> >>}
> >
> >This is ten instructions with two registers. Where is the
> >break even regarding instructions vs. registers for sparc64? :)
> 
> I still have no SPARC compiler. Ten instructions? All I did was write 
> the two shifts in C and adjust the register constraints. It should 
> produce identical code.

Nine if you don't count the shift in the delay slot, seven when
used inline as noted above.

0000000000000000 <in_addword>:
   0:   91 2a 20 10     sll  %o0, 0x10, %o0
   4:   93 2a 60 10     sll  %o1, 0x10, %o1
   8:   93 3a 60 00     sra  %o1, 0, %o1
   c:   91 3a 20 00     sra  %o0, 0, %o0
  10:   90 82 00 09     addcc  %o0, %o1, %o0
  14:   91 32 20 10     srl  %o0, 0x10, %o0
  18:   90 42 20 00     addc  %o0, 0, %o0
  1c:   91 2a 20 10     sll  %o0, 0x10, %o0
  20:   91 32 20 10     srl  %o0, 0x10, %o0
  24:   81 c3 e0 08     retl 
  28:   91 3a 20 00     sra  %o0, 0, %o0
  2c:   30 68 00 05     b,a   %xcc, 40 <in_addword+0x40>
  30:   01 00 00 00     nop 
  34:   01 00 00 00     nop 
  38:   01 00 00 00     nop 
  3c:   01 00 00 00     nop

> 
> >>But I still prefer the C version.
> >>
> >
> >And I prefer to not re-write otherwise working code for
> >micro-optimizations, there are enough unfixed real bugs
> 
> Obviously the inline assembler magic did not work and is/was a real bug.
> 
> >to deal with. Similarly we should not waste time discussing
> >how to possibly optimize MD versions even more but rather
> >spend the time improving the MI version so it's good enough
> >that using MD versions isn't worth the effort.
> 
> The C alternative is MI and in length on par with the inline assembler 
> version, isn't it?

As long as it's not in a MI location it's not. Besides, I
was talking about the in_cksum stuff as a whole, not just
in_addword(). Also, with new non-GPL alternatives on the
horizon, GPL isn't necessarily the only compiler we should
care about, which in turn might justify having MD inline
assembler versions though.

Marius




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080701175813.GR1215>