Date: Sun, 29 Jun 2008 10:36:29 +0200 From: Christoph Mallon <christoph.mallon@gmx.de> To: Bruce Evans <brde@optusnet.com.au> Cc: cvs-src@freebsd.org, src-committers@freebsd.org, cvs-all@freebsd.org, Marius Strobl <marius@alchemy.franken.de> Subject: Re: cvs commit: src/sys/sparc64/include in_cksum.h Message-ID: <4867498D.5050409@gmx.de> In-Reply-To: <20080629121025.K92490@delplex.bde.org> References: <200806252105.m5PL5AUp064418@repoman.freebsd.org> <48654667.1040401@gmx.de> <20080627222404.GJ1215@alchemy.franken.de> <48657058.6020102@gmx.de> <20080628121025.F89039@delplex.bde.org> <4865F89D.4090207@gmx.de> <20080629121025.K92490@delplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Bruce Evans wrote: >>> Right. Though I've never seen unnecessary's __volatiles significantly >>> affecting i386 code. This is because the code in the asms can't be >>> removed completely, and can't be moved much either. With out of order >>> execution, the type of moves that are permitted (not across >>> dependencies) > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >>> are precisely the type of moves that the CPU's scheduler can do or undo >>> no matter how the compiler orders the code. >> >> I disagree. For example look at the use of in_addword() in >> dev/sk/if_sk.cv in line 2819: >> csum1 = htons(csum & 0xffff); >> csum2 = htons((csum >> 16) & 0xffff); >> ipcsum = in_addword(csum1, ~csum2 & 0xffff); >> /* checksum fixup for IP options */ >> len = hlen - sizeof(struct ip); >> if (len > 0) { >> return; >> } >> >> The calculation will be executed even if the following if (len > 0) >> leaves the function and the value of ipcsum is unused. >> If in_addword() is not marked volatile it can be moved after the if >> and not be executed in all cases. csum1 and csum2 can be moved after >> the if, too. > > No, volatile has no effect on whether the above calculation will be > executed, since the early return has no dependencies on the caclulation. The volatile induces a dependency. > Old versions of gcc used to handle volatile like that, but this changed > in gcc-3 or earlier. gcc.info now says: > > % The `volatile' keyword indicates that the instruction has important > % side-effects. GCC will not delete a volatile `asm' if it is reachable. > ^^^^^^^^^^^^^^^^^^^ This is not about whether the code is reachable or not (it is reachable), it is about whether the result is used (i.e. whether the code is dead). > % (The instruction can still be deleted if GCC can prove that > % control-flow will never reach the location of the instruction.) Note > % that even a volatile `asm' instruction can be moved relative to other > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > % code, including across jump instructions. For example, on many targets > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ jump != conditional jump. If it moves the *volatile* asm statement across the if, it would only appear on *some* execution paths, which is wrong. It is perfectly fine to move it, when the statement is not volatile, though. > Even if gcc didn't move the caclulation, then CPUs with out of order > execution might schedule it so that it is effectively never executed > (most likely by executing it in otherwise-unused pipelines while the > main pipeline returns). This is valid for the same reasons that gcc > can move the volatile asms -- the return doesn't depend on the result > of the caclulation. This is for the CPU to decide. If the assembler block really contains "important" stuff like memory barriers, writes to machine control registers etc., the CPU will not "effectively never execute" the code. The compiler does not know this, all it sees is the word "volatile". > The above C code is fairly bad, but generates not so bad code on i386: > > % % movl %esi, %eax > % #APP > % xchgb %ah, %al # byte operations can be slow; this one not > # too bad, but I wonder if rorw $8 is better > # (rorl $16 is already used for corresponding > # 32-bit operations) where there is no xchg > # alternative > % #NO_APP And this again is an example why not to use inline assembler, but let the compiler decide this: unsigned short swap16(unsigned short x) { return x >> 8 | x << 8; } is compiled to swap16: movl 4(%esp), %eax rolw $8, %ax ret The compiler is even able to do optimisations, which it absolutely cannot do, if inline assembler is used, example: unsigned short id(unsigned short x) { return swap16(swap16(x)); } results in id: movl 4(%esp), %eax ret Maybe the MD htons() macros should be replaced by MI code. > % shrl $16, %esi > % movl %esi, %edx > % #APP > % xchgb %dh, %dl # as above > % #NO_APP > % notl %edx # poor asm code -- the top 16 bits are unused > # except here to stall for merging them with > # the previous byte operation The compiler simply does not know, that the inline assembler only operates on parts of the register. Another reason not to use inline assembler: u_int g(u_short x) { return ~swap16(x); } g: movl 4(%esp), %eax rolw $8, %ax movzwl %ax, %eax # avoid stall notl %eax ret Regards Christoph
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4867498D.5050409>