Date:      Fri, 19 Feb 1999 01:12:08 -0500 (EST)
From:      Alfred Perlstein <bright@cygnus.rush.net>
To:        Peter Jeremy <peter.jeremy@auss2.alcatel.com.au>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: vm_page_zero_fill
Message-ID:  <Pine.BSF.3.96.990219010251.10060R-100000@cygnus.rush.net>
In-Reply-To: <99Feb19.123711est.40325@border.alcanet.com.au>

On Fri, 19 Feb 1999, Peter Jeremy wrote:

> Alfred Perlstein <bright@cygnus.rush.net> wrote:
> >After playing with "gcc -O -S bcmp.c" on several platforms (i386,
> >sparc32, alpha), it seems to me that the function ought to be
> >replaced with this:
> [deleted]
> 
> The code given is portable, but not optimal for any of these
> architectures - especially the Alpha.  The original Alpha chips don't
> have character instructions so character handling is quite poor (and
> gcc2.7.x doesn't include support for the new character instructions).
> 
> Optimal code for the Alpha would read 8-byte long-word aligned chunks
> from memory, then appropriately re-align and compare them.  (There's
> some discussion about this, though not actual code, in the early Alpha
> white papers).
> 
> A similar strategy probably holds for the SPARC (but with 4-byte loads
> except on UltraSPARCs).  Something similar could be done on the ix86,
> but I'm not certain about the advantages.
> 
> This _is_ one area where carefully hand-crafted code is worth the
> effort (especially on the RISC architectures).
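
The word-at-a-time strategy described in the quoted text above might look roughly like the following in portable C. This is only a sketch under assumptions: the name bcmp_words is hypothetical, and it uses memcpy for the 8-byte loads instead of the aligned-load-and-realign sequence a hand-tuned Alpha routine would use, leaving the choice of load instructions to the compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; returns 0 if the buffers match, nonzero otherwise. */
int
bcmp_words(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p1 = b1;
	const unsigned char *p2 = b2;
	size_t i = 0;

	/* Compare 8 bytes per iteration; memcpy keeps the loads alignment-safe. */
	for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
		uint64_t w1, w2;

		memcpy(&w1, p1 + i, sizeof(w1));
		memcpy(&w2, p2 + i, sizeof(w2));
		if (w1 != w2)
			return (1);
	}
	/* Compare the remaining 0-7 tail bytes one at a time. */
	for (; i < len; i++)
		if (p1[i] != p2[i])
			return (1);
	return (0);
}
```

Since bcmp only has to report equal or not equal, the sketch never needs to find which byte differs, which is what makes the whole-word comparison sufficient.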

Yes, but after 'gcc -O -S' my code has fewer branches and other
ops on all archs.  It's really a non-issue, as the i386 code already has this
done by hand in asm.  I think it's more efficient because gcc is smart enough to
use a single index instead of 2 separate pointers.
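
To illustrate the index-versus-pointers point: the code originally posted was deleted from the quote, so the following is not that code, only a guess at the general shape of an indexed byte loop. The name bcmp_indexed is hypothetical.

```c
#include <stddef.h>

/* Hypothetical name; returns 0 if the buffers match, nonzero otherwise. */
int
bcmp_indexed(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p1 = b1;
	const unsigned char *p2 = b2;
	size_t i;

	/* A single induction variable indexes both buffers. */
	for (i = 0; i < len; i++)
		if (p1[i] != p2[i])
			return (1);
	return (0);
}
```

Whether the compiler really keeps one index register instead of advancing two pointers depends on the target and the gcc version, so the 'gcc -O -S' output is the thing to check.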

> 
> >it uses the "rep cmpsl" opcode.  I have heard that using "movs/lods/cmps"
> >was no longer optimal after the 486 line, but I'm unsure.
> Sort of true.  In theory, an explicit loop is faster than "rep cmps".
> Lack of CPU<->RAM bandwidth tends to make this less of an issue unless
> both strings are in L1 cache.

Aren't they both forced into L1 as soon as they are first accessed, making
the rest of the code execute quicker?  (at least on i586+)

Next time I'll check whether it's already hand-coded asm before I pipe up about
something like this. :)

-Alfred

> 
> Peter
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
> 






