Date:      Tue, 7 May 1996 06:06:09 -0700 (PDT)
From:      asami@cs.berkeley.edu (Satoshi Asami)
To:        bde@zeta.org.au
Cc:        current@FreeBSD.org, nisha@cs.berkeley.edu, marc@bowtie.nl, ken@area238.residence.gatech.edu, wollman@lcs.mit.edu, wscott@ichips.intel.com, pattrsn@cs.berkeley.edu, culler@cs.berkeley.edu
Subject:   Re: more on fast bcopy
Message-ID:  <199605071306.GAA04277@silvia.HIP.Berkeley.EDU>
In-Reply-To: <199605061207.WAA04793@godzilla.zeta.org.au> (message from Bruce Evans on Mon, 6 May 1996 22:07:13 +1000)

Wayne:

 * 	Pentium Pro 200/256
 * 	128 Meg memory 2-way interleave 
 * 	B-step Orion chipset
 * 
 * The interesting result is that 'libc' is MUCH faster than any
 * of the other results.
 * 
 * We implemented a fast string copy mode for 'rep movs' that kicks
 * in at about 128 elements.

Yes, indeed, this is very interesting.  I guess whatever I'm doing
with all this is going to be moot once we all move to P6's. ;)
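Since the fast-string mode lives in the hardware, any copy routine built on `rep movs` picks it up without being rewritten, which is presumably why the libc column wins on the P6. Below is a minimal sketch of such a routine, assuming GCC/Clang-style inline assembly on x86: `string_move_copy` is a hypothetical name, it copies forward only (unlike `bcopy`, it does not handle overlapping buffers), and it falls back to `memcpy` on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: forward copy using x86 string moves.
 * Unlike bcopy(), overlapping buffers are not supported. */
void *string_move_copy(void *dst, const void *src, size_t n)
{
    uintptr_t d0 = (uintptr_t)dst, s0 = (uintptr_t)src;
    assert(n == 0 || d0 + n <= s0 || s0 + n <= d0);   /* no overlap */
    (void)d0;
    (void)s0;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    void *d = dst;
    const void *s = src;
    size_t words = n / 4;   /* bulk: 4 bytes per rep movsl step */
    size_t tail = n % 4;    /* remainder: 1 byte per rep movsb step */

    /* The ABI guarantees the direction flag is clear, so both run forward.
     * Each asm statement advances d and s, so the tail continues where
     * the bulk move stopped. */
    __asm__ volatile ("rep movsl"
                      : "+D"(d), "+S"(s), "+c"(words)
                      :
                      : "memory");
    __asm__ volatile ("rep movsb"
                      : "+D"(d), "+S"(s), "+c"(tail)
                      :
                      : "memory");
#else
    memcpy(dst, src, n);    /* portable fallback on non-x86 targets */
#endif
    return dst;
}
```

The count register is all the code supplies; whether the CPU uses its fast-string path (about 128 elements and up on the P6, per Wayne) is decided by the processor at run time.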

By the way, I assume the external clock of the 200MHz P6 is 66MHz, is
this correct?  The memory copy speed of this machine seems to be
slower than the Triton-based P5-133 that we have (see below).  Do you
know where the "B-step" Orion stands on the maturity curve, in terms
of memory access speed?

Bruce:

 * Why not? :-)  It should be possible to use the fpu after saving and
 * restoring the FP registers reentrantly.
                              ^^^^^^^^^^^

Yeah, we were running into problems with this.  Can you tell us how to
do it? ;)

 * >We've got 67MB/s on the 133MHz Pentium + Triton here.  Wow.
 * 
 * Same here.  An FP method seems to be the fastest way of bzeroing
 * uncached memory too.  I get about 150MB/sec for an FP based bzero and
 * about 85MB/sec (max) for all reasonable integer register based versions.

I see.  By the way, we tried unrolling the loops even more, and
actually got up to 80MB/s for FP and 60MB/s for integer registers
(this is for bcopy).
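"Unrolling" here means moving several words per loop iteration so that the loop overhead is spread over more data. As a rough C sketch of the integer flavour (not the hand-written code in the tarball), assuming the length is a multiple of 32 and the buffers do not overlap, with `unrolled_copy32` as a hypothetical name: eight 4-byte words per iteration, matching the Pentium's 32-bit integer registers. The FP flavour presumably does the same with 8-byte FPU loads and stores, which C gives no portable way to force, so it is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical unrolled integer copy: 32 bytes (eight 4-byte words) per
 * iteration, written with all loads before the stores (an optimizing
 * compiler is free to reschedule them).  memcpy() on each word keeps the
 * code within C's aliasing rules; optimizing compilers typically emit a
 * single 4-byte move for each call.  len must be a multiple of 32. */
void unrolled_copy32(void *dst, const void *src, size_t len)
{
    assert(len % 32 == 0);
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint32_t w0, w1, w2, w3, w4, w5, w6, w7;

    for (size_t off = 0; off < len; off += 32) {
        const unsigned char *sp = s + off;
        unsigned char *dp = d + off;

        /* load phase */
        memcpy(&w0, sp +  0, 4); memcpy(&w1, sp +  4, 4);
        memcpy(&w2, sp +  8, 4); memcpy(&w3, sp + 12, 4);
        memcpy(&w4, sp + 16, 4); memcpy(&w5, sp + 20, 4);
        memcpy(&w6, sp + 24, 4); memcpy(&w7, sp + 28, 4);

        /* store phase */
        memcpy(dp +  0, &w0, 4); memcpy(dp +  4, &w1, 4);
        memcpy(dp +  8, &w2, 4); memcpy(dp + 12, &w3, 4);
        memcpy(dp + 16, &w4, 4); memcpy(dp + 20, &w5, 4);
        memcpy(dp + 24, &w6, 4); memcpy(dp + 28, &w7, 4);
    }
}
```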

I put the results on our machines as well as others on

  http://stampede.cs.berkeley.edu/~asami/Td/bcopy.html

please take a look.  If you would like to contribute, please grab

  ftp://stampede.cs.berkeley.edu/pub/bcopy/bcopy-960507.tar.gz

and follow the instructions.
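The figures below are copy bandwidths in MB/s, and the harness in the tarball is the one that produced them. Purely as an illustration of how such a figure can be measured, here is a sketch assuming C11 `timespec_get`; `copy_mbps`, `libc_copy`, and the choice of 1 MB = 10^6 bytes are assumptions, not details taken from bcopy-960507 (which may well use 2^20).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* libc baseline wrapped to match copy_fn (memcpy returns void *). */
static void libc_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Hypothetical helper: bandwidth in MB/s (1 MB = 10^6 bytes here) for
 * `reps` copies of a `len`-byte buffer.  Returns -1.0 on failure. */
double copy_mbps(copy_fn fn, size_t len, int reps)
{
    assert(len > 0 && reps > 0);
    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5A, len);   /* fault the pages in before timing */
    memset(dst, 0, len);

    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < reps; i++)
        fn(dst, src, len);
    timespec_get(&t1, TIME_UTC);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    int ok = memcmp(dst, src, len) == 0;   /* sanity-check the copy */
    (void)ok;
    free(src);
    free(dst);
    assert(ok);
    return secs > 0.0 ? (double)len * reps / secs / 1e6 : -1.0;
}
```

For a memory-bound number the buffer should be much larger than the caches; `copy_mbps(libc_copy, 4u << 20, 16)`, for example, times sixteen 4 MB copies. Passing another routine with the same signature measures that one instead.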

Here is a brief summary:

 Name      |  CPU    | Chipset |     bcopy speed (MB/s)
           |         |         | libc unrolled-int unrolled-FP
-----------+---------+---------+-------------------------------
 Wayne's   | P6-200  | Orion-B |  47       36-      36-
 Garrett's | P6-150  | Orion-? |  26       27=      27=
 luke      | P5-133  | Triton  |  40       60       80
 obiwan    | P5-100  | SiS     |  23       29       45
 stampede  | P5-90   | Neptune |  22       23=      44
 Marc's    | P5-90   | Pluto   |  20       20=      32
 Kenneth's | 486-100 | SiS     |  10       10=       8-

"=" means it's not much faster than libc, "-" means it's slower than
libc.  It's pretty clear that the FP trick only helps on the P5 Pentiums;
on the P6 and the 486 it is barely faster than libc at best, and often slower.
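To put numbers on that remark, the sketch below recomputes each column of the table as a multiple of the libc speed, using only the figures above. The 10% cut-off for "not much faster" is an assumption: the table doesn't state one, but this value reproduces every "=" and "-" mark in it.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the summary table above (all speeds in MB/s). */
struct bcopy_row {
    const char *name, *cpu;
    double libc_mbps, int_mbps, fp_mbps;
};

static const struct bcopy_row table[] = {
    {"Wayne's",   "P6-200",  47, 36, 36},
    {"Garrett's", "P6-150",  26, 27, 27},
    {"luke",      "P5-133",  40, 60, 80},
    {"obiwan",    "P5-100",  23, 29, 45},
    {"stampede",  "P5-90",   22, 23, 44},
    {"Marc's",    "P5-90",   20, 20, 32},
    {"Kenneth's", "486-100", 10, 10,  8},
};

/* "-" if slower than libc, "=" if less than 10% faster (assumed cut-off),
 * "" otherwise. */
const char *mark(double speed, double libc)
{
    double ratio = speed / libc;
    if (ratio < 1.0)
        return "-";
    if (ratio < 1.10)
        return "=";
    return "";
}

/* Print each routine's speed as a multiple of the libc speed. */
void print_ratios(void)
{
    size_t n = sizeof table / sizeof table[0];
    for (size_t i = 0; i < n; i++) {
        const struct bcopy_row *r = &table[i];
        printf("%-10s %-8s int %.2fx%s  FP %.2fx%s\n", r->name, r->cpu,
               r->int_mbps / r->libc_mbps, mark(r->int_mbps, r->libc_mbps),
               r->fp_mbps / r->libc_mbps, mark(r->fp_mbps, r->libc_mbps));
    }
}
```

On these figures the FP copy comes out at 1.6x to 2.0x libc on the four P5 machines, about 1.04x on the P6-150, and below 1x on the P6-200 and the 486.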

Satoshi


