From owner-freebsd-current  Fri Apr  5 14:14:55 1996
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id OAA16288
          for current-outgoing; Fri, 5 Apr 1996 14:14:55 -0800 (PST)
Received: from sunrise.cs.berkeley.edu (sunrise.CS.Berkeley.EDU [128.32.38.121])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id OAA16282
          for <current@FreeBSD.org>; Fri, 5 Apr 1996 14:14:52 -0800 (PST)
Received: (from asami@localhost) by sunrise.cs.berkeley.edu (8.6.12/8.6.12) id OAA08351; Fri, 5 Apr 1996 14:15:32 -0800
Date: Fri, 5 Apr 1996 14:15:32 -0800
Message-Id: <199604052215.OAA08351@sunrise.cs.berkeley.edu>
To: bde@zeta.org.au
CC: bde@zeta.org.au, current@FreeBSD.org, hasty@star-gate.com,
        nisha@cs.berkeley.edu, tege@matematik.su.se
In-reply-to: <199604052157.HAA25295@godzilla.zeta.org.au> (message from Bruce Evans on Sat, 6 Apr 1996 07:57:31 +1000)
Subject: Re: fast memory copy for large data sizes
From: asami@cs.berkeley.edu (Satoshi Asami)
Sender: owner-current@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

 * >Well, from the numbers below, it certainly seems faster than yours for 
 * >larger sizes even if things are in the L2 cache!
 * 
 * They aren't in the L2 cache (256K is a tie and yours are faster for 512K
 * but 2*512K isn't in the cache).

I was commenting on the two rightmost columns, the "it" was your
version of FP copy.  (You can't compare your int copy and our original
FP numbers, because we always started with everything out of the
cache! ;)

 * 				    I get similar results with fildl.  Now
 * trying reading and pushing then popping and writing 32 bytes at a time.
 * This might work better if there were more registers so the stack doesn't
 * have to be used.

Can you elaborate?  Can I use FP registers without using the stack?  I 
thought all the FP registers are in the stack!

 * 			     However, the stack is very fast if it's in the
 * L1 cache (I get 800 MB/s read and 750 MB/s write).

Wow.

 * Have you tried using fldt?  No conversion for that.

What's fldt?  My assembler doesn't know about that instruction....

 * >Ok.  By the way, why is your data lacking smaller sizes for your FP
 * >copy?
 * 
 * I didn't run them all and they weren't interesting (nowhere near 350K/s).

Well, it might be worthwhile to put them side by side and see how they 
compare.

By the way, may we have a copy of your routine?  Is it beerware? :)

Satoshi