From owner-freebsd-current Fri Apr 5 14:14:55 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id OAA16288 for current-outgoing; Fri, 5 Apr 1996 14:14:55 -0800 (PST)
Received: from sunrise.cs.berkeley.edu (sunrise.CS.Berkeley.EDU [128.32.38.121]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id OAA16282 for ; Fri, 5 Apr 1996 14:14:52 -0800 (PST)
Received: (from asami@localhost) by sunrise.cs.berkeley.edu (8.6.12/8.6.12) id OAA08351; Fri, 5 Apr 1996 14:15:32 -0800
Date: Fri, 5 Apr 1996 14:15:32 -0800
Message-Id: <199604052215.OAA08351@sunrise.cs.berkeley.edu>
To: bde@zeta.org.au
CC: bde@zeta.org.au, current@FreeBSD.org, hasty@star-gate.com, nisha@cs.berkeley.edu, tege@matematik.su.se
In-reply-to: <199604052157.HAA25295@godzilla.zeta.org.au> (message from Bruce Evans on Sat, 6 Apr 1996 07:57:31 +1000)
Subject: Re: fast memory copy for large data sizes
From: asami@cs.berkeley.edu (Satoshi Asami)
Sender: owner-current@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

 * >Well, from the numbers below, it certainly seems faster than yours for
 * >larger sizes even if things are in the L2 cache!
 *
 * They aren't in the L2 cache (256K is a tie and yours are faster for 512K
 * but 2*512K isn't in the cache).

I was commenting on the two rightmost columns, the "it" was your
version of FP copy.  (You can't compare your int copy and our original
FP numbers, because we always started with everything out of the
cache! ;)

 * I get similar results with fildl.  Now
 * trying reading and pushing then popping and writing 32 bytes at a time.
 * This might work better if there were more registers so the stack doesn't
 * have to have to be used.

Can you elaborate?  Can I use FP registers without using the stack?  I
thought all the FP registers are in the stack!

 * However, the stack is very fast if it's in the
 * L1 cache (I get 800 MB/s read and 750 MB/s write).

Wow.

 * Have you tried using fldt?  No conversion for that.

What's fldt?  My assembler doesn't know about that instruction....

 * >Ok.  By the way, why is your data lacking smaller sizes for your FP
 * >copy?
 *
 * I didn't run them all and they weren't interesting (nowhere near 350K/s).

Well, it might be worthwhile to put them side by side and see how they
compare.

By the way, may we have a copy of your routine?  Is it beerware? :)

Satoshi