From owner-freebsd-current Fri Apr 5 14:02:18 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id OAA15438 for current-outgoing; Fri, 5 Apr 1996 14:02:18 -0800 (PST)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id OAA15430 for ; Fri, 5 Apr 1996 14:02:12 -0800 (PST)
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.12/8.6.9) id HAA25295; Sat, 6 Apr 1996 07:57:31 +1000
Date: Sat, 6 Apr 1996 07:57:31 +1000
From: Bruce Evans
Message-Id: <199604052157.HAA25295@godzilla.zeta.org.au>
To: asami@cs.berkeley.edu, bde@zeta.org.au
Subject: Re: fast memory copy for large data sizes
Cc: current@FreeBSD.org, hasty@star-gate.com, nisha@cs.berkeley.edu, tege@matematik.su.se
Sender: owner-current@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

> * This seemed like a bad idea.  I added a test using it (just 8 fldl's
> * followed by 8 fstpl's, storing in reverse order - this works for at
> * least all-zero data) and got good results, but I still think it is a bad
> * idea.

>Well, from the numbers below, it certainly seems faster than yours for
>larger sizes even if things are in the L2 cache!

They aren't in the L2 cache (256K is a tie and yours are faster for 512K
but 2*512K isn't in the cache).  I get similar results with fildl.

I'm now trying reading and pushing, then popping and writing, 32 bytes at
a time.  This might work better if there were more registers so that the
stack doesn't have to be used.  However, the stack is very fast if it's in
the L1 cache (I get 800 MB/s read and 750 MB/s write).

>Note that the speed of fldls depends on the actual data.  All-zero data
>is faster than random data (to avoid traps, try ((double *)src[i] =
>random())), probably because the all-zero bit pattern can be converted
>to floating point (ok, no conversion necessary in this case :) in a
>snap.

Have you tried using fldt?  No conversion for that.

>Ok.  By the way, why is your data lacking smaller sizes for your FP
>copy?

I didn't run them all and they weren't interesting (nowhere near 350K/s).

Bruce
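
For readers who want to experiment with the "8 loads followed by 8 stores, storing in reverse order" pattern discussed in this message, here is a minimal sketch in C rather than x87 assembly. It is not the code that was benchmarked in the thread. The function name copy64_blocks is hypothetical, and the sketch assumes 64-bit integer temporaries instead of fldl/fstpl, so that every bit pattern is copied exactly and the data-dependent floating-point behaviour mentioned above does not arise. It also assumes the length is a multiple of 64 bytes; any tail is left uncopied.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: copy n bytes in 64-byte blocks by loading eight
 * 64-bit words and then storing them in reverse order.  The reverse
 * store order imitates popping from the x87 register stack, which
 * returns values in the opposite order to the pushes.
 *
 * Assumptions of this sketch (not from the original message):
 *  - integer temporaries are used, so no FP conversion or traps occur;
 *  - n is a multiple of 64; a shorter tail is not copied;
 *  - dst and src do not overlap.
 */
void
copy64_blocks(uint64_t *dst, const uint64_t *src, size_t n)
{
    size_t words = n / sizeof(uint64_t);
    size_t i;

    for (i = 0; i + 8 <= words; i += 8) {
        /* "Push": eight loads in ascending address order. */
        uint64_t t0 = src[i + 0];
        uint64_t t1 = src[i + 1];
        uint64_t t2 = src[i + 2];
        uint64_t t3 = src[i + 3];
        uint64_t t4 = src[i + 4];
        uint64_t t5 = src[i + 5];
        uint64_t t6 = src[i + 6];
        uint64_t t7 = src[i + 7];

        /* "Pop": eight stores in descending address order. */
        dst[i + 7] = t7;
        dst[i + 6] = t6;
        dst[i + 5] = t5;
        dst[i + 4] = t4;
        dst[i + 3] = t3;
        dst[i + 2] = t2;
        dst[i + 1] = t1;
        dst[i + 0] = t0;
    }
}
```

A compiler is free to reorder or vectorize these loads and stores, so this only illustrates the access pattern. Reproducing the timings discussed in the thread would need hand-written assembly and buffers sized relative to the L1 and L2 caches.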