Date: Fri, 5 Apr 1996 13:19:19 -0800 (PST)
From: asami@cs.berkeley.edu (Satoshi Asami)
To: bde@zeta.org.au
Cc: current@FreeBSD.org, hasty@star-gate.com, nisha@cs.berkeley.edu, tege@matematik.su.se
Subject: Re: fast memory copy for large data sizes
Message-ID: <199604052119.NAA25877@silvia.HIP.Berkeley.EDU>
In-Reply-To: <199604052055.GAA23015@godzilla.zeta.org.au> (message from Bruce Evans on Sat, 6 Apr 1996 06:55:42 +1000)

 * Oops.  I put together 5 fast memory copies that don't use floating point
 * registers.  Speeds range from 40K/sec to 340K/sec. on a 133MHz Pentium
 * (ASUS), Triton chipset, 512KB PB cache, 60ns non-EDO main memory.  This
 * is after attempting to minimize the differences caused by the cache
 * state.  Details in other mail.

Cool cool.  This is the kind of response I was waiting for!

Oh, by the way, the 133MHz Pentium system we tested has 60ns EDO
memory.

 * The speed differences are so large and the cache state is so variable
 * that it is easy to create benchmarks showing that all methods are the
 * best :-).

We seem to have fooled ourselves with the optimized kernel :)

 * This seemed like a bad idea.  I added a test using it (just 8 fldl's
 * followed by 8 fstpl's, storing in reverse order - this works for at
 * least all-zero data) and got good results, but I still think it is a bad
 * idea.

Well, from the numbers below, it certainly seems faster than yours for
larger sizes, even if things are in the L2 cache!

Note that the speed of fldl depends on the actual data.  All-zero
data is faster than random data (to avoid traps, try
((double *)src)[i] = random()), probably because the all-zero bit
pattern can be converted to floating point (ok, no conversion
necessary in this case :) in a snap.

 * Perhaps it can be duplicated by copying via integer registers
 * through the L1 cache.

This is what I don't understand.  People keep saying that we can do it
using integer registers, but we simply can't get it to work as fast.
If we can get it to work as fast as our FP copy, I won't utter "fildq"
for the rest of my life, I swear!

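A minimal sketch of the setup discussed above, in portable C rather than the assembly used in the actual tests. It assumes two hypothetical helpers that do not appear in the thread: `fill_valid_doubles` fills the source with bit patterns of finite doubles (so an FP load sees non-zero data that cannot trap), and `copy_8x64` moves 64 bytes per iteration with eight 64-bit loads followed by eight stores in reverse order. It uses the standard `rand()` where the message suggests `random()`. Whether the temporaries end up in integer or FP registers is up to the compiler and target, so this illustrates the access pattern only, not the measured speeds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store the bit pattern of a finite double in each
 * 64-bit word, so the data is non-zero but never a NaN or infinity. */
static void fill_valid_doubles(uint64_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        double v = (double)rand();      /* finite, non-negative value */
        memcpy(&buf[i], &v, sizeof v);  /* copy the bits without aliasing issues */
    }
}

/* Hypothetical helper: copy nwords 64-bit words (a multiple of 8) using
 * eight loads followed by eight stores, storing in reverse order. */
static void copy_8x64(uint64_t *dst, const uint64_t *src, size_t nwords)
{
    assert(nwords % 8 == 0);
    for (size_t i = 0; i < nwords; i += 8) {
        uint64_t a = src[i],     b = src[i + 1], c = src[i + 2], d = src[i + 3];
        uint64_t e = src[i + 4], f = src[i + 5], g = src[i + 6], h = src[i + 7];
        dst[i + 7] = h; dst[i + 6] = g; dst[i + 5] = f; dst[i + 4] = e;
        dst[i + 3] = d; dst[i + 2] = c; dst[i + 1] = b; dst[i]     = a;
    }
}
```

Calling `fill_valid_doubles(src, n)` and then `copy_8x64(dst, src, n)` should leave `dst` byte-for-byte identical to `src`, which is the property an FP-register copy has to preserve as well.
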
 * >133MHz Pentium (sunrise), Triton chipset, 512KB (pipeline burst) cache:
 *
 *                                         new columns
 *                                         vvvvvvvvv vvvvvvvvvvvvvv vvvvvvvv
 * >    size libc           ours           mine-libc mine-best(int) mine-fp
 * >      32 N/A            30.517578 MB/s  51493147       98069887
 * >    1024 40.690104 MB/s 51.398026 MB/s  85214370      379715593
                                                          ^^^^^^^^^ wow
 * >   16384 39.556962 MB/s 52.966102 MB/s  65103489       97472157
 * >   32768 39.506953 MB/s 53.146259 MB/s  66593990       99217964 93604474
 * >   65536 39.457071 MB/s 53.282182 MB/s  61407673       79866591 93721503
 * >  131072 39.457071 MB/s 53.327645 MB/s  65457449       68011573 79960595
 * >  262144 39.345294 MB/s 53.350405 MB/s  51273532       53702491 75576993
 * >  524288 39.044198 MB/s 53.430220 MB/s  49370136       50029142 67400433
 * > 1048576 38.086533 MB/s 53.447354 MB/s  44054746       44095308 58624791
 * > 2097152 37.706680 MB/s 53.387433 MB/s  42742240       42770154 56946700
 * > 4194304 37.628643 MB/s 53.280763 MB/s  43381238       43381238 57727588

 * My tests are obviously not equivalent for small copies - the libc
 * times are about twice as high.  This is because I keep copying the
 * same data.  I want to do this to test in-cache copies.  Not-in-cache
 * copies get tested as a side effect when the buffer is much larger
 * than the cache (L1 or L2).

Ok.  By the way, why is your data lacking smaller sizes for your FP
copy?

 * Your test gives similar times on my system.  It tests the speed of
 * copying data that isn't in the cache.  This seems to be the usual

Yeah, lmbench's mem_cp was giving me ungodly numbers for small-sized
copies until I realized that it's copying the same things over and
over.  Since we were interested in optimizing filesystem throughput
(large sequential reads/writes), that wasn't what we wanted, and I
changed it to walk through a larger buffer.

 * Only if traps are enabled.  Rounding may be a problem.  :
 * Using 64-bit precision may be enough to avoid rounding problems.
 * fldl is much faster than fildl if the data is in the cache.

Well, we tried disabling the traps too, and got our data mangled. ;)

 * >Please type "make" and it will compile & run the tests.  The output
 *
 * It didn't :-).  It assumes that "." is in the $PATH.

Duuh, sorry.  Next time I send out a test script, I'll make sure to
put "./" in front of all our programs!

Satoshi
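
The two measurement styles contrasted in this message (copying the same data over and over versus walking through a larger buffer) can be sketched roughly as follows. This is an illustration under stated assumptions, not the test program from the thread: the names `copy_same_block`, `copy_walking` and `mb_per_sec` are hypothetical, `memcpy` stands in for whichever copy routine is being measured, and one MB is taken to be 2^20 bytes. An optimizing compiler may drop repeated identical copies, so a real benchmark would need to guard against that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Copy the same `size` bytes `iters` times.  For small sizes the data
 * stays in the cache, so this measures in-cache copy speed.
 * Returns the total number of bytes copied. */
static size_t copy_same_block(char *dst, const char *src,
                              size_t size, size_t iters)
{
    for (size_t i = 0; i < iters; i++)
        memcpy(dst, src, size);
    return size * iters;
}

/* Walk once through buffers of `buf_size` bytes in chunks of `size`.
 * With a buffer much larger than the cache, each chunk is copied while
 * it is not yet cached.  Returns the total number of bytes copied. */
static size_t copy_walking(char *dst, const char *src,
                           size_t buf_size, size_t size)
{
    size_t total = 0;
    assert(size > 0);   /* a zero chunk size would never advance */
    for (size_t off = 0; off + size <= buf_size; off += size) {
        memcpy(dst + off, src + off, size);
        total += size;
    }
    return total;
}

/* Convert a byte count and two clock() readings to MB/s (MB = 2^20). */
static double mb_per_sec(size_t bytes, clock_t start, clock_t end)
{
    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (double)bytes / (1024.0 * 1024.0) / secs : 0.0;
}
```

Bracketing either call with two `clock()` readings and passing the returned byte count to `mb_per_sec` gives a figure comparable in form to the MB/s columns in the table, though absolute numbers depend entirely on the machine and compiler.
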