Date:      Sat, 6 Apr 1996 09:00:26 +1000
From:      Bruce Evans <bde@zeta.org.au>
To:        asami@cs.berkeley.edu, bde@zeta.org.au
Cc:        current@freebsd.org, hasty@star-gate.com, nisha@cs.berkeley.edu, tege@matematik.su.se
Subject:   Re: fast memory copy for large data sizes
Message-ID:  <199604052300.JAA28284@godzilla.zeta.org.au>

> * I get similar results with fildl.  Now
> * trying reading and pushing then popping and writing 32 bytes at a time.
> * This might work better if there were more registers so the stack doesn't
> * have to be used.

>Can you elaborate?  Can I use FP registers without using the stack?  I 
>thought all the FP registers are in the stack!

The FP registers are just registers organized as a stack for inconvenient
access.  I'm trying the usual (%esp) stack.
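To make the "registers organized as a stack" point concrete, here is a portable C model of what copy7 does with them. It is not x87 code, and copy_fpstack_model is a made-up name for illustration: push eight quadwords (fildq), which fills all eight FP registers, then pop them (fistpq). Pops come off the top, so the stores run from offset 56 down to 0, as in copy7.  The round trip is bit-exact because every 64-bit integer fits in the 64-bit significand of the 80-bit format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of copy7's data flow.  st[] stands in for the eight
 * x87 registers; `top` counts how many are occupied.
 */
static void copy_fpstack_model(void *dst, const void *src, size_t len)
{
    int64_t st[8];
    int top;
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(len % 64 == 0);      /* like the benchmarks: no tail handling */
    for (; len != 0; len -= 64, d += 64, s += 64) {
        /* "fildq 0(src)" ... "fildq 56(src)": push 8 quadwords */
        for (top = 0; top < 8; top++)
            memcpy(&st[top], s + 8 * top, 8);
        /* "fistpq 56(dst)" ... "fistpq 0(dst)": pop in reverse order */
        while (top > 0) {
            top--;
            memcpy(d + 8 * top, &st[top], 8);
        }
    }
}
```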

> * Have you tried using fldt?  No conversion for that.

>What's fldt?  My assembler doesn't know about that instruction....

fldt loads a ten-byte (80-bit) number, which is already in the FPU's
internal format, so nothing is converted.  Gas knows about it.

>By the way, may we have a copy of your routine?  Is it beerware? :)

Throwawayware.  Here is the current version.  The output file in it
(wrc.out) doesn't cover some of the new functions.  The copy6 function
(copying through the stack) is very poor.  A version of it with less
unrolling was only slightly slower than libc.
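For anyone who wants the copy6 idea without the asm, a rough portable rendering follows. It is a sketch, not copy6 itself: copy_via_stack and the 64-byte STAGE size are my own choices, it assumes len is a multiple of 64 as the benchmarks do, and an optimizing compiler may well remove the staging buffer entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGE 64    /* bytes staged per iteration, matching copy6's unrolling */

/* Hypothetical sketch: bounce each 64-byte block through a stack buffer. */
static void copy_via_stack(void *dst, const void *src, size_t len)
{
    unsigned char stage[STAGE];     /* the staging area lives on the stack */
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(len % STAGE == 0);
    for (; len != 0; len -= STAGE, d += STAGE, s += STAGE) {
        memcpy(stage, s, STAGE);    /* read pass: source -> stack */
        memcpy(d, stage, STAGE);    /* write pass: stack -> destination */
    }
}
```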

Bruce

---
I cleaned up my write and read memory bandwidth benchmarks and
added i586 optimizations and copy benchmarks.  The zero[12] and
read[12] benchmarks are more or less compatible with the old
benchmarks.  (The initialization for the old benchmarks was sloppy.
Use the -p flag to almost duplicate it.)

The times are for a P133 on an ASUS P55TP4XE rev.2.4 with a 512KB
PB cache and 32MB 60ns non-EDO RAM with all the memory timings
reduced to the minimums supported by the BIOS setup.

getrusage() (user time) is used in the hope of getting more accurate
times, but perhaps the real times would have been more relevant.  E.g.,
the 16MB copy benchmarks took "forever" to run due to swapping, but the
reported copy speed was slightly faster than for the 8MB benchmarks.
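A minimal sketch of measuring both kinds of time around the same loop, so that swapping or other lost time shows up as a gap between the two. time_copies is a hypothetical helper, not part of the archive, and it uses plain memcpy where the benchmarks call their own routines.

```c
#define _XOPEN_SOURCE 700   /* expose getrusage() and gettimeofday() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>

static volatile unsigned char sink; /* discourages optimizing the copies away */

/* Microseconds from *a to *b. */
static long long usec_between(const struct timeval *a, const struct timeval *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000
        + (b->tv_usec - a->tv_usec);
}

/*
 * Hypothetical helper: time `iters` copies of `len` bytes, reporting user
 * CPU time (getrusage(), as c, r and w do) and wall-clock time
 * (gettimeofday()).  Time lost to paging appears only in *real_us.
 */
static void time_copies(size_t len, size_t iters,
                        long long *user_us, long long *real_us)
{
    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    struct rusage ru0, ru1;
    struct timeval tv0, tv1;

    assert(len > 0 && src != NULL && dst != NULL);
    memset(src, 2, len);
    getrusage(RUSAGE_SELF, &ru0);
    gettimeofday(&tv0, NULL);
    for (size_t i = 0; i < iters; i++) {
        memcpy(dst, src, len);
        sink = dst[i % len];        /* consume the result of each copy */
    }
    gettimeofday(&tv1, NULL);
    getrusage(RUSAGE_SELF, &ru1);
    *user_us = usec_between(&ru0.ru_utime, &ru1.ru_utime);
    *real_us = usec_between(&tv0, &tv1);
    free(src);
    free(dst);
}
```

If the real time comes out much larger than the user time, the user-time rate (which is what c, r and w print) overstates the throughput actually seen.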

The i586 optimizations seem to be pessimizations in practice.  I
benchmarked the kernel's generic_bzero, i486_bzero and i586_bzero
for compiling kernels and for a few thousand fork-execs of small
processes.  On the above i586 and on an i486, i486_bzero was best and
i586_bzero was worst in all cases, significantly worse for
fork-exec.  This is presumably because read-before-write is a
pessimization if the data isn't in the L1 or L2 cache, and kernel
data being bzero'ed usually isn't in the cache.  Reducing the memory
timings in the BIOS setup significantly reduced the pessimization, to
the levels shown in the output.  E.g., for zeroing 2MB:

	88287436 bytes/sec zero3: (now, was 87 MB/s)
	68917465 bytes/sec zero4: (now, was 58 MB/s)
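The read-before-write trick in zero2, zero4 and i586_bzero amounts to the following portable C sketch (zero_rbw is a hypothetical name; the 32-byte line size is the i586's, as in zero4). The i586's L1 data cache doesn't allocate a line on a write miss, so the load is there to pull the line in before the stores. When the line isn't in the L2 cache either, that load fetches from memory data that is about to be overwritten, which is presumably where the pessimization comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: zero `nwords` 32-bit words, reading the first word
 * of each 32-byte line before writing the line.  Assumes buf is 32-byte
 * aligned if the load is to match a real cache line; correctness does
 * not depend on that.
 */
static void zero_rbw(uint32_t *buf, size_t nwords)
{
    assert(nwords % 8 == 0);    /* 8 words x 4 bytes = one 32-byte line */
    for (size_t i = 0; i < nwords; i += 8) {
        (void)*(volatile uint32_t *)&buf[i];    /* read first: pull the line in */
        for (size_t j = 0; j < 8; j++)
            buf[i + j] = 0;                     /* then write the whole line */
    }
}
```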

Of course, perfect pairing isn't a pessimization and is harmless
on i486's unless it involves extra nops.  i586_bzero was only (:-)
slightly slower when the read-before-write was replaced by a
pairable read.  The slowness is presumably caused by extra setup
overheads or a wasted cycle for an Address Generation Interlock (AGI)
in the main loop.

Why is copying in the cache slightly slower than 1/2 the average
speed of reading and writing?  `perfmon' shows perfect pairing and
no AGI for the loops in the i586-optimization benchmarks, but the
tsc's and real times contain cycles (sometimes fractional cycles)
that I can't account for.
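For reference, if a copy costs exactly one read plus one write per byte, the time per byte is 1/R + 1/W, so the expected copy rate is R*W/(R+W), which is close to half the average of R and W when they are similar. A small helper (hypothetical, just the arithmetic):

```c
#include <assert.h>

/* Expected copy rate in B/s from a read rate and a write rate in B/s. */
static double expected_copy_rate(double read_rate, double write_rate)
{
    assert(read_rate > 0 && write_rate > 0);
    /* time per byte adds: 1/R for the read plus 1/W for the write */
    return 1.0 / (1.0 / read_rate + 1.0 / write_rate);
}
```

With the 4096-byte figures from wrc.out (read4 807324044 B/s, zero4 760994468 B/s) this gives about 391.7 MB/s, while copy4 measured 337507172 B/s, roughly 14% less.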
---

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g.  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  c.c r.c w.c wrc wrc.out
# Wrapped by bde@alphplex.bde.org on Sat Apr  6 08:55:06 1996
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'c.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'c.c'\"
else
echo shar: Extracting \"'c.c'\" \(9577 characters\)
sed "s/^X//" >'c.c' <<'END_OF_FILE'
X#include <sys/types.h>
X#include <sys/time.h>
X#include <sys/resource.h>
X
X#include <machine/cpufunc.h>
X
X#include <stdlib.h>
X#include <stdio.h>
X#include <string.h>
X
Xtypedef void func_t(void *dst, const void *src, size_t len);
X
Xstruct func
X{
X    func_t *fn;
X    char *name;
X    char *description;
X};
X
Xstatic func_t copy0, copy1, copy2, copy3, copy4, copy5, copy6, copy7;
Xstatic void usage(void);
X
Xstatic char const *progname;
X
Xstatic struct func funcs[] =
X{
X    copy0, "copy0", "movsl",
X    copy1, "copy1", "unroll 16",
X    copy2, "copy2", "unroll 16 prefetch",
X    copy3, "copy3", "unroll 64 i586-opt",
X    copy4, "copy4", "unroll 64 i586-opt prefetch",
X    copy5, "copy5", "unroll 64 i586-opx prefetch",
X    copy6, "copy6", "unroll 64 stack",
X    copy7, "copy7", "unroll 64 fp",
X    memcpy,"copy8", "memcpy (movsl)",
X};
X#define NFUNC	(sizeof funcs / sizeof funcs[0])
X
Xint main(int argc, char **argv)
X{
X    int ch;
X    unsigned char *dst;
X    int funcn;
X    int funcnspecified;
X    int i586;
X    size_t len;
X    size_t max;
X    int precache;
X    int quiet;
X    unsigned char *src;
X    size_t thrashbufsize;
X    unsigned long long tot;
X
X    progname = argv[0];
X    funcnspecified = -1;
X    i586 = 0;
X    len = 4096;
X    precache = 0;
X    quiet = 0;
X    tot = 100000000;
X    while ((ch = getopt(argc, argv, "5f:l:pqt:")) != EOF)
X    {
X	switch (ch)
X	{
X	case '5':
X	    i586 = 1;
X	    break;
X	case 'f':
X	    funcnspecified = strtoul(optarg, (char **) NULL, 0);
X	    if (funcnspecified < 0 || funcnspecified >= NFUNC)
X		usage();
X	    break;
X	case 'l':
X	    len = strtoul(optarg, (char **) NULL, 0);
X	    break;
X	case 'p':
X	    precache = 1;
X	    break;
X	case 'q':
X	    quiet = 1;
X	    break;
X	case 't':
X	    tot = strtouq(optarg, (char **) NULL, 0);
X	    break;
X	default:
X	    usage();
X	}
X    }
X    if (optind != argc)
X	usage();
X    dst = malloc(len + 64);
X    src = malloc(len + 64);
X    if (dst == NULL || src == NULL)
X    {
X	fprintf(stderr, "%s: malloc failed\n", progname);
X	exit(1);
X    }
X    max = tot / len;
X
X    for (funcn = 0; funcn < NFUNC; ++funcn)
X    {
X	func_t *funcp;
X	struct rusage finish;
X	size_t i;
X	struct rusage start;
X	unsigned long long tsc;
X	long usec;
X
X	if (funcnspecified != -1 && funcnspecified != funcn)
X	    continue;
X
X	/*
X	 * Check the function.  As side effects, make sure that the buffers
X	 * aren't a constant zero page, and leave as much of the buffers as
X	 * possible in the cache to set up the `precache' case.
X	 */
X	memset(dst, 1, len);
X	memset(src, 2, len);
X	funcp = funcs[funcn].fn;
X	funcp(dst, src, len);
X	if (memcmp(dst, src, len) != 0)
X	{
X	    fprintf(stderr, "%s: %s failed\n", progname, funcs[funcn].name);
X	    exit(1);
X	}
X
X	if (!precache)
X	    /*
X	     * Attempt to uncache the buffer so as to provide the same
X	     * uncached environnment for all the functions.
X	     */
X	    for (thrashbufsize = 2 * 1024 * 1024; thrashbufsize != 0;
X		 thrashbufsize /= 2)
X	    {
X		unsigned char *thrashbuf1;
X		unsigned char *thrashbuf2;
X
X		thrashbuf1 = malloc(thrashbufsize);
X		thrashbuf2 = malloc(thrashbufsize);
X		if (thrashbuf1 != NULL && thrashbuf2 != NULL)
X		{
X		    memcpy(thrashbuf2, thrashbuf1, thrashbufsize);
X		    memcpy(thrashbuf1, thrashbuf2, thrashbufsize);
X		}
X		free(thrashbuf1);
X		free(thrashbuf2);
X	    }
X
X	tsc = 0;
X	getrusage(RUSAGE_SELF, &start);
X	if (i586)
X	    tsc = rdtsc();
X	for (i = 0; i < max; ++i)
X	    funcp(dst, src, len);
X	if (i586)
X	    tsc = rdtsc() - tsc;
X	getrusage(RUSAGE_SELF, &finish);
X	usec = 1000000 * (finish.ru_utime.tv_sec - start.ru_utime.tv_sec)
X	       + finish.ru_utime.tv_usec - start.ru_utime.tv_usec;
X	if (usec < 0)
X	    usec = 1;
X	printf("%s: %10.0f B/s", funcs[funcn].name, tot * 1e6 / usec);
X	if (!quiet)
X	{
X	    printf(" (%7ld us)", usec);
X	    if (i586)
X		printf(" (%9qd tsc)", tsc);
X	    printf(" (%s)", funcs[funcn].description);
X	}
X	printf("\n");
X    }
X    return 0;
X}
X
Xstatic void copy0(void *dst, const void *src, size_t len)
X{
X    asm volatile("
X	.align 4,0x90
X	cld
X	shrl $2,%2
X	rep; movsl"
X	: "=D" (dst), "=S" (src), "=c" (len)
X	: "0"  (dst), "1"  (src), "2"  (len)
X	: "memory");
X}
X
Xstatic void copy1(void *dst, const void *src, size_t len)
X{
X    unsigned tmp;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl 0(%1),%3
X	movl %3,0(%0)
X	movl 4(%1),%3
X	movl %3,4(%0)
X	movl 8(%1),%3
X	movl %3,8(%0)
X	movl 12(%1),%3
X	movl %3,12(%0)
X	addl $16,%0
X	addl $16,%1
X	subl $16,%2
X	ja 1b"
X	: "=r" (dst), "=r" (src), "=r" (len), "=&r" (tmp)
X	: "0"  (dst), "1"  (src), "2"  (len)
X	: "memory");
X}
X
Xstatic void copy2(void *dst, const void *src, size_t len)
X{
X    unsigned prefetch;
X    unsigned tmp;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl 0(%0),%4
X	movl 0(%1),%3
X	movl %3,0(%0)
X	movl 4(%1),%3
X	movl %3,4(%0)
X	movl 8(%1),%3
X	movl %3,8(%0)
X	movl 12(%1),%3
X	movl %3,12(%0)
X	addl $16,%0
X	addl $16,%1
X	subl $16,%2
X	ja 1b"
X	: "=r" (dst), "=r" (src), "=r" (len), "=&r" (tmp), "=&r" (prefetch)
X	: "0"  (dst), "1"  (src), "2"  (len)
X	: "memory");
X}
X
Xstatic void copy3(void *dst, const void *src, size_t len)
X{
X    unsigned tmp1;
X    unsigned tmp2;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl 0(%1),%3
X	movl 4(%1),%4
X	movl %3,0(%0)
X	movl %4,4(%0)
X	movl 8(%1),%3
X	movl 12(%1),%4
X	movl %3,8(%0)
X	movl %4,12(%0)
X	movl 16(%1),%3
X	movl 20(%1),%4
X	movl %3,16(%0)
X	movl %4,20(%0)
X	movl 24(%1),%3
X	movl 28(%1),%4
X	movl %3,24(%0)
X	movl %4,28(%0)
X	movl 32(%1),%3
X	movl 36(%1),%4
X	movl %3,32(%0)
X	movl %4,36(%0)
X	movl 40(%1),%3
X	movl 44(%1),%4
X	movl %3,40(%0)
X	movl %4,44(%0)
X	movl 48(%1),%3
X	movl 52(%1),%4
X	movl %3,48(%0)
X	movl %4,52(%0)
X	movl 56(%1),%3
X	movl 60(%1),%4
X	movl %3,56(%0)
X	movl %4,60(%0)
X	addl $64,%0
X	addl $64,%1
X	subl $64,%2
X	ja 1b"
X	: "=r" (dst), "=r" (src), "=r" (len), "=&r" (tmp1), "=&r" (tmp2)
X	: "0"  (dst), "1"  (src), "2"  (len)
X	: "memory");
X}
X
Xstatic void copy4(void *dst, const void *src, size_t len)
X{
X    unsigned tmp1;
X    unsigned tmp2;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl 0(%0),%3
X	movl 36(%0),%4
X	movl 0(%1),%3
X	movl 4(%1),%4
X	movl %3,0(%0)
X	movl %4,4(%0)
X	movl 8(%1),%3
X	movl 12(%1),%4
X	movl %3,8(%0)
X	movl %4,12(%0)
X	movl 16(%1),%3
X	movl 20(%1),%4
X	movl %3,16(%0)
X	movl %4,20(%0)
X	movl 24(%1),%3
X	movl 28(%1),%4
X	movl %3,24(%0)
X	movl %4,28(%0)
X	movl 32(%1),%3
X	movl 36(%1),%4
X	movl %3,32(%0)
X	movl %4,36(%0)
X	movl 40(%1),%3
X	movl 44(%1),%4
X	movl %3,40(%0)
X	movl %4,44(%0)
X	movl 48(%1),%3
X	movl 52(%1),%4
X	movl %3,48(%0)
X	movl %4,52(%0)
X	movl 56(%1),%3
X	movl 60(%1),%4
X	movl %3,56(%0)
X	movl %4,60(%0)
X	addl $64,%0
X	addl $64,%1
X	subl $64,%2
X	ja 1b"
X	: "=r" (dst), "=r" (src), "=r" (len), "=&r" (tmp1), "=&r" (tmp2)
X	: "0"  (dst), "1"  (src), "2"  (len)
X	: "memory");
X}
X
Xstatic void copy5(void *dst, const void *src, size_t len)
X{
X    void *junk;
X    unsigned tmp1;
X    unsigned tmp2;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl 0(%0),%3
X	movl 36(%0),%4
X	movl 0(%1),%3
X	movl 4(%1),%4
X	movl %3,0(%0)
X	movl %4,4(%0)
X	movl 8(%1),%3
X	movl 12(%1),%4
X	movl %3,8(%0)
X	movl %4,12(%0)
X	movl 16(%1),%3
X	movl 20(%1),%4
X	movl %3,16(%0)
X	movl %4,20(%0)
X	movl 24(%1),%3
X	movl 28(%1),%4
X	movl %3,24(%0)
X	movl %4,28(%0)
X	movl 32(%1),%3
X	movl 36(%1),%4
X	movl %3,32(%0)
X	movl %4,36(%0)
X	movl 40(%1),%3
X	movl 44(%1),%4
X	movl %3,40(%0)
X	movl %4,44(%0)
X	movl 48(%1),%3
X	movl 52(%1),%4
X	movl %3,48(%0)
X	movl %4,52(%0)
X	movl 56(%1),%3
X	movl 60(%1),%4
X	movl %3,56(%0)
X	movl %4,60(%0)
X	addl $64,%0
X	addl $64,%1
X	cmpl %2,%0
X	jb 1b"
X	: "=r" (dst), "=r" (src), "=r" (junk), "=&r" (tmp1), "=&r" (tmp2)
X	: "0"  (dst), "1"  (src), "2" ((char *) dst + len)
X	: "memory");
X}
X
Xstatic void copy6(void *dst, const void *src, size_t len)
X{
X    void *junk;
X    unsigned tmp1;
X    unsigned tmp2;
X    unsigned tmp3;
X    unsigned tmp4;
X
X    asm volatile("
X	.align 4,0x90
X	subl $64,%%esp
X	andl $0xffffff80,%%esp
X	1:
X	movl 0(%1),%3
X	movl 4(%1),%4
X	movl %3,0(%%esp)
X	movl %4,4(%%esp)
X	movl 8(%1),%3
X	movl 12(%1),%4
X	movl %3,8(%%esp)
X	movl %4,12(%%esp)
X	movl 16(%1),%3
X	movl 20(%1),%4
X	movl %3,16(%%esp)
X	movl %4,20(%%esp)
X	movl 24(%1),%3
X	movl 28(%1),%4
X	movl %3,24(%%esp)
X	movl %4,28(%%esp)
X	movl 32(%1),%3
X	movl 36(%1),%4
X	movl %3,32(%%esp)
X	movl %4,36(%%esp)
X	movl 40(%1),%3
X	movl 44(%1),%4
X	movl %3,40(%%esp)
X	movl %4,44(%%esp)
X	movl 48(%1),%3
X	movl 52(%1),%4
X	movl %3,48(%%esp)
X	movl %4,52(%%esp)
X	movl 56(%1),%3
X	movl 60(%1),%4
X	movl %3,56(%%esp)
X	movl %4,60(%%esp)
X	movl 0(%%esp),%3
X	movl 4(%%esp),%4
X	movl %3,0(%0)
X	movl %4,4(%0)
X	movl 8(%%esp),%3
X	movl 12(%%esp),%4
X	movl %3,8(%0)
X	movl %4,12(%0)
X	movl 16(%%esp),%3
X	movl 20(%%esp),%4
X	movl %3,16(%0)
X	movl %4,20(%0)
X	movl 24(%%esp),%3
X	movl 28(%%esp),%4
X	movl %3,24(%0)
X	movl %4,28(%0)
X	movl 32(%%esp),%3
X	movl 36(%%esp),%4
X	movl %3,32(%0)
X	movl %4,36(%0)
X	movl 40(%%esp),%3
X	movl 44(%%esp),%4
X	movl %3,40(%0)
X	movl %4,44(%0)
X	movl 48(%%esp),%3
X	movl 52(%%esp),%4
X	movl %3,48(%0)
X	movl %4,52(%0)
X	movl 56(%%esp),%3
X	movl 60(%%esp),%4
X	movl %3,56(%0)
X	movl %4,60(%0)
X	addl $32,%0
X	addl $32,%1
X	cmpl %0,%2
X	ja 1b"
X	: "=r" (dst), "=r" (src), "=r" (junk), "=&r" (tmp1), "=&r" (tmp2)
X	: "0"  (dst), "1"  (src), "2" ((char *) dst + len)
X	: "memory");
X}
X
Xstatic void copy7(void *dst, const void *src, size_t len)
X{
X    void *junk;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	fildq 0(%1)
X	fildq 8(%1)
X	fildq 16(%1)
X	fildq 24(%1)
X	fildq 32(%1)
X	fildq 40(%1)
X	fildq 48(%1)
X	fildq 56(%1)
X	fistpq 56(%0)
X	fistpq 48(%0)
X	fistpq 40(%0)
X	fistpq 32(%0)
X	fistpq 24(%0)
X	fistpq 16(%0)
X	fistpq 8(%0)
X	fistpq 0(%0)
X	addl $64,%0
X	addl $64,%1
X	cmpl %0,%2
X	ja 1b"
X	: "=r" (dst), "=r" (src), "=r" (junk)
X	: "0"  (dst), "1"  (src), "2" ((char *) dst + len)
X	: "memory");
X}
Xstatic void usage(void)
X{
X    fprintf(stderr, "%s: [-5cpq] [-f function] [-l length] [-t tot]\n",
X	    progname);
X    exit(1);
X}
END_OF_FILE
if test 9577 -ne `wc -c <'c.c'`; then
    echo shar: \"'c.c'\" unpacked with wrong size!
fi
# end of 'c.c'
fi
if test -f 'r.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'r.c'\"
else
echo shar: Extracting \"'r.c'\" \(5120 characters\)
sed "s/^X//" >'r.c' <<'END_OF_FILE'
X#include <sys/types.h>
X#include <sys/time.h>
X#include <sys/resource.h>
X
X#include <machine/cpufunc.h>
X
X#include <stdlib.h>
X#include <stdio.h>
X#include <string.h>
X
Xtypedef void func_t(const void *buf, size_t len);
X
Xstruct func
X{
X    func_t *fn;
X    char *name;
X    char *description;
X};
X
Xstatic func_t read0, read1, read2, read3, read4;
Xstatic void usage(void);
X
Xstatic char const *progname;
X
Xstatic struct func funcs[] =
X{
X    read0, "read0", "lodsl",
X    read1, "read1", "unroll 16",
X    read2, "read2", "unroll 16 prefetch",
X    read3, "read3", "unroll 32 i586-opt",
X    read4, "read4", "unroll 64 i586-opt",
X};
X#define NFUNC	(sizeof funcs / sizeof funcs[0])
X
Xint main(int argc, char **argv)
X{
X    unsigned char *buf;
X    int ch;
X    int funcn;
X    int funcnspecified;
X    int i586;
X    size_t len;
X    size_t max;
X    int precache;
X    int quiet;
X    size_t thrashbufsize;
X    unsigned long long tot;
X
X    progname = argv[0];
X    funcnspecified = -1;
X    i586 = 0;
X    len = 4096;
X    precache = 0;
X    quiet = 0;
X    tot = 100000000;
X    while ((ch = getopt(argc, argv, "5f:l:pqt:")) != EOF)
X    {
X	switch (ch)
X	{
X	case '5':
X	    i586 = 1;
X	    break;
X	case 'f':
X	    funcnspecified = strtoul(optarg, (char **) NULL, 0);
X	    if (funcnspecified < 0 || funcnspecified >= NFUNC)
X		usage();
X	    break;
X	case 'l':
X	    len = strtoul(optarg, (char **) NULL, 0);
X	    break;
X	case 'p':
X	    precache = 1;
X	    break;
X	case 'q':
X	    quiet = 1;
X	    break;
X	case 't':
X	    tot = strtouq(optarg, (char **) NULL, 0);
X	    break;
X	default:
X	    usage();
X	}
X    }
X    if (optind != argc)
X	usage();
X    buf = malloc(len + 64);
X    if (buf == NULL)
X    {
X	fprintf(stderr, "%s: malloc failed\n", progname);
X	exit(1);
X    }
X    max = tot / len;
X
X    for (funcn = 0; funcn < NFUNC; ++funcn)
X    {
X	func_t *funcp;
X	struct rusage finish;
X	size_t i;
X	struct rusage start;
X	unsigned long long tsc;
X	long usec;
X
X	if (funcnspecified != -1 && funcnspecified != funcn)
X	    continue;
X
X	/*
X	 * Make sure that the buffer isn't a constant zero page, and leave
X	 * as much of the buffer as possible in the cache to set up the
X	 * `precache' case.
X	 */
X	memset(buf, 1, len);
X	funcp = funcs[funcn].fn;
X	funcp(buf, len);
X
X	if (!precache)
X	    /*
X	     * Attempt to uncache the buffer so as to provide the same
X	     * uncached environnment for all the functions.
X	     */
X	    for (thrashbufsize = 2 * 1024 * 1024; thrashbufsize != 0;
X		 thrashbufsize /= 2)
X	    {
X		unsigned char *thrashbuf1;
X		unsigned char *thrashbuf2;
X
X		thrashbuf1 = malloc(thrashbufsize);
X		thrashbuf2 = malloc(thrashbufsize);
X		if (thrashbuf1 != NULL && thrashbuf2 != NULL)
X		{
X		    memcpy(thrashbuf2, thrashbuf1, thrashbufsize);
X		    memcpy(thrashbuf1, thrashbuf2, thrashbufsize);
X		}
X		free(thrashbuf1);
X		free(thrashbuf2);
X	    }
X
X	tsc = 0;
X	getrusage(RUSAGE_SELF, &start);
X	if (i586)
X	    tsc = rdtsc();
X	for (i = 0; i < max; ++i)
X	    funcp(buf, len);
X	if (i586)
X	    tsc = rdtsc() - tsc;
X	getrusage(RUSAGE_SELF, &finish);
X	usec = 1000000 * (finish.ru_utime.tv_sec - start.ru_utime.tv_sec)
X	       + finish.ru_utime.tv_usec - start.ru_utime.tv_usec;
X	if (usec < 0)
X	    usec = 1;
X	printf("%s: %10.0f B/s", funcs[funcn].name, tot * 1e6 / usec);
X	if (!quiet)
X	{
X	    printf(" (%7ld us)", usec);
X	    if (i586)
X		printf(" (%9qd tsc)", tsc);
X	    printf(" (%s)", funcs[funcn].description);
X	}
X	printf("\n");
X    }
X    return 0;
X}
X
Xstatic void read0(const void *buf, size_t len)
X{
X    asm volatile("
X	.align 4,0x90
X	cld
X	shrl $2,%1
X	rep; lodsl"
X	: "=S" (buf), "=c" (len)
X	: "0"  (buf), "1"  (len));
X}
X
Xstatic void read1(const void *buf, size_t len)
X{
X    unsigned junk;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl 0(%0),%2
X	movl 4(%0),%2
X	movl 8(%0),%2
X	movl 12(%0),%2
X	addl $16,%0
X	subl $16,%1
X	ja 1b"
X	: "=r" (buf), "=r" (len), "=&r" (junk)
X	: "0"  (buf), "1"  (len));
X}
X
Xstatic void read2(const void *buf, size_t len)
X{
X    unsigned junk;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl 0(%0),%2
X	movl 4(%0),%2
X	movl 8(%0),%2
X	movl 16(%0),%2
X	movl 12(%0),%2
X	addl $16,%0
X	subl $16,%1
X	ja 1b"
X	: "=r" (buf), "=r" (len), "=&r" (junk)
X	: "0"  (buf), "1"  (len));
X}
X
Xstatic void read3(const void *buf, size_t len)
X{
X    unsigned junk1;
X    unsigned junk2;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl 0(%0),%1
X	movl 4(%0),%2
X	movl 8(%0),%1
X	movl 12(%0),%2
X	movl 16(%0),%1
X	movl 20(%0),%2
X	movl 24(%0),%1
X	movl 28(%0),%2
X	addl $32,%0
X	cmpl %4,%0
X	jb 1b"
X	: "=r" (buf), "=&r" (junk1), "=&r" (junk2)
X	: "0"  (buf), "r" ((char *) buf + len));
X}
X
Xstatic void read4(const void *buf, size_t len)
X{
X    unsigned junk1;
X    unsigned junk2;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl 0(%0),%1
X	movl 4(%0),%2
X	movl 8(%0),%1
X	movl 12(%0),%2
X	movl 16(%0),%1
X	movl 20(%0),%2
X	movl 24(%0),%1
X	movl 28(%0),%2
X	movl 32(%0),%2
X	movl 36(%0),%1
X	movl 40(%0),%2
X	movl 44(%0),%1
X	movl 48(%0),%2
X	movl 52(%0),%2
X	movl 56(%0),%1
X	movl 60(%0),%2
X	addl $64,%0
X	cmpl %4,%0
X	jb 1b"
X	: "=r" (buf), "=&r" (junk1), "=&r" (junk2)
X	: "0"  (buf), "r" ((char *) buf + len));
X}
X
Xstatic void usage(void)
X{
X    fprintf(stderr, "%s: [-5cpq] [-f function] [-l length] [-t tot]\n",
X	    progname);
X    exit(1);
X}
END_OF_FILE
if test 5120 -ne `wc -c <'r.c'`; then
    echo shar: \"'r.c'\" unpacked with wrong size!
fi
# end of 'r.c'
fi
if test -f 'w.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'w.c'\"
else
echo shar: Extracting \"'w.c'\" \(6940 characters\)
sed "s/^X//" >'w.c' <<'END_OF_FILE'
X#include <sys/types.h>
X#include <sys/time.h>
X#include <sys/resource.h>
X
X#include <machine/cpufunc.h>
X
X#include <stdlib.h>
X#include <stdio.h>
X#include <string.h>
X
Xtypedef void func_t(void *buf, size_t len);
X
Xstruct func
X{
X    func_t *fn;
X    char *name;
X    char *description;
X};
X
Xstatic func_t zero0, zero1, zero2, zero3, zero4, zero5;
Xstatic void usage(void);
X
Xstatic char const *progname;
X
Xstatic struct func funcs[] =
X{
X    zero0, "zero0", "stosl",
X    zero1, "zero1", "unroll 16",
X    zero2, "zero2", "unroll 16 prefetch",
X    zero3, "zero3", "unroll 64 i586-opt",
X    zero4, "zero4", "unroll 64 i586-opt prefetch",
X    zero5, "zero5", "unroll 64 i586-opx prefetch",
X    bzero, "zero6", "bzero (stosl)",
X};
X#define NFUNC	(sizeof funcs / sizeof funcs[0])
X
Xint main(int argc, char **argv)
X{
X    unsigned char *buf;
X    int ch;
X    int funcn;
X    int funcnspecified;
X    int i586;
X    size_t len;
X    size_t max;
X    int precache;
X    int quiet;
X    size_t thrashbufsize;
X    unsigned long long tot;
X
X    progname = argv[0];
X    funcnspecified = -1;
X    i586 = 0;
X    len = 4096;
X    precache = 0;
X    quiet = 0;
X    tot = 100000000;
X    while ((ch = getopt(argc, argv, "5f:l:pqt:")) != EOF)
X    {
X	switch (ch)
X	{
X	case '5':
X	    i586 = 1;
X	    break;
X	case 'f':
X	    funcnspecified = strtoul(optarg, (char **) NULL, 0);
X	    if (funcnspecified < 0 || funcnspecified >= NFUNC)
X		usage();
X	    break;
X	case 'l':
X	    len = strtoul(optarg, (char **) NULL, 0);
X	    break;
X	case 'p':
X	    precache = 1;
X	    break;
X	case 'q':
X	    quiet = 1;
X	    break;
X	case 't':
X	    tot = strtouq(optarg, (char **) NULL, 0);
X	    break;
X	default:
X	    usage();
X	}
X    }
X    if (optind != argc)
X	usage();
X    buf = malloc(len + 64);
X    if (buf == NULL)
X    {
X	fprintf(stderr, "%s: malloc failed\n", progname);
X	exit(1);
X    }
X    max = tot / len;
X
X    for (funcn = 0; funcn < NFUNC; ++funcn)
X    {
X	func_t *funcp;
X	struct rusage finish;
X	size_t i;
X	struct rusage start;
X	unsigned long long tsc;
X	long usec;
X
X	if (funcnspecified != -1 && funcnspecified != funcn)
X	    continue;
X
X	/*
X	 * Check the function.  As side effects, make sure that the buffer
X	 * isn't a constant zero page, and leave as much of the buffer as
X	 * possible in the cache to set up the `precache' case.
X	 */
X	memset(buf, 1, len);
X	funcp = funcs[funcn].fn;
X	funcp(buf, len);
X	for (i = 0; i < len; ++i)
X	    if (buf[i] != '\0')
X	    {
X		fprintf(stderr, "%s: %s failed\n", progname, funcs[funcn].name);
X		exit(1);
X	    }
X
X	if (!precache)
X	    /*
X	     * Attempt to uncache the buffer so as to provide the same
X	     * uncached environnment for all the functions.
X	     */
X	    for (thrashbufsize = 2 * 1024 * 1024; thrashbufsize != 0;
X		 thrashbufsize /= 2)
X	    {
X		unsigned char *thrashbuf1;
X		unsigned char *thrashbuf2;
X
X		thrashbuf1 = malloc(thrashbufsize);
X		thrashbuf2 = malloc(thrashbufsize);
X		if (thrashbuf1 != NULL && thrashbuf2 != NULL)
X		{
X		    memcpy(thrashbuf2, thrashbuf1, thrashbufsize);
X		    memcpy(thrashbuf1, thrashbuf2, thrashbufsize);
X		}
X		free(thrashbuf1);
X		free(thrashbuf2);
X	    }
X
X	tsc = 0;
X	getrusage(RUSAGE_SELF, &start);
X	if (i586)
X	    tsc = rdtsc();
X	for (i = 0; i < max; ++i)
X	    funcp(buf, len);
X	if (i586)
X	    tsc = rdtsc() - tsc;
X	getrusage(RUSAGE_SELF, &finish);
X	usec = 1000000 * (finish.ru_utime.tv_sec - start.ru_utime.tv_sec)
X	       + finish.ru_utime.tv_usec - start.ru_utime.tv_usec;
X	if (usec < 0)
X	    usec = 1;
X	printf("%s: %10.0f B/s", funcs[funcn].name, tot * 1e6 / usec);
X	if (!quiet)
X	{
X	    printf(" (%7ld us)", usec);
X	    if (i586)
X		printf(" (%9qd tsc)", tsc);
X	    printf(" (%s)", funcs[funcn].description);
X	}
X	printf("\n");
X    }
X    return 0;
X}
X
Xstatic void zero0(void *buf, size_t len)
X{
X    asm volatile("
X	.align 4,0x90
X	cld
X	shrl $2,%1
X	rep; stosl"
X	: "=D" (buf), "=c" (len)
X	: "0"  (buf), "1"  (len), "a" (0)
X	: "memory");
X}
X
Xstatic void zero1(void *buf, size_t len)
X{
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl %4,0(%0)
X	movl %4,4(%0)
X	movl %4,8(%0)
X	movl %4,12(%0)
X	addl $16,%0
X	subl $16,%1
X	ja 1b"
X	: "=r" (buf), "=r" (len)
X	: "0"  (buf), "1"  (len), "r" (0)
X	: "memory");
X}
X
Xstatic void zero2(void *buf, size_t len)
X{
X    unsigned prefetch;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl (%0),%2
X	movl %5,0(%0)
X	movl %5,4(%0)
X	movl %5,8(%0)
X	movl %5,12(%0)
X	addl $16,%0
X	subl $16,%1
X	ja 1b"
X	: "=r" (buf), "=r" (len), "=&r" (prefetch)
X	: "0"  (buf), "1"  (len), "r" (0)
X	: "memory");
X}
X
Xstatic void zero3(void *buf, size_t len)
X{
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl %3,0(%0)
X	movl %3,4(%0)
X	movl %3,8(%0)
X	movl %3,12(%0)
X	movl %3,16(%0)
X	movl %3,20(%0)
X	movl %3,24(%0)
X	movl %3,28(%0)
X	movl %3,32(%0)
X	movl %3,36(%0)
X	movl %3,40(%0)
X	movl %3,44(%0)
X	movl %3,48(%0)
X	movl %3,52(%0)
X	movl %3,56(%0)
X	movl %3,60(%0)
X	addl $64,%0
X	cmpl %2,%0
X	jb 1b"
X	: "=r" (buf)
X	: "0"  (buf), "r" ((char *) buf + len), "r" (0)
X	: "memory");
X}
X
Xstatic void zero4(void *buf, size_t len)
X{
X    void *buf2;
X    unsigned prefetch;
X
X    /*
X     * The main loop has 11 pairs of i586 instructions with no AGI so that
X     * it takes 11 cycles on i586's if all the data is in the L1 cache.
X     *
X     * On an ASUS P55TP4XE P133 the speeds are approx:
X     *    data in L1 cache:      740,000,000 B/s
X     *    data in L2 cache only:  90,000,000 B/s (highly variant)
X     *    data not in any cache:  60,000,000 B/s
X     * and without prefetching (function zero3) they are:
X     *    data in L1 cache:       87,000,000 B/s
X     *    data in L2 cache only:  87,000,000 B/s
X     *    data not in any cache:  90,000,000 B/s
X     *
X     * Thus the intruction selection and ordering optimizations have an
X     * insignificant effect if the data isn't in the L1 cache or the L2
X     * cache, and prefetching is a pessimization if the data isn't in the
X     * L2 cache.
X     */
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl (%0),%3
X	leal 32(%0),%2
X	movl %6,0(%0)
X	movl %6,4(%0)
X	movl %6,8(%0)
X	movl %6,12(%0)
X	movl %6,16(%0)
X	movl %6,20(%0)
X	movl %6,24(%0)
X	movl %6,28(%0)
X	movl (%2),%3
X	addl $64,%0
X	movl %6,0(%2)
X	movl %6,4(%2)
X	movl %6,8(%2)
X	movl %6,12(%2)
X	movl %6,16(%2)
X	movl %6,20(%2)
X	movl %6,24(%2)
X	movl %6,28(%2)
X	subl $64,%1
X	ja 1b"
X	: "=r" (buf), "=r" (len), "=&r" (buf2), "=&r" (prefetch)
X	: "0"  (buf), "1"  (len), "r" (0)
X	: "memory");
X}
X
Xstatic void zero5(void *buf, size_t len)
X{
X    void *buf2;
X    unsigned prefetch;
X
X    asm volatile("
X	.align 4,0x90
X	1:
X	movl (%0),%2
X	leal 32(%0),%1
X	movl %5,0(%0)
X	movl %5,4(%0)
X	movl %5,8(%0)
X	movl %5,12(%0)
X	movl %5,16(%0)
X	movl %5,20(%0)
X	movl %5,24(%0)
X	movl %5,28(%0)
X	movl (%1),%2
X	addl $64,%0
X	movl %5,0(%1)
X	movl %5,4(%1)
X	movl %5,8(%1)
X	movl %5,12(%1)
X	movl %5,16(%1)
X	movl %5,20(%1)
X	movl %5,24(%1)
X	movl %5,28(%1)
X	cmpl %4,%0
X	jb 1b"
X	: "=r" (buf), "=&r" (buf2), "=&r" (prefetch)
X	: "0"  (buf), "r" ((char *) buf + len), "r" (0)
X	: "memory");
X}
X
Xstatic void usage(void)
X{
X    fprintf(stderr, "%s: [-5cpq] [-f function] [-l length] [-t tot]\n",
X	    progname);
X    exit(1);
X}
END_OF_FILE
if test 6940 -ne `wc -c <'w.c'`; then
    echo shar: \"'w.c'\" unpacked with wrong size!
fi
# end of 'w.c'
fi
if test -f 'wrc' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'wrc'\"
else
echo shar: Extracting \"'wrc'\" \(479 characters\)
sed "s/^X//" >'wrc' <<'END_OF_FILE'
Xi=1024
Xwhile :
Xdo
X    for b in 0 1 2 3 4 5 6
X    do
X    (echo -n $i' '; ./w -f $b -l $i -q) |
X	awk '{ printf("%8s: %10.0f bytes/sec %s\n", $1, $3, $2) }'
X    done
X    for b in 0 1 2 3 4
X    do
X    (echo -n $i' '; ./r -f $b -l $i -q) |
X	awk '{ printf("%8s: %10.0f bytes/sec %s\n", $1, $3, $2) }'
X    done
X    for b in 0 1 2 3 4 5 6 7 8
X    do
X    (echo -n $i' '; ./c -f $b -l $i -q) |
X	awk '{ printf("%8s: %10.0f bytes/sec %s\n", $1, $3, $2) }'
X    done
X    i=`expr $i + $i`
Xdone
END_OF_FILE
if test 479 -ne `wc -c <'wrc'`; then
    echo shar: \"'wrc'\" unpacked with wrong size!
fi
# end of 'wrc'
fi
if test -f 'wrc.out' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'wrc.out'\"
else
echo shar: Extracting \"'wrc.out'\" \(9576 characters\)
sed "s/^X//" >'wrc.out' <<'END_OF_FILE'
X    1024:   86444769 bytes/sec zero0:
X    1024:   87023795 bytes/sec zero1:
X    1024:  383464990 bytes/sec zero2:
X    1024:   87020993 bytes/sec zero3:
X    1024:  638255775 bytes/sec zero4:
X    1024:  638528830 bytes/sec zero5:
X    1024:  168742196 bytes/sec read0:
X    1024:  385585280 bytes/sec read1:
X    1024:  326719443 bytes/sec read2:
X    1024:  605462483 bytes/sec read3:
X    1024:  716404224 bytes/sec read4:
X    1024:   86265646 bytes/sec copy0:
X    1024:   86545993 bytes/sec copy1:
X    1024:  248412643 bytes/sec copy2:
X    1024:   86715829 bytes/sec copy3:
X    1024:  403417755 bytes/sec copy4:
X    1024:  398523868 bytes/sec copy5:
X    2048:   86907853 bytes/sec zero0:
X    2048:   87022129 bytes/sec zero1:
X    2048:  399996800 bytes/sec zero2:
X    2048:   86737491 bytes/sec zero3:
X    2048:  705024711 bytes/sec zero4:
X    2048:  690197811 bytes/sec zero5:
X    2048:  172957117 bytes/sec read0:
X    2048:  408336600 bytes/sec read1:
X    2048:  338657022 bytes/sec read2:
X    2048:  647240812 bytes/sec read3:
X    2048:  787730313 bytes/sec read4:
X    2048:   87124710 bytes/sec copy0:
X    2048:   87003428 bytes/sec copy1:
X    2048:  215849389 bytes/sec copy2:
X    2048:   86859692 bytes/sec copy3:
X    2048:  344827586 bytes/sec copy4:
X    2048:  340708537 bytes/sec copy5:
X    4096:   86660970 bytes/sec zero0:
X    4096:   86803822 bytes/sec zero1:
X    4096:  413086583 bytes/sec zero2:
X    4096:   87588990 bytes/sec zero3:
X    4096:  760994468 bytes/sec zero4:
X    4096:  724716455 bytes/sec zero5:
X    4096:  174118958 bytes/sec read0:
X    4096:  414447645 bytes/sec read1:
X    4096:  343425463 bytes/sec read2:
X    4096:  668619034 bytes/sec read3:
X    4096:  807324044 bytes/sec read4:
X    4096:   86945635 bytes/sec copy0:
X    4096:   86956144 bytes/sec copy1:
X    4096:  214172662 bytes/sec copy2:
X    4096:   86677646 bytes/sec copy3:
X    4096:  337507172 bytes/sec copy4:
X    4096:  337204440 bytes/sec copy5:
X    8192:   86985794 bytes/sec zero0:
X    8192:   86898866 bytes/sec zero1:
X    8192:  359670829 bytes/sec zero2:
X    8192:   86931349 bytes/sec zero3:
X    8192:  561722015 bytes/sec zero4:
X    8192:  565892526 bytes/sec zero5:
X    8192:  166822089 bytes/sec read0:
X    8192:  368969652 bytes/sec read1:
X    8192:  313836748 bytes/sec read2:
X    8192:  563815452 bytes/sec read3:
X    8192:  667017963 bytes/sec read4:
X    8192:   84435213 bytes/sec copy0:
X    8192:   85045367 bytes/sec copy1:
X    8192:   81536141 bytes/sec copy2:
X    8192:   84545505 bytes/sec copy3:
X    8192:   61797899 bytes/sec copy4:
X    8192:   97250060 bytes/sec copy5:
X   16384:   86577163 bytes/sec zero0:
X   16384:   86511701 bytes/sec zero1:
X   16384:   91782021 bytes/sec zero2:
X   16384:   86879841 bytes/sec zero3:
X   16384:  131223435 bytes/sec zero4:
X   16384:  130990009 bytes/sec zero5:
X   16384:  110540965 bytes/sec read0:
X   16384:  161218814 bytes/sec read1:
X   16384:  161114656 bytes/sec read2:
X   16384:  190427223 bytes/sec read3:
X   16384:  197610103 bytes/sec read4:
X   16384:   65619295 bytes/sec copy0:
X   16384:   65469492 bytes/sec copy1:
X   16384:   66021156 bytes/sec copy2:
X   16384:   63431009 bytes/sec copy3:
X   16384:   97963150 bytes/sec copy4:
X   16384:   97608971 bytes/sec copy5:
X   32768:   86671636 bytes/sec zero0:
X   32768:   86648280 bytes/sec zero1:
X   32768:  129736686 bytes/sec zero2:
X   32768:   86801411 bytes/sec zero3:
X   32768:  106576631 bytes/sec zero4:
X   32768:  131485351 bytes/sec zero5:
X   32768:   95844471 bytes/sec read0:
X   32768:  161903205 bytes/sec read1:
X   32768:  131358921 bytes/sec read2:
X   32768:  190906366 bytes/sec read3:
X   32768:  195968535 bytes/sec read4:
X   32768:   65916363 bytes/sec copy0:
X   32768:   65692663 bytes/sec copy1:
X   32768:   81768489 bytes/sec copy2:
X   32768:   64470792 bytes/sec copy3:
X   32768:   99775406 bytes/sec copy4:
X   32768:   99771523 bytes/sec copy5:
X   65536:   87012285 bytes/sec zero0:
X   65536:   87014405 bytes/sec zero1:
X   65536:  116457683 bytes/sec zero2:
X   65536:   86789810 bytes/sec zero3:
X   65536:  118404569 bytes/sec zero4:
X   65536:  133328356 bytes/sec zero5:
X   65536:  111068905 bytes/sec read0:
X   65536:  162809970 bytes/sec read1:
X   65536:  145247291 bytes/sec read2:
X   65536:  169225354 bytes/sec read3:
X   65536:  175326326 bytes/sec read4:
X   65536:   65905024 bytes/sec copy0:
X   65536:   61498346 bytes/sec copy1:
X   65536:   73303035 bytes/sec copy2:
X   65536:   63758676 bytes/sec copy3:
X   65536:   74987983 bytes/sec copy4:
X   65536:   70467198 bytes/sec copy5:
X  131072:   86827338 bytes/sec zero0:
X  131072:   86796740 bytes/sec zero1:
X  131072:   98350950 bytes/sec zero2:
X  131072:   87126912 bytes/sec zero3:
X  131072:  112326105 bytes/sec zero4:
X  131072:   97081535 bytes/sec zero5:
X  131072:  103272926 bytes/sec read0:
X  131072:  126374640 bytes/sec read1:
X  131072:  133852323 bytes/sec read2:
X  131072:  158405302 bytes/sec read3:
X  131072:  186528536 bytes/sec read4:
X  131072:   57519462 bytes/sec copy0:
X  131072:   63329899 bytes/sec copy1:
X  131072:   57108462 bytes/sec copy2:
X  131072:   57810786 bytes/sec copy3:
X  131072:   65174136 bytes/sec copy4:
X  131072:   61808746 bytes/sec copy5:
X  262144:   86989956 bytes/sec zero0:
X  262144:   86783408 bytes/sec zero1:
X  262144:   96834012 bytes/sec zero2:
X  262144:   86996994 bytes/sec zero3:
X  262144:   89947713 bytes/sec zero4:
X  262144:   94686658 bytes/sec zero5:
X  262144:   89521267 bytes/sec read0:
X  262144:  122771091 bytes/sec read1:
X  262144:  122058542 bytes/sec read2:
X  262144:  130217489 bytes/sec read3:
X  262144:  136078490 bytes/sec read4:
X  262144:   55315180 bytes/sec copy0:
X  262144:   53134568 bytes/sec copy1:
X  262144:   49456106 bytes/sec copy2:
X  262144:   55849198 bytes/sec copy3:
X  262144:   50385373 bytes/sec copy4:
X  262144:   50291869 bytes/sec copy5:
X  524288:   87288369 bytes/sec zero0:
X  524288:   87189203 bytes/sec zero1:
X  524288:   83525163 bytes/sec zero2:
X  524288:   87196730 bytes/sec zero3:
X  524288:   83545260 bytes/sec zero4:
X  524288:   81666720 bytes/sec zero5:
X  524288:   79887327 bytes/sec read0:
X  524288:   99394686 bytes/sec read1:
X  524288:   98545954 bytes/sec read2:
X  524288:  119443204 bytes/sec read3:
X  524288:  115947541 bytes/sec read4:
X  524288:   48821665 bytes/sec copy0:
X  524288:   46609855 bytes/sec copy1:
X  524288:   40622945 bytes/sec copy2:
X  524288:   47938225 bytes/sec copy3:
X  524288:   43924033 bytes/sec copy4:
X  524288:   43559357 bytes/sec copy5:
X 1048576:   87285322 bytes/sec zero0:
X 1048576:   87283950 bytes/sec zero1:
X 1048576:   72887094 bytes/sec zero2:
X 1048576:   87666698 bytes/sec zero3:
X 1048576:   71178025 bytes/sec zero4:
X 1048576:   71653666 bytes/sec zero5:
X 1048576:   71103426 bytes/sec read0:
X 1048576:   89790304 bytes/sec read1:
X 1048576:   89479774 bytes/sec read2:
X 1048576:   97097748 bytes/sec read3:
X 1048576:  100427722 bytes/sec read4:
X 1048576:   43895304 bytes/sec copy0:
X 1048576:   44224540 bytes/sec copy1:
X 1048576:   38171275 bytes/sec copy2:
X 1048576:   43119302 bytes/sec copy3:
X 1048576:   40846920 bytes/sec copy4:
X 1048576:   41458081 bytes/sec copy5:
X 2097152:   87974192 bytes/sec zero0:
X 2097152:   88215783 bytes/sec zero1:
X 2097152:   68610070 bytes/sec zero2:
X 2097152:   88287436 bytes/sec zero3:
X 2097152:   68917465 bytes/sec zero4:
X 2097152:   67962716 bytes/sec zero5:
X 2097152:   69144721 bytes/sec read0:
X 2097152:   85944100 bytes/sec read1:
X 2097152:   85587347 bytes/sec read2:
X 2097152:   93399635 bytes/sec read3:
X 2097152:   94968024 bytes/sec read4:
X 2097152:   42594046 bytes/sec copy0:
X 2097152:   42654601 bytes/sec copy1:
X 2097152:   38098402 bytes/sec copy2:
X 2097152:   41848656 bytes/sec copy3:
X 2097152:   40400417 bytes/sec copy4:
X 2097152:   40570256 bytes/sec copy5:
X 4194304:   90166041 bytes/sec zero0:
X 4194304:   89977416 bytes/sec zero1:
X 4194304:   69796432 bytes/sec zero2:
X 4194304:   90171569 bytes/sec zero3:
X 4194304:   69706765 bytes/sec zero4:
X 4194304:   69677283 bytes/sec zero5:
X 4194304:   70125539 bytes/sec read0:
X 4194304:   86942989 bytes/sec read1:
X 4194304:   86395477 bytes/sec read2:
X 4194304:   94129873 bytes/sec read3:
X 4194304:   95756365 bytes/sec read4:
X 4194304:   43384927 bytes/sec copy0:
X 4194304:   43264366 bytes/sec copy1:
X 4194304:   38952141 bytes/sec copy2:
X 4194304:   43132451 bytes/sec copy3:
X 4194304:   42045921 bytes/sec copy4:
X 4194304:   41550170 bytes/sec copy5:
X 8388608:   94921870 bytes/sec zero0:
X 8388608:   94142988 bytes/sec zero1:
X 8388608:   72916910 bytes/sec zero2:
X 8388608:   94257459 bytes/sec zero3:
X 8388608:   72880082 bytes/sec zero4:
X 8388608:   72842709 bytes/sec zero5:
X 8388608:   73575342 bytes/sec read0:
X 8388608:   90701385 bytes/sec read1:
X 8388608:   90326322 bytes/sec read2:
X 8388608:   97226800 bytes/sec read3:
X 8388608:  100359689 bytes/sec read4:
X 8388608:   44787236 bytes/sec copy0:
X 8388608:   45069204 bytes/sec copy1:
X 8388608:   40680901 bytes/sec copy2:
X 8388608:   44366812 bytes/sec copy3:
X 8388608:   42067836 bytes/sec copy4:
X 8388608:   41896241 bytes/sec copy5:
X16777216:   98608729 bytes/sec zero0:
X16777216:   98309760 bytes/sec zero1:
X16777216:   78795128 bytes/sec zero2:
X16777216:   99273812 bytes/sec zero3:
X16777216:   77295522 bytes/sec zero4:
X16777216:   78430757 bytes/sec zero5:
X16777216:   78362414 bytes/sec read0:
X16777216:   94250973 bytes/sec read1:
X16777216:   96350252 bytes/sec read2:
X16777216:  100450319 bytes/sec read3:
X16777216:  110066095 bytes/sec read4:
X16777216:   46661988 bytes/sec copy0:
X16777216:   45378516 bytes/sec copy1:
X16777216:   42331805 bytes/sec copy2:
END_OF_FILE
if test 9576 -ne `wc -c <'wrc.out'`; then
    echo shar: \"'wrc.out'\" unpacked with wrong size!
fi
# end of 'wrc.out'
fi
echo shar: End of shell archive.
exit 0