Date:      Mon, 5 Mar 2001 16:13:17 -0800 (PST)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Chris Dillon <cdillon@wolves.k12.mo.us>
Cc:        "E.B. Dreger" <eddy+public+spam@noc.everquick.net>, <freebsd-hackers@FreeBSD.ORG>
Subject:   Re: Machines are getting too damn fast
Message-ID:  <200103060013.f260DHY46910@earth.backplane.com>
References:   <Pine.BSF.4.32.0103051729350.84853-100000@mail.wolves.k12.mo.us>

:IIRC, Intel is using a very different caching method on the P4 from
:what we are used to on just about every other x86 processor we've
:seen.  Well, I can't remember if the data cache has changed much, but
:the instruction cache has.  I doubt the difference in instruction
:cache behaviour would make a difference here though.  Hmm.
:
:I wonder if it makes any difference that I'm using -march=pentium
:-mcpu=pentium for my CFLAGS?  Actually, the kernel I tested on might
:even be using -march/-mcpu=pentiumpro, since I only recently changed
:it to =pentium to allow me to do buildworlds for another Pentium-class
:machine.  I did wonder the same thing a while back and did the same
:test with and without the optimizations, and with pentiumpro opts the
:big block size transfer rate went _down_ a little bit, which was odd.
:I didn't compare with L2-cache-friendly blocks, though.
:
:-- Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net

    I modified my original C program again, this time to simply read
    the data from memory given a block size in kilobytes as an argument.  
    I had to throw in a little __asm to do it right, but here are my results.
    It shows about 3.2 GBytes/sec from the L2 (well, insofar as my
    3-instruction loop goes), and about 1.4 GBytes/sec from main memory.


NOTE:  cc x.c -O2 -o x

./x 4
3124.96 MBytes/sec (read)

./x 8
3242.45 MBytes/sec (read)

./x 16
3060.93 MBytes/sec (read)

./x 32
3359.97 MBytes/sec (read)

./x 64
3362.06 MBytes/sec (read)

./x 128
3365.53 MBytes/sec (read)

./x 240
3307.86 MBytes/sec (read)

./x 256
3232.33 MBytes/sec (read)

./x 512
1396.45 MBytes/sec (read)

./x 1024
1397.90 MBytes/sec (read)
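
    The drop between the 256 KB and 512 KB runs is where the block stops
    fitting in the L2.  As a rough illustration only, the sketch below
    scans the figures above for the first rate that falls below some
    fraction of the previous one.  The names sample, find_knee and
    results are made up for this example, and the 0.5 threshold in the
    usage note is an arbitrary assumption, not something that was measured.

```c
#include <assert.h>
#include <stddef.h>

/* One measurement: block size in KB and the reported read rate. */
struct sample {
    int    kb;
    double mbps;
};

/* The figures reported above, copied verbatim. */
static const struct sample results[] = {
    {    4, 3124.96 }, {    8, 3242.45 }, {   16, 3060.93 },
    {   32, 3359.97 }, {   64, 3362.06 }, {  128, 3365.53 },
    {  240, 3307.86 }, {  256, 3232.33 }, {  512, 1396.45 },
    { 1024, 1397.90 }
};

/*
 * Return the index of the first sample whose rate is below `ratio'
 * times the previous sample's rate, or -1 if there is no such drop.
 * `ratio' is an assumed threshold and must lie strictly in (0, 1).
 */
int
find_knee(const struct sample *s, size_t n, double ratio)
{
    size_t i;

    assert(ratio > 0.0 && ratio < 1.0);
    for (i = 1; i < n; ++i) {
	if (s[i].mbps < s[i - 1].mbps * ratio)
	    return ((int)i);
    }
    return (-1);
}
```

    Calling find_knee(results, 10, 0.5) from a main() picks out the
    512 KB row, the first one whose rate is less than half of the row
    before it.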

    In contrast I get 1052.50 MBytes/sec on the Dell 2400 from the L2,
    and 444 MBytes/sec from main memory.

					-Matt

/*
 * NOTE:  cc x.c -O2 -o x
 */

#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>	/* bzero() */
#include <stdarg.h>
#include <unistd.h>

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    int bytes;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;
    char *buf;

    if (ac == 1) {
	fprintf(stderr, "%s numKB\n", av[0]);
	exit(1);
    }
    bytes = strtol(av[1], NULL, 0) * 1024;
    if (bytes < 4 * 1024 || bytes > 256 * 1024 * 1024) {
	fprintf(stderr, "Oh please.  Try a reasonable value\n");
	exit(1);
    }
    buf = malloc(bytes);
    if (buf == NULL) {
	perror("malloc");
	exit(1);
    }
    bzero(buf, bytes);

    gettimeofday(&tv1, NULL);
    for (i = 0; i < 1000000000; i += bytes) {
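	register int j;

	/*
	 * Read the block one 32-bit word at a time, top down.  The
	 * volatile asm keeps the compiler from optimizing the loads
	 * away; the value loaded into %eax is simply discarded.
	 */
	for (j = bytes - 4; j >= 0; j -= 4)
	    __asm __volatile("movl (%0,%1),%%eax" :
		"=r" (buf), "=r" (j) :
		"0" (buf), "1" (j) : "ax" );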
    }
    gettimeofday(&tv2, NULL);

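    /*
     * After the loop, i holds the number of bytes actually read (a whole
     * number of passes over the block), which can slightly exceed
     * 1000000000.  Bytes per microsecond is MBytes/sec.
     */
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (read)\n", (double)i / dtime);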
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}
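
    For anyone who wants to try this on a non-x86 box or with a compiler
    that lacks this __asm syntax, here is a minimal sketch of the same
    idea without inline assembly.  It is not the program that produced
    the numbers above: read_block and delta_usecs64 are names invented
    for this example, and the running sum adds work to the loop, so the
    rates it gives should not be compared directly with the
    3-instruction loop.

```c
#include <sys/time.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read every 32-bit word in buf once, from the top down.  The volatile
 * qualifier makes each load an observable access, so the compiler may
 * not delete the loads; it does not control which instructions are
 * emitted.  The sum is returned only so the result can be checked.
 */
uint32_t
read_block(const void *buf, size_t bytes)
{
    const volatile uint32_t *p = buf;
    size_t n = bytes / sizeof(uint32_t);
    uint32_t sum = 0;

    while (n-- > 0)
	sum += p[n];
    return (sum);
}

/*
 * Elapsed microseconds from tv1 to tv2, computed in 64 bits so that a
 * long run cannot overflow the way a 32-bit int eventually would.
 */
int64_t
delta_usecs64(const struct timeval *tv1, const struct timeval *tv2)
{
    /* Assumes tv2 is not earlier than tv1. */
    assert(tv2->tv_sec > tv1->tv_sec ||
	(tv2->tv_sec == tv1->tv_sec && tv2->tv_usec >= tv1->tv_usec));
    return ((int64_t)(tv2->tv_sec - tv1->tv_sec) * 1000000 +
	((int64_t)tv2->tv_usec - (int64_t)tv1->tv_usec));
}
```

    In the timing loop, read_block(buf, bytes) would take the place of
    the inner for loop, and delta_usecs64 the place of deltausecs.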





