Date:      Mon, 5 Mar 2001 17:38:24 -0600 (CST)
From:      Chris Dillon <cdillon@wolves.k12.mo.us>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        "E.B. Dreger" <eddy+public+spam@noc.everquick.net>, <freebsd-hackers@FreeBSD.ORG>
Subject:   Re: Machines are getting too damn fast
Message-ID:  <Pine.BSF.4.32.0103051729350.84853-100000@mail.wolves.k12.mo.us>
In-Reply-To: <200103052324.f25NOin45226@earth.backplane.com>

On Mon, 5 Mar 2001, Matt Dillon wrote:

> :throughput.  For example, on the PIII-850 (116MHz FSB and SDRAM, it's
> :overclocked) here on my desk with 256KB L2 cache:
> :
> :dd if=/dev/zero of=/dev/null bs=512k count=4000
> :4000+0 records in
> :4000+0 records out
> :2097152000 bytes transferred in 8.229456 secs (254834825 bytes/sec)
> :
> :dd if=/dev/zero of=/dev/null bs=128k count=16000
> :16000+0 records in
> :16000+0 records out
> :2097152000 bytes transferred in 1.204001 secs (1741819224 bytes/sec)
> :
> :Now THAT is a significant difference.  :-)
>
>     Interesting.  I get very different results with the 1.3 GHz P4.  The
>     best I seem to get is 1.4 GBytes/sec.  I'm not sure what the L2 cache
>     is on the box, but it's definitely a consumer model.
>
>     dd if=/dev/zero of=/dev/null bs=512k count=4000
>     2097152000 bytes transferred in 2.363903 secs (887156520 bytes/sec)
>
>     dd if=/dev/zero of=/dev/null bs=128k count=16000
>     2097152000 bytes transferred in 1.471046 secs (1425619621 bytes/sec)
>
>     If I use lower block sizes the syscall overhead blows up the
>     performance (it gets lower rather than higher).  So I figure I don't
>     have as much L2 as on your system.

IIRC, Intel is using a very different caching method on the P4 from
what we are used to on just about every other x86 processor we've
seen.  Well, I can't remember if the data cache has changed much, but
the instruction cache has.  I doubt the change in instruction cache
behaviour would matter here, though.  Hmm.

I wonder if it makes any difference that I'm using -march=pentium
-mcpu=pentium for my CFLAGS?  Actually, the kernel I tested on might
even be using -march/-mcpu=pentiumpro, since I only recently changed
it to =pentium to allow me to do buildworlds for another Pentium-class
machine.  I wondered the same thing a while back and ran the same
test with and without the optimizations.  With the pentiumpro opts the
big block size transfer rate went _down_ a little bit, which was odd.
I didn't compare with L2-cache-friendly blocks, though.
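
A minimal sketch of how the block-size comparison could be scripted,
assuming a POSIX sh and a dd that prints its statistics on stderr.  The
names TOTAL_KB, count_for and run_dd are made up for illustration.  The
total is held at the 2097152000 bytes used in the runs quoted above, and
the list of block sizes is an arbitrary choice.

```shell
#!/bin/sh
# Total amount to move per run, in KB (2097152000 bytes / 1024).
TOTAL_KB=2048000

# count_for BS_KB: print the dd count that moves TOTAL_KB at that block size.
count_for() {
    echo $((TOTAL_KB / $1))
}

# run_dd BS_KB: copy /dev/zero to /dev/null and print dd's summary line.
# The block size is passed in bytes to avoid relying on suffix handling.
run_dd() {
    dd if=/dev/zero of=/dev/null bs=$(($1 * 1024)) count="$(count_for "$1")" 2>&1 | tail -n 1
}

# Illustrative set of block sizes (in KB); adjust to taste.
for bs in 32 64 128 256 512; do
    printf '%4sk: ' "$bs"
    run_dd "$bs"
done
```

If the cache explanation holds, the rate should drop once the block
size no longer fits in L2 (256KB on the PIII above).  The script only
reports dd's own numbers; it doesn't measure cache behaviour directly.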


-- Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
   FreeBSD: The fastest and most stable server OS on the planet.
   For IA32 and Alpha architectures. IA64, PPC, and ARM under development.
   http://www.freebsd.org



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message



