From owner-freebsd-hackers  Fri Jan 31 09:40:30 1997
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id JAA03084
          for hackers-outgoing; Fri, 31 Jan 1997 09:40:30 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id JAA03073
          for <hackers@freefall.freebsd.org>; Fri, 31 Jan 1997 09:40:18 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id KAA03009; Fri, 31 Jan 1997 10:38:37 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199701311738.KAA03009@phaeton.artisoft.com>
Subject: Re: performance puzzler
To: ajones@ctron.com (Alexander Seth Jones)
Date: Fri, 31 Jan 1997 10:38:37 -0700 (MST)
Cc: hackers@freefall.freebsd.org
In-Reply-To: <32F20D0B.6385@ctron.com> from "Alexander Seth Jones" at Jan 31, 97 10:17:31 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>   The code consists of protocol messages being encoded into a byte
> stream, and then decoded back into a C++ object.  One of the messages is
> 200 bytes in length, and I successfully decode it, do all error
> checking, etc., in about 0.5 milliseconds.  This is without -O, and with
> -m486 and -ggdb.  The hardware is an Intel 486-66, running
> 2.1.5-RELEASE.
> 
>   The puzzling thing comes when I try to run the test at home on my AMD
> 486-120, running 2.1.0-RELEASE.  It runs the test in 0.6 milliseconds!!

Divide each clock speed by increasing integer values, starting with 1,
until the result is less than or equal to 33.  The result is the maximum
bus speed possible for the system.  Equivalently, divide the clock speed
by the smallest integer that brings it to 33 or below, which is
ceil(speed/33):

120 / ceil(120/33) = 120 / 4 = 30
 66 / ceil(66/33)  =  66 / 2 = 33

Your bus on the 120 is 3 MHz slower than the bus on the 66.  What you
are doing is not CPU bound; it is bus (memory I/O) bound.

It is a common mistake to believe that a clock-multiplied CPU will make
everything faster, and people frequently trade down bus speed to trade
up CPU speed.  In point of fact, every access that misses L1 is done at
bus speed, not CPU speed, and accesses that miss both L1 and L2 can
also incur wait states.


These are the results you should expect on I/O-bound operations, even
on CPUs built from the same chip mask.

There may also be AMD-specific instruction timing differences on top of this.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.