From owner-freebsd-alpha  Fri Sep  3 18:39:52 1999
Delivered-To: freebsd-alpha@freebsd.org
Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74])
	by hub.freebsd.org (Postfix) with ESMTP id 7759E14DB5
	for <alpha@freebsd.org>; Fri,  3 Sep 1999 18:39:40 -0700 (PDT)
	(envelope-from jdp@polstra.com)
Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13])
	by wall.polstra.com (8.9.3/8.9.1) with ESMTP id SAA06702;
	Fri, 3 Sep 1999 18:37:41 -0700 (PDT)
	(envelope-from jdp@polstra.com)
From: John Polstra <jdp@polstra.com>
Received: (from jdp@localhost)
	by vashon.polstra.com (8.9.3/8.9.1) id SAA10144;
	Fri, 3 Sep 1999 18:37:41 -0700 (PDT)
	(envelope-from jdp@polstra.com)
Date: Fri, 3 Sep 1999 18:37:41 -0700 (PDT)
Message-Id: <199909040137.SAA10144@vashon.polstra.com>
To: gmckinney@megabits.net
Subject: Re: relative alpha speed
In-Reply-To: <00ee01bef5fc$203599e0$1e00000a@gary2.megabits.net>
References: <00ee01bef5fc$203599e0$1e00000a@gary2.megabits.net>
Organization: Polstra & Co., Seattle, WA
Cc: alpha@freebsd.org
Sender: owner-freebsd-alpha@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

In article <00ee01bef5fc$203599e0$1e00000a@gary2.megabits.net>,
Gary McKinney <gmckinney@megabits.net> wrote:
> There is also a thread in the Debian Linux group about performance of the
> floating point lib routines (the lib routines are patterned after the Intel
> FP unit and the Alpha is a different creature with different needs).

The test I used was a matrix multiplication function which had been
carefully hand-optimized to take advantage of the Alpha pipeline.
(Sorry, the program is not mine to redistribute.  Hidetoshi Shimokawa
kindly got permission from the author to send me a copy.)  It didn't
call any floating point library functions.  It was blazingly fast when
compiled without "-mieee".  But when built with "-mieee" it was much
slower on my 533 MHz 164LX than on my 400 MHz PII.
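
For readers without access to that program, here is a rough sketch of
the same kind of test: a matrix multiply that does only inline FP
multiplies and adds and calls no floating point library functions.
This is a naive, unoptimized stand-in, not the hand-optimized program
described above, and the names matmul() and time_matmul() are made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Naive n x n matrix multiply, c = a * b, row-major storage.
 * Hypothetical stand-in for the real benchmark: only inline FP
 * multiplies and adds, no calls into the math library.
 */
void matmul(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* CPU seconds consumed by `reps` back-to-back calls of matmul(). */
double time_matmul(size_t n, const double *a, const double *b,
                   double *c, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        matmul(n, a, b, c);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare, build the same source twice with gcc on an Alpha, once
with plain "-O2" and once with "-O2 -mieee", and call time_matmul()
from a small driver with matrices large enough to run for a few
seconds.  A naive loop like this one will not necessarily show the
same gap as the hand-tuned code.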

Hidetoshi told me:

    This program exploits four pipelines(2 integers, 2 floats), and
    L1(or L2, I forgot detail) cache efficiently.  You can easily
    check this by 'iprobe quad'.  (Even if it is compiled with
    `-mieee', it still issuing quad instructions at a cycle)

Further testing indicated that enabling precise traps was responsible
for the performance degradation.  N.B., no traps were actually
generated -- I instrumented the kernel to make sure.  But the changes
in the code generation to permit identifying which instruction caused
a trap slowed it down.

I'm not sure how much the code generation could be improved.  Right
now the compiler emits "trapb" (trap barrier) instructions in the
places it thinks they're needed -- not after each FP instruction.
It's not obvious that it could do much better.

John
-- 
  John Polstra                                               jdp@polstra.com
  John D. Polstra & Co., Inc.                        Seattle, Washington USA
  "No matter how cynical I get, I just can't keep up."        -- Nora Ephron

