From owner-freebsd-alpha Fri Sep 3 18:39:52 1999
Delivered-To: freebsd-alpha@freebsd.org
Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74])
    by hub.freebsd.org (Postfix) with ESMTP id 7759E14DB5
    for ; Fri, 3 Sep 1999 18:39:40 -0700 (PDT)
    (envelope-from jdp@polstra.com)
Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13])
    by wall.polstra.com (8.9.3/8.9.1) with ESMTP id SAA06702;
    Fri, 3 Sep 1999 18:37:41 -0700 (PDT)
    (envelope-from jdp@polstra.com)
From: John Polstra
Received: (from jdp@localhost)
    by vashon.polstra.com (8.9.3/8.9.1) id SAA10144;
    Fri, 3 Sep 1999 18:37:41 -0700 (PDT)
    (envelope-from jdp@polstra.com)
Date: Fri, 3 Sep 1999 18:37:41 -0700 (PDT)
Message-Id: <199909040137.SAA10144@vashon.polstra.com>
To: gmckinney@megabits.net
Subject: Re: relative alpha speed
In-Reply-To: <00ee01bef5fc$203599e0$1e00000a@gary2.megabits.net>
References: <00ee01bef5fc$203599e0$1e00000a@gary2.megabits.net>
Organization: Polstra & Co., Seattle, WA
Cc: alpha@freebsd.org
Sender: owner-freebsd-alpha@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

In article <00ee01bef5fc$203599e0$1e00000a@gary2.megabits.net>,
Gary McKinney wrote:

> There is also a thread in the Debian Linux group about performance of the
> floating point lib routines (the lib routines are patterned after the Intel
> FP unit and the Alpha is a different creature with different needs).

The test I used was a matrix multiplication function which had been
carefully hand-optimized to take advantage of the Alpha pipeline.
(Sorry, the program is not mine to redistribute.  Hidetoshi Shimokawa
kindly got permission from the author to send me a copy.)  It didn't
call any floating point library functions.  It was blazingly fast
when compiled without "-mieee".  But when built with "-mieee" it was
much slower on my 533 MHz 164LX than on my 400 MHz PII.

Hidetoshi told me:

    This program exploits four pipelines(2 integers, 2 floats), and
    L1(or L2, I forgot detail) cache efficiently.
    You can easily check this by 'iprobe quad'. (Even if it is compiled
    with `-mieee', it still issuing quad instructions at a cycle)

Further testing indicated that enabling precise traps was responsible
for the performance degradation.  N.B., no traps were actually
generated -- I instrumented the kernel to make sure.  But the changes
in the code generation to permit identifying which instruction caused
a trap slowed it down.

I'm not sure how much the code generation could be improved.  Right now
the compiler emits "trapb" (trap barrier) instructions in the places
it thinks they're needed -- not after each FP instruction.  It's not
obvious that it could do much better.

John
--
  John Polstra                                       jdp@polstra.com
  John D. Polstra & Co., Inc.                Seattle, Washington USA
  "No matter how cynical I get, I just can't keep up."  -- Nora Ephron


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message