From owner-freebsd-performance@FreeBSD.ORG Mon Feb 23 10:21:11 2004
Date: Mon, 23 Feb 2004 19:21:03 +0100
From: Oliver Lehmann
To: freebsd-performance@freebsd.org
cc: freebsd-alpha@freebsd.org
Subject: Bad performance on alpha? (make buildworld)
Message-Id: <20040223192103.59ad7b69.lehmann@ans-netz.de>

Hi,

I ran several buildworlds on both an Alpha EV56 and an Intel PII. I'm
surprised that the Pentium beats the Alpha. Both systems have
INVARIANTS* and WITNESS* disabled. Furthermore, no special tweaks -
nearly "out of the box" configured.

Alpha EV56, 600MHz, single CPU
768MB RAM
UW-SCSI SEAGATE ST34573W 6244
running:  5.2.1-RC2 (15th Feb 2004)
building: 5.2.1-RC2 (15th Feb 2004)
/usr/src ufs, /usr/obj ufs

make buildworld        19808.17 real  17126.29 user  1965.42 sys
make -j2 buildworld    20085.05 real  17314.00 user  2139.92 sys
make -j3 buildworld    20182.12 real  17467.14 user  2175.41 sys

########################################################################

Intel PII, 333MHz, single CPU
256MB RAM
UDMA33 WDC AC26400R
running:  5.2-CURRENT (20th Feb 2004)
building: 5.2.1-RC2 (15th Feb 2004)
/usr/src ufs, /usr/obj ufs

make buildworld        16916.70 real  14185.83 user  2028.82 sys
make -j2 buildworld    16991.71 real  14445.95 user  2201.29 sys
make -j3 buildworld    17042.71 real  14532.71 user  2235.48 sys
make -j4 buildworld    17171.15 real  14583.99 user  2242.81 sys

And I don't think CURRENT is that much "faster" than 5.2.1. Why does
(my) Alpha perform so badly? Any ideas? Is this behaviour common for
FreeBSD/alpha, or did I miss something in my consideration?
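
To put the figures above in proportion, here is a minimal Python sketch
that compares the wall-clock times and checks how much of each run was
spent on the CPU. The numbers are copied from the timings in this post;
the helper names (slowdown, cpu_share) are made up for illustration, and
treating user+sys as total CPU time is an assumption that only holds
roughly on a single-CPU machine.

```python
# Timings in seconds, copied from the post: (real, user, system) per run.
# Keys are the -j value; 1 stands for a plain "make buildworld".
alpha = {
    1: (19808.17, 17126.29, 1965.42),
    2: (20085.05, 17314.00, 2139.92),
    3: (20182.12, 17467.14, 2175.41),
}
pii = {
    1: (16916.70, 14185.83, 2028.82),
    2: (16991.71, 14445.95, 2201.29),
    3: (17042.71, 14532.71, 2235.48),
    4: (17171.15, 14583.99, 2242.81),
}


def slowdown(slow_real, fast_real):
    """Fraction by which slow_real exceeds fast_real."""
    return slow_real / fast_real - 1.0


def cpu_share(real, user, system):
    """Share of wall-clock time spent on the CPU (assumes one CPU)."""
    return (user + system) / real


print(f"Alpha vs PII, plain make: {slowdown(alpha[1][0], pii[1][0]):+.1%}")
for name, runs in (("Alpha", alpha), ("PII", pii)):
    for jobs, (real, user, system) in runs.items():
        print(f"{name} -j{jobs}: {slowdown(real, runs[1][0]):+.1%} real, "
              f"{cpu_share(real, user, system):.0%} CPU")
```

On these figures the Alpha needs roughly 17% more wall-clock time, both
machines spend well over 90% of each run on the CPU, and -j only adds
overhead, which is what one would expect from a CPU-bound build on a
single CPU.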

--
Oliver Lehmann
http://www.pofo.de/
http://wishlist.ans-netz.de/

From owner-freebsd-performance@FreeBSD.ORG Mon Feb 23 16:25:36 2004
Date: Tue, 24 Feb 2004 07:25:33 +0700
From: Alain Fauconnet
To: Oliver Lehmann
cc: freebsd-performance@freebsd.org
cc: freebsd-alpha@freebsd.org
Subject: Re: Bad performance on alpha? (make buildworld)
Message-ID: <20040224002533.GA9677@ait.ac.th>
In-Reply-To: <20040223192103.59ad7b69.lehmann@ans-netz.de>

On Mon, Feb 23, 2004 at 07:21:03PM +0100, Oliver Lehmann wrote:
> Hi,
>
> I ran several buildworlds on both an Alpha EV56 and an Intel PII. I'm
> surprised that the Pentium beats the Alpha. Both systems have
> INVARIANTS* and WITNESS* disabled. Furthermore, no special tweaks -
> nearly "out of the box" configured.
>
(rest deleted)

Just a wild guess:

I happen to have had the luck of counting among my friends one of the
very few guys outside of the US involved in the development of the
Alpha chip. I remember the long fascinating talks we were having when
Alpha came out (touch of nostalgia here ;-) and him explaining to me
how difficult it was to write a good compiler for the beast. The
optimization pass is a real nightmare. At that time we were talking
about the EV4 chip; I can only imagine that things have gotten worse
with the newer generation chips supporting more parallelism, deeper
prefetch, branch prediction and all that.

So I'm not overly surprised that a job made almost entirely of compiles
takes a significantly longer time, even though the CPU is more powerful.

Now I'll leave it to the really knowledgeable people to give better
reasons ;-)

Greets,
_Alain_

From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 10:44:26 2004
Date: Tue, 24 Feb 2004 19:43:11 +0100
From: "Willem Jan Withagen"
cc: freebsd-performance@FreeBSD.ORG
Subject: The effects of WITNESS and INVARIANTS
Message-ID: <079d01c3fb06$0e2299b0$471b3dd4@dual>

Hi,

Just out of curiosity, and 'cause I've got some idle boxes, I started
to do some (NFS-)performance testing. There is still a long way to go,
but I've put a first obvious result online, running on the local server
disk.

It makes the claims of the effects of WITNESS and INVARIANTS very
obvious. Look especially at the graph for "Sequential block read".

You might want to have a look at:
http://withagen.dyndns.org/FreeBSD/nfs-performance/index.html.
Note that no NFS data is included.
I have some Bonnie-NFS data, but need to write accompanying text and
conclusions for it.

Suggestions are more than welcome.

--WjW

From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 11:29:34 2004
Date: Tue, 24 Feb 2004 14:28:43 -0500 (EST)
From: Robert Watson
To: Willem Jan Withagen
cc: freebsd-performance@FreeBSD.ORG
cc: current@FreeBSD.ORG
Subject: Re: The effects of WITNESS and INVARIANTS
In-Reply-To: <079d01c3fb06$0e2299b0$471b3dd4@dual>

On Tue, 24 Feb 2004, Willem Jan Withagen wrote:

> Just out of curiosity, and 'cause I've got some idle boxes, I started
> to do some (NFS-)performance testing. There is still a long way to go,
> but I've put a first obvious result online, running on the local
> server disk.
>
> It makes the claims of the effects of WITNESS and INVARIANTS very
> obvious. Look especially at the graph for "Sequential block read".
>
> You might want to have a look at:
> http://withagen.dyndns.org/FreeBSD/nfs-performance/index.html. Note
> that no NFS data is included. I have some Bonnie-NFS data, but need to
> write accompanying text and conclusions for it.
>
> Suggestions are more than welcome.

My primary suggestion is "Turn off WITNESS and INVARIANTS when
benchmarking or for production systems". We turn them off in releases,
and once 5.x becomes 5-stable, we'll turn them off by default also.
However, they're invaluable tools when debugging the development
system, so we have them on in the development branch by default. I
would encourage people to generally run with them turned on unless
performance of a system requires them to be off, as it really helps the
debugging process, as well as helping to identify locking problems as
the system evolves.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research

From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 12:26:56 2004
Date: Tue, 24 Feb 2004 22:26:52 +0200
From: Nikos Ntarmos
To: Oliver Lehmann
cc: freebsd-performance@FreeBSD.org
cc: freebsd-alpha@FreeBSD.org
Subject: Re: Bad performance on alpha? (make buildworld)
Message-ID: <20040224202652.GA13675@diogenis.ceid.upatras.gr>
In-Reply-To: <20040223192103.59ad7b69.lehmann@ans-netz.de>

Hi there.

On Mon, Feb 23, 2004 at 07:21:03PM +0100, Oliver Lehmann wrote:
> I ran several buildworlds on both an Alpha EV56 and an Intel
> PII. I'm surprised that the Pentium beats the Alpha. Both systems
> have INVARIANTS* and WITNESS* disabled. Furthermore, no special
> tweaks - nearly "out of the box" configured.

IIRC the 600MHz EV56's performance wrt integer operations (such as
compiling) is somewhere in the vicinity of a 400MHz P-II, so the
difference you see in turn-around times when buildworld'ing isn't
quite that big. If the operations were identical, you should see
better times when building on the alpha. However, also take into
account that compiling (and optimizing) for a RISC CPU, apart from
generating larger binaries, is AFAIK supposedly more difficult than
compiling (and optimizing) for a CISC CPU. Let alone when compiling
using a compiler compiled with a compiler (nice, huh?) highly
optimized for x86 platforms but not as good when it comes to the AXP
arch.
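
The point that clock rate alone says little can be made concrete with
the usual CPU time relation, time = instructions * CPI / clock rate.
The sketch below is only an illustration under stated assumptions: the
instruction counts and CPI values are invented, not measured on an EV56
or a P-II, and cpu_time is a hypothetical helper.

```python
def cpu_time(instructions, cpi, clock_hz):
    """CPU seconds = instruction count * cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


# Hypothetical workload: every figure below is made up for illustration.
base_instructions = 1.0e12

# Machine A: lower clock, fewer instructions for the same job.
time_a = cpu_time(base_instructions, cpi=1.5, clock_hz=333e6)

# Machine B: higher clock, but a larger instruction stream for the same
# job and more stall cycles per instruction (assumed values).
time_b = cpu_time(base_instructions * 1.3, cpi=2.5, clock_hz=600e6)

print(f"A: {time_a:.0f}s  B: {time_b:.0f}s  B/A: {time_b / time_a:.2f}")
```

With these assumed inputs the machine with the higher clock still
finishes later, because both the instruction count and the cycles per
instruction are larger. Whether that applies to a given pair of real
CPUs has to be measured.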

--
Nikos "Noth" Ntarmos    | < ntarmos at ceid dot upatras dot gr >
NetCINS Lab. @ C.E.I.D. | [ http://{noth,p2p}.ceid.upatras.gr/ ]
U. of Patras - Greece   | ( 38.2594N, 21.7428E ) ( 1024D / CF95160A )

From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 12:55:04 2004
Date: Tue, 24 Feb 2004 21:54:57 +0100
From: Oliver Lehmann
To: ticso@cicely.de
cc: freebsd-performance@freebsd.org
cc: freebsd-alpha@freebsd.org
cc: wkb@freebie.xs4all.nl
Subject: Re: Bad performance on alpha? (make buildworld)
Message-Id: <20040224215457.34d3decd.lehmann@ans-netz.de>
In-Reply-To: <20040223205152.GC44313@cicely12.cicely.de>

Bernd Walter wrote:

> I think that's around factor 2-4 slower.
> Especially without having CPUTYPE >=ev56 set.

Ok, I now have CPUTYPE=ev56 set in make.conf and I got slightly better
results:

make buildworld        19144.53 real  16475.63 user  1964.88 sys
make -j2 buildworld    19530.91 real  16784.12 user  2143.32 sys
make -j3 buildworld    19639.32 real  16948.23 user  2173.32 sys
make -j4 buildworld    19892.68 real  17177.20 user  2202.12 sys

I installed the last buildworld and built a new kernel... let's see how
the new binaries (compiled with CPUTYPE=ev56) perform...

*rerunning-tests*

--
Oliver Lehmann
http://www.pofo.de/
http://wishlist.ans-netz.de/

From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 14:12:02 2004
Date: Tue, 24 Feb 2004 23:10:47 +0100
From: "Willem Jan Withagen"
To: "Robert Watson"
cc: freebsd-performance@FreeBSD.ORG
cc: current@FreeBSD.ORG
Subject: Re: The effects of WITNESS and INVARIANTS
Message-ID: <084901c3fb23$0ec737f0$471b3dd4@dual>

From: "Robert Watson"
> On Tue, 24 Feb 2004, Willem Jan Withagen wrote:
>
> > Just out of curiosity, and 'cause I've got some idle boxes, I started
> > to do some (NFS-)performance testing. There is still a long way to go,
> > but I've put a first obvious result online, running on the local
> > server disk.
> >
> > It makes the claims of the effects of WITNESS and INVARIANTS very
> > obvious. Look especially at the graph for "Sequential block read".
> >
> > You might want to have a look at:
> > http://withagen.dyndns.org/FreeBSD/nfs-performance/index.html. Note
> > that no NFS data is included. I have some Bonnie-NFS data, but need to
> > write accompanying text and conclusions for it.
> >
> > Suggestions are more than welcome.
>
> My primary suggestion is "Turn off WITNESS and INVARIANTS when
> benchmarking or for production systems". We turn them off in releases,
> and once 5.x becomes 5-stable, we'll turn them off by default also.
> However, they're invaluable tools when debugging the development
> system, so we have them on in the development branch by default. I
> would encourage people to generally run with them turned on unless
> performance of a system requires them to be off, as it really helps the
> debugging process, as well as helping to identify locking problems as
> the system evolves.

Hi Robert,

I understand your concern and will take your remark as made here, and
put it in the document as a "Nota Bene"... And I'll emphasise the fact
that 5-stable is the turning point for this matter.

--WjW

From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 15:39:33 2004
Date: Tue, 24 Feb 2004 18:38:40 -0500 (EST)
From: Robert Watson
To: Willem Jan Withagen
cc: freebsd-performance@FreeBSD.ORG
cc: current@FreeBSD.ORG
Subject: Re: The effects of WITNESS and INVARIANTS
In-Reply-To: <084901c3fb23$0ec737f0$471b3dd4@dual>

On Tue, 24 Feb 2004, Willem Jan Withagen wrote:

> I understand your concern and will take your remark as made here, and
> put it in the document as a "Nota Bene"... And I'll emphasise the fact
> that 5-stable is the turning point for this matter.

There was recently a thread on the general topic of preparing FreeBSD
for benchmarking, which you might want to take a look at:

    http://www.bsdforums.org/forums/showthread.php?threadid=18304

We have a lot of debugging features on, and there are a number of
general features that may adversely impact microbenchmarks. I thought
this was getting added to the FreeBSD handbook or developer's handbook
somewhere, but Google isn't finding it...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research

From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 15:56:31 2004
Date: Wed, 25 Feb 2004 00:55:17 +0100
From: "Willem Jan Withagen"
To: "Robert Watson"
cc: freebsd-performance@FreeBSD.ORG
cc: current@FreeBSD.ORG
Subject: Re: The effects of WITNESS and INVARIANTS
Message-ID: <08ca01c3fb31$a7bce690$471b3dd4@dual>

From: "Robert Watson"
> There was recently a thread on the general topic of preparing FreeBSD for
> benchmarking, which you might want to take a look at:
>
> http://www.bsdforums.org/forums/showthread.php?threadid=18304
>
> We have a lot of debugging features on, and there are a number of general
> features that may adversely impact microbenchmarks. I thought this was
> getting added to the FreeBSD handbook or developer's handbook somewhere,
> but Google isn't finding it...

I understand the concerns for testing.
I saw PHK's post too, and for the current data it was slightly too late
because most of it was already collected.

There are several good suggestions in there, which will require some
inventive rewriting of the scripts. Especially once I'd like to try
having 2 NFS clients work on one server, and they need to be started
"synchronously".

Some of the suggestions are not quite possible: run no daemons.... NFS
needs to be there, but in general services need to be reduced to a bare
minimum.

--WjW

From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 17:17:17 2004
Date: Tue, 24 Feb 2004 20:17:07 -0500
From: Charles Swiger
cc: freebsd-performance@FreeBSD.org
cc: freebsd-alpha@FreeBSD.org
Subject: Re: Bad performance on alpha? (make buildworld)
Message-Id: <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com>
In-Reply-To: <20040224202652.GA13675@diogenis.ceid.upatras.gr>

On Feb 24, 2004, at 3:26 PM, Nikos Ntarmos wrote:
> IIRC the 600MHz EV56's performance wrt integer operations (such as
> compiling) is somewhere in the vicinity of a 400MHz P-II, so the
> difference you see in turn-around times when buildworld'ing isn't
> quite that big. If the operations were identical, you should see
> better times when building on the alpha. However, also take into
> account that compiling (and optimizing) for a RISC CPU, apart from
> generating larger binaries, is AFAIK supposedly more difficult than
> compiling (and optimizing) for a CISC CPU.

I'm afraid you've got this backwards. :-)

The primary attributes of RISC architectures, namely lots of registers,
a relatively simple but orthogonal instruction set, and a relatively
fast clock rate / CPI ~= 1.0 / a short pipeline make it far easier for
the compiler to generate and optimize code.

CISC architectures make the compiler's job much harder because they
tend to require lots of register spills, they tend to have very long
pipelines which involve hazards and require a lot of instruction
reordering to avoid stalling the pipeline too often.
The number of CPU clocks it takes per instruction (CPI) often varies on
CISC and is generally much larger than ~1.0, and sometimes varies from
CPU model to CPU model, making it far more difficult to determine the
"fastest" instruction sequence.

At a broader level, RISC matches the intermediate code model compilers
work with more closely than CISC does, which makes it easier to
generate target code from the intermediate code. Often, on RISC there
is little difference between the two, just as there is less difference
between optimized and unoptimized code, whereas on CISC there is a lot
more work to be done in order to get optimized code working with
comparable efficiency.

--
-Chuck

From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 21:19:30 2004
Date: Wed, 25 Feb 2004 00:19:15 -0500
From: Chuck Swiger
cc: freebsd-performance@freebsd.org
cc: freebsd-alpha@freebsd.org
Subject: Re: Bad performance on alpha? (make buildworld)
Message-ID: <403C3053.5030204@mac.com>
In-Reply-To: <20040225025953.GH10121@gsmx07.alcatel.com.au>

Peter Jeremy wrote:
> On 2004-Feb-24 20:17:07 -0500, Charles Swiger wrote:
[ ...compiler optimizations... ]
>> I'm afraid you've got this backwards. :-)
>
> Maybe in theory, but not necessarily in practice.

It's been a few years since I've written a compiler, but my viewpoint
isn't based entirely on theory.

>> The primary attributes of RISC architectures, namely lots of registers,
>> a relatively simple but orthogonal instruction set, and a relatively
>> fast clock rate / CPI ~= 1.0 / a short pipeline make it far easier for
>> the compiler to generate and optimize code.
>
> Alpha pipelines are only short in a relative sense - the EV5 pipeline
> is 7 (integer) or 9 (FP) stages and I suspect the EV56 pipeline is the
> same. In theory, it is 4-way superscalar but the different execution
> units aren't equivalent and the compiler has to understand which
> instructions will be allocated to which execution units in order to
> minimise stalls.

A Northwood P4 has 20 stages (or 21, or 28, depending on how you want
to count instruction decode stages rather than just the integer
pipeline), the P3 has 12, and even the P2 has 8. Prescott has what, 32
stages?

Any superscalar processor architecture is going to be harder to compile
for than an architecture which is not, just as parallel execution with
limited execution units requires more work than a truly orthogonal
architecture.

>>CISC architectures make the compiler's job much harder because they
>>tend to require lots of register spills, they tend to have very long
>>pipelines which involve hazards and require a lot of instruction
>>reordering to avoid stalling the pipeline too often. The number of CPU
>>clocks it takes per instruction (CPI) often varies on CISC and is
>>generally much larger than ~1.0, and sometimes varies from CPU model to
>>CPU model, making it far more difficult to determine the "fastest"
>>instruction sequence.
>
> Recent iA32 implementations (basically anything more recent than a
> PII) are RISC cores which directly execute a subset of the iA32
> instruction set with the remainder handled by microcode. You get
> quite respectable results by treating it as a load/store RISC
> architecture and relying on the L1 cache to handle the register spills
> in a timely fashion. The pipelines and super-scalar execution
> abilities are all handled in hardware. Register scoreboarding allows
> the implementation to have more physical registers than the programmer
> view supports - allowing multiple instructions to simultaneously see
> different values in the same visible register.

Your technical description is accurate, but the points you are making
here seem to support my argument, rather than contradict what I said.
:-)

Basically, you've suggested that it's easier to compile for a recent
x86 than for a P2 because the hardware in a P4 goes to extravagant
lengths to dynamically optimize x86 instructions (CISC) into simpler
RISC instructions which can be scheduled, executed out-of-order, using
a ~120 register scoreboard to multiplex 8 visible registers amongst all
of the pipeline stages, etc, etc.

Why does the P4 actually execute RISC u-ops and microcode, rather than
implementing the x86 opcodes directly? Because it's far easier to
optimize RISC u-ops, whether in the context of a compiler or in the
context of the CPU hardware itself, than to try to optimize CISC
opcodes directly.
> The compiler has to expend a lot of effort on instruction scheduling
> to get decent performance out of a typical RISC architecture. Much of
> this is automatically handled by the hardware on an iA32 and you can
> get equivalent results with a much simpler compiler.

Is it easier or harder to optimize generic x86 code for the P2 or for a P4,
and why? If you don't optimize intermediate code at all, which performs
better? Why?

Depending on your L1 cache to reduce the costs of spilling registers all
over the place because you've only got about 6 or so freely available is a
mediocre band-aid compared to having 32 or so registers: unoptimized
PowerPC, MIPS, PA-RISC, or SPARC code does a heck of a lot better than
unoptimized x86 code, and that's without considering something like the
SPARC register windows, which do a heck of a job of passing context between
caller and callee and freeing up 8 new registers for temp use at each
function invocation (and without the mind-numbing complexity of HP's
PA-RISC calling conventions).
-- -Chuck From owner-freebsd-performance@FreeBSD.ORG Mon Feb 23 11:16:55 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2ADCF16A4CE; Mon, 23 Feb 2004 11:16:55 -0800 (PST) Received: from smtp-out4.xs4all.nl (smtp-out4.xs4all.nl [194.109.24.5]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7C07143D1D; Mon, 23 Feb 2004 11:16:54 -0800 (PST) (envelope-from wkb@freebie.xs4all.nl) Received: from freebie.xs4all.nl (freebie.xs4all.nl [213.84.32.253]) by smtp-out4.xs4all.nl (8.12.10/8.12.10) with ESMTP id i1NJGqOh035405; Mon, 23 Feb 2004 20:16:52 +0100 (CET) Received: from freebie.xs4all.nl (localhost [127.0.0.1]) by freebie.xs4all.nl (8.12.10/8.12.9) with ESMTP id i1NJGq6R005898; Mon, 23 Feb 2004 20:16:52 +0100 (CET) (envelope-from wkb@freebie.xs4all.nl) Received: (from wkb@localhost) by freebie.xs4all.nl (8.12.10/8.12.9/Submit) id i1NJGq9l005897; Mon, 23 Feb 2004 20:16:52 +0100 (CET) (envelope-from wkb) Date: Mon, 23 Feb 2004 20:16:52 +0100 From: Wilko Bulte To: Oliver Lehmann Message-ID: <20040223191652.GB5837@freebie.xs4all.nl> References: <20040223192103.59ad7b69.lehmann@ans-netz.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040223192103.59ad7b69.lehmann@ans-netz.de> User-Agent: Mutt/1.4.1i X-OS: FreeBSD 4.9-STABLE X-PGP: finger wilko@freebsd.org X-Mailman-Approved-At: Wed, 25 Feb 2004 00:34:29 -0800 cc: freebsd-performance@freebsd.org cc: freebsd-alpha@freebsd.org Subject: Re: Bad performance on alpha? 
(make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Feb 2004 19:16:55 -0000 On Mon, Feb 23, 2004 at 07:21:03PM +0100, Oliver Lehmann wrote: > Hi, > > I run several buildworld's on both, an Alpha EV56 and a Intel PII. I'm > wondering that the Pentium beats the Alpha. Both systems have INVARIANTS* > and WITNESS* disabled. Furthermore, no special tweakups - nearly "out of > the box" configured. ... > And I don't think CURRENT is that "faster" than 5.2.1. Why does (my) alpha > performs so bad? Some ideas? Is that behaviour common for FreeBSD/alpha or > did I missed something in my consideration? Bit of a FAQ, but the last time I checked it was due to the code generation for Alpha being lots slower. -- Wilko Bulte wilko@FreeBSD.org From owner-freebsd-performance@FreeBSD.ORG Mon Feb 23 12:52:09 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1782F16A4CE; Mon, 23 Feb 2004 12:52:09 -0800 (PST) Received: from srv1.cosmo-project.de (srv1.cosmo-project.de [213.83.6.106]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6A71E43D2F; Mon, 23 Feb 2004 12:52:08 -0800 (PST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de (cicely5.cicely.de [IPv6:3ffe:400:8d0:301:200:92ff:fe9b:20e7]) (authenticated bits=0) i1NKq2rQ063978 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Mon, 23 Feb 2004 21:52:05 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [IPv6:3ffe:400:8d0:301::12]) by cicely5.cicely.de (8.12.10/8.12.10) with ESMTP id i1NKptuL023844 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 23 Feb 2004 21:51:55 +0100 (CET) (envelope-from 
ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.12.10/8.12.10) with ESMTP id i1NKpsc5088521; Mon, 23 Feb 2004 21:51:54 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: (from ticso@localhost) by cicely12.cicely.de (8.12.10/8.12.10/Submit) id i1NKprfZ088520; Mon, 23 Feb 2004 21:51:53 +0100 (CET) (envelope-from ticso) Date: Mon, 23 Feb 2004 21:51:53 +0100 From: Bernd Walter To: Wilko Bulte Message-ID: <20040223205152.GC44313@cicely12.cicely.de> References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040223191652.GB5837@freebie.xs4all.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040223191652.GB5837@freebie.xs4all.nl> X-Operating-System: FreeBSD cicely12.cicely.de 5.2-CURRENT alpha User-Agent: Mutt/1.5.4i X-Spam-Status: No, hits=-4.9 required=3.0 tests=BAYES_00 autolearn=ham version=2.61 X-Spam-Report: * -4.9 BAYES_00 BODY: Bayesian spam probability is 0 to 1% * [score: 0.0000] X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on cicely5.cicely.de X-Mailman-Approved-At: Wed, 25 Feb 2004 00:34:29 -0800 cc: freebsd-performance@freebsd.org cc: Oliver Lehmann cc: freebsd-alpha@freebsd.org Subject: Re: Bad performance on alpha? (make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: ticso@cicely.de List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Feb 2004 20:52:09 -0000 On Mon, Feb 23, 2004 at 08:16:52PM +0100, Wilko Bulte wrote: > On Mon, Feb 23, 2004 at 07:21:03PM +0100, Oliver Lehmann wrote: > > Hi, > > > > I run several buildworld's on both, an Alpha EV56 and a Intel PII. I'm > > wondering that the Pentium beats the Alpha. Both systems have INVARIANTS* > > and WITNESS* disabled. Furthermore, no special tweakups - nearly "out of > > the box" configured. > > ... 
> > > And I don't think CURRENT is that "faster" than 5.2.1. Why does (my) alpha > > performs so bad? Some ideas? Is that behaviour common for FreeBSD/alpha or > > did I missed something in my consideration? > > Bit of a FAQ, but the last time I checked it was due to the code generation > for Alpha being lots slower. I think thats around factor 2-4 slower. Especially without having CPUTYPE >=ev56 set. In fact te factor varies much around machine types, compiled code, compiler version and moon phase. Also I think the values are not that bad considered that the ev56 core is years older than that of the PII. -- B.Walter BWCT http://www.bwct.de ticso@bwct.de info@bwct.de From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 13:38:34 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8075916A4CE; Tue, 24 Feb 2004 13:38:34 -0800 (PST) Received: from srv1.cosmo-project.de (srv1.cosmo-project.de [213.83.6.106]) by mx1.FreeBSD.org (Postfix) with ESMTP id CEC0043D1F; Tue, 24 Feb 2004 13:38:33 -0800 (PST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de (cicely5.cicely.de [IPv6:3ffe:400:8d0:301:200:92ff:fe9b:20e7]) (authenticated bits=0) i1OLcQrQ083238 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Tue, 24 Feb 2004 22:38:29 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [IPv6:3ffe:400:8d0:301::12]) by cicely5.cicely.de (8.12.10/8.12.10) with ESMTP id i1OLcHuL038918 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 24 Feb 2004 22:38:18 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.12.10/8.12.10) with ESMTP id i1OLcHc5093543; Tue, 24 Feb 2004 22:38:17 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: (from ticso@localhost) by 
cicely12.cicely.de (8.12.10/8.12.10/Submit) id i1OLcG5u093542; Tue, 24 Feb 2004 22:38:16 +0100 (CET) (envelope-from ticso) Date: Tue, 24 Feb 2004 22:38:16 +0100 From: Bernd Walter To: Oliver Lehmann Message-ID: <20040224213815.GB44313@cicely12.cicely.de> References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040223191652.GB5837@freebie.xs4all.nl> <20040223205152.GC44313@cicely12.cicely.de> <20040224215457.34d3decd.lehmann@ans-netz.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040224215457.34d3decd.lehmann@ans-netz.de> X-Operating-System: FreeBSD cicely12.cicely.de 5.2-CURRENT alpha User-Agent: Mutt/1.5.4i X-Spam-Status: No, hits=-4.9 required=3.0 tests=BAYES_00 autolearn=ham version=2.61 X-Spam-Report: * -4.9 BAYES_00 BODY: Bayesian spam probability is 0 to 1% * [score: 0.0000] X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on cicely5.cicely.de X-Mailman-Approved-At: Wed, 25 Feb 2004 00:34:29 -0800 cc: freebsd-performance@freebsd.org cc: ticso@cicely.de cc: freebsd-alpha@freebsd.org cc: wkb@freebie.xs4all.nl Subject: Re: Bad performance on alpha? (make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: ticso@cicely.de List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Feb 2004 21:38:34 -0000 On Tue, Feb 24, 2004 at 09:54:57PM +0100, Oliver Lehmann wrote: > Bernd Walter wrote: > > > I think thats around factor 2-4 slower. > > Especially without having CPUTYPE >=ev56 set. 
> > Ok, I have CPUTYPE=ev56 now set in make.conf and I got slightly better > results: > > make buildworld 19144.53 real 16475.63 user 1964.88 sys > make -j2 buildworld 19530.91 real 16784.12 user 2143.32 sys > make -j3 buildworld 19639.32 real 16948.23 user 2173.32 sys > make -j4 buildworld 19892.68 real 17177.20 user 2202.12 sys > > I installed the last buildworld and built a new kernel... let's see how > those new (with CPUTYPE=56 compiled) binaries are performing... > *reruning-tests* The binaries are only used to build the toolchain. The remaining part of the buildworld aren't using the system binaries. -- B.Walter BWCT http://www.bwct.de ticso@bwct.de info@bwct.de From owner-freebsd-performance@FreeBSD.ORG Tue Feb 24 19:00:02 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8496C16A4CE; Tue, 24 Feb 2004 19:00:02 -0800 (PST) Received: from alcanet.com.au (mail2.alcanet.com.au [203.62.196.17]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7B39A43D1F; Tue, 24 Feb 2004 18:59:59 -0800 (PST) (envelope-from peter.jeremy@alcatel.com.au) Received: from sydsmtp02.alcatel.com.au (IDENT:root@localhost.localdomain [127.0.0.1])i1P2xsMw024438; Wed, 25 Feb 2004 13:59:55 +1100 Received: from gsmx07.alcatel.com.au ([139.188.20.247]) by sydsmtp02.alcatel.com.au (Lotus Domino Release 5.0.12) with ESMTP id 2004022513595362:183578 ; Wed, 25 Feb 2004 13:59:53 +1100 Received: from gsmx07.alcatel.com.au (localhost [127.0.0.1]) i1P2xrHQ061301; Wed, 25 Feb 2004 13:59:53 +1100 (EST) (envelope-from peter.jeremy@alcatel.com.au) Received: (from jeremyp@localhost) by gsmx07.alcatel.com.au (8.12.9p2/8.12.9/Submit) id i1P2xr0s061300; Wed, 25 Feb 2004 13:59:53 +1100 (EST) (envelope-from peter.jeremy@alcatel.com.au) Date: Wed, 25 Feb 2004 13:59:53 +1100 From: Peter Jeremy To: Charles Swiger Message-ID: <20040225025953.GH10121@gsmx07.alcatel.com.au> Mail-Followup-To: Charles 
Swiger , freebsd-performance@freebsd.org, freebsd-alpha@freebsd.org References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040224202652.GA13675@diogenis.ceid.upatras.gr> <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com> Mime-Version: 1.0 In-Reply-To: <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com> User-Agent: Mutt/1.4.2i X-MIMETrack: Itemize by SMTP Server on SYDSMTP02/AlcatelAustralia(Release 5.0.12 |February 13, 2003) at 25/02/2004 01:59:53 PM,|February 13, 2003) at 25/02/2004 01:59:55 PM, Serialize complete at 25/02/2004 01:59:55 PM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Mailman-Approved-At: Wed, 25 Feb 2004 00:34:29 -0800 cc: freebsd-performance@freebsd.org cc: freebsd-alpha@freebsd.org Subject: Re: Bad performance on alpha? (make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 03:00:02 -0000 On 2004-Feb-24 20:17:07 -0500, Charles Swiger wrote: >On Feb 24, 2004, at 3:26 PM, Nikos Ntarmos wrote: >>IIRC the 600MHz EV56's performance wrt integer operations (such as >>compiling) is somewhere in the vicinity of a 400MHz P-II, so the >>difference you see in turn-around times when buildworld'ing isn't >>quite that big. If the operations were identical, you should see >>better times when building on the alpha. However, also take into >>account that compiling (and optimizing) for a RISC CPU, apart from >>generating larger binaries, is AFAIK supposedly more difficult than >>compiling (and optimizing) for a CISC CPU. > >I'm afraid you've got this backwards. :-) Maybe in theory, but not necessarily in practice. 
>The primary attributes of RISC architectures, namely lots of registers,
>a relatively simple but orthogonal instruction set, and a relatively
>fast clock rate / CPI ~= 1.0 / a short pipeline make it far easier for
>the compiler to generate and optimize code.

Alpha pipelines are only short in a relative sense - the EV5 pipeline
is 7 (integer) or 9 (FP) stages and I suspect the EV56 pipeline is the
same. In theory, it is 4-way superscalar but the different execution
units aren't equivalent and the compiler has to understand which
instructions will be allocated to which execution units in order to
minimise stalls.

>CISC architectures make the compiler's job much harder because they tend
>to require lots of register spills, they tend to have very long
>pipelines which involve hazards and require a lot of instruction
>reordering to avoid stalling the pipeline too often. The number of CPU
>clocks it takes per instruction (CPI) often varies on CISC and is
>generally much larger than ~1.0, and sometimes varies from CPU model to
>CPU model, making it far more difficult to determine the "fastest"
>instruction sequence.

Recent iA32 implementations (basically anything more recent than a
PII) are RISC cores which directly execute a subset of the iA32
instruction set with the remainder handled by microcode. You get
quite respectable results by treating it as a load/store RISC
architecture and relying on the L1 cache to handle the register spills
in a timely fashion. The pipelines and super-scalar execution
abilities are all handled in hardware. Register scoreboarding allows
the implementation to have more physical registers than the programmer
view supports - allowing multiple instructions to simultaneously see
different values in the same visible register.

The compiler has to expend a lot of effort on instruction scheduling
to get decent performance out of a typical RISC architecture.
Much of this is automatically handled by the hardware on an iA32 and you can get equivalent results with a much simpler compiler. Peter From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 01:25:53 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 30A8616A4CE; Wed, 25 Feb 2004 01:25:53 -0800 (PST) Received: from rms04.rommon.net (rms04.rommon.net [212.54.2.140]) by mx1.FreeBSD.org (Postfix) with ESMTP id 20CF743D2D; Wed, 25 Feb 2004 01:25:52 -0800 (PST) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (i2-149.rommon.fi [195.163.185.149]) by rms04.rommon.net (8.12.9p1/8.12.9) with ESMTP id i1P9PccM023243; Wed, 25 Feb 2004 11:25:38 +0200 (EET) (envelope-from pete@he.iki.fi) Message-ID: <403C6A24.80804@he.iki.fi> Date: Wed, 25 Feb 2004 11:25:56 +0200 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Peter Jeremy References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040224202652.GA13675@diogenis.ceid.upatras.gr> <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com> <20040225025953.GH10121@gsmx07.alcatel.com.au> In-Reply-To: <20040225025953.GH10121@gsmx07.alcatel.com.au> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-performance@freebsd.org cc: Charles Swiger cc: freebsd-alpha@freebsd.org Subject: Re: Bad performance on alpha? 
(make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 09:25:53 -0000

Peter Jeremy wrote:

>Recent iA32 implementations (basically anything more recent than a
>PII) are RISC cores which directly execute a subset of the iA32
>instruction set with the remainder handled by microcode. You get
>quite respectable results by treating it as a load/store RISC
>architecture and relying on the L1 cache to handle the register spills

This probably invites a question from people like me, who are interested
in getting the maximum performance out of whatever hardware our code runs
on (maybe with the exception of the low-MHz embedded stuff :-): are there
any good tutorials or books on what kinds of things to avoid when looking
for optimal performance? Our tightest loops mostly do counter rolling,
comparisons and pattern matching, and we have had good mileage getting
performance gains by minimizing writes to memory when there are other
options, like arithmetic on the fly.

One specific question that also comes to mind: with more modern,
SSE-enabled code, is there a benefit to exercising floating point in
balance with 64-bit long long integers, or does that gain performance only
if the code is compiled without SSE?
Pete From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 09:56:43 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5314C16A4CE; Wed, 25 Feb 2004 09:56:43 -0800 (PST) Received: from kartoffel.salatschuessel.net (pD9520027.dip.t-dialin.net [217.82.0.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7199F43D1D; Wed, 25 Feb 2004 09:56:42 -0800 (PST) (envelope-from lehmann@ans-netz.de) Received: from kartoffel.salatschuessel.net (localhost [127.0.0.1]) i1PHuW9V052168; Wed, 25 Feb 2004 18:56:32 +0100 (CET) (envelope-from lehmann@ans-netz.de) Date: Wed, 25 Feb 2004 18:56:31 +0100 From: Oliver Lehmann To: ticso@cicely.de Message-Id: <20040225185631.4cc91fec.lehmann@ans-netz.de> In-Reply-To: <20040224213815.GB44313@cicely12.cicely.de> References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040223191652.GB5837@freebie.xs4all.nl> <20040223205152.GC44313@cicely12.cicely.de> <20040224215457.34d3decd.lehmann@ans-netz.de> <20040224213815.GB44313@cicely12.cicely.de> X-Mailer: Sylpheed version 0.9.9 (GTK+ 1.2.10; i386-portbld-freebsd4.9) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit cc: freebsd-performance@freebsd.org cc: freebsd-alpha@freebsd.org Subject: Re: Bad performance on alpha? (make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 17:56:43 -0000 Bernd Walter wrote: > On Tue, Feb 24, 2004 at 09:54:57PM +0100, Oliver Lehmann wrote: > > Bernd Walter wrote: > > > > > I think thats around factor 2-4 slower. > > > Especially without having CPUTYPE >=ev56 set. 
> > > > Ok, I have CPUTYPE=ev56 now set in make.conf and I got slightly better > > results: > > > > make buildworld 19144.53 real 16475.63 user 1964.88 sys > > make -j2 buildworld 19530.91 real 16784.12 user 2143.32 sys > > make -j3 buildworld 19639.32 real 16948.23 user 2173.32 sys > > make -j4 buildworld 19892.68 real 17177.20 user 2202.12 sys > > > > I installed the last buildworld and built a new kernel... let's see > > how those new (with CPUTYPE=56 compiled) binaries are performing... > > *reruning-tests* > > The binaries are only used to build the toolchain. > The remaining part of the buildworld aren't using the system binaries. Hm... actually it performs a _bit_ better with the new world+kernel ;) make buildworld 18749.83 real 16143.33 user 1981.61 sys make buildworld 18758.40 real 16191.54 user 1999.10 sys make -j2 buildworld 19297.45 real 16570.93 user 2127.90 sys make -j2 buildworld Next I'll try some CFLAGs... -- Oliver Lehmann http://www.pofo.de/ http://wishlist.ans-netz.de/ From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 11:14:28 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A257F16A4CE for ; Wed, 25 Feb 2004 11:14:28 -0800 (PST) Received: from dragon.nuxi.com (trang.nuxi.com [66.93.134.19]) by mx1.FreeBSD.org (Postfix) with ESMTP id 17F1443D2D for ; Wed, 25 Feb 2004 11:14:28 -0800 (PST) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.10/8.12.10) with ESMTP id i1PJEROJ099217; Wed, 25 Feb 2004 11:14:27 -0800 (PST) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.11/8.12.11/Submit) id i1PJEMoB098285; Wed, 25 Feb 2004 11:14:22 -0800 (PST) (envelope-from obrien) Date: Wed, 25 Feb 2004 11:14:22 -0800 From: "David O'Brien" To: Willem Jan Withagen Message-ID: <20040225191422.GK7567@dragon.nuxi.com> 
References: <079d01c3fb06$0e2299b0$471b3dd4@dual> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <079d01c3fb06$0e2299b0$471b3dd4@dual> User-Agent: Mutt/1.4.1i X-Operating-System: FreeBSD 5.2-CURRENT Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 cc: freebsd-performance@FreeBSD.ORG Subject: Re: The effects of WITNESS and INVARIANTS X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: obrien@FreeBSD.ORG List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 19:14:28 -0000 On Tue, Feb 24, 2004 at 07:43:11PM +0100, Willem Jan Withagen wrote: > Just out of curriousity, and 'cause I've got some idle boxes, I started to do > some (NFS-)performance testing. Please don't cross-post. Otherwise why have the freebsd-performance mailing list in addition to the freebsd-current one. 
From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 11:30:56 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CCB0F16A4CE; Wed, 25 Feb 2004 11:30:56 -0800 (PST) Received: from dragon.nuxi.com (trang.nuxi.com [66.93.134.19]) by mx1.FreeBSD.org (Postfix) with ESMTP id 735E943D1D; Wed, 25 Feb 2004 11:30:56 -0800 (PST) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.10/8.12.10) with ESMTP id i1PJUtOJ067985; Wed, 25 Feb 2004 11:30:55 -0800 (PST) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.11/8.12.11/Submit) id i1PJUsrK067984; Wed, 25 Feb 2004 11:30:54 -0800 (PST) (envelope-from obrien) Date: Wed, 25 Feb 2004 11:30:53 -0800 From: "David O'Brien" To: Chuck Swiger Message-ID: <20040225193053.GL7567@dragon.nuxi.com> References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040224202652.GA13675@diogenis.ceid.upatras.gr> <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com> <20040225025953.GH10121@gsmx07.alcatel.com.au> <403C3053.5030204@mac.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <403C3053.5030204@mac.com> User-Agent: Mutt/1.4.1i X-Operating-System: FreeBSD 5.2-CURRENT Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 cc: freebsd-performance@freebsd.org cc: freebsd-alpha@freebsd.org Subject: Re: Bad performance on alpha? 
(make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: freebsd-performance@freebsd.org, freebsd-alpha@freebsd.org List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 19:30:57 -0000 On Wed, Feb 25, 2004 at 12:19:15AM -0500, Chuck Swiger wrote: > >Maybe in theory, but not necessarily in practice. > > It's been a few years since I'd written a compiler, but my viewpoint isn't > based entirely on theory. ... > Your technical description is accurate, but the points you are making here > seem to support my argument, rather than contradict what I said. :-) You're assuming you're writing a compiler targeting _1_ specific architecture. It doesn't matter what is possible, what matters is what GCC does. Please go analysis GCC and report the deficiencies. I personally would love to know what they are, and how to make GCC do better on non-x86 platforms. -- -- David (obrien@FreeBSD.org) From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 11:36:03 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5FA6A16A4CE; Wed, 25 Feb 2004 11:36:03 -0800 (PST) Received: from dragon.nuxi.com (trang.nuxi.com [66.93.134.19]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3B5C643D2F; Wed, 25 Feb 2004 11:36:03 -0800 (PST) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.10/8.12.10) with ESMTP id i1PJZqOJ068061; Wed, 25 Feb 2004 11:35:52 -0800 (PST) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.11/8.12.11/Submit) id i1PJZp0S068060; Wed, 25 Feb 2004 11:35:51 -0800 (PST) (envelope-from obrien) Date: Wed, 25 Feb 2004 11:35:50 -0800 From: "David O'Brien" To: Oliver Lehmann Message-ID: 
<20040225193550.GM7567@dragon.nuxi.com> References: <20040223192103.59ad7b69.lehmann@ans-netz.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040223192103.59ad7b69.lehmann@ans-netz.de> User-Agent: Mutt/1.4.1i X-Operating-System: FreeBSD 5.2-CURRENT Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 cc: freebsd-performance@freebsd.org cc: freebsd-alpha@freebsd.org Subject: Re: Bad performance on alpha? (make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: freebsd-performance@freebsd.org, freebsd-alpha@freebsd.org List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 19:36:03 -0000

On Mon, Feb 23, 2004 at 07:21:03PM +0100, Oliver Lehmann wrote:
> I run several buildworld's on both, an Alpha EV56 and a Intel PII. I'm
> wondering that the Pentium beats the Alpha. Both systems have INVARIANTS*
> and WITNESS* disabled. Furthermore, no special tweakups - nearly "out of
> the box" configured.

Build the same target arch on both machines. 'buildworld' times across
platforms are never comparable.

    alpha-ev56:/usr/src# time make TARGET_ARCH=sparc64 buildworld
    p-ii:/usr/src# time make TARGET_ARCH=sparc64 buildworld

(pick a target arch other than alpha or i386: ia64, sparc64, powerpc,
amd64 to make sure neither platform has an advantage in the buildworld
steps)

Even this isn't a fair comparison, as you've got to build the cross tools
and that will be way different on the two platforms. So what you really
want to do is a full cross-build, and then rebuild only stage4 and time
that. See /usr/src/Makefile.inc1 to figure out the internal targets.
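Once both machines have timed the same stage of a cross-build, the two time(1) result lines can be compared mechanically. What follows is only a minimal Python sketch and not part of the original message: the names parse_time and compare are made up for illustration, and the two sample lines are Alpha results reported elsewhere in this thread, used purely to show the input format.

```python
import re

# Matches the "<real> real <user> user <sys> sys" triple that time(1) prints.
TIME_RE = re.compile(r"([\d.]+)\s+real\s+([\d.]+)\s+user\s+([\d.]+)\s+sys")


def parse_time(line):
    """Return (real, user, sys) in seconds from one line of time(1) output."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError("no time(1) triple found in %r" % line)
    return tuple(float(field) for field in match.groups())


def compare(line_a, line_b):
    """Relate run A to run B; ratios below 1.0 mean A was faster."""
    real_a, user_a, sys_a = parse_time(line_a)
    real_b, user_b, sys_b = parse_time(line_b)
    return {
        "real_ratio": real_a / real_b,
        "cpu_ratio": (user_a + sys_a) / (user_b + sys_b),
        # Fraction of wall-clock time spent on the CPU; a low value
        # suggests the run spent time waiting, for example on I/O.
        "busy_a": (user_a + sys_a) / real_a,
        "busy_b": (user_b + sys_b) / real_b,
    }


# Sample input (hypothetical usage): two Alpha runs quoted in this thread.
new_world = "make buildworld 18749.83 real 16143.33 user 1981.61 sys"
cputype_only = "make buildworld 19144.53 real 16475.63 user 1964.88 sys"

for name, value in sorted(compare(new_world, cputype_only).items()):
    print("%-10s %.3f" % (name, value))
```

Comparing CPU time (user + sys) alongside wall-clock time helps separate compiler speed from disk effects, which matters here because the two machines have different disks.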
From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 11:43:29 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2016316A4CE; Wed, 25 Feb 2004 11:43:29 -0800 (PST) Received: from alcanet.com.au (mail2.alcanet.com.au [203.62.196.17]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5D21343D31; Wed, 25 Feb 2004 11:43:28 -0800 (PST) (envelope-from peter.jeremy@alcatel.com.au) Received: from sydsmtp02.alcatel.com.au (IDENT:root@localhost.localdomain [127.0.0.1])i1PJhQMw022352; Thu, 26 Feb 2004 06:43:26 +1100 Received: from gsmx07.alcatel.com.au ([139.188.20.247]) by sydsmtp02.alcatel.com.au (Lotus Domino Release 5.0.12) with ESMTP id 2004022606432580:187900 ; Thu, 26 Feb 2004 06:43:25 +1100 Received: from gsmx07.alcatel.com.au (localhost [127.0.0.1]) i1PJhPHQ064718; Thu, 26 Feb 2004 06:43:25 +1100 (EST) (envelope-from peter.jeremy@alcatel.com.au) Received: (from jeremyp@localhost) by gsmx07.alcatel.com.au (8.12.9p2/8.12.9/Submit) id i1PJhOPq064717; Thu, 26 Feb 2004 06:43:24 +1100 (EST) (envelope-from peter.jeremy@alcatel.com.au) Date: Thu, 26 Feb 2004 06:43:24 +1100 From: Peter Jeremy To: Petri Helenius Message-ID: <20040225194324.GI10121@gsmx07.alcatel.com.au> Mail-Followup-To: Petri Helenius , Charles Swiger , freebsd-performance@freebsd.org, freebsd-alpha@freebsd.org References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040224202652.GA13675@diogenis.ceid.upatras.gr> <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com> <20040225025953.GH10121@gsmx07.alcatel.com.au> <403C6A24.80804@he.iki.fi> Mime-Version: 1.0 In-Reply-To: <403C6A24.80804@he.iki.fi> User-Agent: Mutt/1.4.2i X-MIMETrack: Itemize by SMTP Server on SYDSMTP02/AlcatelAustralia(Release 5.0.12 |February 13, 2003) at 26/02/2004 06:43:25 AM,|February 13, 2003) at 26/02/2004 06:43:26 AM, Serialize complete at 26/02/2004 06:43:26 AM Content-Type: text/plain; charset=us-ascii Content-Disposition: 
inline cc: freebsd-performance@freebsd.org cc: Charles Swiger cc: freebsd-alpha@freebsd.org Subject: Re: Bad performance on alpha? (make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 19:43:29 -0000

On 2004-Feb-25 11:25:56 +0200, Petri Helenius wrote:
>This probably invites the question, what, if anything people like me who
>are interested in getting the maximum performance out of any hardware
>our things run on (maybe with the exception of the low-MHz embedded
>stuff :-), is there any good tutorials/books on the subject what kind of
>things to avoid when looking for optimal performance. The tightest loops
>mostly do counter rolling, comparisons and pattern matching and we have
>good mileage on getting performance gains by minimizing writing to
>memory when there are other options like arithmetic on the fly.

Keep in mind several over-riding rules:
1) Make sure the code is correct before worrying about performance
2) Measure the performance and only worry about the slow bits
3) A better algorithm will virtually always give the biggest
   performance gain

I can't suggest any general books off-hand (I'm sure someone else in
-performance will know). You will need the data sheet or programmer's
manual for the specific CPU you are aiming for, as well as the relevant
architecture manual (Intel publishes a 3-volume iA32 architecture manual
that you can download from the web; the Alpha AXP architecture manual is
also available online from the HP website). The AXP manual includes two
chapters describing general techniques for AXP coding.
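The three rules above can be illustrated with a small, deliberately artificial Python sketch. It is not from the original message: the function names and the data sizes are invented for illustration. It checks that two implementations agree (rule 1), times both (rule 2), and shows the kind of change rule 3 has in mind, replacing a repeated linear scan with a set lookup.

```python
import timeit


def count_hits_list(needles, haystack):
    """Linear scan per lookup: O(len(needles) * len(haystack))."""
    return sum(1 for n in needles if n in haystack)


def count_hits_set(needles, haystack):
    """Build a set once, then constant-time lookups on average."""
    lookup = set(haystack)
    return sum(1 for n in needles if n in lookup)


# Hypothetical workload: look up 1000 integers among 10000 even numbers.
haystack = list(range(0, 20000, 2))
needles = list(range(1000))

# Rule 1: confirm both versions give the same answer before timing them.
assert count_hits_list(needles, haystack) == count_hits_set(needles, haystack)

# Rule 2: measure instead of guessing which version is slow.
for fn in (count_hits_list, count_hits_set):
    seconds = timeit.timeit(lambda: fn(needles, haystack), number=3)
    print("%-16s %.4f s for 3 runs" % (fn.__name__, seconds))
```

The absolute timings depend on the machine, so none are quoted here; the point is only that the measurement, not intuition, decides where effort goes.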
The individual CPU datasheets describe the number and capabilities of execution units and how the instruction scheduling works, as well as a matrix of instruction timings (how many clocks you need to leave between a producer and a consumer instruction to avoid a bubble). These numbers and definitions need to be mapped into the scheduling tables for your compiler. Keep in mind that both the iA32 and AXP CPUs have embedded performance counters. These will be very useful to monitor low-level details like pipeline stalls, branch mis-predictions, cache misses etc. -- Peter Jeremy From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 12:33:46 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5849D16A4CE for ; Wed, 25 Feb 2004 12:33:46 -0800 (PST) Received: from rms04.rommon.net (rms04.rommon.net [212.54.2.140]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4FC9443D2D for ; Wed, 25 Feb 2004 12:33:45 -0800 (PST) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h81.vuokselantie10.fi [193.64.42.129]) by rms04.rommon.net (8.12.9p1/8.12.9) with ESMTP id i1PKXXcM025188; Wed, 25 Feb 2004 22:33:33 +0200 (EET) (envelope-from pete@he.iki.fi) Message-ID: <403D06AE.8070903@he.iki.fi> Date: Wed, 25 Feb 2004 22:33:50 +0200 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Peter Jeremy References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040224202652.GA13675@diogenis.ceid.upatras.gr> <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com> <20040225025953.GH10121@gsmx07.alcatel.com.au> <403C6A24.80804@he.iki.fi> <20040225194324.GI10121@gsmx07.alcatel.com.au> In-Reply-To: <20040225194324.GI10121@gsmx07.alcatel.com.au> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-performance@freebsd.org cc: Charles Swiger 
Subject: Re: Bad performance on alpha? (make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 20:33:46 -0000 Talking about different instruction sets and compiler scheduling options: would it be considered a good idea to introduce a sysctl which would contain the maximum mcpu= value for the currently running architecture? This way one could provide multiple executables and a startup script, in the fashion of: prog.i386 prog.pentium2 prog.pentium3 prog.pentium4 prog.athlon-mp etc... This would ease utilization of the various options and probably exercise the CPUs to a better extent. Pete From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 12:36:01 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F3F4216A4CE; Wed, 25 Feb 2004 12:36:00 -0800 (PST) Received: from out004.verizon.net (out004pub.verizon.net [206.46.170.142]) by mx1.FreeBSD.org (Postfix) with ESMTP id A83D443D1D; Wed, 25 Feb 2004 12:36:00 -0800 (PST) (envelope-from cswiger@mac.com) Received: from mac.com ([68.161.120.219]) by out004.verizon.net (InterMail vM.5.01.06.06 201-253-122-130-106-20030910) with ESMTP id <20040225203559.RHNB8186.out004.verizon.net@mac.com>; Wed, 25 Feb 2004 14:35:59 -0600 Message-ID: <403D072C.7090207@mac.com> Date: Wed, 25 Feb 2004 15:35:56 -0500 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-performance@freebsd.org, freebsd-alpha@freebsd.org References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040224202652.GA13675@diogenis.ceid.upatras.gr> <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com> 
<20040225025953.GH10121@gsmx07.alcatel.com.au> <403C3053.5030204@mac.com> <20040225193053.GL7567@dragon.nuxi.com> In-Reply-To: <20040225193053.GL7567@dragon.nuxi.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at out004.verizon.net from [68.161.120.219] at Wed, 25 Feb 2004 14:35:59 -0600 cc: obrien@FreeBSD.org Subject: Re: Bad performance on alpha? (make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 20:36:01 -0000 David O'Brien wrote: > On Wed, Feb 25, 2004 at 12:19:15AM -0500, Chuck Swiger wrote: >>>Maybe in theory, but not necessarily in practice. >> >>It's been a few years since I'd written a compiler, but my viewpoint isn't >>based entirely on theory. [ ... ] >> Your technical description is accurate, but the points you are making here >> seem to support my argument, rather than contradict what I said. :-) > > You're assuming you're writing a compiler targeting _1_ specific > architecture. No, sir, I certainly do not make such an assumption. Most optimization techniques are architecture-independent: liveness analysis, CSE, dead code elimination, moving invariants out of loops, branch threading, algebraic identities and strength reduction. These optimizations are most commonly done on the 3-argument intermediate code that portable compilers (PCC, GCC) typically use before target platform code generation is actually performed. There are a few additional optimizations which are architecture-specific, such as instruction scheduling and peephole/template optimizations, but these generally make much less difference to performance than the architecture-independent optimizations mentioned above. 
On some platforms, though, they can make enough difference that a second pass at CSE or instruction rescheduling against the target assembly code can be worth doing. > It doesn't matter what is possible, what matters is what > GCC does. Please go analyze GCC and report the deficiencies. I > personally would love to know what they are, and how to make GCC do > better on non-x86 platforms. I agree that what GCC does matters, not theories. I don't have access to Alpha hardware, which is a barrier although not an insuperable one. I'd do better considering SPARC or PPC hardware, which I actually have available to me. Still, I won't use this as an excuse: A quick look suggests that Alpha code generation is deficient in dealing with unsigned integers because the architecture uses a "sign extended" format to store and convert 32-bit unsigned ints (aka "long words") into the (64-bit, aka "quad-word") registers. Dealing with unsigned ints smaller than 32 bits very probably is also slow because the Alpha requires operand-size alignment for all memory access. [ "The Alpha does not directly support byte-level operations such as transferring single bytes between memory and registers. In principle, we could use the instructions already presented to realize byte-level manipulations, but a large amount of shifting and masking would be required. For example, consider the C operation *dest = *src, where both dest and src are of type (char *). This operation must read the single byte pointed to by src and update the single byte pointed to by dest. Without special byte manipulation instructions, this simple operation requires 17 Alpha instructions!" ] Supposedly, the ldq_u and stq_u instructions are the right way to handle byte-level memory access, and it would be worth looking at how well GCC utilizes these opcodes when dealing with chars and shorts. 
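To make the shifting and masking concrete, here is a minimal C sketch of a byte copy when only aligned 64-bit loads and stores are available. It is an illustration under stated assumptions, not what GCC emits: memory is modelled as an array of quadwords, bytes are numbered little-endian within each quadword, and the helper names (load_byte, store_byte, copy_byte) are hypothetical. The Alpha mnemonics in the comments only indicate roughly which step each line stands for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fetch byte number byte_addr from a memory image made of 64-bit words.
 * Assumption: byte 0 is the least significant byte of mem[0]. */
static uint8_t load_byte(const uint64_t *mem, size_t byte_addr)
{
    uint64_t quad = mem[byte_addr / 8];               /* aligned load, cf. ldq_u */
    unsigned shift = (unsigned)(byte_addr % 8) * 8;   /* bit offset of the byte */
    return (uint8_t)((quad >> shift) & 0xffu);        /* extract, cf. extbl */
}

/* Replace byte number byte_addr, leaving the other seven bytes untouched. */
static void store_byte(uint64_t *mem, size_t byte_addr, uint8_t value)
{
    uint64_t quad = mem[byte_addr / 8];               /* read the whole quadword */
    unsigned shift = (unsigned)(byte_addr % 8) * 8;
    quad &= ~((uint64_t)0xffu << shift);              /* clear old byte, cf. mskbl */
    quad |= (uint64_t)value << shift;                 /* merge new byte, cf. insbl */
    mem[byte_addr / 8] = quad;                        /* aligned store, cf. stq_u */
}

/* The equivalent of *dest = *src for char pointers: a read-modify-write. */
static void copy_byte(uint64_t *mem, size_t dest, size_t src)
{
    assert(mem != NULL);
    store_byte(mem, dest, load_byte(mem, src));
}
```

Calling copy_byte(mem, 9, 1) on a two-quadword image copies the second byte of mem[0] into the second byte of mem[1]. Even in this simplified model a one-byte store turns into a load, a mask, a merge and a store, which is the overhead the quoted passage describes.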
Some of these issues cannot be addressed by changes to the compiler: I suspect that FreeBSD's derivation and focus on the x86 architecture means it uses a lot of int8 or int16 values which are fast on Intel hardware, whereas using int32 or int64 representations would actually prove much faster on the Alpha than using smaller-sized quantities. -- -Chuck From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 13:06:07 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CAC2F16A4CE for ; Wed, 25 Feb 2004 13:06:07 -0800 (PST) Received: from out002.verizon.net (out002pub.verizon.net [206.46.170.141]) by mx1.FreeBSD.org (Postfix) with ESMTP id 80FBC43D1F for ; Wed, 25 Feb 2004 13:06:07 -0800 (PST) (envelope-from cswiger@mac.com) Received: from mac.com ([68.161.120.219]) by out002.verizon.net (InterMail vM.5.01.06.06 201-253-122-130-106-20030910) with ESMTP id <20040225210606.IAOQ23576.out002.verizon.net@mac.com>; Wed, 25 Feb 2004 15:06:06 -0600 Message-ID: <403D0E3B.6090805@mac.com> Date: Wed, 25 Feb 2004 16:06:03 -0500 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Petri Helenius References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040224202652.GA13675@diogenis.ceid.upatras.gr> <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com> <20040225025953.GH10121@gsmx07.alcatel.com.au> <403C6A24.80804@he.iki.fi> <20040225194324.GI10121@gsmx07.alcatel.com.au> <403D06AE.8070903@he.iki.fi> In-Reply-To: <403D06AE.8070903@he.iki.fi> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at out002.verizon.net from [68.161.120.219] at Wed, 25 Feb 2004 15:06:06 -0600 cc: freebsd-performance@freebsd.org Subject: Re: Bad performance on alpha? 
(make buildworld) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Feb 2004 21:06:07 -0000

Petri Helenius wrote:
> Talking about different instruction sets and compiler scheduling
> options. Would it be considered a good idea to introduce a sysctl which
> would contain the maximum mcpu= value for the currently running
> architecture? This way one could provide with multiple executables and
> a startup script, in the fashion of:
> prog.i386
> prog.pentium2
> prog.pentium3
> prog.pentium4
> prog.athlon-mp
> etc...

The idea you've suggested is interesting, although the difference in code generation between a P2 and a P4, or for the AMD chips, is fairly minimal for most code; the obvious exception is code which tries to take advantage of CPU features like MMX, SSE, and 3DNow!. In other words, your suggestion wouldn't help grep or the kernel very much, but could be fairly useful for multimedia apps.

There's also a very good implementation for supporting multiple architectures within a single binary: the Mach-O executable format (rather than ELF), which is used to create "fat binaries", or "MAB"s (multi-architecture binaries). Mach-O is the format used by NEXTSTEP and MacOS X. Typically, adding a new architecture adds only about 15% to the size of a particular executable, although that can vary quite widely.

From /usr/include/mach-o/arch.h:

/* The NXArchInfo structs contain the architectures symbolic name
 * (such as "ppc"), its CPU type and CPU subtype as defined in
 * mach/machine.h, the byte order for the architecture, and a
 * describing string (such as "PowerPC").
 * There will both be entries for specific CPUs (such as ppc604e) as
 * well as generic "family" entries (such as ppc).
 */
typedef struct {
    const char *name;
    cpu_type_t cputype;
    cpu_subtype_t cpusubtype;
    enum NXByteOrder byteorder;
    const char *description;
} NXArchInfo;

#if __cplusplus
extern "C" {
#endif /* __cplusplus */

/* NXGetAllArchInfos() returns a pointer to an array of all known
 * NXArchInfo structures. The last NXArchInfo is marked by a NULL name.
 */
extern const NXArchInfo *NXGetAllArchInfos(void);

/* NXGetLocalArchInfo() returns the NXArchInfo for the local host, or NULL
 * if none is known.
 */
extern const NXArchInfo *NXGetLocalArchInfo(void);

/* NXGetArchInfoFromName() and NXGetArchInfoFromCpuType() return the
 * NXArchInfo from the architecture's name or cputype/cpusubtype
 * combination. A cpusubtype of CPU_SUBTYPE_MULTIPLE can be used
 * to request the most general NXArchInfo known for the given cputype.
 * NULL is returned if no matching NXArchInfo can be found.
 */
extern const NXArchInfo *NXGetArchInfoFromName(const char *name);
extern const NXArchInfo *NXGetArchInfoFromCpuType(cpu_type_t cputype,
                                                  cpu_subtype_t cpusubtype);

/* NXFindBestFatArch() is passed a cputype and cpusubtype and a set of
 * fat_arch structs and selects the best one that matches (if any) and returns
 * a pointer to that fat_arch struct (or NULL). The fat_arch structs must be
 * in the host byte order and correct such that the fat_archs really points to
 * enough memory for nfat_arch structs. It is possible that this routine could
 * fail if new cputypes or cpusubtypes are added and an old version of this
 * routine is used. But if there is an exact match between the cputype and
 * cpusubtype and one of the fat_arch structs this routine will always succeed.
 */
extern struct fat_arch *NXFindBestFatArch(cpu_type_t cputype,
                                          cpu_subtype_t cpusubtype,
                                          struct fat_arch *fat_archs,
                                          unsigned long nfat_archs);
[ ... ]

----------

/usr/include/mach/machine.h supports the following CPUTYPEs:

/*
 * Machine types known by all.
 */
#define CPU_TYPE_ANY        ((cpu_type_t) -1)
#define CPU_TYPE_VAX        ((cpu_type_t) 1)
/* skip                     ((cpu_type_t) 2) */
/* skip                     ((cpu_type_t) 3) */
/* skip                     ((cpu_type_t) 4) */
/* skip                     ((cpu_type_t) 5) */
#define CPU_TYPE_MC680x0    ((cpu_type_t) 6)
#define CPU_TYPE_I386       ((cpu_type_t) 7)
/* skip CPU_TYPE_MIPS       ((cpu_type_t) 8) */
/* skip                     ((cpu_type_t) 9) */
#define CPU_TYPE_MC98000    ((cpu_type_t) 10)
#define CPU_TYPE_HPPA       ((cpu_type_t) 11)
/* skip CPU_TYPE_ARM        ((cpu_type_t) 12) */
#define CPU_TYPE_MC88000    ((cpu_type_t) 13)
#define CPU_TYPE_SPARC      ((cpu_type_t) 14)
#define CPU_TYPE_I860       ((cpu_type_t) 15)
/* skip CPU_TYPE_ALPHA      ((cpu_type_t) 16) */
/* skip                     ((cpu_type_t) 17) */
#define CPU_TYPE_POWERPC    ((cpu_type_t) 18)

...which appear to be a proper superset of the platforms FreeBSD supports. For the sake of reference, since the CPU_SUBTYPE list is ~200 lines, here are the x86 variants Mach-O knows about:

/*
 * I386 subtypes.
 */
#define CPU_SUBTYPE_I386_ALL            ((cpu_subtype_t) 3)
#define CPU_SUBTYPE_386                 ((cpu_subtype_t) 3)
#define CPU_SUBTYPE_486                 ((cpu_subtype_t) 4)
#define CPU_SUBTYPE_486SX               ((cpu_subtype_t) 4 + 128)
#define CPU_SUBTYPE_586                 ((cpu_subtype_t) 5)
#define CPU_SUBTYPE_INTEL(f, m)         ((cpu_subtype_t) (f) + ((m) << 4))
#define CPU_SUBTYPE_PENT                CPU_SUBTYPE_INTEL(5, 0)
#define CPU_SUBTYPE_PENTPRO             CPU_SUBTYPE_INTEL(6, 1)
#define CPU_SUBTYPE_PENTII_M3           CPU_SUBTYPE_INTEL(6, 3)
#define CPU_SUBTYPE_PENTII_M5           CPU_SUBTYPE_INTEL(6, 5)
#define CPU_SUBTYPE_INTEL_FAMILY(x)     ((x) & 15)
#define CPU_SUBTYPE_INTEL_FAMILY_MAX    15
#define CPU_SUBTYPE_INTEL_MODEL(x)      ((x) >> 4)
#define CPU_SUBTYPE_INTEL_MODEL_ALL     0

-- 
-Chuck

From owner-freebsd-performance@FreeBSD.ORG Thu Feb 26 13:58:31 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BBE1216A4CE; Thu, 26 Feb 2004 13:58:31 -0800 (PST) Received: from svaha.com (svaha.com [64.46.156.67]) by mx1.FreeBSD.org (Postfix) with ESMTP id 
7FE6C43D2D; Thu, 26 Feb 2004 13:58:31 -0800 (PST) (envelope-from meconlen@obfuscated.net) Received: from [66.232.154.115] ([66.232.154.115]) (AUTH: LOGIN meconlen, TLS: TLSv1/SSLv3,128bits,RC4-SHA) by svaha.com with esmtp; Thu, 26 Feb 2004 16:58:20 -0500 In-Reply-To: <5FCEDFA8-68A3-11D8-870A-003065ABFD92@mac.com> References: <478667A6-6892-11D8-A5DD-00039367611E@obfuscated.net> <5FCEDFA8-68A3-11D8-870A-003065ABFD92@mac.com> Mime-Version: 1.0 (Apple Message framework v612) Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: Content-Transfer-Encoding: 7bit From: Michael Conlen Date: Thu, 26 Feb 2004 16:57:44 -0500 To: Charles Swiger X-Mailer: Apple Mail (2.612) cc: freebsd-performance@freebsd.org cc: freebsd-questions@freebsd.org Subject: Re: NFS server usage X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Feb 2004 21:58:31 -0000 On Feb 26, 2004, at 4:33 PM, Charles Swiger wrote: > Well, you are going to be bottlenecked potentially by your network or > by the maximum I/O rate that your NFS server can sustain. Your data > suggests you ought to be able to handle about two orders of magnitude > more net traffic, if you're over a dedicated 100 Mbs connection > between server and clients (ie, using a switch), so it's likely that > you're going to run into limits due to your disks well before then. > > You can probably switch to using rsync or some other replication > scheme instead of NFS if you do run into limits, and keep the files > locally if need be. The production system will use dual channel U320 RAID controllers with 12 disks per channel, so disk shouldn't be an issue, and it will connect with GigE, so network is plenty fine, now I'm on to CPU. 
Low-volume tests with live data indicate low CPU usage; however, when I fit a curve to the graph it's difficult to tell how linear (or nonlinear) the data is. I've got a ton of points between 7.5 Mbit/sec and 17 Mbit/sec of web traffic, but all the points beyond that are somewhat scattered, up to about 23 Mbit/sec (with a corresponding 5% load in NFS traffic). The first interval is pretty linear, but the first and second intervals taken together are not and appear exponential. The fitted numbers indicate that a 2 GHz Xeon system that's using 2% CPU around 8 Mbit of web traffic and 3% around 15 Mbit would suddenly be using 50% CPU at 52 Mbit and 250% at 75 Mbit (presuming 5% of that traffic actually ends up going over NFS). Does that kind of curve look accurate to you (anyone)? Would a web page with pretty pictures help anyone understand what I just said? -- Michael Conlen From owner-freebsd-performance@FreeBSD.ORG Thu Feb 26 14:46:35 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A510D16A4CF; Thu, 26 Feb 2004 14:46:35 -0800 (PST) Received: from smtpout.mac.com (smtpout.mac.com [17.250.248.86]) by mx1.FreeBSD.org (Postfix) with ESMTP id 748CE43D1F; Thu, 26 Feb 2004 14:46:35 -0800 (PST) (envelope-from cswiger@mac.com) Received: from mac.com (smtpin08-en2 [10.13.10.153]) by smtpout.mac.com (Xserve/MantshX 2.0) with ESMTP id i1QMkZpS028263; Thu, 26 Feb 2004 14:46:35 -0800 (PST) Received: from [10.1.1.193] ([199.103.21.225]) (authenticated bits=0) by mac.com (Xserve/smtpin08/MantshX 3.0) with ESMTP id i1QMkXPP012073; Thu, 26 Feb 2004 14:46:34 -0800 (PST) In-Reply-To: References: <478667A6-6892-11D8-A5DD-00039367611E@obfuscated.net> <5FCEDFA8-68A3-11D8-870A-003065ABFD92@mac.com> Mime-Version: 1.0 (Apple Message framework v612) Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: <9F44162A-68AD-11D8-870A-003065ABFD92@mac.com> Content-Transfer-Encoding: 7bit From: Charles 
Swiger Date: Thu, 26 Feb 2004 17:46:31 -0500 To: Michael Conlen X-Mailer: Apple Mail (2.612) cc: freebsd-performance@freebsd.org cc: freebsd-questions@freebsd.org Subject: Re: NFS server usage X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Feb 2004 22:46:35 -0000 On Feb 26, 2004, at 4:57 PM, Michael Conlen wrote: > [ ... ] > The production system will use dual channel U320 RAID controllers with > 12 disks per channel, so disk shouldn't be an issue, and it will > connect with GigE, so network is plenty fine, now I'm on to CPU. Sounds like you've gotten nice hardware. Four or so years ago, I built out a roughly comparable fileserver [modulo the progress in technology since then] on a Sun E450, which housed 10 SCA-form-factor disks over 5 UW SCSI channels (using 64-bit PCI and backplane, though), and could have held a total of 20 disks if I'd filled it. I mention this because... > Low volume tests with live data indicate low CPU usage however when I > best fit the graph it's difficult to tell how linear (or non linear) > the data is. [ ... ] Does that kind of curve look accurate to you > (anyone)? ...even under stress testing on the faster four-disk RAID-10 volume using SEAGATE-ST336752LC drives (15K RPM, 8MB cache), each on a separate channel, with ~35 client machines bashing away, the fileserver would bottleneck on disk I/O with no more than maybe 10% or 15% CPU load, and that was using a 400MHz CPU. The notion that an NFS fileserver is going to end up CPU-bound simply doesn't match my experience or my expectations. 
If you have single-threaded sequential I/O patterns (like running dd, or maybe a database), you'll bottleneck on the interface or maximum disk throughput, otherwise even with ~3.5 ms seek times, multi-threaded I/O from a buncha clients will require the disk heads to move around so much that you bottleneck at a certain number of I/O operations per second per disk, rather than a given bandwidth per disk. -- -Chuck From owner-freebsd-performance@FreeBSD.ORG Sat Feb 28 10:48:00 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1BD9C16A4CE; Sat, 28 Feb 2004 10:48:00 -0800 (PST) Received: from ms-smtp-03-eri0.southeast.rr.com (ms-smtp-03-lbl.southeast.rr.com [24.25.9.102]) by mx1.FreeBSD.org (Postfix) with ESMTP id B31B343D1D; Sat, 28 Feb 2004 10:47:59 -0800 (PST) (envelope-from wegster@mindcore.net) Received: from mindcore.net (rdu162-234-100.nc.rr.com [24.162.234.100]) i1SIlus1020338; Sat, 28 Feb 2004 13:47:57 -0500 (EST) Message-ID: <4040E25C.60608@mindcore.net> Date: Sat, 28 Feb 2004 13:47:56 -0500 From: Scott W User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20031129 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Charles Swiger References: <478667A6-6892-11D8-A5DD-00039367611E@obfuscated.net> <5FCEDFA8-68A3-11D8-870A-003065ABFD92@mac.com> <9F44162A-68AD-11D8-870A-003065ABFD92@mac.com> In-Reply-To: <9F44162A-68AD-11D8-870A-003065ABFD92@mac.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: Symantec AntiVirus Scan Engine X-Mailman-Approved-At: Sat, 28 Feb 2004 14:56:19 -0800 cc: freebsd-performance@freebsd.org cc: Michael Conlen cc: freebsd-questions@freebsd.org Subject: Re: NFS server usage X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Sat, 28 Feb 2004 18:48:00 -0000 Charles Swiger wrote: > On Feb 26, 2004, at 4:57 PM, Michael Conlen wrote: > >> [ ... ] >> The production system will use dual channel U320 RAID controllers >> with 12 disks per channel, so disk shouldn't be an issue, and it will >> connect with GigE, so network is plenty fine, now I'm on to CPU. > > > Sounds like you've gotten nice hardware. Four or so years ago, I > built out a roughly comparable fileserver [modulo the progress in > technology since then] on a Sun E450, which housed 10 SCA-form-factor > disks over 5 UW SCSI channels (using 64-bit PCI and backplane, > though), and could have held a total of 20 disks if I'd filled it. I > mention this because... > >> Low volume tests with live data indicate low CPU usage however when >> I best fit the graph it's difficult to tell how linear (or non linear) >> the data is. [ ... ] Does that kind of curve look accurate to you >> (anyone)? > > > ...even under stress testing on the faster four-disk RAID-10 volume > using SEAGATE-ST336752LC drives (15K RPM, 8MB cache), each on a > separate channel, with ~35 client machines bashing away, the > fileserver would bottleneck on disk I/O without more than maybe 10% or > 15% CPU load, and that was using a 400MHz CPU. > > The notion that an NFS fileserver is going to end up CPU-bound simply > doesn't match my experience or my expectations. If you have > single-threaded sequential I/O patterns (like running dd, or maybe a > database), you'll bottleneck on the interface or maximum disk > throughput, otherwise even with ~3.5 ms seek times, multi-threaded I/O > from a buncha clients will require the disk heads to move around so > much that you bottleneck at a certain number of I/O operations per > second per disk, rather than a given bandwidth per disk. > Just to add my two cents: my experience has been pretty much the same as what's mentioned above. 
I did some fileserving performance benchmarks (more than I want to count) a while back for a company that was working on a new fileserver 'appliance' system, like a low-end to midrange NetApp. Once network bandwidth was taken care of (meaning enough bandwidth to handle incoming requests), the bottleneck inevitably was disk I/O. Note that this did not always mean more disks were needed: if you have a few dozen disks hanging off a dual-channel SCSI or RAID card, the actual bottleneck could be the bus the card is plugged into, or that bus's speed/bandwidth, so splitting the load across multiple cards (and buses if possible) can be the fix instead of adding more disks. Other things worth looking at are buffer sizes, both for the system and for TCP/IP, as well as mount options for NFS shares. If your NFS server is using battery-backed cache and is also on a UPS, you definitely want to use async in your mount options from clients to speed things up significantly. Read and write buffer sizes (the rsize/wsize client options) seem to do best nowadays somewhere in the 32k-64k range (a huge generalization, but it seems to be true for the different systems and *NIX OSes I currently have). One more thing that may be worth checking is the disk throughput itself: on a U320 interface, if you're loaded with 15 disks per channel, it _may_ be bottlenecking the U320 bus at that point. I don't have currently valid numbers on what realistic sustained output is for U320, but I'm sure it can be googled easily enough. I'd expect sustained transfer to be on the order of ~160MB/sec, which is fairly likely to be saturated with 10 or fewer disks. 
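As a rough back-of-the-envelope check of the bus-saturation point, the arithmetic can be sketched in C. Everything numeric here is an assumption: the 160 MB/sec bus figure is only the guess made above, the per-disk sustained rates are illustrative rather than measured, and the function names are hypothetical.

```c
#include <assert.h>

/* Smallest number of disks whose combined sustained transfer rate
 * reaches the bus rate (integer ceiling of bus_mb_s / disk_mb_s).
 * Both rates are in MB/sec and are assumed, not measured, values. */
static unsigned disks_to_saturate(unsigned bus_mb_s, unsigned disk_mb_s)
{
    assert(disk_mb_s > 0);
    return (bus_mb_s + disk_mb_s - 1) / disk_mb_s;
}

/* Sequential bandwidth left for each disk when ndisks share one channel
 * equally (rounded down to whole MB/sec). */
static unsigned per_disk_share(unsigned bus_mb_s, unsigned ndisks)
{
    assert(ndisks > 0);
    return bus_mb_s / ndisks;
}
```

With a hypothetical 40 MB/sec per disk, disks_to_saturate(160, 40) gives 4 disks; at 16 MB/sec per disk it takes 10. With 12 disks on one channel, per_disk_share(160, 12) leaves about 13 MB/sec each. This only matters for sequential streaming; with heavily random access the per-disk operation rate is likely to be the limit first.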
Lastly, if you can afford the hardware, you're almost always better off handling different types of access via different controllers. In other words, if you are going to be handling mail, web, user home directories, and a database over NFS or SMB, break them up into individual filesystems, preferably each on its own channel and disks, as opposed to combining them. (This ignores the fact that mail, Apache, and DBs should really be served from local disk, but it serves as an obvious example.) This is really just a restatement of the previous poster's comment about disk I/O from many clients moving the heads around, but it is certainly true. Scott