From: Peter Wemm
To: freebsd-amd64@freebsd.org, cokane@cokane.org
Cc: kono@kth.se
Date: Wed, 15 Mar 2006 13:56:27 -0800
Subject: Re: amd64 slower than i386 on identical AMD 64 system? / How is hyperthreading handled on amd64?
Message-Id: <200603151356.27972.peter@wemm.org>
In-Reply-To: <346a80220603141520i2ac1a4br66cbfb213453dcd6@mail.gmail.com>
List-Id: Porting FreeBSD to the AMD64 platform

On Tuesday 14 March 2006 03:20 pm, Coleman Kane wrote:
> On 3/14/06, JoaoBR wrote:
> > On Tuesday 14 March 2006 07:06, Alexander Konovalenko wrote:
> > > > Hi
> > > > For some time now (>6.0R) I have had the impression that amd64
> > > > runs slower than i386. I have now run some tests on identical
> > > > hardware, and ubench confirms this. Does anybody have comments
> > > > on this?
> > >
> > > I have a dual-core AMD64 4400+ and FreeBSD RELENG_5. I don't have
> > > FreeBSD i386 installed, but you can just compare benchmarks.
> > >
> > > ubench uses all CPUs/cores by default. When one ubench is running,
> > > top shows:
> >
> > So where is your comparison? My point was that the same hardware is
> > faster running i386.
> >
> > I see this on X2 machines too, but I do not have two identical ones
> > to compare: I have an X2-4400-SMP running amd64 and an X2-4200-SMP
> > running i386, and they give me the same ubench numbers.
> >
> > João
> >
> > >   PID USERNAME PRI NICE  SIZE  RES STATE  C  TIME   WCPU    CPU COMMAND
> > > 11528 XXXX     111    0 3572K 880K RUN    1  0:12 93.64% 42.29% ubench
> > > 11529 XXXX     111    0 3572K 880K CPU0   1  0:11 97.21% 41.16% ubench
> > > 11526 XXXX      -8    0 3572K 880K piperd 0  0:17 41.76% 31.98% ubench
> > >
> > >
> > > one ubench executed (with no -s flag = use all CPU, default):
> > >
> > > Unix Benchmark Utility v.0.3
> > > Copyright (C) July, 1999 PhysTech, Inc.
> > > Author: Sergei Viznyuk
> > > http://www.phystech.com/download/ubench.html
> > > FreeBSD 5.5-PRERELEASE FreeBSD 5.5-PRERELEASE #12: Sun Mar 5
> > > 17:34:07 CET 2006 XXXX@XXXX:/usr/obj/usr/src/sys/DAEMON64SMP amd64
> > > Ubench CPU: 238149
> > > Ubench MEM: 255459
> > > --------------------
> > > Ubench AVG: 246804
> > >
> > >
> > > two ubench executed with -s flag (use single CPU only):
> > >
> > > Ubench Single CPU: 120184 (0.40s)
> > > Ubench Single MEM: 126787 (0.39s)
> > > -----------------------------------
> > > Ubench Single AVG: 123485
> > >
> > > Ubench Single CPU: 121000 (0.41s)
> > > Ubench Single MEM: 128762 (0.40s)
> > > -----------------------------------
> > > Ubench Single AVG: 124881
> > >
> > >
> > > one ubench executed with -s flag (use single CPU only):
> > >
> > > Ubench Single CPU: 123251 (0.40s)
> > > Ubench Single MEM: 161494 (0.40s)
> > > -----------------------------------
> > > Ubench Single AVG: 142372
> > >
> > >
> > > /Alexander Konovalenko
> > >
> > > +46-8-5537-8142 (office)
> > > +46-7-3752-2116
> > > http://daemon.nanophys.kth.se/~kono
> > >
> > > Royal Institute of Technology (KTH)
> > > Nanostructure Physics Department, Albanova
> > > Roslagstullsbacken 21
> > > 10691 Stockholm
> > > Sweden
>
> I think that the nature of the ubench benchmark should be
> investigated to reveal the reasons behind your dismay. It seems to me
> that your assumption that 64-bit should be faster than 32-bit in all
> cases is wrong.
> The nature of the processor design, the OS implementation, and how
> ubench does its measurement all need to be addressed.
>
> First of all, when comparing a 64-bit amd64 system to a 32-bit IA-32
> system, it is important to know that 64-bit *does not* in fact mean
> that if you tested a loop of:
>
>     long x, y, z;
>     x = 1;
>     y = 1;
>     z = x + y;
>
> the 64-bit machine would get through twice as many of these
> calculations. In fact, on the 64-bit machine the memory taken up by
> x, y and z would be double that on the i386, the add/load
> instructions would also get larger, and as far as execution goes, the
> time *should* be about the same for both units. This all suggests
> that 64-bit would, by its nature, have a slower average than your
> 32-bit system.
>
> In addition, amd64 64-bit mode doubles your register set, increasing
> the amount of memory that needs to be moved around on a context
> switch, and everything is pointing towards... probably slower.

I tend to agree with this. ubench is not a useful benchmark for
comparing 32-bit vs 64-bit systems.

However, what might be interesting is to compile a 32-bit binary (and
statically link it) on the i386 system, and then compare its runtime
under the i386 kernel with its runtime under the 64-bit kernel, using
the very same binary. That way you are measuring the same math
operations on both platforms. Comparing 64-bit operations vs 32-bit
operations is apples vs oranges.

Of course, it may still be slower, but at least the results would be
more meaningful. Don't assume the OS is slower because the compiler
makes the application do twice the work.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5