Date: Thu, 16 Mar 2006 09:17:29 -0800
From: Peter Wemm <peter@wemm.org>
To: JoaoBR <joao@matik.com.br>
Cc: freebsd-amd64@freebsd.org
Subject: Re: amd64 slower than i386 on identical AMD 64 system? / How is hyperthreading handled on amd64?
Message-ID: <200603160917.30225.peter@wemm.org>
In-Reply-To: <200603160747.00051.joao@matik.com.br>
References: <20060313221836.5491916A420@hub.freebsd.org> <200603151356.27972.peter@wemm.org> <200603160747.00051.joao@matik.com.br>

On Thursday 16 March 2006 02:46 am, JoaoBR wrote:
> On Wednesday 15 March 2006 18:56, Peter Wemm wrote:
> > I tend to agree with this. ubench is not a useful benchmark for
> > comparing 32 bit vs 64 bit systems.
> >
> > However, what might be interesting is to compile a 32 bit binary
> > (and statically link it) on the i386 system, and compare the
> > runtime on the 64 bit kernel, using the same identical binary.
> > That way you are measuring the same math operations on both
> > platforms. Comparing 64 bit operations vs 32 bit operations is
> > apples vs oranges.
> >
> > Of course, it may still be slower, but at least the results would
> > be more meaningful. Don't assume the OS is slower because the
> > compiler makes the application do twice the work.
>
> good point
> what do you think of unixbench since it does some real-life tasks?

In general, I don't like synthetic benchmarks at all. What we do at work is put the machines under real workloads alongside a comparison system, and measure idle cpu trends over a day or so. A comparison where one machine has a 30% idle cpu and the other has a 40% idle cpu under the same *real* workload tells us the most.

Unfortunately, we have some folks here that like to push the machines to the wall. The problem is that FreeBSD 5 and later tend to not "hit the wall gracefully", and the results of those tests are more often a measure of how badly the kernel suffers from lock contention than of how it runs under real load. Still, the max workload numbers are useful because they tell you what the worst case is.

BTW: don't compare 'make buildworld' of i386 vs amd64, because amd64 not only builds things differently, but builds all the libraries twice. amd64 has 5 stages, i386 has 4. Even a 'make TARGET_ARCH=i386' isn't entirely a fair comparison because one has to build a 64 bit host compiler in one stage, the other has to build a 32 bit host compiler. gcc even turns off some optimizations when operating as a cross compiler.
An actual 32 bit buildworld in a 32 bit chroot on both machines is a fair comparison of buildworld times from an OS perspective because they are building exactly the same thing. But that doesn't make it meaningful if you're interested in 'buildworld' times as a FreeBSD developer who does a buildworld umpteen times per day as part of compile testing.

Anyway, one has to keep in mind whether a given test is of the operating system port, or the cpu architecture, or application performance. ubench in particular is strongly affected by 32 vs 64 bit because it generates a very different workload for itself depending on the size of the machine.

There are a number of weaknesses in the amd64 port too. In particular, the math library does not yet use the generally superior SSE2 instructions. This is a real setback because the ABI uses SSE2 floating point parameter passing. The effect is that some random libm function is given an SSE2 register, which we convert to an x87 fp stack register, do the x87 operation, then convert the x87 stack register back to an SSE2 register, then return the SSE2 result. This is especially unfortunate when the native SSE2 instruction that would operate on the SSE2 registers directly is faster. But I don't know SSE2 or x87 fpu assembler code very well, so I've done "just enough" to get things to work.

It is worth reiterating that I do NOT expect the amd64 port to be better than i386 across the board. Nor even in most tests. But the difference should be minimal, except in some specific cases where the 64 bit nature really helps, e.g. if you want to mmap a 3GB file. You can't do that on a machine running an i386 kernel.

I think of the advantages of using the amd64 port in terms of functionality rather than performance. You definitely have to consider functionality if you want a desktop though. Flash plugins for browsers are right out, for example, unless you use the Linux browser builds.
Most of the time though, no flash is usually good because you get less annoying ads. :-)

--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
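
On the mmap point in the message above: the following is only a rough sketch, assuming a POSIX system, of what "mmap a whole file" looks like in C. The helper name map_whole_file is hypothetical and is not taken from the FreeBSD tree. Per the message, the mmap() call would be expected to fail for a 3GB file under an i386 kernel because the process runs out of address space, while the same code should be able to succeed on amd64.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not an existing API): map an
 * entire file read-only in a single mmap() call.
 *
 * Returns the mapping and stores its length in *len_out, or returns
 * NULL if the file cannot be opened, is empty, is too large for a
 * size_t, or if mmap() itself fails (for example because the process
 * does not have enough contiguous address space left).
 */
void *
map_whole_file(const char *path, size_t *len_out)
{
	struct stat st;
	void *p;
	int fd;

	assert(path != NULL && len_out != NULL);

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return NULL;

	/* Reject empty files and sizes that cannot be expressed as a size_t. */
	if (fstat(fd, &st) == -1 || st.st_size <= 0 ||
	    (uintmax_t)st.st_size > SIZE_MAX) {
		close(fd);
		return NULL;
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after the descriptor is closed */
	if (p == MAP_FAILED)
		return NULL;

	*len_out = (size_t)st.st_size;
	return p;
}
```

A caller would pass the path of the large file, check for NULL, and munmap() the result with the returned length when done. Note that a 3GB size still fits in a 32 bit size_t, so the size check alone does not catch that case; it is the mmap() call that is expected to fail there.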