Date: Tue, 17 May 2005 18:53:02 -0400
From: David Schultz <das@FreeBSD.ORG>
To: Garrett Wollman <wollman@csail.mit.edu>
Cc: freebsd-chat@FreeBSD.ORG
Subject: Re: FreeBSD Security Advisory FreeBSD-SA-05:09.htt [REVISED]
Message-ID: <20050517225302.GA55476@VARK.MIT.EDU>
In-Reply-To: <17029.25466.587442.577866@khavrinen.csail.mit.edu>
References: <245f0df105051318564b1ffb6b@mail.gmail.com> <94145.1116037219@critter.freebsd.dk> <17029.25466.587442.577866@khavrinen.csail.mit.edu>
On Fri, May 13, 2005, Garrett Wollman wrote:
> <<On Sat, 14 May 2005 04:20:19 +0200, "Poul-Henning Kamp" <phk@phk.freebsd.dk> said:
>
> > The political problem is that if all operating systems do that,
> > Intel has a pretty dud feature on their hands, and they are not
> > particularly eager to accept that fact.
>
> Intel already had a pretty dud feature on their hands; just ask anyone
> in the architecture community (probably including those who work for
> Intel). Pentium 4 CPUs simply don't have enough I/O bandwidth to
> maintain two simultaneous, independent instruction streams. The value
> of the feature can't be realized until you have enough cache (in both
> size and bandwidth) to be able to partition it among logical CPUs in
> exactly the manner that Colin has suggested. (The fundamental problem
> in computer architecture for the past several years has been how to
> deal with the fact that gates are cheap and easy to make, but wires --
> particularly external I/O wires -- are expensive and hard.)
>
> The only way to get full performance out of an HTT processor today is
> for both threads to be running out of L1 cache. Multimedia and
> numerical benchmarks are often parallelizable in this way (assuming
> the OS provides gang scheduling); general-purpose applications rarely
> are.

That's true, but SMT wasn't designed to double performance. It can
still be advantageous even when the bottleneck is I/O, because a
single thread typically cannot make use of all available memory
bandwidth. This is particularly the case with OS kernels and
databases, which spend large amounts of time chasing pointers around.
SMT allows the processor to perform useful work while one thread is
blocked on a cache miss. Yes, if there isn't enough I/O bandwidth,
chances are the second thread will eventually block, too.
However, by the time the first thread unblocks, the processor will
already have a new address to put on the bus, rather than leaving the
memory bus idle while the first thread generates the next address. Of
course, SMT works even better if one of the threads can operate out of
cache.

Admittedly, there are other downsides that hurt performance or detract
from the above argument. Added complexity, false sharing, write
buffers, additional context switch expense, etc., are a few things
that come to mind. I don't know how all the tradeoffs work out for
Intel's particular SMT implementation. But the fact that HTT doesn't
work well doesn't mean that the idea can't work. After all, SMT wasn't
designed to solve I/O bottlenecks; it was designed to address the
problem that there are still millions of idle transistors on chips
these days even after extracting all possible parallelism from a
single instruction stream.

Personally, I'm betting on multi-core chips, which address some of the
same problems with a different set of tradeoffs, but I wouldn't be
surprised to see hybrids of the two.