Date:      Tue, 17 May 2005 18:53:02 -0400
From:      David Schultz <das@FreeBSD.ORG>
To:        Garrett Wollman <wollman@csail.mit.edu>
Cc:        freebsd-chat@FreeBSD.ORG
Subject:   Re: FreeBSD Security Advisory FreeBSD-SA-05:09.htt [REVISED]
Message-ID:  <20050517225302.GA55476@VARK.MIT.EDU>
In-Reply-To: <17029.25466.587442.577866@khavrinen.csail.mit.edu>
References:  <245f0df105051318564b1ffb6b@mail.gmail.com> <94145.1116037219@critter.freebsd.dk> <17029.25466.587442.577866@khavrinen.csail.mit.edu>

On Fri, May 13, 2005, Garrett Wollman wrote:
> <<On Sat, 14 May 2005 04:20:19 +0200, "Poul-Henning Kamp" <phk@phk.freebsd.dk> said:
> 
> > The political problem is that if all operating systems do that,
> > Intel has a pretty dud feature on their hands, and they are not
> > particularly eager to accept that fact.
> 
> Intel already had a pretty dud feature on their hands; just ask anyone
> in the architecture community (probably including those who work for
> Intel).  Pentium 4 CPUs simply don't have enough I/O bandwidth to
> maintain two simultaneous, independent instruction streams.  The value
> of the feature can't be realized until you have enough cache (in both
> size and bandwidth) to be able to partition it among logical CPUs in
> exactly the manner that Colin has suggested.  (The fundamental problem
> in computer architecture for the past several years has been how to
> deal with the fact that gates are cheap and easy to make, but wires --
> particularly external I/O wires -- are expensive and hard.)
> 
> The only way to get full performance out of an HTT processor today is
> for both threads to be running out of L1 cache.  Multimedia and
> numerical benchmarks are often parallelizable in this way (assuming
> the OS provides gang scheduling); general-purpose applications rarely
> are.

That's true, but SMT wasn't designed to double performance.  It
can still be advantageous even when the bottleneck is I/O because
a single thread typically cannot make use of all available memory
bandwidth.  This is particularly the case with OS kernels and
databases, which spend large amounts of time chasing pointers
around.  SMT allows the processor to perform useful work while one
thread is blocked on a cache miss.  Yes, if there isn't enough I/O
bandwidth, chances are the second thread will eventually block,
too.  However, by the time the first thread unblocks, the
processor will already have a new address to put on the bus,
rather than leaving the memory bus idle while the first thread
generates the next address.  Of course, SMT works even better if
one of the threads can operate out of cache.
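
To make the latency-hiding argument above concrete, here is a toy model,
not a description of any real processor.  It assumes a single memory bus
that serves one request at a time, a fixed miss latency, a fixed number
of cycles each thread needs to generate its next address, and enough
execution resources that the threads' address generation overlaps
freely.  The function name and all parameters are hypothetical.

```python
def simulate(n_threads, compute, mem, total_cycles):
    """Toy model of pointer-chasing threads sharing one memory bus.

    Assumptions (illustrative only):
      - each thread needs `compute` cycles to produce its next address,
        then blocks until the memory request completes;
      - the bus serves one request at a time, each taking `mem` cycles;
      - threads compute independently of one another (idealized SMT).

    Returns (bus utilization, number of completed requests).
    """
    bus_free_at = 0                       # cycle at which the bus is next idle
    busy = 0                              # total cycles the bus spent serving requests
    next_issue = [compute] * n_threads    # cycle at which each thread has an address ready
    completed = 0

    while True:
        # Serve the thread whose address has been ready the longest.
        i = min(range(n_threads), key=lambda k: next_issue[k])
        start = max(next_issue[i], bus_free_at)
        end = start + mem
        if end > total_cycles:
            break
        busy += mem
        bus_free_at = end
        # The thread unblocks at `end` and starts generating its next address.
        next_issue[i] = end + compute
        completed += 1

    return busy / total_cycles, completed


if __name__ == "__main__":
    for n in (1, 2):
        util, done = simulate(n, compute=20, mem=80, total_cycles=1000)
        print(f"{n} thread(s): bus utilization {util:.2f}, {done} requests")
```

With these made-up numbers, one thread leaves the bus idle for 20 cycles
after every miss (utilization 0.80, 10 requests), whereas with two
threads the second address is already waiting when the first request
completes (utilization 0.96, 12 requests).  The model ignores cache
sharing, write buffers, and the other downsides mentioned below.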

Admittedly, there are other downsides that hurt performance or
detract from the above argument.  Added complexity, false sharing,
write buffers, additional context switch expense, etc., are a few
things that come to mind.  I don't know how all the tradeoffs work
out for Intel's particular SMT implementation.  But the fact that
HTT doesn't work well doesn't mean that the idea can't work.
After all, SMT wasn't designed to solve I/O bottlenecks; it was
designed to address the problem that there are still millions of
idle transistors on chips these days even after extracting all
possible parallelism from a single instruction stream.
Personally, I'm betting on multi-core chips, which address some of
the same problems with a different set of tradeoffs, but I
wouldn't be surprised to see hybrids of the two.


