Date:      Mon, 20 Dec 1999 20:24:28 -0500 (EST)
From:      Chris Sedore <cmsedore@maxwell.syr.edu>
To:        Jason Evans <jasone@canonware.com>
Cc:        Alfred Perlstein <bright@wintelcom.net>, Kevin Day <toasty@dragondata.com>, "Ronald F. Guilmette" <rfg@monkeys.com>, hackers@FreeBSD.ORG
Subject:   Re: Practical limit for number of TCP connections?
Message-ID:  <Pine.BSF.4.05.9912201950190.82375-100000@qwerty.maxwell.syr.edu>
In-Reply-To: <19991220164517.F26743@sturm.canonware.com>



On Mon, 20 Dec 1999, Jason Evans wrote:

> On Sun, Dec 19, 1999 at 03:01:41PM -0500, Chris Sedore wrote:
> > 
> > 
> > On Sat, 18 Dec 1999, Alfred Perlstein wrote:
> > 
[...TRIM...]
> > > Using a thread per connection has always been a bogus way of programming,
> > > it's easy, but it doesn't work very well.
> > 
> > Ahem.  Well, that kind of depends on the threads implementation, how many
> > connections you're talking about, and likely some other factors too.  
> > I've got an NT box that handles about 1000 concurrent connections with
> > 1000 (plus a few) threads doing the work.  Runs fine, performs very well.
> >
> > I wouldn't argue that it is the most scalable solution to problems, but it
> > is easier, and scales proportionally to the quality of the threads
> > implementation.
> 
> 1000 simultaneous connections really isn't that many, but even at 1000
> threads, you could likely achieve much better performance by using thread
> pools or a simple poll() loop rather than one thread per connection.  Why?
> Locality, locality, locality.  Consider that each thread has its own stack,
> which in the best of worlds would be 4K, but is more likely at least 16K.
> Now, start switching between threads to handle relatively small amounts of
> I/O for each connection, and consider what that does to the VM, not to
> mention the memory hierarchy of the hardware.  You might as well not even
> have L2 cache, because the program will thrash the cache so badly.  Of
> course, you won't see worst case performance if client activity is unevenly
> distributed, but you just can't get past the fact that the memory footprint
> of one thread per connection is larger than a bounded pool of threads.

In my case, load is reasonably distributed.  Is poll() really that much
better than select()?  I thought that, excepting bit flag manipulations,
it worked basically the same way on the kernel end.
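For illustration only, here is a minimal sketch that puts the two interfaces side by side. It assumes a Unix-like system where Python's `select.poll` is available; the helper names (`ready_with_select`, `ready_with_poll`, `demo`) are made up for this example. Both calls are handed the complete set of descriptors on every invocation; the visible difference is that select() takes fd sets while poll() takes a list of (fd, event mask) registrations. How much that matters inside a given kernel is not something this sketch measures.

```python
import select
import socket


def ready_with_select(socks):
    """Return the set of readable sockets, using select()."""
    readable, _, _ = select.select(socks, [], [], 0)  # timeout 0: do not block
    return set(readable)


def ready_with_poll(socks):
    """Return the set of readable sockets, using poll()."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        poller.register(s, select.POLLIN)
        by_fd[s.fileno()] = s
    # poll() reports (fd, event mask) pairs; map the fds back to sockets.
    return {by_fd[fd] for fd, events in poller.poll(0) if events & select.POLLIN}


def demo():
    """Make one of two idle connections readable and ask both interfaces."""
    a, b = socket.socketpair()
    c, d = socket.socketpair()
    try:
        b.sendall(b"x")  # 'a' becomes readable; 'c' stays idle
        socks = [a, c]
        return ready_with_select(socks) == {a}, ready_with_poll(socks) == {a}
    finally:
        for s in (a, b, c, d):
            s.close()
```

Calling `demo()` checks that both interfaces report the same single ready socket.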
 
> Some threads implementations are better than others at handling such
> abuses, but the performance of such an approach will almost always suffer
> in comparison to a design that takes locality into consideration.

True enough.  In some cases, this may not be that much of an issue,
though.  Imagine a thread-per-connection that does much of its work in a
limited call tree, with much of its work context within 8k (+/-) of the
current stack pointer.  It has to pull this into cache every time that
thread is activated.  In a thread pool implementation, it would likely
have to move about the same 16k into the cache, only from a "context
structure" which would probably be about as open to thrashing as the
thread stack.  Add to that the fact that thread-pool applications often
use more synchronization primitives.
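To make the "context structure" point concrete, here is a rough sketch of a bounded pool, under the assumption that each connection's working state lives in an explicit object instead of on a dedicated thread's stack. `ConnContext`, `run_pool`, and the event format are hypothetical names invented for this example, and the "work" is just byte counting. Note the lock: it is the kind of extra synchronization a pool tends to need and a thread-per-connection design can often avoid.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ConnContext:
    """Hypothetical per-connection state a dedicated thread would keep on its stack."""
    conn_id: int
    bytes_seen: int = 0


def run_pool(events, n_workers=4):
    """Process (conn_id, payload) events with a bounded pool; return the contexts."""
    contexts = {}
    lock = threading.Lock()  # guards the shared context table
    work = queue.Queue()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: shut this worker down
                return
            conn_id, payload = item
            with lock:
                # Any worker may touch any connection's context.
                ctx = contexts.setdefault(conn_id, ConnContext(conn_id))
                ctx.bytes_seen += len(payload)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for ev in events:
        work.put(ev)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return contexts
```

For example, `run_pool([(i % 10, b"abc") for i in range(1000)])` serves ten logical connections with four threads and leaves one context object per connection.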
 
> I disagree with your assessment that scalability of one thread per
> connection is proportional to the quality of the threads implementation.
> An ideal threaded program would have exactly as many threads as available
> processors, and the threads would always be runnable.  Of course,
> real-world applications almost never work that way, but the goal of a
> programmer should be to have as few threads as possible while still
> achieving maximal parallelism.  If connection scalability is an issue,
> using one thread per connection ignores a critical aspect of high
> performance threaded application design.

I don't disagree with any of what you have written, but I'd expect you to
concede that scalability is proportional to the implementation.  That is,
LinuxThreads (that is, rfork()) is probably nowhere near optimally
scalable, but something like the last FreeBSD KSE model that I saw
bouncing around on -arch would do a lot better.

I was really responding to the assertion that thread-per-connection is
broken as a methodology. I've written programs both ways, mixed the two,
etc.  My point was that at least one OS has no problem coping with 1000
threads essentially blocked on sockets, so the base argument of "it
doesn't scale well" is, without further elaboration, hollow.  I would
agree that there is a crossover point for performance, probably below 1000
threads.  I was just injecting a little real-world experience with this in
an application which is somewhere above "small" and somewhere below
"really large".

People are still doing things like this with process-per-connection, so
threads sound much better as an alternative.
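For reference, the thread-per-connection arrangement discussed above (one thread parked on each socket) can be sketched roughly as follows. This is a toy, not the NT application mentioned earlier: it assumes local socket pairs in place of real TCP connections, and `echo_once` and `thread_per_connection` are hypothetical names. Each thread's entire state is its own stack frame, which is what makes the style easy to write.

```python
import socket
import threading


def echo_once(conn):
    """Block in recv() until the peer sends something, then reply in upper case."""
    data = conn.recv(1024)
    conn.sendall(data.upper())


def thread_per_connection(n):
    """Serve n connections with n threads, one blocked on each socket."""
    pairs = [socket.socketpair() for _ in range(n)]
    threads = [threading.Thread(target=echo_once, args=(srv,)) for srv, _ in pairs]
    for t in threads:
        t.start()  # each thread waits in recv() until its client writes
    replies = []
    for i, (_, cli) in enumerate(pairs):
        cli.sendall(b"ping %d" % i)
        replies.append(cli.recv(1024))
    for t in threads:
        t.join()
    for srv, cli in pairs:
        srv.close()
        cli.close()
    return replies
```

Each connection costs two descriptors here, so large values of `n` may hit the per-process descriptor limit before thread count becomes the issue.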

-Chris


