Date:      Tue, 25 Aug 1998 08:42:46 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        gpalmer@FreeBSD.ORG (Gary Palmer)
Cc:        chuckr@glue.umd.edu, freebsd-current@FreeBSD.ORG
Subject:   Re: Threads across processors
Message-ID:  <199808250842.BAA09842@usr05.primenet.com>
In-Reply-To: <3686.904031598@gjp.erols.com> from "Gary Palmer" at Aug 25, 98 03:53:18 am

> From an ISP standpoint:
> 
> Cyclone, Typhoon and Breeze (from Hywind Software) are all threaded. Why? 
> Because it allows them to be really fast systems.
> 
> Intermail (and probably post.office) from software.com are threaded too 
> (Intermail even uses threaded tcl). This allows them to do some really neat 
> tricks with respect to mail processing.
> 
> Various LDAP and RADIUS implementations are threaded too.

Sounds to me like you want an async call gate, not kernel threads.

Kernel threads would (probably) result in a context switch overhead on
a blocking call, whereas an async call gate would result in another
thread in the same process getting the remainder of the quantum for that
process.

The context returned by an async call gate could be on a separate
processor; it doesn't matter to the call gate.

> passing between threads in the same process. I believe Cyclone's 
> time-to-transit for an article is on the order of milliseconds. By the time 
> you dump the data on a pipe, incur the context switch, etc, you've lost the 
> advantage.
> 
> Heck, SMI wrote `doors' for the very reason that IPC *blows* in all cases,
> and that to pull off the speedups with NSCD that they wanted, they had to
> get the IPC overhead reduced a lot. I think I even have slides somewhere
> comparing pipes, SYSV SHM, etc times for message passing in terms of
> transit time.

Anything requiring a full process context switch, like, oh, say,
getting in line behind other processes in the scheduler, incurs an
incredible amount of overhead.  So does proxying objects between
address spaces, which is what you have to do if you allocate objects
in thread local storage.

> So, I think you are missing a lot of real-time applications too.

RT, hard RT, requires kernel preemption and priority lending and
deterministic maximal interrupt processing latency, and a lot of
other things which are topologically equivalent to kernel threading
and SMP kernel reentrancy.

The big value in kernel threads is SMP scalability, in that multiple
processors can be active in different threads in the same program
simultaneously.

This benefit is hardly worth it in most cases unless there is also
support for CPU affinity and cooperative user space scheduling to
avoid context switch overhead between instances of threads in the
same process.  This generally only works when you hand off the
remaining quantum to another thread in the same process using an async
call gate.  Otherwise you lose L1 and L2 cache contents, which really
damages your throughput on any real machine, no matter what it does for
your microbenchmarks on an unloaded machine.

There is a very nice book on these topics:

	Scheduling and Load Balancing in Parallel and Distributed
	Systems
	Behrooz A. Shirazi, Ali R. Hurson, Krishna M. Kavi
	IEEE Computer Society Press
	IEEE Catalog number EH0417-6
	ISBN: 0-8186-6587-4

This should be available from your local "Computer Literacy" bookstore.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



