Date:      Thu, 4 Nov 1999 18:04:17 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        julian@whistle.com (Julian Elischer)
Cc:        freebsd-arch@freebsd.org
Subject:   Re: Threads goals  version III
Message-ID:  <199911041804.LAA18253@usr06.primenet.com>
In-Reply-To: <Pine.BSF.4.05.9910311908080.8816-100000@home.elischer.org> from "Julian Elischer" at Oct 31, 99 07:09:04 pm

I have been extremely busy at work, so I have not had time to
contribute to this discussion until now.  I actually got up
early today so I could do so...


> ---Possible system design goals of system threads support --
> --- Note just because something is in this list doesn't mean that
> it will be done, just that it's going to be carried forward to
> further discussion.
> --------------------------------------------------------------

I think there are some first principles missing here; the initial
design goals of threads, when they were first invented, were:

1)	A means of allowing programmers to build procedural
	solutions instead of state machines.

	This is really a programmer convenience, for programmers
	who know how to write linear code, but don't know how to
	do finite state automata, to enable them to write code
	to solve certain classes of problems that they would not
	otherwise be able to solve.

	While I agree that it's nice that these people can be
	productive members of society without having to ask
	"would you like fries with that?", it's not really the
	overriding goal, it's just a side effect (a desirable
	one, and one that probably shouldn't be avoided in order
	to solve the other problems threads were designed to
	solve).

	Let's call this "allowing complex problems to be solved
	by procedural decomposition at a slightly higher cost
	than we'd have solving them by Carnot decomposition".
	(A small sketch of the contrast follows this list.)


2)	Something "lighter weight" than several processes
	written to cooperatively solve the same problem.

	This implies a lot of things, but the top ones are:

	o	Lower context switch overhead

	o	Better utilization of quantum

	o	Cheaper inter-context communication mechanisms

	o	A way to reduce the number of processes that
		are taking up per-process resources on the
		system

	o	A way to increase outstanding I/O and other
		high latency operations' concurrency
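
Here is the small sketch promised under (1) above; everything in it
(struct conn, the state names, and the three step functions) is
hypothetical, invented only to show the shape of the two styles:

/* Hypothetical protocol handler, written two ways. */
enum conn_state { ST_HEADER, ST_BODY, ST_REPLY, ST_DONE };

struct conn {
	int		fd;
	enum conn_state	state;
};

int read_header(struct conn *);	/* hypothetical step functions;        */
int read_body(struct conn *);	/* they return nonzero when the step   */
int send_reply(struct conn *);	/* completes (FSM) or block (threaded) */

/* Procedural: the "where am I" state is simply the program counter. */
void
handle_conn_threaded(struct conn *c)
{
	read_header(c);		/* may sleep; other threads run meanwhile */
	read_body(c);
	send_reply(c);
}

/* Finite state machine: the state must be kept and re-entered by hand. */
void
handle_conn_fsm(struct conn *c)
{
	switch (c->state) {
	case ST_HEADER:	if (read_header(c)) c->state = ST_BODY;  break;
	case ST_BODY:	if (read_body(c))   c->state = ST_REPLY; break;
	case ST_REPLY:	if (send_reply(c))  c->state = ST_DONE;  break;
	case ST_DONE:	break;
	}
}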

I think we can agree that user space threads have lower context
switch overhead than kernel space threads.
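
As a rough illustration (struct uthread and uthread_switch() are
hypothetical, and the ucontext interfaces are just one way to do it,
where available), the whole of a user space thread switch is a
register save/restore within the process's own address space:

#include <ucontext.h>

/* Hypothetical user thread descriptor, for illustration only. */
struct uthread {
	ucontext_t	ctx;	/* saved registers, stack pointer, ... */
	/* run queue linkage, per-thread data, ... */
};

/*
 * Save 'from' and resume 'to'; the call returns when 'from' is next
 * scheduled.  Aside from possibly saving the signal mask, nothing
 * here crosses into the kernel, which is where the savings over a
 * kernel context switch come from.
 */
void
uthread_switch(struct uthread *from, struct uthread *to)
{
	swapcontext(&from->ctx, &to->ctx);
}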

I think we can agree that a user space threads scheduler context
switch has a better quantum utilization than a blocking kernel
thread, which gives up its quantum when it blocks.

I think we can agree that user/kernel is irrelevant, so long as
the address space is shared, in providing cheaper inter-context
communications.

I think we can agree that, when we talked about the number of processes
taking up per-process resources, we were really talking about
kernel scheduling contexts, and so user space threads better
achieve this goal than kernel space threads; by the same token,
however, the resources which we are concerned about are less
expensive than they used to be, and it may be time to reconsider
the tradeoff we were trying to make.

I think we can agree that, so long as the I/O and other high
latency operations can be scheduled to occur, it doesn't matter
what else is running while they are pending, so that user space
call-conversion (sync call = async call + user space threads
context switch) is probably _more_ efficient than using a kernel
thread and blocking it, and then using the kernel scheduler to
schedule another thread (or process).
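
As a hedged sketch of call conversion (thread_yield() stands in for a
hypothetical user space scheduler primitive; it is not an existing
interface), a "blocking" read might be implemented roughly as:

#include <sys/types.h>
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

extern void thread_yield(void);	/* hypothetical: run another user thread */

/*
 * A blocking read as seen by the calling thread, implemented as an
 * asynchronous submit plus user space context switches while the
 * request is in flight.  No kernel thread is parked behind this I/O.
 */
ssize_t
uthread_read(int fd, void *buf, size_t nbytes)
{
	struct aiocb cb;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = nbytes;
	cb.aio_sigevent.sigev_notify = SIGEV_NONE;

	if (aio_read(&cb) == -1)	/* queue the I/O; returns at once */
		return (-1);

	while (aio_error(&cb) == EINPROGRESS)
		thread_yield();		/* run someone else meanwhile */

	return (aio_return(&cb));
}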


So why not implement user space threads?

Somewhere along the way, some additional factors have cropped up;
these were not among the initial intended uses of threads, but threads
have since been used to solve the problems associated with these
factors, and it's generally a good design principle not to introduce
new interfaces when you have perfectly good interfaces that can be
used to solve the problem.  The new factors are:

1)	SMP scalability

	In order to scale a threaded process with the number of
	processors that a system has available, it's necessary
	to allow multiple processors into the same process
	context.

	This is partially a scheduler issue in the kernel, and
	partially a user space stack issue.

	The traditional answer to this was to introduce all sorts
	of N:M mapping schemes between multiple user space (N) and
	kernel space (M) threads and processors (P), where (N >= M)
	and (M <= P).

2)	Long latency interleave of non-asynchronous system calls
	using heavy-weight kernel context switches in order to
	achieve the interleave.

	In layman's terms, we set (M > P) in order to allow (M)
	outstanding blocking calls to exist from the user space
	process for (M) threads.

3)	Lazy-binding of kernel resources

	In order to lazy-bind kernel resources, such that the
	number of threads could be fluid, and so that we would
	not have to decide ahead of time how many potential
	blocking calls we would need, we allocate a minimum
	of 2 kernel threads initially, and allocate more as
	user space calls into the kernel block.  To achieve
	this, (2 <= M <= P + 1).  (There is a small sketch of
	this bookkeeping just after this list.)

	The extra kernel thread is used to signal a user space
	scheduler thread to create more kernel threads, if
	desired, when all kernel threads bound to user space
	calls in progress are currently blocked.  This is an
	anti-starvation strategy, which keeps runnable user
	space threads from being starved of kernel threads to
	utilize for blocking call contexts.

	Note: This is the currently preferred Solaris model.

4)	Abuse of kernel threads in order to compete as N kernel
	schedulable entities with respect to other processes in
	the system

	This is a dodge to avoid supporting or implementing
	scheduler classes for the class of problems that really
	need scheduler classes, in the hopes that a certain
	unfair competition ratio "will be enough".
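
To restate the relationships above in code form (purely illustrative;
none of these structures or names exist in any implementation), the
bookkeeping an N:M scheduler with lazy binding might keep, and the
growth rule it would enforce, look roughly like this:

/*
 * Illustrative only: the counters tracked by a hypothetical N:M user
 * space scheduler, and the lazy-binding rule 2 <= M <= P + 1, with
 * one kernel thread reserved for the "everyone is blocked" signal.
 */
struct nm_scheduler {
	int	n_user_threads;		/* N: user space threads */
	int	n_kernel_threads;	/* M: kernel schedulable entities */
	int	n_processors;		/* P: CPUs usable by this process */
};

static int
should_add_kernel_thread(const struct nm_scheduler *s, int n_blocked)
{
	/* Never exceed the lazy-binding ceiling: M <= P + 1. */
	if (s->n_kernel_threads >= s->n_processors + 1)
		return (0);
	/* Grow when every non-reserved kernel thread is blocked in a call. */
	return (n_blocked >= s->n_kernel_threads - 1);
}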

Of these, SMP scaling is the only truly compelling argument.

Long latency interleave is not a good argument; there are obvious
and significantly lighter weight methods of achieving this goal,
starting with POSIX async I/O.  While it's obviously a half-done
hack that only covers things that were "important" at the time
they were defining it, it at least points to a workable lower
overhead alternative.
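
For instance (a hedged sketch; the four-request batch and the buffer
size are arbitrary), a single context can keep several reads in flight
using nothing but aio_read() and aio_suspend(), with no extra kernel
threads at all:

#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

#define NREQ	4

/*
 * Sketch only: start NREQ reads, then reap them from a single context.
 * No thread, user or kernel, is parked behind any one of them.
 */
void
overlap_reads(int fds[NREQ], char bufs[NREQ][4096])
{
	struct aiocb cb[NREQ];
	const struct aiocb *list[NREQ];
	int i, pending;

	for (i = 0; i < NREQ; i++) {
		memset(&cb[i], 0, sizeof(cb[i]));
		cb[i].aio_fildes = fds[i];
		cb[i].aio_buf = bufs[i];
		cb[i].aio_nbytes = sizeof(bufs[i]);
		cb[i].aio_sigevent.sigev_notify = SIGEV_NONE;
		aio_read(&cb[i]);	/* all NREQ reads are now in flight */
		list[i] = &cb[i];
	}

	for (pending = NREQ; pending > 0; ) {
		aio_suspend(list, NREQ, NULL);	/* sleep until one finishes */
		for (i = 0; i < NREQ; i++) {
			if (list[i] != NULL &&
			    aio_error(list[i]) != EINPROGRESS) {
				aio_return(&cb[i]);	/* collect the result */
				list[i] = NULL;
				pending--;
			}
		}
	}
}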

The lazy-binding of kernel resources makes a couple of assumptions:

o	The implementation will be via kernel threads

o	Because the implementation will be via kernel threads,
	there is a need to minimize the impact of the way it
	will be implemented

St. Thomas Aquinas and Douglas Hofstadter would be very pleased
about this circular argument justifying the existence of its own
topic.  8-).

We can throw the abuse argument out.  POSIX supplies mechanisms
for dealing with this, including the idea of pluggable scheduling
classes, and FreeBSD already supports many of these.
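
For what it's worth, here is a hedged sketch of the POSIX route: a
process that genuinely needs a larger or more deterministic share of
the machine can ask for a realtime scheduling class directly, via
sched_setscheduler(), rather than spawning extra kernel threads to
compete for quantum:

#include <sched.h>
#include <stdio.h>

/*
 * Sketch only: request the POSIX round-robin realtime class for the
 * calling process (pid 0) instead of multiplying kernel threads to
 * win more of the machine.  Requires appropriate privilege.
 */
int
request_rr_class(void)
{
	struct sched_param sp;

	sp.sched_priority = sched_get_priority_min(SCHED_RR);
	if (sched_setscheduler(0, SCHED_RR, &sp) == -1) {
		perror("sched_setscheduler");
		return (-1);
	}
	return (0);
}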


This leaves us with:

	o	Allowing complex problems to be solved by
		procedural decomposition

	o	Lower context switch overhead

	o	Better utilization of quantum

	o	Cheaper inter-context communication mechanisms

	o	A way to reduce the number of processes that
		are taking up per-process resources on the
		system

	o	A way to increase outstanding I/O and other
		high latency operations' concurrency

	o	SMP scalability

as the true goals.


Julian has asked me several times to write up my threading model;
he's seen it on a white board, and I've been able to address all
of his concerns regarding it there.  I'll write that up for the
implementation phase of the discussion, as my proposal for an
implementation to address all of the stated design goals.



					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.







