Date:      Mon, 18 Aug 1997 11:22:59 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        hoek@hwcn.org
Cc:        kd5ob@theshop.net, freebsd-hackers@FreeBSD.ORG
Subject:   Re: The low priority items
Message-ID:  <199708181822.LAA15482@phaeton.artisoft.com>
In-Reply-To: <Pine.GSO.3.96.970817212229.15871A-100000@james.freenet.hamilton.on.ca> from "Tim Vanderhoek" at Aug 17, 97 09:38:35 pm

> > I'm a little slow.  Did I read correctly that Threaded processes are
> > not supported?  And the next line down from that mentioned something
> > of the same, however, can I assume it is to deal with multiprocessor
> > CPU boards as well?
> 
> man 3 pthread
> 
> Current thread support isn't optimal, and there are multiple
> threading packages to choose from (threads aren't implemented at
> the kernel level), but it's there, and it's used.  :)

Kernel threading is not that big a win, unless it is a hybrid system,
or unless you are throwing iron at the problem (in the form of more
CPUs).


> > systems.  I never imagined that this system would have a problem with
> > threading.  I was hoping that the multiple processor issue would
> > have a higher priority than it does.  
> 
> Threading really isn't an incredible technological advantage.  The
> unique ability of UNIX that you're probably thinking of is to run
> "multiple processes", not "multiple processors".

Kernel threading buys the ability to scale a single multithreaded
process across multiple CPUs.  In other words, it determines the
SMP scalability of a single process.

A typical fault in most kernel threading implementations is that
a kernel thread is seen as a context for a blocking call; the
user issues blocking calls on kernel threads, and a thread context
switch occurs.  This is bad, since it means you are not fully
utilizing your quantum and you get to eat paging overhead.

There are some things you can do about this: thread group affinity,
for instance, which will preferentially switch to another thread in
the same process instead of to another process.  This plays a bit of
hell with CPU affinity, and the need to maintain cache coherency in
a multiprocessor environment.

The best answer currently out there is to use cooperative
scheduling, where a user space scheduler does call conversion
threading (just like pthreads, only hopefully with a bit less
overhead because of native support).

The other advantage of loosely coupling kernel threads (kernel
schedulable entities) and blocking calls from user space threads
(user work-to-do) is that, unlike traditional SVR4/Solaris kernel
to user space thread mappings, when there are more user space
threads than kernel space threads, you still don't get starvation.
Starvation occurs in SVR4/Solaris when there are user space threads
capable of being run, but all kernel threads are handling blocking
calls, and therefore there are no kernel threads available to run
the user space threads: the quantum was allocated, but the kernel
threads foolishly gave it away by blocking.

The current pthreads implementation is a pure call-conversion
implementation: a blocking call is converted into a non-blocking
call and a context switch.  Call conversion uses an entirely user
space threads scheduler, whose job it is to handle the context
switch (which includes stack switching and, on some RISC processors,
register window flushing, etc.).  This type of call conversion
has limited value because it applies only to system calls which
operate on file descriptors, and then only to calls where setting
the descriptor non-blocking causes the call to return incomplete.
The other problem with this type of threading is stack allocation:
stacks cannot grow automatically; they have to be preallocated.

An alternative implementation of a call conversion threading package
uses AIO (asynchronous I/O).  Typically (SunOS 4.1.x liblwp, for
instance) these implementations use the additional system calls
aioread, aiowrite, aiowait, and aiocancel.  This is superior to
the non-blocking FD implementation, in that it allows I/O to be
interleaved instead of terminated with an EWOULDBLOCK or EAGAIN.
It is inferior, in that it only applies to read/write operations.

In an ideal call-conversion scenario, there would be an async
call gate corresponding to the sync call gate used by normal
system calls.  To implement this would require a pool of stacks
and async request contexts that could be used to maintain all
outstanding blocking calls.  This is very similar to what VMS
has in terms of the context records for ASTs (async system traps),
but does not require a callback from the kernel to user space --
a major performance win.  You could easily improve this even
further by adding a flag value in the sysent structure to indicate
calls which could potentially block, and delaying the conversion
until a block would occur.  This "lazy binding" of contexts would
both reduce the number of outstanding contexts at any one time
and, in the case of a non-sleep completion, allow the call to
return immediately to the caller, having successfully completed.

Since a call context could be serviced on any processor, the
normal scheduler interface is sufficient to maximize concurrency
(there are minor issues of CPU affinity, which would be handled
via cooperative scheduling on "EINPROGRESS" returns...).


> Windows programmers often see threads as an absolute necessity
> because of the Windows history. UNIX programmers use and
> generally prefer fork(), but it's recognized that threads have
> their purpose (which is why they're supported :).

Windows is, historically, a cooperative multitasker built on a
voluntary preemption model.

In other words, Windows prior to Windows 95 was a single process
system, and all your applications were merely call-conversion
based threads within that single process.  This is why it's so
hard to understand the amount of time it has taken the Wine project
to get where it is today.

In effect, Windows had loadable kernel modules before everyone,
only they were called "Windows applications".

Even in Windows 95, if you run legacy applications, you are
running in a single process for all old applications.  The
process itself can be preempted by Windows 95's preemptive
multitasking scheduler, but within the process, preemption is
entirely voluntary.  The next time you are running a legacy
application on Windows 95, type the three fingered salute of
"CTRL-ALT-DEL" to get the "Close Program" (Task Manager)
window, and look for "Winoldapp"... that's the process.


> > From what I've read this Free BSD seems to support multiple sessions.
> > Yet I also read something about DLL's being supported, but I guess
> > you can't thread anything.  

You can.  You don't get "process attach" and "thread attach" type
event notifications into a main loop in the DLL, however.  You can
get the equivalent of "process attach/detach" events: these are
when the ctors/dtors are called, and you can hook these in a
library implementation.

The "thread attach/detach" is less meaningful, since objects are
not constructed in thread local storage in FreeBSD.  Why do you
have to call CoCreateFreeThreadedMarshaller() to move COM objects
between threads in the same process?  The address space is not
shared, and threads in Windows 95 are thinly disguised kernel
threads.  This saves them from fixing the stack growth problem the
right way (separate stack segment using page anonymity and guard
pages), but it means an object instanced in one thread is not
necessarily in the address space of another thread, even in the
same process.  This is somewhat masked by the use of "first
available kernel thread".


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


