Date: Mon, 18 Aug 1997 11:22:59 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: hoek@hwcn.org
Cc: kd5ob@theshop.net, freebsd-hackers@FreeBSD.ORG
Subject: Re: The low priority items
Message-ID: <199708181822.LAA15482@phaeton.artisoft.com>
In-Reply-To: <Pine.GSO.3.96.970817212229.15871A-100000@james.freenet.hamilton.on.ca> from "Tim Vanderhoek" at Aug 17, 97 09:38:35 pm

> > I'm a little slow.  Did I read correctly that Threaded processes are
> > not supported?  And the next line down from that mentioned something
> > of the same, however, can I assume it is to deal with multiprocessor
> > CPU boards as well?
>
> man 3 pthread
>
> Current thread support isn't optimal, and there are multiple
> threading packages to choose from (threads aren't implemented at
> the kernel level), but it's there, and it's used.  :)

Kernel threading is not that big a win, unless it is a hybrid system,
or unless you are throwing iron at the problem (in the form of more
CPUs).

> > systems.  I never imagined that this system would have a problem with
> > threading.  I was hoping that the multiple processor issue would
> > have a higher priority than it does.
>
> Threading really isn't an incredible technological advantage.  The
> unique ability of UNIX that you're probably thinking of is to run
> "multiple processes", not "multiple processors".

Kernel threading buys the ability to scale a single multithreaded
process across multiple CPUs.  In other words, it reflects on the
SMP scalability of single processes.

A typical fault in most kernel threading implementations is that a
kernel thread is seen as a context for a blocking call; the user
issues blocking calls on kernel threads, and a thread context switch
occurs.  This is bad, since it means you are not fully utilizing your
quantum and you get to eat paging overhead.

There are some things you can do about this, such as thread group
affinity, which will preferentially switch to another thread in the
same process instead of another process.  This plays a bit of hell
with CPU affinity, and the need to maintain cache coherency in a
multiprocessor environment.

The best answer currently out there is to use cooperative scheduling,
where a user space scheduler does call conversion threading (just like
pthreads, only hopefully with a bit less overhead because of native
support).
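
To make "call conversion" concrete: the idea is that a blocking call is
turned into a non-blocking call plus a switch to some other user space
thread.  What follows is only a minimal sketch of that idea, not the
actual pthreads code.  The names uthread_yield(), converted_read(),
make_nonblocking() and call_conversion_demo() are invented for
illustration, and the "scheduler" is a stand-in that simply lets one
pretend thread write to a pipe instead of doing a real stack switch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the user space scheduler.  A real package
 * would save this thread's stack and registers and resume another
 * thread; here the "other thread" is just one pending pipe write.
 */
static int pending_writer_fd = -1;  /* fd the pretend thread writes to */
static unsigned yield_count = 0;    /* how many times we gave up the CPU */

static void uthread_yield(void)
{
    yield_count++;
    if (pending_writer_fd != -1) {
        /* The pretend thread runs and produces the data we wait for. */
        ssize_t w = write(pending_writer_fd, "ping", 4);
        (void)w;
        pending_writer_fd = -1;
    }
}

/* Put a descriptor into non-blocking mode; returns -1 on failure. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * The converted call: looks like a blocking read() to its caller, but
 * never blocks in the kernel.  When the kernel reports that the read
 * would block, we switch to another user thread and retry afterwards.
 */
static ssize_t converted_read(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;               /* data or EOF */
        if (errno == EINTR)
            continue;               /* interrupted, just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* a real error */
        uthread_yield();            /* would block: run someone else */
    }
}

/* Returns the number of yields needed to complete one read, or -1. */
static int call_conversion_demo(void)
{
    int fds[2];
    char buf[8];
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;
    if (make_nonblocking(fds[0]) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    pending_writer_fd = fds[1];     /* the pretend thread's work-to-do */
    yield_count = 0;
    n = converted_read(fds[0], buf, sizeof buf);
    close(fds[0]);
    close(fds[1]);
    return (n == 4 && memcmp(buf, "ping", 4) == 0) ? (int)yield_count : -1;
}
```

Calling call_conversion_demo() reads from an empty pipe: the first
read() fails with EAGAIN, the wrapper yields, the pretend thread fills
the pipe, and the retry completes.  A real scheduler would of course
pick among many runnable threads rather than busy-retrying one call.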

The other advantage of loosely coupling kernel threads (kernel
schedulable entities) and blocking calls from user space threads (user
work-to-do) is that, unlike traditional SVR4/Solaris kernel to user
space thread mappings, when there are more user space threads than
kernel space threads, you still don't get starvation.

Starvation occurs in SVR4/Solaris when there are user space threads
capable of being run, but all kernel threads are handling blocking
calls, and therefore there are no kernel threads available to run the
user space threads: the quantum was allocated, but the kernel threads
foolishly gave it away by blocking.

The current pthreads implementation is a pure call-conversion
implementation: a blocking call is converted into a non-blocking call
and a context switch.  Call conversion uses an entirely user space
threads scheduler, whose job it is to handle the context switch (which
includes stack switching and, on some RISC processors, register window
flushing, etc.).

This type of call conversion has limited value because it applies only
to system calls which operate on file descriptors, and then only to
calls for which non-blocking mode causes the call to return
incomplete.  The other problem with this type of threading is stack
allocation; stacks cannot grow automatically, they have to be
preallocated.

An alternative implementation of a call conversion threading package
uses AIO (asynchronous I/O).  Typically (SunOS 4.1.x liblwp, for
instance) these implementations use the additional system calls
aioread, aiowrite, aiowait, and aiocancel.  This is superior to the
non-blocking FD implementation, in that it allows I/O to be
interleaved instead of terminated with an EWOULDBLOCK or EAGAIN.  It
is inferior, in that it only applies to read/write operations.

In an ideal call-conversion scenario, there would be an async call
gate corresponding to the sync call gate used by normal system calls.
To implement this would require a pool of stacks and async request
contexts that could be used to maintain all outstanding blocking
calls.  This is very similar to what VMS has in terms of the context
records for ASTs (async system traps), but does not require a callback
from the kernel to user space -- a major performance win.

You could easily improve this even further by adding a flag value in
the sysent structure to indicate calls which could potentially block,
and delaying the conversion until a block would occur.  This "lazy
binding" of contexts would not only reduce the total number of
outstanding contexts at any one time; in the case of a non-sleep
completion, the call could also return immediately to the caller,
having successfully completed.

Since a call context could be serviced on any processor, the normal
scheduler interface is sufficient to maximize concurrency (there are
minor issues of CPU affinity, which would be handled via cooperative
scheduling on "EINPROGRESS" returns...).

> Windows programmers often see threads as an absolute necessity
> because of the Windows history.  UNIX programmers use and
> generally prefer fork(), but it's recognized that threads have
> their purpose (which is why they're supported :).

Windows is, historically, a voluntary preemption model cooperative
multitasker.  In other words, Windows prior to Windows 95 was a single
process system, and all your applications were merely call-conversion
based threads within that single process.

This is why it's so hard to understand the amount of time it has taken
the Wine project to get where it is today.  In effect, Windows had
loadable kernel modules before everyone, only they were called
"Windows applications".

Even in Windows 95, if you run legacy applications, you are running in
a single process for all old applications.  The process itself can be
preempted by Windows 95's preemptive multitasking scheduler, but
within the process, preemption is entirely voluntary.

The next time you are running a legacy application on Windows 95, type
the three fingered salute of "CTRL-ALT-DEL" to get the "Close Program"
(Task Manager) window, and look for "Winoldapp"... that's the process.

> > From what I've read this Free BSD seems to support multiple sessions.
> > Yet I also read something about DLL's being supported, but I guess
> > you can't thread anything.

You can.  You don't get "process attach" and "thread attach" type
event notifications into a main loop in the DLL, however.

You can get the equivalent of "process attach/detach" events: these
are when the ctors/dtors are called, and you can hook these in a
library implementation.

The "thread attach/detach" is less meaningful, since objects are not
constructed in thread local storage in FreeBSD.

Why do you have to call CoCreateFreeThreadedMarshaller() to move COM
objects between threads in the same process?  The address space is not
shared, and threads in Windows 95 are thinly disguised kernel threads.
This saves them fixing the stack growth problem the right way
(separate stack segment using page anonymity and guard pages), but it
means an object instanced in one thread is not necessarily in the
address space of another thread, even in the same process.  This is
somewhat masked by the use of "first available kernel thread".

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.