Date: Thu, 4 Nov 1999 18:04:17 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: julian@whistle.com (Julian Elischer)
Cc: freebsd-arch@freebsd.org
Subject: Re: Threads goals version III
Message-ID: <199911041804.LAA18253@usr06.primenet.com>
In-Reply-To: <Pine.BSF.4.05.9910311908080.8816-100000@home.elischer.org> from "Julian Elischer" at Oct 31, 99 07:09:04 pm
I have been extremely busy at work, so I have not had time to contribute
to this discussion until now.  I actually got up early today so I could
do so...

> --- Possible system design goals of system threads support ---
> --- Note: just because something is in this list doesn't mean that
> it will be done, just that it's going to be carried forward to
> further discussion.
> --------------------------------------------------------------

I think there are some first principles missing here; the initial design
goals of threads, when they were first invented, were:

1)	A means of allowing programmers to build procedural solutions
	instead of state machines.

	This is really a programmer convenience, for programmers who
	know how to write linear code, but don't know how to do finite
	state automata, to enable them to write code to solve certain
	classes of problems that they would not otherwise be able to
	solve.

	While I agree that it's nice that these people can be
	productive members of society without having to ask "would you
	like fries with that?", it's not really the overriding goal,
	it's just a side effect (a desirable one, and one that probably
	shouldn't be avoided in order to solve the other problems
	threads were designed to solve).

	Let's call this "allowing complex problems to be solved by
	procedural decomposition at a slightly higher cost than we'd
	have solving them by Carnot decomposition".

2)	Something "lighter weight" than several processes written to
	cooperatively solve the same problem.

	This implies a lot of things, but the top ones are:

	o	Lower context switch overhead
	o	Better utilization of quantum
	o	Cheaper inter-context communication mechanisms
	o	A way to reduce the number of processes that are
		taking up per-process resources on the system
	o	A way to increase outstanding I/O and other high
		latency operations' concurrency

I think we can agree that user space threads have lower context switch
overhead than kernel space threads.
I think we can agree that a user space threads scheduler context switch
has better quantum utilization than a blocking kernel thread, which
gives up its quantum when it blocks.

I think we can agree that user/kernel is irrelevant, so long as the
address space is shared, in providing cheaper inter-context
communications.

I think we can agree that, when we talked about the number of processes
taking up per-process resources, we were really talking about kernel
scheduling contexts, and so user space threads better achieve this goal
than kernel space threads; by the same token, however, the resources
we are concerned about are less expensive than they used to be, and it
may be time to reconsider the tradeoff we were trying to make.

I think we can agree that, so long as the I/O and other high latency
operations can be scheduled to occur, it doesn't matter that, while
they are pending, something else is running, so that user space
call-conversion (sync call = async call + user space threads context
switch) is probably _more_ efficient than using a kernel thread and
blocking it, and then using the kernel scheduler to schedule another
thread (or process).

So why not implement user space threads?

Somewhere along the way, some additional factors have cropped up; these
were not the initial intended use of threads, but threads have been
used to solve the problems associated with them, and it's generally a
good design principle to not introduce new interfaces when you have
perfectly good interfaces that can be used to solve the problem.

The new factors are:

1)	SMP scalability

	In order to scale a threaded process with the number of
	processors that a system has available, it's necessary to
	allow multiple processors into the same process context.

	This is partially a scheduler issue in the kernel, and
	partially a user space stack issue.
	The traditional answer to this was to introduce all sorts of
	N:M mapping schemes between multiple user space threads (N),
	kernel space threads (M), and processors (P), where (N >= M)
	and (M <= P).

2)	Long latency interleave of non-asynchronous system calls,
	using heavy-weight kernel context switches in order to
	achieve the interleave.

	In layman's terms, we set (M > P) in order to allow (M)
	outstanding blocking calls to exist from the user space
	process for (M) threads.

3)	Lazy-binding of kernel resources

	In order to lazy-bind kernel resources, such that the number
	of threads could be fluid, and so that we would not have to
	think ahead of time about how many potential blocking calls
	we would need, we allocate a minimum of 2 kernel threads
	initially, and allocate more as blocking of user space calls
	into the kernel occurs.  To achieve this, (2 <= M <= P + 1).

	The extra kernel thread is used to signal a user space
	scheduler thread to create more kernel threads, if desired,
	since all kernel threads bound to user space calls in
	progress are currently blocked.  This is an anti-starvation
	strategy, which keeps runnable user space threads from being
	starved of kernel threads to utilize for blocking call
	contexts.

	Note: This is the currently preferred Solaris model.

4)	Abuse of kernel threads in order to compete as N kernel
	schedulable entities with respect to other processes in the
	system

	This is a dodge to avoid supporting or implementing scheduler
	classes for the class of problems that really need scheduler
	classes, in the hopes that a certain unfair competition ratio
	"will be enough".

Of these, SMP scaling is the only truly compelling argument.

Long latency interleave is not a good argument; there are obvious and
significantly lighter weight methods of achieving this goal, starting
with POSIX async I/O.  While it's obviously a half-done hack that only
covers things that were "important" at the time they were defining it,
it at least points to a workable lower overhead alternative.
The lazy-binding of kernel resources makes a couple of assumptions:

o	The implementation will be via kernel threads

o	Because the implementation will be via kernel threads, there
	is a need to minimize the impact of the way it will be
	implemented

St. Thomas Aquinas and Douglas Hofstadter would be very pleased about
this circular argument justifying the existence of its own topic.  8-).

We can throw the abuse argument out.  POSIX supplies mechanisms for
dealing with this, including the idea of pluggable scheduling classes,
and FreeBSD already supports many of these.

This leaves us with:

o	Allowing complex problems to be solved by procedural
	decomposition
o	Lower context switch overhead
o	Better utilization of quantum
o	Cheaper inter-context communication mechanisms
o	A way to reduce the number of processes that are taking up
	per-process resources on the system
o	A way to increase outstanding I/O and other high latency
	operations' concurrency
o	SMP scalability

as the true goals.

Julian has asked me several times to write up my threading model; he's
seen it on a white board, and I've been able to address all of his
concerns regarding it there.  I'll write that up for the implementation
phase of the discussion, as my proposal for an implementation to
address all of the stated design goals.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message