Date: Mon, 27 Nov 2000 20:02:47 -0500
From: Dan Eischen <deischen@gdeb.com>
To: arch@freebsd.org
Cc: julian@elischer.org, jasone@canonware.com
Subject: Re: Threads (KSE etc) comments

I promised to post my thoughts on the User Thread Scheduler.  Here they are, hopefully in some presentable form.

--
Dan Eischen

----------------------------------------------------------------------

1. Overview

This is a discussion of some of the issues in the User Thread Scheduler in the FreeBSD KSE project.  It doesn't go into much detail on the design and implementation, but focuses more on the scheduling models and the (userland) API.

2. Definitions

Kernel Scheduled Entity (KSE) - The entity in which user threads are scheduled.  A process may have multiple KSEs in which to schedule threads.  A KSE may be viewed as a virtual processor by the UTS.  The number of KSEs available for threads with process contention scope may be limited to the actual number of processors in the system (TBD).

Kernel Scheduled Entity Group (KSEG) - A group of KSEs that, as outlined so far, contains the shared quantum and priority.  This is currently under discussion.

Scheduling Allocation Domain - The number of processors over which threads can be scheduled.  This really corresponds to how many KSEs are allocated to the process.

User Thread Scheduler (UTS) - Responsible for scheduling user-level threads over the available processors (KSEs).

3. Scheduling Models

Threads are scheduled according to their contention scope and allocation domain.  The contention scope, as defined by POSIX, is either system or process contention scope (PTHREAD_SCOPE_SYSTEM and PTHREAD_SCOPE_PROCESS respectively).

Each contention scope system thread is bound to its own private KSE.  This KSE has its own quantum and priority (if it needs a KSEG to obtain that, so be it) such that the thread competes on a system-wide basis with other system scope threads.  Contention scope system threads need not be part of any scheduling queue since there are no other threads in competition for processing time on that KSE.

Contention scope process threads are more interesting in that there are several possible scheduling schemes depending on the allocation domain and on the number of quanta granted by the kernel (not including quanta granted for scope system threads).
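For reference, the contention scope itself is selected through the standard POSIX thread attribute interface.  A minimal sketch (plain POSIX, nothing KSE-specific; under the model above, the system scope request is what would give a thread its own KSE):

    /*
     * Minimal sketch: selecting a thread's contention scope with the
     * standard POSIX attribute interface.  The default (process scope)
     * leaves the thread to the UTS.
     */
    #include <pthread.h>
    #include <stdio.h>

    static void *
    worker(void *arg)
    {
        return (arg);
    }

    int
    main(void)
    {
        pthread_attr_t attr;
        pthread_t td;

        pthread_attr_init(&attr);
        if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0)
            fprintf(stderr, "system scope not supported\n");
        pthread_create(&td, &attr, worker, NULL);
        pthread_join(td, NULL);
        pthread_attr_destroy(&attr);
        return (0);
    }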
3.1 Single Queue, Single Allocation Domain

This is the common case where there is only 1 CPU and only 1 quantum granted to handle scope process threads.  All runnable threads are placed in the same queue and compete for processor time on one schedulable entity (with 1 quantum).

3.2 Single Queue, Multiple Allocation Domain

Here we have N schedulable entities over which threads can be executed.  All runnable threads are placed in the same queue and are scheduled onto the N schedulable entities as they become available.  When a thread blocks or completes in a schedulable entity, another thread is pulled from the single run queue for execution.

If a KSE is preempted while running a thread, and in lieu of any hint from the application as to the binding of threads to KSEs, the UTS would only resume that thread on the next KSE if its priority was (strictly) greater than the priority of the next thread in the run queue, or if the preempted thread was in a critical region.

Whether or not the schedulable entities have their own quantum is not known at this point, but under the current design it is possible (the UTS could create N KSEG/KSE pairs to allow this instead of N KSEs within 1 KSEG).

3.3 Multiple Queue, Multiple Allocation Domain

Again we have N schedulable entities over which threads can be executed, but in this case there are also multiple (up to N) scheduling queues.  In this model, there may exist a run queue for each schedulable entity, and threads may be bound to a particular entity.  As with the single queue model above, whether or not the schedulable entities have their own quantum is not yet known, but it is possible with the current design.

How does the UTS decide which threads get bound to each of the N schedulable entities?  At the least, the application should have the ability to decide this.  In lieu of any hint from the application, the UTS could also provide some method of (soft) binding threads to the N schedulable entities and optimize for maximum CPU utilization and minimum thread reassignment.  My thought is that we concentrate on keeping it simple for now, and allow for this possibility later.

For the case where not all threads are bound to a specific KSE by the application, there could be a global run queue from which unbound threads are taken.  In this case there would be as many as N+1 scheduling queues, with each KSE taking a peek at the priority of the global run queue before deciding to take a thread from its own run queue.  The global run queue would not have to be locked (unless adding/removing a thread), so this would only add the overhead of a couple of instructions to examine its priority.  You might also have KSEs that don't have any bound threads, in which case they wouldn't have a run queue and would always obtain threads from the global run queue.

Yet another option would be to disallow binding of threads to the original (main) KSE and only allow binding to other KSEs.  All unbound threads would be executed on the main KSE (and any other KSE which does not have bound threads).  Any KSE that has bound threads would only execute those threads.
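As a rough illustration of the N+1 queue idea in 3.3, the per-KSE selection might look something like the sketch below.  Everything here (uthread, runq, kse_choose_next) is invented for the example and assumes a larger numeric value means higher priority; this is not actual UTS code.

    #include <stddef.h>

    /* Invented, minimal types for illustration only. */
    struct uthread {
        int prio;                   /* larger value == runs first */
        struct uthread *next;       /* queue is kept sorted by prio */
    };

    struct runq {
        struct uthread *head;
    };

    /* Peek at the best priority without dequeueing (no lock needed). */
    static int
    runq_top_prio(const struct runq *rq)
    {
        return (rq->head != NULL ? rq->head->prio : -1);
    }

    /* Dequeue the best thread; a real UTS would lock the global queue here. */
    static struct uthread *
    runq_get(struct runq *rq)
    {
        struct uthread *td = rq->head;

        if (td != NULL)
            rq->head = td->next;
        return (td);
    }

    /*
     * "N+1 queue" selection: a KSE with bound threads peeks at the
     * global (unbound) queue and only takes from it when the best
     * unbound thread out-ranks the best bound thread.
     */
    static struct uthread *
    kse_choose_next(struct runq *bound, struct runq *global)
    {
        if (runq_top_prio(global) > runq_top_prio(bound))
            return (runq_get(global));
        return (runq_get(bound));
    }

The point is only that peeking at the global queue is a compare on its top priority; its lock would be taken only when a thread is actually added or removed.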
4. API

For the most part, the POSIX API is sufficient for our needs.  But if we want to allow application control of how threads are assigned and scheduled on the KSEs, we could define the following set of interfaces:

pthread_setconcurrency() - This is the POSIX interface to set the concurrency level.  It requests the desired concurrency level and informs the UTS as to how many KSEs are to be requested from the kernel (which the kernel may limit).  Whether or not this also allows additional quantum remains to be seen.  Certainly, the UTS could create an additional KSEG/KSE pair for each level of concurrency above 1 in order to achieve additional quantum under the current design.  This function does not change/reflect the schedulable entities for system scope threads.  The limited concurrency level is returned.  If we wanted this routine to act the same as it does under Solaris, then it would actually request the number of entities with quantum and priority (KSEGs as currently defined).

thr_create(...) - This would be an alternative to using pthread_setconcurrency() to set the number of KSEs.  This function allows an application to specify additional attributes for thread creation.  Solaris allows additional flags to be specified, notably THR_NEW_LWP and THR_BOUND.  The effect of specifying THR_BOUND is the same as specifying PTHREAD_SCOPE_SYSTEM, but specifying THR_NEW_LWP (and omitting THR_BOUND) allocates an additional LWP that can be used to schedule unbound (scope process) threads.  We could provide a similar flag, THR_NEW_KSE (or THR_NEW_KSEG), to tell the UTS to request an additional KSE (or KSEG) to be used to schedule scope process threads.  I'm not too keen on this interface as an alternative to pthread_setconcurrency(), but perhaps it has some merit if we want a Solaris-like API.

_kse_self() - Returns the current KSE ID, where the integer ID ranges from 0 (for the original KSE of the main process) to M-1 (where M is the total number of KSEs in the process).  Solaris provides a similar function, _lwp_self().

pthread_bind_np(pthread_t pthread, int kse_id) - Binds the given thread to the specified KSE.  A kse_id of -1 refers to the current KSE.  I suppose this could also be called _kse_bind() depending on how you look at it.

_kse_bind(int kse_id, int processor) - Binds a KSE to a particular processor.  This might also be called _cpu_bind() if you use _kse_bind instead of pthread_bind_np.  Solaris has processor_bind(), which can handle both LWPs and PIDs, and pset_bind(), which allows binding of LWPs or PIDs to processor sets.  This is probably getting a little ahead of ourselves, but it is something to think about anyway.

For a moment, let's make the assumption that each KSE has a priority and quantum, or that we always use a KSEG with one KSE to achieve the same effect.  We _could_ now present an interface that is very similar to that provided by Solaris.  True, perhaps a KSE (or KSEG) is not as heavy as an LWP in Solaris, but that is just an internal implementation issue.  To the application, they look like very much the same thing.  If we provide an API that is very similar to Solaris's, porting Solaris applications would be trivial.  Again, something to think about.
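To make the proposal concrete, the declarations might look roughly like this.  None of these interfaces exist yet; the thr_create() signature simply mirrors Solaris's, and the flag values and argument types are guesses for illustration only.

    /*
     * Hypothetical declarations only; none of these interfaces exist
     * yet, and the flag values and argument types are guesses.
     */
    #include <pthread.h>
    #include <stddef.h>

    /* Solaris-style creation flags (names taken from the proposal). */
    #define THR_BOUND    0x0001  /* same effect as PTHREAD_SCOPE_SYSTEM */
    #define THR_NEW_KSE  0x0002  /* ask the UTS for one more KSE (or KSEG) */

    /* Alternative creation interface, mirroring Solaris thr_create(). */
    int thr_create(void *stack_base, size_t stack_size,
        void *(*start_func)(void *), void *arg, long flags,
        pthread_t *new_thread);

    int _kse_self(void);                            /* current KSE ID, 0..M-1 */
    int pthread_bind_np(pthread_t pthread, int kse_id); /* -1 == current KSE */
    int _kse_bind(int kse_id, int processor);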
5. Interaction of Existing Scheduling Interfaces

We currently have the following interfaces that affect process scheduling:

    setpriority()
    rtprio()
    sched_setparam()
    sched_setscheduler()
    any others?

In a threaded process, I think these should operate on the entity that contains the quantum and priority, not on the process.  Whether that is a KSE or a KSEG, I don't know.  If it is a KSEG, then that's the only case I can see for forcing the threads library to know anything about KSEGs.  Still, the kernel is responsible for setting these priorities, not the UTS, so it wouldn't be strictly necessary for the UTS to have any knowledge of KSEGs.

6. Summary

I'd like some resolution as to what interfaces the threads library should provide to the application.  I've outlined some of my thoughts above and I'd like some feedback.

My biggest question is whether we want to provide the ability for a threaded application to request more scheduling time (aside from PTHREAD_SCOPE_SYSTEM threads).  I've already seen applications that always use PTHREAD_SCOPE_SYSTEM when creating threads, I suppose mostly to obtain as much CPU time as possible.  At USENIX, Jason and I attended a BOF on threads, and it was kind of amazing to me that folks seemed to prefer the LinuxThreads model.  Given this attitude, I don't think it makes sense to try to restrict an application to only 1 quantum (or N, where N = the number of CPUs); we'll just end up with applications that always use system scope threads.

Do we want to provide a method of binding threads to KSEs, and KSEs to processors?  Binding threads to KSEs isn't really that hard to implement in the UTS, and I wouldn't think it too difficult for the kernel to bind KSEs to processors either (?).  Some KSEs may be automatically bound to processors while others might not be: for instance, KSEs allocated for system scope threads, or KSEs allocated (for process scope threads) above and beyond the number of CPUs (assuming we allow this).

I'd like to resolve these issues (and any others?) very soon so I can concentrate on more of the UTS details (like the communication channel between the kernel and the UTS).  At some point it may be worthwhile to have a telecon or an IRC session (never tried it), because resolving this over the mailing list could take too long.
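To illustrate the binding question, this is how an application might combine these interfaces if we adopted them.  Only pthread_setconcurrency() and the other pthread calls are standard; pthread_bind_np() and _kse_bind() are the proposed (still hypothetical) interfaces from section 4.

    /*
     * Hypothetical usage only: ask for a second KSE, keep one worker
     * thread on that KSE, and pin the KSE to processor 1.  The two
     * binding functions are proposed interfaces and do not exist yet.
     */
    #include <pthread.h>

    int pthread_bind_np(pthread_t pthread, int kse_id);    /* proposed */
    int _kse_bind(int kse_id, int processor);               /* proposed */

    static void *
    worker(void *arg)
    {
        return (arg);               /* ... do work ... */
    }

    int
    main(void)
    {
        pthread_t td;

        pthread_setconcurrency(2);  /* request a second KSE from the UTS */
        pthread_create(&td, NULL, worker, NULL);
        pthread_bind_np(td, 1);     /* run this thread only on KSE 1 */
        _kse_bind(1, 1);            /* and keep KSE 1 on processor 1 */
        pthread_join(td, NULL);
        return (0);
    }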