Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 13 Nov 2001 10:04:07 -0800 (PST)
From:      Julian Elischer <julian@elischer.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        arch@freebsd.org
Subject:   RE: Thread scheduling in the kernel
Message-ID:  <Pine.BSF.4.21.0111130929230.98845-100000@InterJet.elischer.org>
In-Reply-To: <XFMail.011112162429.jhb@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
(I notice you only comented on the first half, but that's a lot better
than the complete lack of interest from everyone else.....)

On Mon, 12 Nov 2001, John Baldwin wrote:

> 
> On 12-Nov-01 Julian Elischer wrote:
> > 
> > In an attempt to get the next part of the KSE work designed (design before
> > code you know.. a strange new concept) I've been trying to work out
> > the "correct" scheduling methods for such a system.
> > 
> > There are a few 'tricks' that need to be taken into account..
> > 
> > a few notes..
> > 
> > 
> > 1/ Since threads running a syscall hit 'sleep' events
> > the entities on teh sleep queues must be the  threads.
> > 
> > 2/ the entity that is scheduled onto the run queues is the KSE.
> > (as the name suggests).
> > 
> > 3/ If we have only one run queue, then KSEs for several processors
> > from the same process, may be on the same queue.
> > 
> > 4/  If threads 'wake up' they are hung of a list of runnable threads
> > somewhere. This list could be hanging off the process, or the KSE.
> > (actually more likely the KSEgroup than the process but...)
> 
> It should hang off the group.

This was my original idea.  However I ended up splitting that queue up so
that it was on each KSE and allowed a KSE with no work to steal work from
another. i.e. a virtual single queue, with KSE affinity. If I bind KSEs to
processors lightly, then I bind threads at the same time. (lightly)

The idea is that threads are put on the queue for the KSE on which they
last ran. Only when a KSE runs out of runnable threads on its own list and
still has teh CPU, will it try steal work from another in the same group.

The downside is that there is no overall priority between threads in a
group.. This is one thing I want o discuss... the queueing model.


> 
> > 5/ If a KSE reaches teh front of the queue, but the process
> > that is running is not that for which that KSE has some affinity,
> > does it get out of the way to allow another KSE in the queue
> > to get run? or does it just run and 'switch' everything over to the new
> > available processor? Maybe the scheduler looks for the KSE from the same
> > group, that was assigned to that processor, and runs that, leaving
> > the original KSE at the head of the queue? 
> > Maybe that happens until all the KSEs in the queue
> > that were from that group have been run? In this case it becomes possible
> > to always have a KSE from that group ready...
> 
> Actually, I would remove the concept of affinities from the KSE
> itself.  Rather I would let each thread have lastcpu like it does now,
> and when a KSE goes to choose a thread, it chooses one that has the
> lastcpu == current cpuid.

That is another possible way of tackling the problem.
How deep in the group's queue does the KSE look before it decides to just
take 'any' thread?

> 
> > Maybe the KSE-GROUP is what is put unto the run queue, and KSEs from that
> > group are put on all processors that look for work, until all of them 
> > have been run? (this would ensure that threads from the same process
> > would all be run at the same time which is sometimes good, and sometimes
> > bad, depending on the application.
> 
> I wouldn't do this.  I would just put KSE's on the queue's.  However, I think
> that KSE's actually can be even smaller than they are now.  AFAICT they are
> basically placeholders to sit on the runqueue's and not good for much else. :)

You may notice that that's approximatly what I have done now... It's a
"Kernel Schedulable Entity". It does leave some unfairness to the
advantage of processes that have multiple KSEs. The KSe's job is to be
scheduled on a run queue, and to provide a linkage point for other
elements. It doesn't have very much else in it. (maybe a state variable).


> 
> > 6/ When a Thread is made runnable it gets (in the present system) a
> > priority. What priority does a KSE in the run queues have when it has
> > threads of several differnt priorities? Do we sort them in priority order
> > and drop the priority of the KSE(group) as we go through them
> > until we have less priority than some other kse?
> 
> Actually, in theory the prioities are supposed to be per-KSE group
> right?  In that case, changign the priority of an individual thread
> for the purposes of priority propagation/inheritance or other
> shenanigans results in creating a new group for that thread.

Static priority inputs, (e.g. nice), yes.
It is quite possible tha the KSEs in the group might have private
priorities that diverge from this according to inputs from the 
threads they are running at that time....

My guess is that a kse from the group is elevated in priority
when a thread with elevated priority comes runnable.
This brings up questions of pre-emption.

> 
> > 7/ when a KSE runs out of work, how does it decide whether there is work
> > that should be stolen from a fellow KSE? How does processor affinity
> > effect this?
> 
> If the list is per-ksegroup, then you just make a first pass
> preferring threads that last ran on the current CPU.  If you don't
> find anything, you just grab the first thing on the list.

The queue might be quite long.. maybe only scan the first N entries...

> 
> > 8/ If we had per-processor scheduling queues, How would that effect it?
> > Which element get's put on the queues? Does a KSE
> > stay on the run queue if it has un=run threads, even when it's running?
> > How do we handle the arrival of new runnable threads with a KSE
> > when it's running but a fellow KSE is not runnable. Do we 
> > bump the priority of the other KSE and hand it the new threads?
> 
> I'm not sure how this fits in that model unless you bind KSE's to
> CPU's or something similar.  Only threads really have affinity, KSE's
> don't really care if they migrate as they have no execution context
> that gets affected.

If you can bind KSes to processors a bit and 
have an affinity to a particular KSE, then you reduce the amount
of work you have to do to select thte next thread to run.
It's a tradeoff. The selection might get very heavyweight if there are a
LOT of threads to select from. this could make a scheme that does a
selection between each thread to be run, rather unscalable. If we had the
affinity 'built in' to the structures/lists then it would be an order(1)
operation.. and more scalable.

(just an idea)

> 
> If the priorities are per-KSEgroup, then you get to assume that all threads in
> a group are equal in priority, which is true unless a particular thread
> temporiarly gets a bump from priority propagation or the process assigns a
> thread to a realtime priority or some such.

I don;t think that the priority of all teh threads are the same, but
rather, the priority ifor them all is based upon the same BASE priority
and statistics.. i.e. the KSEG collects recent CPU usage and
it's base priority degrades, taking all it's KSE's and threads with it.
However I think that when a thread wakes up with an elevated priority,
(as they do now) then a KSE needs to be boosted in priority to run it.
After that thread has returned to user mode, the next highest priority
in the list is run, etc. This sort-of suggests per-KSEG priority queues...
As the KSE runs lower and lower priority threads, it's own priority
could be lowered. When its priority is lower than another KSE
onthe run queues it loses the CPU.. The question is whether the returning
syscall follows the execution path back into userland. If it doesn't
immediatly, then it may never get to userland if it lowers
it's priority on another thread, and loses the CPU. This could lead
to a case where a proces has completed all it's syscalls but
is not able to proceed in userland.. Maybe this is what should happen,
but I doubt it..

[lots of my other comments missing.....]



> 
> -- 
> 
> John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
> 


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.21.0111130929230.98845-100000>