Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 26 Jan 2003 00:35:54 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Steve Kargl <sgk@troutmask.apl.washington.edu>, Jeff Roberson <jroberson@chesapeake.net>, Robert Watson <rwatson@FreeBSD.ORG>, Gary Jennejohn <garyj@jennejohn.org>, arch@FreeBSD.ORG
Subject:   Re: New scheduler (#3)
Message-ID:  <3E339DEA.E1380FCE@mindspring.com>
References:  <20030125171217.D18109-100000@mail.chesapeake.net> <200301252320.h0PNKVoq090077@apollo.backplane.com> <200301252350.h0PNo6xO009489@apollo.backplane.com> <200301260114.h0Q1EXuu017546@apollo.backplane.com> <20030126013234.GA19891@troutmask.apl.washington.edu> <200301260218.h0Q2IlkX024483@apollo.backplane.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Matthew Dillon wrote:
>    I expect Jeff can fix the more obvious bugs in a few seconds.  Dealing
>    with the cpu-thread-stealing issue is a much harder problem.  I saw
>    the flip-flop and reported it, but did not try to fix it.


My fuzzy logic suggestion is to make the CPU migration model
"push" instead of "pull".  This also gets rid of the need to
lock in the scheduler, in all cases (whereas a pull model
requires locking each time).

The idea is that you create a process migration list for each
CPU, and when you migrate, you do it by "pushing" the process
onto the least-loaded CPU's migration queue (doing this requires
grabbing a queue lock, but checking that the queue is non-NULL
does not require grabbing a queue lock, so it is non-locking in
the common case, which is the non-migration case).

A "pull" model requires locking your queue all the time, since
it permits another CPU to access, which requires a lock to keep
someone from pulling from your queue while you are accessing
your queue.

By only permitting access to an exceptional queue, and not having
read access to the normal queue for other processes, you save
yourself from the locking of your own queue.

Relatively simple, and obvious, once explained.

The "flip flopping" is solved because "push" only occurs as a
result of relative load... in the "pull" case, you have a virtual
global queue, so there's no way to avoid it.  Basically, "pull"
prevents you from using a fuzzy difference in a "figure of merit"
indicating CPU load, and from which relative load can be calculated.

The only missing thing here is that the figure of merit needs to be
atomically readable without a lock -- sizeof(int), in the general
case -- which allows it to be calculated locally to the CPU in
question, and read (and acted upon) without a lock.  My suggestion
is to calculate "percentage load".  The easiest way to do this is
to keep a count of the number of processes on the ready-to-run
queue for a given CPU, and just use that number as the basis of a
moving average, which is multiplied by 100 (or 1000) to get the
integer figure-of-merit.

The hysteresis values, which you could allow to be set via sysctl,
provide high and low *differential* values, so that the flip-flopping
can be damped, as necessary, to get to the point where the effect no
longer impacts overall performance.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3E339DEA.E1380FCE>