Date:      Thu, 03 Jan 2002 14:57:04 -0800 (PST)
From:      John Baldwin <jhb@FreeBSD.org>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Peter Jeremy <peter.jeremy@alcatel.com.au>, Michal Mertl <mime@traveller.cz>, Bruce Evans <bde@zeta.org.au>, Mike Smith <msmith@FreeBSD.ORG>, Bernd Walter <ticso@cicely8.cicely.de>, arch@FreeBSD.ORG, Alfred Perlstein <bright@mu.org>, Matthew Dillon <dillon@apollo.backplane.com>, Bernd Walter <ticso@cicely9.cicely.de>
Subject:   Re: When to use atomic_ functions? (was: 64 bit counters)
Message-ID:  <XFMail.020103145704.jhb@FreeBSD.org>
In-Reply-To: <3C34C27C.C7FE6713@mindspring.com>


On 03-Jan-02 Terry Lambert wrote:
> ---
>   Example application and use of variable accuracy counts in
>   a non-locking scheduler with CPU migration, negaffinity,
>   and variable scheduling policy
> ---
> 
> Basically, you would not need extremely accurate load numbers
> in order to make CPU load balancing decisions, if you did the
> load balancing "correctly":
> 
> 1)    Per CPU run queues are created
> 2)    Processes do not migrate themselves between run queues
> 3)    A single "figure of merit" is kept on a per CPU basis,
>       where other CPUs can see it; their view of this figure
>       potentially contains latency in its accuracy
> 4)    An infrequent (relative to the scheduling quantum)
>       timer averages "CPUs of interest" (this permits NUMA
>       and cluster based migration in the future, if desired, by
>       keeping per CPU set statistics), and
>       stores this, also, as a "figure of merit" which is
>       accessible to the other CPUs.  It might also keep a
>       list of M CPUs, in sorted order of lowest to highest
>       values, to make migration CPU target selection easier.
> 5)    When a CPU is ready to pull a task off its run queue,
>       perhaps every N iterations of this procedure, it takes
>       its "figure of merit" and compares it against the average
>       "figure of merit"; if the per CPU number is not some
>       configurable "high water mark" over the average, then
>       nothing happens: it simply schedules the task
> 6)    If it decides action needs to be taken, then it polls
>       the figures of merit for its interest group.  The CPU
>       with the lowest "figure of merit" is selected; this
>       may be mediated by negaffinity; for example, if the
>       task is a thread in a thread group (process), it might
>       have a 32 bit value in the process structure indicating
>       which CPUs have threads on them currently.  In that case,
>       only a CPU without its bit set in the 32 bit value could be
>       selected, based on relative figure of merit (see the sketch
>       after this list).
> 7)    The "scheduler insertion queue" for the selected target
>       CPU (a simple linked list) has the task added to it (the
>       task is migrated).  This requires a lock on the queue;
>       note that only one other CPU is tied up in this process,
>       in an N CPU system, and the statistical likelihood of a
>       collision is very low.
> 8)    Each CPU going into its scheduler looks first at its per
>       CPU scheduler insertion queue; this look does not require
>       a lock, in most instances, since all it is doing is a
>       check to see if a pointer is NULL.  If the pointer is not
>       NULL, then (and only then) is a lock on the queue
>       asserted by the CPU that "owns" it, and the element is
>       dequeued, the lock released, and run.
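
For concreteness, here is a minimal C sketch of steps 3) through 6) above.
None of this is real kernel code; the names (cpu_merit, merit_avg,
MERIT_HIWAT, and so on) are made up purely for illustration, and the
"figure of merit" is just an unsigned integer:

#include <stdint.h>

#define MAXCPU          32
#define MERIT_HIWAT     4       /* configurable high water mark, step 5) */

/* Step 3): one load figure per CPU, readable (possibly stale) by all. */
static volatile uint32_t        cpu_merit[MAXCPU];

/* Step 4): updated by an infrequent timer that averages the group. */
static volatile uint32_t        merit_avg;

/* Step 5): are we far enough over the average to bother migrating? */
static int
should_migrate(int mycpu)
{
        return (cpu_merit[mycpu] > merit_avg + MERIT_HIWAT);
}

/*
 * Step 6): pick the least loaded CPU whose bit is not already set in
 * the process's negaffinity mask (one bit per CPU already running a
 * thread of this process).  Returns -1 if no CPU qualifies.
 */
static int
pick_target_cpu(uint32_t negaffinity, int ncpus)
{
        uint32_t best_merit = ~0U;
        int cpu, best = -1;

        for (cpu = 0; cpu < ncpus; cpu++) {
                if (negaffinity & (1U << cpu))
                        continue;       /* sibling thread already there */
                if (cpu_merit[cpu] < best_merit) {
                        best_merit = cpu_merit[cpu];
                        best = cpu;
                }
        }
        return (best);
}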

Actually, if another CPU can write to this queue, which 7) seems to indicate,
then you do need a lock.  Well, if you can guarantee that the entire word is
always read and written consistently you may be able to get away with the
unlocked check, but you can still read a stale value (seeing NULL just after
another CPU has written to the queue) and miss a task.  I suppose that is a
losable race, since you will pick the task up on the next switch.  When you do
grab a task off your local queue, however, you _will_ need a lock.  Otherwise
another CPU could be handing you a task at the same time, and depending on how
that worked out you could end up with a task that gets "lost" and never runs,
or, worse, a corrupted list.  Thus, you always need at least one lock when
getting the next task to run.  However, that lock can be a per-CPU lock, which
will be contended for far less than a lock on a system-wide queue.

> The point of this example is that the issue isn't about how much
> or how little locks cost, but about designs that don't require
> locks in the common case.

Yes, optimizing is good, but getting the design right first is much more
important.  Optimizations can wait until the design is closer to final.
Premature optimization can lock you into bad design decisions that you end up
regretting later on.

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
