Date:      Fri, 13 Aug 1999 21:08:58 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        BMCGROARTY@high-voltage.com (Brian McGroarty)
Cc:        peter@netplex.com.au, alc@cs.rice.edu, blanquer@cs.ucsb.edu, freebsd-smp@FreeBSD.ORG
Subject:   Re: Re: Questions
Message-ID:  <199908132108.OAA18118@usr01.primenet.com>
In-Reply-To: <6E585390B950D31186D50008C7333C82@high-voltage.com> from "Brian McGroarty" at Aug 13, 99 03:11:00 pm

Brian McGroarty writes:
> Peter Wemm writes:
> > > Try asking John Dyson (dyson@iquest.net).  I think he has
> > > experimented with some limited forms of affinity scheduling.
> > 
> > I've done this BTW and have it currently running.  I've turned up a bug or
> > two that look awfully like something is changing p->p_priority of processes
> > while they are on run queues.
> > 
> > Even doing trivial affinity makes a big difference here.  Trivial meaning
> > that when selecting a process to run, walk the current run queue level and
> > find the first process with a matching lastcpu id rather than just the
> > first in the queue.  If no match, then take the head.  This is what John
> > did, but I rewrote setrunqueue, remrq in C and moved the process selection
> > out of i386/swtch.s and into C.  The compiler generates surprisingly similar
> > code to the assembler version, but when you turn on the U/V pipeline
> > scheduling and the cpu-specific code generation options (eg: use cmove etc)
> > then it seems to do slightly better than the original assembler code.
> > 
> > This dramatically simplifies the complexity of cpu_switch() and swtch.s
> > and moves the run queue management to MI code.  All that is left in
> > cpu_switch() is the actual context switch code.
>
> Does this selection mode skew overall processor allocation measurably?
> 
> I imagine allocation is severely skewed by roundrobin() between schedcpu()
> calls, but then the high p_estcpu bumps the favored task down a run level
> queue, compensating with a moment of being starved a processor.
> 
> The worst case would then be a high priority task with a goodly sized pool of
> others a few run queue levels down. The process would race and then bob down,
> float up as p_estcpu is averaged down through schedcpu() calls, soon climb a
> layer back up for its higher priority, then race and pitch downward again.

Yes, this is a highly suboptimal method of achieving affinity.
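
For concreteness, the trivial selection pass Peter describes might look
roughly like the sketch below.  The structure and function names are
illustrative only (assuming a TAILQ-based run queue level and a
p_lastcpu field), not the actual kernel symbols.

#include <sys/queue.h>

struct proc {
	TAILQ_ENTRY(proc) p_runq;	/* run queue linkage */
	int p_lastcpu;			/* CPU this process last ran on */
	/* ... */
};

TAILQ_HEAD(rq, proc);

/*
 * Scan one run queue level for a process that last ran on this CPU;
 * if there is no match, fall back to the head of the queue.
 */
static struct proc *
choose_affine(struct rq *q, int mycpu)
{
	struct proc *p;

	TAILQ_FOREACH(p, q, p_runq)
		if (p->p_lastcpu == mycpu)
			break;
	if (p == NULL)
		p = TAILQ_FIRST(q);	/* no match: take the head */
	if (p != NULL)
		TAILQ_REMOVE(q, p, p_runq);
	return (p);
}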

A more correct mechanism would use per-CPU ready-to-run queues.  Such
a method is also amenable to getting multiple processors into user
space within a single process, using a slightly modified user space
threads scheduler (i.e. one with a separate threads scheduler stack
for each CPU expected to enter the user space).

The other thing this buys you is the ability to migrate processes
between CPUs intentionally, taking only a per-queue spin lock rather
than the Big Giant Lock(tm).  This increases concurrency, and for
more than 2 CPUs it means contention in the kernel scheduler results
in a single stall instead of a global stall.

You still have to write some extra code to monitor instantaneous load
over time so you can decide when to migrate, and you have to keep a
last_cpu value around to decide which run queue to reinsert a process
on after a wakeup, but that's pretty trivial.
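
A minimal sketch of that arrangement is below, assuming one queue per
CPU guarded by its own spin lock.  The lock primitives, NCPU, and the
MIGRATE_SLACK heuristic are placeholders for illustration, not actual
kernel interfaces.

#include <sys/queue.h>

#define NCPU		2	/* assumed CPU count, for the sketch only */
#define MIGRATE_SLACK	2	/* how much busier last_cpu may be before migrating */

struct proc {
	TAILQ_ENTRY(proc) p_runq;	/* run queue linkage */
	int p_lastcpu;			/* CPU this process last ran on */
};

struct cpu_runq {
	int		rq_lock;	/* stands in for a real spin lock */
	int		rq_len;		/* instantaneous queue length */
	TAILQ_HEAD(, proc) rq_head;	/* must be TAILQ_INIT()ed at boot */
};

static struct cpu_runq runq[NCPU];

/* No-op stubs; a real kernel would use its atomic spin lock primitive. */
static void spin_lock(int *l)   { (void)l; }
static void spin_unlock(int *l) { (void)l; }

/*
 * On wakeup, prefer the queue of the CPU the process last ran on, and
 * migrate only if that queue is noticeably busier than the least
 * loaded one.  Only the chosen queue's lock is taken, never a global
 * lock; the load scan itself is lock-free, since it is only a
 * heuristic.
 */
static void
setrunqueue_affine(struct proc *p)
{
	int i, idlest, target;

	target = p->p_lastcpu;
	for (idlest = 0, i = 1; i < NCPU; i++)
		if (runq[i].rq_len < runq[idlest].rq_len)
			idlest = i;
	if (runq[target].rq_len > runq[idlest].rq_len + MIGRATE_SLACK)
		target = idlest;	/* migrate to the idler CPU */

	spin_lock(&runq[target].rq_lock);
	TAILQ_INSERT_TAIL(&runq[target].rq_head, p, p_runq);
	runq[target].rq_len++;
	spin_unlock(&runq[target].rq_lock);
}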

Also note that Peter's implementation is technically only machine
independent for SMP systems that implement a MESI coherency model;
this means it's unsuitable for most non-Intel architectures (e.g.
the PPC 603 dual processor boxes that remove the L2 cache and use
the cache control lines in order to implement only MEI coherency).

But this is true for most of the SMP implementation; e.g. there
are no hooks in the VM for signalling or cache flushing on systems
that don't support automatic cache writeback between processors.
Note that such systems, because coherency is ensured only when
necessary, are intrinsically less likely to cache bust, and are
likely to be able to scale to a (potentially much) larger number
of processors than the Intel multiprocessing design.



					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.





