Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 10 Oct 2002 18:26:00 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Luigi Rizzo <rizzo@icir.org>
Cc:        Jeff Roberson <jeff@FreeBSD.ORG>, <arch@FreeBSD.ORG>
Subject:   Re: Scheduler patch, ready for commit.
Message-ID:  <20021010171931.U8144-100000@gamplex.bde.org>
In-Reply-To: <20021009193501.A55534@carp.icir.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 9 Oct 2002, Luigi Rizzo wrote:

> On Wed, Oct 09, 2002 at 09:17:18PM -0400, Jeff Roberson wrote:
> > I haven't heard any objections to the goals of this patch.  I have cleaned
> > it up and readied it for commit.  This step is important so that I can
>
> well, you said you posted it just to get feedback, and now in 24 hours
> you declare it "ready for commit". A bit rushing, aren't you!

I agree.  I got it too late yesterday to respond then.

> I totally agree on the goals of the change (to abstract the scheduler
> from the base system, generalise it, etc.), but it seems to me that
> the specific details still need some cleanup before committing.
> And, well, maybe leave a little bit more time to people to provide feedback!
>
> So to come to the specific points:
>
>   * there is one important API function which in my opinion is
>     missing: -stable has a function, curpriority_cmp, which compares
>     the priority of a currently running "thing" (process/thread)
>     with that of a newly awaken one, and decises who has more right
>     to get the CPU.

This function is a mistake which never actually worked.

>     In -current this is done inline (by comparing priorities),

-current cleaned this up by restoring priorities to a simple totally
ordered scheme, but didn't actually restore the inline comparisons
that worked.  It still does the the main comparison in maybe_resched(),
and still gets it wrong by comparing wrong priorities (td->td_priority
vs. curthread->td_curpriority).  KSE has made this bug more obvious:
td is quite often curthread, so the comparision fails, and even when
td is not curthread, td->td_priority is often stale.  The correct
priority for at least the call to maybe_resched() from resetpriority()
is kg->kg_user_pri (was p->p_usrpri).  td->td_priority is set to
kg->kg_user_pri in schedclock() _after_ calling resetpriority() there,
so the bug only causes rescheduling delays of at most INVERSE_ESTCPU_WEIGHT
ticks (default 1/16 seconds).

maybe_resched() is a mistake too.  It is easier to get the comparison
correct by comparing the correct terms inline in the few places that
the comparsion is done than to pass the correct terms to maybe_resched().
KSE reduced the number of such places by uninlining "OPTIMIZED EXPANSION
OF setrunnable()".

>     but this is very specific of the scheduler used there -- e.g.
>     it doesn't  work for the scheduler used in -stable (where we have
>     3 different classes for rtprio, normal and idlepri processes),
>     and it does not work in general in cases where you could have
>     different metrics to decide who is going to proceed.

curpriority_cmp() and maybe_resched() are unsitable for APIs _because_
they are very is very scheduler-specific (non-broken schedulers
schedulers don't even have them :-).  However, I think totally ordered
priorities are more or less required for priority propagation to work
right.  The scheduling algorithm is unimportant for priority propagation
-- there just must be a way to make selected processes run in preference
to others so that priority inversion doesn't occur.  Schedulers can
work by mapping their decisions into generic td_priority values.  The
range of generic values should be larger than [0..UCHAR_MAX].

>   * Another API which should be made generic is forward_roundrobin().
>     I believe the purpose of a "generic" version of this function
>     is to dispatch a generic timeout to the appropriate CPU(s). In
>     the FreeBSD's scheduler this timeout happens to be the roundrobin
>     timeout, hence the name, but other schedulers (e.g. the one we
>     wrote) have different timeout routines, and the dispatching
>     requirements vary (e.g. could need to go to a single CPU as
>     opposed to all cpus).

forward_roundrobin() seems to be generic enough already.  It doesn't
decide the timeout.  It just reschedules the current thread on _all_
CPUs.  This is a bit too scheduler-dependent.  A generic version would
let the scheduler choose which threads to reschedule and just kick the
CPUs that these threads are running on.  But it is only an optimization
to avoid null rescheduling.  Schedulers can kick running threads directly
with different costs

I agree with most of Luigi's other points.  Just don't pessimize things
more than 0.01% using too many function calls and indirections.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20021010171931.U8144-100000>