Date:      Mon, 13 Nov 2006 20:58:53 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        David Xu <davidxu@FreeBSD.org>
Cc:        cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/kern sched_4bsd.c
Message-ID:  <20061113193924.L75708@delplex.bde.org>
In-Reply-To: <200611130717.03734.davidxu@freebsd.org>
References:  <200611111311.kABDBVNH042993@repoman.freebsd.org> <200611120716.10773.davidxu@freebsd.org> <20061112170556.U71828@delplex.bde.org> <200611130717.03734.davidxu@freebsd.org>

On Mon, 13 Nov 2006, David Xu wrote:

> On Sunday 12 November 2006 14:26, Bruce Evans wrote:
>> ...
>> Testing showed that nothing much is fixed.  Simple benchmarks like:
>> ...
>> %%%
>> for i in 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> do
>>      nice -$i sh -c "while :; do echo -n;done" &
>> done
>> top -o time
>> %%%
>>
>> still show that scheduling without KSE is very unfair.  They can take
>> several minutes to start the `top' process, and "killall sh" can take
>> many seconds to start unless you have an rtprio shell to start the
>> killall.
>>
>> With KSE, the top process starts soon enough and shows just the old
>> 4BSD scheduler bug that too many cycles are given to niced programs,
>> as in all versions of FreeBSD except 4.x.  Since no one uses niced
>> programs, this bug is unimportant.
>
> It might not be a bug in NO_KSE; the problem is in sched_fork() and
> sched_exit().  For a process which quickly fork()s a child that then
> exits quickly, the parent's estcpu will quickly be doubled too, which
> is really unfair.

That can't be the problem, since there are no exits in the above.

> I think your example is the scenario; however, I don't know
> why KSE works better.  This might be fixed by remembering the inherited
> estcpu in the child and decaying it every second.  When the child exits,
> it adds the estcpu it really used to the parent.  The code looks like this:
>
> In sched_fork(), we remember the inherited estcpu:
> 	td->td_inherited_estcpu = parent->td_estcpu;
> In schedcpu(), we decay it every second (should be fixed in sched_wakeup
> too):
>        td->td_inherited_estcpu = decaycpu(loadfac, td->td_inherited_estcpu);
> In sched_exit():
>        parent->td_estcpu = ESTCPULIM(parent->td_estcpu +
> 		childtd->td_estcpu - childtd->td_inherited_estcpu);
>
> This should fix the quick fork()/exit() problem for the parent process.
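
A user-space model of the bookkeeping proposed above may make it easier to follow.  This is only a sketch under stated assumptions, not FreeBSD code: the struct thr_model type, the model_* functions and the ESTCPU_MAX value are hypothetical, and the decay uses an integer load average (estcpu * loadfac / (loadfac + 1), loadfac = 2 * load) instead of the kernel's fixed-point arithmetic.

```c
/*
 * Hypothetical model of the proposal: remember how much estcpu a child
 * inherited at fork, decay it alongside the child's own estcpu, and on
 * exit charge the parent only for the difference.
 */
#define ESTCPU_MAX 1000   /* hypothetical stand-in for the ESTCPULIM() bound */

struct thr_model {
	int estcpu;           /* recent cpu estimate */
	int inherited_estcpu; /* portion copied from the parent at fork */
};

static int
estcpulim(int e)
{
	return (e < ESTCPU_MAX ? e : ESTCPU_MAX);
}

/* 4BSD-style decay with an integer load average (assumption). */
static int
decaycpu(int loadfac, int cpu)
{
	return (loadfac * cpu / (loadfac + 1));
}

static void
model_fork(const struct thr_model *parent, struct thr_model *child)
{
	child->estcpu = parent->estcpu;
	child->inherited_estcpu = parent->estcpu;
}

/* Called once per simulated second for each thread. */
static void
model_schedcpu(struct thr_model *t, int loadfac)
{
	t->estcpu = decaycpu(loadfac, t->estcpu);
	t->inherited_estcpu = decaycpu(loadfac, t->inherited_estcpu);
}

static void
model_exit(struct thr_model *parent, const struct thr_model *child)
{
	int used = child->estcpu - child->inherited_estcpu;

	if (used < 0)         /* defensive: never reduce the parent's estcpu */
		used = 0;
	parent->estcpu = estcpulim(parent->estcpu + used);
}
```

For example, a parent at estcpu 100 whose child runs for 5 ticks and exits ends up at 105, whereas adding the child's whole estcpu back would have given 205.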

I've known about this bug since Peter Dufault told me about it in late
1999, and now use the code at the end of this mail to avoid it.  However,
I remembered it incorrectly and may have misdescribed it to you.  I
thought I remembered actual doubling, with estcpu soon reaching
"infinity", but the ESTCPULIM() clamp prevents it getting preposterously
high now, and I couldn't find any version that let it reach "infinity".
Versions before late 1999 had a bogus limit of UCHAR_MAX and that may
have been responsible for shells appearing to hang because it was a
better approximation to "infinity".
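
The doubling and the effect of the clamp can be reproduced with a few lines of arithmetic.  This sketch assumes the old exit rule (the child starts with a copy of the parent's estcpu and its whole estcpu is added back at exit), ignores decay between cycles, and uses a hypothetical ESTCPU_MAX in place of the real ESTCPULIM() bound, which is derived from the nice weights.

```c
#define ESTCPU_MAX 1000   /* hypothetical clamp; stands in for ESTCPULIM() */

static int
estcpulim(int e)
{
	return (e < ESTCPU_MAX ? e : ESTCPU_MAX);
}

/*
 * Old rule: the child inherits a copy of the parent's estcpu, and the
 * child's whole estcpu is added back at exit, so the inherited part is
 * counted twice.
 */
static int
old_fork_exit(int parent_estcpu, int child_used)
{
	int child_estcpu = parent_estcpu + child_used;

	return (estcpulim(parent_estcpu + child_estcpu));
}

/* n fork/exit cycles in which the child uses no cpu and nothing decays. */
static int
old_rule_cycles(int estcpu, int n)
{
	for (int i = 0; i < n; i++)
		estcpu = old_fork_exit(estcpu, 0);
	return (estcpu);
}
```

Starting from 10, three cycles give 80; seven would give 1280, but the clamp stops the parent at 1000.  With a limit as low as UCHAR_MAX, the clamped value is a correspondingly better approximation to "infinity".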

I now use the following:

% Index: sched_4bsd.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/kern/sched_4bsd.c,v
% retrieving revision 1.41
% diff -u -2 -r1.41 sched_4bsd.c
% --- sched_4bsd.c	21 Jun 2004 23:47:47 -0000	1.41
% +++ sched_4bsd.c	8 Dec 2005 11:11:52 -0000
% @@ -550,9 +641,20 @@
% 
%  void
% -sched_exit_ksegrp(struct ksegrp *kg, struct ksegrp *child)
% +sched_exit_ksegrp(struct ksegrp *parent, struct ksegrp *child)
%  {
% 
%  	mtx_assert(&sched_lock, MA_OWNED);
% -	kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + child->kg_estcpu);
% +	/*
% +	 * XXX adding all of the child's cpu to the parent's like we used to
% +	 * do would be wrong, since we duplicate the parent's cpu at fork
% +	 * time so adding it all back would give exponential growth.  In
% +	 * practice, the growth would have been limited by ESTCPULIM, but that
% +	 * would be wrong too since it is very nonlinear.  Splitting the cpu
% +	 * at fork time would be better, but adding it all back here would
% +	 * still give nonlinearities since multiple processes tend to
% +	 * accumulate more cpu than single ones.
% +	 */
% +	if (parent->kg_estcpu < child->kg_estcpu)
% +		parent->kg_estcpu = child->kg_estcpu;
%  }
%

This seems to work well enough in practice.  It grows the parent's estcpu
quite slowly if there are a lot of fork/exits.
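
To see why the growth is slow, consider repeated fork/exit cycles in which each child copies the parent's estcpu and then accumulates u ticks of its own.  Under the patch above the parent's new value is max(p, p + u) = p + u, so it grows linearly in the cpu the children actually used; under the old additive rule it would be 2p + u.  The sketch below assumes no decay between cycles; the helper names and ESTCPU_MAX value are hypothetical.

```c
#define ESTCPU_MAX 1000   /* hypothetical clamp; stands in for ESTCPULIM() */

static int
estcpulim(int e)
{
	return (e < ESTCPU_MAX ? e : ESTCPU_MAX);
}

/* Exit rule from the patch: the parent keeps the larger estcpu. */
static int
max_exit_rule(int parent_estcpu, int child_estcpu)
{
	return (parent_estcpu < child_estcpu ? child_estcpu : parent_estcpu);
}

/*
 * n fork/exit cycles: fork copies the parent's estcpu, the child then
 * accumulates child_used ticks, and nothing decays in between.
 */
static int
max_rule_cycles(int estcpu, int child_used, int n)
{
	for (int i = 0; i < n; i++) {
		int child_estcpu = estcpulim(estcpu + child_used);

		estcpu = max_exit_rule(estcpu, child_estcpu);
	}
	return (estcpu);
}
```

Five cycles with u = 2 starting from 10 give 20 here, compared with 382 under the additive rule; children that exit without running leave the parent unchanged.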

Previous versions did something different on fork too.  Splitting or
otherwise reducing estcpu on fork isn't such a good idea since it
reduces the limit on the real resource hogs -- all the children, when
there are lots of children that all want to run.  When the children
don't exit, hacking on the parent's estcpu doesn't help, and doubling
the child's estcpu on fork and halving it on exit is closer to being
correct than the reverse.

At least one of Peter Dufault's versions removed all explicit accesses
to p_estcpu on fork and exit.  I think the change on fork is only
cosmetic -- p_estcpu should have been automatically copied on fork.

Anyway, this isn't the bug in non-KSE.  I didn't look hard for the
reasons.  Top seemed to show the priorities of the hogs not decreasing
(numerically increasing) fast enough.

Bruce


