From owner-cvs-src@FreeBSD.ORG  Mon Nov 13 12:48:16 2006
Return-Path: <owner-cvs-src@FreeBSD.ORG>
X-Original-To: cvs-src@freebsd.org
Delivered-To: cvs-src@freebsd.org
Received: from localhost.my.domain (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP id DC9C416A403;
	Mon, 13 Nov 2006 12:48:15 +0000 (UTC)
	(envelope-from davidxu@freebsd.org)
From: David Xu <davidxu@freebsd.org>
To: Bruce Evans <bde@zeta.org.au>
Date: Mon, 13 Nov 2006 20:48:07 +0800
User-Agent: KMail/1.8.2
References: <200611111311.kABDBVNH042993@repoman.freebsd.org>
	<200611130717.03734.davidxu@freebsd.org>
	<20061113193924.L75708@delplex.bde.org>
In-Reply-To: <20061113193924.L75708@delplex.bde.org>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200611132048.08180.davidxu@freebsd.org>
Cc: cvs-src@freebsd.org, src-committers@freebsd.org, cvs-all@freebsd.org
Subject: Re: cvs commit: src/sys/kern sched_4bsd.c

On Monday 13 November 2006 17:58, Bruce Evans wrote:

> > It might not be a bug in the NO_KSE code. The problem is in sched_fork()
> > and sched_exit(): for a process that quickly fork()s a child which then
> > exits quickly, the parent's estcpu is doubled quickly too. This
> > fairness is really unfair.
>
> That can't be the problem, since there are no exits in the above.
>
I have tried the change I mentioned; with it, top runs quickly and the
system does not show the problem you described.

> > I think your example is that scenario; however, I don't know
> > why KSE works better. This might be fixed by remembering the inherited
> > estcpu in the child and decaying it every second. When the child exits,
> > it adds only the estcpu it really used to the parent. The code looks
> > like this:
> >
> > In sched_fork(), we remember the inherited estcpu:
> > 	td->td_inherited_estcpu = parent->td_estcpu;
> > In schedcpu(), we decay it every second (this should be fixed in
> > sched_wakeup too):
> >        td->td_inherited_estcpu = decay_cpu(loadfac, td->td_inherited_estcpu);
> > In sched_exit():
> >        parent->td_estcpu = ESTCPULIM(parent->td_estcpu +
> > 		childtd->td_estcpu - childtd->td_inherited_estcpu);
> >
> > This should fix the quick fork()/exit() problem for the parent process.
>
> I've known about this bug since Peter Dufault told me about it in late
> 1999, and now use the code at the end of this mail to avoid it.  However,
> I remembered it incorrectly and may have misdescribed it to you.  I
> thought I remembered actual doubling, with estcpu soon reaching
> "infinity", but the ESTCPULIM() clamp prevents it getting preposterously
> high now, and I couldn't find any version that let it reach "infinity".
> Versions before late 1999 had a bogus limit of UCHAR_MAX and that may
> have been responsible for shells appearing to hang because it was a
> better approximation to "infinity".
>
> I now use the following:
>
> % Index: sched_4bsd.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/kern/sched_4bsd.c,v
> % retrieving revision 1.41
> % diff -u -2 -r1.41 sched_4bsd.c
> % --- sched_4bsd.c	21 Jun 2004 23:47:47 -0000	1.41
> % +++ sched_4bsd.c	8 Dec 2005 11:11:52 -0000
> % @@ -550,9 +641,20 @@
> %
> %  void
> % -sched_exit_ksegrp(struct ksegrp *kg, struct ksegrp *child)
> % +sched_exit_ksegrp(struct ksegrp *parent, struct ksegrp *child)
> %  {
> %
> %  	mtx_assert(&sched_lock, MA_OWNED);
> % -	kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + child->kg_estcpu);
> % +	/*
> % +	 * XXX adding all of the child's cpu to the parent's like we used to
> % +	 * do would be wrong, since we duplicate the parent's cpu at fork
> % +	 * time so adding it all back would give exponential growth.  In
> % +	 * practice, the growth would have been limited by ESTCPULIM, but that
> % +	 * would be wrong too since it is very nonlinear.  Splitting the cpu
> % +	 * at fork time would be better, but adding it all back here would
> % +	 * still give nonlinearities since multiple processes tend to
> % +	 * accumulate more cpu than single ones.
> % +	 */
> % +	if (parent->kg_estcpu < child->kg_estcpu)
> % +		parent->kg_estcpu = child->kg_estcpu;
> %  }
> %
>
> This seems to work well enough in practice.  It grows the parent's estcpu
> quite slowly if there are a lot of fork/exits.
>
Yes, I knew about that patch.

> Previous versions did something different on fork too.  Splitting or
> otherwise reducing estcpu on fork isn't such a good idea since it
> reduces the limit on the real resource hogs -- all the children, when
> there are lots of children that all want to run.  When the children
> don't exit, hacking on the parent's estcpu doesn't help, and doubling
> the child's estcpu on fork and halving it on exit is closer to being
> correct than the reverse.
>
> At least one of Peter Dufault's versions removed all explicit accesses
> to p_estcpu on fork and exit.  I think the change on fork is only
> cosmetic -- p_estcpu should have been automatically copied on fork.
>
> Anyway, this isn't the bug in non-KSE.  I didn't look hard for the
> reasons.  Top seemed to show the priorities of the hogs not decreasing
> (numerically increasing) fast enough.
>
I still cannot find the bug, although I have read all the changes several
times.

> Bruce