From owner-freebsd-arch@FreeBSD.ORG  Wed May 30 19:04:27 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id A18D616A421
	for <freebsd-arch@freebsd.org>; Wed, 30 May 2007 19:04:27 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com
	[216.240.101.25])
	by mx1.freebsd.org (Postfix) with ESMTP id 5842513C45E
	for <freebsd-arch@freebsd.org>; Wed, 30 May 2007 19:04:27 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from [192.168.1.101] (c-71-231-138-78.hsd1.or.comcast.net
	[71.231.138.78]) (authenticated bits=0)
	by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id
	l4UJ4OHO074016
	(version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO);
	Wed, 30 May 2007 15:04:25 -0400 (EDT)
	(envelope-from jroberson@chesapeake.net)
Date: Wed, 30 May 2007 12:04:17 -0700 (PDT)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@10.0.0.1
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20070530201618.T13220@besplex.bde.org>
Message-ID: <20070530115752.F661@10.0.0.1>
References: <20070529105856.L661@10.0.0.1> <200705291456.38515.jhb@freebsd.org>
	<20070529121653.P661@10.0.0.1> <20070530065423.H93410@delplex.bde.org>
	<20070529141342.D661@10.0.0.1> <20070530125553.G12128@besplex.bde.org>
	<20070529201255.X661@10.0.0.1> <20070529220936.W661@10.0.0.1>
	<20070530201618.T13220@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: Updated rusage patch
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 30 May 2007 19:04:27 -0000

Ok, hopefully we're in the home stretch here.  Patch is at the same 
location.  I did the following:

1)  Renamed p_ru back.  I erroneously thought some code was accessing 
pstats indirectly through here.  That's why I changed the name.
2)  Cleaned up some more comments, casts, style bugs, etc.
3)  Fixed a problem where we migh've called callout_init_mtx() multiple 
times.
4)  Renamed ruxcollect since it doesn't do the same thing as rucollect().
5)  Resorted lim_cb() which now holds the sched_lock over much less code 
but is certainly safe as is.

Hopefully you will find this to your liking.  I intend to fix some more of 
the races in a follow-up commit when I change the scheduler locking.

Sorry for the top post.

Jeff

On Wed, 30 May 2007, Bruce Evans wrote:

> On Tue, 29 May 2007, Jeff Roberson wrote:
>
>> I have updated the patch at:
>> 
>> http://people.freebsd.org/~jeff/rusage3.diff
>
> New minor points about an even later update:
>
> % Index: kern/kern_exit.c
> % ===================================================================
> % RCS file: /usr/home/ncvs/src/sys/kern/kern_exit.c,v
> % retrieving revision 1.298
> % diff -u -p -r1.298 kern_exit.c
> % --- kern/kern_exit.c	14 May 2007 22:21:58 -0000	1.298
> % +++ kern/kern_exit.c	29 May 2007 21:38:21 -0000
> % ...
> % @@ -229,7 +233,7 @@ retry:
> %  	 */
> %  	EVENTHANDLER_INVOKE(process_exit, p);
> % % -	MALLOC(p->p_ru, struct rusage *, sizeof(struct rusage),
> % +	MALLOC(ru, struct rusage *, sizeof(struct rusage),
> %  		M_ZOMBIE, M_WAITOK);
> %  	/*
> %  	 * If parent is waiting for us to exit or exec,
>
> Perhaps this should not be micro-optimized for space (allocate it inside
> struct proc for the whole process lifetime).  Alternatively, inherit it
> from the first thread in the process that exits.
>
> % Index: kern/kern_resource.c
> % ===================================================================
> % RCS file: /usr/home/ncvs/src/sys/kern/kern_resource.c,v
> % retrieving revision 1.171
> % diff -u -p -r1.171 kern_resource.c
> % --- kern/kern_resource.c	27 May 2007 20:50:23 -0000	1.171
> % +++ kern/kern_resource.c	29 May 2007 22:06:05 -0000
> % @@ -619,6 +620,47 @@ setrlimit(td, uap)
> %  	return (error);
> %  }
> % % +static void
> % +lim_cb(void *arg)
> % +{
> % +	struct rlimit rlim;
> % +	struct thread *td;
> % +	struct proc *p;
>
> Unsorted decls.
>
> % +
> % +	p = (struct proc *)arg;
>
> Unnecessary cast.
>
> % +	PROC_LOCK_ASSERT(p, MA_OWNED);
> % +	/*
> % +	 * Check if the process exceeds its cpu resource allocation.  If
> % +	 * it reaches the max, arrange to kill the process in ast().
> % +	 */
> % +	mtx_lock_spin(&sched_lock);
> % +	FOREACH_THREAD_IN_PROC(p, td)
> % +		ruxcollect(&p->p_rux, td);
> % +	if (p->p_cpulimit != RLIM_INFINITY &&
>
> This should just retun if the limit is infinity.
>
> p_cpulimit is still marked as locked by sched_lock, but proc locking
> should be enough now, so sched locking (or whatever it becomes) is
> not needed for the early test and retrun.  The early return rarely
> happens but it gives simpler code as well as saving time when it
> happens.
>
> % +	    p->p_rux.rux_runtime > p->p_cpulimit * cpu_tickrate()) {
> % +		lim_rlimit(p, RLIMIT_CPU, &rlim);
> % +		if (p->p_rux.rux_runtime >= rlim.rlim_max * cpu_tickrate()) {
>
> Hmm, the tick rate conversion bug gives more than wrong stats.  Here it
> shoots processs.  E.g., suppose you have a cpulimit of 1 hour and you
> throttle the CPU to 8 times slower.  Processes that have run for > 7.5
> minutes at the old (higher) tick rate are then killed here.
>
> % ...
> % +	if (p->p_cpulimit != RLIM_INFINITY)
> % +		callout_reset(&p->p_limco, hz, lim_cb, (void *)p);
>
> Unnecessary cast.
>
> % Index: kern/kern_synch.c
> % ===================================================================
> % RCS file: /usr/home/ncvs/src/sys/kern/kern_synch.c,v
> % retrieving revision 1.295
> % diff -u -p -r1.295 kern_synch.c
> % --- kern/kern_synch.c	18 May 2007 07:10:45 -0000	1.295
> % +++ kern/kern_synch.c	29 May 2007 21:35:40 -0000
> % @@ -401,35 +401,17 @@ mi_switch(int flags, struct thread *newt
> % ...
> %  	/*
> %  	 * Compute the amount of time during which the current
> % -	 * process was running, and add that to its total so far.
> % +	 * thread was running, and add that to its total so far.
> %  	 */
> %  	new_switchtime = cpu_ticks();
> % -	p->p_rux.rux_runtime += (new_switchtime - PCPU_GET(switchtime));
> % -	p->p_rux.rux_uticks += td->td_uticks;
> % -	td->td_uticks = 0;
> % -	p->p_rux.rux_iticks += td->td_iticks;
> % -	td->td_iticks = 0;
> % -	p->p_rux.rux_sticks += td->td_sticks;
> % -	td->td_sticks = 0;
> % -
> % +	td->td_runtime += (new_switchtime - PCPU_GET(switchtime));
>
> Clean further by removing unnecesary parens.
>
> %  	td->td_generation++;	/* bump preempt-detect counter */
> % -
> % -	/*
> % -	 * Check if the process exceeds its cpu resource allocation.  If
> % -	 * it reaches the max, arrange to kill the process in ast().
> % -	 */
> % -	if (p->p_cpulimit != RLIM_INFINITY &&
> % -	    p->p_rux.rux_runtime >= p->p_cpulimit * cpu_tickrate()) {
> % -		p->p_sflag |= PS_XCPU;
> % -		td->td_flags |= TDF_ASTPENDING;
> % -	}
> % -
> %  	/*
> %  	 * Finish up stats for outgoing thread.
> %  	 */
>
> Clean further by merging the "start" and "finish" code (so that
> switchtime is cleared immediately after it is copied, etc).  There is
> nothing in between any more.
>
> Bruce
>