Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 31 May 2007 01:19:16 -0700 (PDT)
From:      Jeff Roberson <jroberson@chesapeake.net>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: Updated rusage patch
Message-ID:  <20070531010631.N661@10.0.0.1>
In-Reply-To: <20070531091419.S826@besplex.bde.org>
References:  <20070529105856.L661@10.0.0.1> <200705291456.38515.jhb@freebsd.org> <20070529121653.P661@10.0.0.1> <20070530065423.H93410@delplex.bde.org> <20070529141342.D661@10.0.0.1> <20070530125553.G12128@besplex.bde.org> <20070529201255.X661@10.0.0.1> <20070529220936.W661@10.0.0.1> <20070530201618.T13220@besplex.bde.org> <20070530115752.F661@10.0.0.1> <20070531091419.S826@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 31 May 2007, Bruce Evans wrote:

> On Wed, 30 May 2007, Jeff Roberson wrote:
>
> % Index: kern/kern_exit.c
> % ===================================================================
> % RCS file: /usr/home/ncvs/src/sys/kern/kern_exit.c,v
> % retrieving revision 1.298
> % diff -u -p -r1.298 kern_exit.c
> % --- kern/kern_exit.c	14 May 2007 22:21:58 -0000	1.298
> % +++ kern/kern_exit.c	30 May 2007 00:10:21 -0000
> % @@ -169,7 +170,8 @@ retry:
> %  		 * Threading support has been turned off.
> %  		 */
> %  	}
> % -
> % +	KASSERT(p->p_numthreads == 1,
> % +	    ("exit1: proc %p exiting with %d threads", p, p->p_numthreads));
> %  	/*
> %  	 * Wakeup anyone in procfs' PIOCWAIT.  They should have a hold
> %  	 * on our vmspace, so we should block below until they have
>
> Lost some formatting.

The extra newline?  Isn't that a style violation?

>
> % @@ -438,15 +442,17 @@ retry:
> %  		PROC_UNLOCK(q);
> %  	}
> % % -	/*
> % -	 * Save exit status and finalize rusage info except for times,
> % -	 * adding in child rusage info later when our time is locked.
> % -	 */
> % +	/* Save exit status. */
> %  	PROC_LOCK(p);
> %  	p->p_xstat = rv;
> %  	p->p_xthread = td;
> % -	p->p_stats->p_ru.ru_nvcsw++;
> % -	*p->p_ru = p->p_stats->p_ru;
> % +	/*
> % +	 * All statistics have been aggregated into the final td_ru by
> % +	 * thread_exit(), just copy it into p_ru here.
> % +	 */
> % +	*ru = td->td_ru;
> % +	ru->ru_nvcsw++;
> % +	p->p_ru = ru;
> % %  	/*
> %  	 * Notify interested parties of our demise.
>
> Still have various bugs here, starting with the grammar error (comma
> splice) in the comment.  I now understand what you mean by thread_exit()
> having aggregated the stats -- thread_exit() hasn't been called yet
> for this thread, but it has been called for other threads in the process,
> if any.  So td_ru has aggregated stats, and accessing it directly just
> gives the old races for the unthreaded case:
> - td_ru is locked by sched_lock, but there is no sched_locking here.
> - any further updates of td_ru except for the times are not copied into
>  p_ru unless they occur during the above copy in which case they corrupt
>  the copy.
> These bugs can probably be fixed by moving the copying to the final
> thread_exit().  Maybe acct_process() can be moved there too, but I
> suspect too much of the process has gone away by then for acct_process()
> to work.

I tried moving the rusage summing and assignment to p_ru to later in this 
function.  This lead to a race which produced "runtime went backwards" 
printfs because runtime became 0.  I didn't quite understand this as we 
check for PRS_ZOMBIE in wait, but I also didn't investigate terribly far.

More notes on locking below.

>
> Oops, I now see that you removed all the (j)s and sched locking here.
> But this isn't quite right -- many of the fields are accessed by
> statclock(), so something like interrupt atomicity or splhigh() is
> required to access them.  The above (mainly in exit1()) gives much
> the same races as in RELENG_4:
> - RELENG_4: statclock() uses splhigh() but not interrupt atomicity.
>            exit1() uses no locking and thus races with statclock().
>  above: statclock() still uses sched_lock but not interrupt atomicity.
>         exit1() uses no locking and thus races with statclock().
> Time fields are mostly in rux and still fully locked by sched_lock.
> exit1() copies some of them to p_ru, but that copy is not used.  I
> think proc locking is still used for p_ru -- it is used in kern_wait(),
> where it seems to be necessary to prevent multiple threads in the
> parent process racing to reap the child.

I believe the sched lock locking is necessary for rux.  For rusage, 
however, it's only ever modified by curthread.  The only races that are 
present when reading without locks are potentially inconsistent rusage 
where the stats are not all from a single snapshot of the program. 
However, the stats can obviously change before the copy makes it back to 
user-space anyhow.  I don't see a real race that needs protecting.

This is why I believe the code in exit1() is also safe although 
technically it could lose some information on the way to thread_exit(). 
To fix this we'd have to do the final accumulation under sched_lock the 
last time we lock it.  However there are other races in the proc code 
there that were not immediately obvious.  Also, doing the accumulation 
here negates the possibility of running process accounting after p_ru is 
valid, which it is not doing now.

Related to rusage but unrelated to my patch;  Why are irxss, irdss, and 
irsss expressed in units of bytes*ticks of execution when rusage doesn't 
report the ticks of execution and rather a timeval?  Are consumers 
expected to sum the timevals and then extrapolate?

Jeff

>
> Bruce
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20070531010631.N661>