From owner-freebsd-current@FreeBSD.ORG  Wed Dec  8 02:54:07 2010
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8E2C0106566B;
	Wed,  8 Dec 2010 02:54:07 +0000 (UTC)
	(envelope-from davidxu@freebsd.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 604A58FC16;
	Wed,  8 Dec 2010 02:54:07 +0000 (UTC)
Received: from xyf.my.dom (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oB82s5nx082060;
	Wed, 8 Dec 2010 02:54:06 GMT (envelope-from davidxu@freebsd.org)
Message-ID: <4CFEF34D.1030701@freebsd.org>
Date: Wed, 08 Dec 2010 10:54:05 +0800
From: David Xu <davidxu@freebsd.org>
User-Agent: Thunderbird 2.0.0.24 (X11/20100630)
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
References: <20101205231829.GA68156@troutmask.apl.washington.edu>
	<201012060944.03196.jhb@freebsd.org> <4CFD7BB0.20500@freebsd.org>
	<201012070913.17118.jhb@freebsd.org>
In-Reply-To: <201012070913.17118.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-current@freebsd.org, Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: Process accounting/timing has broken recently
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Dec 2010 02:54:07 -0000

John Baldwin wrote:
> On Monday, December 06, 2010 7:11:28 pm David Xu wrote:
>> John Baldwin wrote:
>>> On Sunday, December 05, 2010 6:18:29 pm Steve Kargl wrote:
>>>   
>>>> Sometime in the last 7-10 days, some one made a
>>>> change that has broken process accounting/timing.
>>>>
>>>> laptop:kargl[42] foreach i ( 0 1 2 3 4 5 6 7 8 9 )
>>>> foreach? time ./testf
>>>> foreach? end
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        69.55 real        38.39 user        30.94 sys
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        68.82 real        40.95 user        27.60 sys
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        69.14 real        38.90 user        30.02 sys
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        68.79 real        40.59 user        27.99 sys
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        68.93 real        39.76 user        28.96 sys
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        68.71 real        41.21 user        27.29 sys
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        69.05 real        39.68 user        29.15 sys
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        68.99 real        39.98 user        28.80 sys
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        69.02 real        39.64 user        29.16 sys
>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>        69.38 real        37.49 user        31.67 sys
>>>>
>>>> testf is a numerically intensive program that tests the
>>>> accuracy of expf() in a tight loop.  User time varies
>>>> by ~3 seconds on my lightly loaded 2 GHz core2 duo processor.
>>>> I'm fairly certain that the code does not suddenly grow/loose
>>>> 6 GFLOP of operations.
>>>>     
>>> The user/sys thing is a hack (and has been).  We sample the PC at stathz (~128 
>>> hz) to figure out a user vs sys split and use that to divide up the total 
>>> runtime (which actually is fairly accurate).  All you need is for the clock 
>>> ticks to fire just a bit differently between runs to get a swing in user vs 
>>> system time.
>>>
>>> What I would like is to keep separate raw bintime's for user vs system time in 
>>> the raw data instead, but that would involve checking the CPU ticker more 
>>> often (e.g. twice for each syscall, interrupt, and trap in addition to the 
>>> current once per context switch).  So far folks seem to be more worried about 
>>> the extra overhead rather than the loss of accuracy.
>>>
>>>   
>> Adding any instruction into global syscall path should be cautioned, it
>> is worse then before, thinking about a threaded application, a userland
>> thread may have locked a mutex and calls a system call, the overhead
>> added to system call path can directly affect a threaded application's
>> performance now, because the time window the mutex is held
>> is longer than before, I have seen some people likes to fiddle with
>> system call path, it should be cautioned.
> 
> OTOH, the current getrusage(2) stats cannot be trusted.  The only meaningful
> thing you can do is to sum them since the total is known to be accurate at
> least.
> 
> If it wouldn't make things so messy I'd consider a new kernel option
> 'ACCURATE_RUSAGE' or some such.
> 
Our getrusage is already very slow, everytime, it needs to
iterate the threads list with a process SLOCK held. I saw some mysql
versions heavily use getrusage, and a horribly slow. I think a
ACCURATE_RUSAGE will make it worse ?