From: John Baldwin
To: arch@FreeBSD.org
Date: Fri, 1 Oct 2004 11:02:43 -0400
Subject: [PATCH] Rework how we store process times in the kernel and deferring calcru()
Message-Id: <200410011102.43394.jhb@FreeBSD.org>

I'll commit this soonish unless there are any objections. The basic idea is to store process resource usage times as raw data (i.e. as bintimes and tick counts) for both process usage and child usage, and to calculate the timeval-style times only if they are explicitly asked for.
This lets us avoid always calling calcru() to calculate the timeval values in exit1(), for example. A more detailed listing of the changes follows:

- Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the corresponding inline fields in struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed-in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. This also includes an additional fix so that calcru() now correctly handles threads from the process that are executing on other CPUs. Also, calcru() now only locks sched_lock internally while doing the rux_runtime fixup. It now only requires the caller to hold the proc lock, and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did.
- A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends, so some of the Giant locks in exit1() and kern_wait() are now gone.

As a side effect of storing the raw values, the accuracy of the process timing has been improved. This makes benchmarking somewhat tricky, since with this patch user times appear to go way up while system times go way down. Thus, the only benchmarks I did compared real times, and compared the sum of the user and system times to the real times. The results below are from a kernel w/o debugging (with WITNESS + INVARIANTS on, the extra overhead left no statistical difference between before and after).
For real times (100 runs of 10000 fork/wait loops):

x smpng.fast.real
+ proc.fast.real
[ministat ASCII histogram: the + samples cluster to the left of the x samples, with a small overlap in the middle]
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.88          2.99          2.93        2.9362   0.017568337
Difference at 95.0% confidence
        -0.0597 +/- 0.0050674
        -1.99272% +/- 0.169145%
        (Student's t, pooled s = 0.0182816)

So, about a 2% improvement. As far as accuracy "improvements" go, the numbers comparing the sum of user + sys to "real" time are:

x smpng.fast.real
+ smpng.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.83          2.93          2.86        2.8601   0.016111668
Difference at 95.0% confidence
        -0.1358 +/- 0.0048779
        -4.53286% +/- 0.162819%
        (Student's t, pooled s = 0.0175979)

And for the kernel with the patch:

x proc.fast.real
+ proc.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.88          2.99          2.93        2.9362   0.017568337
+ 100          2.85          2.96          2.92        2.9201   0.017551943
Difference at 95.0% confidence
        -0.0161 +/- 0.00486742
        -0.548328% +/- 0.165773%
        (Student's t, pooled s = 0.0175601)

Thus, the total counts are closer to the real times with the patch than without the patch. Given that these results were repeated numerous times with different benchmarks on an idle box in the same state, I feel that these differences indicate an improvement in the accuracy of the accounting.
The patch is at http://www.FreeBSD.org/~jhb/patches/rusage_ext.patch and is largely based on a patch originally submitted by bde@.

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org