From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Yuri, Alexander Best
Date: Tue, 3 Jan 2012 12:13:54 -0500
Subject: Re: top(1) loses process user time count when threads end

On Tuesday, August 16, 2011 9:25:20 am John Baldwin wrote:
> On Saturday, August 06, 2011 1:57:35 pm Yuri wrote:
> > On 08/06/2011 02:11, Alexander Best wrote:
> > > On Fri Aug 5 11, Yuri wrote:
> > >> I have the process that first runs in 3 threads but later two active
> > >> threads exit.
> > >>
> > >> top(1) shows this moment this way (1 sec intervals):
> > >> 30833 yuri 3 76 0 4729M 4225M nanslp 4 0:32 88.62% app
> > >> 30833 yuri 3 76 0 4729M 4225M nanslp 6 0:34 90.92% app
> > >> 30833 yuri 1 96 0 4729M 4225M CPU1 1 0:03 1.17% app
> > >> 30833 yuri 1 98 0 4729M 4226M CPU1 1 0:04 12.89% app
> > >>
> > >> Process time goes down: 0:34 -> 0:03. Also WCPU goes down 90.92% ->
> > >> 1.17% even though this process is CPU bound and does intense things
> > >> right after threads exit.
> > >>
> > >> getrusage(2) though, called in the process, shows the correct user time.
> > >>
> > >> I think this is the major bug in the process time accounting.
> > > could you check, whether kern/128177 or kern/140892 describe your situation?
> >
> > I have ULE scheduler. kern/128177 talks about single thread with ULE
> > scheduler, and my issue is with threads. So I am not sure if it is
> > related. There have been no motion on kern/128177 since Feb 9, 2009.
> > kern/140892 is probably the same as mine.
> >
> > In any case, both these PRs have to be fixed since they are very user
> > visible, not just some obscure issues.

Actually, I now think I know what this is.  This is probably fixed now by
the kernel changes in revision 188764 and my changes to top in 224062.  I
think what happened before is that top(1) "lost" the runtime of exited
threads because it used to sum up the runtime of the currently executing
threads to get the process' runtime.  Now it uses the kernel's value for
the process runtime, which should include both exited threads and currently
running threads.

I can't tell from your message how recent your kernel and world are, though,
so I don't know whether you have both of these changes.

-- 
John Baldwin
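
The accounting difference described above can be illustrated with a toy
model.  This is only a sketch in Python, not the actual top(1) or kernel
code: the Thread and Process classes, their field names, and the runtime
values are all hypothetical and chosen just to show why summing over live
threads makes the displayed time drop when threads exit, while a
process-wide total does not.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare threads by identity, not by runtime value
class Thread:
    runtime_us: int  # CPU time consumed by this thread (hypothetical units)


@dataclass
class Process:
    threads: list = field(default_factory=list)  # live threads only
    exited_runtime_us: int = 0  # CPU time folded in from exited threads

    def thread_exit(self, thread):
        # Assumption of this model: when a thread exits, its CPU time is
        # added to a process-wide counter so that it is not lost.
        self.threads.remove(thread)
        self.exited_runtime_us += thread.runtime_us

    def runtime_from_live_threads(self):
        # Old-style display: sum only over threads that still exist.
        return sum(t.runtime_us for t in self.threads)

    def runtime_process_total(self):
        # New-style display: exited threads plus live threads.
        return self.exited_runtime_us + self.runtime_from_live_threads()


if __name__ == "__main__":
    # Made-up runtimes, in microseconds.
    main_thread = Thread(3_000_000)
    workers = [Thread(15_000_000), Thread(16_000_000)]
    proc = Process(threads=[main_thread, *workers])

    print("before exit:", proc.runtime_from_live_threads(),
          proc.runtime_process_total())
    for w in workers:
        proc.thread_exit(w)
    print("after exit: ", proc.runtime_from_live_threads(),
          proc.runtime_process_total())
```

In this model the live-thread sum falls as soon as the two workers exit,
while the process total stays the same, which matches the kind of drop seen
in the quoted top(1) output.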