From: Ian Smith
To: Giorgos Keramidas
Cc: "Tamouh H.", questions@FreeBSD.org
Date: Thu, 14 Sep 2006 18:08:51 +1000 (EST)
Subject: Re: Top not showing cpu usage even remotely accurately
In-Reply-To: <20060914054758.GA77575@gothmog.pc>

On Thu, 14 Sep 2006, Giorgos Keramidas wrote:
> On 2006-09-14 00:48, "Tamouh H." wrote:
> > I think TOP and load averages are no longer accurate on FBSD 5.x and
> > 6.x with SMP kernel. As far as I've seen. Load averages hit sometimes
> > 8.0 without a noticable degradation in performance.

I still can't fathom what top tells me on a UP 5.5-STABLE system (300MHz
Celeron if speed's relevant). I initiated this thread (weeks ago :) re
seeing 0.0% idle (as expected) during buildworld but not seeing anything
add up to anything like 100%, including S)ystem processes, in top.

Chuck Swiger pointed out that a buildworld runs lots of processes for far
shorter times than top's sampling interval, which was true, as a browse
with 'lastcomm -eE | less' through the buildworld time showed.

However that doesn't explain this typical top view when the system is
quiescent or nearly so, as it mostly is, with only 5-minutely crons and
11-minutely entropy runs and the odd sendmail to be seen in lastcomm:

last pid: 18500;  load averages:  0.01,  0.08,  0.06  up 5+08:40:33  17:30:30
136 processes: 3 running, 110 sleeping, 23 waiting
CPU states:  5.7% user,  0.0% nice,  6.3% system,  0.0% interrupt, 88.0% idle
Mem: 73M Active, 18M Inact, 46M Wired, 8108K Cache, 25M Buf, 2572K Free
Swap: 384M Total, 106M Used, 278M Free, 27% Inuse

  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
   11 root     171   52     0K     8K RUN    102.3H 86.82% 86.82% idle
  743 smithi    96    0 26616K  2908K select 156:40  1.03%  1.03% kdeinit
  708 smithi    96    0 34140K 15024K select 223:05  0.63%  0.63% Xorg
  644 root      96    0  1244K   244K select  30:19  0.05%  0.05% moused
  775 smithi    20    0 11524K  1028K kserel 319:17  0.00%  0.00% xmms
  761 smithi    96    0 30824K  7272K select  97:50  0.00%  0.00% kdeinit
   27 root      76  -43     0K     8K RUN     44:14  0.00%  0.00% swi5: clock s
  772 smithi    96    0 29736K  5600K select  40:57  0.00%  0.00% kdeinit
  777 smithi     8    0  2300K   448K nanslp  36:20  0.00%  0.00% asapm
  778 smithi     8    0  2524K   460K nanslp  34:12  0.00%  0.00% ascpu
  767 smithi    96    0 29448K  5612K select  29:23  0.00%  0.00% kdeinit
  771 smithi    96    0 29884K  5504K select  22:28  0.00%  0.00% kdeinit
  616 mysql     20    0 50824K  1428K kserel  21:04  0.00%  0.00% mysqld
  759 smithi    96    0 29644K  5092K select  20:56  0.00%  0.00% kdeinit
  773 smithi    96    0 35640K  4080K select  20:39  0.00%  0.00% kdeinit
  766 smithi    96    0 29488K  4768K select  19:07  0.00%  0.00% kdeinit
  764 smithi    96    0 28784K  3964K select  16:38  0.00%  0.00% kdeinit
  774 smithi    96    0 33168K  3768K select  16:36  0.00%  0.00% kdeinit
  757 smithi    96    0 27272K  5508K select   4:55  0.00%  0.00% kdeinit
   23 root     -60 -179     0K     8K WAIT     3:04  0.00%  0.00% irq12: psm0
   22 root     -80 -199     0K     8K WAIT     3:02  0.00%  0.00% irq11: cbb0 c
   43 root      20    0     0K     8K syncer   3:00  0.00%  0.00% syncer
    4 root      -8    0     0K     8K -        2:58  0.00%  0.00% g_down
    3 root      -8    0     0K     8K -        2:30  0.00%  0.00% g_up
   49 root      12    0     0K     8K -        2:09  0.00%  0.00% schedcpu
   30 root     -16    0     0K     8K -        1:53  0.00%  0.00% yarrow
   39 root     -16    0     0K     8K psleep   1:30  0.00%  0.00% pagedaemon
   41 root     171   52     0K     8K pgzero   1:25  0.00%  0.00% pagezero
[..]

It never shows more than about 90% idle, whereas a 0.01 shorter term load
average should indicate more like 99% idle, shouldn't it? 97-99%,
sometimes 100% idle was what FreeBSD 4.5-R used to tell me with the same
workload in around the same memory use, but maybe 4.5 was optimistic ..

> > This is one TOP that freaked me out, notice Idle CPU is 70% while the
> > process is showing it is using 99% of CPU. systat draws more accurate
> > picture, however, load average is still useless as far as performance
> > monitoring :
> >
> > last pid: 10174;  load averages:  1.63,  1.44,  1.20  up 4+00:25:19  00:39:20
> > 169 processes: 2 running, 166 sleeping, 1 zombie
> > CPU states: 25.8% user,  0.0% nice,  0.7% system,  0.1% interrupt, 73.4% idle
> > Mem: 1316M Active, 1445M Inact, 297M Wired, 127M Cache, 112M Buf, 79M Free
> > Swap: 8762M Total, 2096K Used, 8760M Free
> >
> >   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> > 13362 root     111    0 36444K 34196K CPU3   3  50:06 98.88% 98.88% perl5.8.7
> > 90391 root      96    0 27356K 26236K select 2   0:06  0.54%  0.54% perl5.8.7
> > 79619 nobody     4    0   209M 84640K sbwait 1   0:09  0.39%  0.39% httpd
> > 10161 root      97    0  6712K  4752K select 2   0:00  1.40%  0.20% exim-4.62-0
> > 79649 nobody    20    0   210M 84464K lockf  0   0:06  0.15%  0.15% httpd
>
> Apparently, you have a 4-CPU system :-)
>
> What you see displayed as "CPU" is for one of the processors, not for
> all of them. Load average is not an easy thing to update for an SMP
> system, I guess.

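As a rough check of that reading, here is a minimal sketch in Python (my choice for illustration, not anything from the thread). It assumes four CPUs, inferred only from the C column values 0-3 in the quoted output, and that the per-process CPU column is relative to a single processor, as suggested above. The figures are copied from the quoted top listing; the function name is hypothetical.

```python
# Sketch: how per-process CPU figures relate to top's summary line on SMP.
# Assumptions (not confirmed in the thread): ncpu = 4, and the CPU column
# is a percentage of ONE processor rather than of the whole machine.

def system_wide_share(per_cpu_percent, ncpu):
    """Convert a percentage of one CPU into a percentage of all CPUs."""
    return per_cpu_percent / ncpu

ncpu = 4
# CPU column for the five processes shown in the quoted listing
process_cpu = [98.88, 0.54, 0.39, 0.20, 0.15]

busy = sum(system_wide_share(p, ncpu) for p in process_cpu)
idle_estimate = 100.0 - busy

print(f"busy across {ncpu} CPUs: {busy:.1f}%")
print(f"estimated idle: {idle_estimate:.1f}% (top reported 73.4%)")
```

Under those assumptions the listed processes come to roughly a quarter of the machine, which is in the same region as the 25.8% user and 73.4% idle in the summary line. It only covers the five rows quoted, so a small difference is expected.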
That idle looks right for one busy cpu of four, though what the other
0.63 load average consists of is less clear.

In my recent top shot above, ordered by c)pu, I can't see more than 2 or
3% accounted for of the ~15% that is not idle, ie what processes are
involved with the 5.7% user and 6.3% system usage?

In FreeBSD 4, if (say) Mozilla went mad on some crappy javascript loop,
top would show idle at 0.0% and the busy process at or nearer 100%,
making it easy to spot and, if necessary, kill. Since running 5.4-R and
now 5.5-STABLE, such 0.0% idle events can happen with top not showing
the process involved looking busy at all - I'll capture this next time -
and while it's usually obvious that Mozilla's the 'culprit' and killing
it frees the system, I'm still bemused that top can't 'see' it.

Re the 4-cpu box:

> I don't remember off-hand how 5.X or 6.X calculate their load-average,
> but I'd be interested to know what you expected it to show, or what it
> shows on Linux systems.

I've only a few years watching 4.5-R on this laptop for comparison :)
but am installing 6.1 on a newer machine any day now, and will report.

Cheers, Ian
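To put a number on the accounting gap in the quiescent snapshot further up, here is a minimal sketch in Python (chosen for illustration only). The values are copied from that top output; the variable names are hypothetical, and only the non-zero CPU column entries other than the idle process are included.

```python
# Sketch: compare the non-idle share in the "CPU states" line with what the
# per-process CPU column accounts for. Values are from the UP snapshot;
# on a single-CPU box no per-CPU scaling is needed.

cpu_states = {"user": 5.7, "nice": 0.0, "system": 6.3,
              "interrupt": 0.0, "idle": 88.0}

# Non-zero CPU column entries, excluding the idle process itself
process_cpu = {"kdeinit (743)": 1.03, "Xorg (708)": 0.63, "moused (644)": 0.05}

non_idle = sum(v for k, v in cpu_states.items() if k != "idle")
accounted = sum(process_cpu.values())
unaccounted = non_idle - accounted

print(f"non-idle per CPU states line: {non_idle:.1f}%")
print(f"accounted for by listed processes: {accounted:.2f}%")
print(f"not attributed to any process: {unaccounted:.2f}%")
```

For this snapshot that leaves around 10 of the 12 percentage points of user plus system time without a visible owner, which is the discrepancy in question. It says nothing about the cause; short-lived processes falling between samples, as Chuck suggested for buildworld, would be one candidate.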