From owner-freebsd-stable@FreeBSD.ORG Thu Dec 15 21:45:43 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A93961065701 for ; Thu, 15 Dec 2011 21:45:43 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta14.emeryville.ca.mail.comcast.net (qmta14.emeryville.ca.mail.comcast.net [76.96.27.212]) by mx1.freebsd.org (Postfix) with ESMTP id 8F1758FC1A for ; Thu, 15 Dec 2011 21:45:41 +0000 (UTC) Received: from omta17.emeryville.ca.mail.comcast.net ([76.96.30.73]) by qmta14.emeryville.ca.mail.comcast.net with comcast id 9WrN1i0041afHeLAEZlaFh; Thu, 15 Dec 2011 21:45:34 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta17.emeryville.ca.mail.comcast.net with comcast id 9ZCw1i00q1t3BNj8dZCxCE; Thu, 15 Dec 2011 21:12:57 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id D39DD102C19; Thu, 15 Dec 2011 13:45:39 -0800 (PST) Date: Thu, 15 Dec 2011 13:45:39 -0800 From: Jeremy Chadwick To: Dan Nelson Message-ID: <20111215214539.GA33888@icarus.home.lan> References: <4EEA5DD0.1040009@FreeBSD.org> <20111215210421.GA33083@icarus.home.lan> <20111215213539.GJ53453@dan.emsphone.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20111215213539.GJ53453@dan.emsphone.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Doug Barton , freebsd-stable@freebsd.org Subject: Re: swi4: clock taking 40% cpu?!? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Dec 2011 21:45:43 -0000 On Thu, Dec 15, 2011 at 03:35:39PM -0600, Dan Nelson wrote: > In the last episode (Dec 15), Jeremy Chadwick said: > > On Thu, Dec 15, 2011 at 12:51:28PM -0800, Doug Barton wrote: > > > Web server under heavy'ish load (7 on a 2 cpu system) running > > > 8.2-RELEASE-p4 i386 I'm seeing this: > > > > > > PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND > > > 12 root -32 - 0K 112K WAIT 0 129:01 39.99% {swi4: clock} > > > > > > Any ideas why the clock should be taking so much cpu? HZ=100 if that > > > makes a difference ... > > > > Could be wrong, but I believe this correlates with IRQ 4. What does > > vmstat -i show for a total and rate for irq4 if you run it, wait a few > > seconds, then run it again? Does the number greatly/rapidly increase? > > That would be "irq4" in that case, though. "swi4" is just a software > interrupt thread, and "clock" is the softclock callout handler. Yep, over my head (on the latter part). The correlation I made between swi4 and IRQ 4 was based on one of our systems. Note "swi4: clock sio", which to me meant "sioX is associated with swi4", so I looked up what sio0 was associated with and it was irq4. I guess it's just total chance that swi4 and IRQ 4 happen to both have the number 4 in them; sorry for the confusion/mistake and thanks for educating me. top: 13 root 1 -32 - 0K 8K WAIT 0 91:03 0.00% [swi4: clock sio] vmstat -i: interrupt total rate irq4: sio0 715 0 > There are both KTR and DTrace logging functions in kern_timeout.c, so you > could use either one to get a handle on what's eating your CPU. I highly doubt Doug's systems have DTrace built in to them, given that on the version of FreeBSD he's using, building it requires manually typing in "make WITH_CTF=1 buildkernel" (you cannot add this variable to make.conf or src.conf, other stuff will break -- this works on 9.x because of "a horrible kludge hack to make work" (not an exact quote but fairly close)). No admin is going to remember to type that in manually every time, and only during the buildkernel phase (not buildworld, not not installworld, and not installkernel). Grumble. :-) Also FWIW: I had no idea about KTR (what it was, its existence, etc.) until right this moment. I figured all this time it was shorthand for "ktrace". Sheesh, all this other neat debugging stuff! > Busy-looping "procstat -k 12" for a few seconds might get you some useful stacks, as > well. I guess that's why I'm surprised there isn't a process or series of processes on the machine which have, say, slightly high CPU (maybe 1% per process/child) and when ktrace'd show they're calling things like gettimeofday() a *lot*. I dealt with this exact situation at work not too long ago, where a developer decided to stick that call in a while(1) loop with conditionals that never slept/waited (but did only a little bit else) then wondered "what the problem was". Be nice to your systems! ;-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |