Date: Fri, 28 Dec 2007 13:38:13 +1100 (EST)
From: Ian Smith
To: John Baldwin
Cc: acpi@freebsd.org, njl@freebsd.org
In-Reply-To: <200712271449.58285.jhb@freebsd.org>
Subject: Re: An issue with powerd..
List-Id: ACPI and power management development

On Thu, 27 Dec 2007, John Baldwin wrote:

> So I've had some issues where I get weird hangs when I run with powerd enabled
> on my laptop and I think I've finally tracked it down.  Note that this is on
> an older current with the older EC code, but the design flaw in powerd is
> still relevant even if the new EC code makes my laptop happier (I'm trying to
> update my laptop to more recent HEAD, but there's some weird scheduling bug
> that I haven't fixed yet in newer stuff).
>
> Anyways, I was trying to debug the weird hangs I had when running with powerd
> (machine would go unresponsive and fans would spin up, and after a variable
> number of seconds it would come back and all the pending input (mouse
> movements, keypresses, etc.) would be processed).
> I added some code to track
> how long it takes for GPE's to run that would print out on the console if one
> took more than 750ms as I had a feeling that something with ACPI was making
> the system busy.

Fans spinning up is perhaps interesting?  As noted in my recent whinge
about lack of component documentation, I've yet to suss out interactions
between acpi_thermal (wrt both fans and passive cooling itself modifying
cpu freqs - could this fight with powerd?), devd and other subsystems.

Yeah I'm slowly beating through the ACPI spec, up to page 46 of >600
pages, but it's reminiscent of reading govt legislation .. I'd love to
find the ~50 page precis, then I may be better able to follow some code.

> It was also far worse in console mode than in X.  In console mode I found that
> sometimes the system would never "come back".

Presumably X itself keeps it busy enough to keep cpu freq 'reasonable'?

I use gkrellm to keep an eye on cpu freq, temp, load avg .. but my T23
is only a two-speed, min 733MHz, so I can't see what you're seeing (and
that's my faster laptop :)

> So I was running in console mode recently with my timing patches and noticed
> that when it hung it started warning about GPE events taking several
> _seconds_ to process, e.g. 2-3 seconds, or in some cases up to _30_ seconds.
> So, my theory is that powerd has lowered my CPU all the way down to 100MHz
> (easy to reproduce in non-X, just let the box sit with no apps running) and
> that for some reason the machine ends up in a "GPE storm" where it is
> spending all its time handling GPE's and never has any CPU left for userland
> apps (due to being at 100MHz).  The problem then is that powerd never runs to
> bump my CPU up to some reasonable speed.

One workaround some have noted using is to set debug.cpufreq.lowest to
some value considerably higher than 100MHz, say >500MHz, to maintain
reasonable responsiveness, at a cost of higher power use when idle.
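A rough sketch of what that workaround amounts to, as I understand it: levels below the floor are simply no longer offered, so the slowest speed powerd can pick is the first one at or above the floor. The level list and the helper name usable_levels below are made up for illustration; this is not the cpufreq code itself.

```python
def usable_levels(levels_mhz, lowest_mhz):
    """Return the frequency levels still on offer once a floor is applied.

    Illustrates the effect of a debug.cpufreq.lowest style floor:
    anything below lowest_mhz is hidden from the frequency chooser.
    """
    return [f for f in levels_mhz if f >= lowest_mhz]


# Hypothetical level list, for illustration only (not from a real machine).
levels = [1700, 1400, 1200, 1000, 800, 600, 400, 200, 100]

offered = usable_levels(levels, 500)
print(offered)       # [1700, 1400, 1200, 1000, 800, 600]
print(min(offered))  # 600, the slowest speed an idle box could drop to
```

With a floor of 500 the box in this made-up example would idle at 600MHz rather than 100MHz, which is the responsiveness-for-power trade mentioned above.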
> In fact, anytime a completely idle box suddenly gets a lot of kernel work
> (e.g. a sudden flow of packets) it could in theory end up trying to handle
> all this work at the reduced speed since the work has a higher priority than
> the powerd process.  To that end, I think that at least part of powerd needs
> to be in the kernel, or at least that the kernel should be more proactive
> about bumping the speed up when it resumes from Cx due to an interrupt.  A
> simple policy would be to bump up to full speed for any non-clock interrupt
> (possibly bumping up for a clock interrupt if we wake up softclock as well).
>
> Thoughts?

Just humble grasshopper droppings, master .. but the default powerd
polling interval is 500ms, which is a really long time on a fast box, so
-p 100 or even less might make a considerable difference?

Can't comment on any in-kernel component, but responding to any single
interrupt sounds way too trigger-happy compared to monitoring load,
assuming that the likes of vm.loadavg and kern.cp_time are themselves
updated promptly in high-stress times?

AU$0.02, which rounds down to 0 since we abandoned coins less than 5c ..

cheers, Ian
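PS: to make "monitoring load" concrete, here is a minimal sketch of a polling-style decision built on two successive kern.cp_time samples (tick counters in the order user, nice, sys, intr, idle). The thresholds, the one-step-at-a-time policy and the function names are my assumptions for illustration, not powerd's actual algorithm.

```python
CP_IDLE = 4  # index of the idle counter in kern.cp_time (user, nice, sys, intr, idle)


def idle_fraction(prev, cur):
    """Fraction of ticks spent idle between two kern.cp_time style samples."""
    deltas = [c - p for p, c in zip(prev, cur)]
    total = sum(deltas)
    if total <= 0:
        return 1.0  # no ticks elapsed between samples; treat as idle
    return deltas[CP_IDLE] / total


def next_freq(levels_mhz, current_mhz, idle, busy_below=0.65, idle_above=0.90):
    """Pick the next frequency: one step up when busy, one step down when idle.

    busy_below and idle_above are hypothetical thresholds chosen for
    this sketch only.
    """
    levels = sorted(levels_mhz)
    i = levels.index(current_mhz)
    if idle < busy_below and i < len(levels) - 1:
        return levels[i + 1]
    if idle > idle_above and i > 0:
        return levels[i - 1]
    return current_mhz


# Made-up samples: most of the new ticks landed in sys and intr.
prev = [100, 0, 50, 10, 840]
cur = [110, 0, 130, 20, 850]
idle = idle_fraction(prev, cur)
print(next_freq([100, 200, 400, 800], 100, idle))  # 200
```

Whatever the thresholds, a poller like this only reacts once per interval and only if it gets to run at all, so a shorter -p interval narrows the window but does nothing about the starvation case John describes.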