Date:      Thu, 13 Nov 2008 11:45:39 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        Alexander Motin <mav@freebsd.org>
Cc:        Sam Leffler <sam@freebsd.org>, freebsd-mobile@freebsd.org
Subject:   Re: RFC: powerd algorithms enhancements
Message-ID:  <200811131145.39747.jhb@freebsd.org>
In-Reply-To: <491B5B62.40609@FreeBSD.org>
References:  <200811060901400000@466321507> <200811111206.53809.jhb@freebsd.org> <491B5B62.40609@FreeBSD.org>

On Wednesday 12 November 2008 05:40:34 pm Alexander Motin wrote:
> John Baldwin wrote:
> > On my laptop the ACPI SCI is the culprit.  If I let the CPU drop below
> > 400 mhz, the GPE handler for temperature updates takes so long to run
> > the CPU spends the entire time processing GPEs and never runs userland.
> > Thus, powerd never gets to run.  This happens on a "modern" laptop, not
> > a Pentium-100.  And actually, at certain speeds it would eventually let
> > userland run enough to bump up.  I actually added KTR_SCHED events for
> > ACPI GPE and Task handling and hacked schedgraph to parse them and thus
> > had pretty pictures showing the GPE handler using all CPU time during
> > the multiple-second "hangs" I would get on my laptop with powerd.
> 
> If your system completely freezes at 400MHz, then it spends about 20% of 
> CPU time on this at 2GHz. Doesn't it?

Nope.  It is usually very idle at full speed.  You are free to go buy your own 
HP nc6220 if you want to see it for yourself.  You can also grab the KTR 
trace and modified schedgraph.py at www.freebsd.org/~jhb/gpe/.
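
The 20% figure in the quoted question follows from a simple scaling assumption, which can be written out as a minimal sketch. It assumes the handler costs a fixed number of CPU cycles, so the fraction of time it takes is inversely proportional to clock frequency. The function name and parameters are hypothetical and only illustrate the arithmetic. The reply above reports that the laptop is mostly idle at full speed, so this model does not describe the GPE handler on that machine.

```python
def scaled_utilization(busy_fraction, from_mhz, to_mhz):
    """Rescale a CPU busy fraction from one clock frequency to another.

    Assumption (not from the thread's measurements): the work costs a
    fixed number of cycles, so time spent is proportional to 1/frequency.
    The result is capped at 1.0 because a CPU cannot be more than 100% busy.
    """
    scaled = busy_fraction * from_mhz / to_mhz
    return min(scaled, 1.0)


# A CPU that is 100% busy at 400 MHz would be 20% busy at 2000 MHz
# under the fixed-cycle assumption.
at_full_speed = scaled_utilization(1.0, from_mhz=400, to_mhz=2000)
print(f"{at_full_speed:.0%}")
```

If the measured load at full speed is far below what this model predicts, the handler's cost is evidently not a fixed cycle count, which is why the trace data matters more than the estimate.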

> With such an amount of idle activity
> your system is just unable to save any power! Your CPU running at 100%
> at 400MHz will probably consume more power than any other that is really
> idle at 2GHz. If you think that this is normal then disabling powerd is
> the only way out for you.

Except I do get much better battery life with powerd even with the lowest 
frequency set to 400 MHz.

> >> powerd just makes that situation more probable as it significantly 
> >> reduces CPU performance. Just insert a gigabit card into a Pentium-100 
> >> system and you will not be able to get there under load when not 
> >> using device polling mode. Raising the frequency on interrupt processing 
> >> _will_not_ fix the problem, but just hide it for some time, until newer 
> >> network cards are able to handle a higher packet rate.
> > 
> > It will definitely fix the problem on my laptop.  
> 
> No. It only hides the problem.

*sigh*  FreeBSD is not usually used for batch-processing.  Most of the work 
FreeBSD does is interrupt-driven.  For those sorts of loads, it does make 
sense that you want to handle your interrupt with minimal latency and then go 
back to sleep when it is done.  The point Sam and I are making is that the 
idea that all power management can be driven from userland is flawed.  It is 
a task that will need to be shared between the kernel and userland.  Sam is 
also suggesting that this might be the single biggest issue with powerd.  I'm 
not quite sure of the exact priority of the various cpufreq/powerd problems, 
but I think it is on a similar scale to not handling multiple CPUs properly.

> >> I think the only solutions for this case can be in allowing the scheduler 
> >> to really do its job: either by moving _everything_ out of interrupt 
> >> threads to make them extremely fast and so avoid the livelock problem, or 
> >> by allowing the scheduler in some other way to delay interrupt processing 
> >> so that other (for example user-level) threads obtain at least some part 
> >> of their CPU time slot according to their priorities.

This is completely backwards.  Userland is not more important than interrupt 
handling in the kernel.  The problem is that CPU frequency handling is too 
important to relegate the entire task to userland.  Instead of completely 
breaking the entire userland/kernel model to get part of userland executed at 
a kernel-level priority so CPU frequency handling is partially handled at a 
kernel-level priority, why not just move the CPU frequency bits that need to 
be kernel-level into the kernel?  We are already doing the thermal management 
for passive cooling in the kernel rather than in userland.

> >> I don't see how powerd itself could do at least anything with this.
> > 
> > The point is that powerd is part of a CPU throttling strategy.  If you are 
> > going to mess with powerd you need to do so in the context of the overall 
> > strategy.
> 
> Can you show me this strategy working in context? There have been no 
> significant changes to powerd for years. Now it does not work well for 
> SMP, it does not work well for systems with a large number of power 
> levels, and its functionality is absolutely minimal. That's why I have 
> touched it. Several good ideas for future improvement were proposed, but 
> nobody has given me any real objections against what I have proposed.
> 
> All of your objections are that your system is unable to operate at low 
> frequency. So how is that related to powerd and the proposed patches?

Sam merely suggested that, while you are working on improving other areas, 
fixing this problem is also worth looking at.  In his opinion it is even more 
important.

> Here is how I see a possible strategy:
> - Give more information to the power-controlling application: differentiate 
> between power level and throttling. Throttling is completely ineffective 
> for CPUs supporting C1E, C2 and deeper states. It will give us better 
> responsiveness at equal power consumption.
> - Make the scheduler use some per-CPU power state priorities to allow us 
> to really disable unused cores/chips.
> - Reduce interrupt time to allow the scheduler to better handle process 
> priorities and fight against IRQ livelocks. It does not depend on 
> frequencies.
> 
> What is your vision of the strategy?

- Move the bits of CPU power management that are really important into the 
kernel.  We should offload things to userland when possible, but interrupt 
handling isn't one you can offload to userland, and ensuring the system has 
enough CPU to process an interrupt when it occurs is the job of the kernel, 
_not_ of userland.
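
The split between kernel and userland described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not FreeBSD code: the class, both function names, the frequency levels and the thresholds are all hypothetical. The kernel-side hook raises the frequency immediately when interrupt handling consumes too much CPU time, while a powerd-like userland step handles the slower job of lowering it when the system is idle.

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    """Hypothetical CPU model: current frequency plus available levels (MHz, ascending)."""
    freq_mhz: int
    levels: tuple


def kernel_on_interrupt_load(state, intr_fraction, threshold=0.5):
    """Hypothetical kernel-side hook.

    If interrupt handling consumed at least `threshold` of recent CPU time,
    jump straight to the highest frequency. This runs regardless of whether
    userland gets scheduled. The threshold value is an arbitrary assumption.
    """
    if intr_fraction >= threshold:
        state.freq_mhz = state.levels[-1]
    return state.freq_mhz


def userland_adjust(state, idle_fraction, idle_high=0.9):
    """Hypothetical powerd-like step.

    When the CPU was idle for at least `idle_high` of the sampling period,
    step down one frequency level. The cutoff is an arbitrary assumption.
    """
    if idle_fraction >= idle_high:
        index = state.levels.index(state.freq_mhz)
        if index > 0:
            state.freq_mhz = state.levels[index - 1]
    return state.freq_mhz


cpu = CpuState(freq_mhz=400, levels=(400, 800, 1200, 2000))
kernel_on_interrupt_load(cpu, intr_fraction=0.95)  # interrupt storm: raise now
userland_adjust(cpu, idle_fraction=0.95)           # later, mostly idle: step down
print(cpu.freq_mhz)
```

The design point is that the raise path never depends on a userland process being scheduled, which is exactly the case that fails when interrupt processing starves powerd.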

-- 
John Baldwin


