Date: Fri, 9 Feb 2007 16:08:49 +1100 (EST)
From: Bruce Evans
To: Brad Huntting
Cc: freebsd-bugs@freebsd.org, FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/108954: 'sleep(1)' sleeps >1 seconds when speedstep (Cx) is in economy mode
Message-ID: <20070209152716.K1383@besplex.bde.org>
In-Reply-To: <200702090101.l1911Mdp060738@hunkular.glarp.com>

On Thu, 8 Feb 2007, Brad Huntting wrote:

>> Description:
> On some machines (those supporting Intel speedstep),
> nanosleep(2) (and presumably select(2)) are confused by cpu
> frequency changes and wind up over sleeping.

Do they work without the lapic timer?  (Not configuring "device apic"
is the only easy way to avoid using the lapic timer.
I forget if acpi can work without apic.)  On some systems, the lapic
timer doesn't work at all because the CPU enters a deep sleep on the
hlt instruction in the idle process, and one workaround is to run other
timers at a higher frequency than the lapic timer frequency to kick the
CPU out of its deep sleep and thus keep the lapic timer interrupting.

>> How-To-Repeat:
>
> 	/bin/sh -c 't0=`date +%s`; sleep 1; t1=`date +%s`; expr $t1 - $t0'
>
> On a normal machine this should almost always spit out '1'.
>
> On a Centrino or Pentium-M based laptop (such as the Panasonic
> CF-W4), with hw.acpi.cpu.cx_lowest set to something other
> than C1, this produces '4' or '5'.
>
> Note: If you can reproduce this, _please_ post a follow
> up so I know I'm not insane.
>
> The problem seems to be that when 'sysctl hw.acpi.cpu.cx_lowest'
> is set to anything other than 'full speed' (aka 'C1') the
> cpu frequency is generally (and unpredictably) slower than
> C1 speed.  tvtohz(9) (located in /sys/kern/kern_clock.c)
> assumes a static frequency and so returns several times the
> correct number of ticks.

The frequency used by tvtohz() is required to be fixed.  Since it is
used mainly for timeouts, the frequency isn't required to be very
accurate, but it should be accurate to within a few percent and not
wrong by a factor of 5.

> $ sysctl hw.acpi.cpu dev.cpu.0.freq_levels kern.timecounter.choice kern.timecounter.hardware
> hw.acpi.cpu.cx_supported: C1/1 C2/1 C3/85
> hw.acpi.cpu.cx_lowest: C3
> hw.acpi.cpu.cx_usage: 0.00% 13.11% 86.88%
> dev.cpu.0.freq_levels: 1200/-1 1100/-1 1000/-1 900/-1 800/-1 700/-1 600/-1 525/-1 450/-1 375/-1 300/-1 225/-1 150/-1 75/-1
> kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast

The timecounter is not really involved here.  It is only used to check
the time (not quite correctly) after the timeout.  That would avoid the
problem if the timeout is too short but not if it is too long.

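To make that asymmetry concrete, here is a toy model in C.  It is not
the kernel code: sketch_tvtohz() and sketch_sleep_usec() are
hypothetical names, the conversion is a simplified stand-in for
tvtohz(), and the tick rates used below are assumed for illustration.
The loop converts the remaining time to ticks at the assumed rate,
"sleeps" for that many ticks at the real rate, and re-checks the time
on waking.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified stand-in for tvtohz(): round the interval up to whole
 * ticks at the assumed rate, then add 1.  (Hypothetical helper.)
 */
int64_t
sketch_tvtohz(int64_t usec, int64_t hz)
{
	assert(usec > 0 && hz > 0);
	return ((usec * hz + 999999) / 1000000 + 1);
}

/*
 * Model of a sleep loop that re-checks the time after each timeout.
 * assumed_hz is the rate the conversion believes in; real_hz is the
 * rate at which ticks actually arrive.  Returns the elapsed time in
 * microseconds when the loop finally returns to the caller.
 */
int64_t
sketch_sleep_usec(int64_t request_usec, int64_t assumed_hz, int64_t real_hz)
{
	int64_t now, tick_usec, ticks;

	assert(request_usec > 0 && real_hz > 0);
	tick_usec = 1000000 / real_hz;	/* real length of one tick */
	now = 0;
	while (now < request_usec) {
		/* Convert what is left using the assumed rate. */
		ticks = sketch_tvtohz(request_usec - now, assumed_hz);
		/* The timeout really lasts this long. */
		now += ticks * tick_usec;
	}
	return (now);
}
```

With an assumed rate of 1000 Hz and a real rate of 200 Hz,
sketch_sleep_usec(1000000, 1000, 200) returns 5005000: a single
oversized timeout that the time check cannot shorten.  With a real
rate of 2000 Hz the timeouts expire early, the loop goes back to sleep,
and the total comes out only slightly over the request.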
>> Fix:
>
> The ideal solution would be to use a clock whose frequency
> is not jerked around by speedstep.  Perhaps this is just a
> hardware bug, but I seem to recall seeing this behavior on
> my previous Intel Centrino based laptop as well.

The i8254 timer (not timecounter) is supposed to have this property.
Maybe the lapic timer doesn't.

> Fixing nanosleep(2) (and select(2)) alone would be relatively
> easy:  Since they loop, returning to the user only when the
> correct wakeup time has arrived (microtime(9) is apparently
> not affected by this problem), one could just have tvtohz(9)
> return the number of ticks based on the _lowest_ cpu frequency
> rather than the _highest_.  Unfortunately, this makes other
> users of tvtohz(9) wake up early, and they may not be as
> prepared to handle this.

Yes, that should be OK as a workaround.  One of the things that
nanosleep() etc. don't do quite right is related: for very long sleeps,
the calculated timeout may be more than 1 tick too long due to clock
drift or just the limited resolution of the scale factor used in
tvtohz().  That should be handled by using the _lowest_ possible scale
factor rather than the nominal one.  This could also be used to ensure
that the final timeout is minimal (tvtohz() rounds up and then adds 1
to ensure that the timeout is long enough, so an average timeout is 1.5
ticks longer than strictly necessary; by not adding 1 but checking
whether the timeout has expired on waking up, it is possible to make an
average timeout only 0.5 ticks longer than necessary).  There should be
a new interface for callers that are prepared to handle this (or they
can subtract 1 and rescale).  Waking up early also wastes time so it
shouldn't usually be done.

Bruce
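PS: a toy model of the "don't add 1, but check on waking up" idea, in
C.  Everything here is an assumption made for illustration: the
function names are hypothetical, ticks are modelled as interrupts at
exact multiples of tick_usec, and a timeout of n ticks is taken to fire
on the n-th tick boundary after it is armed.  It only shows that both
styles of caller meet the deadline and that the re-checking caller
never wakes later than the conservative one; the exact average
overshoot depends on how the tick phase is modelled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: a timeout of n ticks armed at time t fires on
 * the n-th tick boundary strictly after t.
 */
int64_t
sketch_fire_time(int64_t t, int64_t n, int64_t tick_usec)
{
	assert(t >= 0 && n > 0 && tick_usec > 0);
	return ((t / tick_usec + n) * tick_usec);
}

/* Conservative caller: round up, add 1, sleep once. */
int64_t
sketch_wake_conservative(int64_t start, int64_t req_usec, int64_t tick_usec)
{
	int64_t n;

	assert(req_usec > 0);
	n = (req_usec + tick_usec - 1) / tick_usec + 1;
	return (sketch_fire_time(start, n, tick_usec));
}

/*
 * Caller prepared for early wakeups: round up without adding 1, then
 * check the time on waking and sleep again if the deadline has not
 * been reached yet.
 */
int64_t
sketch_wake_recheck(int64_t start, int64_t req_usec, int64_t tick_usec)
{
	int64_t deadline, n, now;

	assert(req_usec > 0);
	deadline = start + req_usec;
	now = start;
	while (now < deadline) {
		n = (deadline - now + tick_usec - 1) / tick_usec;
		now = sketch_fire_time(now, n, tick_usec);
	}
	return (now);
}
```

For example, with 1000 usec ticks, a start time of 300 and a request
of 2500 (deadline 2800), the conservative caller wakes at 4000 and the
re-checking caller at 3000.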