Date:      Fri, 9 Feb 2007 16:08:49 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Brad Huntting <huntting@hunkular.glarp.com>
Cc:        freebsd-bugs@freebsd.org, FreeBSD-gnats-submit@freebsd.org
Subject:   Re: kern/108954: 'sleep(1)' sleeps >1 seconds when speedstep (Cx) is in economy mode
Message-ID:  <20070209152716.K1383@besplex.bde.org>
In-Reply-To: <200702090101.l1911Mdp060738@hunkular.glarp.com>
References:  <200702090101.l1911Mdp060738@hunkular.glarp.com>

On Thu, 8 Feb 2007, Brad Huntting wrote:

>> Description:
> 	On some machines (those supporting Intel speedstep),
> 	nanosleep(2) (and presumably select(2)) are confused by cpu
> 	frequency changes and wind up oversleeping.

Do they work without the lapic timer?  (Not configuring "device apic"
is the only easy way to avoid using the lapic timer.  I forget if acpi
can work without apic.)  On some systems, the lapic timer doesn't work
at all because the CPU enters a deep sleep on the hlt instruction in
the idle process, and one workaround is to run other timers at a higher
frequency than the lapic timer frequency to kick the CPU out of its
deep sleep and thus keep the lapic timer interrupting.

>> How-To-Repeat:
>
> 		/bin/sh -c 't0=`date +%s`; sleep 1; t1=`date +%s`; expr $t1 - $t0'
>
> 	On a normal machine this should almost always spit out '1'.
>
> 	On a Centrino or Pentium-M based laptop (such as the Panasonic
> 	CF-W4), with hw.acpi.cpu.cx_lowest set to something other
> 	than C1, this produces '4' or '5'.
>
> 	Note:  If you can reproduce this, _please_ post a follow
> 	up so I know I'm not insane.
>
> 	The problem seems to be that when 'sysctl hw.acpi.cpu.cx_lowest'
> 	is set to anything other than 'full speed' (aka 'C1') the
> 	cpu frequency is generally (and unpredictably) slower than
> 	C1 speed.  tvtohz(9) (located in /sys/kern/kern_clock.c)
> 	assumes a static frequency and so returns several times the
> 	correct number of ticks.
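
The date(1) test above only resolves whole seconds. A finer measurement can be sketched in C as below. This is only an illustration: measure_sleep_ns() is a hypothetical helper name, and it assumes CLOCK_MONOTONIC is available and is itself unaffected by the frequency changes.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: sleep for req_ns nanoseconds with nanosleep(2)
 * and return the elapsed time in nanoseconds as seen by
 * CLOCK_MONOTONIC, or -1 if a call fails or the sleep is interrupted.
 */
static int64_t
measure_sleep_ns(int64_t req_ns)
{
	struct timespec req, t0, t1;

	req.tv_sec = req_ns / 1000000000;
	req.tv_nsec = req_ns % 1000000000;

	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return (-1);
	if (nanosleep(&req, NULL) != 0)
		return (-1);	/* interrupted by a signal or bad request */
	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return (-1);

	return ((int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000 +
	    (t1.tv_nsec - t0.tv_nsec));
}
```

Calling measure_sleep_ns(1000000000) and comparing the result with the request gives the overshoot; a result several times the request would match the behaviour reported here.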

The frequency used by tvtohz() is required to be fixed.  Since it is
used mainly for timeouts, the frequency isn't required to be very
accurate, but it should be accurate to within a few percent and not
wrong by a factor of 5.
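
To make the size of the error concrete, here is a minimal sketch of a fixed-frequency conversion. It is not the kernel's tvtohz(); sketch_tvtohz() and sketch_elapsed_ms() are hypothetical names, and the code only mirrors the "round up, then add 1" behaviour discussed in this thread, assuming hz divides 1000000 evenly.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for a timeval-to-ticks conversion.
 * It assumes a fixed tick rate of hz ticks per second.
 */
static int64_t
sketch_tvtohz(int64_t sec, int64_t usec, int64_t hz)
{
	int64_t tick_us = 1000000 / hz;		/* nominal microseconds per tick */
	int64_t total_us = sec * 1000000 + usec;
	int64_t ticks = (total_us + tick_us - 1) / tick_us;	/* round up */

	return (ticks + 1);	/* extra tick covers the partial current tick */
}

/*
 * Wall-clock milliseconds that `ticks` timer interrupts take when the
 * interrupts really arrive at actual_hz per second.
 */
static int64_t
sketch_elapsed_ms(int64_t ticks, int64_t actual_hz)
{
	return (ticks * 1000 / actual_hz);
}
```

With hz = 1000, a 1 second request becomes 1001 ticks. If the timer really interrupts only 200 times per second, sketch_elapsed_ms(1001, 200) gives 5005 ms, which is the order of error in the report.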

> 		$ sysctl hw.acpi.cpu dev.cpu.0.freq_levels kern.timecounter.choice kern.timecounter.hardware
> 		hw.acpi.cpu.cx_supported: C1/1 C2/1 C3/85
> 		hw.acpi.cpu.cx_lowest: C3
> 		hw.acpi.cpu.cx_usage: 0.00% 13.11% 86.88%
> 		dev.cpu.0.freq_levels: 1200/-1 1100/-1 1000/-1 900/-1 800/-1 700/-1 600/-1 525/-1 450/-1 375/-1 300/-1 225/-1 150/-1 75/-1
> 		kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-1000000)
> 		kern.timecounter.hardware: ACPI-fast

The timecounter is not really involved here.  It is only used to check
the time (not quite correctly) after the timeout.  That would avoid
the problem if the timeout is too short but not if it is too long.

>> Fix:
>
> 	The ideal solution would be to use a clock whose frequency
> 	is not jerked around by speedstep.  Perhaps this is just a
> 	hardware bug, but I seem to recall seeing this behavior on
> 	my previous Intel Centrino based laptop as well.

The i8254 timer (not timecounter) is supposed to have this property.
Maybe the lapic timer doesn't.

> 	Fixing nanosleep(2) (and select(2)) alone would be relatively
> 	easy:  Since they loop, returning to the user only when the
> 	correct wakeup time has arrived (microtime(9) is apparently
> 	not affected by this problem), one could just have tvtohz(9)
> 	return the number of ticks based on the _lowest_ cpu frequency
> 	rather than the _highest_.  Unfortunately, this makes other
> 	users of tvtohz(9) wake up early, and they may not be as
> 	prepared to handle this.

Yes, that should be OK as a workaround.  One of the things that
nanosleep() etc. don't do quite right is related: for very long sleeps,
the calculated timeout may be more than 1 tick too long due to clock
drift or just the limited resolution of the scale factor used in
tvtohz().  That should be handled by using the _lowest_ possible scale
factor rather than the nominal one.  This could also be used to ensure
that the final timeout is minimal (tvtohz() rounds up and then adds 1
to ensure that the timeout is long enough, so an average timeout is
1.5 ticks longer than strictly necessary; by not adding 1 but checking
whether the timeout has expired on waking up, it is possible to make
an average timeout only 0.5 ticks longer than necessary).

There should be a new interface for callers that are prepared to handle
this (or they can subtract 1 and rescale).
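
A caller that is prepared for early wakeups might look roughly like the simulation below. This is a sketch under stated assumptions, not a proposed kernel interface: struct sim_clock and sleep_with_recheck() are hypothetical, the clock is simulated, and the starting phase within a tick is ignored. The timeout is rounded up without the extra tick, computed from a pessimistic (longest) assumed tick length, and the deadline is rechecked after every wakeup.

```c
#include <stdint.h>

/* Hypothetical simulated clock: every tick slept advances real time. */
struct sim_clock {
	int64_t now_us;		/* current simulated time */
	int64_t real_tick_us;	/* how long one tick really lasts */
};

/*
 * Sleep until deadline_us, converting the remaining time to ticks with
 * assumed_tick_us (round up, no extra +1) and rechecking the deadline
 * after each wakeup.  Returns the number of wakeups that were needed.
 */
static int
sleep_with_recheck(struct sim_clock *c, int64_t deadline_us,
    int64_t assumed_tick_us)
{
	int wakeups = 0;

	while (c->now_us < deadline_us) {
		int64_t remaining = deadline_us - c->now_us;
		int64_t ticks = (remaining + assumed_tick_us - 1) /
		    assumed_tick_us;

		c->now_us += ticks * c->real_tick_us;	/* simulated sleep */
		wakeups++;
	}
	return (wakeups);
}
```

In this simulation, with real ticks of 1000 us and an assumed tick of 1100 us, a 1 second sleep ends exactly on the deadline after 3 wakeups. When the assumed tick matches the real one it takes a single wakeup, so each early wakeup is paid for with another pass through the loop.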

Waking up early also wastes time so it shouldn't usually be done.

Bruce


