Date:      Fri, 10 Oct 2008 03:51:02 +1100 (EST)
From:      Ian Smith <smithi@nimnet.asn.au>
To:        Jeremy Chadwick <koitsu@freebsd.org>
Cc:        bf <bf2006a@yahoo.com>, freebsd-stable@freebsd.org
Subject:   Re: Recent Problems with RELENG_7 i386
Message-ID:  <20081010023938.R16723@sola.nimnet.asn.au>
In-Reply-To: <20081009051214.GA94941@icarus.home.lan>
References:  <20081008183652.GA83351@icarus.home.lan> <501797.33750.qm@web39105.mail.mud.yahoo.com> <20081009051214.GA94941@icarus.home.lan>

On Wed, 8 Oct 2008, Jeremy Chadwick wrote:
 > On Wed, Oct 08, 2008 at 10:00:32PM -0700, bf wrote:
 > > --- On Wed, 10/8/08, Jeremy Chadwick <koitsu@FreeBSD.org> wrote:
[..]
 > > > > Oct  8 11:00:40 myhost kernel: t_delta 15.fd80bdcb75b60200 too short
 > > > 
 > > > This comes from src/sys/kern/kern_tc.c, around line 908.  I'm not
 > > > familiar with the kernel, but three ideas come to mind:
 > > > 
 > > > 1) If you have Intel SpeedStep (EIST) or AMD Cool'n'Quiet enabled in
 > > > your BIOS, try disabling it.
 > > > 
 > > > 2) If you're using powerd, disable it (I don't see it enabled).
 > > > 
 > > > 3) Try keeping HZ at 1000 (the default).
 > > > 
 > > 
 > > Thanks, Jeremy, for taking the time to consider my question and reply.
 > > 
 > > My CPU is pre-Cool'n'Quiet, and as far as I can tell I had disabled
 > > all forms of power management that may affect the clock speeds.  I have
 > > found that by raising kern.hz to 250, or by using the default, I no
 > > longer receive the "t_delta ... too short" messages, and the other
 > > problems are no longer apparent.  My question is: why did this occur now?
 > 
 > I don't know.  We can't rewind time and find out system parameters and
 > kernel details from 6 months ago.  :-)
 > 
 > I'm thinking it might have something to do with the timecounter selected
 > by the kernel, but as I said, we can't rewind time to find out what
 > things were in the past.
 > 
 > The sysctl variables I'm talking about are those under kern.timecounter.
 > "sysctl kern.timecounter" could help shed some light here, maybe.  It
 > would at least allow us to see what timecounters are available on your
 > system, and if a bad/unreliable one is being selected automatically.
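A minimal Python sketch of reading that output, assuming `sysctl -n kern.timecounter.choice` prints space-separated `NAME(QUALITY)` pairs. The sample string is hypothetical apart from the ACPI-safe quality of 850 seen in bf's dmesg. Picking the highest quality only roughly mirrors the kernel's own choice (kern_tc.c also compares frequencies on ties), and the counter actually in use is reported by `sysctl kern.timecounter.hardware`.

```python
import re
from typing import Dict


def parse_choices(choice: str) -> Dict[str, int]:
    """Parse 'NAME(QUALITY) NAME(QUALITY) ...' into {name: quality}."""
    return {name: int(q) for name, q in re.findall(r"(\S+?)\((-?\d+)\)", choice)}


def best_timecounter(choice: str) -> str:
    """Return the name with the highest quality (simplified selection rule)."""
    choices = parse_choices(choice)
    return max(choices, key=choices.get)


# Hypothetical sample in the assumed kern.timecounter.choice format.
sample = "TSC(800) ACPI-safe(850) i8254(0) dummy(-1000000)"
print(best_timecounter(sample))
```

Negative qualities (like the dummy counter's) are meant never to be picked automatically, which the highest-quality rule respects here.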

I see bf has since posted these values, but I'd already clipped parts 
of the original post (kernel config and verbose dmesg), already 
wondering why the two didn't match, e.g.:

 | options         HZ=1000
 | options         DEVICE_POLLING

but

 | CPU: AMD Athlon(tm) Processor (906.35-MHz 686-class CPU)
 | Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
 | inittimecounter(0)... Timecounters tick every 10.000 msec

i.e. HZ=100, as mentioned, with ACPI-safe in use, as later confirmed.  So 
either it's a different kernel or bf set kern.hz in loader.conf?
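The "tick every 10.000 msec" line is what gives HZ=100 away. Below is a small Python sketch of that arithmetic, reconstructed from memory of inittimecounter() in src/sys/kern/kern_tc.c rather than copied from it, so treat it as an approximation: up to HZ=1000 the interval is simply 1000/HZ msec. The value the running kernel actually uses can be checked with `sysctl kern.hz`, and a `kern.hz` setting in /boot/loader.conf overrides the compiled-in `options HZ`.

```python
def tick_message(hz: int) -> str:
    """Approximate the boot-time 'Timecounters tick every ...' line for a given HZ.

    Reconstruction (an assumption, not the kernel source): above 1000Hz the
    timecounter is only updated every few hardclock ticks, about once per msec.
    """
    tc_tick = (hz + 500) // 1000 if hz > 1000 else 1  # hardclock ticks per update
    p = (tc_tick * 1_000_000) // hz                    # update interval in usec
    return f"Timecounters tick every {p // 1000}.{p % 1000:03d} msec"


for hz in (100, 250, 1000, 2000):
    print(hz, tick_message(hz))
```

So a kernel built with HZ=1000 but booted with kern.hz=100 would print the 10.000 msec line seen in bf's dmesg.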

 > > I have been using a similar configuration for months now without any
 > > apparent problems. My original goal in using a lower kern.hz was to
 > > avoid burdening my machine with excessive context switching.
 > 
 > This is over my head, technically.  I would need to pull John Baldwin
 > into this, since he knows a bit about both (timecounters and context
 > switching).  I'm just a simple caveman..... :-)

Me too, but as I get to run much slower gear than many here, I have some 
small insight into what timer ticks work well with older kit.  Not that 
a 900MHz Athlon should have any trouble with HZ=1000 at all, whereas on 
a 300MHz Celeron that's way too fast, pushing idle load up considerably.

 > > I saw the relevant section of kern_tc.c before I wrote my first
 > > message, but when skimming through the changes in RELENG_7 over the
 > > past week or two, I couldn't see any commit that may have directly
 > > affected kernel timekeeping.  Has some new workload been imposed on
 > > the system by recent changes, that may have made a kern.hz of 100
 > > insufficient?  Is this tuneable setting properly implemented, so that
 > > all parts of the base system are using its current value rather than
 > > the default?  Could some of my hardware, such as my RTC, be
 > > malfunctioning?
 > 
 > Well, I believe HZ was increased from 100 to 1000 long ago (RELENG_6?)
 > as a default.  I'm really not sure of the implications of decreasing it,
 > besides having less granularity for some things (the only things I know
 > of would be something pertaining to firewalls, I just can't remember
 > what.  My brain is full.  :-) )

You need a day off :)  But yes, RELENG_5 still had HZ=100 default, long 
after the 'average' CPU clock frequency was 10 or more times faster than 
the 166MHz Pentiums and such (mostly then on only 100Mbps ethernet) that 
were comfortable at 100Hz slicing.  1000Hz was a big shift to catch up.

In a day or so of playing around with it years ago, I found 200-250Hz 
good for 300MHz, 500Hz a bit much and 1000Hz way too busy, and I find my 
1133MHz P3-M happy enough at 1000Hz, though I've done no specific tests on it.
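One crude way to compare those settings is the number of CPU cycles between two clock interrupts, sketched below in Python. It is only a rough proxy (it ignores how much work each tick actually does), but it may hint at why 200-250Hz on 300MHz and 1000Hz on 1133MHz both feel comfortable: each leaves roughly 1.1 to 1.5 million cycles per tick.

```python
def cycles_per_tick(cpu_mhz: float, hz: int) -> int:
    """CPU clock cycles available between two timer interrupts."""
    return round(cpu_mhz * 1_000_000 / hz)


# (CPU MHz, HZ) pairs taken from the machines and settings in this thread.
settings = [(300, 250), (300, 1000), (906.35, 100), (906.35, 1000), (1133, 1000)]
for mhz, hz in settings:
    print(f"{mhz:>7} MHz @ {hz:>4} Hz: {cycles_per_tick(mhz, hz):>9,} cycles/tick")
```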

Some people had perhaps similar clock issues when their fast processors 
were throttling/stepping down to very low speeds (100, even 75MHz) while 
still slicing at 1000Hz, which I didn't find too surprising.  Limiting 
minimum CPU freq to 300MHz or more seemed to solve many such issues, but
I haven't your perseverance for digging up the relevant threads ..
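At 75MHz and 1000Hz only 75,000 cycles separate two clock interrupts. A minimal Python sketch for picking a floor, assuming `sysctl -n dev.cpu.0.freq_levels` prints space-separated `MHZ/MILLIWATTS` pairs (the sample below is made up): it returns the lowest supported level that still meets a chosen minimum. How to enforce that floor (powerd(8) options or cpufreq(4) tunables) varies by release, so check the man pages on your system.

```python
from typing import List, Optional


def parse_freq_levels(levels: str) -> List[int]:
    """Parse space-separated 'MHZ/MILLIWATTS' pairs into a list of MHz values."""
    return [int(pair.split("/")[0]) for pair in levels.split()]


def lowest_level_at_least(levels: str, min_mhz: int) -> Optional[int]:
    """Return the lowest supported frequency that is still >= min_mhz, or None."""
    candidates = [f for f in parse_freq_levels(levels) if f >= min_mhz]
    return min(candidates) if candidates else None


# Hypothetical sample in the assumed freq_levels format (-1 = power unknown).
sample = "1200/-1 1050/-1 900/-1 750/-1 600/-1 450/-1 300/-1 150/-1 75/-1"

for floor in (150, 300, 2000):
    print(floor, "->", lowest_level_at_least(sample, floor))
```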

Even in 5.5-S (/sys/conf/NOTES and /sys/i386/conf/NOTES) HZ=1000 or 2000 
was suggested for DEVICE_POLLING (which bf included in config, though 
maybe it's not enabled?) and HZ=1000 or more was recommended when using 
DUMMYNET with ipfw - to provide smoother queue dispatching, I gather.
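As we understand polling(4) and dummynet, both do their work from the clock tick, so HZ sets how often that work happens. A simplified Python sketch, assuming a dummynet pipe is serviced exactly once per tick: at HZ=100 a 1Mbit/s pipe releases about 1250 bytes every 10 ms, while at HZ=1000 it releases about 125 bytes every 1 ms, which is presumably the smoother dispatching the NOTES files have in mind.

```python
def bytes_per_tick(bandwidth_bps: int, hz: int) -> float:
    """Bytes a pipe may release per tick, assuming it is serviced once per tick."""
    return bandwidth_bps / 8 / hz


for hz in (100, 1000):
    b = bytes_per_tick(1_000_000, hz)  # 1 Mbit/s pipe
    print(f"HZ={hz}: {b:.0f} bytes every {1000 / hz:g} ms "
          f"({b / 1500:.2f} full-size 1500-byte packets)")
```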

Bottom line, IMHO, bf should probably run the default 1000Hz, 500 at 
least, on an Athlon 900.  With powerd, maybe set min. freq >= 150MHz?

cheers, Ian


