Date:      Mon, 14 Jan 2008 00:11:34 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        "Oliver B. Warzecha" <obw@amarok.ping.de>
Cc:        FreeBSD-gnats-submit@FreeBSD.org, freebsd-i386@FreeBSD.org
Subject:   Re: i386/119574: 7.0-RC1 times out in calibrate_clocks()
Message-ID:  <20080113225917.Y50887@delplex.bde.org>
In-Reply-To: <200801112032.m0BKWFXg001186@karnevil9.amarok.ping.de>
References:  <200801112032.m0BKWFXg001186@karnevil9.amarok.ping.de>

On Fri, 11 Jan 2008, Oliver B. Warzecha wrote:

>> Description:
> The 7.0-RC1 GENERIC kernel fails to recognize that the seconds counter
> changes during the loops in calibrate_clocks() in sys/i386/isa/clock.c
>
> The same machine passes this routine quickly with 6.2-RELEASE-p8. It
> is a 200 MHz AMD K6 on an ASUS P55T2P4S board.

Oops.

> Through generous use of debugging printf()'s while searching, I could
> narrow the possible causes down to these lines:
> [... moved this later]

> As I noticed that rtcin() was reworked during the change to 7.0,
> (http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/isa/clock.c.diff?r1=1.222.2.2;r2=1.239.2.1;f=h)
> it occurred to me that perhaps there might be some condition
> in the new code.

Probably.  I wrote both.  Apparently I optimized rtcin() too much.
Try adding some delays to it, and/or more debugging code.  For debugging
code, just read the RTC index register and verify that it is set to
the memory value that is supposed to track it.  Check after writing
to it and when the write is avoided because it already has the desired
value according to the memory value.

> As I said, on a 6.2-RELEASE-p8 kernel it all goes smoothly, I just
> checked with boot -v:
> Calibrating clock(s) ... i8254 clock: 1193296 Hz
> CLK_USE_I8254_CALIBRATION not specified - using default frequency
> Timecounter "i8254" frequency 1193182 Hz quality 0
>
> It does *not* say "failed, using default i8254 clock [...]" which
> would indicate the "fault" label.

The calibration isn't really used unless you configure
CLK_USE_I8254_CALIBRATION.  Otherwise, it is just a sanity check. I
stopped using this configuration a few years ago when I switched to a
more dynamic calibration scheme.

>> Fix:
> One possible workaround would be to reset the timeout variable
> to a more sane value, for the first loop it gets set to
> 100000000. But then this loop seems to work okay, so I have
> no idea what a sane value would be. At least it would fail
> fast.

100 million is supposed to be 100 seconds with a large safety margin.
This depends on rtcin() taking at least 1 usec by doing slow
hardware accesses.  But 100 seconds is too long and rtcin() normally
takes much longer than 1us.  I reduced it to 1 million locally.  You
should reduce it to debug this (so that it times out after < 10
seconds).

>        timeout = 100000000;
>
>        /* Read the mc146818A seconds counter. */
>        for (;;) {
>                if (!(rtcin(RTC_STATUSA) & RTCSA_TUP)) {
>                        sec = rtcin(RTC_SEC);
>                        break;
>                }
>                if (--timeout == 0)
>                        goto fail;
>        }
> //        printf("seconds counter read ... \n");

The optimization might break this by going too fast -- it now reads the
status register as fast as possible, where before it had lots of delays.
It still has the delays (supposed to be done more carefully than before)
when switching to reading the seconds register, so I would expect the
read of the seconds register to be as correct as before, but it apparently
isn't.

>        /* Wait for the mc146818A seconds counter to change. */
>        start_sec = sec;
>        for (;;) {
>                if (!(rtcin(RTC_STATUSA) & RTCSA_TUP)) {
>                        sec = rtcin(RTC_SEC);
>                        if (sec != start_sec)
>                                break;
>                }
>                if (--timeout == 0)
>                        goto fail;
>        }
> //        printf("seconds counter changed ... \n");
> (my debugging printf()s commented out with C++ comments)

This should behave similarly.

> The first output "seconds counter read" gets printed very fast.

Do you think it gets printed too fast?  It should take 1/2 second on
average to reach the next seconds boundary, then another second to
calibrate.  Configure CLK_CALIBRATION_LOOP and boot with -v to see
the calibration repeated precisely every second if it is working.
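For reference, both are ordinary kernel config options (spelling as in
sys/conf/NOTES of that era, to the best of my knowledge):

```
options 	CLK_CALIBRATION_LOOP
options 	CLK_USE_I8254_CALIBRATION
```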

> After that the machine counts back from 0 to 0, and after some minutes
> it falls out of the loop; "seconds counter changed" never gets
> printed.

You could also try watching the RTCSA_TUP bit.  This should change every
second and stay set for a while.  Some logic would be required to limit
printfs to 1 per change.  This bit alone could be used to detect seconds
boundaries.

I just noticed that I reduced the accuracy of the calibration by adding
delays to rtcin().  I had the delays commented out in my version but put
them back when I optimized the usual case of rtcin() being called from
the RTC interrupt handler (the index register is normally constant then).
The (relative) accuracy is limited by the time that it takes to read the
seconds register and the time it takes to read the I8254, and though
the code is arranged so that the inaccuracy is much less than the sum
of these times, increasing the times doesn't help.  There is probably
also more jitter from branch misprediction in rtcin() when switching
indexes.  So watching only the RTCSA_TUP bit would be good for accuracy
too.  The seconds register could be read in non-time-critical code
after reading all the other registers as a sanity check (check that it
is incremented by 1; the current code only checks that it changes).

Bruce
