Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 4 Oct 2019 00:11:35 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Bruce Evans <brde@optusnet.com.au>,  Sebastian Huber <sebastian.huber@embedded-brains.de>,  FreeBSD <freebsd-hackers@freebsd.org>
Subject:   Re: Why is tc_get_timecount() called two times in tc_init()?
Message-ID:  <20191003214837.L1757@besplex.bde.org>
In-Reply-To: <20191003084021.GI44691@kib.kiev.ua>
References:  <0e27fb3e-0f60-68e1-dbba-f17c3d91c332@embedded-brains.de> <20191002140040.GA44691@kib.kiev.ua> <20191003013314.O2151@besplex.bde.org> <20191002163946.GE44691@kib.kiev.ua> <20191003030837.C2787@besplex.bde.org> <20191003084021.GI44691@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 3 Oct 2019, Konstantin Belousov wrote:

> On Thu, Oct 03, 2019 at 03:12:26AM +1000, Bruce Evans wrote:
>> On Wed, 2 Oct 2019, Konstantin Belousov wrote:
>>
>>> On Thu, Oct 03, 2019 at 02:25:46AM +1000, Bruce Evans wrote:
>>>> On Wed, 2 Oct 2019, Konstantin Belousov wrote:
>>> So the conclusion is that the second call can be removed, am I right ?
>>
>> Yes.
>>
>> All tc_get_timecount() functions should be checked for doing sufficient
>> initialization in one call (so that deltas for subsequent calls are
>> correct).
>
> This should be it.

Hmm, there are a lot of calls from subsystems that shouldn't know about
this.

> But, is even a single call to tc_get_timecount() needed for "warmup" ?

Yes, something must initialize the hardware timecounter if it is not already
initialized (and free-running).  The i8254 timecounter is a good example.
It has internal state that is only updated while it is active.

Something must also initialize the software timecounter.  This seems to have
been broken since 2002 when switch_timecounter() was removed.  Most of the
software state is initialized early, but th_offset_count is initialized
late in tc_windup(), so going live with 'timecounter = newtc' never works.
The order in sysctl_kern_timecounter_hardware() is:

 	...tc_get_timecount(newtc);	/* initialize h/w tc. */
 	/* Fail to initialize newtc->th_offset with new h/w value. */
 	timecounter = newtc;		/* go live with uninitialized tc. */

> diff --git a/sys/dev/acpica/acpi.c b/sys/dev/acpica/acpi.c
> index 382d139617d..28d0d738a58 100644
> --- a/sys/dev/acpica/acpi.c
> +++ b/sys/dev/acpica/acpi.c
> @@ -3191,7 +3191,6 @@ acpi_resync_clock(struct acpi_softc *sc)
>      * Warm up timecounter again and reset system clock.
>      */
>     (void)timecounter->tc_get_timecount(timecounter);
> -    (void)timecounter->tc_get_timecount(timecounter);
>     inittodr(time_second + sc->acpi_sleep_delay);
> }

Fairly bad code with bogus comment.  Comment more bogus than before.
The timecounter hasn't been warmed up (or even initialized) before the
comment.  "again" in the comment refers to removed second tc call.

The actual initialization is done by inittodr().  No tc initialization
or warming should be needed before it it.  It first reads the RTC, then
calls tc_setclock() to set the timecounter to the value read.  There is
not enough locking around this (real-time locking to guarantee that the
value read is not too out of date when it is used...).

tc_setclock() now has mutex locking which might be good enough if it were
around the whole operation.  But it isn't even around all of itself, and
tc_setclock() depends on the timecounter being initialized at this point.
I think th_offset is used uninitialized after resume.  Various bugs and
magic limit the damage from this.  There is the large bug that
CLOCK_MONOTONIC stops while suspended.  I think acpi does start its timer
earlier and the comment about "warming up again" is partly correct for a
single tc call -- the call doesn't warm up again, but further warming
which is probably unnecessary.  The uninitialized th_offset provides a
garbage delta, but tc_setclock() compensates using other deltas:

XX void
XX tc_setclock(struct timespec *ts)
XX {
XX 	struct timespec tbef, taft;
XX 	struct bintime bt, bt2;
XX 
XX 	timespec2bintime(ts, &bt);
XX 	nanotime(&tbef);

Race above here by not locking yet.

Use th_offset uninitialized above and below here in some cases.
nanotime() gives a garbage time after acpi resume.  Since the acpi
timer mask is fairly large and the acpi timer frequency is fairly small,
there is a chance that overflow doesn't occur so that the garbage time
is almost correct as a relative time (the correct monotonic time less
the suspended time).

XX 	mtx_lock_spin(&tc_setclock_mtx);

Now have almost adequate locking.

XX 	cpu_tick_calibrate(1);
XX 	binuptime(&bt2);
XX 	bintime_sub(&bt, &bt2);

This gives an almost correct relative time with some magic:
- inittodr() passed us the RTC time which is absolute and now in bt
- bt2 is garbage calculated using the garbage th_offset
- whatever garbage bt2 is, the final bt gives the delta needed to
   recover the RTC provided we keep using the garbage offset and any
   overflows from this give consistent deltas.

XX 
XX 	/* XXX fiddle all the little crinkly bits around the fiords... */
XX 	tc_windup(&bt);

tc_windup() finally initializes th_offset.  Now the deltas aren't from
a consistent garbage offset, but the damage is limited.  The bt arg
here is only used to subvert boottimebin to a non-boottime so that
adding the broken monotic time to the boot time gives the RTC time
modulo the consistency bugs near here and drift.  Monotonic times use
the new offset and are only especially broken if that goes backwards
over suspend/resume.

It is unclear if other subsystems can call timecounter code during
resume (or suspend) before the timecounter is fully reinitialized here.

XX 	mtx_unlock_spin(&tc_setclock_mtx);
XX 	...

>
> diff --git a/sys/dev/acpica/acpi_timer.c b/sys/dev/acpica/acpi_timer.c
> index 34a832089a2..d768397a785 100644
> --- a/sys/dev/acpica/acpi_timer.c
> +++ b/sys/dev/acpica/acpi_timer.c
> @@ -274,7 +274,6 @@ acpi_timer_resume_handler(struct timecounter *newtc)
> 			    "restoring timecounter, %s -> %s\n",
> 			    tc->tc_name, newtc->tc_name);
> 		(void)newtc->tc_get_timecount(newtc);
> -		(void)newtc->tc_get_timecount(newtc);
> 		timecounter = newtc;
> 	}
> }

Oops, here is the first incomplete reinitialization and warmup.

It clearly uses th_offset uninitialized, just like for switching the
timecounter hardware.

This function (and similarly acpi_timer_suspend_handler()) seem to be mostly
wrong.  When the acpi timecounter is the current timecounter, suspend doesn't
schedule resume and resume has an unnecessary check.  When the acpi
timecounter is not the current timecounter, suspend forces a switch to it
and schules resume to undo this foot-shooting.  Supsend and resume do the
usual buggy switch with a warmup or 2 but th_offset uninitialized.

Note that acpi_resync_clock() for resume (where inittodr() is called) syncs
the current timecounter() and doesn't belong under acpi.

> diff --git a/sys/dev/xen/control/control.c b/sys/dev/xen/control/control.c
> index 98ab5bf3a6b..1e132f4d866 100644
> --- a/sys/dev/xen/control/control.c
> +++ b/sys/dev/xen/control/control.c
> @@ -303,7 +303,6 @@ xctrl_suspend()
> 	 * Warm up timecounter again and reset system clock.
> 	 */
> 	timecounter->tc_get_timecount(timecounter);
> -	timecounter->tc_get_timecount(timecounter);
> 	inittodr(time_second);

Similar with "again" in comment.  Actually worse than that: for acpi in
the foot-shooting case the acpi timecounter was warmed in suspend when it
was bogusly activated; in resume is inactivated and the original timecounter
warmed (but not again) and reactivated.  For xen, there is no corresponding
switching and just this misplaced incomplete reinitialization.


>
> #ifdef EARLY_AP_STARTUP
> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 9920a9a9304..847fbbbf35d 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> @@ -1257,7 +1257,6 @@ tc_init(struct timecounter *tc)
> 	    tc->tc_frequency < timecounter->tc_frequency)
> 		return;
> 	(void)tc->tc_get_timecount(tc);
> -	(void)tc->tc_get_timecount(tc);
> 	timecounter = tc;
> }
>

This might work for the very first initialization.

> @@ -1519,7 +1518,6 @@ sysctl_kern_timecounter_hardware(SYSCTL_HANDLER_ARGS)
>
> 		/* Warm up new timecounter. */
> 		(void)newtc->tc_get_timecount(newtc);
> -		(void)newtc->tc_get_timecount(newtc);
>
> 		timecounter = newtc;
>

Original problem only appeared to be cosmetic.

> @@ -2011,7 +2009,6 @@ inittimecounter(void *dummy)
>
> 	/* warm up new timecounter (again) and get rolling. */
> 	(void)timecounter->tc_get_timecount(timecounter);
> -	(void)timecounter->tc_get_timecount(timecounter);
> 	mtx_lock_spin(&tc_setclock_mtx);
> 	tc_windup(NULL);
> 	mtx_unlock_spin(&tc_setclock_mtx);
>

This might work since it doesn't defer the call to tc_windup().

"again" in its comment seems to be bogus. "twice" might have been correct
but banal for the 2 calls.

This function is a SYSINIT() so it is unclear how much is initialized before
it.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20191003214837.L1757>