Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 13 Jan 2018 22:56:29 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Warner Losh <imp@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r327767 - in head/sys: conf i386/bios i386/conf i386/isa
Message-ID:  <20180113203848.L2336@besplex.bde.org>
In-Reply-To: <201801101459.w0AExJWM055025@repo.freebsd.org>
References:  <201801101459.w0AExJWM055025@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 10 Jan 2018, Warner Losh wrote:

> Log:
>  Retire pmtimer driver. Move time fixing into apm driver. Move
>  Iwasaki-san's copyright over. Remove FIXME code that couldn't possibly
>  work. Call tc_settime() with our estimate of the delta we've been
>  alseep (the one we print) to adjust the time. Not sure what to do
>  about callouts, so keep the small #ifdef in place there.

The FIXME code might have worked before timecounters, but never worked in
any committed version.  You seem to have updated it correctly for
timecounters in this commit, but then broke it by removing it in a later
commit (see another reply).

This commit obfuscates the fix by combining moving of the file with changing
it.  svn is also very broken -- it doesn't show history of the file under
its old name.  This is more broken than usual -- since the file was merged
into a different file, svn doesn't show history of the file under its new
name either.

The callout fixup code is more broken.  It never compiled in any committed
version, since it uses data structures and logic that went away before it
was committed.  The option for it (APM_FIXUP_CALLTODO) is so broken that it
is not a supported option or in LINT, so it is not normally noticed that
the code under it never compiled.

Long callouts are especially broken.  E.g., nanosleep(1 hour) at least
used to sleep for 2 hours if the system is suspended for 1 hour during
the sleep.  I haven't noticed any fixes for this.  Callouts mostly use
relative timeouts, and relative timeouts just don't work across suspensions.

Actually, I have noticed some fixes for nanosleep().  Another bug in it
was that it is specified to sleep in the time specified by the clock id
CLOCK_REALTIME, but used to sleep in the time specified by the clock id
CLOCK_MONOTONIC (the latter should be as close as possible to the former
except for leap seconds adjustments, but is quite broken -- it doesn't
count suspended time, and non-leap seconds settings of CLOCK_REALTIME
also make CLOCK_MONOTONIC useless for determining physical time differences.
nanosleep() now sleeps on the correct clock id, and clock_nanosleep() is
now implemented.  This is complicated, and I haven't checked if the fix
works.  It looks to simple to work.  To work, it needs to do things like
adjust timers whenever the timecounter jumps (mainly for resume and leap
seconds adjustments).  APM_FIXUP_CALLTODO only tries to handle the easier
case of resume.

> Modified: head/sys/conf/files.i386
> ==============================================================================
> --- head/sys/conf/files.i386	Wed Jan 10 14:58:58 2018	(r327766)
> +++ head/sys/conf/files.i386	Wed Jan 10 14:59:19 2018	(r327767)
> @@ -520,7 +520,6 @@ i386/ibcs2/ibcs2_util.c		optional ibcs2
> i386/ibcs2/ibcs2_xenix.c	optional ibcs2
> i386/ibcs2/ibcs2_xenix_sysent.c	optional ibcs2
> i386/ibcs2/imgact_coff.c	optional ibcs2
> -i386/isa/pmtimer.c		optional pmtimer
> i386/isa/prof_machdep.c		optional profiling-routine
> i386/linux/imgact_linux.c	optional compat_linux
> i386/linux/linux_dummy.c	optional compat_linux

pmtimer used to be needed all systems with acpi except amd64, because the
acpi fixup of timecounters was bogusly ifdefed for amd64 only (or was not
done by acpi in very old versions?).

The pmtimer code for this fixup is still better tha the acpi code, except
the FIXME made it unused and the next commit dumbs it down to worse than
the acpi code (the acpi code is at least simpler).

> Modified: head/sys/i386/bios/apm.c
> ==============================================================================
> --- head/sys/i386/bios/apm.c	Wed Jan 10 14:58:58 2018	(r327766)
> +++ head/sys/i386/bios/apm.c	Wed Jan 10 14:59:19 2018	(r327767)
> Modified: head/sys/i386/bios/apm.c
> ==============================================================================
> --- head/sys/i386/bios/apm.c	Wed Jan 10 14:58:58 2018	(r327766)
> +++ head/sys/i386/bios/apm.c	Wed Jan 10 14:59:19 2018	(r327767)
> @@ -205,7 +232,7 @@ static int
> apm_driver_version(int version)
> {
> 	struct apm_softc *sc = &apm_softc;
> -
> +
> 	sc->bios.r.eax = (APM_BIOS << 8) | APM_DRVVERSION;
> 	sc->bios.r.ebx  = 0x0;
> 	sc->bios.r.ecx  = version;

Unrelated style fixes make the important changes harder to see.  This
patch was to remove trailing whitespace, but some mail program broke it
by removing the whitespace in the source lines.

[... further mangled patches for whitespace not shown]

> @@ -331,12 +358,10 @@ apm_battery_low(void)
> static struct apmhook *
> apm_add_hook(struct apmhook **list, struct apmhook *ah)
> {
> -	int s;
> 	struct apmhook *p, *prev;
>
> 	APM_DPRINT("Add hook \"%s\"\n", ah->ah_name);
>
> -	s = splhigh();
> 	if (ah == NULL)
> 		panic("illegal apm_hook!");
> 	prev = NULL;
> @@ -351,30 +376,25 @@ apm_add_hook(struct apmhook **list, struct apmhook *ah
> 		ah->ah_next = prev->ah_next;
> 		prev->ah_next = ah;
> 	}
> -	splx(s);
> 	return ah;
> }

This commit also removes spl*() with no mention of this in the log message.
In FreeBSD-4 where spl's were not null, splhigh() was probably wrong, but it
was much better than the splsoftclock() used in pmtimer.c.  Hard clock
interrupts should be disabled, and my fixes for this change softclock to
statclock + XXX.  It happens that splstatclock() == splclock() == splhigh(),
so these splhigh()s were good enough.

Now the locking is obscure.  I think it is just Giant, and that is not
enough.  The spl's should not be removed until the locking is done properly.
The splsoftclock()'s were removed even earlier in pmtimer.c, without doing
any proper locking of course.

> static void
> apm_del_hook(struct apmhook **list, struct apmhook *ah)
> {
> -	int s;
> 	struct apmhook *p, *prev;
>
> -	s = splhigh();
> 	prev = NULL;
> 	for (p = *list; p != NULL; prev = p, p = p->ah_next)
> 		if (p == ah)
> 			goto deleteit;
> 	panic("Tried to delete unregistered apm_hook.");
> -	goto nosuchnode;
> +	return;
> deleteit:
> 	if (prev != NULL)
> 		prev->ah_next = p->ah_next;
> 	else
> 		*list = p->ah_next;
> -nosuchnode:
> -	splx(s);
> }

Locking for device removal is especially necessary and badly done.  New-bus
forces lots of Giant locking which is probably not enough.

> @@ -1047,6 +1067,53 @@ apm_processevent(void)
> 	} while (apm_event != PMEV_NOEVENT);
> }
>
> +static struct timeval suspend_time;
> +static struct timeval diff_time;
> +
> +static int
> +apm_rtc_suspend(void *arg __unused)
> +{
> +
> +	microtime(&diff_time);
> +	inittodr(0);
> +	microtime(&suspend_time);
> +	timevalsub(&diff_time, &suspend_time);
> +	return (0);
> +}

This calculates the difference for use later, using a hackish method:
- this used to be bogusly locked by 
- microtime(&diff_time) gives the current time in a good way
- start hacking with inittodr(0).  inittodr() reads the time from the
   RTC and adds the nasty global timezone offset to it.  Without the
   latter, the new time set by inittodr() would be off by up to 12 hours.
   With the latter, it might still be off by 1 hour for a DST adjustment,
   depending on races between this and settimeofday(2) and adjkerntz(8)
   and certain racy sysctls, and it it is usually off by many seconds due
   to different drift of the RTC and the current time.  The purpose of
   this function is to determine the drift at suspend time so as to add
   it back at resume time.  This also handles timezone offsets even if
   inittodr() doesn't add them, and might reduce races for adjusting
   the timezone offset for DST.

> +
> +static int
> +apm_rtc_resume(void *arg __unused)
> +{
> +	u_int second, minute, hour;
> +	struct timeval resume_time, tmp_time;
> +	struct timespec ts;
> +
> +	/* modified for adjkerntz */

Cryptic comment from old code.  This is related to the timezone offset,
but I don't see how adjkerntz benefits in practice from old modifications
made here.  It only obviously benefits in theory from smaller races.

> +	timer_restore();		/* restore the all timers */

Old banal comment.

timer_restore() function only exists on i386, and only restores 2 isa timers
(i8254 and rtc), so moving its code out of isa is a layering violation.

> +	inittodr(0);			/* adjust time to RTC */

Banal comment.  Also wrong when the RTC time is local.

> +	microtime(&resume_time);
> +	getmicrotime(&tmp_time);

The previous line is bad old code.  We want to do arithmetic using
resume_time and should copy it to tmp_time.  Instead, we use the
getmicrotime() mistake to fetch a truncated copy of a time slightly
later than resume_time.

> +	timevaladd(&tmp_time, &diff_time);

diff_time is the difference at suspend time and this adds it to resume_time
(modulo the truncation).  This gives correct-as-possible resume_time.

> +	/* Calculate the delta time suspended */
> +	timevalsub(&resume_time, &suspend_time);
> +
> +	second = ts.tv_sec = resume_time.tv_sec;
> +	ts.tv_nsec = 0;
> +	tc_setclock(&ts);

This used to be better, except tc_setclock() was 'time = tmp_time' under
FIXME.  The calculation of the time suspended is convolved with fixing
up the timecounter, and there is a comment about the relatively simple
calculation of the time suspended and none about the convulutions or
the more delicate fixup.

> +
> +#ifdef PMTIMER_FIXUP_CALLTODO
> +	/* Fixup the calltodo list with the delta time. */
> +	adjust_timeout_calltodo(&resume_time);
> +#endif /* PMTIMER_FIXUP_CALLTODO */

This has several style bugs together with its option being unusable.

It further abuses resume_time.  resume_time was initially actually the
resume time before the fixup.  The fixed up resume time is now in ts.
We calculated this using tmp_time to keep the initial resume_time
alive for using in another caclulation.  Then the other calculation
abuses resume_time for the delta time needed by the uncallable fixup.
The comment unobfuscates this a but by describing the abused resume_time
as the delta time.

> +	hour = second / 3600;
> +	second %= 3600;
> +	minute = second / 60;
> +	second %= 60;
> +	log(LOG_NOTICE, "wakeup from sleeping state (slept %02d:%02d:%02d)\n",
> +		hour, minute, second);
> +	return (0);

With the FIXME version, almost everything was unused except for printing
this msssage.  This message is useful for debugging, but acpi never bothered
doing it.

> +}

In the version with the FIXME, the careful fixups done by pmtimer reduce
to just inittodr(0) plus printing the delta.  In this version, the timecounter
fixup actually works so the careful setup for this is undone, but things are
convoluted with the unreachale callout fixup and printing the delta.

acpi just does 2 timecounter warmups followed by inittodr(time_second +
scp->acpi_sleep_delay).  Here the arg is differently wrong.  An arg of 0
says that we have no idea of the time and the raw RTC time should be used.
time_second is the time in seconds before the system is fully resumed.  This
is probably the time in seconds when the system was suspended (since
interrupts didn't run during suspension and if clock interrupts occurred
before here, they would do nothing good.  acpi_sleep_delay is just a forced
delay during suspension.  If it works at all, then updates of time_second
have halted before the delay, and the arg is our best guess of the actual
suspension time.  IIRC, the arg is only used in the unlikely event that
reading the RTC fails.  pmtimer doesn't bother passing a nonzero arg.

Bugs here are masked by other bugs:
- at least x86 RTC-reading code is sloppy and only reads the RTC to the
   nearest second.  So we expect an error of a second or 2 after suspend/
   resume.  This is too large for ntpd to fix well, so the best that can
   happen (without a resume script that repeats boot time ntpd initialization)
   is that ntpd is running and quickly notices that the offset is too large
   and steps the clock to fix it.  Backwards steps are very bad, so the resume
   code should try to set to an earlier time.
- when nptd is running, the current time may have been written to the RTC
   quite often.  This probably keeps the timecounter fixup less than a few
   seconds although x86 RTC-writing code is sloppy and gives another source
   of errors.
- if an error related to the timezone offset occurs or if the timecounter
   fixup is not applied, then the enormous error from this might be a feature.
   Hopefully ntpd is running and recovers faster if the error is enormous.
   (I think it actually doesn't do this right.  It has a limit like 900
   seconds and can step more than that to act a but like ntpdate when invoked
   with certain options, but you normally want it to do only 1 large step.
   Restarting it with these options after every resume is best.)

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180113203848.L2336>