Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 22 Mar 2017 14:44:50 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Eric van Gyzen <vangyzen@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r315694 - head/lib/libc/sys
Message-ID:  <20170322120102.P963@besplex.bde.org>
In-Reply-To: <201703220050.v2M0ob0p012506@repo.freebsd.org>
References:  <201703220050.v2M0ob0p012506@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 22 Mar 2017, Eric van Gyzen wrote:

> Log:
>  clock_gettime.2: add some clock IDs
>
>  Add the CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clock_id
>  values to the clock_gettime(2) man page.  Reformat the excessively
>  long paragraph (sentence!) into a tag list.

Some old bugs are now clearer.

> Modified: head/lib/libc/sys/clock_gettime.2
> ==============================================================================
> --- head/lib/libc/sys/clock_gettime.2	Tue Mar 21 22:41:37 2017	(r315693)
> +++ head/lib/libc/sys/clock_gettime.2	Wed Mar 22 00:50:36 2017	(r315694)
> ...
> +.Pp
> +.Bl -tag -width indent -compact
> +.It Dv CLOCK_REALTIME
> +.It Dv CLOCK_REALTIME_PRECISE
> +.It Dv CLOCK_REALTIME_FAST
> +Increments as a wall clock should.

How should a wall clock increment? :-)  Not with leap seconds
adjustments or even DST adjustments, since those are confusing for
humans, and normally aren't even increments.  Some DST adjustments
are large decrements, and all leap seconds adjustments have been
positive, which causes the wall clock to sort of decrement, but this
is usually implemented by not incrementing.  I haven't seen a wall
clock that displays the 61st second of a minute as ":60" or even
stops at ":59".  Implementing this extra or sticking for a real
clockwork clock would be interesting.  Certainly wall clocks in
POSIX time, which are what this syscall exists.  POSIX time is
broken as designed, especially for real times, so nothing "should"
track it.  I think FreeBSD currently implements leap seconds in
POSIX time, so its CLOCK_REALTIME doesn't increment, but leaps back
by 1 second, then increments across the second again.  This is very
bad or large networks, so some systems use a hack to slew the adjustment
over a very long period.

Also, CLOCK_REALTIME_PRECISE is an unportable alias for CLOCK_REALTIME,
and CLOCK_REALTIME_FAST is a bad optimization which doesn't increment
like a wall clock should.

> +.It Dv CLOCK_MONOTONIC
> +.It Dv CLOCK_MONOTONIC_PRECISE
> +.It Dv CLOCK_MONOTONIC_FAST
> +Increments in SI seconds.

These don't increment in SI seconds, since they are the same as
CLOCK_UPTIME_* which are documented to not increment in some cases.
CLOCK_MONOTONIC_PRECISE and CLOCK_MONOTONIC_FAST are unportable, so they
can do anything, but POSIX requires CLOCK_MONOTONIC to increment like a
clock should, except even more fuzzily than that.  POSIX requires that
CLOCK_MONOTONIC returns a value "representing" the amount of time (in
seconds or nanoseconds) since an unspecified time in the past which does
not change after system startup.  I don't know exactly what an SI second
is, or how an "amount of time" differs from a time or how bad the
"representation" can be (it is not required to be accurate to any
amount in SI seconds, whatever they are), but clearly the value "should"
be as close as the system can get to physical seconds.  In another section,
POSIX specifies that CLOCK_MONOTONIC shall not jump backwards and that the
implementation shall have an implementation-defined limit on forward jumps.
"Implementation-defined" requires documentation, but there is nothing about
this near here.  Of course, since the "representation" doesn't seem to
have any limits or require documentation, it can conceal arbitrary large
jumps.

> +.It Dv CLOCK_UPTIME
> +.It Dv CLOCK_UPTIME_PRECISE
> +.It Dv CLOCK_UPTIME_FAST
> +Starts at zero when the kernel boots and increments
> +monotonically in SI seconds while the machine is running.

This is almost correct, but means that CLOCK_UPTIME doesn't give
the uptime.  It is not stated when the machine is running, but in
practice it isn't running for the purposes of this statement when
it is suspended.  On resume, CLOCK_REALTIME is updated (not like
a wall clock should be, but with larger jumps and inaccuracies
(usually acpi does it and it is off by a few seconds and ntpd
struggles to fix this, but sometimes acpi doesn't do this so it
is off by a few hours and ntpd gives up and it takes the user
restarting ntpd or a more manual fixup to fix this).  CLOCK_REALTIME
is is not updated on resume, except accidentally by the next bug.

CLOCK_UPTIME is also broken by stepping CLOCK_REALTIME.  This is
misimplemented by stepping boottime.  If the step is to adjust for
an initially wrong (boot) time, then this adjustment is correct --
the boot time was wrong, so it should change; the real time was
wrong so it should change; the step is not to fix drift so
CLOCK_MONOTONIC shouldn't change, and CLOCK_UPTIME is CLOCK_MONOTONIC
so it shouldn't change.  Otherwise, the step is to fix up drift and
all clock should change but the boot time shouldn't change.
CLOCK_UPTIME* should always CLOCK_MONOTONIC* (and shouldn't exist)
once the latter is implemented correctly.  The latter is required
to be the time since some fixed time, and this time can be the boot
time once the latter is fixed (invariant, with its current variation
fixed).  Adjustments on resume are an example of such a step.  They
are large jumps forwards and represent reality, so there is no problem
in making them (except thundering heres of timeouts, but making
CLOCK_MONOTONIC not advance is not the way to fix timeouts).  In
general, adjustments to fix drift might be backwards, but they can't
be for CLOCK_MONOTONIC.  They should be handled by slewing the clock
backwards over a long period.  This already happens for ntpd micro-
adjustments.

> +.It Dv CLOCK_VIRTUAL
> +Increments only when
> +the CPU is running in user mode on behalf of the calling process.
> +.It Dv CLOCK_PROF
> +Increments when the CPU is running in user or kernel mode.

It's more obvious that the clock stops incrementing for the other timeouts.

> +.It Dv CLOCK_SECOND
> +Returns the current second without performing a full time counter
> +query, using an in-kernel cached value of the current second.

This doeesn't say that the current second is a for a "FAST"
CLOCK_REALTIME_FAST, or as usual document any incoherencies.

CLOCK_SECOND is another unportability that shouldn't exist.  It is
useless for its intended purpose in most cases when it is implemented
in libc, and then it doesn't do what the man page says that it does --
libc always does a full timecounter query (so the other "FAST" clocks
aren't efficient either, but just throw away precision).

The "FAST" clocks are incoherent with the "PRECISE" clocks since they
use different rounding (round down in more cases).  Except when FAST
libc versions are mixed with FAST syscall versions.  Then user FAST
is coherent with user and kernel PRECISE, but not with kernel FAST.
The bug is most obvious for CLOCK_SECOND since it has largest granularity:

 	now = user_CLOCK_SECOND;		// value n
 	later = kernel_CLOCK_SECOND;		// value n - 1
 	assert(now <= later);			// fail

Ignore the complication that CLOCK_SECOND is a real time so it can go
backwards, and assume that it is monotonic.  The user should be aware
that FAST clocks are only coherent with each other and that CLOCK_SECOND
is FAST and not do assertions similar to the above but with 'now' obtained
from a non-FAST clock (e.g., from CLOCK_REALTIME rounded down explicitly).
But it is difficult for users to know the coherency classes of clocks
when these are not only undocumented but depend on the library.  E.g.,
versions of time(1) used CLOCK_REALTIME, so time() was coherent with
CLOCK_REALTIME but not with CLOCK_SECOND.  time() was changed to use
CLOCK_SECOND, so it became incoherent with CLOCK_REALTIME but coherent
with FAST clocks.  Then CLOCK_SECOND was changed to be in libc on most
arches/hardware, so time() usually because coherent with CLOCK_REALTIME
again, but incoherent with kernel CLOCK_SECOND and other FAST clocks.

The incoherency is most obvious for file times, since file times normally
use CLOCK_SECOND or a FAST clock.  These are incoherent with user times
if user times are read on a clock in a different coherency class (e.g.,
old time(), and current time() if user CLOCK_SECOND is not in the FAST
class).

The general description of the FAST and PRECISE versions still give
incomplete old implementation details but no new implementation details:
- FAST is documented to not do a full timecounter query and thus have
   accuracy 1 timer tick
   - actually, some implementations do do a full timecounter query
   - FASTness might be dominated by syscall overhead and not by timecounter
     query time.  This is why the FAST versions shouldn't exist.  When they
     are syscalls, they are still slow, and when they aren't syscalls they
     do they do do a full timecounter query.
   - it doesn't follow that accuracy is 1 timer tick.  The authors of this
     man page and API should know better than to conflate accuracy with
     precision, and didn't make that mistake for the PRECISE versions.
     FAST really means the opposite of PRECISE, and sometimes significantly
     more efficient.  The accuracy is the same as that of the PRECISE version,
     less the granularity of the precision.
   - the precision isn't 1 timer tick for kernel versions.  It is of course
     1 second for CLOCK_SECOND, and for the others it is tc_tick ticks.
     About max(1 tick, 1 msec).
   - the precision isn't 1 timer tick for user versions.  It is of course
     1 second for CLOCK_SECOND, and for the others it is the same as for
     the PRECISE version (should be the timecounter period, but this is
     impossible for TSC clocks at a few GHz so we advertise 1 nsec and
     depend on the accuracy being much worse than 1 nsec so the wrong
     precision doesn't matter).
- PRECISE is documented to get the most "exact" value possible at the expense
   of execution time.  I guess "exact" can be read as "accurate && precise",
   but since precision is invariant and accuracy is not under control of
   clock_gettime(), what the PRECISE versions actually give is correctness
   possibly at the expense of execution time.
- it isn't documented that the PRECISE are no different from the standard
   versions
- there is a problem with clock_getres()'s name, design and implemenation.
   I forget the difference between resolution and precision.  Anyway,
   clock_getres() clearly has the same resolution as timespecs (1 nsec),
   and was standardized just in time for that to be not fine enough.  It
   needs to return the precision (or is it the resolution?) of the
   underlying clock.  We conflate resolution with precision and mostly
   return the precision.  This is only clearly wrong for the FAST clocks.
   We return the same resolution for them as for the PRECISE clocks.  This
   is probably technically correct as a resolution, but it would be more
   useful to return the precision.  E.g, for a TSC at 4GHz, the TSC's
   resolution is clearly 1/4 nsec, but clock_getres() can't return that
   so we return 1 nsec since normal rounding down would give a physically
   impossible resolution.  But even the PRECISE version can't read the
   clock every nsec, so it has a practical precision/resolution of much
   larger than 1 nsec.  Perhaps it is 10 nsec, but we don't know what
   it is and just return the counter's resolution (rounded) to avoid
   surprising callers with values not a multiple of the resolution.
   We do the same for the FAST versions, although now the practical
   precision/resolution is 1 msec or larger.  It takes unportable
   code to use the FAST versions, and this code must know that
   clock_getres() is useless for determining the practical precision,
   and calculate it as 1/hz times the undocumented tc_tick, or just
   not care if the result is correct provided it is FAST.

> +.Xr clock_getcpuclockid 3 ,
> .Xr ctime 3 ,
> +.Xr pthread_getcpuclockid 3 ,
> .Xr timed 8
> .Sh STANDARDS
> The

Not very standard.  There seem to be 14 extensions and only 8 listed in
the extensions list (mainly ones that shouldn't exist).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170322120102.P963>