Date:      Mon, 4 Jun 2012 11:01:57 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-arch@freebsd.org
Cc:        Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>, Konstantin Belousov <kib@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>
Subject:   Re: Fwd: [RFC] Kernel shared variables
Message-ID:  <201206041101.57486.jhb@freebsd.org>
In-Reply-To: <20120603184315.T856@besplex.bde.org>
References:  <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2%2BoYo%2BwwT4ipA@mail.gmail.com> <20120603051904.GG2358@deviant.kiev.zoral.com.ua> <20120603184315.T856@besplex.bde.org>

On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> 
> > On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
> >> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
> >>> ...
> >>> In fact, I think that if the whole goal is only fast clocks, then we
> >>> do not need any additional system mechanisms, since we can easily export
> >>> coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
> >>> which is ugly but bearable.
> >>
> >> How do you get the timehands offsets?  These only need to be updated
> >> every second or so, or when used, but how can the application know
> >> when they need to be updated if this is not done automatically in the
> >> kernel by writing to a shared page?  I can only think of the
> >> application arranging an alarm signal every second or so and updating
> >> then.  No good for libraries.
> > What is timehands offsets ? Do you mean things like leap seconds ?
> 
> Yes.  binuptime() is:
> 
> % void
> % binuptime(struct bintime *bt)
> % {
> % 	struct timehands *th;
> % 	u_int gen;
> % 
> % 	do {
> % 		th = timehands;
> % 		gen = th->th_generation;
> % 		*bt = th->th_offset;
> % 		bintime_addx(bt, th->th_scale * tc_delta(th));
> % 	} while (gen == 0 || gen != th->th_generation);
> % }
> 
> Without the kernel providing th->th_offset, you have to do lots of ntp
> handling for yourself (compatibly with the kernel) just to get an
> accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, but
> they do affect CLOCK_REALTIME which is the clock id used by
> gettimeofday().  For the former, you only have to advance the offset
> for yourself occasionally (compatibly with the kernel) and manage
> (compatibly with the kernel, especially in the long term) ntp slewing
> and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.

I think duplicating this logic in userland would just be wasteful.  I have
a private fast gettimeofday() at my current job and it works by exporting
the current timehands structure (well, the equivalent) to userland.  The
userland bits then fetch a copy of the details and do the same as bintime().
(I move the math (bintime_addx() and the multiply) out of the loop, however.)
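
A rough sketch of what such a userland reader could look like, to make the
"math out of the loop" point concrete.  This is not the actual patch: the
struct layout, the sk_ names, and the counter callback (a stand-in for
rdtsc) are all assumptions made up for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userland mirror of struct bintime. */
struct sk_bintime {
	int64_t		sec;
	uint64_t	frac;
};

/* Hypothetical layout of the single timehands copy the kernel exports. */
struct sk_timehands {
	_Atomic uint32_t th_generation;	/* 0 while an update is in progress */
	uint64_t	th_scale;
	uint32_t	th_offset_count;
	uint32_t	th_counter_mask;
	struct sk_bintime th_offset;
};

/* Stand-in for the hardware counter so the sketch runs anywhere. */
static uint32_t
sk_fake_counter(void)
{
	return (1500);
}

static void
sk_binuptime(struct sk_timehands *th, uint32_t (*counter)(void),
    struct sk_bintime *bt)
{
	struct sk_bintime off;
	uint64_t scale, add, old;
	uint32_t gen, delta;

	/* Only snapshot the inputs inside the retry loop. */
	do {
		gen = atomic_load_explicit(&th->th_generation,
		    memory_order_acquire);
		off = th->th_offset;
		scale = th->th_scale;
		delta = (counter() - th->th_offset_count) &
		    th->th_counter_mask;
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(&th->th_generation,
	    memory_order_relaxed));

	/* The multiply and the bintime_addx() equivalent run once, here. */
	add = scale * delta;
	old = off.frac;
	off.frac += add;
	if (off.frac < old)
		off.sec++;	/* carry out of the 64-bit fraction */
	*bt = off;
}
```

A caller would map the shared page, point th at it, and pass a real counter
read in place of sk_fake_counter().  The plain (non-atomic) field reads are
a simplification; they rely on the same write-ordering assumptions as the
kernel loop quoted above.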

> > This is indeed problematic for auxv. For auxv it could be solved by
> > providing offset for next recheck using syscalls, and making libc code to
> > respect this offset. But, I do think that vdso in shared page
> > is the right solution, not auxv.
> 
> timehands in a shared pages is close to working.  th_generation protects
> things in the same way as in the kernel, modulo assumptions that writes
> are ordered.

It would work fine.  And in fact, having multiple timehands is actually a
bug, not a feature.  It lets you compute bogus timestamps if you get preempted
at the wrong time and end up with time jumping around.  At Yahoo! we reduced
the number of timehands structures down to 2 or some such, and I'm now of
the opinion we should just have one and dispense with the entire array.

For my userland case I only export a single timehands copy.
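
With a single copy, the update side has to follow the same generation
protocol the readers depend on: mark the copy invalid, rewrite it, then
publish a new non-zero generation.  A minimal sketch under that assumption
(the sk_ names and the field set are hypothetical, not the kernel's code):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mirror of struct bintime. */
struct sk_bintime {
	int64_t		sec;
	uint64_t	frac;
};

/* Hypothetical single exported timehands copy. */
struct sk_timehands {
	_Atomic uint32_t th_generation;	/* 0 while an update is in progress */
	uint64_t	th_scale;
	uint32_t	th_offset_count;
	struct sk_bintime th_offset;
};

/* Assumes one writer at a time; readers retry on a generation change. */
static void
sk_publish(struct sk_timehands *th, uint64_t scale, uint32_t count,
    struct sk_bintime off)
{
	uint32_t ogen;

	ogen = atomic_load_explicit(&th->th_generation, memory_order_relaxed);

	/* Tell readers the contents are about to change. */
	atomic_store_explicit(&th->th_generation, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	th->th_scale = scale;
	th->th_offset_count = count;
	th->th_offset = off;

	/* Never publish 0; it is reserved for "update in progress". */
	if (++ogen == 0)
		ogen = 1;
	atomic_store_explicit(&th->th_generation, ogen, memory_order_release);
}
```

A reader that sees generation 0, or a different generation before and after
its snapshot, simply retries, so it never mixes values from two updates.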

> >> rdtsc is also very unportable, even on CPUs that have it.  But all other
> >> x86 timecounter hardware is too slow if you want gettimeofday() to be fast
> >> and as accurate as it is now.

For all the hardware where people run mysql and similar software that calls
gettimeofday() a lot, rdtsc() works just fine.

> > !rdtsc hardware is probably cannot be used at all due to need to provide
> > usermode access to device registers. The mere presence of rdtsc does not
> > means that usermode indeed can use it, it should be decided by kernel
> > based on the current in-kernel time source. If rdtsc is not usable, the
> > corresponding data should not be exported, or implementation should go
> > directly into syscall or whatever.

Yes, the patches I have only work if the kernel uses the TSC as its main
timecounter as well.

> But then applications would:
> - use gettimeofday() more than they should ("it works on Linux"), even
>    more than now since when "it works on FreeBSD-x86" too
> - just be slow when gettimeofday() is slow
> - kludge around gettimeofday() being slow like they do now
> - kludge around gettimeofday() being slow not like they do now (use more
>    complications to probe it being slow).

Some applications really need fine-grained timing with as little overhead
as possible.

-- 
John Baldwin


