Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 28 Oct 2015 18:47:55 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>, Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-hackers <freebsd-hackers@FreeBSD.org>, Poul-Henning Kamp <phk@phk.freebsd.dk>, Jung-uk Kim <jkim@FreeBSD.org>
Subject:   Re: instability of timekeeping
Message-ID:  <5630FC3B.2070908@FreeBSD.org>
In-Reply-To: <20151027140403.GB2257@kib.kiev.ua>
References:  <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org> <56274FFC.2000608@FreeBSD.org> <20151021184850.GX2257@kib.kiev.ua> <562F3E2F.2010100@FreeBSD.org> <20151027115810.GU2257@kib.kiev.ua> <562F8109.4050203@FreeBSD.org> <20151027140403.GB2257@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On 27/10/2015 16:04, Konstantin Belousov wrote:
> On Tue, Oct 27, 2015 at 03:50:01PM +0200, Andriy Gapon wrote:
>> On 27/10/2015 13:58, Konstantin Belousov wrote:
>>> On Tue, Oct 27, 2015 at 11:04:47AM +0200, Andriy Gapon wrote:
>>>> And now another observation. I have C1E option enabled in BIOS. It
>>>> means that if all cores enter C1 state, then the whole processor
>>>> is magically placed into a deep C-state (C3, I think). LAPIC timer
>>>> on this CPU model does not run in the deep C-state. So, I had to
>>>> disable C1E option to test the LAPIC timer in a useful way. But
>>>> before actually testing it I first tried to reproduce the problem. As
>>>> you might have already guessed the problem is gone with that option
>>>> disabled. Scratching my head to understand the implications of this
>>>> observation.
>>>
>>> Most obvious explanation would be that the latency of wakeup is very large.
>>> What is the HPET frequency when the jitter occur ?
>>>
>>
>> kern.timecounter.tc.TSC-low.frequency: 1607351869
>> kern.eventtimer.et.HPET.frequency: 14318180
>>
>> Or did you mean the actual rate of timer interrupts?
> 
> Actual rate, and I used the wrong word.  Since most likely your eventtimer
> is not periodic, I mean the next timer interrupt deadline.
> 

I wasn't sure how to answer your question, so I went ahead and obtained a little
bit of debugging information using KTR.
For a start, here is a patch that adds couple more trace points:
https://people.freebsd.org/~avg/timekeeping-ktr.patch
Please note that the patch has a small bug in hardclock_cnt():
+			CTR1(KTR_SPARE2, "hardclock_cnt at %d: newticks=%d",
+			    newticks);
Two format specifiers but only one argument.  I mention this, so that the trace
is not misinterpreted.

Then, here is a snippet of the trace that seems to be of interest:
https://people.freebsd.org/~avg/timekeeping.ktrdump.txt
I must admit that I have a hard time interpreting what that trace says.
As I understand the KTR timestamps are raw TSC values. So, there is a huge gap
between entries #999543 and #999544.  It's larger than 2 * 2^32, so even after
applying tsc_shift it's still larger that 2^32.

I am not sure how the HPET timer could get programmed for such a long delay.
Or maybe there is some lower level problem that caused the interrupt to be
delayed by (16450821032024 - 16442184278422) / 2 / 1607351869 ≈ 2.687 seconds.
That's quite a long time.

One thing that I can tell is that my earlier hypothesis seems to be wrong as all
raw TSC values are increasing.

Grepping for lines where the timer gets reprogrammed I was surprised to see the
following:
999543   1   16442184278422 load at 1:    next 5042.d25692b3 eq 0
999507   3   16442183843960 load at 3:    next 5042.d23e2146 eq 0
999503   2   16442183825488 load at 2:    next 5042.d25692b3 eq 0
999472   0   16442183792870 load at 0:    next 5042.d23e2146 eq 0
This looks like the next event time has flipped between those two values somehow.

Also, the following seemed a little bit strange:
999508   3   16442183860624 ipi  at 3:    now  5042.d2301033
999467   3   16442183787851 active at 3:  now  5042.d2306108
The later even reports a value of 'now' that is smaller.
I think that that can be explained by the fact that in one case the value is
obtained via sbinuptime() while in the other case it's taken from state->now,
which is set when an actual timer interrupt is received (before sending a
hardclock IPI).

In either case I am going to add a few more trace points in et_start and the
HPET timer code and see if I can catch anything interesting there.

-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5630FC3B.2070908>