Date:      Sat, 6 Apr 2019 01:01:19 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Michael Tuexen <tuexen@fh-muenster.de>,  freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,  FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID:  <20190406003907.C3872@besplex.bde.org>
In-Reply-To: <20190405132128.GD1923@kib.kiev.ua>
References:  <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> <20190309144844.K1166@besplex.bde.org> <20190324110138.GR1923@kib.kiev.ua> <E0785613-2B6E-4BB3-95CD-03DD96902CD8@fh-muenster.de> <20190403070045.GW1923@kib.kiev.ua> <20190404011802.E2390@besplex.bde.org> <20190405113912.GB1923@kib.kiev.ua> <20190405230717.D3383@besplex.bde.org> <20190405132128.GD1923@kib.kiev.ua>

On Fri, 5 Apr 2019, Konstantin Belousov wrote:

> On Fri, Apr 05, 2019 at 11:52:27PM +1100, Bruce Evans wrote:
>> On Fri, 5 Apr 2019, Konstantin Belousov wrote:
>>
>>> On Thu, Apr 04, 2019 at 02:47:34AM +1100, Bruce Evans wrote:
>>>> I noticed (or better realized) a general problem with multiple
>>>> timehands.  ntpd can slew the clock at up to 500 ppm, and at least an
>>>> old version of it uses a rate of 50 ppm to fix up fairly small drifts
>>>> in the milliseconds range.  500 ppm is enormous in CPU cycles -- it is
>>>> 500 thousand nsec or 2 million cycles at 4GHz.  Winding up the timecounter
>>>> every 1 msec reduces this to only 2000 cycles.
>>>> ...
>>>> The main point of having multiple timehands (after introducing the per-
>>>> timehands generation count) is to avoid blocking thread N during the
>>>> update, but this doesn't actually work, even for only 2 timehands and
>>>> a global generation count.
>>>
>>> You are describing the generic race between reader and writer. The same
>>> race would exist even with one timehand (and/or one global generation
>>> counter), where ntp adjustment might come earlier or later than some
>>> consumer accessing the timehands. If a timehand instance was read before
>>> tc_windup() ran but code consumed the result after the windup, it might
>>> appear as if time went backward, and this cannot be fixed without either
>>> re-reading the time after time-dependent calculations were done and
>>> restarting, or some global lock ensuring serialization.
>>
>> With 1 timehand, its generation count would be global.  I think its ordering
>> is strong enough to ensure serialization.
> Yes, single timehands result in global generation.  But it would not solve
> the same race appearing in slightly different manner, as I described above.
> If reader finished while generation number in th was not yet reset, but
> caller uses the result after tc_windup(), the effect is same as if we
> have two th's and reader used the outdated one.

You described it too concisely for me to understand :-).

I now see that a single generation count doesn't give serialization.  I
thought that setting the generation to 0 at the start of tc_windup()
serialized the reader and writer.  But all it does is prevent use of the
results of the windup while only some of them are visible.  If setting
the generation count to 0 doesn't become visible before tc_windup() reads
the hardware timecounter, then this read may be before other reads using
the old timehands, but it needs to be after.
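
To make the limitation concrete, here is a minimal user-space sketch of
a generation-count protocol of the kind discussed above, written with
C11 atomics.  It is not the kernel code: struct th_sketch,
sketch_windup() and sketch_try_read() are hypothetical names invented
for illustration, and the two payload fields only stand in for the real
timehands contents.  The sketch shows that the count gives a reader a
consistent snapshot, and nothing more.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one timehands slot. */
struct th_sketch {
	atomic_uint gen;		/* 0 means "update in progress" */
	_Atomic uint64_t offset;	/* payload guarded by gen */
	_Atomic uint64_t scale;		/* payload guarded by gen */
};

/* Writer side: invalidate, update the payload, publish a new count. */
void
sketch_windup(struct th_sketch *th, uint64_t offset, uint64_t scale)
{
	unsigned ogen = atomic_load_explicit(&th->gen, memory_order_relaxed);

	/* Readers that observe 0 retry instead of using a partial update. */
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);

	/* Skip 0 on wraparound, since 0 is reserved for "in progress". */
	if (++ogen == 0)
		ogen = 1;
	atomic_store_explicit(&th->gen, ogen, memory_order_release);
}

/*
 * Reader side: one attempt.  Returns true only if the copied fields
 * form a consistent snapshot.  Nothing here orders the caller's later
 * use of the snapshot against a subsequent sketch_windup().
 */
bool
sketch_try_read(struct th_sketch *th, uint64_t *offset, uint64_t *scale)
{
	unsigned gen = atomic_load_explicit(&th->gen, memory_order_acquire);

	*offset = atomic_load_explicit(&th->offset, memory_order_relaxed);
	*scale = atomic_load_explicit(&th->scale, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);

	return (gen != 0 &&
	    gen == atomic_load_explicit(&th->gen, memory_order_relaxed));
}
```

A caller would loop with "while (!sketch_try_read(th, &off, &sc))".
A successful attempt says only that off and sc belong together.  It
says nothing about whether a windup has happened by the time they are
used, or about how the writer's own hardware timecounter read was
ordered against this reader.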

A not so good fix for this is to wait a bit after setting the generation
count to 0, so that the change becomes visible on all CPUs.

Bruce


