Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 15 Mar 2019 06:57:16 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Mark Millard <marklmi@yahoo.com>, Bruce Evans <brde@optusnet.com.au>,  freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,  FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID:  <20190315064830.F7981@besplex.bde.org>
In-Reply-To: <20190314193946.GJ2492@kib.kiev.ua>
References:  <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> <5EED3352-2E8C-4BEE-B281-4AC8DE9570C2@yahoo.com> <20190314193946.GJ2492@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 14 Mar 2019, Konstantin Belousov wrote:

> On Thu, Mar 07, 2019 at 05:29:51PM -0800, Mark Millard wrote:
>> A basic question and a small note.
>>
>> Question's context for it tc->tc_get_timecount(tc) values:
>>
>> In the powerpc64 context tc->tc_get_timecount(tc) is the lower
>> 32 bits of the tbr, in my context having a 33,333,333 MHz or so
>> increment rate for a machine with a 2.5 GHz or so clock rate.
>> The truncated 32 bit tbr value wraps every 128 seconds or so.
>> 2 sockets, 2 cores per socket, so 4 separate tbr values.
>>
>> The question is . . .
>>
>> In tc_delta's:
>>
>>     tc->tc_get_timecount(tc) - th->th_offset_count
>>
>> is observing tc->tc_get_timecount(tc) < th->th_offset_count
>> ever supposed to be possible in correct operation, other than
>> tc->tc_get_timecount(tc) having wrapped around (and so being
>> newly 0 or "near" 0, no evidence of of having it having been
>> near 128 seconds or more for my context)?
> I think yes, there is no reason for current get_timecount() value
> to have any arithmetic relation to th_offset_count.  Look at tc_windup()
> on how the th_offset_count is calculated.  The final value is clamped
> by the tc_counter_mask, so only lower bits are important (higher bits
> are evacuated to th_offset or lost due to overflow if tc_windup()
> was not called soon enough).

Yes, it is a standard method to calculate time differences from a possibly-
wrapped counter as (finish - start) & mask in unsigned arithmetic, where
the counter must be checked before it wraps relative to 'start'.

>> The note:
>>
>> On 2019-Mar-7, at 14:22, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>
>>> . . .
>>> +
>>> +	if (__predict_false(delta < large_delta)) {
>>
>> I thought that delta<large_delta was the non-overflow context
>> for scale*delta and that the overflow case for the multiplication
>> was when delta>=large_delta .
> You are right, I fixed this in my repo.
>>
>>> +		/* Avoid overflow for scale * delta. */
>>> +		x = (scale >> 32) * delta;
>>> +		bt->sec += x >> 32;
>>> +		bintime_addx(bt, x << 32);
>>> +		bintime_addx(bt, (scale & 0xffffffff) * delta);
>>> +	} else {
>>> +		bintime_addx(bt, scale * delta);
>>> +	}
>>> . . .

Fixed in my version too.

I might have helped break this.  I reversed the condition to get the
unusual path executed (though not when it overflow), and forgot to
undo this.  At least the unusual path got checked more).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190315064830.F7981>