Date:      Thu, 14 Mar 2019 14:05:57 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Bruce Evans <brde@optusnet.com.au>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID:  <E6A33A82-F98C-4BFA-97B5-16F930586E6C@yahoo.com>
In-Reply-To: <20190314193946.GJ2492@kib.kiev.ua>
References:  <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> <5EED3352-2E8C-4BEE-B281-4AC8DE9570C2@yahoo.com> <20190314193946.GJ2492@kib.kiev.ua>



On 2019-Mar-14, at 12:39, Konstantin Belousov <kostikbel at gmail.com> wrote:

> On Thu, Mar 07, 2019 at 05:29:51PM -0800, Mark Millard wrote:
>> A basic question and a small note.
>>
>> Context for the question, regarding tc->tc_get_timecount(tc) values:
>>
>> In the powerpc64 context tc->tc_get_timecount(tc) is the lower
>> 32 bits of the tbr, in my context having a 33,333,333 Hz or so
>> increment rate for a machine with a 2.5 GHz or so clock rate.
>> The truncated 32 bit tbr value wraps every 128 seconds or so.
>> 2 sockets, 2 cores per socket, so 4 separate tbr values.
>>
>> The question is . . .
>>
>> In tc_delta's:
>>
>>    tc->tc_get_timecount(tc) - th->th_offset_count
>>
>> is observing tc->tc_get_timecount(tc) < th->th_offset_count
>> ever supposed to be possible in correct operation, other than
>> tc->tc_get_timecount(tc) having wrapped around (and so being
>> newly 0 or "near" 0, with no evidence of it having been
>> near 128 seconds or more for my context)?
> I think yes, there is no reason for current get_timecount() value
> to have any arithmetic relation to th_offset_count.  Look at tc_windup()
> on how the th_offset_count is calculated.  The final value is clamped
> by the tc_counter_mask, so only lower bits are important (higher bits
> are evacuated to th_offset or lost due to overflow if tc_windup()
> was not called soon enough).
>

Okay. Thanks.
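
To make the masked arithmetic concrete, here is a minimal standalone
sketch of the tc_delta()-style subtraction. The struct and the function
names (model_th, model_tc_delta, model_wrap_seconds) are hypothetical
stand-ins for illustration, not the kernel's definitions, and the
numbers are just the ones from my context:

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant timehands/timecounter fields. */
struct model_th {
	uint32_t offset_count;	/* models th->th_offset_count */
	uint32_t counter_mask;	/* models tc->tc_counter_mask */
};

/* Masked unsigned difference, the same shape as the tc_delta() expression. */
static uint32_t
model_tc_delta(const struct model_th *th, uint32_t now)
{
	return ((now - th->offset_count) & th->counter_mask);
}

/* Seconds until a counter with the given mask wraps at freq_hz ticks/sec. */
static double
model_wrap_seconds(uint32_t counter_mask, uint32_t freq_hz)
{
	return (((double)counter_mask + 1.0) / (double)freq_hz);
}
```

With a full 32-bit mask, a counter read that is 0x11 ticks behind
offset_count does not give a small negative delta: it gives 0xffffffef,
which looks like almost a full wrap period (about 128.8 seconds at
33,333,333 Hz). A genuine wrap, by contrast, gives a small delta.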

Just FYI:

I asked because in my powerpc64 context I was seeing
(in sleepq_timeout) td->td_sleeptimo > sbinuptime() in:

        if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
                /*
                 * The thread does not want a timeout (yet).
                 */

and without such sleeps being rescheduled in that case, those sleeps
hang up. My hack to temporarily enable useful operation was to
have binuptime avoid tc->tc_get_timecount(tc) < th->th_offset_count
for small enough differences, as shown below:

. . .
        do {
                do { // HACK!!!
                    th = timehands;
                    tc = th->th_counter;
                    gen = atomic_load_acq_int(&th->th_generation);
                    tim_cnt = tc->tc_get_timecount(tc);
                    tim_offset = th->th_offset_count;
                    tim_wrong_order_diff = tim_offset - tim_cnt;
                } while (tim_cnt < tim_offset &&
                    tim_wrong_order_diff < wrong_order_diff_proper_upper_bound); // HACK!!!
                *bt = th->th_offset;
. . .

where I experimentally came up with the following for the specific
PowerMac G5 context:

        u_int const wrong_order_diff_proper_upper_bound = 0x14u; // 0x11 is max observed diff so far HACK!!!

I've not had any hung-up sleeps after that change. Despite being a hack,
this gives evidence that tc->tc_get_timecount(tc) < th->th_offset_count
for small enough differences (in binuptime) is involved in the hangups
in some essential way for the PowerMac G5 context.
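
The retry condition in the hack can be pulled out as a small pure
function, which makes it easier to see which reads get retried and
which are treated as genuine wraparound. This is only a sketch: the
names should_retry_read and WRONG_ORDER_BOUND are hypothetical, and
the 0x14 bound is the experimental PowerMac G5 value, not anything
general:

```c
#include <stdbool.h>
#include <stdint.h>

/* Experimental bound from the hack; 0x11 was the largest diff observed. */
#define WRONG_ORDER_BOUND 0x14u

/*
 * Return true when the counter read is behind the offset by a small
 * amount, the case the hack re-reads instead of accepting.
 */
static bool
should_retry_read(uint32_t tim_cnt, uint32_t tim_offset)
{
	uint32_t wrong_order_diff = tim_offset - tim_cnt;

	return (tim_cnt < tim_offset && wrong_order_diff < WRONG_ORDER_BOUND);
}
```

A read 0x11 behind the offset is retried, while a counter that has
really wrapped (offset near 0xffffffff, counter near 0) has a huge
difference and is accepted as is.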

I look forward to removing this hack at some point, when things just
work for this 2 socket, 2 cores per socket powerpc64 context. But
for now the hack is locally useful.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



