Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 19 Apr 2019 22:28:26 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Justin Hibbits <chmeeedalf@gmail.com>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, Nathan Whitehorn <nwhitehorn@freebsd.org>
Subject:   Re: Looks like ap_letgo use needs platform specific code to allow avoiding the "sleep-gets-stuck" problem on PowerMac11,2's . . .
Message-ID:  <1C697AD4-6CAC-4C33-8D77-72D9B53A7648@yahoo.com>
In-Reply-To: <2E386EE0-782D-47CB-978B-B5A010AFCF88@yahoo.com>
References:  <B5B1CC39-1D75-42F4-9661-62DA9D029D34@yahoo.com> <CAHSQbTBJtUc70XZgEk449-dLEhOydsypxD2NpVNsrxy0NcxnNA@mail.gmail.com> <2E386EE0-782D-47CB-978B-B5A010AFCF88@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[The or 31,31,31 and or 6,6,6 test behaved as you said.]

On 2019-Apr-19, at 21:42, Mark Millard <marklmi at yahoo.com> wrote:

> On 2019-Apr-19, at 20:48, Justin Hibbits <chmeeedalf at gmail.com> =
wrote:
>=20
>> On Fri, Apr 19, 2019 at 10:36 PM Mark Millard <marklmi@yahoo.com> =
wrote:
>>>=20
>>> [Note: My context is tied to getting usefdt mode operable on
>>> old PowerMacs. The below is only tested for usefdt mode
>>> so far.]
>>>=20
>>> The following investigatory patch has so-far stopped my having
>>> sleep-gets-stuck problems (only seen on 2-socket/1-core-each
>>> 970 MP G5 Powermac11,2's as far as I know):
>>>=20
>>> # svnlite diff  /usr/src/sys/powerpc/powerpc/mp_machdep.c | more     =
                                                                  Index: =
/usr/src/sys/powerpc/powerpc/mp_machdep.c
>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>> --- /usr/src/sys/powerpc/powerpc/mp_machdep.c   (revision 345758)
>>> +++ /usr/src/sys/powerpc/powerpc/mp_machdep.c   (working copy)
>>> @@ -77,9 +77,10 @@
>>>       PCPU_SET(awake, 1);
>>>       __asm __volatile("msync; isync");
>>>=20
>>> +       powerpc_sync();
>>>       while (ap_letgo =3D=3D 0)
>>> -               __asm __volatile("or 31,31,31");
>>> -       __asm __volatile("or 6,6,6");
>>> +               powerpc_sync();
>>> +       isync();
>>>=20
>>>       /*
>>>        * Set timebase as soon as possible to meet an implicit =
rendezvous
>>> @@ -262,8 +263,11 @@
>>>       __asm __volatile("msync; isync");
>>>=20
>>>       /* Let APs continue */
>>> -       atomic_store_rel_int(&ap_letgo, 1);
>>> +       ap_letgo=3D 1;    // depend on prior sync, no need to lwsync =
first
>>>=20
>>> +       powerpc_sync(); // analogous to what the ap's do (more =
similar time frame?)
>>> +       if (ap_letgo) isync();
>>> +
>>>       platform_smp_timebase_sync(ap_timebase, 0);
>>>=20
>>>       while (ap_awake < smp_cpus)
>>>=20
>>> Apparently, the use of "or 31,31,31" causes sizable
>>> variations in the time frame when the platform_smp_timebase_sync
>>> happens on the various cores across the two 970MPs.
>>>=20
>>> It looks something like a platform_ap_letgo_wait is appropriate,
>>> with a powermac_ap_letgo_wait specific one, say. (Or, possibly,
>>> AIM specific but spanning powermac?)
>>>=20
>>> The above patch has booted and operated the 2-socket PowerMac7,2
>>> context fine as well so far. I'll check the G5's and a G4
>>> dual-socket with a 32-bit powerpc build.
>>>=20
>>> I've no clue if there are any time-mismatch issues across
>>> sockets/cores/hw-threads for the "8-way SMT" contexts with
>>> "dozens to hundreds of CPUs".
>>>=20
>>> I've only been testing for part of today and I do not have
>>> access to any non-PowerMac PowerPC contexts. So this is
>>> preliminary but I do not expect "or 31,31,31" is going to
>>> be appropriate to the PowerMac11,2 contexts that caused
>>> my investigation of the issue.
>>=20
>> Those nops are just that on the G5: nops.  They do absolutely nothing
>> on any processor that's not multithreaded.  On multithreaded CPUs =
they
>> are priority hints.
>=20
> I thought PowerISA 2.03 was before multi-threaded (but spanning
> multi-core). It lists 31,31,31 in a table with other values. Is
> there a better match to the 970MP vintage of things for me to
> reference?
>=20
> I'll test:
>=20
>        powerpc_sync();
>        while (ap_letgo =3D=3D 0)
>        {
>                __asm __volatile("or 31,31,31");
>                powerpc_sync();
>        }
>        __asm __volatile("or 6,6,6");
>        isync();
>=20
> (or some other such if you want). Let me know if you
> have some specific variation you prefer.

Some quick testing seems to be doing what you indicated.
I've updated https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D233863
to have the updated code.

>> What most likely 'fixed' the problem for you was the addition of the
>> synchronization primitives in the code path.  You can add them =
without
>> sacrificing the nops.  Does this patch fix the boot issue on the G5
>> quad without the usefdt=3D1 setting, and without reverting the KVA
>> change?
>=20
> We already had an exchange about my forcing an slb entry as
> needed for pcpup->pc_curpcb to be used. It massively changed
> the frequency of hangups (rare now).
>=20
> As a reminder, I had added
>=20
> hack_into_slb_if_needed(pcpup->pc_curpcb);
>=20
> in cpudep_ap_bootstrap. This was because other
> activity from:
>=20
>        SI_SUB_KTHREAD_INIT     =3D 0xe000000,    /* init process*/
>        SI_SUB_KTHREAD_PAGE     =3D 0xe400000,    /* pageout daemon*/
>        SI_SUB_KTHREAD_VM       =3D 0xe800000,    /* vm daemon*/
>        SI_SUB_KTHREAD_BUF      =3D 0xea00000,    /* buffer daemon*/
>        SI_SUB_KTHREAD_UPDATE   =3D 0xec00000,    /* update daemon*/
>        SI_SUB_KTHREAD_IDLE     =3D 0xee00000,    /* idle procs*/
> #ifndef EARLY_AP_STARTUP
>        SI_SUB_SMP              =3D 0xf000000,    /* start the APs*/
> #endif=20
>=20
> was competing for slb entries and doing slb entry replacements in
> parallel with the ap startup activity so sometimes no slot covered
> the pcpup->pc_curpcb relted address range. (Replacement slots are
> picked based on mftb()%n_slbs .)
>=20
> Are you asking me to disable that call and see what happens?

I've not done anything about disabling the replacement of an
slb entry for spanning what pcpup->pc_curpcb-> refers to
when there is no such spanning entry already. (The code makes
no replacement if an entry does span the address range.
So, effectively, then, the code is a no-op for such conditions.)

> With the hack_into_slb_if_needed call and the other patches
> reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D233863
> I'm booting and using historical and usefdt modes just fine
> as far as I've tested. (usefdt mode needing vt.) The patches do
> not involve reverting the KVA change. One of the test machines is=20
> a "G5 quad" and its is the  primary one I build on and test.


=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1C697AD4-6CAC-4C33-8D77-72D9B53A7648>