Date:      Fri, 3 May 2019 14:52:02 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Justin Hibbits <chmeeedalf@gmail.com>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: 970MP PowerMac G5s: What printf's show about cpu_mp_unleash hangups on the test variant of head -r347003 (found cpu_mp_unleash counterexample)
Message-ID:  <9B05FB92-D1E5-4DD5-BB04-53088E528F15@yahoo.com>
In-Reply-To: <4D659851-8731-4116-A6B6-33A75E9F0B76@yahoo.com>
References:  <EF44A358-CC6D-4244-A911-6D4DACFF4B21@yahoo.com> <4D659851-8731-4116-A6B6-33A75E9F0B76@yahoo.com>

[I've tried a different moea64_bootstrap_slb_prefault
usage variation. Also, I reverted all the Rx,Rx,Rx
no-op changes. I put in the change to protect the
mttb(...) implementation from interrupts
messing up the assignment to the TBR, just so that
would not be a worry. I still have my isync
additions and my libc string compare code changes
in what I'm working with for head -r347003 .]
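
[For reference, the interrupt exposure being closed in mttb() is the
multi-step TBR write sequence. A sketch of the protected version
follows; the actual change is in the cpufunc.h hunk at the end of
this message:

static __inline void
mttb(u_quad_t time)
{
	/* An interrupt landing between the mtspr's below could run
	 * against a half-written timebase, so the sequence is
	 * bracketed with intr_disable()/intr_restore(). */
	const register_t predisable_msr = intr_disable();
	mtspr(TBR_TBWL, 0);	/* avoid a carry into TBU mid-update */
	mtspr(TBR_TBWU, (uint32_t)(time >> 32));
	mtspr(TBR_TBWL, (uint32_t)time);
	intr_restore(predisable_msr);
}
]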

On 2019-May-3, at 11:09, Mark Millard <marklmi at yahoo.com> wrote:

> [Experiment with disabling the code in each of the
> nop_prio_ routines instead of commenting out specific
> calls. But mostly: more test runs. It does not support
> what I thought yesterday's cpu_mp_unleash results suggested.]
> 
> On 2019-May-2, at 22:23, Mark Millard <marklmi at yahoo.com> wrote:
> 
>> [Note: I still have your requested loop change, my
>> isync additions, and my libc string compare code
>> change in what I'm working with for head -r347003 .]
>> 
>> I started using printf's to help identify which code
>> managed to execute vs. which did not during the
>> hang-ups.
>> 
>> This note is just about cpu_mp_unleash observations and
>> experiments related to what printf's showed.
>> 
>> I did:
>> 
>> static void
>> cpu_mp_unleash(void *dummy)
>> {
>> . . . (omitted as all earlier printf's printed) . . .
>> printf("cpu_mp_unleash: before DELAY\n");
>>       /* Let the APs get into the scheduler */
>>       DELAY(10000);
>> printf("cpu_mp_unleash: after DELAY\n");
>> 
>> }
>> 
>> What I saw was that only the first of the two DELAY printf's
>> shown above printed when cpu_mp_unleash hung up, such a
>> hangup being the normal case when vt_upgrade did not
>> hang up first.
>> 
>> So I looked at /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> and its DELAY routine and came up with only one thing
>> that looked like a useful experiment. Note what I
>> then commented out:
>> 
>> # svnlite diff /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> Index: /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> ===================================================================
>> --- /mnt/usr/src/sys/powerpc/powerpc/clock.c	(revision 347003)
>> +++ /mnt/usr/src/sys/powerpc/powerpc/clock.c	(working copy)
>> @@ -309,10 +309,10 @@
>> 	TSENTER();
>> 	tb = mftb();
>> 	ttb = tb + howmany((uint64_t)n * 1000000, ps_per_tick);
>> -	nop_prio_vlow();
>> +	//nop_prio_vlow();
>> 	while (tb < ttb)
>> 		tb = mftb();
>> -	nop_prio_medium();
>> +	//nop_prio_medium();
>> 	TSEXIT();
>> }
>> 
>> After this change I've not (yet?) seen another cpu_mp_unleash
>> hangup in my test context.
>> 
>> Even if not documented to do so, it appears to me that the
>> or Rx,Rx,Rx code behind nop_prio_vlow() does
>> something specific on the 970MP's in the 2-socket/2-core-each
>> G5 PowerMac11,2's --and what it does interferes with making
>> progress in DELAY, at least in that specific use and/or
>> any others on the APs during cpu_mp_unleash.
>> 
>> Of course, this testing process is probabilistic and I do
>> not have hundreds of examples of any specific condition at
>> this point. But, so far, the change in behavior seems clear:
>> I went from always-hanging-up-so-far to
>> always-booting-so-far (when vt_upgrade did not prevent the
>> test in each context).
> 
> So I uncommented the 2 calls and instead commented out the
> contents of the nop_prio_ routines, summarized here via:
> 
> # egrep -r 'or.*(31,31,31|1,1,1|6,6,6|2,2,2|5,5,5|3,3,3)' /mnt/usr/src/sys/powerpc/ | more
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 31,31,31");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 1,1,1");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 6,6,6");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 2,2,2");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 5,5,5");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 3,3,3");

[I had forgotten to list cpu.h's use of the 27,27,27 form in
cpu_spinwait.]

I've reverted all such "or Rx,Rx,Rx" changes, since they made
no overall difference in observed behavior for any of the 3
hang-up areas.
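
[For context, the nop_prio_ routines in cpufunc.h are just inline
wrappers around those encodings. A sketch of the two that DELAY
uses, assuming the usual PowerISA mapping of the or Rx,Rx,Rx forms
to SMT-priority hints (31,31,31 = very low, 2,2,2 = medium):

/* These execute as no-ops architecturally; the register number
 * encodes a thread-priority hint on processors that honor it. */
static __inline void
nop_prio_vlow(void)
{
	__asm __volatile("or 31,31,31");
}

static __inline void
nop_prio_medium(void)
{
	__asm __volatile("or 2,2,2");
}
]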


The new experiment . . .

I've tried changing the loop so that moea64_bootstrap_slb_prefault
only maps about 1/2 of the kernel slb slots, leaving the others to
be filled before mftb()%n_slbs based "random" replacements would
start:

 	/*
-	 * Map the entire KVA range into the SLB. We must not fault there.
+	 * Map about 1/2 the kernel slb slots, leaving the others
+	 * available without mftb()%n_slbs replacement being involved.
 	 */
 	#ifdef __powerpc64__
-	for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+	i = 0;
+	for (va = virtual_avail; va < virtual_end && i < (n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
 		moea64_bootstrap_slb_prefault(va, 0);
 	#endif

This change still gets hang-ups in the same 3 places but,
so far, it has booted more often. (Not that I have a
large sampling at this point.) Definitely still
probabilistic.
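
[For reference, the mftb()%n_slbs based "random" replacement
mentioned above is, as I read slb.c (a sketch; details may differ),
a timebase-driven victim pick once all kernel SLB slots are full:

	int i;

	/* No vacant slot: pick a pseudo-random victim from the
	 * timebase, skipping the reserved user slot. */
	i = mftb() % n_slbs;
	if (i == USER_SLB_SLOT)
		i = (i + 1) % n_slbs;
	/* ... then evict slbcache[i] and install the new entry ... */

So prefaulting fewer slots leaves vacant slots to be consumed
before that replacement path is exercised.]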


Other status . . .

I've still not figured out a way to discover what happens
at each of the 3 hang-up points in the areas identified.
I'm also running out of ideas for how to gain other
types of evidence.

One hypothesis about why I've not seen these 3 hang-up
places in my normal environment (currently based on
head -r345758) is that I normally run non-debug
kernels in that context. Here it has all been debug
kernels, to better match what artifacts.ci provides,
among other reasons.


For reference, here are the current -r347003 patches that I was
last exploring, including the extra printf's for
the 3 hang-up areas:

# svnlite diff /mnt/usr/src/sys/ | more
Index: /mnt/usr/src/sys/dev/vt/vt_core.c
===================================================================
--- /mnt/usr/src/sys/dev/vt/vt_core.c   (revision 347003)
+++ /mnt/usr/src/sys/dev/vt/vt_core.c   (working copy)
@@ -2720,6 +2720,7 @@
 
                 /* Init 25 Hz timer. */
                 callout_init_mtx(&vd->vd_timer, &vd->vd_lock, 0);
+printf("vt_upgrade after callout_init_mtx / before sequence leading to callout_reset\n");
 
                 /*
                  * Start timer when everything ready.
@@ -2732,6 +2733,7 @@
                 atomic_add_acq_int(&vd->vd_timer_armed, 1);
                 vd->vd_flags |= VDF_ASYNC;
                 callout_reset(&vd->vd_timer, hz / VT_TIMERFREQ, vt_timer, vd);
+printf("vt_upgrade after callout_reset\n");
                 register_handlers = 1;
         }
 
Index: /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c
===================================================================
--- /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c (revision 347003)
+++ /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c (working copy)
@@ -93,9 +93,11 @@
          * Initialize the sysctl context list for the fha module.
          */
         sysctl_ctx_init(&softc->sysctl_ctx);
+printf("fhanew_init: after sysctl_ctx_init / before SYSCTL_ADD_NODE\n");
         softc->sysctl_tree = SYSCTL_ADD_NODE(&softc->sysctl_ctx,
             SYSCTL_STATIC_CHILDREN(_vfs_nfsd), OID_AUTO, "fha", CTLFLAG_RD,
             0, "NFS File Handle Affinity (FHA)");
+printf("fhanew_init: after SYSCTL_ADD_NODE\n");
         if (softc->sysctl_tree == NULL) {
                 printf("%s: unable to allocate sysctl tree\n", __func__);
                 return;
Index: /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c    (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c    (working copy)
@@ -956,10 +956,12 @@
         virtual_end = VM_MAX_SAFE_KERNEL_ADDRESS;
 
         /*
-        * Map the entire KVA range into the SLB. We must not fault there.
+        * Map about 1/2 the kernel slb slots, leaving the others
+        * available without mftb()%n_slbs replacement being involved.
          */
         #ifdef __powerpc64__
-       for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+       i = 0;
+       for (va = virtual_avail; va < virtual_end && i < (n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
                 moea64_bootstrap_slb_prefault(va, 0);
         #endif
 
@@ -1090,8 +1092,8 @@
         CPU_SET(PCPU_GET(cpuid), &pm->pm_active);
 
         #ifdef __powerpc64__
-       PCPU_SET(aim.userslb, pm->pm_slb);
-       __asm __volatile("slbmte %0, %1; isync" ::
+       PCPU_SET(aim.userslb, pm->pm_slb); // no slbie needed?
+       __asm __volatile("isync; slbmte %0, %1; isync" ::
             "r"(td->td_pcb->pcb_cpu.aim.usr_vsid), "r"(USER_SLB_SLBE));
         #else
         PCPU_SET(curpmap, pm->pmap_phys);
@@ -1104,7 +1106,7 @@
 {
         pmap_t  pm;
 
-       __asm __volatile("isync; slbie %0" :: "r"(USER_ADDR));
+       __asm __volatile("isync; slbie %0; isync" :: "r"(USER_ADDR));
 
         pm = &td->td_proc->p_vmspace->vm_pmap;
         CPU_CLR(PCPU_GET(cpuid), &pm->pm_active);
@@ -1956,7 +1958,7 @@
             (uintptr_t)uaddr >> ADDR_SR_SHFT;
         curthread->td_pcb->pcb_cpu.aim.usr_vsid = slbv;
 #ifdef __powerpc64__
-       __asm __volatile ("slbie %0; slbmte %1, %2; isync" ::
+       __asm __volatile ("isync; slbie %0; slbmte %1, %2; isync" ::
             "r"(USER_ADDR), "r"(slbv), "r"(USER_SLB_SLBE));
 #else
         __asm __volatile("mtsr %0,%1; isync" :: "n"(USER_SR), "r"(slbv));
Index: /mnt/usr/src/sys/powerpc/aim/moea64_native.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/moea64_native.c        (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/moea64_native.c        (working copy)
@@ -406,7 +406,7 @@
          */
 
         #ifdef __powerpc64__
-               __asm __volatile ("slbia");
+               __asm __volatile ("isync; slbia");
                 __asm __volatile ("slbmfee %0,%1; slbie %0;" : "=r"(seg0) :
                     "r"(0));
 
@@ -417,6 +417,7 @@
                         __asm __volatile ("slbmte %0, %1" ::
                             "r"(slb[i].slbv), "r"(slb[i].slbe));
                 }
+               __asm __volatile ("isync");
         #else
                 for (i = 0; i < 16; i++)
                         mtsrin(i << ADDR_SR_SHFT, kernel_pmap->pm_sr[i]);
Index: /mnt/usr/src/sys/powerpc/aim/slb.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/slb.c  (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/slb.c  (working copy)
@@ -457,7 +457,7 @@
         /* If it is for this CPU, put it in the SLB right away */
         if (pmap_bootstrapped) {
                 /* slbie not required */
-               __asm __volatile ("slbmte %0, %1" ::
+               __asm __volatile ("isync; slbmte %0, %1; isync" ::
                     "r"(slbcache[i].slbv), "r"(slbcache[i].slbe));
         }
 
Index: /mnt/usr/src/sys/powerpc/aim/trap_subr64.S
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/trap_subr64.S  (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/trap_subr64.S  (working copy)
@@ -65,6 +65,7 @@
 
         li      %r29, 0                 /* Set the counter to zero */
 
+       isync
         slbia
         slbmfee %r31,%r29
         clrrdi  %r31,%r31,28
@@ -71,6 +72,7 @@
         slbie   %r31
 1:     ld      %r31, 0(%r28)           /* Load SLB entry pointer */
        cmpdi   %r31, 0                 /* If NULL, stop */
+       isync
        beqlr
 
        ld      %r30, 0(%r31)           /* Load SLBV */
@@ -96,6 +98,7 @@
         /* Otherwise, set up SLBs */
         li      %r29, 0                 /* Set the counter to zero */
 
+       isync
         slbia
         slbmfee %r31,%r29
         clrrdi  %r31,%r31,28
@@ -105,6 +108,7 @@
 
         ld      %r31, 8(%r28)           /* Load SLBE */
         cmpdi   %r31, 0                 /* If SLBE is not valid, stop */
+       isync
         beqlr
         ld      %r30, 0(%r28)           /* Load SLBV  */
         slbmte  %r30, %r31              /* Install SLB entry */
@@ -113,6 +117,7 @@
         addi    %r29, %r29, 1
         cmpdi   %r29, 64                /* Repeat if we are not at the end */
         blt     1b
+       isync
         blr
 
 /*
 /*
Index: /mnt/usr/src/sys/powerpc/include/cpufunc.h
===================================================================
--- /mnt/usr/src/sys/powerpc/include/cpufunc.h  (revision 347003)
+++ /mnt/usr/src/sys/powerpc/include/cpufunc.h  (working copy)
@@ -155,15 +155,8 @@
         return (tb);
 }
 
-static __inline void
-mttb(u_quad_t time)
-{
+// mttb moved to after intr_restore
 
-       mtspr(TBR_TBWL, 0);
-       mtspr(TBR_TBWU, (uint32_t)(time >> 32));
-       mtspr(TBR_TBWL, (uint32_t)(time & 0xffffffff));
-}
-
 static __inline void
 eieio(void)
 {
@@ -202,6 +195,19 @@
         mtmsr(msr);
 }
 
+static __inline void
+mttb(u_quad_t time)
+{
+       const uint32_t   high = time >> 32;
+       const uint32_t   low  = time & 0xffffffffu;
+
+       const register_t predisable_msr = intr_disable();
+       mtspr(TBR_TBWL, 0);
+       mtspr(TBR_TBWU, high);
+       mtspr(TBR_TBWL, low);
+       intr_restore(predisable_msr);
+}
+
 static __inline struct pcpu *
 get_pcpu(void)
 {
Index: /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c
===================================================================
--- /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c       (revision 347003)
+++ /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c       (working copy)
@@ -286,8 +286,10 @@
         if (smp_cpus > 1)
                 atomic_store_rel_int(&smp_started, 1);
 
+printf("cpu_mp_unleash: before DELAY\n");
         /* Let the APs get into the scheduler */
         DELAY(10000);
+printf("cpu_mp_unleash: after DELAY\n");
 
 }
 
=20
Index: /mnt/usr/src/sys/powerpc/powerpc/trap.c
===================================================================
--- /mnt/usr/src/sys/powerpc/powerpc/trap.c     (revision 347003)
+++ /mnt/usr/src/sys/powerpc/powerpc/trap.c     (working copy)
@@ -453,8 +453,8 @@
 #if defined(__powerpc64__) && defined(AIM)
                 case EXC_DSE:
                         if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0 &&
-                           (frame->dar & SEGMENT_MASK) == USER_ADDR) {
-                               __asm __volatile ("slbmte %0, %1" ::
+                           (frame->dar & SEGMENT_MASK) == USER_ADDR) { // no slbie needed?
+                               __asm __volatile ("isync; slbmte %0, %1; isync" ::
                                         "r"(td->td_pcb->pcb_cpu.aim.usr_vsid),
                                         "r"(USER_SLB_SLBE));
                                 return;
@@ -712,8 +712,8 @@
          * Speculatively restore last user SLB segment, which we know is
          * invalid already, since we are likely to do copyin()/copyout().
          */
-       if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0)
-               __asm __volatile ("slbmte %0, %1; isync" ::
+       if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0) // no slbie needed?
+               __asm __volatile ("isync; slbmte %0, %1; isync" ::
                     "r"(td->td_pcb->pcb_cpu.aim.usr_vsid), "r"(USER_SLB_SLBE));
 #endif
 
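
[As an aside on the isync additions throughout the diff: slbmte and
slbie are context-altering instructions, and the intent is to bracket
each SLB update with context synchronization. A sketch of the pattern
(my reading; I do not claim the ISA requires every one of these, and
slbv/slbe here are just illustrative operands):

	__asm __volatile("isync");	/* complete prior accesses
					   under the old mapping */
	__asm __volatile("slbmte %0, %1" :: "r"(slbv), "r"(slbe));
	__asm __volatile("isync");	/* ensure later accesses see
					   the new SLB entry */
]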


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



