Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 16 Apr 2019 14:26:39 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Justin Hibbits <chmeeedalf@gmail.com>
Cc:        freebsd-ppc <freebsd-ppc@freebsd.org>
Subject:   Re: head -r344018 powerpc64 variant on Powermac G5 (2 sockets, 2 cores each): [*buffer arena] shows up more . . .? [mpc85xx_smp_timebase_sync problem too]
Message-ID:  <984188AE-93A9-4321-99EC-522C2E0CE9A9@yahoo.com>
In-Reply-To: <20190306223611.75c8a87e@titan.knownspace>
References:  <D9B56EE2-35C7-44A2-9229-D9E4AECAD3E1@yahoo.com> <20190306151914.44ea831c@titan.knownspace> <8668AAF7-9E6A-4278-9D1B-2ECDBD3804AA@yahoo.com> <99AD89F8-0F90-48BE-A060-DA12FD7129E6@yahoo.com> <20190306223611.75c8a87e@titan.knownspace>

next in thread | previous in thread | raw e-mail | index | archive | help
[Looks to me like the use and content of mpc85xx_smp_timebase_sync
have the same type of problems I noted for the proposed powermac
patch.]

On 2019-Mar-6, at 20:36, Justin Hibbits <chmeeedalf at gmail.com> wrote:

> On Wed, 6 Mar 2019 18:35:42 -0800
> Mark Millard <marklmi at yahoo.com> wrote:
> 
>> [The patch is definitely wrong via a 3rd use of
>> platform_smp_timebase_sync that I'd not noted before. Details at the
>> end.]
>> 
>> On 2019-Mar-6, at 16:39, Mark Millard <marklmi at yahoo.com> wrote:
>> 
>> 
>> 
>>> On 2019-Mar-6, at 13:19, Justin Hibbits <chmeeedalf at gmail.com>
>>> wrote: 
>>>> On Mon, 4 Mar 2019 19:43:09 -0800
>>>> Mark Millard via freebsd-ppc <freebsd-ppc@freebsd.org> wrote:
>>>> 
>>>>> [It is possible that the following is tied to my hack to
>>>>> avoid threads ending up stuck-sleeping. But I ask about
>>>>> an alternative that I see in the code.]
>>>>> 
>>>>> Context: using the modern powerpc64 VM_MAX_KERNEL_ADDRESS
>>>>> and using usefdt=1 on an old Powermac G5 (2 sockets, 2 cores
>>>>> each). Hacks are in use to provide fairly reliable booting
>>>>> and to avoid threads getting stuck sleeping.
>>>>> 
>>>>> Before the modern VM_MAX_KERNEL_ADDRESS figure there were only
>>>>> 2 or 3 bufspacedaemon-* threads as I remember. Now there are 8
>>>>> (plus bufdaemon and its worker), for example:
>>>>> 
>>>>> root         23   0.0  0.0     0   288  -  DL   15:48     0:00.39
>>>>> [bufdaemon/bufdaemon] root         23   0.0  0.0     0   288  -
>>>>> DL 15:48     0:00.05 [bufdaemon/bufspaced] root         23   0.0
>>>>> 0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
>>>>> root         23   0.0  0.0     0   288  -  DL   15:48     0:00.05
>>>>> [bufdaemon/bufspaced] root         23   0.0  0.0     0   288  -
>>>>> DL 15:48     0:00.05 [bufdaemon/bufspaced] root         23   0.0
>>>>> 0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
>>>>> root         23   0.0  0.0     0   288  -  DL   15:48     0:00.07
>>>>> [bufdaemon/bufspaced] root         23   0.0  0.0     0   288  -
>>>>> DL 15:48     0:00.05 [bufdaemon/bufspaced] root         23   0.0
>>>>> 0.0     0   288  -  DL   15:48     0:00.56 [bufdaemon// worker]
>>>>> 
>>>>> I'm sometimes seeing processes showing [*buffer arena] that
>>>>> seemed to wait for a fairly long time with that status, not
>>>>> something I'd seen historically for those same types of
>>>>> processes for a similar overall load (not much). During such
>>>>> times trying to create processes to look around at what is
>>>>> going on seems to also wait. (Probably with the same status?)
>>>>> 
>>>> 
>>>> Hi Mark,
>>>> 
>>>> Can you try the attached patch?  It might be overkill in the
>>>> synchronization, and I might be using the wrong barriers to be
>>>> considered correct, but I think this should narrow the race down,
>>>> and synchronize the timebases to within a very small margin.  The
>>>> real correct fix would be to suspend the timebase on all cores,
>>>> which is feasible (there's a GPIO for the G4s, and i2c for G5s),
>>>> but that's a non-trivial extra work.
>>>> 
>>>> Be warned, I haven't tested it, I've only compiled it (I don't
>>>> have a G5 to test with anymore).
>>>> 
>>> 
>>> Sure, I'll try it when the G5 is again available: it is doing
>>> a time consuming build.
>>> 
>>> I do see one possible oddity: tracing another
>>> platform_smp_timebase_sync use in the code . . .
>>> 
>>> DEVMETHOD(cpufreq_drv_set,      pmufreq_set)
>>> 
>>> static int
>>> pmufreq_set(device_t dev, const struct cf_setting *set)
>>> {
>>> . . .        
>>>       error = pmu_set_speed(speed_sel);
>>> . . .
>>> }
>>> 
>>> int
>>> pmu_set_speed(int low_speed)
>>> {
>>> . . .
>>>       platform_sleep();
>>> . . .
>>> }
>>> 
>>> PLATFORMMETHOD(platform_sleep,          powermac_sleep),
>>> 
>>> void
>>> powermac_sleep(platform_t platform)
>>> {
>>> 
>>>       *(unsigned long *)0x80 = 0x100;
>>>       cpu_sleep();
>>> }
>>> 
>>> void
>>> cpu_sleep()
>>> {
>>> . . .
>>>       platform_smp_timebase_sync(timebase, 0);
>>> . . .
>>> }
>>> 
>>> PLATFORMMETHOD(platform_smp_timebase_sync,
>>> powermac_smp_timebase_sync),
>>> 
>>> The issue:
>>> 
>>> I do not see any matching platform_smp_timebase_sync(timebase, 1)
>>> or other CPUs doing a powermac_smp_timebase_sync in this sequence.
>>> 
>>> (If this makes testing the patch inappropriate, let me know.)
>>> 
>> 
>> More important: There is also a use of:
>> 
>>        /* The following is needed for restoring from sleep. */
>>        platform_smp_timebase_sync(0, 1);
>> 
>> in cpudep_ap_setup . That in turn happens during cpu_reset_handler
>> before machdep_ap_bootstrap is called (which does
>> platform_smp_timebase_sync as well) :
>> 
>> cpu_reset_handler:
>>        GET_TOCBASE(%r2)
>> 
>>        ld      %r1,TOC_REF(tmpstk)(%r2)        /* get new SP */
>>        addi    %r1,%r1,(TMPSTKSZ-48)
>> 
>>        bl      CNAME(cpudep_ap_early_bootstrap) /* Set PCPU */
>>        nop
>>        lis     %r3,1@l
>>        bl      CNAME(pmap_cpu_bootstrap)       /* Turn on virtual
>> memory */ nop
>>        bl      CNAME(cpudep_ap_bootstrap)      /* Set up PCPU and
>> stack */ nop
>>        mr      %r1,%r3                         /* Use new stack */
>>        bl      CNAME(cpudep_ap_setup)
>>        nop
>>        GET_CPUINFO(%r5)
>>        ld      %r3,(PC_RESTORE)(%r5)
>>        cmpldi  %cr0,%r3,0
>>        beq     %cr0,2f
>>        nop
>>        li      %r4,1
>>        bl      CNAME(longjmp)
>>        nop
>> 2:
>> #ifdef SMP
>>        bl      CNAME(machdep_ap_bootstrap)     /* And away! */
>>        nop
>> #endif
>> 
>> Thus overall for ap's there is the sequence:
>> 
>>        platform_smp_timebase_sync(0, 1);
>> . . .
>>        while (ap_letgo == 0)
>>                __asm __volatile("or 31,31,31");
>>        __asm __volatile("or 6,6,6");
>> 
>>        /*
>>         * Set timebase as soon as possible to meet an implicit
>> rendezvous
>>         * from cpu_mp_unleash(), which sets ap_letgo and then
>> immediately
>>         * sets timebase.
>>         *
>>         * Note that this is instrinsically racy and is only relevant
>> on
>>         * platforms that do not support better mechanisms.
>>         */
>>        platform_smp_timebase_sync(ap_timebase, 1);
>> 
>> for each ap . So the (ap) case in powermac_smp_timebase_sync
>> will wait with tb==0 (from cpudep_ap_setup) and the later calls
>> from machdep_ap_bootstrap will not wait and will be after the
>> unleash but not just local to powermac_smp_timebase_sync:
>> 
>> static void
>> powermac_smp_timebase_sync(platform_t plat, u_long tb, int ap)
>> {
>>        static int cpus;
>>        static int unleash;
>> 
>>        if (ap) {
>>                atomic_add_int(&cpus, 1);
>>                while (!atomic_load_acq_int(&unleash))
>>                        ;
>>        } else {
>>                atomic_add_int(&cpus, 1);
>>                while (atomic_load_int(&cpus) != mp_ncpus)
>>                        ;
>>                atomic_store_rel_int(&unleash, 1);
>>        }
>> 
>>        mttb(tb);
>> }
>> 
>> In the end cpus will have double counts of the ap cpus instead
>> of matching mp_ncpus.
>> 
>> cpufreq_drv_set activity is a seperate, additional issue from this.
>> 
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>> ( dsl-only.net went
>> away in early 2018-Mar)
>> 
> 
> As mentioned, I had only compiled it.  Your examination of the code
> path demonstrates that the patch is insufficient, and would hang at
> unleash anyway.  The sleep/wake logic probably needs to be updated
> anyway.  It was written for a G4 powerbook primarily for the PMU-based
> cpufreq driver, so some bits might need to be moved around.  Orthogonal
> to this issue, though.

It appears to me that the overall sequence:

       platform_smp_timebase_sync(0, 1); // called from cpudep_ap_setup
. . .
// The following are from in machdep_ap_bootstrap . . .
       while (ap_letgo == 0)
               __asm __volatile("or 31,31,31");
       __asm __volatile("or 6,6,6");
       . . .
       platform_smp_timebase_sync(ap_timebase, 1);

calls mpc85xx_smp_timebase_sync twice per ap and that the
2nd call has tb_ready && mp_ncpus<=cpu_done for each such
ap, leading to a lack of coordination of the activity that
2nd time for each ap:

static void
mpc85xx_smp_timebase_sync(platform_t plat, u_long tb, int ap)
{
        static volatile bool tb_ready;
        static volatile int cpu_done;
        
        if (ap) {
                /* APs.  Hold off until we get a stable timebase. */
                while (!tb_ready)
                        atomic_thread_fence_seq_cst();
                mttb(tb);
                atomic_add_int(&cpu_done, 1);
                while (cpu_done < mp_ncpus)
                        atomic_thread_fence_seq_cst();
        } else {
                /* BSP */
                freeze_timebase(rcpm_dev, true);
                tb_ready = true;
                mttb(tb);
                atomic_add_int(&cpu_done, 1);
                while (cpu_done < mp_ncpus)
                        atomic_thread_fence_seq_cst();
                freeze_timebase(rcpm_dev, false);
        }
}


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?984188AE-93A9-4321-99EC-522C2E0CE9A9>