Date:      Tue, 7 May 2019 11:54:01 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Justin Hibbits <chmeeedalf@gmail.com>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: head -r347003 on 2-socket/2-cores-each G5 PowerMac11,2's: one type of boot-blocking context found
Message-ID:  <C85B1B21-5BF6-4CFC-B928-2F19960B91E2@yahoo.com>
In-Reply-To: <20190507130654.20a269f6@titan.knownspace>
References:  <D2CEBBBA-40A5-4924-9817-53A8ED81011E@yahoo.com> <20190507130654.20a269f6@titan.knownspace>

On 2019-May-7, at 11:06, Justin Hibbits <chmeeedalf at gmail.com> wrote:

> On Mon, 6 May 2019 22:43:36 -0700
> Mark Millard <marklmi at yahoo.com> wrote:
>
>> Every example of boot failure during cpu_mp_unleash,
>> where I've had the tracking in place, has had 1 or more
>> examples of srr0<DMAP_BASE_ADDRESS (EXC_ISE) in
>> handle_kernel_slb_spill before cpu_mp_unleash tries to
>> start its first ap.
>>
>> Every example of boot success, where I've had the tracking
>> in place, has had no examples of srr0<DMAP_BASE_ADDRESS
>> (EXC_ISE) in handle_kernel_slb_spill before the
>> cpu_mp_unleash finished. (Successful boots are rare
>> in my current test context, so there are fewer examples
>> of this.)
>>
>> In other words: the original live-G5 information
>> for the segment was still present throughout that
>> time frame, thus avoiding a slbtrap for such a
>> fetch address over the time frame involved.
>>
>>
>>
>> In the code:
>>
>>        rstvec = rstvec_virtbase + reset;
>> printf("powermac_smp_start_cpu: about to use *rstvec==4\n");
>>        *rstvec = 4;
>>        powerpc_sync();
>>        (void)(*rstvec);
>>        powerpc_sync();
>>        DELAY(1);
>> printf("powermac_smp_start_cpu: about to use *rstvec==0\n");
>>        *rstvec = 0;
>>        powerpc_sync();
>>        (void)(*rstvec);
>>        powerpc_sync();
>> printf("powermac_smp_start_cpu: done using *rstvec==0\n");
>>
>> In every boot failure, the last line reported via
>> FireWire dcons has been the first of those 3 printf's,
>> for CPU 2 as the target (of 0-3).
>>
>> The above code appears to me to execute with MSR.IR=1
>> on the bsp.
>>
>> But, then, what would *rstvec do if there is no ESID=0
>> V=1 combination active for the live-G5 information at
>> the time? Does that block the exception code that
>> is in what would be ESID=0's address range, effectively
>> preventing slbtrap from being invoked to enable ESID=0?
>>
>> In other words: when MSR.IR=1, does there always
>> need to be an ESID=0 V=1 entry? Is it appropriate
>> to reserve one for ESID=0 V=1 (after invalidating
>> any arbitrarily placed ESID=0 V=1 entry present
>> before the kernel even started)?
>
> Hi Mark,
>
> Thanks for continuing to look into this.  In this case you're
> presenting, an ISE shouldn't really matter, because the SLB miss handler
> is written to run entirely from real mode to handle the miss.  Can you
> determine what the addresses were that faulted in the failure cases?
> We shouldn't be touching anything below DMAP_BASE at this time, since
> we're not yet in userspace, and all mappings should be either KVA or
> DMAP.

I'll try to get examples of all of them, based on
my current code.

But in an earlier message I reported several examples from
simply sticking a printf in handle_kernel_slb_spill and
later making it controllable to report at selective time
frames. (The printf's being there led to earlier hang-ups.
I was surprised I got anything.)

Remember that the number of handle_kernel_slb_spill
calls for srr0<DMAP_START and dar<DMAP_START varies
from boot to boot, so the places are not unique
overall.
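
For readers following along, here is a minimal sketch of the kind of
switchable reporting described above. It is not the actual
instrumentation: the sketch_* names are hypothetical, the direct-map
base value is an assumption standing in for the kernel's own constant,
and it simply counts any spill whose srr0 or dar is below that base,
regardless of exception type.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: stand-in for the kernel's direct-map base constant. */
#define SKETCH_DMAP_BASE 0xc000000000000000ULL

/* Hypothetical switch, in the spirit of a flag that is set to 1
 * only once reporting should begin. */
static int sketch_reporting_enabled;

/* Hypothetical per-boot counters for addresses below the base. */
static unsigned long sketch_low_srr0_count;
static unsigned long sketch_low_dar_count;

/*
 * Record one spill. Returns true if it was reported, i.e. reporting
 * is enabled and at least one of srr0/dar is below the assumed base.
 */
static bool
sketch_note_spill(unsigned int type, uint64_t dar, uint64_t srr0)
{
	bool low_srr0 = srr0 < SKETCH_DMAP_BASE;
	bool low_dar = dar < SKETCH_DMAP_BASE;

	if (!sketch_reporting_enabled || (!low_srr0 && !low_dar))
		return (false);
	if (low_srr0)
		sketch_low_srr0_count++;
	if (low_dar)
		sketch_low_dar_count++;
	printf("sketch_note_spill: type=%#x dar=%#" PRIx64 " srr0=%#" PRIx64 "\n",
	    type, dar, srr0);
	return (true);
}
```

Because the flag starts at 0, nothing is reported or counted until it
is set, which is what makes the reporting window selectable.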

Here is the core of those old reports for reference:

KDB: debugger backends: ddb
KDB: current backend: ddb
handle_kernel_slb_spill: type=0x380 dar=0x3d99348 srr0=0xa869bc
handle_kernel_slb_spill: type=0x380 dar=0x10000000 srr0=0xa869bc

Both seemed to involve the stbx instruction in:

0000000000a869bc <.memset+0x20> stbx    r4,r9,r3
0000000000a869c0 <.memset+0x24> addi    r9,r9,1
0000000000a869c4 <.memset+0x28> bdnz    0000000000a869bc <.memset+0x20>
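
As a cross-check on which segment each reported address falls in, a
small sketch follows. It assumes 256 MB segments, so that the ESID is
the effective address shifted right by 28 bits; the sketch_* names are
hypothetical and not from the kernel source.

```c
#include <stdint.h>

/* Assumption: 256 MB segments, i.e. 28 bits of offset within a segment. */
#define SKETCH_SEGMENT_SHIFT 28

/* Return the ESID for an effective address under that assumption. */
static uint64_t
sketch_esid(uint64_t ea)
{
	return (ea >> SKETCH_SEGMENT_SHIFT);
}
```

Under that assumption, srr0=0xa869bc and dar=0x3d99348 both land in
ESID 0, while dar=0x10000000 is the first byte of ESID 1.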

The above was from the unconditional printf addition and, as I
remember, repeated for:

     #ifdef __powerpc64__
     i = 0;
     for (va = virtual_avail; va < virtual_end && i<(n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
             moea64_bootstrap_slb_prefault(va, 0);
     #endif
enable_handle_kernel_slb_spill_reporting= 1;

(Note the (n_slbs-1)/2 that I was experimenting with at
the time.)

The below was from instead enabling later:

enable_handle_kernel_slb_spill_reporting= 1;
     dpcpu_init(dpcpu, curcpu);

got (eliminating an unrelated line that had a
truncated address showing):

KDB: debugger backends: ddb
KDB: current backend: ddb
handle_kernel_slb_spill: type=0x380 dar=0x22ef8 srr0=0xa86690
handle_kernel_slb_spill: type=0x480 dar=0x22ef8 srr0=0xa86690

Both seemed to involve the stdu instruction in:

0000000000a8668c <.memcpy+0x140> ldu     r0,-8(r9)
0000000000a86690 <.memcpy+0x144> stdu    r0,-8(r11)
0000000000a86694 <.memcpy+0x148> bdnz    0000000000a8668c <.memcpy+0x140>
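
For sorting through reports like the ones above, a rough sketch of a
line parser follows. It assumes the type field holds the PowerPC
exception vector (0x380 for a data segment exception, 0x480 for an
instruction segment exception); the sketch_* names and the struct are
hypothetical helpers, not part of the kernel.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Segment-exception vectors as they appear in the type field. */
#define SKETCH_EXC_DSE 0x380u	/* data segment exception */
#define SKETCH_EXC_ISE 0x480u	/* instruction segment exception */

/* Hypothetical holder for one parsed report line. */
struct sketch_spill {
	unsigned int type;
	uint64_t dar;
	uint64_t srr0;
};

/* Parse one report line; returns 1 on success, 0 otherwise. */
static int
sketch_parse_spill(const char *line, struct sketch_spill *out)
{
	int n = sscanf(line,
	    "handle_kernel_slb_spill: type=%x dar=%" SCNx64 " srr0=%" SCNx64,
	    &out->type, &out->dar, &out->srr0);

	return (n == 3);
}

/* Name the kind of segment exception a parsed line records. */
static const char *
sketch_spill_kind(const struct sketch_spill *s)
{
	switch (s->type) {
	case SKETCH_EXC_DSE:
		return ("data segment (DSE)");
	case SKETCH_EXC_ISE:
		return ("instruction segment (ISE)");
	default:
		return ("other");
	}
}
```

Read this way, the type=0x480 line with srr0=0xa86690 is the
instruction-side case, and the type=0x380 lines are data-side.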

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



