Date:      Fri, 10 Oct 2014 14:20:26 -0700
From:      Mark Millard <markmi@dsl-only.net>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>
Subject:   A little new before-Copyright-notice/ofwcall crash information... [Still no solution, just more information]
Message-ID:  <477A81CF-3222-4462-B25D-F46F0AA09D3B@dsl-only.net>

I was experimenting with trying to get more information on the "before
Copyright notice"/ofwcall PowerMac G5 hangs and accidentally got better
information than I expected. (At least if the "show registers" output is
to be believed for SRR0.)

First I'll give the results and what they refer to. Then how I got them.

As part of the experiments I inserted extra isync instructions, from just
after the bctrl that calls into Open Firmware through just after the
mtmsrd, to check that the same (relative) instruction position would be
reported with or without them:

> Index: /usr/src/sys/powerpc/ofw/ofwcall64.S
> ===================================================================
> --- /usr/src/sys/powerpc/ofw/ofwcall64.S	(revision 272558)
> +++ /usr/src/sys/powerpc/ofw/ofwcall64.S	(working copy)
> @@ -128,13 +128,22 @@
>  	bctrl
>  
>  	/* Reload stack pointer and MSR from the OFW stack */
> +	isync
> +	isync
>  	ld	%r6,24(%r1)
> +	isync
> +	isync
>  	ld	%r2,16(%r1)
> +	isync
> +	isync
>  	ld	%r1,8(%r1)
> +	isync
> +	isync
>  
>  	/* Now set the real MSR */
>  	mtmsrd	%r6
>  	isync
> +	isync
>  
>  	/* Sign-extend the return value from OF */
>  	extsw	%r3,%r3

The result was that SRR0 is reported as pointing at the last isync above
when the trap happens. (No multiple-fault problem showed up, so it did
not point into the exception handling code.)

With all the extra isyncs removed (the normal code has only one isync in
that area, the one just after the mtmsrd), the extsw instruction is in
that position and it is what SRR0 pointed to. So that aspect ended up
confirmed.

The version of the code with the extra isyncs should have forced any
exceptions from the ld instructions (and from anything earlier) to happen
before the mtmsrd was executed. As near as I can tell, the implication is
that the mtmsrd itself is what takes the exception.

SRR1:  0x1000000040101120
lr:    0x8a64e8 .ofwcall+0xa8 (i.e., just after the bctrl in both
versions of the code).

From all this I expect that the call into Open Firmware made by ofwcall
had returned before the exception happened.

ctr:   0xff846d78
cr:    0x22000022
xer:   0

I expect that the reported dar and dsisr are garbage (probably the wrong
kind of trap for them to have been set). But they were listed as:

dar:   0x810248fbc10250fb
dsisr: 0xe102587f8802a648
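
For anyone who wants to split the SRR1 value reported above into named
bits, here is a minimal Python sketch. It assumes the bit masks match
FreeBSD's PSL_* definitions for 64-bit AIM processors (the short names in
the table are those names with the PSL_ prefix dropped). decode_srr1 is a
hypothetical helper of my own, not part of DDB or any FreeBSD tool. For
some interrupt types certain SRR1 bits carry interrupt-specific status
rather than saved MSR state, so this is only a mechanical split and not
an interpretation of what happened.

```python
# Assumed MSR bit masks, following FreeBSD's PSL_* names (64-bit AIM layout).
MSR_BITS = {
    "SF":  0x8000000000000000,  # 64-bit mode
    "HV":  0x1000000000000000,  # hypervisor state
    "VEC": 0x0000000002000000,  # vector unit available
    "EE":  0x0000000000008000,  # external interrupts enabled
    "PR":  0x0000000000004000,  # problem (user) state
    "FP":  0x0000000000002000,  # floating point available
    "ME":  0x0000000000001000,  # machine check enable
    "FE0": 0x0000000000000800,
    "SE":  0x0000000000000400,  # single-step trace
    "BE":  0x0000000000000200,  # branch trace
    "FE1": 0x0000000000000100,
    "IR":  0x0000000000000020,  # instruction relocation
    "DR":  0x0000000000000010,  # data relocation
    "PMM": 0x0000000000000004,
    "RI":  0x0000000000000002,  # recoverable interrupt
    "LE":  0x0000000000000001,  # little-endian
}


def decode_srr1(value):
    """Return (names of set MSR bits, bits not covered by the table)."""
    named = [name for name, mask in MSR_BITS.items() if value & mask]
    known = 0
    for mask in MSR_BITS.values():
        known |= mask
    leftover = value & ~known & 0xFFFFFFFFFFFFFFFF
    return named, leftover


named, leftover = decode_srr1(0x1000000040101120)
print("named bits:", " ".join(named))
print("other bits:", hex(leftover))
```

The "other bits" value collects whatever the table does not name; those
bits are deliberately left uninterpreted here.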


I have no clue whether Open Firmware was well behaved about register
values when it returned to ofwcall. r6 in the list below does not look
good to me: it is a little more than r1's value, suggesting that a stack
address is being displayed instead of an MSR value. But by the time
mtmsrd %r6 executes, r1 should no longer hold the OFW stack address but
the kernel's stack address from that time. (This presumes Open Firmware
was well behaved.)

r0: 0
r1: 0xbc0558
r2: 0xe18dd0 MP_ncpus
r3: 0xd24450
r4: 0x8a64e8 .ofwcall+0xa8 (specific address could depend on other
variations in builds)
r5: 0
r6: 0xbc0568
r7: 0xe5f63d ofw_real_mode
r8: 0x1
r9: 0xe5f63d ofw_name_history_+0x15 (part of my crash information
dumping hacks)
r10: 0x1c35ec0
r11: 0
r12: 0x22000022
r13: 0xddaf29 thread0
r14-r19: 0
r20: 0x10f6000
r21: 0x4
r22: 0x1801bd4
r23: 0x1803a28
r24: 0xc000000000008760
r25: 0xcd4a98
r26: 0xcf6758
r27: 0xcd4a98
r28: 0xe62690 emergency_buffer.7721+0x8
r29: 0x1874d0 ofw_name_history_pos (part of my information dumping
hacks)
r30: 0x9000000000001032
r31: 0xc0000000000084a0
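
To make the r6 versus r1 observation concrete, here is a small Python
sketch using the two values from the list above. The 4096-byte window in
is_near is an arbitrary choice of mine for "close to the stack pointer",
and both the helper and the PSL_SF value (taken to be FreeBSD's
definition) are assumptions for illustration only.

```python
R1 = 0xbc0558  # reported stack pointer
R6 = 0xbc0568  # register used by "mtmsrd %r6"

PSL_SF = 0x8000000000000000  # assumed: FreeBSD's 64-bit mode MSR bit


def is_near(candidate, stack_pointer, window=4096):
    """True if candidate lies within `window` bytes of stack_pointer."""
    return abs(candidate - stack_pointer) < window


delta = R6 - R1
print("r6 - r1 =", hex(delta))
print("r6 near r1:", is_near(R6, R1))
print("r6 has PSL_SF set:", bool(R6 & PSL_SF))
```

A value 16 bytes above r1 with nothing set in its upper 32 bits is what
makes r6 look like a stack address rather than an MSR image.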

[ofw_name_history is how I earlier found the specific ofwcall that did
not return all the way without getting an associated exception. The
ofw_name_history content is dumped by a DDB script that I force to exist
and that runs when the exception happens.]


Now for the odd part of how I got to the above happening.

Given the multiple-fault problem that was involved, I decided to try to
get some information on which type(s) of exception(s) were involved by
making the PC values distinct: duplicating the code that contained the
address being reported so that each use had its own copy.

So I ended up with not just realtrap but also realtrap1, realtrap2, and
realtrap3, for example, each of which looks like:

> +realtrap1:
> +/* Test whether we already had PR set */
> +       mfsrr1  %r1
> +       mtcr    %r1
> +       mfsprg1 %r1                     /* restore SP (might have been
> +                                          overwritten) */
> +       bf      17,rt1_k_trap           /* branch if PSL_PR is false */
> +       GET_CPUINFO(%r1)
> +       ld      %r1,PC_CURPCB(%r1)
> +       mr      %r27,%r28               /* Save LR, r29 */
> +       mtsprg2 %r29
> +       bl      restore_kernsrs         /* enable kernel mapping */
> +       mfsprg2 %r29
> +       mr      %r28,%r27
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain
> +rt1_k_trap:
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain

Since the original reports were for an address inside the FRAME_SETUP
code, I needed distinct copies of FRAME_SETUP to have unique PCs for the
different uses.

(I could have used realtrap instead of having realtrap3 but ended up
with realtrap unused.)

The trapagain code was after the reported fault place and so was not
duplicated.

generictrap also got its own copy of such code (no label).

That left alitrap as the only use of the original s_trap code. (It is
the only bla style use of s_trap in the original code and so I left that
alone.)

After these changes I got the show registers results that I reported
above instead of SRR0 values from one of the exception handler paths.
(That is not what I expected.) The detailed changes to trap_subr64.S
were:

> Index: /usr/src/sys/powerpc/aim/trap_subr64.S
> ===================================================================
> --- /usr/src/sys/powerpc/aim/trap_subr64.S      (revision 272558)
> +++ /usr/src/sys/powerpc/aim/trap_subr64.S      (working copy)
> @@ -583,7 +583,7 @@
>         /* Try to detect a kernel stack overflow */
>         mfsrr1  %r31
>         mtcr    %r31
> -       bt      17,realtrap             /* branch is user mode */
> +       bt      17,realtrap1            /* branch is user mode */
>         mfsprg1 %r31                    /* get old SP */
>         clrrdi  %r31,%r31,12            /* Round SP down to nearest page */
>         sub.    %r30,%r31,%r30          /* SP - DAR */
> @@ -590,7 +590,7 @@
>         bge     1f
>         neg     %r30,%r30               /* modulo value */
>  1:     cmpldi  %cr0,%r30,4096          /* is DAR within a page of SP? */
> -       bge     %cr0,realtrap           /* no, too far away. */
> +       bge     %cr0,realtrap2          /* no, too far away. */
>  
>         /* Now convert this DSI into a DDB trap.  */
>         GET_CPUINFO(%r1)
> @@ -628,6 +628,68 @@
>         mr      %r28,%r27
>         ba s_trap
>  
> +realtrap1:
> +/* Test whether we already had PR set */
> +       mfsrr1  %r1
> +       mtcr    %r1
> +       mfsprg1 %r1                     /* restore SP (might have been
> +                                          overwritten) */
> +       bf      17,rt1_k_trap           /* branch if PSL_PR is false */
> +       GET_CPUINFO(%r1)
> +       ld      %r1,PC_CURPCB(%r1)
> +       mr      %r27,%r28               /* Save LR, r29 */
> +       mtsprg2 %r29
> +       bl      restore_kernsrs         /* enable kernel mapping */
> +       mfsprg2 %r29
> +       mr      %r28,%r27
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain
> +rt1_k_trap:
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain
> +
> +
> +realtrap2:
> +/* Test whether we already had PR set */
> +       mfsrr1  %r1
> +       mtcr    %r1
> +       mfsprg1 %r1                     /* restore SP (might have been
> +                                          overwritten) */
> +       bf      17,rt2_k_trap           /* branch if PSL_PR is false */
> +       GET_CPUINFO(%r1)
> +       ld      %r1,PC_CURPCB(%r1)
> +       mr      %r27,%r28               /* Save LR, r29 */
> +       mtsprg2 %r29
> +       bl      restore_kernsrs         /* enable kernel mapping */
> +       mfsprg2 %r29
> +       mr      %r28,%r27
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain
> +rt2_k_trap:
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain
> +
> +realtrap3:
> +/* Test whether we already had PR set */
> +       mfsrr1  %r1
> +       mtcr    %r1
> +       mfsprg1 %r1                     /* restore SP (might have been
> +                                          overwritten) */
> +       bf      17,rt3_k_trap           /* branch if PSL_PR is false */
> +       GET_CPUINFO(%r1)
> +       ld      %r1,PC_CURPCB(%r1)
> +       mr      %r27,%r28               /* Save LR, r29 */
> +       mtsprg2 %r29
> +       bl      restore_kernsrs         /* enable kernel mapping */
> +       mfsprg2 %r29
> +       mr      %r28,%r27
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain
> +rt3_k_trap:
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain
> +
> +
>  /*
>   * generictrap does some standard setup for trap handling to minimize
>   * the code that need be installed in the actual vectors. It expects
> @@ -666,6 +728,20 @@
>         mfsrr1  %r31
>         mtcr    %r31
>  
> +       bf      17,gt_k_trap            /* branch if PSL_PR is false */
> +       GET_CPUINFO(%r1)
> +       ld      %r1,PC_CURPCB(%r1)
> +       mr      %r27,%r28               /* Save LR, r29 */
> +       mtsprg2 %r29
> +       bl      restore_kernsrs         /* enable kernel mapping */
> +       mfsprg2 %r29
> +       mr      %r28,%r27
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain
> +gt_k_trap:
> +       FRAME_SETUP(PC_TEMPSAVE)
> +       ba trapagain
> +
>  s_trap:
>         bf      17,k_trap               /* branch if PSL_PR is false */
>         GET_CPUINFO(%r1)
> @@ -785,7 +861,7 @@
>         ld      %r31,(PC_DBSAVE+CPUSAVE_R31)(%r1)
>         mtsprg3 %r31                    /* SPRG3 was clobbered by FRAME_LEAVE */
>         mfsprg1 %r1
> -       b       realtrap
> +       b       realtrap3
>  dbleave:
>         FRAME_LEAVE(PC_DBSAVE)
>         rfid
> 

Reverting this one file to the original code brings back the historical
exception-in-exception-handler report from DDB's show registers.

===
Mark Millard
markmi at dsl-only.net



