Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 14 Feb 2015 14:21:50 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: PowerMac G5 powerpc64: new context where repeatedly booting varies between failing and working
Message-ID:  <ABDD60F1-72C0-41E0-8DFB-4CFDCA9ACA82@dsl-only.net>
In-Reply-To: <7CA43EE3-8C11-4FBD-9F8A-42DF08B82362@dsl-only.net>
References:  <7CA43EE3-8C11-4FBD-9F8A-42DF08B82362@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
I've been able to show that the TOC entry that authnone_init accesses is =
messed up and it is so from very early on.

I took advantage of sys/rpc/auth_none.c exposing the static variable's =
address calculation result, in fact the same one that the crash happened =
for:

AUTH *
authnone_create()
{
        struct authnone_private *ap =3D &authnone_private;

        return (&ap->no_client);
}

The no_client even happens to be the first field of the struct pointed =
to by ap.

So I put calls of that routine where it would periodically monitor the =
calculation during the early part of booting:

root@FBSDG5M1:/usr/src # svnlite diff sys/kern/init_main.c=20
Index: sys/kern/init_main.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- sys/kern/init_main.c	(revision 278443)
+++ sys/kern/init_main.c	(working copy)
@@ -91,6 +91,11 @@
 #include <ddb/ddb.h>
 #include <ddb/db_sym.h>
=20
+#if defined(VERBOSE_SYSINIT)
+/* HACK! */
+extern void* authnone_create(void);
+#endif
+
 void mi_startup(void);				/* Should be elsewhere =
*/
=20
 /* Components of the first process -- never freed. */
@@ -282,7 +287,9 @@
=20
 #if defined(VERBOSE_SYSINIT)
 		if (verbose)
+{printf(" authnone_private address generation check: %p ", =
authnone_create());
 			printf("done.\n");
+}
 #endif

So when it boots successfully it reports messages like:

   malloc_init(&M_JFREEFRAG)...  authnone_private address generation =
check: 0xe313a8 done.

When the boots fails the very first such message of that form shows the =
0x2400002200e313a8 value, as do all the later ones. When the boot works =
it always shows 0xe313a8.

[I have since shortened the text with:
printf(" &authnone_private: %p ", authnone_create());]

It would appear that the TOC entry generation/update is the source of =
the variations in value that are observed that can lead to a crash.



=3D=3D=3D
Mark Millard
markmi at dsl-only.net

On 2015-Feb-14, at 01:53 AM, Mark Millard <markmi at dsl-only.net> =
wrote:

FreeBSD context: PowerMac G5 Quad-Core with 16GB RAM.

root@FBSDG5M1:/usr/src # freebsd-version -ku ; uname -a
10.1-STABLE
10.1-STABLE
FreeBSD FBSDG5M1 10.1-STABLE FreeBSD 10.1-STABLE #10 r278443M: Fri Feb =
13 03:26:27 PST 2015     =
root@FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64vtsc  powerpc

Other configuration/context details for /boot/kernel10.1S/kernel are =
given late in this message. But I will here mention the use of:

options         DDB
options         GDB
options         VERBOSE_SYSINIT
options         BOOTVERBOSE=3D1
options         BOOTHOWTO=3DRB_VERBOSE



I've got a new context where repeatedly booting via power-off then =
power-on varies between failing and working, always failing in the same =
way/place when it does.


Here are 3 addresses (&...: ...) from a successful boot of kernel10.1S, =
where the first one will be  different for the failing boots (this is =
from dmesg -a):

  authnone_init(0)...  &authnone_private: 0xe313a8, &_null_auth: =
0xe31608, &authnone_ops: 0xc31f80 done.

where the extra output is from the added printf in:

static void
authnone_init(void *dummy)
{
       struct authnone_private *ap =3D &authnone_private;
       XDR xdrs;

       ap->no_client.ah_cred =3D ap->no_client.ah_verf =3D _null_auth;
       ap->no_client.ah_ops =3D &authnone_ops;
printf(" &authnone_private: %p, &_null_auth: %p, &authnone_ops: %p ", =
ap, &_null_auth, &authnone_ops);
       xdrmem_create(&xdrs, ap->mclient, MAX_MARSHAL_SIZE, XDR_ENCODE);
       xdr_opaque_auth(&xdrs, &ap->no_client.ah_cred);
       xdr_opaque_auth(&xdrs, &ap->no_client.ah_verf);
       ap->mcnt =3D XDR_GETPOS(&xdrs);
       XDR_DESTROY(&xdrs);
}
SYSINIT(authnone_init, SI_SUB_KMEM, SI_ORDER_ANY, authnone_init, NULL);

The authnone_init code for through its first store into ap->... is:

00000000007a3ea4 <.authnone_init> mflr    r0
00000000007a3ea8 <.authnone_init+0x4> std     r29,-24(r1)
00000000007a3eac <.authnone_init+0x8> std     r30,-16(r1)
00000000007a3eb0 <.authnone_init+0xc> std     r31,-8(r1)
00000000007a3eb4 <.authnone_init+0x10> std     r0,16(r1)
00000000007a3eb8 <.authnone_init+0x14> stdu    r1,-192(r1)
00000000007a3ebc <.authnone_init+0x18> mr      r31,r1
00000000007a3ec0 <.authnone_init+0x1c> ld      r29,304(r2)
00000000007a3ec4 <.authnone_init+0x20> ld      r9,312(r2)
00000000007a3ec8 <.authnone_init+0x24> ld      r0,0(r9)
00000000007a3ecc <.authnone_init+0x28> ld      r11,8(r9)
00000000007a3ed0 <.authnone_init+0x2c> ld      r9,16(r9)
00000000007a3ed4 <.authnone_init+0x30> std     r0,24(r29)

When the boots fail they fail on that last std: std r0,24(r29) , doing =
so based on

r2: 0xd2da20
r29: 0x2400002200e313a8
bad virtual address: 0x2400002200e313c0

(These are from a boot-crash time register display, so hand copied off =
screen as it is too soon for interaction with DDB. I've got a default =
ddb script in place that does the display.)

When it boots okay r29 =3D 0x00e313a8 and the address accessed is =
0x00e313c0 instead: see the first address that I started with above (for =
&authnone_private).

In other words: The difference is the upper half of r29. I've no =
evidence that r2 is corrupt for failing boots for this code.

So either 304(r2) accesses different values from one time to the next =
for booting or the r29 register is corrupted somehow between

00000000007a3ec0 <.authnone_init+0x1c> ld      r29,304(r2)
and
00000000007a3ed4 <.authnone_init+0x30> std     r0,24(r29)

(such as an interrupt not restoring the 64bit-ABI's register value =
fully).

At this point I've no clue if variability in the TOC contents that =
304(r2) references makes any sense or not. I've yet to figure out how it =
is established.


More FreeBSD configuration details:

10.1-STABLE's buildworld kernel and installworld were all done from the =
PowerMac G5 itself.

root@FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64
GENERIC64      GENERIC64vtsc =20
root@FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64vtsc=20
include GENERIC64
ident   GENERIC64vtsc

nooptions       PS3                     #Sony Playstation 3              =
 HACK!!! to allow sc

options         DDB                     # HACK!!! to dump early crash =
info (but 11.0-CURRENT already has it)
options         GDB                     # HACK!!! ...
options         VERBOSE_SYSINIT
options         BOOTVERBOSE=3D1
options         BOOTHOWTO=3DRB_VERBOSE
#options        KTR
#options        KTR_MASK=3DKTR_TRAP
#options        KTR_CPUMASK=3D0xF
#options        KTR_VERBOSE

# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt =
historically mishandled during booting
device          sc
#device          kbdmux         # HACK: already listed by vt
options         SC_OFWFB        # OFW frame buffer
options         SC_DFLT_FONT    # compile font in
makeoptions     SC_DFLT_FONT=3Dcp437


# Disable extra checking typically used for FreeBSD 11.0-CURRENT:
nooptions       DEADLKRES               #Enable the deadlock resolver
nooptions       INVARIANTS              #Enable calls of extra sanity =
checking
nooptions       INVARIANT_SUPPORT       #Extra sanity checks of internal =
structures, required by INVARIANTS
nooptions       WITNESS                 #Enable checks to detect =
deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        #Don't run witness on spinlocks =
for speed
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones

root@FBSDG5M1:/usr/src # svnlite status
?       .snap
M       sys/ddb/db_main.c
M       sys/ddb/db_script.c
M      sys/powerpc/conf
?       sys/powerpc/conf/GENERIC64vtsc
M       sys/powerpc/ofw/ofw_machdep.c
M       sys/powerpc/ofw/ofwcall64.S
M       sys/powerpc/powermac/powermac_thermal.c
M       sys/rpc/auth_none.c

root@FBSDG5M1:/usr/src # more /boot/loader.conf=20
#kernel=3D"kernel"
#kernel=3D"kernel10.1RE"
kernel=3D"kernel10.1S"
#kernel=3D"kernel11C"
verbose_loading=3D"YES"
kern.vty=3Dvt

root@FBSDG5M1:/usr/src # more /etc/make.conf=20
WRKDIRPREFIX=3D/usr/obj/portswork
WITH_DEBUG=3D
MALLOC_PRODUCTION=3D

root@FBSDG5M1:/usr/src # more /etc/src.conf
CFLAGS+=3D-DELF_VERBOSE
#WITH_DEBUG_FILES=3D
#WITHOUT_CLANG=3D

Other than powermac_thermal.c (from Justin H.) the source changes are =
for investigation of various early-boot problems for PowerMac G5's. The =
PowerMac G5 that powermac_thermal.c was put in place to experiment with =
is no longer around but I've not yet removed the powermac_thermal.c =
update from my environment.



=3D=3D=3D
Mark Millard
markmi at dsl-only.net





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ABDD60F1-72C0-41E0-8DFB-4CFDCA9ACA82>