From owner-freebsd-ppc@FreeBSD.ORG Sun Feb 15 02:13:52 2015
Subject: Re: PowerMac G5 powerpc64: new context where repeatedly booting varies between failing and working
From: Mark Millard
Date: Sat, 14 Feb 2015 18:13:46 -0800
To: FreeBSD PowerPC ML
References: <7CA43EE3-8C11-4FBD-9F8A-42DF08B82362@dsl-only.net>

I found the time frame for the TOC corruption. The corruption happens during SI_SUB_TUNABLES processing, not before and not after, in every example I have so far.

With my current extra displays of the address calculations, a normal boot now starts out (dmesg -a prefix) as below:

> powerpc_init end: &authnone_private: 0xe313a8
> mi_startup start &authnone_private: 0xe313a8
>
>
>
> sysinit: 0xbd9c00 *sysinit: 0xc3c8a8
> &authnone_private: 0xe313a8
>
> &authnone_private before subsystem: 0xe313a8
> subsystem 700000
> &authnone_private before subsystem: 0xe313a8
> subsystem 800001
> Copyright (c) 1992-2015 The FreeBSD Project.
> ...

When a boot fails, the &authnone_private figure shown before "subsystem 700000" is unchanged, but after that things look like this (picking an example bad value that has occurred and typing it in manually):

> &authnone_private before subsystem: 0x2400004200e313a8
> subsystem 800001
> Copyright (c) 1992-2015 The FreeBSD Project.
> ...

All later displays of the calculation agree with the displayed bad value until the kernel crashes. So far I have never seen the value change at any other stage.

The code in mi_startup that displays the values as above is:

> root@FBSDG5M1:/usr/src # svnlite diff sys/kern/init_main.c
> Index: sys/kern/init_main.c
> ===================================================================
> --- sys/kern/init_main.c	(revision 278443)
> +++ sys/kern/init_main.c	(working copy)
> @@ -91,6 +91,11 @@
>  #include
>  #include
>
> +#if defined(VERBOSE_SYSINIT)
> +/* HACK! */
> +extern void* authnone_create(void);
> +#endif
> +
>  void mi_startup(void);		/* Should be elsewhere */
>
>  /* Components of the first process -- never freed. */
> @@ -207,6 +212,8 @@
>  	int verbose;
>  #endif
>
> +printf("mi_startup start &authnone_private: %p\n\n", authnone_create());
> +
>  	if (boothowto & RB_VERBOSE)
>  		bootverbose++;
>
> @@ -215,7 +222,12 @@
>  		sysinit_end = SET_LIMIT(sysinit_set);
>  	}
>
> +
>  restart:
> +
> +printf("\n\nsysinit: %p *sysinit: %p\n", sysinit, *sysinit);
> +printf("&authnone_private: %p\n\n", authnone_create());
> +
>  	/*
>  	 * Perform a bubble sort of the system initialization objects by
>  	 * their subsystem (primary key) and order (secondary key).
> @@ -234,6 +246,8 @@
>
>  #if defined(VERBOSE_SYSINIT)
>  	last = SI_SUB_COPYRIGHT;
> +/* HACK */
> +	last = SI_SUB_DUMMY;
>  	verbose = 0;
>  #if !defined(DDB)
>  	printf("VERBOSE_SYSINIT: DDB not enabled, symbol lookups disabled.\n");
> @@ -254,7 +268,11 @@
>
>  #if defined(VERBOSE_SYSINIT)
>  		if ((*sipp)->subsystem > last) {
> +printf("&authnone_private before subsystem: %p\n ", authnone_create());
> +
>  			verbose = 1;
> +/* HACK */
> +verbose = 0;
>  			last = (*sipp)->subsystem;
>  			printf("subsystem %x\n", last);
>  		}

I have also observed a new, wildly different bad value: 0 instead of 0x2400004200e313a8. The kernel runs much farther in this case but eventually dies on another large bad address. The 0 also means that some stomping on low memory occurred: for example, the 24(r29) in the instruction that fails for r29=0x2400004200e313a8 would address 24 (0x18 hex) when r29 is 0.

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-14, at 02:21 PM, Mark Millard wrote:

I've been able to show that the TOC entry that authnone_init accesses is messed up, and that it is so from very early on.

I took advantage of sys/rpc/auth_none.c exposing the static variable's address calculation result, in fact the same one that the crash happened for:

AUTH *
authnone_create()
{
	struct authnone_private *ap = &authnone_private;

	return (&ap->no_client);
}

The no_client field even happens to be the first field of the struct pointed to by ap.
So I put calls to that routine where they would periodically monitor the calculation during the early part of booting:

root@FBSDG5M1:/usr/src # svnlite diff sys/kern/init_main.c
Index: sys/kern/init_main.c
===================================================================
--- sys/kern/init_main.c	(revision 278443)
+++ sys/kern/init_main.c	(working copy)
@@ -91,6 +91,11 @@
 #include
 #include

+#if defined(VERBOSE_SYSINIT)
+/* HACK! */
+extern void* authnone_create(void);
+#endif
+
 void mi_startup(void);		/* Should be elsewhere */

 /* Components of the first process -- never freed. */
@@ -282,7 +287,9 @@
 #if defined(VERBOSE_SYSINIT)
 		if (verbose)
+{printf(" authnone_private address generation check: %p ", authnone_create());
 			printf("done.\n");
+}
 #endif

So when it boots successfully, it reports messages like:

malloc_init(&M_JFREEFRAG)... authnone_private address generation check: 0xe313a8 done.

When a boot fails, the very first message of that form shows the 0x2400002200e313a8 value, as do all the later ones. When the boot works, it always shows 0xe313a8.

[I have since shortened the text with: printf(" &authnone_private: %p ", authnone_create());]

It would appear that the TOC entry generation/update is the source of the observed variations in value, which can lead to a crash.

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-14, at 01:53 AM, Mark Millard wrote:

FreeBSD context: PowerMac G5 Quad-Core with 16GB RAM.

root@FBSDG5M1:/usr/src # freebsd-version -ku ; uname -a
10.1-STABLE
10.1-STABLE
FreeBSD FBSDG5M1 10.1-STABLE FreeBSD 10.1-STABLE #10 r278443M: Fri Feb 13 03:26:27 PST 2015     root@FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64vtsc  powerpc

Other configuration/context details for /boot/kernel10.1S/kernel are given late in this message.
But I will mention the use of these here:

options DDB
options GDB
options VERBOSE_SYSINIT
options BOOTVERBOSE=1
options BOOTHOWTO=RB_VERBOSE

I've got a new context where repeatedly booting via power-off then power-on varies between failing and working, always failing in the same way and place when it does fail.

Here are 3 addresses (&...: ...) from a successful boot of kernel10.1S, where the first one will be different for the failing boots (this is from dmesg -a):

authnone_init(0)... &authnone_private: 0xe313a8, &_null_auth: 0xe31608, &authnone_ops: 0xc31f80 done.

where the extra output is from the added printf in:

static void
authnone_init(void *dummy)
{
	struct authnone_private *ap = &authnone_private;
	XDR xdrs;

	ap->no_client.ah_cred = ap->no_client.ah_verf = _null_auth;
	ap->no_client.ah_ops = &authnone_ops;

printf(" &authnone_private: %p, &_null_auth: %p, &authnone_ops: %p ", ap, &_null_auth, &authnone_ops);

	xdrmem_create(&xdrs, ap->mclient, MAX_MARSHAL_SIZE, XDR_ENCODE);
	xdr_opaque_auth(&xdrs, &ap->no_client.ah_cred);
	xdr_opaque_auth(&xdrs, &ap->no_client.ah_verf);
	ap->mcnt = XDR_GETPOS(&xdrs);
	XDR_DESTROY(&xdrs);
}
SYSINIT(authnone_init, SI_SUB_KMEM, SI_ORDER_ANY, authnone_init, NULL);

The authnone_init code up through its first store into ap->... is:

00000000007a3ea4 <.authnone_init>       mflr    r0
00000000007a3ea8 <.authnone_init+0x4>   std     r29,-24(r1)
00000000007a3eac <.authnone_init+0x8>   std     r30,-16(r1)
00000000007a3eb0 <.authnone_init+0xc>   std     r31,-8(r1)
00000000007a3eb4 <.authnone_init+0x10>  std     r0,16(r1)
00000000007a3eb8 <.authnone_init+0x14>  stdu    r1,-192(r1)
00000000007a3ebc <.authnone_init+0x18>  mr      r31,r1
00000000007a3ec0 <.authnone_init+0x1c>  ld      r29,304(r2)
00000000007a3ec4 <.authnone_init+0x20>  ld      r9,312(r2)
00000000007a3ec8 <.authnone_init+0x24>  ld      r0,0(r9)
00000000007a3ecc <.authnone_init+0x28>  ld      r11,8(r9)
00000000007a3ed0 <.authnone_init+0x2c>  ld      r9,16(r9)
00000000007a3ed4 <.authnone_init+0x30>  std     r0,24(r29)

When the boots fail, they fail on that last instruction, std r0,24(r29), with:

r2: 0xd2da20
r29: 0x2400002200e313a8
bad virtual address: 0x2400002200e313c0

(These are from a register display at boot-crash time, hand-copied off the screen, since it is too early for interaction with DDB. I've got a default ddb script in place that does the display.)

When it boots okay, r29 = 0x00e313a8 and the address accessed is 0x00e313c0 instead: see the first address that I started with above (for &authnone_private).

In other words, the difference is in the upper half of r29. I've no evidence that r2 is corrupt in failing boots for this code.

So either 304(r2) accesses different values from one boot to the next, or the r29 register is somehow corrupted between

00000000007a3ec0 <.authnone_init+0x1c>  ld      r29,304(r2)

and

00000000007a3ed4 <.authnone_init+0x30>  std     r0,24(r29)

(for example, by an interrupt not fully restoring the 64-bit ABI's register value).

At this point I've no clue whether variability in the TOC contents that 304(r2) references makes any sense or not. I've yet to figure out how those contents are established.

More FreeBSD configuration details:

10.1-STABLE's buildworld, kernel, and installworld were all done on the PowerMac G5 itself.
root@FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64
GENERIC64       GENERIC64vtsc

root@FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64vtsc
include GENERIC64
ident GENERIC64vtsc
nooptions PS3               #Sony Playstation 3 HACK!!! to allow sc
options DDB                 # HACK!!! to dump early crash info (but 11.0-CURRENT already has it)
options GDB                 # HACK!!! ...
options VERBOSE_SYSINIT
options BOOTVERBOSE=1
options BOOTHOWTO=RB_VERBOSE
#options KTR
#options KTR_MASK=KTR_TRAP
#options KTR_CPUMASK=0xF
#options KTR_VERBOSE
# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
device sc
#device kbdmux              # HACK: already listed by vt
options SC_OFWFB            # OFW frame buffer
options SC_DFLT_FONT        # compile font in
makeoptions SC_DFLT_FONT=cp437
# Disable extra checking typically used for FreeBSD 11.0-CURRENT:
nooptions DEADLKRES         #Enable the deadlock resolver
nooptions INVARIANTS        #Enable calls of extra sanity checking
nooptions INVARIANT_SUPPORT #Extra sanity checks of internal structures, required by INVARIANTS
nooptions WITNESS           #Enable checks to detect deadlocks and cycles
nooptions WITNESS_SKIPSPIN  #Don't run witness on spinlocks for speed
nooptions MALLOC_DEBUG_MAXZONES # Separate malloc(9) zones

root@FBSDG5M1:/usr/src # svnlite status
?       .snap
M       sys/ddb/db_main.c
M       sys/ddb/db_script.c
M       sys/powerpc/conf
?       sys/powerpc/conf/GENERIC64vtsc
M       sys/powerpc/ofw/ofw_machdep.c
M       sys/powerpc/ofw/ofwcall64.S
M       sys/powerpc/powermac/powermac_thermal.c
M       sys/rpc/auth_none.c

root@FBSDG5M1:/usr/src # more /boot/loader.conf
#kernel="kernel"
#kernel="kernel10.1RE"
kernel="kernel10.1S"
#kernel="kernel11C"
verbose_loading="YES"
kern.vty=vt

root@FBSDG5M1:/usr/src # more /etc/make.conf
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
MALLOC_PRODUCTION=

root@FBSDG5M1:/usr/src # more /etc/src.conf
CFLAGS+=-DELF_VERBOSE
#WITH_DEBUG_FILES=
#WITHOUT_CLANG=

Other than powermac_thermal.c (from Justin H.), the source changes are for investigating various early-boot problems on PowerMac G5s. The PowerMac G5 that powermac_thermal.c was put in place to experiment with is no longer around, but I've not yet removed the powermac_thermal.c update from my environment.

===
Mark Millard
markmi at dsl-only.net
From owner-freebsd-ppc@FreeBSD.ORG Sun Feb 15 08:38:23 2015
Subject: Re: PowerMac G5 powerpc64: new context where repeatedly booting varies between failing and working
From: Mark Millard
Date: Sun, 15 Feb 2015 00:38:18 -0800
To: FreeBSD PowerPC ML
Message-Id: <5FE82152-BBF7-4C6D-932D-AEE70546CACA@dsl-only.net>
References: <7CA43EE3-8C11-4FBD-9F8A-42DF08B82362@dsl-only.net>

I changed the mi_startup code to report the specific SI_SUB_TUNABLES item after which the address shows as changed.

mp_setmaxid is the only culprit reported so far. Further printf work showed the calculation working fine before the OF_peer(0) call below; when there is a boot failure, the bad value first shows up at this level just after that OF_peer(0) call returns:

static int
powermac_smp_first_cpu(platform_t plat, struct cpuref *cpuref)
{
	char buf[8];
	phandle_t cpu, dev, root;
	int res;

	root = OF_peer(0);

	dev = OF_child(root);
	while (dev != 0) {
		res = OF_getprop(dev, "name", buf, sizeof(buf));
		if (res > 0 && strcmp(buf, "cpus") == 0)
			break;
		dev = OF_peer(dev);
	}
	if (dev == 0) {
		/*
		 * psim doesn't have a name property on the /cpus node,
		 * but it can be found directly
		 */
		dev = OF_finddevice("/cpus");
		if (dev == -1)
			return (ENOENT);
	}

	cpu = OF_child(dev);

	while (cpu != 0) {
		res = OF_getprop(cpu, "device_type", buf, sizeof(buf));
		if (res > 0 && strcmp(buf, "cpu") == 0)
			break;
		cpu = OF_peer(cpu);
	}
	if (cpu == 0)
		return (ENOENT);

	return (powermac_smp_fill_cpuref(cpuref, cpu));
}

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-14, at 06:13 PM, Mark Millard wrote:

I found the time frame for the TOC corruption. The corruption happens during SI_SUB_TUNABLES processing, not before and not after, in every example I have so far.
With my current extra displays of the address calculations a normal boot = now starts out (dmesg -a prefix) as below: > powerpc_init end: &authnone_private: 0xe313a8 > mi_startup start &authnone_private: 0xe313a8 >=20 >=20 >=20 > sysinit: 0xbd9c00 *sysinit: 0xc3c8a8 > &authnone_private: 0xe313a8 >=20 > &authnone_private before subsystem: 0xe313a8 > subsystem 700000 > &authnone_private before subsystem: 0xe313a8 > subsystem 800001 > Copyright (c) 1992-2015 The FreeBSD Project. > ... But when the boots fail the before "subsystem 700000" &authnone_private = figure above is unchanged but after that things look like (picking an = example bad value that has occurred and manually typing it): > &authnone_private before subsystem: 0x2400004200e313a8 > subsystem 800001 > Copyright (c) 1992-2015 The FreeBSD Project. > ... and all later displays of the calculation agree with the displayed bad = value until it crashes. I've never seen the value change at any other = stage so far. The code for mi_startup displaying the values as above is: > root@FBSDG5M1:/usr/src # svnlite diff sys/kern/init_main.c > Index: sys/kern/init_main.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/kern/init_main.c (revision 278443) > +++ sys/kern/init_main.c (working copy) > @@ -91,6 +91,11 @@ > #include > #include >=20 > +#if defined(VERBOSE_SYSINIT) > +/* HACK! */ > +extern void* authnone_create(void); > +#endif > + > void mi_startup(void); /* Should be = elsewhere */ >=20 > /* Components of the first process -- never freed. 
*/ > @@ -207,6 +212,8 @@ > int verbose; > #endif >=20 > +printf("mi_startup start &authnone_private: %p\n\n", = authnone_create()); > + > if (boothowto & RB_VERBOSE) > bootverbose++; >=20 > @@ -215,7 +222,12 @@ > sysinit_end =3D SET_LIMIT(sysinit_set); > } >=20 > + > restart: > + > +printf("\n\nsysinit: %p *sysinit: %p\n", sysinit, *sysinit); > +printf("&authnone_private: %p\n\n", authnone_create()); > + > /* > * Perform a bubble sort of the system initialization objects by > * their subsystem (primary key) and order (secondary key). > @@ -234,6 +246,8 @@ >=20 > #if defined(VERBOSE_SYSINIT) > last =3D SI_SUB_COPYRIGHT; > +/* HACK */ > + last =3D SI_SUB_DUMMY; > verbose =3D 0; > #if !defined(DDB) > printf("VERBOSE_SYSINIT: DDB not enabled, symbol lookups = disabled.\n"); > @@ -254,7 +268,11 @@ >=20 > #if defined(VERBOSE_SYSINIT) > if ((*sipp)->subsystem > last) { > +printf("&authnone_private before subsystem: %p\n ", = authnone_create()); > + > verbose =3D 1; > +/* HACK */ > +verbose =3D 0; > last =3D (*sipp)->subsystem; > printf("subsystem %x\n", last); > } I have also observed a new wildly different bad value: 0 instead of = 0x2400004200e313a8. The kernel runs much farther in this case but eventually dies for = another large bad address. But the 0 means that some stomping on low = memory occurred, such as 24(r29) indicating address 24 (0x18 hex) in the = instruction that fails for r29=3D0x2400004200e313a8. =3D=3D=3D Mark Millard markmi at dsl-only.net On 2015-Feb-14, at 02:21 PM, Mark Millard = wrote: I've been able to show that the TOC entry that authnone_init accesses is = messed up and it is so from very early on. I took advantage of sys/rpc/auth_none.c exposing the static variable's = address calculation result, in fact the same one that the crash happened = for: AUTH * authnone_create() { struct authnone_private *ap =3D &authnone_private; return (&ap->no_client); } The no_client even happens to be the first field of the struct pointed = to by ap. 
So I put calls of that routine where it would periodically monitor the = calculation during the early part of booting: root@FBSDG5M1:/usr/src # svnlite diff sys/kern/init_main.c=20 Index: sys/kern/init_main.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/kern/init_main.c (revision 278443) +++ sys/kern/init_main.c (working copy) @@ -91,6 +91,11 @@ #include #include +#if defined(VERBOSE_SYSINIT) +/* HACK! */ +extern void* authnone_create(void); +#endif + void mi_startup(void); /* Should be elsewhere = */ /* Components of the first process -- never freed. */ @@ -282,7 +287,9 @@ #if defined(VERBOSE_SYSINIT) if (verbose) +{printf(" authnone_private address generation check: %p ", = authnone_create()); printf("done.\n"); +} #endif So when it boots successfully it reports messages like: malloc_init(&M_JFREEFRAG)... authnone_private address generation = check: 0xe313a8 done. When the boots fails the very first such message of that form shows the = 0x2400002200e313a8 value, as do all the later ones. When the boot works = it always shows 0xe313a8. [I have since shortened the text with: printf(" &authnone_private: %p ", authnone_create());] It would appear that the TOC entry generation/update is the source of = the variations in value that are observed that can lead to a crash. =3D=3D=3D Mark Millard markmi at dsl-only.net On 2015-Feb-14, at 01:53 AM, Mark Millard = wrote: FreeBSD context: PowerMac G5 Quad-Core with 16GB RAM. root@FBSDG5M1:/usr/src # freebsd-version -ku ; uname -a 10.1-STABLE 10.1-STABLE FreeBSD FBSDG5M1 10.1-STABLE FreeBSD 10.1-STABLE #10 r278443M: Fri Feb = 13 03:26:27 PST 2015 = root@FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64vtsc powerpc Other configuration/context details for /boot/kernel10.1S/kernel are = given late in this message. 
But I will here mention the use of: options DDB options GDB options VERBOSE_SYSINIT options BOOTVERBOSE=3D1 options BOOTHOWTO=3DRB_VERBOSE I've got a new context where repeatedly booting via power-off then = power-on varies between failing and working, always failing in the same = way/place when it does. Here are 3 addresses (&...: ...) from a successful boot of kernel10.1S, = where the first one will be different for the failing boots (this is = from dmesg -a): authnone_init(0)... &authnone_private: 0xe313a8, &_null_auth: 0xe31608, = &authnone_ops: 0xc31f80 done. where the extra output is from the added printf in: static void authnone_init(void *dummy) { struct authnone_private *ap =3D &authnone_private; XDR xdrs; ap->no_client.ah_cred =3D ap->no_client.ah_verf =3D _null_auth; ap->no_client.ah_ops =3D &authnone_ops; printf(" &authnone_private: %p, &_null_auth: %p, &authnone_ops: %p ", = ap, &_null_auth, &authnone_ops); xdrmem_create(&xdrs, ap->mclient, MAX_MARSHAL_SIZE, XDR_ENCODE); xdr_opaque_auth(&xdrs, &ap->no_client.ah_cred); xdr_opaque_auth(&xdrs, &ap->no_client.ah_verf); ap->mcnt =3D XDR_GETPOS(&xdrs); XDR_DESTROY(&xdrs); } SYSINIT(authnone_init, SI_SUB_KMEM, SI_ORDER_ANY, authnone_init, NULL); The authnone_init code for through its first store into ap->... 
is:

00000000007a3ea4 <.authnone_init> mflr r0
00000000007a3ea8 <.authnone_init+0x4> std r29,-24(r1)
00000000007a3eac <.authnone_init+0x8> std r30,-16(r1)
00000000007a3eb0 <.authnone_init+0xc> std r31,-8(r1)
00000000007a3eb4 <.authnone_init+0x10> std r0,16(r1)
00000000007a3eb8 <.authnone_init+0x14> stdu r1,-192(r1)
00000000007a3ebc <.authnone_init+0x18> mr r31,r1
00000000007a3ec0 <.authnone_init+0x1c> ld r29,304(r2)
00000000007a3ec4 <.authnone_init+0x20> ld r9,312(r2)
00000000007a3ec8 <.authnone_init+0x24> ld r0,0(r9)
00000000007a3ecc <.authnone_init+0x28> ld r11,8(r9)
00000000007a3ed0 <.authnone_init+0x2c> ld r9,16(r9)
00000000007a3ed4 <.authnone_init+0x30> std r0,24(r29)

When the boots fail they fail on that last std: std r0,24(r29), doing so based on:

r2: 0xd2da20
r29: 0x2400002200e313a8
bad virtual address: 0x2400002200e313c0

(These are from a boot-crash time register display, so hand copied off screen as it is too soon for interaction with DDB. I've got a default ddb script in place that does the display.)

When it boots okay r29 = 0x00e313a8 and the address accessed is 0x00e313c0 instead: see the first address that I started with above (for &authnone_private).

In other words: The difference is the upper half of r29. I've no evidence that r2 is corrupt for failing boots for this code.

So either 304(r2) accesses different values from one time to the next for booting or the r29 register is corrupted somehow between

00000000007a3ec0 <.authnone_init+0x1c> ld r29,304(r2)

and

00000000007a3ed4 <.authnone_init+0x30> std r0,24(r29)

(such as an interrupt not restoring the 64-bit ABI's register value fully).

At this point I've no clue if variability in the TOC contents that 304(r2) references makes any sense or not. I've yet to figure out how it is established.

More FreeBSD configuration details:

10.1-STABLE's buildworld kernel and installworld were all done from the PowerMac G5 itself.
root@FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64 GENERIC64 GENERIC64vtsc

root@FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64vtsc
include GENERIC64

ident GENERIC64vtsc

nooptions PS3 #Sony Playstation 3 HACK!!! to allow sc

options DDB # HACK!!! to dump early crash info (but 11.0-CURRENT already has it)
options GDB # HACK!!! ...
options VERBOSE_SYSINIT
options BOOTVERBOSE=1
options BOOTHOWTO=RB_VERBOSE
#options KTR
#options KTR_MASK=KTR_TRAP
#options KTR_CPUMASK=0xF
#options KTR_VERBOSE

# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
device sc
#device kbdmux # HACK: already listed by vt
options SC_OFWFB # OFW frame buffer
options SC_DFLT_FONT # compile font in
makeoptions SC_DFLT_FONT=cp437

# Disable extra checking typically used for FreeBSD 11.0-CURRENT:
nooptions DEADLKRES #Enable the deadlock resolver
nooptions INVARIANTS #Enable calls of extra sanity checking
nooptions INVARIANT_SUPPORT #Extra sanity checks of internal structures, required by INVARIANTS
nooptions WITNESS #Enable checks to detect deadlocks and cycles
nooptions WITNESS_SKIPSPIN #Don't run witness on spinlocks for speed
nooptions MALLOC_DEBUG_MAXZONES # Separate malloc(9) zones

root@FBSDG5M1:/usr/src # svnlite status
?       .snap
M       sys/ddb/db_main.c
M       sys/ddb/db_script.c
M       sys/powerpc/conf
?       sys/powerpc/conf/GENERIC64vtsc
M       sys/powerpc/ofw/ofw_machdep.c
M       sys/powerpc/ofw/ofwcall64.S
M       sys/powerpc/powermac/powermac_thermal.c
M       sys/rpc/auth_none.c

root@FBSDG5M1:/usr/src # more /boot/loader.conf
#kernel="kernel"
#kernel="kernel10.1RE"
kernel="kernel10.1S"
#kernel="kernel11C"
verbose_loading="YES"
kern.vty=vt

root@FBSDG5M1:/usr/src # more /etc/make.conf
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
MALLOC_PRODUCTION=

root@FBSDG5M1:/usr/src # more /etc/src.conf
CFLAGS+=-DELF_VERBOSE
#WITH_DEBUG_FILES=
#WITHOUT_CLANG=

Other than powermac_thermal.c (from Justin H.)
the source changes are for investigation of various early-boot problems for PowerMac G5s. The PowerMac G5 that powermac_thermal.c was put in place to experiment with is no longer around, but I've not yet removed the powermac_thermal.c update from my environment.

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@FreeBSD.ORG Sun Feb 15 12:25:23 2015 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A4E729A8 for ; Sun, 15 Feb 2015 12:25:23 +0000 (UTC) Received: from nagini.codelibre.net (nagini.codelibre.net [80.68.93.164]) by mx1.freebsd.org (Postfix) with ESMTP id 722D9164 for ; Sun, 15 Feb 2015 12:25:22 +0000 (UTC) Received: by nagini.codelibre.net (Postfix, from userid 1000) id 9B08C18112; Sun, 15 Feb 2015 12:25:21 +0000 (GMT) Date: Sun, 15 Feb 2015 12:25:21 +0000 From: Roger Leigh To: freebsd-ppc@freebsd.org Subject: Mac Mini G4 - black screen after boot with -CURRENT Message-ID: <20150215122521.GG18832@codelibre.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 15 Feb 2015 12:25:23 -0000

This was asked 5 months back in http://lists.freebsd.org/pipermail/freebsd-ppc/2014-September/007202.html but there was no reply that I saw.

Yesterday, I built current SVN head on a new 10.1-RELEASE install. 10.1 booted just fine, but after booting the new kernel I get a black screen and the machine appears locked up (no working keyboard, and fan at max speed). Does anyone know what might be at fault here?
Any special kernel build options needed? Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' schroot and sbuild http://alioth.debian.org/projects/buildd-tools `- GPG Public Key F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800 From owner-freebsd-ppc@FreeBSD.ORG Mon Feb 16 00:14:58 2015 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C02A1E7F for ; Mon, 16 Feb 2015 00:14:58 +0000 (UTC) Received: from vps.server.com (serv1.makinvest1.com [46.249.46.19]) by mx1.freebsd.org (Postfix) with ESMTP id 7F5C0B24 for ; Mon, 16 Feb 2015 00:14:58 +0000 (UTC) Received: from 188.36.251.72.tunnelservers.net (unknown [72.251.36.188]) by vps.server.com (Postfix) with ESMTPA id 5E0A91D936B4 for ; Mon, 16 Feb 2015 03:12:42 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=serv1.makinvest1.com; s=mail1; t=1424045563; bh=HyioEbglVKjLlVk98dnLeC8lcsLH0T6+UuJF6lvPkgo=; h=From:Subject:To:Content-Type:MIME-Version:Reply-To:Date; b=IGh9WMdFMZgy/ET4L5SuQfh0TLs9O8AQ2SQvmBcs73eAiGPNlozXDTpQQW6ynoeFA g9ti9fRGqFYEnNQLLE6Etd/quBy016EuX/Z1rgFusobLD3Bh+eeqisS19O1FAx9/fU VbfuqaDLf+dYzaGeOd0Xb0dRY+2NmrlrViDQpBIc= From: "Mak Global Investment" Subject: Do you need a loan or investment? 
To: "freebsd-ppc" MIME-Version: 1.0 Reply-To: "Mak Global Investment" Date: Sun, 15 Feb 2015 19:13:54 -0500 X-Greylist: Default is to whitelist mail, not delayed by milter-greylist-4.5.1 (vps.server.com [0.0.0.0]); Mon, 16 Feb 2015 03:12:43 +0300 (MSK) Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 Feb 2015 00:14:58 -0000

Hello.

Do you need a loan or investment?. Apply here for your reliable loan today at 3% interest rate with EXPO 2020 special offer, kindly contact us if you have a reliable and lucrative business that requires financing.

MAK Global Investment
P.O. Box 471471
105B Salahuddin St
Ras Al Khaimah
United Arab Emirates
Phone:+971(0)529273042

From owner-freebsd-ppc@FreeBSD.ORG Tue Feb 17 15:17:30 2015 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 99DBCD58 for ; Tue, 17 Feb 2015 15:17:30 +0000 (UTC) Received: from mail-ig0-x22b.google.com (mail-ig0-x22b.google.com [IPv6:2607:f8b0:4001:c05::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6A3751F0 for ; Tue, 17 Feb 2015 15:17:30 +0000 (UTC) Received: by mail-ig0-f171.google.com with SMTP id h15so30247287igd.4 for ; Tue, 17 Feb 2015 07:17:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
h=mime-version:date:message-id:subject:from:to:content-type; bh=yXVG/gQse3wq+g9T54bNu76xBAi7NcjPYXF+BNkzN+c=; b=Rna0ctdouezPnuL/ypF4BFANWCVyeEBa0WCYOca81wS3hrVwZOw1zRgmSkjaP4nZ8u jyq3IQxOSwDXPYJcE/aNBX6CAiATEFYstSl2NKvzVwCgsAceS511BXMzDHnwEmOJvNje CT0cW6yConxJtYVlGcCDu6838feHPsgsgEq3mA+MGwgDFQF46w60a+liso5bqLFPJ1as NYbdIxf7Ucqu7J5ssaH1kekIPuW1/jiXxbbUtA6ym9UsZnMdG1oZf1yi5tQNWEBPQXBc wCCsn7gxjNr6V5rclqV0KpzW/iVhgie1vlvmKMdOloFCo6MsXArOeM4XJe+4G4MTITMl bI2g== MIME-Version: 1.0 X-Received: by 10.107.169.42 with SMTP id s42mr1640917ioe.46.1424186249583; Tue, 17 Feb 2015 07:17:29 -0800 (PST) Received: by 10.36.2.146 with HTTP; Tue, 17 Feb 2015 07:17:28 -0800 (PST) Date: Tue, 17 Feb 2015 10:17:28 -0500 Message-ID: Subject: Thank you From: Britt Dodd To: FreeBSD PowerPC ML Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 Feb 2015 15:17:30 -0000 First, thank you for being one of the best supported platforms for PowerPC. I'm fortunate to have the ability of running multiple machines in my home. So far I have a G5 XServe and a G4 XServe running FreeBSD (the newest 10.1 release). I've yet to get the XServe LED CPU lights working, but I'm assuming once I get a new kernel built with this support, it should work just fine. I'm slowly building applications via Ports, but would like to know if this group would be interested in me setting up a pkg repository? I'm reading up on Poudriere, which supposedly will allow you to build pkg packages from Ports. I have a co-located server I can push these built packages to, and can also provide SSH access to these instances if others are interested in helping with maintenance. 
I for one will be interested in maintaining packages for G4, because my equipment is older G4 things, and they probably would take a long time to build/update. By providing built pkg packages, compilation time across our userbase would diminish tremendously. Is this something that others would like to have? Britt From owner-freebsd-ppc@FreeBSD.ORG Wed Feb 18 05:35:03 2015 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D7F6B4BF for ; Wed, 18 Feb 2015 05:35:03 +0000 (UTC) Received: from asp.reflexion.net (outbound-242.asp.reflexion.net [69.84.129.242]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8E9A4E9C for ; Wed, 18 Feb 2015 05:35:02 +0000 (UTC) Received: (qmail 13064 invoked from network); 18 Feb 2015 05:34:56 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 18 Feb 2015 05:34:56 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v7.40.1) with SMTP; Wed, 18 Feb 2015 00:34:56 -0500 (EST) Received: (qmail 27936 invoked from network); 18 Feb 2015 05:34:56 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (DHE-RSA-AES256-SHA encrypted) SMTP; 18 Feb 2015 05:34:56 -0000 X-No-Relay: not in my network X-No-Relay: not in my network X-No-Relay: not in my network Received: from [192.168.1.8] (c-67-189-19-145.hsd1.or.comcast.net [67.189.19.145]) by iron2.pdx.net (Postfix) with ESMTPSA id 191B71C43A4; Tue, 17 Feb 2015 21:34:49 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2070.6\)) Subject: Re: PowerMac G5 powerpc64: new context where repeatedly booting varies between failing 
and working From: Mark Millard In-Reply-To: <5FE82152-BBF7-4C6D-932D-AEE70546CACA@dsl-only.net> Date: Tue, 17 Feb 2015 21:34:53 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <36C14790-8E66-4C9D-9F29-A137FB49439D@dsl-only.net> References: <7CA43EE3-8C11-4FBD-9F8A-42DF08B82362@dsl-only.net> <5FE82152-BBF7-4C6D-932D-AEE70546CACA@dsl-only.net> To: FreeBSD PowerPC ML X-Mailer: Apple Mail (2.2070.6) Cc: Justin Hibbits X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Feb 2015 05:35:03 -0000

[I had sent Nathan W. and Justin H. a picture of a display of a boot-time corrupted memory region. This time I tried to find the start and end of the region, and I'm documenting it in a textual form more appropriate to the list. I have also removed prior Email history from this Email, but there is much context one must check that history for.]

Several of the new values put in place by the .got memory corruption reported below match up with .opd or other types of addresses reported by objdump for my /boot/kernel10.1S/kernel. They are noted below as I list detailed differences.

I made the early-boot-crash display cover a larger range, and the corruption of part of the .got area seemed to span the following. Also I induced a dereference of the bad pointer as soon as it is discovered after the OF_peer(0) in question returns, so later code would not be involved when it crashes. (Crash early, crash often...)

Overall structure:

0xd2da37 and before, as far as I looked: no corruption found.
The area from 0xd2da38-0xd2dc9f: largely corrupted. 0x268 or 616 bytes or so in this corrupted range. 616=77*8.
After that range: good again as far as I looked.

The details:

Warning: The below is based on hand transcribed information from screen pictures that I took.
Showing pairs of lines (good then corrupted), using x/x style lines:

0xd2da30: 0, b4fd2c, 0, b4fd70
0xd2da30: 0, b4fd2c, 0, 0

0xd2da40: 0, e28948, 0, e1e460
0xd2da40: 0, 24000042, 0, d00058
(24000042 looks like a cr value?)
(0000000000d00058 l .opd 0000000000000018 ofw_rendezvous_dispatch)

0xd2da50: 0, bc7de8, 0, bc7e08
0xd2da50: 0, cde110, c0000000, 8740
(0xc000000000008740 looks like a stack address?)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2da60: 0, cd8470, 0, bd2608
0xd2da60: 0, 1, 0, c3a30c
(0000000000c3a30c g .data 0000000000000000 ofw_sprg0_save)

0xd2da70: 0, bb5ea0, 0, b70870
0xd2da70: 0, 1c35ec0, 0, 0

0xd2da80: 0, c49918, 0, bc7e18
0xd2da80: 0, 44000022, 0, de4b30
(44000022 looks like a cr value?)
(0000000000de4b30 g O .bss 0000000000000460 thread0)

0xd2da90: 0, b720a0, 0, b71370
0xd2da90: 900000000, 1032, 0, ff846d78
(9000000000001032 looks like a SRR1 value.)
(ff846d78 is openfirmware entry point?)

0xd2daa0: 0, bc7e30, 0, bc7e58
0xd2daa0: 0, e39080, 100000000, 3030
(0000000000e39080 g O .bss 0000000000020000 __pcpu)
(1000000000003030 looks like a SRR1 value?)

0xd2dab0: 0, bc7e80, 0, bc7eb0
0xd2dab0: c0000000, 83b0, 0, c3a280
(0xc0000000000083b0 looks like a stack address?)
(c3a280 is inside my PowerMac G5 specific hack's ofwstk area: c392a0 up to c3a2a0)
(I've been gathering evidence about early-boot G5 crashes.)

0xd2dac0: 0, bc7ed0, 0, cf2960
0xd2dac0: 0, c40000, 0, c40000

0xd2dad0: 0, bc7f00, 0, bc7f28
0xd2dad0: 0, c40000, 0, c40000

0xd2dae0: 0, b72400, 0, bc7f28
0xd2dae0: c0000000, 8740, 0, cde110
(0xc000000000008740 looks like a stack address?)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2daf0: 0, cf2b28, 0, b716a0
0xd2daf0: 0, d00058, 0, cde110
(d00058 was also at 0xd2da4c and was followed by cde110 there.)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2db00: 0, cf2b88, 0, cf2b70
0xd2db00: 0, e6c280, 0, 0
(e6c280 is inside the emergency_buffer.7752 area: e6c278 up to e6c378)

0xd2db10: 0, cf2b58, 0, 8480
0xd2db10: 900000000, 1032, c0000000, 8740
(9000000000001032 looks like a SRR1 value?)
(0xc000000000008740 looks like a stack address?)

0xd2db20: 0, c2d920, 0, cf2b10
0xd2db20: 0, c2d920, 0, cf2b10
(yep: unchanged!)

0xd2db30: 0, b71718, 0, c49888
0xd2db30: 0, ff846734, 10000000, 3030
(ff846734 would seem to be an openfirmware code address?)
(1000000000003030 looks like a SRR1 value?)

0xd2db40: 0, c498a0, 0, c54000
0xd2db40: 0, c498a0, 0, ff846d78
(Yep: c498a0 was unchanged)
(ff846d78 is openfirmware entry point?)

0xd2db50: 0, e313a8, 0, e31608
0xd2db50: 24000042, e313a8, 0, 0
(24000042 looks like a cr value?)
(Trying to store to address 0x2400004200e313a8 for a specific type of 10.1-STABLE build is how the problem was originally noticed.)

0xd2db60: 0, c31f80, 0, bc81e8
0xd2db60: 0, c31f80, 0, 0
(Yep: 0x0000000000c31f80 is unchanged.)

0xd2db70: 0, e31408, 0, bc8228
0xd2db70: 200000, e31408, 0, bc8228
(Yep: Only the 0x200000 was a change.)

0xd2db80: 0, c32488, 0, bc8238
0xd2db80: 0, 1, 10000000, 3030
(1000000000003030 looks like a SRR1 value?)

0xd2db90: 0, e1e460, 0, c31fc0
0xd2db90: 0, 0, 0, 7ff7e800

0xd2dba0: 0, e31608, 0, bc8260
0xd2dba0: 0, 1000000a, 0, bc8260
(Yep: 0x0000000000bc8260 unchanged.)

0xd2dbb0: 0, e1e460, 0, e1fa60
0xd2dbb0: 0, e1e460, 0, e1fa60
(yep: unchanged!)

0xd2dbc0: 0, bc8288, 0, c32488
0xd2dbc0: 111081, 0, fd3c2000, 0
(fd3c2000 in openfirmware area?)

0xd2dbd0: 0, e3153c, 0, bc8298
0xd2dbd0: 10, 0, 0, 0

Now a few unchanged: 0xd2dbe0-0xd2dc1f

Then a change in the pattern of corruptions for the rest of the corrupted area:

0xd2dc20: 0, bc8288, 0, bc82e8
0xd2dc20: 0, bc8288, 127f500, bc82e8
Note how bc8288 and bc82e8 did not change. From here on those two columns are not corrupted but the other two are.
0xd2dc30: 0, bc8300, 0, c32488
0xd2dc30: 8000000, bc8300, e7d540, c32488

0xd2dc40: 0, b4fef0, 0, e31558
0xd2dc40: ecc40, b4fef0, 84eec80, e31558

0xd2dc50: 0, bc8308, 0, cf2f00
0xd2dc50: 1e85440, bc8308, 8766200, cf2f00

0xd2dc60: 0, bc8310, 0, bc8350
0xd2dc60: fb9040, bc8310, 93bb000, bc8350

0xd2dc70: 0, c32038, 0, de5718
0xd2dc70: 94f6b00, c32038, 8632600, de5718

0xd2dc80: 0, de7768, 0, bc3760
0xd2dc80: 1fc0f40, de7768, 10f4b40, bc3760

0xd2dc90: 0, de7768, 0, e1fa00
0xd2dc90: 99e5700, cfc658, 228740, e1fa00

And after that things match for as far as I've looked: no corruptions.

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@FreeBSD.ORG Wed Feb 18 12:51:17 2015 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B5AB0777 for ; Wed, 18 Feb 2015 12:51:17 +0000 (UTC) Received: from asp.reflexion.net (outbound-242.asp.reflexion.net [69.84.129.242]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6DFFD2D5 for ; Wed, 18 Feb 2015 12:51:16 +0000 (UTC) Received: (qmail 11212 invoked from network); 18 Feb 2015 12:51:15 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 18 Feb 2015 12:51:15 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v7.40.1) with SMTP; Wed, 18 Feb 2015 07:51:15 -0500 (EST) Received: (qmail 11121 invoked from network); 18 Feb 2015 12:51:14 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (DHE-RSA-AES256-SHA encrypted) SMTP; 18 Feb 2015 12:51:14 -0000 X-No-Relay: not in my network Received: from [192.168.1.8] (c-67-189-19-145.hsd1.or.comcast.net [67.189.19.145]) by iron2.pdx.net (Postfix) with ESMTPSA id
8F64D1C43A4 for ; Wed, 18 Feb 2015 04:51:12 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2070.6\)) Subject: Re: PowerMac G5 powerpc64: new context where repeatedly booting varies between failing and working From: Mark Millard In-Reply-To: <36C14790-8E66-4C9D-9F29-A137FB49439D@dsl-only.net> Date: Wed, 18 Feb 2015 04:51:12 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <836A3016-D41B-45CB-AD4B-946767212026@dsl-only.net> References: <7CA43EE3-8C11-4FBD-9F8A-42DF08B82362@dsl-only.net> <5FE82152-BBF7-4C6D-932D-AEE70546CACA@dsl-only.net> <36C14790-8E66-4C9D-9F29-A137FB49439D@dsl-only.net> To: FreeBSD PowerPC ML X-Mailer: Apple Mail (2.2070.6) X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Feb 2015 12:51:17 -0000

I modified openfirmware_core to check on the status of the pointer value between most of its stages. With this I've also seen later failures than the usual one, such as after an OF_finddevice use has its ofwcall return.

And the change greatly narrows down the stage at which memory gets corrupted when it does fail...

	// OKAY HERE
	result = ofwcall(args);
	// SOMETIMES CORRUPTED HERE

Unfortunately, to get this far the ofwcall in use is my own variant, in order to, for example, enable recovery/retry from observed bad r1/r3 register problems that happened super-early on return from openfirmware in a high percentage of my boot attempts. I have yet to see how close to normal I can get ofwcall to be while still allowing this type of test.

The relevant detection code in openfirmware_core is...

/* HACK */
extern void** authnone_create(void);

...
static __inline void
ofw_restore_trap_vec(char *restore_trap_vec)
{
	if (!ofw_real_mode)
		return;

	bcopy(restore_trap_vec, (void *)EXC_RST, EXC_LAST - EXC_RST);
	__syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD);
}

...

static int
openfirmware_core(void *args)
{
	int result;
	register_t oldmsr;
/* HACK */
void** jnk1pp;
void** jnk2pp;
void* jnk = *authnone_create();
if (jnk == *authnone_create()) jnk = *authnone_create();

	/*
	 * Turn off exceptions - we really don't want to end up
	 * anywhere unexpected with PCPU set to something strange
	 * or the stack pointer wrong.
	 */
	oldmsr = intr_disable();
/* HACK */
if (jnk == *authnone_create()) jnk = *authnone_create();

	ofw_sprg_prepare();
/* HACK */
if (jnk == *authnone_create()) jnk = *authnone_create();

	/* Save trap vectors */
	ofw_save_trap_vec(save_trap_of);
/* HACK */
if (jnk == *authnone_create()) jnk = *authnone_create();

	/* Restore initially saved trap vectors */
	ofw_restore_trap_vec(save_trap_init);
/* HACK */
jnk1pp = authnone_create();

#if defined(AIM) && !defined(__powerpc64__)
	/*
	 * Clear battable[] translations
	 */
	if (!(cpu_features & PPC_FEATURE_64))
		__asm __volatile("mtdbatu 2, %0\n"
				 "mtdbatu 3, %0" : : "r" (0));
	isync();
#endif

	result = ofwcall(args);
/* HACK */
jnk2pp = authnone_create();

	/* Restore trap vectors */
	ofw_restore_trap_vec(save_trap_of);
/* HACK */
if (jnk != *jnk1pp) jnk = *authnone_create();
if (jnk != *jnk2pp) jnk = *authnone_create();
/* Note: *jnk2pp above is what detects the bad pointer value when it goes bad */
if (jnk == *authnone_create()) jnk = *authnone_create();

	ofw_sprg_restore();
/* HACK */
if (jnk == *authnone_create()) jnk = *authnone_create();

	intr_restore(oldmsr);
/* HACK */
if (jnk == *authnone_create()) jnk = *authnone_create();

	return (result);
}

In the code this translates to...
00000000008a671c <.openfirmware_core+0x168> bl 00000000007a3de4 <.authnone_create>
00000000008a6720 <.openfirmware_core+0x16c> crmove 4*cr7+so,4*cr7+so
00000000008a6724 <.openfirmware_core+0x170> mr r28,r3

Note: The above loads r28 with a good address that does not fail when later dereferenced (while FreeBSD's exception vectors are in place).

00000000008a6728 <.openfirmware_core+0x174> mr r3,r29
00000000008a672c <.openfirmware_core+0x178> bl 00000000008ac930 <.ofwcall>
00000000008a6730 <.openfirmware_core+0x17c> crmove 4*cr7+so,4*cr7+so
00000000008a6734 <.openfirmware_core+0x180> mr r26,r3
00000000008a6738 <.openfirmware_core+0x184> bl 00000000007a3de4 <.authnone_create>
00000000008a673c <.openfirmware_core+0x188> crmove 4*cr7+so,4*cr7+so
00000000008a6740 <.openfirmware_core+0x18c> mr r29,r3

Note: The above loads r29 with the bad address that is later detected by dereferencing it. This is the corrupted pointer value.

00000000008a6744 <.openfirmware_core+0x190> ld r3,21216(r2)
00000000008a6748 <.openfirmware_core+0x194> lwz r0,0(r3)
00000000008a674c <.openfirmware_core+0x198> cmpwi cr7,r0,0
00000000008a6750 <.openfirmware_core+0x19c> beq+ cr7,00000000008a6778 <.openfirmware_core+0x1c4>
00000000008a6754 <.openfirmware_core+0x1a0> addi r3,r3,16
00000000008a6758 <.openfirmware_core+0x1a4> li r4,256
00000000008a675c <.openfirmware_core+0x1a8> li r5,11776
00000000008a6760 <.openfirmware_core+0x1ac> bl 00000000008c158c <.bcopy>
00000000008a6764 <.openfirmware_core+0x1b0> crmove 4*cr7+so,4*cr7+so
00000000008a6768 <.openfirmware_core+0x1b4> li r3,0
00000000008a676c <.openfirmware_core+0x1b8> li r4,12032
00000000008a6770 <.openfirmware_core+0x1bc> bl 00000000008d5358 <.__syncicache>

Note: At this point it is back to FreeBSD exception vectors, so the kernel debug display will work for bad pointer detection tests.
00000000008a6774 <.openfirmware_core+0x1c0> crmove 4*cr7+so,4*cr7+so 00000000008a6778 <.openfirmware_core+0x1c4> ld r0,0(r28) Note: The above dereference of the before ofwcall pointer value (in r28) = does not detect a bad pointer. 00000000008a677c <.openfirmware_core+0x1c8> cmpd cr7,r0,r30 00000000008a6780 <.openfirmware_core+0x1cc> beq- cr7,00000000008a6790 = <.openfirmware_core+0x1dc> 00000000008a6784 <.openfirmware_core+0x1d0> bl 00000000007a3de4 = <.authnone_create> 00000000008a6788 <.openfirmware_core+0x1d4> crmove 4*cr7+so,4*cr7+so 00000000008a678c <.openfirmware_core+0x1d8> ld r30,0(r3) 00000000008a6790 <.openfirmware_core+0x1dc> ld r0,0(r29) It is that last instruction (.openfirmware_core+0x1dc) that "detects" = the bad pointer and leads to a kernel debugger display of some of the = corrupted memory, including the stored pointer that the above code = accessed and dereferenced to detect the problem. So the pointer was good just before the ofwcall and was bad just after = it. =3D=3D=3D Mark Millard markmi at dsl-only.net On 2015-Feb-17, at 09:34 PM, Mark Millard = wrote: [I had sent Nathan W. and Justin H. a picture of a display of a = boot-time corrupted memory region. This time I tried to find the start = and end of the region and I'm documenting in a textual form more = appropriate to the list. I have also removed prior Email history from = this Email but there is much context one must check that history for.] Several of the new values put in place by the .got memory corruption = reported below match up with .opd or other types of addresses reported = by objdump for my /boot/kernel10.1S/kernel. They are noted below as I = list detailed differences. I made the early-boot-crash display a larger range and the span of the = corruption seemed to go as follows for the corruption of part of the = .got area. 
Also I induced a dereference of the bad pointer as soon as it is discovered after the OF_peer(0) in question returns, so later code would not be involved when it crashes. (Crash early, crash often...)

Overall structure:

0xd2da37 and before, as far as I looked: no corruption found.

The area from 0xd2da38-0xd2dc9F: largely corrupted. 0x268 or 616 bytes or so in this corrupted range. 616=77*8.

After that range: good again as far as I looked.

The details:

Warning: The below is based on hand-transcribed information from screen pictures that I took.

Showing pairs of lines (good then corrupted), using x/x style lines:

0xd2da30: 0, b4fd2c, 0, b4fd70
0xd2da30: 0, b4fd2c, 0, 0

0xd2da40: 0, e28948, 0, e1e460
0xd2da40: 0, 24000042, 0, d00058
(24000042 looks like a cr value?)
(0000000000d00058 l .opd 0000000000000018 ofw_rendezvous_dispatch)

0xd2da50: 0, bc7de8, 0, bc7e08
0xd2da50: 0, cde110, c0000000, 8740
(0xc000000000008740 looks like a stack address?)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2da60: 0, cd8470, 0, bd2608
0xd2da60: 0, 1, 0, c3a30c
(0000000000c3a30c g .data 0000000000000000 ofw_sprg0_save)

0xd2da70: 0, bb5ea0, 0, b70870
0xd2da70: 0, 1c35ec0, 0, 0

0xd2da80: 0, c49918, 0, bc7e18
0xd2da80: 0, 44000022, 0, de4b30
(44000022 looks like a cr value?)
(0000000000de4b30 g O .bss 0000000000000460 thread0)

0xd2da90: 0, b720a0, 0, b71370
0xd2da90: 900000000, 1032, 0, ff846d78
(9000000000001032 looks like a SRR1 value.)
(ff846d78 is openfirmware entry point?)

0xd2daa0: 0, bc7e30, 0, bc7e58
0xd2daa0: 0, e39080, 100000000, 3030
(0000000000e39080 g O .bss 0000000000020000 __pcpu)
(1000000000003030 looks like a SRR1 value?)

0xd2dab0: 0, bc7e80, 0, bc7eb0
0xd2dab0: c0000000, 83b0, 0, c3a280
(0xc0000000000083b0 looks like a stack address?)
(c3a280 is inside my PowerMac G5 specific hack's ofwstk area: c392a0 up to 0x3a2a0)
(I've been gathering evidence about early-boot G5 crashes.)

0xd2dac0: 0, bc7ed0, 0, cf2960
0xd2dac0: 0, c40000, 0, c40000

0xd2dad0: 0, bc7f00, 0, bc7f28
0xd2dad0: 0, c40000, 0, c40000

0xd2dae0: 0, b72400, 0, bc7f28
0xd2dae0: c0000000, 8740, 0, cde110
(0xc000000000008740 looks like a stack address?)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2daf0: 0, cf2b28, 0, b716a0
0xd2daf0: 0, d00058, 0, cde110
(d00058 was also at 0xd2da4c and was followed by cde110 there.)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2db00: 0, cf2b88, 0, cf2b70
0xd2db00: 0, e6c280, 0, 0
(e6c280 is inside the emergency_buffer.7752 area: e6c278 up to e6c378)

0xd2db10: 0, cf2b58, 0, 8480
0xd2db10: 900000000, 1032, c0000000, 8740
(9000000000001032 looks like a SRR1 value?)
(0xc000000000008740 looks like a stack address?)

0xd2db20: 0, c2d920, 0, cf2b10
0xd2db20: 0, c2d920, 0, cf2b10 (yep: unchanged!)

0xd2db30: 0, b71718, 0, c49888
0xd2db30: 0, ff846734, 10000000, 3030
(ff846734 would seem to be an openfirmware code address?)
(1000000000003030 looks like a SRR1 value?)

0xd2db40: 0, c498a0, 0, c54000
0xd2db40: 0, c498a0, 0, ff846d78
(Yep: c498a0 was unchanged)
(ff846d78 is openfirmware entry point?)

0xd2db50: 0, e313a8, 0, e31608
0xd2db50: 24000042, e313a8, 0, 0
(24000042 looks like a cr value?)
(Trying to store to address 0x2400004200e313a8 for a specific type of 10.1-STABLE build is how the problem was originally noticed.)

0xd2db60: 0, c31f80, 0, bc81e8
0xd2db60: 0, c31f80, 0, 0
(Yep: 0x0000000000c31f80 is unchanged.)

0xd2db70: 0, e31408, 0, bc8228
0xd2db70: 200000, e31408, 0, bc8228
(Yep: Only the 0x200000 was a change.)

0xd2db80: 0, c32488, 0, bc8238
0xd2db80: 0, 1, 10000000, 3030
(1000000000003030 looks like a SRR1 value?)

0xd2db90: 0, e1e460, 0, c31fc0
0xd2db90: 0, 0, 0, 7ff7e800

0xd2dba0: 0, e31608, 0, bc8260
0xd2dba0: 0, 1000000a, 0, bc8260
(Yep: 0x0000000000bc8260 unchanged.)

0xd2dbb0: 0, e1e460, 0, e1fa60
0xd2dbb0: 0, e1e460, 0, e1fa60 (yep: unchanged!)

0xd2dbc0: 0, bc8288, 0, c32488
0xd2dbc0: 111081, 0, fd3c2000, 0
(fd3c2000 in openfirmware area?)

0xd2dbd0: 0, e3153c, 0, bc8298
0xd2dbd0: 10, 0, 0, 0

Now a few unchanged: 0xd2dbe0-0xd2dc1F

Then a change in the pattern of corruptions for the rest of the corrupted area:

0xd2dc20: 0, bc8288, 0, bc82e8
0xd2dc20: 0, bc8288, 127f500, bc82e8

Note how bc8288 and bc82e8 did not change. From here on those two columns are not corrupted but the other two are.

0xd2dc30: 0, bc8300, 0, c32488
0xd2dc30: 8000000, bc8300, e7d540, c32488

0xd2dc40: 0, b4fef0, 0, e31558
0xd2dc40: ecc40, b4fef0, 84eec80, e31558

0xd2dc50: 0, bc8308, 0, cf2f00
0xd2dc50: 1e85440, bc8308, 8766200, cf2f00

0xd2dc60: 0, bc8310, 0, bc8350
0xd2dc60: fb9040, bc8310, 93bb000, bc8350

0xd2dc70: 0, c32038, 0, de5718
0xd2dc70: 94f6b00, c32038, 8632600, de5718

0xd2dc80: 0, de7768, 0, bc3760
0xd2dc80: 1fc0f40, de7768, 10f4b40, bc3760

0xd2dc90: 0, de7768, 0, e1fa00
0xd2dc90: 99e5700, cfc658, 228740, e1fa00

And after that things match for as far as I've looked: no corruptions.
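The span and word-pairing arithmetic used in the dump above can be checked mechanically. The following C sketch is illustrative only: it assumes each x/x line shows two big-endian 64-bit doublewords as pairs of 32-bit words (high word first), and the helper names (dword_from_words, corrupted_span) are hypothetical, not DDB or FreeBSD functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Join the two 32-bit words an x/x line shows for one big-endian
 * 64-bit doubleword (high word first). */
static uint64_t dword_from_words(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Inclusive byte range of a corrupted region. */
struct byte_range {
    uint64_t first;
    uint64_t last;
};

/* Compare a known-good snapshot with a corrupted one, both starting at
 * address 'base', and report the bytes from the first differing
 * doubleword through the last differing one.  Returns 0 if nothing
 * differs, 1 otherwise. */
static int corrupted_span(const uint64_t *good, const uint64_t *bad,
                          size_t ndwords, uint64_t base,
                          struct byte_range *out)
{
    size_t i, lo = ndwords, hi = 0;

    for (i = 0; i < ndwords; i++) {
        if (good[i] != bad[i]) {
            if (lo == ndwords)
                lo = i;
            hi = i;
        }
    }
    if (lo == ndwords)
        return 0;
    out->first = base + 8 * (uint64_t)lo;
    out->last = base + 8 * (uint64_t)hi + 7;
    return 1;
}
```

For example, pairing the corrupted high word 24000042 with the unchanged low word e313a8 at 0xd2db50 gives 0x2400004200e313a8, the bad store address noted in the dump, and 0xd2da38 through 0xd2dc9f inclusive is 0x268 = 616 = 77*8 bytes.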
===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@FreeBSD.ORG Wed Feb 18 15:35:10 2015 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E64BEA6D for ; Wed, 18 Feb 2015 15:35:10 +0000 (UTC) Received: from d.mail.sonic.net (d.mail.sonic.net [64.142.111.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CAAE78F1 for ; Wed, 18 Feb 2015 15:35:10 +0000 (UTC) Received: from zeppelin.tachypleus.net (173-161-16-229-Illinois.hfc.comcastbusiness.net [173.161.16.229]) (authenticated bits=0) by d.mail.sonic.net (8.15.1/8.14.9) with ESMTPSA id t1IFZ1TK022539 (version=TLSv1.2 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Wed, 18 Feb 2015 07:35:01 -0800 Message-ID: <54E4B124.9040006@freebsd.org> Date: Wed, 18 Feb 2015 07:35:00 -0800 From: Nathan Whitehorn User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-ppc@freebsd.org Subject: Re: Fixing powerpc64 /boot/loader's kernel page handing: suggestions? References: In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Sonic-CAuth: UmFuZG9tSVZdVFFuGe8Wb3dcarbuK/U3+lKfGptv0cdfLmxC6QKxSAAKxN9AQtJXLpFNbEwiS0Ud4p6epSgprKrqRgg9vzPu5AMPwSKIcls= X-Sonic-ID: C;sLTAs4O35BGis9UUxQPdhw== M;ohIqtIO35BGis9UUxQPdhw== X-Spam-Flag: No X-Sonic-Spam-Details: 0.0/5.0 by cerberusd X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Feb 2015 15:35:11 -0000

Thanks for diagnosing this!
The syncicache spans the whole kernel out of laziness. As you note, it isn't appropriate. If there are more instances of this kind of thing, then it might make sense to try to make ld emit only one PT_LOAD program section as a long-term solution. I'll look into that soon.
-Nathan

On 02/10/15 17:16, Mark Millard wrote:
> Context:
>
> Unfortunately this takes me a bit to describe...
>
> powerpc64 FreeBSD 10.1-??? variants on a PowerMac G5 Quad-Core, built on the same machine. I expect the issue applies to some plain powerpc contexts as well as some other powerpc64 contexts. An example context where my issue occurs is:
>
>> 10.1-RELEASE-p5
>> 10.1-RELEASE-p5
>> FreeBSD FBSDG5M1 10.1-RELEASE-p5 FreeBSD 10.1-RELEASE-p5 #0 r277808M: Fri Jan 30 00:58:33 PST 2015 root@FBSDG5M1:/usr/obj/usr/home/markmi/src_10_1_releng/sys/GENERIC64vtsc powerpc
> But I also get it for various vintages of 10.1-STABLE (and 11.0-CURRENT). I use 10.1-RELEASE-p5 here because I happen to have a build that avoids the problem, I know what to set for that build to regenerate it, and I know at least one thing to turn on for builds to create the problem.
>
>> root@FBSDG5M1:/usr/home/markmi/src_10_1_releng # more sys/powerpc/conf/GENERIC64vtsc
>> include GENERIC64
>> ident GENERIC64vtsc
>>
>> nooptions PS3 #Sony Playstation 3 HACK!!! to allow sc
>>
>> options DDB # HACK!!! to dump early crash info (but 11.0-CURRENT already has it)
>> options GDB # HACK!!! ...
>> options VERBOSE_SYSINIT # VERBOSE_SYSINIT blocks direct booting for my 10.1-RELEASE-p5 variants: Crashes when the loader is in __syncicache doing dcbst's.
>> options BOOTVERBOSE=1
>> options BOOTHOWTO=RB_VERBOSE
>> #options KTR
>> #options KTR_MASK=KTR_TRAP
>> #options KTR_CPUMASK=0xF
>> #options KTR_VERBOSE
>>
>> # HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
>> device sc
>> #device kbdmux # HACK: already listed by vt
>> options SC_OFWFB # OFW frame buffer
>> options SC_DFLT_FONT # compile font in
>> makeoptions SC_DFLT_FONT=cp437
>>
>>
>> # Disable extra checking typically used for FreeBSD 11.0-CURRENT:
>> nooptions DEADLKRES #Enable the deadlock resolver
>> nooptions INVARIANTS #Enable calls of extra sanity checking
>> nooptions INVARIANT_SUPPORT #Extra sanity checks of internal structures, required by INVARIANTS
>> nooptions WITNESS #Enable checks to detect deadlocks and cycles
>> nooptions WITNESS_SKIPSPIN #Don't run witness on spinlocks for speed
>> nooptions MALLOC_DEBUG_MAXZONES # Separate malloc(9) zones
>
> For my temporarily extended ELF_VERBOSE code [and other printf's], which also reports on non-PT_LOADs (which are otherwise skipped), what it reports for booting various 10.1-??? kernel builds is the sequence:
>
> PT_PHDR
> PT_INTERP
> PT_LOAD (for .text)
> (using archsw.arch_copyin then kern_pread)
> Address range example: 0x100000-0xbe017b
>
> PT_LOAD (for .data)
> (using kern_pread)
> Address range for the same example: 0xbf0000-0xea4b7f
> PT_DYNAMIC
> PT_GNU_STACK
> symtab
> strtab
> Final address for the same example: 0x1114baf
>
> The issue happens when there are such unreferenced pages where I indicated (between the two PT_LOADs). It turns out, for what I started this investigation with, that if I commented out VERBOSE_SYSINIT in GENERIC64vtsc (listed earlier) then no unreferenced pages appear, but with VERBOSE_SYSINIT there are such pages (holding the rest of the context constant). But this is not the only way to get such unreferenced pages. For example, my 10.1-STABLE build has unreferenced pages but does not have VERBOSE_SYSINIT (yet).
>
> When there are unreferenced pages between the two PT_LOADs, those pages do not get archsw.arch_copyin or kern_pread handling. (kern_pread in turn uses archsw.arch_readin.)
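To make the gap concrete, here is a minimal C sketch of the page arithmetic. It assumes 4 KiB pages and assumes that only pages overlapping a PT_LOAD get mapped; the real loader's mapping routine may claim and map memory in larger chunks, so this only illustrates the idea. The function name gap_pages is hypothetical, and the example ranges above are used purely as sample input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this illustration. */
#define SKETCH_PAGE_SIZE 0x1000ULL

/* Round x up to a page boundary. */
static uint64_t page_roundup(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Count the whole pages lying strictly between the end of one PT_LOAD
 * (inclusive last address end1) and the start of the next (start2).
 * Stores the first such page in *first_gap_page. */
static size_t gap_pages(uint64_t end1, uint64_t start2,
                        uint64_t *first_gap_page)
{
    uint64_t gap_start = page_roundup(end1 + 1);
    uint64_t gap_end = start2 & ~(SKETCH_PAGE_SIZE - 1);

    *first_gap_page = gap_start;
    if (gap_end <= gap_start)
        return 0;
    return (size_t)((gap_end - gap_start) / SKETCH_PAGE_SIZE);
}
```

With .text ending at 0xbe017b and .data starting at 0xbf0000, this arithmetic gives 15 untouched 4 KiB pages starting at 0xbe1000.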
>
> For my PowerMac G5 Quad-Core context, those archsw.arch_ routines end up being ofw_copyin and ofw_readin. Those routines in turn call ofw_mapmem, which includes doing:
>
>>     if (OF_call_method("claim", memory, 3, 1, destp, dlen, 0, &addr)
>>         == -1) {
>>         printf("ofw_mapmem: physical claim failed\n");
>>         return (ENOMEM);
>>     }
>>
>>     /*
>>      * We only do virtual memory management when real_mode is false.
>>      */
>>     if (real_mode == 0) {
>>         if (OF_call_method("claim", mmu, 3, 1, destp, dlen, 0, &addr)
>>             == -1) {
>>             printf("ofw_mapmem: virtual claim failed\n");
>>             return (ENOMEM);
>>         }
>>
>>         if (OF_call_method("map", mmu, 4, 0, destp, destp, dlen, 0)
>>             == -1) {
>>             printf("ofw_mapmem: map failed\n");
>>             return (ENOMEM);
>>         }
>>     }
> and during load-time this is what programs the PowerPC to have the PTEG entries (and whatever else) that instructions like dcbst require (since MSR[DR]=1).
>
> The crashes are at the first dcbst in the __syncicache execution that references the missing pages. (It seems unlikely that there is any other usage of those pages.) The crash reports missing PTEG entries (DSISR for IV 0x300). (Apple's openfirmware word .registers shows the recorded register status from the crash. After the crash the PowerMac is in Apple's context, not FreeBSD's.)
>
> The __syncicache use results from the following:
>
>> int
>> ppc64_ofw_elf_loadfile(char *filename, u_int64_t dest,
>>     struct preloaded_file **result)
>> {
>>     int r;
>>
>>     r = __elfN(loadfile)(filename, dest, result);
>>     if (r != 0)
>>         return (r);
>>
>>     /*
>>      * No need to sync the icache for modules: this will
>>      * be done by the kernel after relocation.
>>      */
>>     if (!strcmp((*result)->f_type, "elf kernel"))
>>         __syncicache((void *) (*result)->f_addr, (*result)->f_size);
>>     return (0);
>> }
> (powerpc has a similar sequence with __syncicache, as I remember.) For some reason the __syncicache usage is set up to span into or beyond the .data segment, not just the .text one. I do not know why.
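Why the crash lands at a particular dcbst can be modeled in plain C: a __syncicache-style loop aligns the start address down to a cache line and then touches one line at a time, so the first fault occurs at the first line that falls in an unmapped page. This sketch assumes a 128-byte line (the PowerPC 970's cache line size) and a hypothetical list of mapped ranges; it models only the order of accesses, not the MMU or the real __syncicache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open range [start, end) of mapped addresses. */
struct range {
    uint64_t start;
    uint64_t end;
};

static int is_mapped(const struct range *maps, size_t nmaps, uint64_t a)
{
    size_t i;

    for (i = 0; i < nmaps; i++)
        if (a >= maps[i].start && a < maps[i].end)
            return 1;
    return 0;
}

/* Visit [addr, addr + len) one cache line at a time, starting from the
 * line containing addr, the way a dcbst-per-line loop does.  Returns the
 * first line address that is not mapped (where the first fault would
 * occur), or UINT64_MAX if every visited line is mapped.  'line' must
 * be a power of two. */
static uint64_t first_unmapped_line(uint64_t addr, uint64_t len,
                                    uint64_t line,
                                    const struct range *maps, size_t nmaps)
{
    uint64_t p = addr & ~(line - 1);
    uint64_t end = addr + len;

    for (; p < end; p += line)
        if (!is_mapped(maps, nmaps, p))
            return p;
    return UINT64_MAX;
}
```

Under these assumptions, a single range covering both PT_LOADs and the hole between them faults at the first line of the hole, while a range limited to .text never leaves mapped memory.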
>
> __elfN(loadfile)'s interface is not designed to return multiple address ranges, and it is returning one range that spans both PT_LOAD ranges (.text and .data) and any unreferenced pages between them. (In fact it spans even more afterwards, as I remember.)
>
>
> Questions:
>
> Does anyone have a clue about why the __syncicache use is set up to span into .data (and more) and not just .text, and is anyone willing to explain a little?
>
>
> As far as solution directions go: based on the available evidence, this looks like a subject area relevant to general FreeBSD use. A local personal hack does not seem appropriate. So...
>
>
> A) Should the link of the kernel be producing a kernel with unreferenced pages between the two PT_LOADs (between .text and .data)? Is the proper fix to prevent those pages from existing in linked kernels?
>
> vs.
>
> B) Is it okay for those unreferenced pages to be there between the two PT_LOADs? If yes...
>
> B1) Should something like the ofw_mapmem activity be forced on those otherwise unreferenced pages so that the later __syncicache use can stay as it is?
>
> vs.
>
> B2) Should the unreferenced pages be skipped by making separate __syncicache calls for each PT_LOAD (.text segment and then .data segment and beyond(?))?
>
> vs.
>
> B3) Should only the .text segment be spanned by the __syncicache use? Or some other more specific range that avoids those unreferenced pages?
>
>
> It would appear that all but (A) involve changing the interface provided by __elfN(loadfile) and/or the interfaces it uses: the fix does not appear well localized. (A) may have its own such issues, but in other code or files that I've not looked at.
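Option B2 can be sketched as a loop over the program headers that syncs each PT_LOAD range separately, so the pages between segments are never touched. This is a hypothetical illustration, not the loader's code: __elfN(loadfile) does not currently hand its caller the program headers, the callback type stands in for __syncicache, and base is an assumed offset added to p_vaddr to get the loaded address.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a cache-sync routine such as __syncicache
 * (hypothetical signature for this sketch). */
typedef void (*sync_fn)(uint64_t addr, uint64_t len, void *ctx);

/* Option (B2): sync each PT_LOAD segment on its own so that pages
 * between segments are never touched.  'base' is an assumed offset
 * from p_vaddr to the loaded address.  Returns the number of calls. */
static size_t sync_each_load(const Elf64_Phdr *ph, size_t nph,
                             uint64_t base, sync_fn do_sync, void *ctx)
{
    size_t i, n = 0;

    for (i = 0; i < nph; i++) {
        if (ph[i].p_type != PT_LOAD || ph[i].p_memsz == 0)
            continue;
        do_sync(base + ph[i].p_vaddr, ph[i].p_memsz, ctx);
        n++;
    }
    return n;
}

/* Test double: records up to eight (addr, len) pairs. */
struct sync_log {
    uint64_t addr[8];
    uint64_t len[8];
    size_t count;
};

static void record_sync(uint64_t addr, uint64_t len, void *ctx)
{
    struct sync_log *lg = ctx;

    if (lg->count < 8) {
        lg->addr[lg->count] = addr;
        lg->len[lg->count] = len;
    }
    lg->count++;
}
```

With the example layout (base 0), this would issue one call for 0x100000-0xbe017b and one for 0xbf0000-0xea4b7f, skipping PT_PHDR, PT_DYNAMIC and the pages in between.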
> > > === > Mark Millard > markmi at dsl-only.net > > _______________________________________________ > freebsd-ppc@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-ppc > To unsubscribe, send any mail to "freebsd-ppc-unsubscribe@freebsd.org" > From owner-freebsd-ppc@FreeBSD.ORG Wed Feb 18 15:45:42 2015 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3EE08BD7 for ; Wed, 18 Feb 2015 15:45:42 +0000 (UTC) Received: from d.mail.sonic.net (d.mail.sonic.net [64.142.111.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1B895A05 for ; Wed, 18 Feb 2015 15:45:41 +0000 (UTC) Received: from zeppelin.tachypleus.net (173-161-16-229-Illinois.hfc.comcastbusiness.net [173.161.16.229]) (authenticated bits=0) by d.mail.sonic.net (8.15.1/8.14.9) with ESMTPSA id t1IFjcQP000644 (version=TLSv1.2 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Wed, 18 Feb 2015 07:45:39 -0800 Message-ID: <54E4B3A2.9020106@freebsd.org> Date: Wed, 18 Feb 2015 07:45:38 -0800 From: Nathan Whitehorn User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-ppc@freebsd.org Subject: Re: PowerMac G5 powerpc64: new context where repeatedly booting varies between failing and working References: <7CA43EE3-8C11-4FBD-9F8A-42DF08B82362@dsl-only.net> <5FE82152-BBF7-4C6D-932D-AEE70546CACA@dsl-only.net> <36C14790-8E66-4C9D-9F29-A137FB49439D@dsl-only.net> <836A3016-D41B-45CB-AD4B-946767212026@dsl-only.net> In-Reply-To: <836A3016-D41B-45CB-AD4B-946767212026@dsl-only.net> Content-Type: multipart/mixed; boundary="------------040607000109080202030001" X-Sonic-CAuth: 
UmFuZG9tSVY9YnZRN3dzzJy9207IbVyBEx08YVwNRYwR/cNNTrs7LltcD8QtSDuHs+9x4okkWzsCU2KpPJxvQ7nU/2iT2A0fh/6/OzhrLlU= X-Sonic-ID: C;Vgy+L4W35BGwX9UUxQPdhw== M;zM8pMIW35BGwX9UUxQPdhw== X-Spam-Flag: No X-Sonic-Spam-Details: 0.0/5.0 by cerberusd X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Feb 2015 15:45:42 -0000 This is a multi-part message in MIME format. --------------040607000109080202030001 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit

Interesting. I'm assuming this is due to a bug in the 32-/64-bit ABI thunking that is required to call into Open Firmware. Could you see if the attached patch helps?
-Nathan

On 02/18/15 04:51, Mark Millard wrote:
> I modified openfirmware_core to check on the status of the pointer value between most of its stages. With this I've also seen later failures than the usual one, such as after an OF_finddevice use has its ofwcall return.
>
> And the change greatly narrows down the point at which it corrupts memory when it does fail...
>
> // OKAY HERE
> result = ofwcall(args);
> // SOMETIMES CORRUPTED HERE
>
> Unfortunately, to get this far I am using my own variant of ofwcall, in order to, for example, enable recovery/retry from observed bad r1/r3 register problems that happened super-early on return from openfirmware in a high percentage of my boot attempts. I have yet to see how close to normal I can get ofwcall to be while still allowing this type of test.
>
>
> The relevant detection code in openfirmware_core is...
>
> /* HACK */
> extern void** authnone_create(void);
> ...
> static __inline void
> ofw_restore_trap_vec(char *restore_trap_vec)
> {
>     if (!ofw_real_mode)
>         return;
>
>     bcopy(restore_trap_vec, (void *)EXC_RST, EXC_LAST - EXC_RST);
>     __syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD);
> }
> ...
> static int
> openfirmware_core(void *args)
> {
>     int result;
>     register_t oldmsr;
>
>     /* HACK */
>     void** jnk1pp;
>     void** jnk2pp;
>     void* jnk = *authnone_create();
>     if (jnk == *authnone_create()) jnk = *authnone_create();
>
>     /*
>      * Turn off exceptions - we really don't want to end up
>      * anywhere unexpected with PCPU set to something strange
>      * or the stack pointer wrong.
>      */
>     oldmsr = intr_disable();
>
>     /* HACK */
>     if (jnk == *authnone_create()) jnk = *authnone_create();
>
>     ofw_sprg_prepare();
>
>     /* HACK */
>     if (jnk == *authnone_create()) jnk = *authnone_create();
>
>     /* Save trap vectors */
>     ofw_save_trap_vec(save_trap_of);
>
>     /* HACK */
>     if (jnk == *authnone_create()) jnk = *authnone_create();
>
>     /* Restore initially saved trap vectors */
>     ofw_restore_trap_vec(save_trap_init);
>
>     /* HACK */
>     jnk1pp = authnone_create();
>
> #if defined(AIM) && !defined(__powerpc64__)
>     /*
>      * Clear battable[] translations
>      */
>     if (!(cpu_features & PPC_FEATURE_64))
>         __asm __volatile("mtdbatu 2, %0\n"
>             "mtdbatu 3, %0" : : "r" (0));
>     isync();
> #endif
>
>     result = ofwcall(args);
>
>     /* HACK */
>     jnk2pp = authnone_create();
>
>     /* Restore trap vectors */
>     ofw_restore_trap_vec(save_trap_of);
>
>     /* HACK */
>     if (jnk != *jnk1pp) jnk = *authnone_create();
>     if (jnk != *jnk2pp) jnk = *authnone_create();
>     /* Note: *jnk2pp above is what detects the bad pointer value when it goes bad */
>     if (jnk == *authnone_create()) jnk = *authnone_create();
>
>     ofw_sprg_restore();
>
>     /* HACK */
>     if (jnk == *authnone_create()) jnk = *authnone_create();
>
>     intr_restore(oldmsr);
>
>     /* HACK */
>     if (jnk == *authnone_create()) jnk = *authnone_create();
>
>     return (result);
> }
>
> In the code this translates to...
>
> 00000000008a671c <.openfirmware_core+0x168> bl 00000000007a3de4 <.authnone_create>
> 00000000008a6720 <.openfirmware_core+0x16c> crmove 4*cr7+so,4*cr7+so
> 00000000008a6724 <.openfirmware_core+0x170> mr r28,r3
>
> Note: The above loads r28 with a good address that does not fail when it is later dereferenced (while FreeBSD's exception vectors are in place).
>
> 00000000008a6728 <.openfirmware_core+0x174> mr r3,r29
> 00000000008a672c <.openfirmware_core+0x178> bl 00000000008ac930 <.ofwcall>
> 00000000008a6730 <.openfirmware_core+0x17c> crmove 4*cr7+so,4*cr7+so
> 00000000008a6734 <.openfirmware_core+0x180> mr r26,r3
> 00000000008a6738 <.openfirmware_core+0x184> bl 00000000007a3de4 <.authnone_create>
> 00000000008a673c <.openfirmware_core+0x188> crmove 4*cr7+so,4*cr7+so
> 00000000008a6740 <.openfirmware_core+0x18c> mr r29,r3
>
> Note: The above loads r29 with the bad address that is later detected by dereferencing it. This is the corrupted pointer value.
>
> 00000000008a6744 <.openfirmware_core+0x190> ld r3,21216(r2)
> 00000000008a6748 <.openfirmware_core+0x194> lwz r0,0(r3)
> 00000000008a674c <.openfirmware_core+0x198> cmpwi cr7,r0,0
> 00000000008a6750 <.openfirmware_core+0x19c> beq+ cr7,00000000008a6778 <.openfirmware_core+0x1c4>
> 00000000008a6754 <.openfirmware_core+0x1a0> addi r3,r3,16
> 00000000008a6758 <.openfirmware_core+0x1a4> li r4,256
> 00000000008a675c <.openfirmware_core+0x1a8> li r5,11776
> 00000000008a6760 <.openfirmware_core+0x1ac> bl 00000000008c158c <.bcopy>
> 00000000008a6764 <.openfirmware_core+0x1b0> crmove 4*cr7+so,4*cr7+so
> 00000000008a6768 <.openfirmware_core+0x1b4> li r3,0
> 00000000008a676c <.openfirmware_core+0x1b8> li r4,12032
> 00000000008a6770 <.openfirmware_core+0x1bc> bl 00000000008d5358 <.__syncicache>
>
> Note: At this point it is back to FreeBSD exception vectors, so the kernel debug display will work for bad pointer detection tests.
>
> 00000000008a6774 <.openfirmware_core+0x1c0> crmove 4*cr7+so,4*cr7+so
> 00000000008a6778 <.openfirmware_core+0x1c4> ld r0,0(r28)
>
> Note: The above dereference of the pre-ofwcall pointer value (in r28) does not detect a bad pointer.
>
> 00000000008a677c <.openfirmware_core+0x1c8> cmpd cr7,r0,r30
> 00000000008a6780 <.openfirmware_core+0x1cc> beq- cr7,00000000008a6790 <.openfirmware_core+0x1dc>
> 00000000008a6784 <.openfirmware_core+0x1d0> bl 00000000007a3de4 <.authnone_create>
> 00000000008a6788 <.openfirmware_core+0x1d4> crmove 4*cr7+so,4*cr7+so
> 00000000008a678c <.openfirmware_core+0x1d8> ld r30,0(r3)
> 00000000008a6790 <.openfirmware_core+0x1dc> ld r0,0(r29)
>
> It is that last instruction (.openfirmware_core+0x1dc) that "detects" the bad pointer and leads to a kernel debugger display of some of the corrupted memory, including the stored pointer that the above code accessed and dereferenced to detect the problem.
>
> So the pointer was good just before the ofwcall and was bad just after it.
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
> On 2015-Feb-17, at 09:34 PM, Mark Millard wrote:
>
> [I had sent Nathan W. and Justin H. a picture of a display of a boot-time corrupted memory region. This time I tried to find the start and end of the region, and I'm documenting it in a textual form more appropriate to the list. I have also removed prior Email history from this Email, but there is much context one must check that history for.]
>
> Several of the new values put in place by the .got memory corruption reported below match up with .opd or other types of addresses reported by objdump for my /boot/kernel10.1S/kernel. They are noted below as I list detailed differences.
>
> I made the early-boot-crash display a larger range, and the span of the corruption seemed to go as follows for the corruption of part of the .got area.
> Also I induced a dereference of the bad pointer as soon as it is discovered after the OF_peer(0) in question returns, so later code would not be involved when it crashes. (Crash early, crash often...)
>
>
> Overall structure:
>
> 0xd2da37 and before, as far as I looked: no corruption found.
>
> The area from 0xd2da38-0xd2dc9F: largely corrupted. 0x268 or 616 bytes or so in this corrupted range. 616=77*8.
>
> After that range: good again as far as I looked.
>
>
> The details:
>
> Warning: The below is based on hand-transcribed information from screen pictures that I took.
>
> Showing pairs of lines (good then corrupted), using x/x style lines:
>
> 0xd2da30: 0, b4fd2c, 0, b4fd70
> 0xd2da30: 0, b4fd2c, 0, 0
>
> 0xd2da40: 0, e28948, 0, e1e460
> 0xd2da40: 0, 24000042, 0, d00058
> (24000042 looks like a cr value?)
> (0000000000d00058 l .opd 0000000000000018 ofw_rendezvous_dispatch)
>
> 0xd2da50: 0, bc7de8, 0, bc7e08
> 0xd2da50: 0, cde110, c0000000, 8740
> (0xc000000000008740 looks like a stack address?)
> (0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)
>
> 0xd2da60: 0, cd8470, 0, bd2608
> 0xd2da60: 0, 1, 0, c3a30c
> (0000000000c3a30c g .data 0000000000000000 ofw_sprg0_save)
>
> 0xd2da70: 0, bb5ea0, 0, b70870
> 0xd2da70: 0, 1c35ec0, 0, 0
>
> 0xd2da80: 0, c49918, 0, bc7e18
> 0xd2da80: 0, 44000022, 0, de4b30
> (44000022 looks like a cr value?)
> (0000000000de4b30 g O .bss 0000000000000460 thread0)
>
> 0xd2da90: 0, b720a0, 0, b71370
> 0xd2da90: 900000000, 1032, 0, ff846d78
> (9000000000001032 looks like a SRR1 value.)
> (ff846d78 is openfirmware entry point?)
>
> 0xd2daa0: 0, bc7e30, 0, bc7e58
> 0xd2daa0: 0, e39080, 100000000, 3030
> (0000000000e39080 g O .bss 0000000000020000 __pcpu)
> (1000000000003030 looks like a SRR1 value?)
>
> 0xd2dab0: 0, bc7e80, 0, bc7eb0
> 0xd2dab0: c0000000, 83b0, 0, c3a280
> (0xc0000000000083b0 looks like a stack address?)
> (c3a280 is inside my PowerMac G5 specific hack's ofwstk area: c392a0 up to 0x3a2a0)
> (I've been gathering evidence about early-boot G5 crashes.)
>
> 0xd2dac0: 0, bc7ed0, 0, cf2960
> 0xd2dac0: 0, c40000, 0, c40000
>
> 0xd2dad0: 0, bc7f00, 0, bc7f28
> 0xd2dad0: 0, c40000, 0, c40000
>
> 0xd2dae0: 0, b72400, 0, bc7f28
> 0xd2dae0: c0000000, 8740, 0, cde110
> (0xc000000000008740 looks like a stack address?)
> (0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)
>
> 0xd2daf0: 0, cf2b28, 0, b716a0
> 0xd2daf0: 0, d00058, 0, cde110
> (d00058 was also at 0xd2da4c and was followed by cde110 there.)
> (0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)
>
> 0xd2db00: 0, cf2b88, 0, cf2b70
> 0xd2db00: 0, e6c280, 0, 0
> (e6c280 is inside the emergency_buffer.7752 area: e6c278 up to e6c378)
>
> 0xd2db10: 0, cf2b58, 0, 8480
> 0xd2db10: 900000000, 1032, c0000000, 8740
> (9000000000001032 looks like a SRR1 value?)
> (0xc000000000008740 looks like a stack address?)
>
> 0xd2db20: 0, c2d920, 0, cf2b10
> 0xd2db20: 0, c2d920, 0, cf2b10 (yep: unchanged!)
>
> 0xd2db30: 0, b71718, 0, c49888
> 0xd2db30: 0, ff846734, 10000000, 3030
> (ff846734 would seem to be an openfirmware code address?)
> (1000000000003030 looks like a SRR1 value?)
>
> 0xd2db40: 0, c498a0, 0, c54000
> 0xd2db40: 0, c498a0, 0, ff846d78
> (Yep: c498a0 was unchanged)
> (ff846d78 is openfirmware entry point?)
>
> 0xd2db50: 0, e313a8, 0, e31608
> 0xd2db50: 24000042, e313a8, 0, 0
> (24000042 looks like a cr value?)
> (Trying to store to address 0x2400004200e313a8 for a specific
> type of 10.1-STABLE build is how the problem was originally
> noticed.)
>
> 0xd2db60: 0, c31f80, 0, bc81e8
> 0xd2db60: 0, c31f80, 0, 0
> (Yep: 0x0000000000c31f80 is unchanged.)
>
> 0xd2db70: 0, e31408, 0, bc8228
> 0xd2db70: 200000, e31408, 0, bc8228
> (Yep: Only the 0x200000 was a change.)
>
> 0xd2db80: 0, c32488, 0, bc8238
> 0xd2db80: 0, 1, 10000000, 3030
> (1000000000003030 looks like a SRR1 value?)
>
> 0xd2db90: 0, e1e460, 0, c31fc0
> 0xd2db90: 0, 0, 0, 7ff7e800
>
> 0xd2dba0: 0, e31608, 0, bc8260
> 0xd2dba0: 0, 1000000a, 0, bc8260
> (Yep: 0x0000000000bc8260 unchanged.)
>
> 0xd2dbb0: 0, e1e460, 0, e1fa60
> 0xd2dbb0: 0, e1e460, 0, e1fa60 (yep: unchanged!)
>
> 0xd2dbc0: 0, bc8288, 0, c32488
> 0xd2dbc0: 111081, 0, fd3c2000, 0
> (fd3c2000 in openfirmware area?)
>
> 0xd2dbd0: 0, e3153c, 0, bc8298
> 0xd2dbd0: 10, 0, 0, 0
>
> Now a few unchanged: 0xd2dbe0-0xd2dc1F
>
> Then a change in the pattern of corruptions for the rest of the corrupted area:
>
> 0xd2dc20: 0, bc8288, 0, bc82e8
> 0xd2dc20: 0, bc8288, 127f500, bc82e8
>
> Note how bc8288 and bc82e8 did not change.
> From here on those two columns are not
> corrupted but the other two are.
>
> 0xd2dc30: 0, bc8300, 0, c32488
> 0xd2dc30: 8000000, bc8300, e7d540, c32488
>
> 0xd2dc40: 0, b4fef0, 0, e31558
> 0xd2dc40: ecc40, b4fef0, 84eec80, e31558
>
> 0xd2dc50: 0, bc8308, 0, cf2f00
> 0xd2dc50: 1e85440, bc8308, 8766200, cf2f00
>
> 0xd2dc60: 0, bc8310, 0, bc8350
> 0xd2dc60: fb9040, bc8310, 93bb000, bc8350
>
> 0xd2dc70: 0, c32038, 0, de5718
> 0xd2dc70: 94f6b00, c32038, 8632600, de5718
>
> 0xd2dc80: 0, de7768, 0, bc3760
> 0xd2dc80: 1fc0f40, de7768, 10f4b40, bc3760
>
> 0xd2dc90: 0, de7768, 0, e1fa00
> 0xd2dc90: 99e5700, cfc658, 228740, e1fa00
>
> And after that things match for as far as I've looked: no corruptions.
>
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
>
>
> _______________________________________________
> freebsd-ppc@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-ppc
> To unsubscribe, send any mail to "freebsd-ppc-unsubscribe@freebsd.org"
>

--------------040607000109080202030001 Content-Type: text/plain; charset=us-ascii; name="ofwcall.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="ofwcall.diff"

Index: ofwcall64.S
===================================================================
--- ofwcall64.S	(revision 278935)
+++ ofwcall64.S	(working copy)
@@ -106,7 +106,7 @@
 
 	/* Get OF stack pointer */
 	ld	%r7,TOC_REF(ofwstk)(%r2)
-	addi	%r7,%r7,OFWSTKSZ-32
+	addi	%r7,%r7,OFWSTKSZ-64
 
 	/*
 	 * Set the MSR to the OF value. This has the side effect of disabling
@@ -126,9 +126,9 @@
 	 */
 	mr	%r5,%r1
 	mr	%r1,%r7
-	std	%r5,8(%r1)	/* Save real stack pointer */
-	std	%r2,16(%r1)	/* Save old TOC */
-	std	%r6,24(%r1)	/* Save old MSR */
+	std	%r5,40(%r1)	/* Save real stack pointer */
+	std	%r2,48(%r1)	/* Save old TOC */
+	std	%r6,56(%r1)	/* Save old MSR */
 	li	%r5,0
 	stw	%r5,4(%r1)
 	stw	%r5,0(%r1)
@@ -138,9 +138,9 @@
 	bctrl
 
 	/* Reload stack pointer and MSR from the OFW stack */
-	ld	%r6,24(%r1)
-	ld	%r2,16(%r1)
-	ld	%r1,8(%r1)
+	ld	%r6,56(%r1)
+	ld	%r2,48(%r1)
+	ld	%r1,40(%r1)
 
 	/* Now set the real MSR */
 	mtmsrd	%r6

--------------040607000109080202030001--
From owner-freebsd-ppc@FreeBSD.ORG Thu Feb 19 05:54:02 2015 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4DB587B1 for ; Thu, 19 Feb 2015 05:54:02 +0000 (UTC) Received: from asp.reflexion.net (outbound-242.asp.reflexion.net [69.84.129.242]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix)
with ESMTPS id 00C93FA0 for ; Thu, 19 Feb 2015 05:54:01 +0000 (UTC) Received: (qmail 13792 invoked from network); 19 Feb 2015 05:53:54 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 19 Feb 2015 05:53:54 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v7.40.1) with SMTP; Thu, 19 Feb 2015 00:53:54 -0500 (EST) Received: (qmail 26192 invoked from network); 19 Feb 2015 05:53:53 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (DHE-RSA-AES256-SHA encrypted) SMTP; 19 Feb 2015 05:53:53 -0000 X-No-Relay: not in my network X-No-Relay: not in my network Received: from [192.168.1.8] (c-67-189-19-145.hsd1.or.comcast.net [67.189.19.145]) by iron2.pdx.net (Postfix) with ESMTPSA id 9A8B41C4052; Wed, 18 Feb 2015 21:53:46 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2070.6\)) Subject: Re: PowerMac G5 powerpc64: new context where repeatedly booting varies between failing and working From: Mark Millard In-Reply-To: <836A3016-D41B-45CB-AD4B-946767212026@dsl-only.net> Date: Wed, 18 Feb 2015 21:53:51 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: References: <7CA43EE3-8C11-4FBD-9F8A-42DF08B82362@dsl-only.net> <5FE82152-BBF7-4C6D-932D-AEE70546CACA@dsl-only.net> <36C14790-8E66-4C9D-9F29-A137FB49439D@dsl-only.net> <836A3016-D41B-45CB-AD4B-946767212026@dsl-only.net> To: FreeBSD PowerPC ML , Nathan Whitehorn X-Mailer: Apple Mail (2.2070.6) X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Feb 2015 05:54:02 -0000

Nathan W. wrote:
> Interesting. I'm assuming this is due to a bug in the 32-/64-bit ABI thunking that is required to call into Open Firmware. Could you see if the attached patch helps?
> -Nathan

It appears that direct use of TOC_ENTRY and such is not automatically/by-default available in 10.1-STABLE's context. Your basis for the patch is 11.0-CURRENT after the relocatable kernel changes.

My context, where I was lucky enough to get a memory layout that produced the failure that allowed detecting the memory corruption (and that has a known way to quickly detect the specific corruption), is for some range of versions of 10.1-STABLE when my GENERIC64vtsc has a particular set of options enabled. I do not know how to take an arbitrary FreeBSD version and give it such a handy context for the issue. So I will stick with 10.1-STABLE as much as I can for investigating this issue.

There is also the issue of the "once very early for many boots: %r1 and %r3 corruption on openfirmware return". I've been using my hack to "retry at most once per ofwcall use" to make my G5 quad-core PowerMac context boot most of the time (rather than needing to power off then on up to over a dozen times in a row to get a successful boot). The super-early boot failure rate had been blocking most investigation activities until I used this type of hack.

The closest 10.1-STABLE partial match to your patch, mixed with my %r1/%r3 corruption handling, that I've come up with overall is as follows. (Tabs probably turned to spaces.) Do you think it is sufficient for what you want tested?

(My observations suggest that the non-volatile registers are preserved by openfirmware even when I've seen other problems. I used to use explicit storage instead but switched to this style for the %r1/%r3-handling-hack part of the code because it is invariant to relocatable vs. not. I've used %r29, %r28, %r27 as needing to survive the openfirmware call. %r25 does not need to do so.)
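The retry-at-most-once policy described above can be restated in C to make its control flow easier to review. This is a user-space model with a hypothetical firmware-entry callback, not kernel code; the "0 or -1" acceptance test mirrors what the hack's comments say it checks. One observation on the assembly as posted: xoris %r25,%r3,0 XORs with zero, so %r25 always equals %r3 and the following cmpw always compares equal, which means the %r3 test as written appears never to trigger a retry. An arithmetic shift such as srawi %r25,%r3,31 yields a value equal to %r3 exactly when the low word of %r3 is 0 or -1, if that was the intent.

```c
#include <assert.h>
#include <stdint.h>

/* What the caller sees after one firmware call: the stack pointer
 * (r1 in the hack) and the return value (r3). */
struct of_return {
    uintptr_t sp;
    int32_t ret;
};

/* Hypothetical stand-in for entering Open Firmware. */
typedef struct of_return (*of_entry_fn)(void *args, void *ctx);

/* Acceptance test the hack's comments describe: the stack pointer came
 * back unchanged and the return value is 0 or -1. */
static int return_looks_sane(struct of_return r, uintptr_t saved_sp)
{
    return r.sp == saved_sp && (r.ret == 0 || r.ret == -1);
}

/* Call once; on a bad-looking return, retry exactly once; if the retry
 * also looks bad, report -1 (failure), as the hack does. */
static int32_t call_with_one_retry(of_entry_fn entry, void *args,
                                   uintptr_t saved_sp, void *ctx)
{
    struct of_return r = entry(args, ctx);

    if (return_looks_sane(r, saved_sp))
        return r.ret;
    r = entry(args, ctx);
    if (return_looks_sane(r, saved_sp))
        return r.ret;
    return -1;
}

/* Test double: replays up to two scripted returns and counts calls. */
struct script {
    struct of_return results[2];
    int calls;
};

static struct of_return scripted_entry(void *args, void *ctx)
{
    struct script *s = ctx;
    int i = s->calls < 2 ? s->calls : 1;

    (void)args;
    s->calls++;
    return s->results[i];
}
```

Expressing the check as a separate predicate keeps the "what counts as corrupted" decision in one place, which is the part most likely to change as more failure modes are observed.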
Index: sys/powerpc/ofw/ofwcall64.S
===================================================================
--- sys/powerpc/ofw/ofwcall64.S (revision 278443)
+++ sys/powerpc/ofw/ofwcall64.S (working copy)
@@ -114,23 +114,64 @@
          * the old MSR so we can get them back later.
          */
         mr      %r5,%r1
-        lis     %r1,(ofwstk+OFWSTKSZ-32)@ha
-        addi    %r1,%r1,(ofwstk+OFWSTKSZ-32)@l
-        std     %r5,8(%r1)      /* Save real stack pointer */
-        std     %r2,16(%r1)     /* Save old TOC */
-        std     %r6,24(%r1)     /* Save old MSR */
+        lis     %r1,(ofwstk+OFWSTKSZ-64)@ha
+        addi    %r1,%r1,(ofwstk+OFWSTKSZ-64)@l
+        std     %r5,40(%r1)     /* Save real stack pointer */
+        std     %r2,48(%r1)     /* Save old TOC */
+        std     %r6,56(%r1)     /* Save old MSR */
         li      %r5,0
         stw     %r5,4(%r1)
         stw     %r5,0(%r1)
 
+        /* HACK: recording %r1 (FreeBSD SP) before openfirmware for use in
+         * possible retry and also for testing for corruption (net-change).
+         * %r29 is supposed to be non-volatile for darwin 32 bit ABI.
+         */
+        mr      %r29,%r1
+
+        /* HACK: recording %r3 before openfirmware for use in possible retry.
+         * %r28 is supposed to be non-volatile for darwin 32 bit ABI.
+         */
+        mr      %r28,%r3
+
+        /* HACK: recording %r4 before openfirmware for use in possible retry.
+         * %r27 is supposed to be non-volatile for darwin 32 bit ABI.
+         */
+        mr      %r27,%r4
+
         /* Finally, branch to OF */
         mtctr   %r4
         bctrl
 
+        /* HACK: check if %r1 was corrupted (had a net-change) */
+        cmpw    %r29,%r1
+        bne     2f      /* stack pointer corrupted so go retry once */
+
+        /* HACK Notes: the observed corruption had %r1 changed and %r1=%r3.
+         * This code is somewhat more general.
+         */
+
+        /* HACK: %r1 okay but check %r3 for being 0 or -1 vs. anything else */
+        xoris   %r25,%r3,0
+        cmpw    %r25,%r3
+        bne     2f      /* %r3 was neither 0 nor -1 so corruption: go retry once */
+
+1:      /* HACK: here both %r1 and %r3 appear to be okay:
+         * so sequential flow was for "no problems"
+         * but jumping here is a retry result being
+         * returned, possibly with forced-good values
+         * indicating an openfirmware error status (%r3=-1).
+         */
+
+        /* HACK removal: I've removed the mtsprg0 that put back
+         * FreeBSD's value to help with exceptions and
+         * DDB display for when %r1 was corrupted.
+         */
+
         /* Reload stack pointer and MSR from the OFW stack */
-        ld      %r6,24(%r1)
-        ld      %r2,16(%r1)
-        ld      %r1,8(%r1)
+        ld      %r6,56(%r1)
+        ld      %r2,48(%r1)
+        ld      %r1,40(%r1)
 
         /* Now set the real MSR */
         mtmsrd  %r6
@@ -168,6 +209,40 @@
         mtlr    %r0
         blr
 
+/* HACK: code for %r1 and/or %r3 corruption's single-retry */
+/* Still under openfirmware's msr, sprg0, stack values */
+
+2:      /* HACK: corruption observed so retry, restoring %r1 and %r3 first */
+        mr      %r1,%r29
+        mr      %r3,%r28
+        mtctr   %r27
+        bctrl
+
+        /* HACK: check if %r1 was corrupted (had a net-change) */
+        cmpw    %r29,%r1
+        bne     3f      /* retry corrupted %r1
+                         * so go give up with %r3 being -1 and %r1 forced-good
+                         */
+
+        /* HACK Notes: the observed corruption had %r1 changed and %r1=%r3.
+         * This code is somewhat more general.
+         */
+
+        /* HACK: %r1 okay but check %r3 for being 0 or -1 vs. anything else */
+        xoris   %r25,%r3,0
+        cmpw    %r25,%r3
+        beq     1b      /* %r3 also was 0 or -1 so no corruption observed on retry
+                         * so go do a normal return
+                         */
+
+3:      /* Either %r1 had a net change after retry
+         * or %r3 was not one of 0,-1 after retry
+         * so force %r1 and have %r3 be -1 then go return
+         */
+        mr      %r1,%r29
+        li      %r3,-1  /* the openfirmware failure return value */
+        b       1b
+
 /*
  * RTAS 32-bit Entry Point. Similar to the OF one, but simpler (no separate
  * stack)

The context would be:

root@FBSDG5M1:/usr/src # svnlite status
?       .snap
M       sys/ddb/db_main.c
M       sys/ddb/db_script.c
M       sys/powerpc/conf
?       sys/powerpc/conf/GENERIC64vtsc
M       sys/powerpc/ofw/ofw_machdep.c
M       sys/powerpc/ofw/ofwcall64.S
M       sys/powerpc/powermac/platform_powermac.c
root@FBSDG5M1:/usr/src # svnlite info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn0.us-west.freebsd.org/base/stable/10
Relative URL: ^/stable/10
Repository Root: https://svn0.us-west.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 278443
Node Kind: directory
Schedule: normal
Last Changed Author: brooks
Last Changed Rev: 278443
Last Changed Date: 2015-02-09 01:22:47 -0800 (Mon, 09 Feb 2015)

The ddb changes are there to produce an automatic display on failure if the failure happens in a FreeBSD exception-vector context.

ofw_machdep.c does the check for corruption around its ofwcall.

platform_powermac.c has a printf that reports the expected pointer value just before the point where it has been observed to go bad.

ofwcall64.S: see above for whether it is acceptable.

root@FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64vtsc
include         GENERIC64

ident           GENERIC64vtsc

nooptions       PS3             #Sony Playstation 3 HACK!!! to allow sc

options         DDB             # HACK!!! to dump early crash info (but 11.0-CURRENT already has it)
options         GDB             # HACK!!! ...

options         VERBOSE_SYSINIT
options         BOOTVERBOSE=1
options         BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP
#options        KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
device          sc
#device         kbdmux          # HACK: already listed by vt
options         SC_OFWFB        # OFW frame buffer
options         SC_DFLT_FONT    # compile font in
makeoptions     SC_DFLT_FONT=cp437

# Disable extra checking typically used for FreeBSD 11.0-CURRENT:
nooptions       DEADLKRES               #Enable the deadlock resolver
nooptions       INVARIANTS              #Enable calls of extra sanity checking
nooptions       INVARIANT_SUPPORT       #Extra sanity checks of internal structures, required by INVARIANTS
nooptions       WITNESS                 #Enable checks to detect deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        #Don't run witness on spinlocks for speed
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones
root@FBSDG5M1:/usr/src # more /etc/make.conf
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
MALLOC_PRODUCTION=
root@FBSDG5M1:/usr/src # more /etc/src.conf
CFLAGS+=-DELF_VERBOSE
#WITH_DEBUG_FILES=
#WITHOUT_CLANG=
root@FBSDG5M1:/usr/src # more /boot/loader.conf
#kernel="kernel"
#kernel="kernel10.1RE"
kernel="kernel10.1S"
#kernel="kernel11C"
verbose_loading="YES"
kern.vty=vt

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-18, at 04:51 AM, Mark Millard wrote:

I modified openfirmware_core to check the status of the pointer value between most of its stages. With this I've also seen later failures than the usual one, such as after an OF_finddevice call's ofwcall returns.

And the change greatly narrows down the stage at which memory gets corrupted when it does fail...

        // OKAY HERE
        result = ofwcall(args);
        // SOMETIMES CORRUPTED HERE

Unfortunately, to get this far, ofwcall is my variant, in order to (for example) enable recovery/retry from the bad r1/r3 register values that were observed super-early on return from openfirmware in a high percentage of my boot attempts. I have yet to see how close to normal I can get ofwcall while still allowing this type of test.
The relevant detection code in openfirmware_core is...

/* HACK */
extern void** authnone_create(void);
...
static __inline void
ofw_restore_trap_vec(char *restore_trap_vec)
{
        if (!ofw_real_mode)
                return;

        bcopy(restore_trap_vec, (void *)EXC_RST, EXC_LAST - EXC_RST);
        __syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD);
}
...
static int
openfirmware_core(void *args)
{
        int             result;
        register_t      oldmsr;

        /* HACK */
        void** jnk1pp;
        void** jnk2pp;
        void* jnk = *authnone_create();
        if (jnk == *authnone_create()) jnk = *authnone_create();

        /*
         * Turn off exceptions - we really don't want to end up
         * anywhere unexpected with PCPU set to something strange
         * or the stack pointer wrong.
         */
        oldmsr = intr_disable();

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        ofw_sprg_prepare();

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        /* Save trap vectors */
        ofw_save_trap_vec(save_trap_of);

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        /* Restore initially saved trap vectors */
        ofw_restore_trap_vec(save_trap_init);

        /* HACK */
        jnk1pp = authnone_create();

#if defined(AIM) && !defined(__powerpc64__)
        /*
         * Clear battable[] translations
         */
        if (!(cpu_features & PPC_FEATURE_64))
                __asm __volatile("mtdbatu 2, %0\n"
                                 "mtdbatu 3, %0" : : "r" (0));
        isync();
#endif

        result = ofwcall(args);

        /* HACK */
        jnk2pp = authnone_create();

        /* Restore trap vectors */
        ofw_restore_trap_vec(save_trap_of);

        /* HACK */
        if (jnk != *jnk1pp) jnk = *authnone_create();
        if (jnk != *jnk2pp) jnk = *authnone_create();
        /* Note: *jnk2pp above is what detects the bad pointer value when it goes bad */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        ofw_sprg_restore();

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        intr_restore(oldmsr);

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        return (result);
}

In the code this translates to...
00000000008a671c <.openfirmware_core+0x168> bl      00000000007a3de4 <.authnone_create>
00000000008a6720 <.openfirmware_core+0x16c> crmove  4*cr7+so,4*cr7+so
00000000008a6724 <.openfirmware_core+0x170> mr      r28,r3

Note: The above loads r28 with a good address that does not fail when later dereferenced (while FreeBSD's exception vectors are in place).

00000000008a6728 <.openfirmware_core+0x174> mr      r3,r29
00000000008a672c <.openfirmware_core+0x178> bl      00000000008ac930 <.ofwcall>
00000000008a6730 <.openfirmware_core+0x17c> crmove  4*cr7+so,4*cr7+so
00000000008a6734 <.openfirmware_core+0x180> mr      r26,r3
00000000008a6738 <.openfirmware_core+0x184> bl      00000000007a3de4 <.authnone_create>
00000000008a673c <.openfirmware_core+0x188> crmove  4*cr7+so,4*cr7+so
00000000008a6740 <.openfirmware_core+0x18c> mr      r29,r3

Note: The above loads r29 with the bad address that is later detected by dereferencing it. This is the corrupted pointer value.

00000000008a6744 <.openfirmware_core+0x190> ld      r3,21216(r2)
00000000008a6748 <.openfirmware_core+0x194> lwz     r0,0(r3)
00000000008a674c <.openfirmware_core+0x198> cmpwi   cr7,r0,0
00000000008a6750 <.openfirmware_core+0x19c> beq+    cr7,00000000008a6778 <.openfirmware_core+0x1c4>
00000000008a6754 <.openfirmware_core+0x1a0> addi    r3,r3,16
00000000008a6758 <.openfirmware_core+0x1a4> li      r4,256
00000000008a675c <.openfirmware_core+0x1a8> li      r5,11776
00000000008a6760 <.openfirmware_core+0x1ac> bl      00000000008c158c <.bcopy>
00000000008a6764 <.openfirmware_core+0x1b0> crmove  4*cr7+so,4*cr7+so
00000000008a6768 <.openfirmware_core+0x1b4> li      r3,0
00000000008a676c <.openfirmware_core+0x1b8> li      r4,12032
00000000008a6770 <.openfirmware_core+0x1bc> bl      00000000008d5358 <.__syncicache>

Note: At this point it is back to FreeBSD exception vectors, so the kernel debugger display will work for the bad-pointer detection tests.
00000000008a6774 <.openfirmware_core+0x1c0> crmove  4*cr7+so,4*cr7+so
00000000008a6778 <.openfirmware_core+0x1c4> ld      r0,0(r28)

Note: The above dereference of the pre-ofwcall pointer value (in r28) does not detect a bad pointer.

00000000008a677c <.openfirmware_core+0x1c8> cmpd    cr7,r0,r30
00000000008a6780 <.openfirmware_core+0x1cc> beq-    cr7,00000000008a6790 <.openfirmware_core+0x1dc>
00000000008a6784 <.openfirmware_core+0x1d0> bl      00000000007a3de4 <.authnone_create>
00000000008a6788 <.openfirmware_core+0x1d4> crmove  4*cr7+so,4*cr7+so
00000000008a678c <.openfirmware_core+0x1d8> ld      r30,0(r3)
00000000008a6790 <.openfirmware_core+0x1dc> ld      r0,0(r29)

It is that last instruction (.openfirmware_core+0x1dc) that "detects" the bad pointer and leads to a kernel debugger display of some of the corrupted memory, including the stored pointer that the above code accessed and dereferenced to detect the problem.

So the pointer was good just before the ofwcall and was bad just after it.

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-17, at 09:34 PM, Mark Millard wrote:

[I had sent Nathan W. and Justin H. a picture of a display of a boot-time corrupted memory region. This time I tried to find the start and end of the region, and I'm documenting it in a textual form more appropriate to the list. I have also removed prior Email history from this Email, but there is much context one must check that history for.]

Several of the new values put in place by the .got memory corruption reported below match up with .opd or other types of addresses reported by objdump for my /boot/kernel10.1S/kernel. They are noted below as I list detailed differences.

I made the early-boot-crash display cover a larger range, and the span of the corruption of part of the .got area seemed to go as follows.
Also, I induced a dereference of the bad pointer as soon as it is discovered after the OF_peer(0) in question returns, so that later code would not be involved when it crashes. (Crash early, crash often...)

Overall structure:

0xd2da37 and before, as far as I looked: no corruption found.
The area from 0xd2da38-0xd2dc9f: largely corrupted. 0x268 or 616 bytes or so in this corrupted range. 616=77*8.
After that range: good again as far as I looked.

The details:

Warning: The below is based on hand-transcribed information from screen pictures that I took.

Showing pairs of lines (good then corrupted), using x/x style lines:

0xd2da30: 0, b4fd2c, 0, b4fd70
0xd2da30: 0, b4fd2c, 0, 0

0xd2da40: 0, e28948, 0, e1e460
0xd2da40: 0, 24000042, 0, d00058
(24000042 looks like a cr value?)
(0000000000d00058 l .opd 0000000000000018 ofw_rendezvous_dispatch)

0xd2da50: 0, bc7de8, 0, bc7e08
0xd2da50: 0, cde110, c0000000, 8740
(0xc000000000008740 looks like a stack address?)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2da60: 0, cd8470, 0, bd2608
0xd2da60: 0, 1, 0, c3a30c
(0000000000c3a30c g .data 0000000000000000 ofw_sprg0_save)

0xd2da70: 0, bb5ea0, 0, b70870
0xd2da70: 0, 1c35ec0, 0, 0

0xd2da80: 0, c49918, 0, bc7e18
0xd2da80: 0, 44000022, 0, de4b30
(44000022 looks like a cr value?)
(0000000000de4b30 g O .bss 0000000000000460 thread0)

0xd2da90: 0, b720a0, 0, b71370
0xd2da90: 900000000, 1032, 0, ff846d78
(9000000000001032 looks like a SRR1 value.)
(ff846d78 is the openfirmware entry point?)

0xd2daa0: 0, bc7e30, 0, bc7e58
0xd2daa0: 0, e39080, 100000000, 3030
(0000000000e39080 g O .bss 0000000000020000 __pcpu)
(1000000000003030 looks like a SRR1 value?)

0xd2dab0: 0, bc7e80, 0, bc7eb0
0xd2dab0: c0000000, 83b0, 0, c3a280
(0xc0000000000083b0 looks like a stack address?)
(c3a280 is inside my PowerMac G5 specific hack's ofwstk area: c392a0 up to c3a2a0)
(I've been gathering evidence about early-boot G5 crashes.)
0xd2dac0: 0, bc7ed0, 0, cf2960
0xd2dac0: 0, c40000, 0, c40000

0xd2dad0: 0, bc7f00, 0, bc7f28
0xd2dad0: 0, c40000, 0, c40000

0xd2dae0: 0, b72400, 0, bc7f28
0xd2dae0: c0000000, 8740, 0, cde110
(0xc000000000008740 looks like a stack address?)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2daf0: 0, cf2b28, 0, b716a0
0xd2daf0: 0, d00058, 0, cde110
(d00058 was also at 0xd2da4c and was followed by cde110 there.)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2db00: 0, cf2b88, 0, cf2b70
0xd2db00: 0, e6c280, 0, 0
(e6c280 is inside the emergency_buffer.7752 area: e6c278 up to e6c378)

0xd2db10: 0, cf2b58, 0, 8480
0xd2db10: 900000000, 1032, c0000000, 8740
(9000000000001032 looks like a SRR1 value?)
(0xc000000000008740 looks like a stack address?)

0xd2db20: 0, c2d920, 0, cf2b10
0xd2db20: 0, c2d920, 0, cf2b10
(yep: unchanged!)

0xd2db30: 0, b71718, 0, c49888
0xd2db30: 0, ff846734, 10000000, 3030
(ff846734 would seem to be an openfirmware code address?)
(1000000000003030 looks like a SRR1 value?)

0xd2db40: 0, c498a0, 0, c54000
0xd2db40: 0, c498a0, 0, ff846d78
(Yep: c498a0 was unchanged)
(ff846d78 is openfirmware entry point?)

0xd2db50: 0, e313a8, 0, e31608
0xd2db50: 24000042, e313a8, 0, 0
(24000042 looks like a cr value?)
(Trying to store to address 0x2400004200e313a8 for a specific type of 10.1-STABLE build is how the problem was originally noticed.)

0xd2db60: 0, c31f80, 0, bc81e8
0xd2db60: 0, c31f80, 0, 0
(Yep: 0x0000000000c31f80 is unchanged.)

0xd2db70: 0, e31408, 0, bc8228
0xd2db70: 200000, e31408, 0, bc8228
(Yep: Only the 0x200000 was a change.)

0xd2db80: 0, c32488, 0, bc8238
0xd2db80: 0, 1, 10000000, 3030
(1000000000003030 looks like a SRR1 value?)

0xd2db90: 0, e1e460, 0, c31fc0
0xd2db90: 0, 0, 0, 7ff7e800

0xd2dba0: 0, e31608, 0, bc8260
0xd2dba0: 0, 1000000a, 0, bc8260
(Yep: 0x0000000000bc8260 unchanged.)

0xd2dbb0: 0, e1e460, 0, e1fa60
0xd2dbb0: 0, e1e460, 0, e1fa60
(yep: unchanged!)
0xd2dbc0: 0, bc8288, 0, c32488
0xd2dbc0: 111081, 0, fd3c2000, 0
(fd3c2000 in openfirmware area?)

0xd2dbd0: 0, e3153c, 0, bc8298
0xd2dbd0: 10, 0, 0, 0

Now a few unchanged: 0xd2dbe0-0xd2dc1f

Then a change in the pattern of corruption for the rest of the corrupted area:

0xd2dc20: 0, bc8288, 0, bc82e8
0xd2dc20: 0, bc8288, 127f500, bc82e8

Note how bc8288 and bc82e8 did not change. From here on those two columns are not corrupted but the other two are.

0xd2dc30: 0, bc8300, 0, c32488
0xd2dc30: 8000000, bc8300, e7d540, c32488

0xd2dc40: 0, b4fef0, 0, e31558
0xd2dc40: ecc40, b4fef0, 84eec80, e31558

0xd2dc50: 0, bc8308, 0, cf2f00
0xd2dc50: 1e85440, bc8308, 8766200, cf2f00

0xd2dc60: 0, bc8310, 0, bc8350
0xd2dc60: fb9040, bc8310, 93bb000, bc8350

0xd2dc70: 0, c32038, 0, de5718
0xd2dc70: 94f6b00, c32038, 8632600, de5718

0xd2dc80: 0, de7768, 0, bc3760
0xd2dc80: 1fc0f40, de7768, 10f4b40, bc3760

0xd2dc90: 0, de7768, 0, e1fa00
0xd2dc90: 99e5700, cfc658, 228740, e1fa00

And after that things match as far as I've looked: no corruption.
===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@FreeBSD.ORG Thu Feb 19 07:51:56 2015
Subject: Re: PowerMac G5 powerpc64: new context where repeatedly booting varies between failing and working
From: Mark Millard
Date: Wed, 18 Feb 2015 23:51:51 -0800
To: FreeBSD PowerPC ML, Nathan Whitehorn

My variant of your patch did not fix the sometimes-corruption (which typically happens at the same place in the boot sequence). The same good vs. bad values result, and the overall range of the corruption is similar as well.

I have no hint that ofwcall itself (outside openfirmware) is writing the memory locations that end up corrupted. As far as I can see, the pattern of corruption in the picture that I sent makes no sense for that stage doing it. But...

It would seem that either (A) openfirmware itself wrote those corrupted locations or (B) some form of dynamic binding is involved and injected some one-time code that did it.

In part I say this for (A) because at that point the openfirmware exception vectors are supposed to be in place, so exception handling would be openfirmware code too, as far as I know.

I wish I had a Logic Analyzer configuration for the G5 processor to record and analyze activity with. I've not figured out a way to get useful evidence from the context.

Thinking about it, if (A) is the issue: the patch uses storage locations in the FreeBSD powerpc64 ABI way (and places), but Apple's openfirmware on the G5s likely uses a Darwin PowerPC ABI style: it does not even use TOCs and treats %r2 as a general-purpose volatile register.

In fact, as I remember, when I looked at the openfirmware entry's first few (under a dozen) instructions with x/i in ddb, they were something like:

or r2,r0,r2,          (a form of replaceable no-op given what follows?)
addis r2,r0,-0x49     (so %r2 ends up as 0xFFB70000 as a 32-bit interpretation?)
ori r2,r2,0xf00       (so %r2 updates to 0xFFB70F00 as a ...?)
std r1,r2,0x8,
std r0,r2,0x10,
mfspr r0,lr
std r0,r2,0x120,
mfmsr r1
std r1,r2,0x108,

In other words: %r2's initial value is ignored. Its value is quickly set and then used to point to a memory area to write to.

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-18, at 09:53 PM, Mark Millard wrote:

Nathan W. wrote:

> Interesting. I'm assuming this is due to a bug in the 32-/64-bit ABI
> thunking that is required to call into Open Firmware. Could you see if
> the attached patch helps?
> -Nathan

It appears that direct use of TOC_ENTRY and such is not automatically/by-default available in 10.1-STABLE's context. Your basis for the patch is 11.0-CURRENT after the relocatable kernel changes.

My context, where I was lucky enough to get a memory layout that produces the failure and allows detecting the memory corruption (with a known way to quickly detect the specific corruption), is some range of 10.1-STABLE versions when my GENERIC64vtsc has a particular set of options enabled. I do not know how to take an arbitrary FreeBSD version and give it such a handy context for the issue. So I will stick with 10.1-STABLE as much as I can for investigating this issue.

There is also the issue of the "once very early for many boots: %r1 and %r3 corruption on openfirmware return". I've been using my hack to "retry at most once per ofwcall use" to make my G5 quad-core PowerMac context boot most of the time (rather than needing to power off then on, up to over a dozen times in a row, to get a successful boot). The super-early boot failure rate had been blocking most investigation activities until I used this type of hack.

The closest 10.1-STABLE partial match to your patch, mixed with my %r1/%r3 corruption handling, that I've come up with overall is as follows. (Tabs probably turned to spaces.) Do you think it is sufficient for what you want tested?
(My observations suggest that the non-volatile registers are preserved by openfirmware even when I've seen other problems. I used to use explicit storage instead, but switched to this style for the %r1/%r3-handling-hack part of the code because it works the same for relocatable and non-relocatable kernels. I've used %r29, %r28 and %r27 as needing to survive the openfirmware call. %r25 does not need to do so.)

Index: sys/powerpc/ofw/ofwcall64.S
===================================================================
--- sys/powerpc/ofw/ofwcall64.S (revision 278443)
+++ sys/powerpc/ofw/ofwcall64.S (working copy)
@@ -114,23 +114,64 @@
          * the old MSR so we can get them back later.
          */
         mr      %r5,%r1
-        lis     %r1,(ofwstk+OFWSTKSZ-32)@ha
-        addi    %r1,%r1,(ofwstk+OFWSTKSZ-32)@l
-        std     %r5,8(%r1)      /* Save real stack pointer */
-        std     %r2,16(%r1)     /* Save old TOC */
-        std     %r6,24(%r1)     /* Save old MSR */
+        lis     %r1,(ofwstk+OFWSTKSZ-64)@ha
+        addi    %r1,%r1,(ofwstk+OFWSTKSZ-64)@l
+        std     %r5,40(%r1)     /* Save real stack pointer */
+        std     %r2,48(%r1)     /* Save old TOC */
+        std     %r6,56(%r1)     /* Save old MSR */
         li      %r5,0
         stw     %r5,4(%r1)
         stw     %r5,0(%r1)
 
+        /* HACK: recording %r1 (FreeBSD SP) before openfirmware for use in
+         * possible retry and also for testing for corruption (net-change).
+         * %r29 is supposed to be non-volatile for darwin 32 bit ABI.
+         */
+        mr      %r29,%r1
+
+        /* HACK: recording %r3 before openfirmware for use in possible retry.
+         * %r28 is supposed to be non-volatile for darwin 32 bit ABI.
+         */
+        mr      %r28,%r3
+
+        /* HACK: recording %r4 before openfirmware for use in possible retry.
+         * %r27 is supposed to be non-volatile for darwin 32 bit ABI.
+         */
+        mr      %r27,%r4
+
         /* Finally, branch to OF */
         mtctr   %r4
         bctrl
 
+        /* HACK: check if %r1 was corrupted (had a net-change) */
+        cmpw    %r29,%r1
+        bne     2f      /* stack pointer corrupted so go retry once */
+
+        /* HACK Notes: the observed corruption had %r1 changed and %r1=%r3.
+         * This code is somewhat more general.
+         */
+
+        /* HACK: %r1 okay but check %r3 for being 0 or -1 vs. anything else */
+        xoris   %r25,%r3,0
+        cmpw    %r25,%r3
+        bne     2f      /* %r3 was neither 0 nor -1 so corruption: go retry once */
+
+1:      /* HACK: here both %r1 and %r3 appear to be okay:
+         * so sequential flow was for "no problems"
+         * but jumping here is a retry result being
+         * returned, possibly with forced-good values
+         * indicating an openfirmware error status (%r3=-1).
+         */
+
+        /* HACK removal: I've removed the mtsprg0 that put back
+         * FreeBSD's value to help with exceptions and
+         * DDB display for when %r1 was corrupted.
+         */
+
         /* Reload stack pointer and MSR from the OFW stack */
-        ld      %r6,24(%r1)
-        ld      %r2,16(%r1)
-        ld      %r1,8(%r1)
+        ld      %r6,56(%r1)
+        ld      %r2,48(%r1)
+        ld      %r1,40(%r1)
 
         /* Now set the real MSR */
         mtmsrd  %r6
@@ -168,6 +209,40 @@
         mtlr    %r0
         blr
 
+/* HACK: code for %r1 and/or %r3 corruption's single-retry */
+/* Still under openfirmware's msr, sprg0, stack values */
+
+2:      /* HACK: corruption observed so retry, restoring %r1 and %r3 first */
+        mr      %r1,%r29
+        mr      %r3,%r28
+        mtctr   %r27
+        bctrl
+
+        /* HACK: check if %r1 was corrupted (had a net-change) */
+        cmpw    %r29,%r1
+        bne     3f      /* retry corrupted %r1
+                         * so go give up with %r3 being -1 and %r1 forced-good
+                         */
+
+        /* HACK Notes: the observed corruption had %r1 changed and %r1=%r3.
+         * This code is somewhat more general.
+         */
+
+        /* HACK: %r1 okay but check %r3 for being 0 or -1 vs. anything else */
+        xoris   %r25,%r3,0
+        cmpw    %r25,%r3
+        beq     1b      /* %r3 also was 0 or -1 so no corruption observed on retry
+                         * so go do a normal return
+                         */
+
+3:      /* Either %r1 had a net change after retry
+         * or %r3 was not one of 0,-1 after retry
+         * so force %r1 and have %r3 be -1 then go return
+         */
+        mr      %r1,%r29
+        li      %r3,-1  /* the openfirmware failure return value */
+        b       1b
+
 /*
  * RTAS 32-bit Entry Point. Similar to the OF one, but simpler (no separate
  * stack)

The context would be:

root@FBSDG5M1:/usr/src # svnlite status
?       .snap
M       sys/ddb/db_main.c
M       sys/ddb/db_script.c
M       sys/powerpc/conf
?       sys/powerpc/conf/GENERIC64vtsc
M       sys/powerpc/ofw/ofw_machdep.c
M       sys/powerpc/ofw/ofwcall64.S
M       sys/powerpc/powermac/platform_powermac.c
root@FBSDG5M1:/usr/src # svnlite info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn0.us-west.freebsd.org/base/stable/10
Relative URL: ^/stable/10
Repository Root: https://svn0.us-west.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 278443
Node Kind: directory
Schedule: normal
Last Changed Author: brooks
Last Changed Rev: 278443
Last Changed Date: 2015-02-09 01:22:47 -0800 (Mon, 09 Feb 2015)

The ddb changes are there to produce an automatic display on failure if the failure happens in a FreeBSD exception-vector context.

ofw_machdep.c does the check for corruption around its ofwcall.

platform_powermac.c has a printf that reports the expected pointer value just before the point where it has been observed to go bad.

ofwcall64.S: see above for whether it is acceptable.

root@FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64vtsc
include         GENERIC64

ident           GENERIC64vtsc

nooptions       PS3             #Sony Playstation 3 HACK!!! to allow sc

options         DDB             # HACK!!! to dump early crash info (but 11.0-CURRENT already has it)
options         GDB             # HACK!!! ...

options         VERBOSE_SYSINIT
options         BOOTVERBOSE=1
options         BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP
#options        KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
device          sc
#device         kbdmux          # HACK: already listed by vt
options         SC_OFWFB        # OFW frame buffer
options         SC_DFLT_FONT    # compile font in
makeoptions     SC_DFLT_FONT=cp437

# Disable extra checking typically used for FreeBSD 11.0-CURRENT:
nooptions       DEADLKRES               #Enable the deadlock resolver
nooptions       INVARIANTS              #Enable calls of extra sanity checking
nooptions       INVARIANT_SUPPORT       #Extra sanity checks of internal structures, required by INVARIANTS
nooptions       WITNESS                 #Enable checks to detect deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        #Don't run witness on spinlocks for speed
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones
root@FBSDG5M1:/usr/src # more /etc/make.conf
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
MALLOC_PRODUCTION=
root@FBSDG5M1:/usr/src # more /etc/src.conf
CFLAGS+=-DELF_VERBOSE
#WITH_DEBUG_FILES=
#WITHOUT_CLANG=
root@FBSDG5M1:/usr/src # more /boot/loader.conf
#kernel="kernel"
#kernel="kernel10.1RE"
kernel="kernel10.1S"
#kernel="kernel11C"
verbose_loading="YES"
kern.vty=vt

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-18, at 04:51 AM, Mark Millard wrote:

I modified openfirmware_core to check the status of the pointer value between most of its stages. With this I've also seen later failures than the usual one, such as after an OF_finddevice call's ofwcall returns.

And the change greatly narrows down the stage at which memory gets corrupted when it does fail...

        // OKAY HERE
        result = ofwcall(args);
        // SOMETIMES CORRUPTED HERE

Unfortunately, to get this far, ofwcall is my variant, in order to (for example) enable recovery/retry from the bad r1/r3 register values that were observed super-early on return from openfirmware in a high percentage of my boot attempts. I have yet to see how close to normal I can get ofwcall while still allowing this type of test.
The relevant detection code in openfirmware_core is...

/* HACK */
extern void** authnone_create(void);
...
static __inline void
ofw_restore_trap_vec(char *restore_trap_vec)
{
        if (!ofw_real_mode)
                return;

        bcopy(restore_trap_vec, (void *)EXC_RST, EXC_LAST - EXC_RST);
        __syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD);
}
...
static int
openfirmware_core(void *args)
{
        int             result;
        register_t      oldmsr;

        /* HACK */
        void** jnk1pp;
        void** jnk2pp;
        void* jnk = *authnone_create();
        if (jnk == *authnone_create()) jnk = *authnone_create();

        /*
         * Turn off exceptions - we really don't want to end up
         * anywhere unexpected with PCPU set to something strange
         * or the stack pointer wrong.
         */
        oldmsr = intr_disable();

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        ofw_sprg_prepare();

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        /* Save trap vectors */
        ofw_save_trap_vec(save_trap_of);

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        /* Restore initially saved trap vectors */
        ofw_restore_trap_vec(save_trap_init);

        /* HACK */
        jnk1pp = authnone_create();

#if defined(AIM) && !defined(__powerpc64__)
        /*
         * Clear battable[] translations
         */
        if (!(cpu_features & PPC_FEATURE_64))
                __asm __volatile("mtdbatu 2, %0\n"
                                 "mtdbatu 3, %0" : : "r" (0));
        isync();
#endif

        result = ofwcall(args);

        /* HACK */
        jnk2pp = authnone_create();

        /* Restore trap vectors */
        ofw_restore_trap_vec(save_trap_of);

        /* HACK */
        if (jnk != *jnk1pp) jnk = *authnone_create();
        if (jnk != *jnk2pp) jnk = *authnone_create();
        /* Note: *jnk2pp above is what detects the bad pointer value when it goes bad */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        ofw_sprg_restore();

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        intr_restore(oldmsr);

        /* HACK */
        if (jnk == *authnone_create()) jnk = *authnone_create();

        return (result);
}

In the code this translates to...
00000000008a671c <.openfirmware_core+0x168> bl      00000000007a3de4 <.authnone_create>
00000000008a6720 <.openfirmware_core+0x16c> crmove  4*cr7+so,4*cr7+so
00000000008a6724 <.openfirmware_core+0x170> mr      r28,r3

Note: The above loads r28 with a good address that does not fail when later dereferenced (while FreeBSD's exception vectors are in place).

00000000008a6728 <.openfirmware_core+0x174> mr      r3,r29
00000000008a672c <.openfirmware_core+0x178> bl      00000000008ac930 <.ofwcall>
00000000008a6730 <.openfirmware_core+0x17c> crmove  4*cr7+so,4*cr7+so
00000000008a6734 <.openfirmware_core+0x180> mr      r26,r3
00000000008a6738 <.openfirmware_core+0x184> bl      00000000007a3de4 <.authnone_create>
00000000008a673c <.openfirmware_core+0x188> crmove  4*cr7+so,4*cr7+so
00000000008a6740 <.openfirmware_core+0x18c> mr      r29,r3

Note: The above loads r29 with the bad address that is later detected by dereferencing it. This is the corrupted pointer value.

00000000008a6744 <.openfirmware_core+0x190> ld      r3,21216(r2)
00000000008a6748 <.openfirmware_core+0x194> lwz     r0,0(r3)
00000000008a674c <.openfirmware_core+0x198> cmpwi   cr7,r0,0
00000000008a6750 <.openfirmware_core+0x19c> beq+    cr7,00000000008a6778 <.openfirmware_core+0x1c4>
00000000008a6754 <.openfirmware_core+0x1a0> addi    r3,r3,16
00000000008a6758 <.openfirmware_core+0x1a4> li      r4,256
00000000008a675c <.openfirmware_core+0x1a8> li      r5,11776
00000000008a6760 <.openfirmware_core+0x1ac> bl      00000000008c158c <.bcopy>
00000000008a6764 <.openfirmware_core+0x1b0> crmove  4*cr7+so,4*cr7+so
00000000008a6768 <.openfirmware_core+0x1b4> li      r3,0
00000000008a676c <.openfirmware_core+0x1b8> li      r4,12032
00000000008a6770 <.openfirmware_core+0x1bc> bl      00000000008d5358 <.__syncicache>

Note: At this point it is back to FreeBSD exception vectors, so the kernel debugger display will work for the bad-pointer detection tests.
00000000008a6774 <.openfirmware_core+0x1c0> crmove  4*cr7+so,4*cr7+so
00000000008a6778 <.openfirmware_core+0x1c4> ld      r0,0(r28)

Note: The above dereference of the pre-ofwcall pointer value (in r28) does not detect a bad pointer.

00000000008a677c <.openfirmware_core+0x1c8> cmpd    cr7,r0,r30
00000000008a6780 <.openfirmware_core+0x1cc> beq-    cr7,00000000008a6790 <.openfirmware_core+0x1dc>
00000000008a6784 <.openfirmware_core+0x1d0> bl      00000000007a3de4 <.authnone_create>
00000000008a6788 <.openfirmware_core+0x1d4> crmove  4*cr7+so,4*cr7+so
00000000008a678c <.openfirmware_core+0x1d8> ld      r30,0(r3)
00000000008a6790 <.openfirmware_core+0x1dc> ld      r0,0(r29)

It is that last instruction (.openfirmware_core+0x1dc) that "detects" the bad pointer and leads to a kernel debugger display of some of the corrupted memory, including the stored pointer that the above code accessed and dereferenced to detect the problem.

So the pointer was good just before the ofwcall and was bad just after it.

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-17, at 09:34 PM, Mark Millard wrote:

[I had sent Nathan W. and Justin H. a picture of a display of a boot-time corrupted memory region. This time I tried to find the start and end of the region, and I'm documenting it in a textual form more appropriate to the list. I have also removed the prior email history from this email, but there is much context that one must check that history for.]

Several of the new values put in place by the .got memory corruption reported below match up with .opd or other types of addresses reported by objdump for my /boot/kernel10.1S/kernel. They are noted below as I list the detailed differences.

I made the early-boot-crash display cover a larger range, and the span of the corruption of part of the .got area seemed to go as follows.
Also I induced a dereference of the bad pointer as soon as it is discovered after the OF_peer(0) in question returns, so later code would not be involved when it crashes. (Crash early, crash often...)

Overall structure:

0xd2da37 and before, as far as I looked: no corruption found.
0xd2da38-0xd2dc9f: largely corrupted. 0x268 or 616 bytes or so in this corrupted range (616 = 77*8).
After that range: good again as far as I looked.

The details:

Warning: The below is based on hand-transcribed information from screen pictures that I took.

Showing pairs of lines (good, then corrupted), using x/x style lines:

0xd2da30: 0, b4fd2c, 0, b4fd70
0xd2da30: 0, b4fd2c, 0, 0

0xd2da40: 0, e28948, 0, e1e460
0xd2da40: 0, 24000042, 0, d00058
(24000042 looks like a cr value?)
(0000000000d00058 l .opd 0000000000000018 ofw_rendezvous_dispatch)

0xd2da50: 0, bc7de8, 0, bc7e08
0xd2da50: 0, cde110, c0000000, 8740
(0xc000000000008740 looks like a stack address?)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2da60: 0, cd8470, 0, bd2608
0xd2da60: 0, 1, 0, c3a30c
(0000000000c3a30c g .data 0000000000000000 ofw_sprg0_save)

0xd2da70: 0, bb5ea0, 0, b70870
0xd2da70: 0, 1c35ec0, 0, 0

0xd2da80: 0, c49918, 0, bc7e18
0xd2da80: 0, 44000022, 0, de4b30
(44000022 looks like a cr value?)
(0000000000de4b30 g O .bss 0000000000000460 thread0)

0xd2da90: 0, b720a0, 0, b71370
0xd2da90: 90000000, 1032, 0, ff846d78
(9000000000001032 looks like a SRR1 value.)
(ff846d78 is openfirmware entry point?)

0xd2daa0: 0, bc7e30, 0, bc7e58
0xd2daa0: 0, e39080, 10000000, 3030
(0000000000e39080 g O .bss 0000000000020000 __pcpu)
(1000000000003030 looks like a SRR1 value?)

0xd2dab0: 0, bc7e80, 0, bc7eb0
0xd2dab0: c0000000, 83b0, 0, c3a280
(0xc0000000000083b0 looks like a stack address?)
(c3a280 is inside my PowerMac G5 specific hack's ofwstk area: c392a0 up to c3a2a0)
(I've been gathering evidence about early-boot G5 crashes.)

0xd2dac0: 0, bc7ed0, 0, cf2960
0xd2dac0: 0, c40000, 0, c40000

0xd2dad0: 0, bc7f00, 0, bc7f28
0xd2dad0: 0, c40000, 0, c40000

0xd2dae0: 0, b72400, 0, bc7f28
0xd2dae0: c0000000, 8740, 0, cde110
(0xc000000000008740 looks like a stack address?)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2daf0: 0, cf2b28, 0, b716a0
0xd2daf0: 0, d00058, 0, cde110
(d00058 was also at 0xd2da4c and was followed by cde110 there.)
(0000000000cde110 g F .opd 0000000000000018 smp_no_rendevous_barrier)

0xd2db00: 0, cf2b88, 0, cf2b70
0xd2db00: 0, e6c280, 0, 0
(e6c280 is inside the emergency_buffer.7752 area: e6c278 up to e6c378)

0xd2db10: 0, cf2b58, 0, 8480
0xd2db10: 90000000, 1032, c0000000, 8740
(9000000000001032 looks like a SRR1 value?)
(0xc000000000008740 looks like a stack address?)

0xd2db20: 0, c2d920, 0, cf2b10
0xd2db20: 0, c2d920, 0, cf2b10
(Yep: unchanged!)

0xd2db30: 0, b71718, 0, c49888
0xd2db30: 0, ff846734, 10000000, 3030
(ff846734 would seem to be an openfirmware code address?)
(1000000000003030 looks like a SRR1 value?)

0xd2db40: 0, c498a0, 0, c54000
0xd2db40: 0, c498a0, 0, ff846d78
(Yep: c498a0 was unchanged.)
(ff846d78 is openfirmware entry point?)

0xd2db50: 0, e313a8, 0, e31608
0xd2db50: 24000042, e313a8, 0, 0
(24000042 looks like a cr value?)
(Trying to store to address 0x2400004200e313a8 for a specific type of 10.1-STABLE build is how the problem was originally noticed.)

0xd2db60: 0, c31f80, 0, bc81e8
0xd2db60: 0, c31f80, 0, 0
(Yep: 0x0000000000c31f80 is unchanged.)

0xd2db70: 0, e31408, 0, bc8228
0xd2db70: 200000, e31408, 0, bc8228
(Yep: only the 0x200000 was a change.)

0xd2db80: 0, c32488, 0, bc8238
0xd2db80: 0, 1, 10000000, 3030
(1000000000003030 looks like a SRR1 value?)

0xd2db90: 0, e1e460, 0, c31fc0
0xd2db90: 0, 0, 0, 7ff7e800

0xd2dba0: 0, e31608, 0, bc8260
0xd2dba0: 0, 1000000a, 0, bc8260
(Yep: 0x0000000000bc8260 unchanged.)

0xd2dbb0: 0, e1e460, 0, e1fa60
0xd2dbb0: 0, e1e460, 0, e1fa60
(Yep: unchanged!)

0xd2dbc0: 0, bc8288, 0, c32488
0xd2dbc0: 111081, 0, fd3c2000, 0
(fd3c2000 in openfirmware area?)

0xd2dbd0: 0, e3153c, 0, bc8298
0xd2dbd0: 10, 0, 0, 0

Now a few unchanged: 0xd2dbe0-0xd2dc1f.

Then a change in the pattern of corruptions for the rest of the corrupted area:

0xd2dc20: 0, bc8288, 0, bc82e8
0xd2dc20: 0, bc8288, 127f500, bc82e8

Note how bc8288 and bc82e8 did not change. From here on those two columns are not corrupted but the other two are.

0xd2dc30: 0, bc8300, 0, c32488
0xd2dc30: 8000000, bc8300, e7d540, c32488

0xd2dc40: 0, b4fef0, 0, e31558
0xd2dc40: ecc40, b4fef0, 84eec80, e31558

0xd2dc50: 0, bc8308, 0, cf2f00
0xd2dc50: 1e85440, bc8308, 8766200, cf2f00

0xd2dc60: 0, bc8310, 0, bc8350
0xd2dc60: fb9040, bc8310, 93bb000, bc8350

0xd2dc70: 0, c32038, 0, de5718
0xd2dc70: 94f6b00, c32038, 8632600, de5718

0xd2dc80: 0, de7768, 0, bc3760
0xd2dc80: 1fc0f40, de7768, 10f4b40, bc3760

0xd2dc90: 0, de7768, 0, e1fa00
0xd2dc90: 99e5700, cfc658, 228740, e1fa00

And after that things match as far as I've looked: no corruptions.
===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@FreeBSD.ORG Thu Feb 19 11:38:50 2015
From: Mark Millard
Subject: Fixing powerpc64 /boot/loader's kernel page handing: suggestions?
Message-Id: <229FBFAB-B198-4F79-827D-D381DE716593@dsl-only.net>
Date: Thu, 19 Feb 2015 03:38:45 -0800
To: FreeBSD PowerPC ML , Nathan Whitehorn

Nathan W wrote:

> Thanks for diagnosing this! The syncicache spans the whole kernel out of
> laziness. As you note, it isn't appropriate. If there are more instances
> of this kind of thing, then it might make sense to try to make ld emit
> only one PT_LOAD program section as a long-term solution. I'll look into
> that soon.
> -Nathan

In https://lists.freebsd.org/pipermail/freebsd-ppc/2015-February/007415.html I reported that I had accidentally caused the unreferenced pages to exist via a .align that was larger than a page: I typed the target number instead of its power of 2. It was only "luck" that my prior builds had been close to the large .align boundary and so happened not to get any extra pages from the mistake.

Without this oddity in the .align, only one PT_LOAD is present in any of my builds. But with the oddity, such breaks can cause multiple PT_LOADs to be generated with holes between them.

As things stand there is an implicit rule that no pages can be wasted: no holes that contain a full page or more are allowed. But a kernel with such a hole shows non-obvious behavior and fails at a very early stage, where it is messy to figure out what happened.

An alternative for the issue of "holes" might be a message reporting the issue as the reason the load/boot is rejected, sort of like detecting and reporting other types of problems with using a messed-up file as a kernel.
Maybe even just report whenever there is more than one PT_LOAD, even if no pages would make a hole.

Then avoiding isync'ing memory regions that are not appropriate would be an independent point that could still be addressed on its own.

===
Mark Millard
markmi at dsl-only.net