Date:      Wed, 11 Feb 2015 05:00:37 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>
Subject:   Re: Fixing powerpc64 /boot/loader's kernel page handing: suggestions? [my screwup!]
Message-ID:  <3B68D71E-806B-4826-A603-1141E70A281C@dsl-only.net>
In-Reply-To: <B756B0BB-D15F-4E4B-8A61-93EEBA7BD464@dsl-only.net>
References:  <B756B0BB-D15F-4E4B-8A61-93EEBA7BD464@dsl-only.net>

The unreferenced pages were *my* screw-up, so I have the answer to my
questions:

I typed the target alignment, 16, as the argument to .align instead of
typing the power of 2 (i.e., 4), and then repeatedly read my intent into
the source rather than what it actually said/implied. For a notable
stretch of my activities, what precedes .data just happened to end within
a page of such a 2**16 boundary, so no unreferenced pages showed up. The
mistake has been in my environment for a long time, even though I only
recently discovered the crash issue with recent builds that I'd done.

So the answer to my questions was option (A): Avoid having the link
produce the unreferenced pages. (That is, keep alignments smaller than a
page, which in turn means specifying the power of 2 like I should have.)
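
To make the .align arithmetic concrete, here is a minimal C sketch (not
taken from binutils or FreeBSD) of rounding an address up to a 2**n
boundary, which is how the .align argument was being interpreted here.
The name align_up_pow2 is made up for illustration. The sample address
0xbe017c is simply the byte after the .text range quoted later in this
message; nothing here shows that this particular .align is what placed
.data.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of 2**log2_align.
 * Hypothetical helper: the argument is an exponent, not a byte count,
 * mirroring the ".align 4 means 16 bytes" reading described above.
 */
uint64_t
align_up_pow2(uint64_t addr, unsigned log2_align)
{
	uint64_t mask;

	assert(log2_align < 64);	/* shift below must stay defined */
	mask = ((uint64_t)1 << log2_align) - 1;
	return ((addr + mask) & ~mask);
}
```

With that sample address, an argument of 16 rounds up to 0xbf0000 (a
64 KiB boundary), while an argument of 4 rounds up only to 0xbe0180.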

At least I got an exploration of more of the FreeBSD code out of my
mistake...

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-10, at 05:16 PM, Mark Millard <markmi at dsl-only.net> wrote:

Context:

Unfortunately this takes me a bit to describe...

powerpc64 FreeBSD 10.1-??? variants on a PowerMac G5 Quad-Core, built on
the same machine. I expect the issue applies to some plain powerpc
contexts as well as some other powerpc64 contexts. An example context
where my issue occurs is:

> 10.1-RELEASE-p5
> 10.1-RELEASE-p5
> FreeBSD FBSDG5M1 10.1-RELEASE-p5 FreeBSD 10.1-RELEASE-p5 #0 r277808M: Fri Jan 30 00:58:33 PST 2015     root@FBSDG5M1:/usr/obj/usr/home/markmi/src_10_1_releng/sys/GENERIC64vtsc  powerpc

But I also get it for various vintages of 10.1-STABLE (and
11.0-CURRENT). I use 10.1-RELEASE-p5 here because I happen to have a
build that avoids the problem, I know what to set to regenerate that
build, and I know at least one thing to turn on in builds to create the
problem.

> root@FBSDG5M1:/usr/home/markmi/src_10_1_releng # more sys/powerpc/conf/GENERIC64vtsc
> include GENERIC64
> ident   GENERIC64vtsc
>
> nooptions       PS3                     #Sony Playstation 3               HACK!!! to allow sc
>
> options         DDB                     # HACK!!! to dump early crash info (but 11.0-CURRENT already has it)
> options         GDB                     # HACK!!! ...
> options         VERBOSE_SYSINIT         # VERBOSE_SYSINT blocks direct booting for my 10.1-RELEASE-p5 variants: Crashes when the loader is in __syncicache doing dcbst's.
> options         BOOTVERBOSE=1
> options         BOOTHOWTO=RB_VERBOSE
> #options        KTR
> #options        KTR_MASK=KTR_TRAP
> #options        KTR_CPUMASK=0xF
> #options        KTR_VERBOSE
>
> # HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
> device          sc
> #device          kbdmux         # HACK: already listed by vt
> options         SC_OFWFB        # OFW frame buffer
> options         SC_DFLT_FONT    # compile font in
> makeoptions     SC_DFLT_FONT=cp437
>
>
> # Disable extra checking typically used for FreeBSD 11.0-CURRENT:
> nooptions       DEADLKRES               #Enable the deadlock resolver
> nooptions       INVARIANTS              #Enable calls of extra sanity checking
> nooptions       INVARIANT_SUPPORT       #Extra sanity checks of internal structures, required by INVARIANTS
> nooptions       WITNESS                 #Enable checks to detect deadlocks and cycles
> nooptions       WITNESS_SKIPSPIN        #Don't run witness on spinlocks for speed
> nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones


My temporarily extended ELF_VERBOSE code (plus other printf's) also
reports on non-PT_LOAD entries, which are otherwise skipped. For booting
various 10.1-??? kernel builds it reports the sequence:

PT_PHDR
PT_INTERP
PT_LOAD (for .text)
   (using archsw.arch_copyin then kern_pread)
   Address range example: 0x100000-0xbe017b
<note: some builds have unreferenced pages between the 2 PT_LOADs>
PT_LOAD (for .data)
   (using kern_pread)
   Address range for the same example: 0xbf0000-0xea4b7f
PT_DYNAMIC
PT_GNU_STACK
symtab
strtab
   Final address for the same example: 0x1114baf

The issue happens when there are such unreferenced pages where I
indicated. For the build that started this investigation, commenting out
VERBOSE_SYSINIT in GENERIC64vtsc (listed earlier) makes the unreferenced
pages disappear, while enabling VERBOSE_SYSINIT produces them (holding
the rest of the context constant). But this is not the only way to get
such unreferenced pages. For example, my 10.1-STABLE build has
unreferenced pages but does not have VERBOSE_SYSINIT (yet).
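
The gap check can be sketched in C as follows. This is an illustration
only, not loader code: struct load_range and unreferenced_pages are
hypothetical names, and the 4 KiB page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u	/* assumption: 4 KiB pages */

/* Hypothetical, simplified view of one PT_LOAD: the bytes [start, end). */
struct load_range {
	uint64_t start;	/* first byte loaded */
	uint64_t end;	/* one past the last byte loaded */
};

/*
 * Count whole pages lying between two loaded ranges that neither range
 * touches. lo must end at or before the start of hi.
 */
uint64_t
unreferenced_pages(const struct load_range *lo, const struct load_range *hi)
{
	uint64_t mask = PAGE_SIZE_ASSUMED - 1;
	uint64_t first_free, first_used;

	assert(lo->end <= hi->start);
	/* First page boundary at or after the end of lo. */
	first_free = (lo->end + mask) & ~mask;
	/* Start of the page that holds the first byte of hi. */
	first_used = hi->start & ~mask;
	if (first_used <= first_free)
		return (0);
	return ((first_used - first_free) / PAGE_SIZE_ASSUMED);
}
```

For the example ranges above (.text ending at 0xbe017b, .data starting
at 0xbf0000) this counts 15 pages, 0xbe1000 through 0xbeffff, that
neither PT_LOAD touches.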

When there are unreferenced pages between the two PT_LOADs those pages
do not get archsw.arch_copyin or kern_pread handling. (kern_pread in
turn uses archsw.arch_readin.)

For my PowerMac G5 Quad-Core context those archsw.arch_<?> routines end
up being ofw_copyin and ofw_readin. Those routines in turn call
ofw_mapmem, which includes doing:

>        if (OF_call_method("claim", memory, 3, 1, destp, dlen, 0, &addr)
>            == -1) {
>                printf("ofw_mapmem: physical claim failed\n");
>                return (ENOMEM);
>        }
>
>        /*
>         * We only do virtual memory management when real_mode is false.
>         */
>        if (real_mode == 0) {
>                if (OF_call_method("claim", mmu, 3, 1, destp, dlen, 0, &addr)
>                    == -1) {
>                        printf("ofw_mapmem: virtual claim failed\n");
>                        return (ENOMEM);
>                }
>
>                if (OF_call_method("map", mmu, 4, 0, destp, destp, dlen, 0)
>                    == -1) {
>                        printf("ofw_mapmem: map failed\n");
>                        return (ENOMEM);
>                }
>        }

and during load-time this is what programs the PowerPC to have the PTEG
entries (and whatever else) that instructions like dcbst require (since
MSR[DR]=1).

The crashes occur at the first dcbst executed by __syncicache that
references the missing pages. (It seems unlikely that there is any other
usage of those pages.) The crash reports missing PTEG entries (DSISR for
IV 0x300). (Apple's Open Firmware word .registers shows the recorded
register status from the crash. After the crash the PowerMac is in
Apple's context, not FreeBSD's.)

The __syncicache use results from the following:

> int
> ppc64_ofw_elf_loadfile(char *filename, u_int64_t dest,
>    struct preloaded_file **result)
> {
>        int     r;
>
>        r = __elfN(loadfile)(filename, dest, result);
>        if (r != 0)
>                return (r);
>
>        /*
>         * No need to sync the icache for modules: this will
>         * be done by the kernel after relocation.
>         */
>        if (!strcmp((*result)->f_type, "elf kernel"))
>                __syncicache((void *) (*result)->f_addr, (*result)->f_size);
>        return (0);
> }

(As I remember, plain powerpc has a similar sequence with __syncicache.)
For some reason the __syncicache usage is set up to span into or beyond
the .data segment, not just the .text one. I do not know why.

__elfN(loadfile)'s interface is not designed to return multiple address
ranges. It returns one range that spans both PT_LOAD ranges (.text and
.data) and any unreferenced pages that are between them. (In fact it
spans even more afterwards, as I remember.)


Questions:

Does anyone know why the __syncicache use is set up to span into .data
(and more) rather than just .text, and would you be willing to explain a
little?


As far as solution directions go: this looks like a subject area
appropriate to general FreeBSD use, based on the available evidence. A
local personal hack does not seem appropriate. So...


A) Should the link of the kernel be producing a kernel with unreferenced
pages between the two PT_LOADs (between .text and .data)? Is the proper
fix to prevent those pages from existing in linked kernels?

vs.

B) Is it okay for those unreferenced pages to be there between the two
PT_LOADs? If yes...

B1) Should something like the ofw_mapmem activity be forced on those
otherwise unreferenced pages so that the later __syncicache use can stay
as it is?

vs.

B2) Should the unreferenced pages be skipped by making separate
__syncicache calls for each PT_LOAD (.text segment and then .data segment
and beyond(?))?

vs.

B3) Should only the .text segment be spanned by the __syncicache use?
Some other more specific range that avoids those unreferenced pages?


It would appear that all but (A) involve changing the interface provided
by __elfN(loadfile) and/or the interfaces it uses: the fix does not
appear well localized. (A) may have its own such issues but in other
code or files that I've not looked at.
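
Option (B2) can be sketched in portable C as below. It is purely
illustrative and is not the actual __syncicache or __elfN(loadfile)
code: walk_cache_lines, sync_segments, struct seg, struct gap_counter
and the callback are hypothetical names, the callback stands in for the
per-line dcbst/icbi work, and the 128-byte cache line is an assumption
for the G5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_ASSUMED 128u	/* assumption: 128-byte cache lines */

/* Hypothetical stand-in for the dcbst/icbi work done on one cache line. */
typedef void (*line_fn)(uint64_t line_addr, void *ctx);

/* Visit every cache line that overlaps [addr, addr + len). */
void
walk_cache_lines(uint64_t addr, uint64_t len, line_fn fn, void *ctx)
{
	uint64_t mask = CACHELINE_ASSUMED - 1;
	uint64_t end = addr + len;
	uint64_t p;

	assert(fn != NULL);
	for (p = addr & ~mask; p < end; p += CACHELINE_ASSUMED)
		fn(p, ctx);
}

/* Hypothetical description of one PT_LOAD range. */
struct seg {
	uint64_t addr;
	uint64_t len;
};

/*
 * One walk per segment, so cache lines in any gap between segments
 * are never visited.
 */
void
sync_segments(const struct seg *segs, size_t nsegs, line_fn fn, void *ctx)
{
	size_t i;

	for (i = 0; i < nsegs; i++)
		walk_cache_lines(segs[i].addr, segs[i].len, fn, ctx);
}

/* Demo callback state: counts visited lines that fall in [lo, hi). */
struct gap_counter {
	uint64_t lo, hi, hits;
};

void
count_gap_hits(uint64_t line_addr, void *ctx)
{
	struct gap_counter *g = ctx;

	if (line_addr >= g->lo && line_addr < g->hi)
		g->hits++;
}
```

Using the example addresses from earlier, walking the single range
0x100000-0xea4b7f visits 480 assumed-size cache lines inside the gap
0xbe1000-0xbeffff, while walking the two PT_LOAD ranges separately
visits none.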


===
Mark Millard
markmi at dsl-only.net
