Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 29 Jan 2019 16:30:06 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>, Nathan Whitehorn <nwhitehorn@freebsd.org>
Subject:   Re: Under usefdt=1 type of context for G5 "4 core" (system total): unload then boot /boot/kernel/kernel fails very early with: panic: Translation map has incorrect cell count (260/256)
Message-ID:  <EDDC5FC1-25AA-42A5-B9D4-CB6480F19C74@yahoo.com>
In-Reply-To: <9A6549AE-46FB-41B9-8F15-9E1AE3E16E2C@yahoo.com>
References:  <9A6549AE-46FB-41B9-8F15-9E1AE3E16E2C@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help


On 2019-Jan-28, at 23:42, Mark Millard <marklmi at yahoo.com> wrote:

> [The loader here has been modified to always do the usefdt=3D1 code.
> The modern VM_MAX_KERNEL_ADDRESS is in the kernel.]
>=20
> In trying to boot for other experiments I tried at the loader prompt:
>=20
> unload
> load /boot/kernel/kernel
> boot
>=20
> and:
>=20
> unload
> boot -v /boot/kernel/kernel
>=20
> Such combinations produced (typed from a picture of the screen,
> the missing text was really missing):
>=20
> GDB: no debug ports present
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> panic: Translation map has incorrect cell count (260/256)

Note that 260 is a multiple of 5 and 4 but 256 is not a multiple
of 5. Being a multiple of 5 is important for  "acells=3D=3D2" contexts.

Going through some of the code, first noting some context:

struct ofw_map {
        cell_t  om_va;
        cell_t  om_len;
        uint64_t om_pa;
        cell_t  om_mode;
};

This has 4 fields but one appears to be bigger than cell_t
if I understand right. This gets into if:

        OF_getencprop(OF_finddevice("/"), "#address-cells", &acells,
            sizeof(acells));

indicates 2 cells for om_pa (via acells=3D=3D2) or just 1 cell. So
when acells=3D=3D2 there are 5 cells per, otherwise there are 4.

Now to the routine in question:

static void
moea64_add_ofw_mappings(mmu_t mmup, phandle_t mmu, size_t sz)
{
        struct ofw_map  translations[sz/(4*sizeof(cell_t))]; /*>=3D 4 =
cells per */

For acells=3D=3D2 the above potentially over allocates translations =
some.
Later it turns out that if it does that may prevent out of bounds
writing into translations .


        pcell_t         acells, trans_cells[sz/sizeof(cell_t)];

It turns out that sz/sizeof(cell_t) at this point ended up as a
multiple of 4 but not of 5, despite acells=3D=3D2 being the case.
Later this leads to out of bounds accesses for trans_cells .

        struct pvo_entry *pvo;
        register_t      msr;
        vm_offset_t     off;
        vm_paddr_t      pa_base;
        int             i, j;
=20
        bzero(translations, sz);
        OF_getencprop(OF_finddevice("/"), "#address-cells", &acells,
            sizeof(acells));

This ended up with acells=3D=3D2 --so 5 cells per translation entry.

        if (OF_getencprop(mmu, "translations", trans_cells, sz) =3D=3D =
-1)
                panic("moea64_bootstrap: can't get ofw translations");

For acells=3D=3D2: sz/sizeof(cell_t) not being a multiple of 5 means
either some unused trans_cell entries or access outside the
trans_cells array in the loop below, depending on the loop test
used.

        CTR0(KTR_PMAP, "moea64_add_ofw_mappings: translations");
        sz /=3D sizeof(cell_t);

sz here ended up as 256, not a multiple of 5.

        for (i =3D 0, j =3D 0; i < sz; j++) {

The i < sz test is testing the first of the 4 or 5 being
below sz instead of testing the last of the 4 or 5. This
leads to out of bounds accesses into trans_cells below for
the 256 with acells=3D=3D2 case. It also means that the last
translations[j] ends up with some garbage content.

                translations[j].om_va =3D trans_cells[i++];
                translations[j].om_len =3D trans_cells[i++];
                translations[j].om_pa =3D trans_cells[i++];
                if (acells =3D=3D 2) {
                        translations[j].om_pa <<=3D 32;
                        translations[j].om_pa |=3D trans_cells[i++];
                }
                translations[j].om_mode =3D trans_cells[i++];
        }

The loop test for the above loop just looks wrong to me for avoid
bad accesses.

        KASSERT(i =3D=3D sz, ("Translations map has incorrect cell count =
(%d/%zd)",
            i, sz));

Having sz [ the original sz/sizeof(cell_t) ] not be a multiple of 5
for acells=3D=3D2 would fail this test even if the loop stopped without
going out of bounds on trans_cells (so i=3D=3D255 instead of i=3D=3D260).

For the original sz figure having sz/sizeof(cell_t) =3D=3D 256, the =
original
sz figure was not a multiple of 5 [presuming sizeof(cell_t) is not a
multiple of 5].

My guess is that the original sz value was wrong after the unload and
boot (reloading) and the loop structure should avoid going out of bounds
on trans_cells anyway.

> cpuid =3D 0
> time =3D 1
> KDB: stack backtrace:
> 0xc000000000f9e       kdb_backtrace+0x60
> 0xc000000000f9e       vpanic+0x258
> 0xc000000000f9e       panic+0x3c
> 0xc000000000f9e       moea64_late_bootstrap+0x2c0
> 0xc000000000f9e       moea64_bootstrap_native+0x20c
> 0xc000000000f9e       pmap_bootstrap_0xc8
> 0xc000000000f9e       powerpc_init+0x440
> 0xc000000000f9efc0: at=20
>=20
> (And that is all.)



The below tried to fix the loop to avoid out of bounds
trans_cells accesses and to allow my context to not panic
for the "multiple of 4 but not of 5" issue for the unload
then reload/boot sequence. The bias was for the code
changes to be easy to follow, given the earlier commentary.

# svnlite diff /usr/src/sys/powerpc/aim/mmu_oea64.c | more               =
                                                                         =
                                              Index: =
/usr/src/sys/powerpc/aim/mmu_oea64.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- /usr/src/sys/powerpc/aim/mmu_oea64.c        (revision 341836)
+++ /usr/src/sys/powerpc/aim/mmu_oea64.c        (working copy)
@@ -502,7 +502,7 @@
        register_t      msr;
        vm_offset_t     off;
        vm_paddr_t      pa_base;
-       int             i, j;
+       int             i, istep, j;
=20
        bzero(translations, sz);
        OF_getencprop(OF_finddevice("/"), "#address-cells", &acells,
@@ -512,7 +512,8 @@
=20
        CTR0(KTR_PMAP, "moea64_add_ofw_mappings: translations");
        sz /=3D sizeof(cell_t);
-       for (i =3D 0, j =3D 0; i < sz; j++) {
+       istep =3D (acells =3D=3D 2) ? 5 : 4;
+       for (i =3D 0, j =3D 0; i+istep-1 < sz; j++) {
                translations[j].om_va =3D trans_cells[i++];
                translations[j].om_len =3D trans_cells[i++];
                translations[j].om_pa =3D trans_cells[i++];
@@ -522,7 +523,7 @@
                }
                translations[j].om_mode =3D trans_cells[i++];
        }
-       KASSERT(i =3D=3D sz, ("Translations map has incorrect cell count =
(%d/%zd)",
+       KASSERT(i+sz%istep =3D=3D sz, ("Translations map has incorrect =
cell count (%d/%zd)",
            i, sz));
=20
        sz =3D j;

I do not claim the KASSERT should officially change but I expect that =
the
loop should.


=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?EDDC5FC1-25AA-42A5-B9D4-CB6480F19C74>