Date:      Wed, 12 Jun 2019 12:12:57 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD Toolchain <freebsd-toolchain@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, freeBSD PowerPC ML <freebsd-ppc@freebsd.org>, Conrad Meyer <cem@freebsd.org>
Cc:        Alfredo Dal Ava Junior <alfredo.junior@eldorado.org.br>, Justin Hibbits <jrh29@alumni.cwru.edu>
Subject:   Re: kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?
Message-ID:  <CF4D6785-F512-4DE7-BF61-7C0CF5B6E099@yahoo.com>
In-Reply-To: <D1093D97-C7B5-4370-9C75-507D1EB98D03@yahoo.com>
References:  <1464D960-A1D6-404A-BB10-E615E2D14C1D@yahoo.com> <CAG6CVpV5FBHgOTgxEgRmP%2B46Vm7mxoPCPECDJiq3k=D4qZ8PCA@mail.gmail.com> <4003198F-C11B-4587-910B-2001DC09F538@yahoo.com> <47E002B7-D4A1-4C4B-BFFD-D926263D895E@yahoo.com> <48148449-93B0-446C-AA28-F211FFAE1A8B@yahoo.com> <86F7C4C4-2BB6-40F0-B5D3-C80ECB4A97CF@yahoo.com> <D1093D97-C7B5-4370-9C75-507D1EB98D03@yahoo.com>

[Looks to me like the ->valid mask only is used for the
last page of the /sbin/init file, not based on the size
and alignment of the data requested for the PT_LOAD.]

On 2019-Jun-11, at 21:53, Mark Millard <marklmi at yahoo.com> wrote:

> [The garbage after .got up to the page boundary is
> .comment section strings. The context here is
> targeting 32-bit powerpc via system-clang-8 and
> devel/powerpc64-binutils for buildworld and
> buildkernel . ]
>
> On 2019-Jun-11, at 19:55, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [I have confirmed .sbss not being zero'd out and environ
>> thereby starting out non-zero (garbage): a
>> debug.minidump=0 style dump.]
>>
>>> On 2019-Jun-10, at 16:19, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>> . . . (omitted) . . .
>>
>> I used debug.minidump=0 in /boot/loader.conf for
>> causing a dump for the crash and a libkvm modified
>> enough for my working boot environment to allow me
>> to examine the memory-image bytes of such a dump,
>> with libkvm used via /usr/local/bin/kgdb . (No support
>> of automatically translating user-space addresses
>> or other such.)
>>
>> For the clang based debug buildworld and debug buildkernel
>> context with /sbin/init having:
>>
>> [16] .got              PROGBITS        01956ccc 146ccc 000010 04 WAX  0   0  4
>> [17] .sbss             NOBITS          01956cdc 146cdc 0000b0 00  WA  0   0  4
>> [18] .bss              NOBITS          01956dc0 146cdc 02ee28 00  WA  0   0 64
>>
>> I confirmed that .sbss in /sbin/init's address space
>> is not zeroed (so environ is not assigned by handle_argv ).
>> I also confirmed that _start was given a good env value
>> (in %r5) based on where the value was stored on the
>> stack. It is just that the value was not used.
>>
>> The detailed obvious-failure point (crash) can change based
>> on the garbage in the .sbss and, for the build that I used
>> this time, that happened in __je_arena_malloc_hard instead
>> of before _init_tls called _libc_allocate_tls . (I traced
>> the call chain in the dump.)
>>
>>
>> From what I've seen in the dump there seem to be special
>> uses of some values (that also have normal uses, of
>> course):
>>
>> 0xfa5005af: as yet invalid page content.
>> 0x1c000020: as yet unassigned user-space-stack memory for /sbin/init.
>>
>> These are the same locations that I previously reported as
>> showing up in the DSI read trap reports for /sbin/init failing.
>> The specific build here failed with a different value.
>>
>> For reference relative to libkvm:
>>
>> # svnlite diff /usr/src/lib/libkvm/
>> Index: /usr/src/lib/libkvm/kvm_powerpc.c
>> ===================================================================
>> --- /usr/src/lib/libkvm/kvm_powerpc.c	(revision 347549)
>> +++ /usr/src/lib/libkvm/kvm_powerpc.c	(working copy)
>> @@ -211,6 +211,53 @@
>> 	if (be32toh(vm->ph->p_paddr) == 0xffffffff)
>> 		return ((int)powerpc_va2off(kd, va, ofs));
>>
>> +	// HACK in something for what I observe in
>> +	// a debug.minidump=0 vmcore.* for 32-bit powerpc
>> +	//
>> +	if (  be32toh(vm->ph->p_vaddr)  == 0xffffffff
>> +	   && be32toh(vm->ph->p_paddr)  == 0
>> +	   && be16toh(vm->eh->e_phnum)  == 1
>> +	   ) {
>> +		// Presumes p_memsz is either unsigned
>> +		// 32-bit or is 64-bit, same for va .
>> +
>> +		if (be32toh(vm->ph->p_memsz) <= va)
>> +			return 0; // Like powerpc_va2off
>> +
>> +		// If ofs was (signed) 32-bit there
>> +		// would be a problem for sufficiently
>> +		// large positive memsz's and va's
>> +		// near the end --because of p_offset
>> +		// and dmphdrsz causing overflow/wrapping
>> +		// for some large va values.
>> +		// Presumes 64-bit ofs for such cases.
>> +		// Also presumes dmphdrsz+p_offset
>> +		// is non-negative so that small
>> +		// non-negative va values have no
>> +		// problems with ofs going negative.
>> +
>> +		*ofs =    vm->dmphdrsz
>> +			+ be32toh(vm->ph->p_offset)
>> +			+ va;
>> +
>> +		// The normal return value overflows/wraps
>> +		// for p_memsz == 0x80000000u when va == 0 .
>> +		// Avoid this by depending on calling code's
>> +		// loop for sufficiently large cases.
>> +		// This code presumes p_memsz/2 <= MAX_INT .
>> +		// 32-bit powerpc FreeBSD does not allow
>> +		// using more than 2 GiBytes of RAM but
>> +		// does allow using 2 GiBytes on 64-bit
>> +		// hardware.
>> +		//
>> +		if (  (int)be32toh(vm->ph->p_memsz) < 0
>> +		   && va < be32toh(vm->ph->p_memsz)/2
>> +		   )
>> +			return be32toh(vm->ph->p_memsz)/2;
>> +
>> +		return be32toh(vm->ph->p_memsz) - va;
>> +	}
>> +
>> 	_kvm_err(kd, kd->program, "Raw corefile not supported");
>> 	return (0);
>> }
>> Index: /usr/src/lib/libkvm/kvm_private.c
>> ===================================================================
>> --- /usr/src/lib/libkvm/kvm_private.c	(revision 347549)
>> +++ /usr/src/lib/libkvm/kvm_private.c	(working copy)
>> @@ -131,7 +131,9 @@
>> {
>>
>> 	return (kd->nlehdr.e_ident[EI_CLASS] == class &&
>> -	    kd->nlehdr.e_type == ET_EXEC &&
>> +	    (  kd->nlehdr.e_type == ET_EXEC ||
>> +	       kd->nlehdr.e_type == ET_DYN
>> +	    ) &&
>> 	    kd->nlehdr.e_machine == machine);
>> }
>>
>>
>>
>
> The following is what is in the .sbss/.bss up to the page
> boundary (after the .got bytes):
>
> (kgdb) x/s 0x2a66cdc
> 0x2a66cdc:	"$FreeBSD: head/lib/csu/powerpc/crt1.c 326219 2017-11-26 02:00:33Z pfg $"
>
> (kgdb) x/s 0x2a66d24
> 0x2a66d24:	"$FreeBSD: head/lib/csu/common/crtbrand.c 340701 2018-11-20 20:59:49Z emaste $"
>
> (kgdb) x/s 0x2a66d72
> 0x2a66d72:	"$FreeBSD: head/lib/csu/common/ignore_init.c 340702 2018-11-20 21:04:20Z emaste $"
>
> (kgdb) x/s 0x2a66dc3
> 0x2a66dc3:	"FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)"
>
> (kgdb) x/s 0x2a66e15
> 0x2a66e15:	"$FreeBSD: head/lib/csu/powerpc/crti.S 217399 2011-01-14 11:34:58Z kib $"
>
> (kgdb) x/s 0x2a66e5d
> 0x2a66e5d:	"$FreeBSD: head/sbin/mount/getmntopts.c 326025 2017-11-20 19:49:47Z pfg $"
>
> (kgdb) x/s 0x2a66ea6
> 0x2a66ea6:	"$FreeBSD: head/lib/libutil/login_tty.c 334106 2018-05-23 17:02:12Z jhb $"
>
> (kgdb) x/s 0x2a66eef
> 0x2a66eef:	"$FreeBSD: head/lib/libutil/login_class.c 296723 2016-03-12 14:54:34Z kib $"
>
> (kgdb) x/s 0x2a66f83
> 0x2a66f83:	"$FreeBSD: head/lib/libutil/_secure_path.c 139012 2004-12-18 12:31:12Z ru $"
>
> (kgdb) x/s 0x2a66fce
> 0x2a66fce:	"$FreeBSD: head/lib/libcrypt/crypt.c 326219 2017-11
>
> (I truncated that last to avoid the 0xfa5005af's on the next page
> in RAM.)
>=20
> Compare ( from readelf /sbin/init ):
>
> String dump of section '.comment':
>  [     0]  $FreeBSD: head/lib/csu/powerpc/crt1.c 326219 2017-11-26 02:00:33Z pfg $
>  [    48]  $FreeBSD: head/lib/csu/common/crtbrand.c 340701 2018-11-20 20:59:49Z emaste $
>  [    96]  $FreeBSD: head/lib/csu/common/ignore_init.c 340702 2018-11-20 21:04:20Z emaste $
>  [    e7]  FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)
>  [   139]  $FreeBSD: head/lib/csu/powerpc/crti.S 217399 2011-01-14 11:34:58Z kib $
>  [   181]  $FreeBSD: head/sbin/mount/getmntopts.c 326025 2017-11-20 19:49:47Z pfg $
>  [   1ca]  $FreeBSD: head/lib/libutil/login_tty.c 334106 2018-05-23 17:02:12Z jhb $
>  [   213]  $FreeBSD: head/lib/libutil/login_class.c 296723 2016-03-12 14:54:34Z kib $
>  [   25e]  $FreeBSD: head/lib/libutil/login_cap.c 317265 2017-04-21 19:27:33Z pfg $
>  [   2a7]  $FreeBSD: head/lib/libutil/_secure_path.c 139012 2004-12-18 12:31:12Z ru $
>  [   2f2]  $FreeBSD: head/lib/libcrypt/crypt.c 326219 2017-11-26 02:00:33Z pfg $
> . . .
>
> Note:
>
> Program Headers:
>  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>  LOAD           0x000000 0x01800000 0x01800000 0x140ad4 0x140ad4 R E 0x10000
>  LOAD           0x140ae0 0x01950ae0 0x01950ae0 0x061fc 0x35108 RWE 0x10000
>  NOTE           0x0000d4 0x018000d4 0x018000d4 0x00048 0x00048 R   0x4
>  TLS            0x140ae0 0x01950ae0 0x01950ae0 0x00b10 0x00b1d R   0x10
>  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
>
> Section to Segment mapping:
>  Segment Sections...
>   00     .note.tag .init .text .fini .rodata .eh_frame
>   01     .tdata .tbss .init_array .fini_array .ctors .dtors .jcr .data.rel.ro .data .got .sbss .bss
>   02     .note.tag
>   03     .tdata .tbss
>   04
> There are 24 section headers, starting at offset 0x16cec8:
>
> Section Headers:
>  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
> . . .
>  [16] .got              PROGBITS        01956ccc 146ccc 000010 04 WAX  0   0  4
>  [17] .sbss             NOBITS          01956cdc 146cdc 0000b0 00  WA  0   0  4
>  [18] .bss              NOBITS          01956dc0 146cdc 02ee28 00  WA  0   0 64
>  [19] .comment          PROGBITS        00000000 146cdc 0073d4 01  MS  0   0  1
>
> It looks like material after the .got is being copied,
> spanning the in-file-empty .sbss and .bss sections and
> implicitly initializing (the first part of) those
> sections.
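
For reference, the arithmetic behind that observation can be
sketched as below. This is only an illustration, not kernel code:
bss_range_of, struct bss_range and SKETCH_PAGE_SIZE are made-up
names, and 4 KiB pages are an assumption. It takes the VirtAddr,
FileSiz and MemSiz of a PT_LOAD and reports where file-backed
bytes stop, where that page ends, and where the segment ends.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical result type: the part of a PT_LOAD not backed by file data. */
struct bss_range {
	uint32_t start;       /* first address past the file-backed bytes */
	uint32_t partial_end; /* end of the page holding start (capped at end) */
	uint32_t end;         /* one past the last address of the segment */
};

static struct bss_range
bss_range_of(uint32_t vaddr, uint32_t filesz, uint32_t memsz)
{
	struct bss_range r;

	assert(filesz <= memsz);
	r.start = vaddr + filesz;
	r.end = vaddr + memsz;
	/* Round start up to the next page boundary. */
	r.partial_end = (r.start + SKETCH_PAGE_SIZE - 1) &
	    ~(SKETCH_PAGE_SIZE - 1);
	if (r.partial_end > r.end)
		r.partial_end = r.end;
	return (r);
}
```

With the RWE LOAD values quoted above (0x01950ae0, 0x061fc,
0x35108) this gives start 0x01956cdc (the .sbss address),
partial_end 0x01957000 and end 0x01985be8. So 0x324 bytes at page
offsets 0xcdc through 0xfff would need explicit zeroing, which is
the range where the .comment strings show up in the dump.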


The ->valid assignments appear to trace to code like:

        /*
         * The last page has valid blocks.  Invalid part can only
         * exist at the end of file, and the page is made fully valid
         * by zeroing in vm_pager_get_pages().
         */
        if (m[count - 1]->valid != 0 && --count == 0) {
                if (iodone != NULL)
                        iodone(arg, m, 1, 0);
                return (VM_PAGER_OK);
        }

independent of whether the requested data ends
before the last page of the file yet still does
not extend to the end of a page.

So it appears that the use of:

QUOTE
vm_imgact_map_page uses vm_imgact_hold_page.

vm_imgact_hold_page uses vm_pager_get_pages.

vm_pager_get_pages uses vm_page_zero_invalid
to "Zero out partially filled data"
END QUOTE

simply does not do the right thing for .sbss
or .bss handling. The m->valid related code
for zeroing is basically irrelevant to .sbss
and .bss.

Note that the below code requires a m->valid bit
to be asserted in order to do any
pmap_zero_page_area operations. Thus it does not
zero out pages that are completely invalid either.
This explains why I see 0xfa5005af on the full
pages in the .sbss/.bss area for debug builds:
nothing is zeroing the full pages either.

void
vm_page_zero_invalid(vm_page_t m, boolean_t setvalid)
{
       int b;
       int i;

       VM_OBJECT_ASSERT_WLOCKED(m->object);
       /*
        * Scan the valid bits looking for invalid sections that
        * must be zeroed.  Invalid sub-DEV_BSIZE'd areas ( where the
        * valid bit may be set ) have already been zeroed by
        * vm_page_set_validclean().
        */
       for (b = i = 0; i <= PAGE_SIZE / DEV_BSIZE; ++i) {
               if (i == (PAGE_SIZE / DEV_BSIZE) ||
                   (m->valid & ((vm_page_bits_t)1 << i))) {
                       if (i > b) {
                               pmap_zero_page_area(m,
                                   b << DEV_BSHIFT, (i - b) << DEV_BSHIFT);
                       }
                       b = i + 1;
               }
       }

       /*
        * setvalid is TRUE when we can safely set the zero'd areas
        * as being valid.  We can do this if there are no cache consistancy
        * issues.  e.g. it is ok to do with UFS, but not ok to do with NFS.
        */
       if (setvalid)
               m->valid = VM_PAGE_BITS_ALL;
}
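
A user-space model of just the scan loop may help when reasoning
about which byte ranges it would hand to pmap_zero_page_area for
a given valid mask. This is a sketch under assumptions, not the
kernel code: the SK_* constants (4 KiB pages, 512-byte DEV_BSIZE),
struct zero_run and zero_runs are hypothetical names, and the
ranges are recorded instead of being zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages, 512-byte device blocks. */
#define SK_PAGE_SIZE  4096
#define SK_DEV_BSHIFT 9
#define SK_DEV_BSIZE  (1 << SK_DEV_BSHIFT)
#define SK_NBLOCKS    (SK_PAGE_SIZE / SK_DEV_BSIZE)	/* 8 valid bits */

/* Hypothetical record of one range that would be zeroed. */
struct zero_run {
	int off;	/* byte offset within the page */
	int len;	/* length in bytes */
};

/*
 * Same loop shape as the scan in vm_page_zero_invalid, but the
 * (offset, length) pairs are stored in out[] rather than zeroed.
 * Returns the number of runs found.
 */
static size_t
zero_runs(uint8_t valid, struct zero_run *out, size_t max)
{
	size_t n = 0;
	int b, i;

	for (b = i = 0; i <= SK_NBLOCKS; ++i) {
		if (i == SK_NBLOCKS || (valid & (1u << i))) {
			if (i > b) {
				assert(n < max);
				out[n].off = b << SK_DEV_BSHIFT;
				out[n].len = (i - b) << SK_DEV_BSHIFT;
				n++;
			}
			b = i + 1;
		}
	}
	return (n);
}
```

In this model a mask of 0xff (every block valid, as for a
mid-file page that holds .comment bytes) yields no runs, so
nothing would be zeroed. A mask of 0x3f yields one run covering
the last two blocks (offset 3072, length 1024). A mask of 0x00
yields one run covering the whole page, via the
i == SK_NBLOCKS iteration, so in the model the loop itself does
not need a set bit to produce a run. Whether the real function
is reached for such pages on the execve path is a separate
question that the model says nothing about.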

This code simply does not do the right thing for .sbss and
.bss handling.

__start in /sbin/init (for example) expects .sbss and .bss
to have already been initialized to zero (and possibly
further adjusted after that for something like environ).

So far I find nothing to cover that.
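
For comparison, the ELF rule for a PT_LOAD is that the bytes
between p_filesz and p_memsz are defined to hold zero. A minimal
user-space sketch of that expectation follows; load_segment is a
hypothetical helper working on plain buffers, not how the kernel
maps pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: populate a segment image of memsz bytes
 * at dst from filesz bytes of file content. Only the first
 * filesz bytes come from the file; the remainder (.sbss/.bss in
 * the /sbin/init case) must read as zero, no matter what
 * follows those bytes in the file.
 */
static void
load_segment(unsigned char *dst, const unsigned char *file_bytes,
    size_t filesz, size_t memsz)
{
	assert(filesz <= memsz);
	memcpy(dst, file_bytes, filesz);
	memset(dst + filesz, 0, memsz - filesz);
}
```

The failure described here corresponds to the memset step being
missing for the partial page: the bytes that follow the segment's
file content (the .comment strings) stay visible where .sbss
should start out as zero.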

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



