Date:      Sun, 19 Aug 2018 16:59:51 +0200
From:      Michael Gmelin <freebsd@grem.de>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        Michael Gmelin <freebsd@grem.de>, Konstantin Belousov <kostikbel@gmail.com>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Matthias Apitz <guru@unixarea.de>
Subject:   Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)
Message-ID:  <20180819165951.274d61b0@bsd64.grem.de>
In-Reply-To: <8726bc32-6023-bfe1-7600-5b2c706236f8@FreeBSD.org>
References:  <20180603215020.452a81d8@bsd64.grem.de> <20180603205340.GS3789@kib.kiev.ua> <20180604004632.56ca6afa@bsd64.grem.de> <20180604110654.GA2450@kib.kiev.ua> <20180604231756.2ed2adb9@bsd64.grem.de> <20180605131135.GH2450@kib.kiev.ua> <20180606010625.62632920@bsd64.grem.de> <20180815005106.69402d23@bsd64.grem.de> <20180815130447.GZ2340@kib.kiev.ua> <C26CD25D-3CB0-4F7E-8B50-F7E95E16B776@grem.de> <20180815135531.GA2340@kib.kiev.ua> <FAEA5B0A-5302-4A48-B322-21CB0D97C8CC@grem.de> <e82ed552-83b0-5331-3117-6750b8c205f7@FreeBSD.org> <07E28AC5-EBE6-4893-810A-6C03F07925C8@grem.de> <8726bc32-6023-bfe1-7600-5b2c706236f8@FreeBSD.org>



On Fri, 17 Aug 2018 10:02:08 +0100
John Baldwin <jhb@FreeBSD.org> wrote:

> On 8/17/18 9:54 AM, Michael Gmelin wrote:
> >
> >> On 17. Aug 2018, at 08:17, John Baldwin <jhb@FreeBSD.org> wrote:
> >>
> >>> On 8/16/18 1:58 PM, Michael Gmelin wrote:
> >>>
> >>>
> >>>> On 15. Aug 2018, at 15:55, Konstantin Belousov
> >>>> <kostikbel@gmail.com> wrote:
> >>>>> On Wed, Aug 15, 2018 at 03:52:37PM +0200, Michael Gmelin wrote:
> >>>>>
> >>>>>
> >>>>>>> On 15. Aug 2018, at 15:04, Konstantin Belousov
> >>>>>>> <kostikbel@gmail.com> wrote:
> >>>>>>>
> >>>>>>> On Wed, Aug 15, 2018 at 12:51:06AM +0200, Michael Gmelin
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Reviving this old thread, since I just updated to
> >>>>>>> r337818 and a similar problem is happening again. Since the
> >>>>>>> fix in r334799 (review https://reviews.freebsd.org/D15675)
> >>>>>>> (mp_)machdep.c have been touched, so maybe this is related
> >>>>>>> (https://svnweb.freebsd.org/base?view=revision&revision=334799).
> >>>>>>>
> >>>>>>> Please see the screenshot of the panic below:
> >>>>>>> https://gist.github.com/grembo/78d0f2a100dd4f16775b85a118769658
> >>>>>>>
> >>>>>>> This is me not digging any deeper, hoping that this is
> >>>>>>> something obvious. Please let me know if you need more
> >>>>>>> input.
> >>>>>>
> >>>>>> I do not see how recent mp_machdep.c changes could affect this.
> >>>>>> Can you try newest kernel but old loader?
> >>>>>
> >>>>> I will try (but that will take a while). Oh, also, it still
> >>>>> boots in safe mode/with SMP disabled.
> >>>>
> >>>> Right, this is because the access to that address through DMAP
> >>>> is only needed when configuring AP startup resources.
> >>>>
> >>>> Also, I think it is safe to suggest that the bisect is needed.
> >>>
> >>> Using an older loader didn’t help, but I identified the problem:
> >>>
> >>> https://svnweb.freebsd.org/base?view=revision&revision=334952
> >>>
> >>> modified the code you introduced in
> >>>
> >>> https://svnweb.freebsd.org/base?view=revision&revision=334799
> >>>
> >>> By correcting units to pages it also broke booting the Chromebook
> >>> as a side effect - so the previous fix just worked due to a bug
> >>> it seems.
> >>>
> >>> Is there an easy way to output the content of physmap at that
> >>> point (debug.late_console=0 doesn’t work) - like an existing
> >>> buffer I could use, or would this be more elaborate (I did
> >>> something complicated last time but didn’t save it, so any simple
> >>> solution would be preferred).
> >>
> >> How about reverting the commit for now so you get a working console
> >> and print out the physmap array values along with Maxmem later in
> >> the boot (or just use kgdb to examine them once the system is
> >> running)?
> >
> > This is before the system has a working console (part of calling
> > getmem...), disabling late console makes it hang, physmap changes
> > afterwards, so running kgdb later doesn’t help. Last time I kept a
> > copy of physmap and logged it later to know the original content. I
> > can do that again, I just thought maybe there is a simple mechanism
> > I’m not aware of that would save me some time.
>
> I thought we only modified phys_avail[], but saving a copy of
> physmap[] and dumping it from kgdb is probably the simplest thing to
> do.
>

Okay, so I had some time to investigate a bit more:

Before calling init_ops.mp_bootaddress in getmemsize (machdep.c),
physmap looks like this:

physmap_idx: 8
i mem atop
0 0x0 0x0
1 0x30000 0x30
2 0x40000 0x40
3 0x9e400 0x9e
4 0x100000 0x100
5 0xf00000 0xf00
6 0x1000000 0x1000
7 0x7bf7a000 0x7bf7a
8 0x100000000 0x100000
9 0x100600000 0x100600
10 0x0 0x0
Maxmem: 0x100600000 0x100600

Without using atop (the "buggy" version that actually boots without
crashing), the loop in mp_bootaddress looks like this:

i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
8 0x100000000 0x100600000 0x100600 0x100600
6 0x1000000 0x7bf7a000 0x7bf7a 0x100600
4 0x100000 0xf00000 0xf00 0x100600
2 0x40000 0x9e400 0x9e 0x100600

And physmap looks like this afterwards:

physmap_idx: 8
i mem atop
0 0x0 0x0
1 0x30000 0x30
2 0x43000 0x43 <-- here
3 0x9e400 0x9e
4 0x100000 0x100
5 0xf00000 0xf00
6 0x1000000 0x1000
7 0x7bf7a000 0x7bf7a
8 0x100000000 0x100000
9 0x100600000 0x100600
10 0x0 0x0
mptramp_pagetables is 0x40000

So a three-page gap was made at 0x40000 (atop(physmap[2]) is now 0x43
instead of 0x40).
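
The arithmetic behind that gap can be sketched as follows. This is a
user-space illustration consistent with the dumps above, not the kernel
source; carve_front is a hypothetical helper name and 4 KiB pages are
assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE	4096ULL	/* assumption: 4 KiB pages */
#define PAGE_MASK	(PAGE_SIZE - 1)
#define PAGE_SHIFT	12
#define atop(x)		((uint64_t)(x) >> PAGE_SHIFT)	/* bytes -> pages */
#define round_page(x)	(((uint64_t)(x) + PAGE_MASK) & ~PAGE_MASK)

/*
 * Take npages from the front of the segment [*start, end).
 * Returns the page-aligned base of the carved range and moves
 * *start past it, which is what shows up as the gap in the dump.
 */
static uint64_t
carve_front(uint64_t *start, uint64_t end, unsigned npages)
{
	uint64_t base = round_page(*start);

	assert(end >= base && end - base >= npages * PAGE_SIZE);
	*start = base + npages * PAGE_SIZE;
	return (base);
}
```

With start 0x40000 and end 0x9e400 this returns 0x40000 and moves the
start to 0x43000, matching the values logged above.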

In the current version (using atop), the loop in mp_bootaddress
looks like this:

i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
8 0x100000000 0x100600000 0x100600 0x100600
6 0x1000000 0x7bf7a000 0x7bf7a 0x100600

And physmap looks like this afterwards:

physmap_idx: 8
i mem atop
0 0x0 0x0
1 0x30000 0x30
2 0x40000 0x40
3 0x9e400 0x9e
4 0x100000 0x100
5 0xf00000 0xf00
6 0x1003000 0x1003 <-- here
7 0x7bf7a000 0x7bf7a
8 0x100000000 0x100000
9 0x100600000 0x100600
10 0x0 0x0
mptramp_pagetables: 0x1000000

So a three-page gap was made at 0x1000000 (atop(physmap[6]) is now
0x1003 instead of 0x1000).

When changing the code to also skip any segment that ends above page
0x1000 (i.e. above 16 MiB):

  if (physmap[i] >= GiB(4) || physmap[i + 1] -
      round_page(physmap[i]) < PAGE_SIZE * 3 ||
      atop(physmap[i + 1]) > Maxmem
      || atop(physmap[i + 1]) > 0x1000) // <--- this
      continue;

The system boots just fine. It uses page 0x100
for the bootstrap code in this case:

i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
8 0x100000000 0x100600000 0x100600 0x100600
6 0x1000000 0x7bf7a000 0x7bf7a 0x100600
4 0x100000 0xf00000 0xf00 0x100600=20

Physmap looks like this:
physmap_idx: 8
i mem atop
0 0x0 0x0
1 0x30000 0x30
2 0x40000 0x40
3 0x9e400 0x9e
4 0x103000 0x103 <-- here
5 0xf00000 0xf00
6 0x1000000 0x1000
7 0x7bf7a000 0x7bf7a
8 0x100000000 0x100000
9 0x100600000 0x100600
10 0x0 0x0
mptramp_pagetables: 0x100000

So for some reason it's crashing when using pages 0x1000 - 0x1003 for
the bootstrap code, while it boots okay when using 0x40 - 0x43 and
0x100 - 0x103.
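
To make the three cases easier to compare, here is a user-space sketch
of the segment selection only. It is a reconstruction from the
condition and the traces in this mail, not the code in mp_machdep.c:
pick_segment and the enum are hypothetical names, 4 KiB pages are
assumed, and the "end below base" guard is an addition for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE	4096ULL	/* assumption: 4 KiB pages */
#define PAGE_MASK	(PAGE_SIZE - 1)
#define PAGE_SHIFT	12
#define atop(x)		((uint64_t)(x) >> PAGE_SHIFT)
#define round_page(x)	(((uint64_t)(x) + PAGE_MASK) & ~PAGE_MASK)
#define GiB(v)		((uint64_t)(v) << 30)

enum variant {
	BYTES_VS_PAGES,	/* physmap[i + 1] > Maxmem: bytes against pages */
	ATOP,		/* atop(physmap[i + 1]) > Maxmem */
	ATOP_CAPPED	/* ATOP plus the extra 0x1000 page limit */
};

/*
 * Walk the (base, end) pairs from the top down and return the index
 * of the first pair that is not skipped, or -1 if none qualifies.
 */
static int
pick_segment(const uint64_t *physmap, int physmap_idx, uint64_t maxmem,
    enum variant v)
{
	assert(physmap_idx >= 0 && physmap_idx % 2 == 0);
	for (int i = physmap_idx; i >= 0; i -= 2) {
		uint64_t base = round_page(physmap[i]);
		uint64_t end = physmap[i + 1];
		/* The variants differ only in the unit compared to Maxmem. */
		uint64_t cmp = (v == BYTES_VS_PAGES) ? end : atop(end);

		if (physmap[i] >= GiB(4) || end < base ||
		    end - base < PAGE_SIZE * 3)
			continue;	/* above 4 GiB or too small */
		if (cmp > maxmem)
			continue;
		if (v == ATOP_CAPPED && atop(end) > 0x1000)
			continue;
		return (i);
	}
	return (-1);
}
```

Fed with the physmap dump from the top of this mail and Maxmem 0x100600,
the three variants pick index 2, 6 and 4 respectively, which matches
the loop traces above.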

Any ideas?

Best,
Michael

p.s. This is what biosmem looks like

Type '?' for a list of command, 'help' for more detailed
help.
OK biosmem
bios_basemem: 0x9e400
bios_extmem: 0x3ff00000
memtop: 0x3c000000
high_heap_base: 0x3c000000
high_heap_size: 0x4000000
bios_quirks: 0x01 BQ_DISTRUST_820_EXTMEM
b_bios_probed: 0x0a B_BASEMEM_12 B_EXTMEM_E801

-- 
Michael Gmelin


