Date: Sun, 19 Aug 2018 19:16:42 +0300
From: Konstantin Belousov
To: Michael Gmelin
Cc: John Baldwin, "freebsd-current@freebsd.org", Matthias Apitz
Subject: Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)
Message-ID: <20180819161642.GP2340@kib.kiev.ua>
References: <20180606010625.62632920@bsd64.grem.de> <20180815005106.69402d23@bsd64.grem.de> <20180815130447.GZ2340@kib.kiev.ua> <20180815135531.GA2340@kib.kiev.ua> <07E28AC5-EBE6-4893-810A-6C03F07925C8@grem.de> <8726bc32-6023-bfe1-7600-5b2c706236f8@FreeBSD.org> <20180819165951.274d61b0@bsd64.grem.de>
In-Reply-To: <20180819165951.274d61b0@bsd64.grem.de>

On Sun, Aug 19, 2018 at 04:59:51PM +0200, Michael Gmelin wrote:
>
>
> On Fri, 17 Aug 2018 10:02:08 +0100
> John Baldwin wrote:
>
> > On 8/17/18 9:54 AM, Michael Gmelin wrote:
> > >
> > >
> > >> On 17. Aug 2018, at 08:17, John Baldwin wrote:
> > >>
> > >>> On 8/16/18 1:58 PM, Michael Gmelin wrote:
> > >>>
> > >>>
> > >>>> On 15. Aug 2018, at 15:55, Konstantin Belousov wrote:
> > >>>>> On Wed, Aug 15, 2018 at 03:52:37PM +0200, Michael Gmelin wrote:
> > >>>>>
> > >>>>>
> > >>>>>>> On 15. Aug 2018, at 15:04, Konstantin Belousov wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Aug 15, 2018 at 12:51:06AM +0200, Michael Gmelin
> > >>>>>>> wrote: Reviving this old thread, since I just updated to
> > >>>>>>> r337818 and a similar problem is happening again. Since the
> > >>>>>>> fix in r334799 (review https://reviews.freebsd.org/D15675)
> > >>>>>>> (mp_)machdep.c have been touched, so maybe this is related
> > >>>>>>> (https://svnweb.freebsd.org/base?view=revision&revision=334799).
> > >>>>>>>
> > >>>>>>> Please see the screenshot of the panic below:
> > >>>>>>> https://gist.github.com/grembo/78d0f2a100dd4f16775b85a118769658
> > >>>>>>>
> > >>>>>>> This is me not digging any deeper, hoping that this is
> > >>>>>>> something obvious. Please let me know if you need more
> > >>>>>>> input.
> > >>>>>>
> > >>>>>> I do not see how recent mp_machdep.c changes could affect this.
> > >>>>>> Can you try newest kernel but old loader ?
> > >>>>>
> > >>>>> I will try (but that will take a while). Oh, also, it still
> > >>>>> boots in safe mode/with smp disabled.
> > >>>>
> > >>>> Right, this is because the access to that address through DMAP
> > >>>> is only needed when configuring AP startup resources.
> > >>>>
> > >>>> Also, I think it is safe to suggest that the bisect is needed.
> > >>>
> > >>> Using an older loader didn't help, but I identified the problem:
> > >>>
> > >>> https://svnweb.freebsd.org/base?view=revision&revision=334952
> > >>>
> > >>> modified the code you introduced in
> > >>>
> > >>> https://svnweb.freebsd.org/base?view=revision&revision=334799
> > >>>
> > >>> By correcting units to pages it also broke booting the Chromebook
> > >>> as a side effect - so the previous fix just worked due to a bug
> > >>> it seems.
> > >>>
> > >>> Is there an easy way to output the content of physmap at that
> > >>> point (debug.late_console=0 doesn't work) - like an existing
> > >>> buffer I could use, or would this be more elaborate (I did
> > >>> something complicated last time but didn't save it, so any simple
> > >>> solution would be preferred).
> > >>
> > >> How about reverting the commit for now so you get a working console
> > >> and print out the physmap array values along with Maxmem later in
> > >> the boot (or just use kgdb to examine them once the system is
> > >> running)?
> > >
> > > This is before the system has a working console (part of calling
> > > getmem...), disabling late console makes it hang, physmap changes
> > > afterwards, so running kgdb later doesn't help. Last time I kept a
> > > copy of physmap and logged it later to know the original content. I
> > > can do that again, I just thought maybe there is a simple mechanism
> > > I'm not aware of that would save me some time.
> >
> > I thought we only modified phys_avail[], but saving a copy of
> > physmap[] and dumping it from kgdb is probably the simplest thing to
> > do.
> >
>
> Okay, so I had some time to investigate a bit more:
>
> Before calling init_ops.mp_bootaddress in getmemsize (machdep.c),
> physmap looks like this:
>
> physmap_idx: 8
> i   mem          atop
> 0   0x0          0x0
> 1   0x30000      0x30
> 2   0x40000      0x40
> 3   0x9e400      0x9e
> 4   0x100000     0x100
> 5   0xf00000     0xf00
> 6   0x1000000    0x1000
> 7   0x7bf7a000   0x7bf7a
> 8   0x100000000  0x100000
> 9   0x100600000  0x100600
> 10  0x0          0x0
> Maxmem: 0x100600000 0x100600
>
> Without using atop (the "buggy" version that actually boots without
> crashing), the loop in mp_bootaddress looks like this:
>
> i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
> 8  0x100000000  0x100600000  0x100600  0x100600
> 6  0x1000000    0x7bf7a000   0x7bf7a   0x100600
> 4  0x100000     0xf00000     0xf00     0x100600
> 2  0x40000      0x9e400      0x9e      0x100600
>
> And physmap looks like this afterwards:
>
> physmap_idx: 8
> i   mem          atop
> 0   0x0          0x0
> 1   0x30000      0x30
> 2   0x43000      0x43      <-- here
> 3   0x9e400      0x9e
> 4   0x100000     0x100
> 5   0xf00000     0xf00
> 6   0x1000000    0x1000
> 7   0x7bf7a000   0x7bf7a
> 8   0x100000000  0x100000
> 9   0x100600000  0x100600
> 10  0x0          0x0
> mptramp_pagetables is 0x40000
>
> So a three page gap was made at 0x40000 (atop(idx 2) is now 0x43
> instead of 0x40)
>
> In the current version (using atop), the loop in mp_bootaddress
> looks like this:
>
> i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
> 8  0x100000000  0x100600000  0x100600  0x100600
> 6  0x1000000    0x7bf7a000   0x7bf7a   0x100600
>
> And physmap looks like this afterwards:
>
> physmap_idx: 8
> i   mem          atop
> 0   0x0          0x0
> 1   0x30000      0x30
> 2   0x40000      0x40
> 3   0x9e400      0x9e
> 4   0x100000     0x100
> 5   0xf00000     0xf00
> 6   0x1003000    0x1003    <-- here
> 7   0x7bf7a000   0x7bf7a
> 8   0x100000000  0x100000
> 9   0x100600000  0x100600
> 10  0x0          0x0
> mptramp_pagetables: 0x1000000
>
> So a three page gap was made at 0x1000000 (atop(idx 6) is now
> 0x1003 instead of 0x1000)
>
> When changing the code to require a page below 0x1000:
>
>     if (physmap[i] >= GiB(4) || physmap[i + 1] -
>         round_page(physmap[i]) < PAGE_SIZE * 3 ||
>         atop(physmap[i + 1]) > Maxmem
>         || atop(physmap[i + 1]) > 0x1000) // <--- this
>             continue;
>
> The system boots just fine. It uses page 0x100
> for the bootstrap code in this case:
>
> i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
> 8  0x100000000  0x100600000  0x100600  0x100600
> 6  0x1000000    0x7bf7a000   0x7bf7a   0x100600
> 4  0x100000     0xf00000     0xf00     0x100600
>
> Physmap looks like this:
> physmap_idx: 8
> i   mem          atop
> 0   0x0          0x0
> 1   0x30000      0x30
> 2   0x40000      0x40
> 3   0x9e400      0x9e
> 4   0x103000     0x103     <-- here
> 5   0xf00000     0xf00
> 6   0x1000000    0x1000
> 7   0x7bf7a000   0x7bf7a
> 8   0x100000000  0x100000
> 9   0x100600000  0x100600
> 10  0x0          0x0
> mptramp_pagetables: 0x100000
>
> So for some reason it's crashing when using pages 0x1000 - 0x1003 for
> the bootstrap code, while it boots okay when using 0x40 - 0x43 and
> 0x100 - 0x103.
>
> Any ideas?

I in fact misread the page fault state decoding in your photo.
Curiously, it is a protection violation on write rather than a
non-present page access.

Compile ddb into your kernel, then on fault do
db> x/x dmaplimit
db> x/x dmaplimit+4
db> show pte

Also show me the verbose dmesg lines with CPU features identification.

>
> Best,
> Michael
>
> p.s. This is what biosmem looks like
>
> Type '?' for a list of commands, 'help' for more detailed
> help.
> OK biosmem
> bios_basemem: 0x9e400
> bios_extmem: 0x3ff00000
> memtop: 0x3c000000
> high_heap_base: 0x3c000000
> high_heap_size: 0x4000000
> bios_quirks: 0x01 BQ_DISTRUST_820_EXTMEM
> b_bios_probed: 0x0a B_BASEMEM_12 B_EXTMEM_E801
>
> --
> Michael Gmelin
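
For anyone who wants to replay the region selection from the quoted
analysis without booting the machine, here is a minimal user-space
sketch. It is not the kernel code. pick_region(), the cmp_mode enum and
the macro names are hypothetical and exist only for illustration. The
physmap and Maxmem values are copied from the quoted tables, and the
three variants model the checks as they are described in this thread
(comparing a byte address against Maxmem, comparing page numbers, and
the experimental extra limit of 0x1000 pages).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE      4096ULL                   /* amd64 base page size */
#define GIB4           (4ULL << 30)              /* stands in for GiB(4) */
#define ATOP(x)        ((uint64_t)(x) / PAGE_SIZE)  /* bytes -> page number */
#define ROUND_PAGE(x)  (((uint64_t)(x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define NO_REGION      UINT64_MAX                /* hypothetical sentinel */

/* Hypothetical names for the three variants discussed in the thread. */
enum cmp_mode {
	CMP_BYTES,      /* end address in bytes compared against Maxmem */
	CMP_PAGES,      /* atop(end) compared against Maxmem */
	CMP_PAGES_LOW   /* CMP_PAGES plus the extra atop(end) > 0x1000 check */
};

/* physmap[] as reported before mp_bootaddress runs (physmap_idx is 8). */
static const uint64_t physmap[] = {
	0x0,            0x30000,
	0x40000,        0x9e400,
	0x100000,       0xf00000,
	0x1000000,      0x7bf7a000,
	0x100000000ULL, 0x100600000ULL,
};
static const int physmap_idx = 8;
static const uint64_t Maxmem = 0x100600;         /* in pages */

/*
 * Walk the (start, end) pairs from the highest region downwards and
 * return the page-aligned base of the first region that passes the
 * checks, or NO_REGION if none does.
 */
static uint64_t
pick_region(enum cmp_mode mode)
{
	assert(physmap_idx % 2 == 0);
	for (int i = physmap_idx; i >= 0; i -= 2) {
		uint64_t start = physmap[i];
		uint64_t end = physmap[i + 1];

		/* Must lie below 4 GiB and hold three pages. */
		if (start >= GIB4 || end - ROUND_PAGE(start) < PAGE_SIZE * 3)
			continue;
		if (mode == CMP_BYTES && end > Maxmem)
			continue;
		if (mode != CMP_BYTES && ATOP(end) > Maxmem)
			continue;
		if (mode == CMP_PAGES_LOW && ATOP(end) > 0x1000)
			continue;
		return ROUND_PAGE(start);
	}
	return NO_REGION;
}
```

Calling pick_region() with each of the three modes should give the
three mptramp_pagetables values reported in the quoted message, which
makes it easy to try other candidate conditions against the same map.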