Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 31 Mar 2014 09:26:39 +0200
From:      Wojciech Macek <wma@semihalf.com>
To:        Ian Lepore <ian@freebsd.org>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: arm SMP on Cortex-A15
Message-ID:  <CANsEV8crwz5=jhkSp2zeu8tEendgcX2oVtW=yJ=jep9aKkY7fQ@mail.gmail.com>
In-Reply-To: <1396048363.81853.177.camel@revolution.hippie.lan>
References:  <CANsEV8euHTsfviiCMP_aet3qYiK2T-oK%2B-37eay7zAPH2S2vaA@mail.gmail.com> <20131220125638.GA5132@mail.bsdpad.com> <20131222092913.GA89153@mail.bsdpad.com> <CANsEV8fSoygoSUyQqKoEQ7tRxjqDOwrPD8dU7O2V2PXRj35j4A@mail.gmail.com> <20131222123636.GA61193@ci0.org> <CANsEV8fWvUkFHi8DP6Nr807RwPDB1iZrO39fpfa44qOkJPidZA@mail.gmail.com> <1395149146.1149.586.camel@revolution.hippie.lan> <1395254911.80941.9.camel@revolution.hippie.lan> <CANsEV8c047SNF61EgP6AiMR2oY=ofcMuTWYZnd60bRmp2Sk9HA@mail.gmail.com> <1395320561.80941.13.camel@revolution.hippie.lan> <CANsEV8f-Cyte-TO%2BCfWVcC_zw5dkYJT8Qfi92eH7yrfhqBvgjg@mail.gmail.com> <1395494401.81853.34.camel@revolution.hippie.lan> <CANsEV8eB1SQHjKrBbG=HCZyb6wuwZmmpq6BKfVatOHm8UtZ9ig@mail.gmail.com> <CANsEV8fJaBmSD7i01BoG8%2BKDPg0ZD5mdHkpLeyyYNB-8sSse_w@mail.gmail.com> <1395754555.81853.65.camel@revolution.hippie.lan> <CANsEV8fGAuRvGYf5cNLd8YuMFwbaGra6DLmv5_S2J1sgj9eNnQ@mail.gmail.com> <CANsEV8dmTZbWfNWDh77-CWOdVSCZfOucZLEs%2BdXvZQbb8PGbgQ@mail.gmail.com> <1396048363.81853.177.camel@revolution.hippie.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
Sorry for late response.

The word "workaround" might have been a little misleading, sorry. Yes, I
agree that it is necessary step for armv7-mp, but I was afraid that by
accident it can mask code errors. I just want to make sure if what I see in
pmap is caused by any missing tlb-inv or not. The Olivier's change stopped
the issue reproducing, so that's the reason I reverted it for a while.
Well, each panic is an opportunity to make the code working better :) Also,
that pmap-v6.c runs on platforms with separate D/I TBL flushing, and that's
the reason why I'd like to spend some time debugging.

Thanks a lot for comments. I remove the unnecessary lines and only leave
changes in pmap_kenter_internal and pmap_kremove, where pages can be
executable - it does not change anything for armv7, but might help on R-Pi.
I try to close it up this week.

Thanks,
Wojtek


2014-03-29 0:12 GMT+01:00 Ian Lepore <ian@freebsd.org>:

> I finally got around to looking at your latest patch.  Some comments...
>
> I think the changes to use ID flush in the following functions can't
> possibly be needed, because the memory in question is page table entries
> or other things that cannot be executable:   pmap_l2ptp_ctor(),
> pmap_grow_map(), pmap_growkernel(), pmap_copy_page_generic(),
> pmap_copy_pages().
>
> The change in vector_page_setprot() is probably unnecessary, but
> harmless.  There is no need for the L2_S_EXECUTABLE() check -- we just
> forced it to be executable on the previous line.
>
> In pmap_postinit() I'm pretty sure an ID flush isn't needed, but
> harmless since it's only called once at startup
>
> In general, since flushD and flushID are the same code none of these
> changes is bad right now.  There may come a day when we want to remove
> the BTB flush in the the flushD case for performance, and having a bunch
> of routines calling flushID when flushD is all that's really needed will
> mean that someone will have to do all this analysis again to actually
> get any performance benefit.
>
> -- Ian
>
> On Thu, 2014-03-27 at 14:27 +0100, Wojciech Macek wrote:
> > Sorry, I meant
> >
> https://drive.google.com/file/d/0B-7yTLrPxaWtRjJvTzJZY3g5akU/edit?usp=sharing
> >
> > W.
> >
> >
> > 2014-03-27 14:10 GMT+01:00 Wojciech Macek <wma@semihalf.com>:
> >
> > > I started cleaning up and have some concerns. Sorry guys, the r263251
> > > still looks like a workaround to me :)
> > > The fact that TLB cache issue is seen only on executable pages and that
> > > Olivier's fix helps suggests, that we are missing flush-I somewhere
> (crash
> > > appears only when deallocating userspace vm, so my guess is
> pmap_kremove,
> > > where definitely should be flushID). For test purposes I reverted
> r263251
> > > and added additional flushID/flushD logic to the pmap-v6 in
> > > performance-critical sections, or modified flushD to flushID where the
> > > speed is not the case. All changes integrated with previous patches
> can be
> > > found here:
> > >
> > >
> https://drive.google.com/file/d/0B-7yTLrPxaWtRFRHTUtpZ3VKbG8/edit?usp=sharing
> > > With above patch I'm able to run all tests without any problems.
> > >
> > > Maybe it would be better to fix these suspicious places in pmap
> instead?
> > >
> > > Regards,
> > > Wojtek
> > >
> > >
> > > 2014-03-25 14:35 GMT+01:00 Ian Lepore <ian@freebsd.org>:
> > >
> > > Good, that means we can stop searching for the missing tlb flush when a
> > >> pte is invalidated. :)  You guys can commit whenever you're ready (or
> I
> > >> can do it if you'd like).
> > >>
> > >> -- Ian
> > >>
> > >> On Tue, 2014-03-25 at 11:14 +0100, Wojciech Macek wrote:
> > >> > Hi Ian,
> > >> >
> > >> > The r263251 fix helped, thanks!
> > >> > So now, if you don';t have any objections, I will clean up a little
> the
> > >> > cpufunc.S & pmap-v6 changes and make them ready for submitting.
> > >> >
> > >> > Regards,
> > >> > Wojtek
> > >> >
> > >> >
> > >> > 2014-03-24 14:31 GMT+01:00 Wojciech Macek <wma@semihalf.com>:
> > >> >
> > >> > > Without the unconditional invalidation, the panic shows up just
> at the
> > >> > > beginning, after rootfs is mounted and init scripts are running.
> When
> > >> a
> > >> > > userspace process is exitting, its memory resources are freed -
> this
> > >> is the
> > >> > > moment pmap_remove_pages fails due to tharanslation fault. It is
> the
> > >> > > "typical" crash I observed when TLB-cache holds an old entry.
> Below
> > >> there
> > >> > > is a backtrace, but I doubt if it can be helpful.
> > >> > >
> > >> > > Regarding old pte/tlb, the TLB cache contains entry from old
> process
> > >> > > context, when in-memory-PTE value is already correct - at least
> this
> > >> was
> > >> > > the scenario when I debugged it last year. So, invalidating after
> > >> *pte=0 is
> > >> > > definitely not our case. The issue shows up only on a15, where the
> > >> > > tlb-prefetcher can cache pte entries anytime.
> > >> > >
> > >> > > I believe I don't have r263251 integrated. I'll give it a try -
> > >> typically,
> > >> > > the tlb-caused crash appears only on pages containing shared
> > >> libraries code
> > >> > > (with executable attr), so there is a chance Olivier's fix help.
> > >> > >
> > >> > >
> > >> > > The fault:
> > >> > >
> > >> > > vm_fault(0xc5b894f0, 0, 2, 0) -> 1
> > >> > > Fatal kernel mode data abort: 'Translation Fault (P)'
> > >> > > trapframe: 0xef2cca40
> > >> > > FSR=00000817, FAR=00000030, spsr=60000013
> > >> > > r0 =00000000, r1 =c320a048, r2 =00000000, r3 =c3208074
> > >> > > r4 =c5b7cd08, r5 =c5b7cd04, r6 =c5b05800, r7 =c5b895ac
> > >> > > r8 =c320a044, r9 =fffffffe, r10=c5b895ac, r11=ef2ccae0
> > >> > > r12=00000000, ssp=ef2cca90, slr=c0604148, pc =c0628a60
> > >> > >
> > >> > > [ thread pid 83 tid 100050 ]
> > >> > > Stopped at      pmap_remove_pages+0x270:        streq   r3, [r0,
> > >> #0x030]
> > >> > > db> bt
> > >> > > Tracing pid 83 tid 100050 td 0xc5bc4320
> > >> > > db_trace_self() at db_trace_self
> > >> > >          pc = 0xc061f62c  lr = 0xc024ddbc (db_hex2dec+0x498)
> > >> > >          sp = 0xef2cc738  fp = 0xef2cc750
> > >> > >         r10 = 0xc0708270
> > >> > > db_hex2dec() at db_hex2dec+0x498
> > >> > >          pc = 0xc024ddbc  lr = 0xc024d76c (db_command_loop+0x2f0)
> > >> > >          sp = 0xef2cc758  fp = 0xef2cc7f8
> > >> > >          r4 = 0x00000000  r5 = 0x00000000
> > >> > >          r6 = 0xc0695cf1
> > >> > > db_command_loop() at db_command_loop+0x2f0
> > >> > >          pc = 0xc024d76c  lr = 0xc024d4dc (db_command_loop+0x60)
> > >> > >          sp = 0xef2cc800  fp = 0xef2cc810
> > >> > >          r4 = 0xc0666f88  r5 = 0xc067b997
> > >> > >          r6 = 0xc0752954  r7 = 0xc0748f80
> > >> > >          r8 = 0xef2cca40  r9 = 0xc07084e0
> > >> > >         r10 = 0xc0748f84
> > >> > > db_command_loop() at db_command_loop+0x60
> > >> > >          pc = 0xc024d4dc  lr = 0xc024ffb8
> (X_db_symbol_values+0x254)
> > >> > >          sp = 0xef2cc818  fp = 0xef2cc938
> > >> > >          r4 = 0x00000000  r5 = 0xef2cc820
> > >> > >          r6 = 0xc0748fb0
> > >> > > X_db_symbol_values() at X_db_symbol_values+0x254
> > >> > >          pc = 0xc024ffb8  lr = 0xc0430554 (kdb_trap+0x164)
> > >> > >          sp = 0xef2cc940  fp = 0xef2cc968
> > >> > >          r4 = 0x00000000  r5 = 0x00000817
> > >> > >          r6 = 0xc0748fb0  r7 = 0xc0748f80
> > >> > > kdb_trap() at kdb_trap+0x164
> > >> > >          pc = 0xc0430554  lr = 0xc0632ef0
> (data_abort_handler+0x7dc)
> > >> > >          sp = 0xef2cc970  fp = 0xef2cc988
> > >> > >          r4 = 0xef2cca40  r5 = 0x600000d3
> > >> > >          r6 = 0x00000030  r7 = 0x00000817
> > >> > >          r8 = 0xc5b894f0  r9 = 0x00000001
> > >> > >         r10 = 0xef2cca40
> > >> > > data_abort_handler() at data_abort_handler+0x7dc
> > >> > >          pc = 0xc0632ef0  lr = 0xc0632cc0
> (data_abort_handler+0x5ac)
> > >> > >          sp = 0xef2cc990  fp = 0xef2cca38
> > >> > >          r4 = 0x00000817  r5 = 0xc5bc4320
> > >> > >          r6 = 0xc5a47a0c  r7 = 0x00000004
> > >> > > data_abort_handler() at data_abort_handler+0x5ac
> > >> > >          pc = 0xc0632cc0  lr = 0xc0621214 (exception_exit)
> > >> > >          sp = 0xef2cca40  fp = 0xef2ccae0
> > >> > >          r4 = 0xc5b7cd08  r5 = 0xc5b7cd04
> > >> > >          r6 = 0xc5b05800  r7 = 0xc5b895ac
> > >> > >          r8 = 0xc320a044  r9 = 0xfffffffe
> > >> > >         r10 = 0xc5b895ac
> > >> > > exception_exit() at exception_exit
> > >> > >          pc = 0xc0621214  lr = 0xc0604148 (PHYS_TO_VM_PAGE+0x48)
> > >> > >          sp = 0xef2cca94  fp = 0xef2ccae0
> > >> > >          r0 = 0x00000000  r1 = 0xc320a048
> > >> > >          r2 = 0x00000000  r3 = 0xc3208074
> > >> > >          r4 = 0xc5b7cd08  r5 = 0xc5b7cd04
> > >> > >          r6 = 0xc5b05800  r7 = 0xc5b895ac
> > >> > >          r8 = 0xc320a044  r9 = 0xfffffffe
> > >> > >         r10 = 0xc5b895ac r12 = 0x00000000
> > >> > > pmap_remove_pages() at pmap_remove_pages+0x270
> > >> > >          pc = 0xc0628a60  lr = 0xc05f2d08 (vmspace_exit+0xd8)
> > >> > >          sp = 0xef2ccae8  fp = 0xef2ccb10
> > >> > >          r4 = 0xc5b895a8  r5 = 0xc5bc4320
> > >> > >          r6 = 0x00000001  r7 = 0xc5a47960
> > >> > >          r8 = 0xc5b895ac  r9 = 0xc5b894f0
> > >> > >         r10 = 0xc0753be0
> > >> > > vmspace_exit() at vmspace_exit+0xd8
> > >> > >          pc = 0xc05f2d08  lr = 0xc03a7348 (exit1+0x930)
> > >> > >          sp = 0xef2ccb18  fp = 0xef2ccb70
> > >> > >          r4 = 0xc5a479fc  r5 = 0x00000004
> > >> > >          r6 = 0xc583861c  r7 = 0x00000001
> > >> > >          r8 = 0xc5a47960  r9 = 0xc5bc4320
> > >> > >         r10 = 0xc5a47a0c
> > >> > > exit1() at exit1+0x930
> > >> > >          pc = 0xc03a7348  lr = 0xc03f1604 (sigexit+0x8c4)
> > >> > >          sp = 0xef2ccb78  fp = 0xef2ccd68
> > >> > >          r4 = 0x00000002  r5 = 0xc5bc4320
> > >> > >          r6 = 0xc5a47960  r7 = 0xc5a47a0c
> > >> > >          r8 = 0xc5bc4320  r9 = 0xc5b7a000
> > >> > >         r10 = 0x00000002
> > >> > > sigexit() at sigexit+0x8c4
> > >> > >          pc = 0xc03f1604  lr = 0xc03f23a0 (postsig+0x39c)
> > >> > >          sp = 0xef2ccd70  fp = 0xef2cce18
> > >> > >          r4 = 0x00000001  r5 = 0xc5bc4320
> > >> > >          r6 = 0xc5a47960  r7 = 0xc5b7aab8
> > >> > >          r8 = 0xc5a47a0c  r9 = 0xc5b7a000
> > >> > >         r10 = 0x00000002
> > >> > > postsig() at postsig+0x39c
> > >> > >          pc = 0xc03f23a0  lr = 0xc044388c (ast+0x4f4)
> > >> > >          sp = 0xef2cce20  fp = 0xef2cce58
> > >> > >          r4 = 0x00000001  r5 = 0xc5bc4320
> > >> > >          r6 = 0xc5a47960  r7 = 0xc5a47a0c
> > >> > >          r8 = 0xc5a47a0c  r9 = 0x01020804
> > >> > >         r10 = 0x00000ab8
> > >> > > ast() at ast+0x4f4
> > >> > >          pc = 0xc044388c  lr = 0xc0621080 (swi_entry+0x6c)
> > >> > >          sp = 0xef2cce60  fp = 0xbfffe438
> > >> > >          r4 = 0x40000013  r5 = 0xc5bc4320
> > >> > >          r6 = 0x00000001  r7 = 0x00000154
> > >> > >          r8 = 0x20037008  r9 = 0xbfffee5c
> > >> > >         r10 = 0xbfffea10
> > >> > > swi_entry() at swi_entry+0x6c
> > >> > >          pc = 0xc0621080  lr = 0xc0621080 (swi_entry+0x6c)
> > >> > >          sp = 0xef2cce60  fp = 0xbfffe438
> > >> > > Unable to unwind further
> > >> > > db>
> > >> > >
> > >> > >
> > >> > >
> > >> > > 2014-03-22 14:20 GMT+01:00 Ian Lepore <ian@freebsd.org>:
> > >> > >
> > >> > > On Fri, 2014-03-21 at 07:20 +0100, Wojciech Macek wrote:
> > >> > >> > No, changing flushD to flushID did not make any difference,
> but I
> > >> think
> > >> > >> it
> > >> > >> > should be there - D-only flushing might not be sufficient.
> > >> > >> >
> > >> > >>
> > >> > >> Olivier reminded me right after I posted that: last week I made a
> > >> change
> > >> > >> to cpufunc.c that makes flushD and flushID the same.  So of
> course it
> > >> > >> made no difference.  :)  It really should be flushID though, in
> case
> > >> > >> that ever changes.
> > >> > >>
> > >> > >> You didn't say whether you have that change, which was r263251.
> > >> > >>
> > >> > >> > Currently, I'm running pmap_kernel_internal attached below. It
> is
> > >> doing
> > >> > >> > unconditional flushID at the end, just like the old comment was
> > >> saying
> > >> > >> :)
> > >> > >> > SMP seems to be stable.
> > >> > >> >
> > >> > >>
> > >> > >> That seems to say that somehow there is a valid TLB entry even
> though
> > >> > >> the old pte for that entry is zero.  That means there's a problem
> > >> > >> somewhere else in the code, but I don't see it.  It looks to me
> like
> > >> we
> > >> > >> do a TLB flush everywhere that we zero out a pte.
> > >> > >>
> > >> > >> You said without the unconditional flush it panics at startup.
> > >>  Where in
> > >> > >> startup?  Early, or after init is launched or what?  Where does
> the
> > >> > >> panic backtrace to?
> > >> > >>
> > >> > >> If we've got some other pte/tlb maintenance problem, I'd hate to
> > >> hide it
> > >> > >> with this unconditional flush and have it appear as some other
> > >> problem
> > >> > >> later that will be even harder to track down.
> > >> > >>
> > >> > >> -- Ian
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >
> > >>
> > >>
> > >>
> > >
> > _______________________________________________
> > freebsd-arm@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-arm
> > To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"
>
>
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANsEV8crwz5=jhkSp2zeu8tEendgcX2oVtW=yJ=jep9aKkY7fQ>