FreeBSD Mail Archives

Date:      Tue, 25 Mar 2014 11:14:47 +0100
From:      Wojciech Macek <wma@semihalf.com>
To:        Ian Lepore <ian@freebsd.org>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: arm SMP on Cortex-A15
Message-ID:  <CANsEV8fJaBmSD7i01BoG8%2BKDPg0ZD5mdHkpLeyyYNB-8sSse_w@mail.gmail.com>
In-Reply-To: <CANsEV8eB1SQHjKrBbG=HCZyb6wuwZmmpq6BKfVatOHm8UtZ9ig@mail.gmail.com>
References:  <CANsEV8euHTsfviiCMP_aet3qYiK2T-oK%2B-37eay7zAPH2S2vaA@mail.gmail.com> <20131220125638.GA5132@mail.bsdpad.com> <20131222092913.GA89153@mail.bsdpad.com> <CANsEV8fSoygoSUyQqKoEQ7tRxjqDOwrPD8dU7O2V2PXRj35j4A@mail.gmail.com> <20131222123636.GA61193@ci0.org> <CANsEV8fWvUkFHi8DP6Nr807RwPDB1iZrO39fpfa44qOkJPidZA@mail.gmail.com> <1395149146.1149.586.camel@revolution.hippie.lan> <1395254911.80941.9.camel@revolution.hippie.lan> <CANsEV8c047SNF61EgP6AiMR2oY=ofcMuTWYZnd60bRmp2Sk9HA@mail.gmail.com> <1395320561.80941.13.camel@revolution.hippie.lan> <CANsEV8f-Cyte-TO%2BCfWVcC_zw5dkYJT8Qfi92eH7yrfhqBvgjg@mail.gmail.com> <1395494401.81853.34.camel@revolution.hippie.lan> <CANsEV8eB1SQHjKrBbG=HCZyb6wuwZmmpq6BKfVatOHm8UtZ9ig@mail.gmail.com>

Hi Ian,

The r263251 fix helped, thanks!
So now, if you don't have any objections, I will clean up the cpufunc.S and
pmap-v6 changes a little and get them ready for submission.

Regards,
Wojtek


2014-03-24 14:31 GMT+01:00 Wojciech Macek <wma@semihalf.com>:

> Without the unconditional invalidation, the panic shows up right at the
> beginning, after rootfs is mounted and while the init scripts are running.
> When a userspace process exits, its memory resources are freed - this is the
> moment pmap_remove_pages fails due to a translation fault. It is the
> "typical" crash I observed when the TLB holds an old entry. A backtrace is
> below, but I doubt it will be helpful.
>
> Regarding the old pte/tlb: the TLB contains an entry from the old process
> context, while the in-memory PTE value is already correct - at least this
> was the scenario when I debugged it last year. So invalidating after *pte=0
> is definitely not our case. The issue shows up only on the A15, where the
> TLB prefetcher can cache PTE entries at any time.
>
> I believe I don't have r263251 integrated. I'll give it a try - typically,
> the TLB-caused crash appears only on pages containing shared library code
> (with the executable attribute), so there is a chance Olivier's fix helps.
>
>
> The fault:
>
> vm_fault(0xc5b894f0, 0, 2, 0) -> 1
> Fatal kernel mode data abort: 'Translation Fault (P)'
> trapframe: 0xef2cca40
> FSR=00000817, FAR=00000030, spsr=60000013
> r0 =00000000, r1 =c320a048, r2 =00000000, r3 =c3208074
> r4 =c5b7cd08, r5 =c5b7cd04, r6 =c5b05800, r7 =c5b895ac
> r8 =c320a044, r9 =fffffffe, r10=c5b895ac, r11=ef2ccae0
> r12=00000000, ssp=ef2cca90, slr=c0604148, pc =c0628a60
>
> [ thread pid 83 tid 100050 ]
> Stopped at      pmap_remove_pages+0x270:        streq   r3, [r0, #0x030]
> db> bt
> Tracing pid 83 tid 100050 td 0xc5bc4320
> db_trace_self() at db_trace_self
>          pc = 0xc061f62c  lr = 0xc024ddbc (db_hex2dec+0x498)
>          sp = 0xef2cc738  fp = 0xef2cc750
>         r10 = 0xc0708270
> db_hex2dec() at db_hex2dec+0x498
>          pc = 0xc024ddbc  lr = 0xc024d76c (db_command_loop+0x2f0)
>          sp = 0xef2cc758  fp = 0xef2cc7f8
>          r4 = 0x00000000  r5 = 0x00000000
>          r6 = 0xc0695cf1
> db_command_loop() at db_command_loop+0x2f0
>          pc = 0xc024d76c  lr = 0xc024d4dc (db_command_loop+0x60)
>          sp = 0xef2cc800  fp = 0xef2cc810
>          r4 = 0xc0666f88  r5 = 0xc067b997
>          r6 = 0xc0752954  r7 = 0xc0748f80
>          r8 = 0xef2cca40  r9 = 0xc07084e0
>         r10 = 0xc0748f84
> db_command_loop() at db_command_loop+0x60
>          pc = 0xc024d4dc  lr = 0xc024ffb8 (X_db_symbol_values+0x254)
>          sp = 0xef2cc818  fp = 0xef2cc938
>          r4 = 0x00000000  r5 = 0xef2cc820
>          r6 = 0xc0748fb0
> X_db_symbol_values() at X_db_symbol_values+0x254
>          pc = 0xc024ffb8  lr = 0xc0430554 (kdb_trap+0x164)
>          sp = 0xef2cc940  fp = 0xef2cc968
>          r4 = 0x00000000  r5 = 0x00000817
>          r6 = 0xc0748fb0  r7 = 0xc0748f80
> kdb_trap() at kdb_trap+0x164
>          pc = 0xc0430554  lr = 0xc0632ef0 (data_abort_handler+0x7dc)
>          sp = 0xef2cc970  fp = 0xef2cc988
>          r4 = 0xef2cca40  r5 = 0x600000d3
>          r6 = 0x00000030  r7 = 0x00000817
>          r8 = 0xc5b894f0  r9 = 0x00000001
>         r10 = 0xef2cca40
> data_abort_handler() at data_abort_handler+0x7dc
>          pc = 0xc0632ef0  lr = 0xc0632cc0 (data_abort_handler+0x5ac)
>          sp = 0xef2cc990  fp = 0xef2cca38
>          r4 = 0x00000817  r5 = 0xc5bc4320
>          r6 = 0xc5a47a0c  r7 = 0x00000004
> data_abort_handler() at data_abort_handler+0x5ac
>          pc = 0xc0632cc0  lr = 0xc0621214 (exception_exit)
>          sp = 0xef2cca40  fp = 0xef2ccae0
>          r4 = 0xc5b7cd08  r5 = 0xc5b7cd04
>          r6 = 0xc5b05800  r7 = 0xc5b895ac
>          r8 = 0xc320a044  r9 = 0xfffffffe
>         r10 = 0xc5b895ac
> exception_exit() at exception_exit
>          pc = 0xc0621214  lr = 0xc0604148 (PHYS_TO_VM_PAGE+0x48)
>          sp = 0xef2cca94  fp = 0xef2ccae0
>          r0 = 0x00000000  r1 = 0xc320a048
>          r2 = 0x00000000  r3 = 0xc3208074
>          r4 = 0xc5b7cd08  r5 = 0xc5b7cd04
>          r6 = 0xc5b05800  r7 = 0xc5b895ac
>          r8 = 0xc320a044  r9 = 0xfffffffe
>         r10 = 0xc5b895ac r12 = 0x00000000
> pmap_remove_pages() at pmap_remove_pages+0x270
>          pc = 0xc0628a60  lr = 0xc05f2d08 (vmspace_exit+0xd8)
>          sp = 0xef2ccae8  fp = 0xef2ccb10
>          r4 = 0xc5b895a8  r5 = 0xc5bc4320
>          r6 = 0x00000001  r7 = 0xc5a47960
>          r8 = 0xc5b895ac  r9 = 0xc5b894f0
>         r10 = 0xc0753be0
> vmspace_exit() at vmspace_exit+0xd8
>          pc = 0xc05f2d08  lr = 0xc03a7348 (exit1+0x930)
>          sp = 0xef2ccb18  fp = 0xef2ccb70
>          r4 = 0xc5a479fc  r5 = 0x00000004
>          r6 = 0xc583861c  r7 = 0x00000001
>          r8 = 0xc5a47960  r9 = 0xc5bc4320
>         r10 = 0xc5a47a0c
> exit1() at exit1+0x930
>          pc = 0xc03a7348  lr = 0xc03f1604 (sigexit+0x8c4)
>          sp = 0xef2ccb78  fp = 0xef2ccd68
>          r4 = 0x00000002  r5 = 0xc5bc4320
>          r6 = 0xc5a47960  r7 = 0xc5a47a0c
>          r8 = 0xc5bc4320  r9 = 0xc5b7a000
>         r10 = 0x00000002
> sigexit() at sigexit+0x8c4
>          pc = 0xc03f1604  lr = 0xc03f23a0 (postsig+0x39c)
>          sp = 0xef2ccd70  fp = 0xef2cce18
>          r4 = 0x00000001  r5 = 0xc5bc4320
>          r6 = 0xc5a47960  r7 = 0xc5b7aab8
>          r8 = 0xc5a47a0c  r9 = 0xc5b7a000
>         r10 = 0x00000002
> postsig() at postsig+0x39c
>          pc = 0xc03f23a0  lr = 0xc044388c (ast+0x4f4)
>          sp = 0xef2cce20  fp = 0xef2cce58
>          r4 = 0x00000001  r5 = 0xc5bc4320
>          r6 = 0xc5a47960  r7 = 0xc5a47a0c
>          r8 = 0xc5a47a0c  r9 = 0x01020804
>         r10 = 0x00000ab8
> ast() at ast+0x4f4
>          pc = 0xc044388c  lr = 0xc0621080 (swi_entry+0x6c)
>          sp = 0xef2cce60  fp = 0xbfffe438
>          r4 = 0x40000013  r5 = 0xc5bc4320
>          r6 = 0x00000001  r7 = 0x00000154
>          r8 = 0x20037008  r9 = 0xbfffee5c
>         r10 = 0xbfffea10
> swi_entry() at swi_entry+0x6c
>          pc = 0xc0621080  lr = 0xc0621080 (swi_entry+0x6c)
>          sp = 0xef2cce60  fp = 0xbfffe438
> Unable to unwind further
> db>
>
>
>
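
The FSR value in the fault dump above (FSR=00000817) can be unpacked by hand.
The sketch below assumes the ARMv7-A short-descriptor DFSR layout (FS[3:0] in
bits 3:0, domain in bits 7:4, FS[4] in bit 10, WnR in bit 11). The names
decode_dfsr and struct dfsr_fields are hypothetical helpers for illustration,
not FreeBSD kernel code.

```c
#include <stdint.h>

/* Fields of interest in the (assumed) ARMv7-A short-descriptor DFSR. */
struct dfsr_fields {
    unsigned fs;      /* 5-bit fault status: bit 10 joined with bits [3:0] */
    unsigned domain;  /* bits [7:4] */
    unsigned wnr;     /* bit 11: 1 = fault caused by a write */
};

/* Hypothetical helper: split a raw DFSR value into its fields. */
static struct dfsr_fields decode_dfsr(uint32_t fsr)
{
    struct dfsr_fields f;

    f.fs = (((fsr >> 10) & 1u) << 4) | (fsr & 0xfu);
    f.domain = (fsr >> 4) & 0xfu;
    f.wnr = (fsr >> 11) & 1u;
    return f;
}

/* Only the encodings relevant to this thread are named here. */
static const char *dfsr_fs_name(unsigned fs)
{
    switch (fs) {
    case 0x05: return "translation fault, first level (section)";
    case 0x07: return "translation fault, second level (page)";
    default:   return "other";
    }
}
```

Under that assumed layout, decode_dfsr(0x00000817) gives FS = 0x07 (translation
fault, page), domain 1, WnR = 1 (a write). That agrees with the
'Translation Fault (P)' message and with the faulting instruction being a
store; FAR=00000030 is r0 (zero) plus the 0x030 offset of that store.
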
> 2014-03-22 14:20 GMT+01:00 Ian Lepore <ian@freebsd.org>:
>
> On Fri, 2014-03-21 at 07:20 +0100, Wojciech Macek wrote:
>> > No, changing flushD to flushID did not make any difference, but I think
>> > it should be there - D-only flushing might not be sufficient.
>> >
>>
>> Olivier reminded me right after I posted that: last week I made a change
>> to cpufunc.c that makes flushD and flushID the same.  So of course it
>> made no difference.  :)  It really should be flushID though, in case
>> that ever changes.
>>
>> You didn't say whether you have that change, which was r263251.
>>
>> > Currently, I'm running pmap_kernel_internal attached below. It is doing
>> > unconditional flushID at the end, just like the old comment was saying :)
>> > SMP seems to be stable.
>> >
>>
>> That seems to say that somehow there is a valid TLB entry even though
>> the old pte for that entry is zero.  That means there's a problem
>> somewhere else in the code, but I don't see it.  It looks to me like we
>> do a TLB flush everywhere that we zero out a pte.
>>
>> You said without the unconditional flush it panics at startup.  Where in
>> startup?  Early, or after init is launched or what?  Where does the
>> panic backtrace to?
>>
>> If we've got some other pte/tlb maintenance problem, I'd hate to hide it
>> with this unconditional flush and have it appear as some other problem
>> later that will be even harder to track down.
>>
>> -- Ian
>>
>>
>>
>
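
The stale-entry scenario described in the quoted 2014-03-24 message (the TLB
still holds an entry from an old context while the in-memory PTE is already
correct) can be illustrated with a toy model. This is only a sketch: the
toy_* names, the four-page table and the PTE values are all invented for
illustration, and nothing here reflects the real pmap-v6 code or the actual
Cortex-A15 prefetch behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NPAGES 4

/* Toy model: an in-memory page table plus a software "TLB" caching it. */
struct toy_mmu {
    uint32_t pte[TOY_NPAGES];        /* in-memory PTEs, 0 = invalid */
    uint32_t tlb[TOY_NPAGES];        /* cached copies of PTEs */
    bool     tlb_valid[TOY_NPAGES];
};

/* Models a table walk (demand or speculative): cache the current PTE. */
static void toy_tlb_fill(struct toy_mmu *m, unsigned vpn)
{
    assert(vpn < TOY_NPAGES);
    if (m->pte[vpn] != 0) {
        m->tlb[vpn] = m->pte[vpn];
        m->tlb_valid[vpn] = true;
    }
}

static void toy_tlb_invalidate(struct toy_mmu *m, unsigned vpn)
{
    assert(vpn < TOY_NPAGES);
    m->tlb_valid[vpn] = false;
}

/* A TLB hit wins over memory; a miss walks the table first. */
static uint32_t toy_translate(struct toy_mmu *m, unsigned vpn)
{
    assert(vpn < TOY_NPAGES);
    if (!m->tlb_valid[vpn])
        toy_tlb_fill(m, vpn);
    return m->tlb_valid[vpn] ? m->tlb[vpn] : 0;
}

/* True if the old entry is seen before invalidation and the new one after. */
static bool toy_stale_tlb_demo(void)
{
    struct toy_mmu m = {0};
    const unsigned vpn = 1;
    const uint32_t old_pte = 0x1000u | 1u;
    const uint32_t new_pte = 0x2000u | 1u;

    m.pte[vpn] = old_pte;
    toy_tlb_fill(&m, vpn);           /* entry cached in the old context */
    m.pte[vpn] = new_pte;            /* memory updated, TLB not invalidated */

    bool stale = toy_translate(&m, vpn) == old_pte;
    toy_tlb_invalidate(&m, vpn);
    bool fresh = toy_translate(&m, vpn) == new_pte;
    return stale && fresh;
}
```

In this model the lookup keeps returning the old translation until the entry
is invalidated, which is why an unconditional invalidation hides the symptom
without showing where the missing invalidation belongs.
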

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANsEV8fJaBmSD7i01BoG8%2BKDPg0ZD5mdHkpLeyyYNB-8sSse_w>