Date:      Sat, 24 Oct 2020 15:37:35 -0400
From:      Mark Johnston <markj@freebsd.org>
To:        mmel@freebsd.org
Cc:        bob prohaska <fbsd@www.zefox.net>, freebsd-current@freebsd.org, freebsd-arm@freebsd.org
Subject:   Re: panic: non-current pmap 0xffffa00020eab8f0 on Rpi3
Message-ID:  <20201024193735.GA7755@raichu>
In-Reply-To: <454e1e9f-e839-8961-2ae1-9ddd86f1cefd@freebsd.org>
References:  <20201006021029.GA13260@www.zefox.net> <20201006133743.GA96285@raichu> <c8a5e1d2-0c47-e3f7-300a-f2fce55d2819@freebsd.org> <20201019203954.GC46122@raichu> <454e1e9f-e839-8961-2ae1-9ddd86f1cefd@freebsd.org>

On Fri, Oct 23, 2020 at 06:32:25PM +0200, Michal Meloun wrote:
> 
> 
> On 19.10.2020 22:39, Mark Johnston wrote:
> > On Fri, Oct 16, 2020 at 11:53:56AM +0200, Michal Meloun wrote:
> >>
> >>
> >> On 06.10.2020 15:37, Mark Johnston wrote:
> >>> On Mon, Oct 05, 2020 at 07:10:29PM -0700, bob prohaska wrote:
> >>>> Still seeing non-current pmap panics on the Pi3, this time a B+ running
> >>>> 13.0-CURRENT (GENERIC-MMCCAM) #0 71e02448ffb-c271826(master)
> >>>> during a -j4 buildworld.  The backtrace reports
> >>>>
> >>>> panic: non-current pmap 0xffffa00020eab8f0
> >>>
> >>> Could you show the output of "show procvm" from the debugger?
> >>
> >> I see the same panic too; in my case it's very rare.  The typical
> >> scenario is a rebuild of the kf5 ports (~250, 2 days of full load).
> >> Any idea how to debug this?
> >> Michal
> > 
> > I suspect that there is some race involving the pmap switching in
> > vmspace_exit(), but I can't see it.  In the example below, presumably
> > process 22604 on CPU 0 is also exiting?  Could you show the backtrace?
> > 
> > It would also be useful to see the value of PCPU_GET(curpmap) at the
> > time of the panic.  I'm not sure if there's a way to get that from DDB,
> > but I suspect it should be equal to &vmspace0->vm_pmap.
> Mark,
> I think that I found the problem.
> PCPU_GET() is not (and is not supposed to be) an atomic operation;
> it expects that the thread is at least pinned.
> This is not true for pmap_remove_pages(), so I think that the KASSERT
> is racy and should be removed (or at least covered by a
> sched_pin()/sched_unpin() pair).
> What do you think?

I think you're right.  On amd64 curpmap is loaded using a single
instruction so the assertion happens to work properly.  On arm64 we
have:

   0xffff0000007ff138 <+32>:      mov     x8, x18
   0xffff0000007ff13c <+36>:      ldr     x8, [x8, #216]
   0xffff0000007ff140 <+40>:      mov     x26, x0
   0xffff0000007ff144 <+44>:      cmp     x8, x0

Though, it looks like arm64's PCPU_GET could be modified to combine the
first two instructions.
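
To make the window concrete, here is a minimal userspace model of the
two-step load, not kernel code.  The struct layouts, the cpus[] array,
get_pcpu() as written here, and check_with_migration() are all
hypothetical stand-ins; the only thing taken from the disassembly is
that the per-CPU pointer is fetched first and the curpmap field is
loaded through it afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct pmap { int id; };
struct pcpu { struct pmap *pc_curpmap; };

static struct pcpu cpus[2];

/* Step 1 of the two-instruction sequence: fetch a CPU's pcpu pointer. */
static struct pcpu *
get_pcpu(int cpu)
{
	return (&cpus[cpu]);
}

/*
 * Model an unpinned thread that starts on CPU 0 with pmap `mine` and may
 * be migrated to CPU 1 between the two loads.  Returns 1 if the check
 * "curpmap == mine" passes and 0 if it fails.
 */
static int
check_with_migration(struct pmap *mine, struct pmap *other, int migrate)
{
	struct pcpu *pc;

	cpus[0].pc_curpmap = mine;
	cpus[1].pc_curpmap = NULL;

	pc = get_pcpu(0);		/* first load: pcpu pointer */
	if (migrate) {
		/* Thread moves to CPU 1; CPU 0 now runs another process. */
		cpus[1].pc_curpmap = mine;
		cpus[0].pc_curpmap = other;
	}
	/* Second load goes through the (possibly stale) CPU 0 pointer. */
	return (pc->pc_curpmap == mine);
}
```

Without migration the comparison passes; with migration it fails even
though the thread's own pmap is current on the CPU it now runs on.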

To fix it, we could perhaps change the KASSERT to verify that pmap ==
vmspace_pmap(curthread->td_proc->p_vmspace).  The various
implementations of pmap_remove_pages() have different flavours of the
same check and it would be nice to unify them.  Using sched_pin() would
also be fine I think.
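
A rough userspace sketch of that first option follows.  The struct
definitions are simplified stand-ins (assumptions for illustration, not
the kernel's), and pmap_is_current() is a hypothetical helper name; the
point is only that the comparison goes through the thread's own vmspace
and never touches per-CPU data, so it does not care whether the thread
migrates.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel structures (assumed layouts). */
struct pmap { int id; };
struct vmspace { struct pmap vm_pmap; };
struct proc { struct vmspace *p_vmspace; };
struct thread { struct proc *td_proc; };

/* Same shape as the kernel accessor: the pmap embedded in the vmspace. */
static struct pmap *
vmspace_pmap(struct vmspace *vm)
{
	return (&vm->vm_pmap);
}

/*
 * Hypothetical helper: true if `pmap` belongs to the thread's process.
 * In the kernel this would be the KASSERT condition, roughly
 * pmap == vmspace_pmap(curthread->td_proc->p_vmspace).
 */
static int
pmap_is_current(struct thread *td, struct pmap *pmap)
{
	return (pmap == vmspace_pmap(td->td_proc->p_vmspace));
}
```

The sched_pin()/sched_unpin() alternative would instead keep the
PCPU_GET(curpmap) comparison and prevent migration around it.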

> > I think vmspace_exit() should issue a release fence with the cmpset and
> > an acquire fence when handling the refcnt == 1 case,
> Yep, true, fully agree.

Alan pointed out in the review that pmap_remove_pages() acquires the
pmap lock, which I missed, so I don't think the extra barriers are
necessary after all.


