Date:      Tue, 7 Jan 2003 11:13:29 -0800
From:      Marcel Moolenaar <marcel@xcllnt.net>
To:        Doug Rabson <dfr@nlsystems.com>
Cc:        Arun Sharma <arun.sharma@intel.com>, freebsd-ia64@FreeBSD.ORG
Subject:   Re: unaligned fault in pmap_find_vhpt
Message-ID:  <20030107191329.GA619@dhcp01.pn.xcllnt.net>
In-Reply-To: <200301070924.42508.dfr@nlsystems.com>
References:  <200301032303.gBQJBOs00863@unix-os.sc.intel.com> <20030104043524.GA2059@dhcp01.pn.xcllnt.net> <200301070924.42508.dfr@nlsystems.com>

On Tue, Jan 07, 2003 at 09:24:42AM +0000, Doug Rabson wrote:
> On Saturday 04 January 2003 4:35 am, Marcel Moolenaar wrote:
> > On Fri, Jan 03, 2003 at 03:03:14PM -0800, Arun Sharma wrote:
> > > I saw a kernel mode unaligned fault during a compilation workload
> > > yesterday on an SMP 5.0-RC1 kernel. The fault happened here:
> > >
> > > 0xe000000000aad660 <pmap_find_vhpt+80>:
> > >
> > > More info below. It looks like the pte_chain is getting corrupted
> > > somehow. What is the locking scheme being used to protect pte
> > > collision chains on an SMP kernel ?
> >
> > We don't really have a consistent locking scheme. We walk and
> > update the VHPT from IVA interrupt code as well. Under high
> > load, an SMP kernel corrupts process space. I haven't seen the
> > unaligned fault you mention.
> 
> The IVA only updates the contents of the VHPT head entry (which is 
> always a copy of some element of the pte_chain). It never edits the 
> chain but I can see it getting confused if someone else edits the chain 
> while the IVA is walking it.

We may also have a problem when multiple CPUs fault on addresses
that happen to hash to the same value. The chain is walked whenever
the tag differs, irrespective of whether the tag-invalid (TI) bit is
set. Also, since we invalidate the head entry only after we have found
the PTE, two or more CPUs could be walking the chain concurrently even
if we respected TI. Consequently, two or more CPUs can update the head
entry concurrently. See also BTW1.
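A minimal sketch of one way to close this race, assuming a long-format bucket whose collision chain is managed by software: a per-bucket spinlock held across the whole chain walk and head-entry update, so two CPUs faulting on the same hash value cannot interleave their updates. The struct and function names are hypothetical, not the actual pmap code. The lock only serializes software walkers; the hardware VHPT walker takes no locks, so the head entry would still need TI and memory-ordering care.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified PTE: only the fields this sketch needs. */
struct pte {
	uint64_t tag;		/* translation tag */
	uint64_t pte;		/* the translation itself */
	struct pte *chain;	/* next PTE with the same hash value */
};

/* Hypothetical long-format bucket: head entry plus collision chain. */
struct vhpt_bucket {
	uint64_t tag;		/* copy of the tag of the last PTE installed */
	uint64_t pte;		/* copy of that PTE */
	struct pte *chain;	/* collision chain */
	atomic_flag lock;	/* serializes chain walks and head updates */
};

static void vhpt_bucket_init(struct vhpt_bucket *b)
{
	memset(b, 0, sizeof(*b));
	atomic_flag_clear(&b->lock);
}

static void bucket_lock(struct vhpt_bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock,
	    memory_order_acquire))
		;	/* spin until the other CPU releases the bucket */
}

static void bucket_unlock(struct vhpt_bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Insert p at the front of the collision chain. */
static void vhpt_insert(struct vhpt_bucket *b, struct pte *p)
{
	assert(p != NULL);
	bucket_lock(b);
	p->chain = b->chain;
	b->chain = p;
	bucket_unlock(b);
}

/*
 * Walk the chain for tag and, on a hit, copy the PTE into the head
 * entry. Walk and head update happen under a single lock hold, so
 * concurrent faults on the same hash value cannot interleave.
 */
static struct pte *vhpt_lookup(struct vhpt_bucket *b, uint64_t tag)
{
	struct pte *p;

	bucket_lock(b);
	for (p = b->chain; p != NULL; p = p->chain) {
		if (p->tag == tag) {
			b->tag = p->tag;
			b->pte = p->pte;
			break;
		}
	}
	bucket_unlock(b);
	return p;
}
```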

BTW1: the page fault handler should really update the TLB as well
as insert the PTE into the VHPT. Currently a TLB miss on a PTE that
is not in the VHPT costs two faults.
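A toy model of the fault count, assuming the page fault handler only inserts the PTE into the VHPT, so the restarted access takes a second TLB miss before the translation reaches the TLB. The hardware walker is folded into the miss handler for simplicity, and all names are hypothetical. With `fault_fills_tlb` set (the change suggested above), a cold access costs one fault instead of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 64	/* size of the toy address space, in pages */

struct toy_mmu {
	bool tlb[NPAGES];	/* translation present in the TLB */
	bool vhpt[NPAGES];	/* PTE present in the VHPT */
	bool mapped[NPAGES];	/* page is mapped (the page tables) */
	bool fault_fills_tlb;	/* BTW1: fault handler also fills the TLB */
	int faults;		/* number of faults taken */
};

static void toy_mmu_init(struct toy_mmu *m, bool fault_fills_tlb)
{
	memset(m, 0, sizeof(*m));
	m->fault_fills_tlb = fault_fills_tlb;
}

/* Hypothetical page fault handler: make the PTE visible. */
static void page_fault(struct toy_mmu *m, int vpn)
{
	assert(m->mapped[vpn]);		/* toy model: no real page-ins */
	m->vhpt[vpn] = true;		/* always insert the PTE into the VHPT */
	if (m->fault_fills_tlb)
		m->tlb[vpn] = true;	/* BTW1: insert into the TLB as well */
}

/* One memory access, restarted after each fault until it hits the TLB. */
static void access_page(struct toy_mmu *m, int vpn)
{
	assert(vpn >= 0 && vpn < NPAGES);
	while (!m->tlb[vpn]) {
		m->faults++;			/* TLB miss */
		if (m->vhpt[vpn])
			m->tlb[vpn] = true;	/* found in the VHPT: fill the TLB */
		else
			page_fault(m, vpn);	/* not in the VHPT */
	}
}
```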

BTW2: we really should make the VHPT optional. It's there for
performance, not correctness. Being able to run without the VHPT
not only helps detect design bugs, it can also help debug SMP
issues by letting us disable a possibly faulty component.
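A sketch of how the VHPT could be made optional at boot, assuming it is turned off by clearing the VHPT-enable (ve) bit in cr.pta, so that every TLB miss is handled in software. The field layout below is my reading of the Itanium architecture manual and should be checked against it; `vhpt_enable` and `pta_value` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * cr.pta fields (check against the architecture manual):
 * ve = bit 0, size = bits 2..7 (log2 of the VHPT size),
 * vf = bit 8 (1 = long format), base = bits 15..63.
 */
#define PTA_VE		(1ULL << 0)
#define PTA_SIZE_SHIFT	2
#define PTA_SIZE_MASK	0x3fULL
#define PTA_VF		(1ULL << 8)
#define PTA_BASE_MASK	(~((1ULL << 15) - 1))

/* Hypothetical boot-time knob: false disables the VHPT walker. */
static bool vhpt_enable = true;

static uint64_t pta_value(uint64_t base, unsigned log2size, bool long_format)
{
	uint64_t pta;

	assert(log2size >= 15 && log2size < 64);	/* minimum 32KB */
	assert((base & ((1ULL << log2size) - 1)) == 0);	/* size-aligned */
	pta = (base & PTA_BASE_MASK) |
	    (((uint64_t)log2size & PTA_SIZE_MASK) << PTA_SIZE_SHIFT);
	if (long_format)
		pta |= PTA_VF;
	if (vhpt_enable)
		pta |= PTA_VE;	/* leave clear to run without the VHPT */
	return pta;
}
```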

BTW3: being able to experiment with different kinds of VHPT schemes
could prove educational. I think it would be good to abstract as much
of the actual VHPT implementation as possible, so that we can switch
implementations at compile time and study their behaviour. Not just
the short-format (per-region) or long-format (global) VHPT, but also
how we implement the hash buckets (i.e. collision handling) in the
long-format VHPT.
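A sketch of the kind of abstraction meant here, assuming the pmap only reaches the VHPT through a table of function pointers chosen at compile time. The two toy collision schemes (evict on collision, linear probing) and all names are hypothetical and deliberately not the current pte_chain design; they only show how schemes could be swapped and compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 8	/* toy table size, must be a power of two */

struct vhpt_entry {
	bool valid;
	uint64_t tag;
	uint64_t pte;
};

struct vhpt_table {
	struct vhpt_entry e[NBUCKETS];
};

/* Hypothetical interface: the pmap only calls through this table. */
struct vhpt_ops {
	const char *name;
	void (*insert)(struct vhpt_table *, uint64_t tag, uint64_t pte);
	bool (*lookup)(const struct vhpt_table *, uint64_t tag, uint64_t *pte);
};

static unsigned vhpt_hash(uint64_t tag)
{
	return (unsigned)(tag ^ (tag >> 7)) & (NBUCKETS - 1);
}

/* Scheme 1: one entry per bucket; a collision simply evicts.
 * This is legal because the VHPT is only a cache of the page tables. */
static void direct_insert(struct vhpt_table *t, uint64_t tag, uint64_t pte)
{
	struct vhpt_entry *e = &t->e[vhpt_hash(tag)];

	e->valid = true;
	e->tag = tag;
	e->pte = pte;
}

static bool direct_lookup(const struct vhpt_table *t, uint64_t tag,
    uint64_t *pte)
{
	const struct vhpt_entry *e = &t->e[vhpt_hash(tag)];

	if (!e->valid || e->tag != tag)
		return false;
	*pte = e->pte;
	return true;
}

/* Scheme 2: linear probing; a collision moves on to the next slot. */
static void probe_insert(struct vhpt_table *t, uint64_t tag, uint64_t pte)
{
	unsigned h = vhpt_hash(tag);

	for (unsigned i = 0; i < NBUCKETS; i++) {
		struct vhpt_entry *e = &t->e[(h + i) & (NBUCKETS - 1)];

		if (!e->valid || e->tag == tag) {
			e->valid = true;
			e->tag = tag;
			e->pte = pte;
			return;
		}
	}
	direct_insert(t, tag, pte);	/* table full: evict the home slot */
}

static bool probe_lookup(const struct vhpt_table *t, uint64_t tag,
    uint64_t *pte)
{
	unsigned h = vhpt_hash(tag);

	for (unsigned i = 0; i < NBUCKETS; i++) {
		const struct vhpt_entry *e = &t->e[(h + i) & (NBUCKETS - 1)];

		if (!e->valid)
			return false;	/* probe sequence ends at a hole */
		if (e->tag == tag) {
			*pte = e->pte;
			return true;
		}
	}
	return false;
}

static const struct vhpt_ops direct_ops = { "direct", direct_insert, direct_lookup };
static const struct vhpt_ops probe_ops = { "probe", probe_insert, probe_lookup };

/* Compile-time selection, e.g. cc -DVHPT_PROBE ... */
#ifdef VHPT_PROBE
static const struct vhpt_ops *vhpt = &probe_ops;
#else
static const struct vhpt_ops *vhpt = &direct_ops;
#endif
```

The rest of the pmap would call `vhpt->insert` and `vhpt->lookup` only, so switching schemes is a rebuild rather than a rewrite.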

-- 
 Marcel Moolenaar	  USPA: A-39004		 marcel@xcllnt.net
