Date:      Thu, 22 Aug 2002 09:19:22 -0400
From:      Bosko Milekic <bmilekic@unixdaemons.com>
To:        Mark Santcroos <marks@ripe.net>
Cc:        Terry Lambert <tlambert2@mindspring.com>, freebsd-current@FreeBSD.ORG, hackers@FreeBSD.ORG
Subject:   Re: Memory corruption in CURRENT
Message-ID:  <20020822091922.A33850@unixdaemons.com>
In-Reply-To: <20020822113411.GA23616@laptop.6bone.nl>; from marks@ripe.net on Thu, Aug 22, 2002 at 01:34:11PM +0200
References:  <200208220909.g7M99NcS077303@freebsd.dk> <3D64B005.6657A3B5@mindspring.com> <20020822100014.GA17143@ripe.net> <3D64BA1F.B3C8C8E0@mindspring.com> <20020822102553.GA17453@ripe.net> <3D64C9C2.30A37BF8@mindspring.com> <20020822113411.GA23616@laptop.6bone.nl>

We have seen weird problems with the pmap PG_G handling (more precisely,
with the interaction of PSE and PG_G) on ppro and pII chips (apparently
not on Xeons, at least).  For the record, what happened was this:

We would enable PSE and switch the pde corresponding to the first 4M
to the new entry describing a 4M page, instead of the one describing the
location of the ptes covering those 4M.  We would then walk all the
ptes, including the stale and now useless ones that previously described
those first 4M, and set the PG_G bit on each of them (note: we had
already set PG_G on our 4M page).  We don't really need to touch the old
ptes, but we did because it was more convenient (i.e. a few lines less
code).  Oddly enough, on the ppro and pII what would happen
is that we would page fault on that page where we kept the old ptes
covering those first 4M, and only on that page!  The other ptes - the
ones that actually mattered - were all fine.  The ptes are mapped above
the 4M so I don't see how changing the pde for those first 4M would have
done anything.  To "fix" the problem, we (actually Peter) committed code
that basically just jumps beyond that first page of stale ptes when
setting the PG_G bit for the 4K pages, and since then, the problem seems
to have gone away.  Although we are not sure, this seems like a silicon
bug.
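
For illustration only, here is a minimal user-space sketch of that
workaround in C.  It is not the committed pmap code: the helper name
mark_global_skip_stale and the flat ptes array are hypothetical, and it
assumes the non-PAE i386 layout (1024 4-byte ptes per page, so the first
page of ptes is exactly the set that used to describe the first 4M).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_V   0x001u   /* i386 pte: valid */
#define PG_G   0x100u   /* i386 pte: global (not flushed on %cr3 reload) */
#define NPTEPG 1024u    /* ptes per 4K page of ptes (non-PAE assumption) */

/*
 * Hypothetical helper: set PG_G on every valid pte in a simulated,
 * contiguous pte array, but start one full page of ptes in, so the
 * stale entries for the first 4M (now mapped by a single 4M pde)
 * are never touched.  Returns the number of ptes modified.
 */
static size_t
mark_global_skip_stale(uint32_t *ptes, size_t npte)
{
	size_t i, touched = 0;

	assert(ptes != NULL);
	for (i = NPTEPG; i < npte; i++) {
		if (ptes[i] & PG_V) {
			ptes[i] |= PG_G;
			touched++;
		}
	}
	return (touched);
}
```

The only difference from the naive walk is the loop's starting index:
NPTEPG instead of 0.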

Since then, Peter had some work planned to load the kernel above the
first 4M to see if that fixed the problems.  I'm wondering if this
problem on the PIVs could be related.  Please let us know if the removal
of those two options really makes 5-10 buildworlds in a row work out for
you.

Regards,
Bosko

On Thu, Aug 22, 2002 at 01:34:11PM +0200, Mark Santcroos wrote:
> On Thu, Aug 22, 2002 at 04:23:46AM -0700, Terry Lambert wrote:
> > Ugh!  Wait until it seems to work for a statistically significant
> > sample size, and for more than one person before calling it "happy"!
> > 
> > Also, I'm not sure looking at the code whether or not the PG_G is
> > truly significant, or just perturbs the workaround.  The problem
> > I've referred to in my "hunch" here is actually related solely to
> > the PSE, but with the recent code reorganization in locore.s, etc.,
> > it could have become more significant.
> 
> I was just giving a slight report, not yelling hallelujah yet ;-)
> 
> It's doing the 2nd buildworld now.
> 
> Do you also want me to try to split up the disabling of the two options?
> 
> Mark
> 
> -- 
> Mark Santcroos				RIPE Network Coordination Centre
> http://www.ripe.net/home/mark/		New Projects Group/TTM
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-current" in the body of the message
> 

-- 
Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org

