Date:      Fri, 23 Aug 2002 01:32:02 -0700
From:      Peter Wemm <peter@wemm.org>
To:        current@FreeBSD.ORG
Subject:   Re: Memory corruption in -CURRENT [was Re: Plea to committers to only commit to HEAD if you run -current {from developers@FreeBSD.org}] 
Message-ID:  <20020823083202.900A12A7D6@canning.wemm.org>
In-Reply-To: <20020823063155.GA215@HAL9000.homeunix.com> 

David Schultz wrote:
> Thus spake Terry Lambert <tlambert2@mindspring.com>:
> > David Schultz wrote:
> > > Thus spake Terry Lambert <tlambert2@mindspring.com>:
> > > > DISABLE_PSE is a 1:6 probability; DISABLE_PG_G is a 1:100 (both
> > > > estimates, but on that order), so mixing and matching them will
> > > > not usually give any additional information.  Martin got "lucky"
> > > > with his machine... it seems to require both.
> > > >
> > > > The problem is a hardware bug in most Pentium on up processors,
> > > > which gets worse in newer CPUs (P4, AMD) as they try to optimize
> > > > certain things.  It's like writing ANSI C without "volatile".
> > > 
> > > It sounds like you're describing a cache coherence problem.  Could
> > > you elaborate or point me to a reference on this?  Thanks.
> > 
> > There is no reference on this.  It is an undocumented hardware bug.
> 
> Err...so you know there's a long-standing random bug that has
> to do with 4 MB pages, but nobody has bothered to characterize it
> after all these years?  This sounds much like the problem Linux
> had with 4 MB pages and AGP GART on Athlons, where the hardware
> designers maintained that it was a `feature', not a bug, and that
> the software people were relying on undocumented behavior.

I know of one bug we were running into that basically boils down to 'do not
point a 4MB page at physical address zero or funny things happen'.  This
particular one affects Pentium Pro and the older Pentium II systems.  I have
finally fixed this particular problem and am testing the fix now on two
troublesome systems that I have available.  I have a vague suspicion that
it just *might* be a factor in the current round of problems on UP
Pentium 4s.
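
To make the 'no 4MB page at physical address zero' condition concrete, here
is a rough sketch in C of how one might test a page directory entry for it.
This is not the actual fix; the helper names are made up for illustration,
and it assumes plain non-PAE i386 paging without PSE-36, where bit 0 is the
valid bit, bit 7 is the page-size bit and bits 31:22 hold the physical base
of a 4MB page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* i386 page-directory entry bits (non-PAE, no PSE-36 assumed). */
#define PG_V		0x001u		/* entry is valid (present) */
#define PG_RW		0x002u		/* writable */
#define PG_PS		0x080u		/* entry maps a 4MB page, not a page table */
#define PG_FRAME_4M	0xffc00000u	/* bits 31:22: physical base of a 4MB page */

/* Hypothetical helper: build a PDE for a 4MB page at physical address pa. */
static uint32_t
make_4mb_pde(uint32_t pa)
{
	assert((pa & ~PG_FRAME_4M) == 0);	/* pa must be 4MB aligned */
	return (pa | PG_PS | PG_RW | PG_V);
}

/* Hypothetical check: true if pde is a valid 4MB mapping of physical 0. */
static bool
pde_is_4mb_at_phys_zero(uint32_t pde)
{
	/* Invalid entries and ordinary page-table pointers do not count. */
	if ((pde & PG_V) == 0 || (pde & PG_PS) == 0)
		return (false);
	return ((pde & PG_FRAME_4M) == 0);
}
```

A debugging pass could walk the page directory and complain about any entry
for which pde_is_4mb_at_phys_zero() returns true.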

Terry claims to have diagnosed another bug but says he will not tell
anybody what it is or how to work around it.

There are also fundamental races in the pmap code when page tables are
shared.  We've fixed some of the bugs, but there are more still. :-(

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5





