Date:      Sat, 29 Nov 2003 11:33:58 -0500
From:      Don Bowman <don@sandvine.com>
To:        'Uwe Doering' <gemini@geminix.org>, freebsd-gnats-submit@FreeBSD.org
Cc:        freebsd-stable@freebsd.org
Subject:   RE: kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP
Message-ID:  <FE045D4D9F7AED4CBFF1B3B813C85337035E3F61@mail.sandvine.com>


From: Uwe Doering [mailto:gemini@geminix.org]
> Jonathan Gilpin wrote:
> > I've run memtest (memtest86.com), kindly provided by Don, and it passed
> > all the tests. I've installed a kernel module to test for memory errors
> > and again found no memory errors... So this means it's either a problem
> > with the CPUs or a genuine bug in the kernel. (bugger!)
> 
> No, that's unfortunately not what it means.  If a memory test fails you
> can draw the conclusion that you have bad memory, but this doesn't work
> the other way round.  If a memory test passes there is still a
> possibility that a memory chip is the culprit, since memory test
> software cannot find all errors.
> 
> Also, there is the chip set on the mainboard that coordinates bus
> access etc. for the two CPUs.  Mainboard and chip set developers are
> known to make errors, too.  In this case you would have to swap the
> entire mainboard, possibly with one from a different manufacturer.  I
> can tell you from my own experience that it is really hard to find
> reliable PC hardware these days, in light of ever shorter and faster
> product release cycles.

I have several hundred of the motherboards the poster is using,
and they work reliably in MP operation under 4.X.
The memtest86 that I sent him understands the ECC registers
on the e7501 MCH; it should find all correctable and uncorrectable
errors.

--don


