Date:      Fri, 18 Jan 2013 13:23:00 -0700 (MST)
From:      Warren Block <wblock@wonkity.com>
To:        kpneal@pobox.com
Cc:        freebsd-stable@FreeBSD.org, Ian Lepore <ian@FreeBSD.org>, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject:   Re:  Spontaneous reboots on Intel i5 and FreeBSD 9.0
Message-ID:  <alpine.BSF.2.00.1301181313560.1604@wonkity.com>
In-Reply-To: <20130118173602.GA76438@neutralgood.org>
References:  <CAJ-UWtSANRMsOqwW9rJ6Eebta6=AiHeNO6fhPO0mhYhZiMmn4A@mail.gmail.com> <op.wq3zxn038527sy@ronaldradial.versatec.local> <alpine.BSF.2.00.1301180758460.96418@wonkity.com> <1358527685.32417.237.camel@revolution.hippie.lan> <20130118173602.GA76438@neutralgood.org>

On Fri, 18 Jan 2013, kpneal@pobox.com wrote:

> On Fri, Jan 18, 2013 at 09:48:05AM -0700, Ian Lepore wrote:
>> I tend to agree, a machine that starts rebooting spontaneously when
>> nothing significant changed and it used to be stable is usually a sign
>> of a failing power supply or memory.
>
> Agreed.
>
>> But I disagree about memtest86.  It's probably not completely without
>> value, but to me its value is only negative:  if it tells you memory is
>> bad, it is.  If it tells you it's good, you know nothing.  Over the
>> years I've had 5 dimms fail.  memtest86 found the error in one of them,
>> but said all the others were fine in continuous 48-hour tests.  I even
>> tried running the tests on multiple systems.
>>
>> The thing that always reliably finds bad memory for me
>> is /usr/ports/math/mprime run in test/benchmark mode.  It often takes 24
>> or more hours of runtime, but it will find your bad memory.
>
> I've had "good" luck with gcc showing bad memory. If compiling a new kernel
> produces seg faults then I know I have a hardware problem. I've seen
> compilers at work failing due to bad memory as well.
>
> Some problems only happen with particular access patterns.  So if a compiler
> works fine then, like memtest86, it doesn't say anything about the health
> of the hardware.

Most test tools are like that.  They might diagnose something as bad, 
but they often can't prove it is good.  SMART has a reputation for not 
finding any problems on disks that are failing, and capacitors that 
aren't swollen or leaking may still have failed.

But diagnostic tools can at least give a hint.  In my case, memtest 
indicated a problem--a big problem.  I removed one DIMM at random (there 
were only two) and the problems and memtest errors both went away. 
Reinstalling that DIMM brought both back.
