Date:      Fri, 18 Jan 2013 09:48:05 -0700
From:      Ian Lepore <ian@FreeBSD.org>
To:        Warren Block <wblock@wonkity.com>
Cc:        freebsd-stable@FreeBSD.org, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject:   Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0
Message-ID:  <1358527685.32417.237.camel@revolution.hippie.lan>
In-Reply-To: <alpine.BSF.2.00.1301180758460.96418@wonkity.com>
References:  <CAJ-UWtSANRMsOqwW9rJ6Eebta6=AiHeNO6fhPO0mhYhZiMmn4A@mail.gmail.com> <op.wq3zxn038527sy@ronaldradial.versatec.local> <alpine.BSF.2.00.1301180758460.96418@wonkity.com>

On Fri, 2013-01-18 at 08:04 -0700, Warren Block wrote:
> On Fri, 18 Jan 2013, Ronald Klop wrote:
> 
> > Memory chips gone bad? Power (or other) cables gone loose?
> 
> Memory failures will cause intermittent and mysterious things.  Easy to 
> test, too, just run memtest86 on it for a while.  Do that before 
> rebuilding.  If memory is failing, corrupted data could be written to 
> disk.
> 
> I had a Crucial DIMM fail spontaneously a couple of weeks ago.  Working 
> one minute, totally failed the next.  The machine rebooted, for no 
> visible reason.  After it came back up, compiles failed, always with 
> different errors and in different places.
> 
> Power supplies also fail, as do motherboards.  These are both harder to 
> swap out than memory, so test the memory first.

I tend to agree: when a machine that used to be stable starts rebooting
spontaneously and nothing significant has changed, it is usually a sign
of a failing power supply or failing memory.

But I disagree about memtest86.  It's probably not completely without
value, but to me its results can only be trusted in one direction:  if
it tells you memory is bad, it is.  If it tells you memory is good, you
know nothing.  Over the years I've had 5 DIMMs fail.  memtest86 found
the error in one of them, but reported the other four as fine in
continuous 48-hour tests.  I even tried running the tests on multiple
systems.

The one thing that has reliably found bad memory for me
is /usr/ports/math/mprime run in test/benchmark mode.  It often takes 24
hours or more of runtime, but it will find your bad memory.

-- Ian




