Date:      Fri, 18 Jan 2013 20:36:56 +0200
From:      Marin Atanasov Nikolov <dnaeon@gmail.com>
To:        kpneal@pobox.com
Cc:        Warren Block <wblock@wonkity.com>, ml-freebsd-stable <freebsd-stable@freebsd.org>, Ian Lepore <ian@freebsd.org>, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject:   Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0
Message-ID:  <CAJ-UWtQ=iKwCZeaAiFRiir9E7_CLbTpFHrUq6TfbS%2BacdQ3AfQ@mail.gmail.com>
In-Reply-To: <20130118173602.GA76438@neutralgood.org>
References:  <CAJ-UWtSANRMsOqwW9rJ6Eebta6=AiHeNO6fhPO0mhYhZiMmn4A@mail.gmail.com> <op.wq3zxn038527sy@ronaldradial.versatec.local> <alpine.BSF.2.00.1301180758460.96418@wonkity.com> <1358527685.32417.237.camel@revolution.hippie.lan> <20130118173602.GA76438@neutralgood.org>

Hello,

Thanks everyone for the input. Here's what I did.

* Checked power cables - everything okay
* Checked for bad capacitors - everything okay
* Ran memtest86+ test - everything okay

I should also mention that the machine's hardware is new: HDDs,
motherboard, memory, power supply, etc. The only thing I've replaced on
this machine is the pair of old 750 GB Seagate disks, which I swapped for
two brand-new 1000 GB Seagate disks. That was about a month ago, I think,
and I didn't have any issues until recently. smartd doesn't report anything
worrying about these disks either.

I've also been using this machine as a build host for months now. I build
different projects on it with gcc, clang, scan-build and others, and I've
never had problems building or testing anything, so I think I would have
noticed if gcc or clang were misbehaving, for example.

On one of my old machines I had problems with bad memory and bad disks in
the past, and that system simply halted. I could see on the terminal that
it had halted, along with some information about the problem itself, but
I've never had a system reboot because of a hardware issue.

If it were a problem with memory or disks, I would expect the system to
simply halt rather than reboot itself, so I'm a bit puzzled about what the
root cause could be.

This system very rarely changes in terms of hardware or software. On the
software side, the one change made a few weeks ago was to allow System V
IPC primitives to be used in jails, but I don't see how that could cause
these issues.

I had a power outage more than 2 weeks ago. The UPS successfully shut
down the system, and its batteries were exhausted during the outage. They
have been recharged since then, but in the meantime I'm thinking of
disconnecting the UPS control cables from this system, just to be sure that
the outage 2 weeks ago didn't break the UPS in some way.

Thanks again,
Marin



On Fri, Jan 18, 2013 at 7:36 PM, <kpneal@pobox.com> wrote:

> On Fri, Jan 18, 2013 at 09:48:05AM -0700, Ian Lepore wrote:
> > I tend to agree, a machine that starts rebooting spontaneously when
> > nothing significant changed and it used to be stable is usually a sign
> > of a failing power supply or memory.
>
> Agreed.
>
> > But I disagree about memtest86.  It's probably not completely without
> > value, but to me its value is only negative:  if it tells you memory is
> > bad, it is.  If it tells you it's good, you know nothing.  Over the
> > years I've had 5 dimms fail.  memtest86 found the error in one of them,
> > but said all the others were fine in continuous 48-hour tests.  I even
> > tried running the tests on multiple systems.
> >
> > The thing that always reliably finds bad memory for me
> > is /usr/ports/math/mprime run in test/benchmark mode.  It often takes 24
> > or more hours of runtime, but it will find your bad memory.
>
> I've had "good" luck with gcc showing bad memory. If compiling a new kernel
> produces seg faults then I know I have a hardware problem. I've seen
> compilers at work failing due to bad memory as well.
>
> Some problems only happen with particular access patterns. So if a compiler
> works fine then, like memtest86, it doesn't say anything about the health
> of the hardware.
>
> --
> Kevin P. Neal                                http://www.pobox.com/~kpn/
>       'Concerns about "rights" and "ownership" of domains are inappropriate.
>  It is appropriate to be concerned about "responsibilities" and "service"
>  to the community.' -- RFC 1591, page 4: March 1994
>



-- 
Marin Atanasov Nikolov

dnaeon AT gmail DOT com
http://www.unix-heaven.org/


