Date:      Thu, 12 Nov 2009 04:59:03 -0800
From:      David Wolfskill <david@catwhisker.org>
To:        Peter Jeremy <peter@vk2pj.dyndns.org>
Cc:        hardware@freebsd.org
Subject:   Re: 7.2-STABLE i386 box crashing -- clues?
Message-ID:  <20091112125903.GA1631@albert.catwhisker.org>
In-Reply-To: <20091112062708.GA16648@server.vk2pj.dyndns.org>
References:  <20091111173747.GA1150@albert.catwhisker.org> <20091112062708.GA16648@server.vk2pj.dyndns.org>

On Thu, Nov 12, 2009 at 05:27:09PM +1100, Peter Jeremy wrote:
> I can't offer any solutions but I have some more questions...

I appreciate the help!

> ...
> >Every once in a while, it just crashes -- hard.  It loses video output
> >at that point; Ctl+Alt+Esc doesn't appear to change anything; entering
> >(say) "reset" blindly at that point has no apparent effect.
>
> Roughly how often?

For the current month:

albert(7.2-S)[8] last reboot shutdown
reboot           ~                         Thu Nov 12 03:04
reboot           ~                         Wed Nov 11 20:06
reboot           ~                         Wed Nov 11 14:42
shutdown         ~                         Wed Nov 11 14:40
reboot           ~                         Wed Nov 11 14:35
reboot           ~                         Wed Nov 11 10:05
reboot           ~                         Wed Nov 11 09:09
reboot           ~                         Wed Nov 11 04:25
reboot           ~                         Tue Nov 10 12:49
reboot           ~                         Mon Nov  9 14:52
reboot           ~                         Sun Nov  8 17:42
reboot           ~                         Sat Nov  7 04:22
reboot           ~                         Fri Nov  6 21:43
reboot           ~                         Fri Nov  6 19:00
reboot           ~                         Fri Nov  6 16:20
shutdown         ~                         Fri Nov  6 16:17
reboot           ~                         Fri Nov  6 16:03
reboot           ~                         Fri Nov  6 13:07
reboot           ~                         Fri Nov  6 09:46
reboot           ~                         Thu Nov  5 16:41
reboot           ~                         Thu Nov  5 13:32
reboot           ~                         Thu Nov  5 12:59
reboot           ~                         Thu Nov  5 10:17
reboot           ~                         Thu Nov  5 04:26
reboot           ~                         Wed Nov  4 20:32
reboot           ~                         Wed Nov  4 15:48
reboot           ~                         Wed Nov  4 10:37
reboot           ~                         Tue Nov  3 13:15
reboot           ~                         Tue Nov  3 10:55
reboot           ~                         Tue Nov  3 04:16
reboot           ~                         Mon Nov  2 18:13
reboot           ~                         Sun Nov  1 20:03
shutdown         ~                         Sun Nov  1 20:01
reboot           ~                         Sun Nov  1 17:10
reboot           ~                         Sun Nov  1 13:51
shutdown         ~                         Sun Nov  1 13:48

wtmp begins Sun Nov  1 05:08:18 PST 2009
albert(7.2-S)[9]

The "solo reboots" are crashes; those paired with "shutdown" entries are
controlled.
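
For anyone wanting to tally that without counting by eye, here is a rough sketch in Python.  The function name and the sample text are mine, made up for illustration; it assumes the newest-first ordering that last(1) prints, and it treats the oldest "reboot" in the listing as a crash when nothing precedes it, which may not be true.

```python
def count_crashes(last_output):
    """Return (crashes, controlled) from `last reboot shutdown` output.

    Assumes newest-first ordering: a "reboot" line whose next line is a
    "shutdown" was a controlled restart; any other "reboot" is a crash.
    """
    kinds = []
    for line in last_output.splitlines():
        fields = line.split()
        # Keep only the record type; skip prompts, blanks, "wtmp begins".
        if fields and fields[0] in ("reboot", "shutdown"):
            kinds.append(fields[0])

    crashes = controlled = 0
    for i, kind in enumerate(kinds):
        if kind != "reboot":
            continue
        # The following entry in the listing is the event just before this one.
        if i + 1 < len(kinds) and kinds[i + 1] == "shutdown":
            controlled += 1
        else:
            crashes += 1
    return crashes, controlled


# Four entries copied from the listing above, as a small sample.
SAMPLE = """\
reboot           ~                         Wed Nov 11 20:06
reboot           ~                         Wed Nov 11 14:42
shutdown         ~                         Wed Nov 11 14:40
reboot           ~                         Wed Nov 11 14:35
"""

print(count_crashes(SAMPLE))
```

Feeding it the saved output of "last reboot shutdown" gives the crash count for whatever period wtmp covers.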

> Has anything unusual happened lately?  Brownout, blackout, power surge,
> lightning, heatwave, ...

Nothing linked to the crashes.  I pulled the UPS out of service
some weeks ago because it needs new batteries; I need to get those
ordered.  But the crashes were happening before that, in any case.

> >accordingly, had attached a SCSI host adaptor via PCI riser card.  Since
> >I had nothing actually connected to the card, I pulled it out of the
> >machine before bringing it back up.
>
> Did you also pull the riser card?  Riser cards don't have a spectacularly
> high reputation.

That's actually what I pulled.  The SCSI card itself is still physically
in the chassis, merely with an air gap between itself and the system
board (because the riser card is now in a closet).

> > (I also felt around for
> >excessively warm spots; nothing.  All fans spin up, as well.)
>
> I don't suppose you also studied the capacitors on the motherboard.
> Are any showing any signs of bulges?

I'll take another look for those; I recall that electrolytics exhibit
that as a sign of failure -- thanks for the reminder.

> Have you tried reseating everything?

The memory, yeah (even before replacing it); also swapped the DIMMs.
Only other thing that can be re-seated (desktop system board, so most
everything is built-in) would be the CPU, and I'm not quite sure how
that heat sink works.  I did re-seat some power connectors.

> >Flaky CPU?  Flaky power supply?  How might I tell?
>
> CPU shouldn't go flaky unless it's been overheated.  In my experience,
> PSUs are the least reliable part of consumer-grade hardware but about
> the only way to check is to swap it.

:-}

> If you've got a DMM, you could check all the rails but there are
> lots of failure modes that won't show up that way.

Yeah, I kinda figured that.  I do have a DMM (used to have a VTVM), but
figured the meter wouldn't show transient dips or whatever too well.
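
For the steady-state part that a DMM can show, a rough sketch of the comparison follows.  It assumes the usual ATX tolerances (+/-5% on the positive rails, +/-10% on -12V); the readings are invented for illustration and are not measurements from this machine, and the names are hypothetical.

```python
# Nominal ATX rail voltages and allowed fractional tolerance
# (assumed: 5% on positive rails, 10% on -12V).
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def out_of_tolerance(readings):
    """Return the rails whose measured voltage falls outside tolerance."""
    bad = []
    for rail, measured in readings.items():
        nominal, tolerance = ATX_RAILS[rail]
        if abs(measured - nominal) > abs(nominal) * tolerance:
            bad.append(rail)
    return bad


# Made-up DMM readings, only to show the check.
example = {"+3.3V": 3.31, "+5V": 4.70, "+12V": 12.2, "-12V": -11.5}
print(out_of_tolerance(example))
```

A clean result only says the averages are in range; it says nothing about ripple or transient dips, which is the limitation noted above.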

> Have you checked the voltage/temperature screen in the BIOS?  Does
> anything look abnormal?

Did a couple of reality checks in that way as detours during some of the
reboots.  Nothing interesting there at all.  (And I have seen a case in
the past -- though with a 1U box -- where that test definitely showed
something wrong: CPU temp climbing about 1C every 30 seconds, IIRC.)

> Are you using a PS/2 or USB keyboard?

PS/2 via KVM.  I don't have any USB keyboards.  :-}

> Are you running X?

Yes; the machine is configured to start xdm on transition to
multi-user, as my spouse used to use it as a desktop.  (She's gone back
to using its predecessor, a 4.11-STABLE machine, in frustration.)

> At this stage, my suggestion would be to try swapping the PSU.

Thanks.  I'll discuss it with the "family CFO."

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



