Date:      Mon, 13 Oct 2003 17:48:26 +0100
From:      Matthew Seaman <m.seaman@infracaninophile.co.uk>
To:        Hani Mouneimne <hani@nimsay-networks.com>
Cc:        "." <freebsd-stable@freebsd.org>
Subject:   Re: Crashing box
Message-ID:  <20031013164826.GB20434@happy-idiot-talk.infracaninophile.co.uk>
In-Reply-To: <40e792e94c57e8fc779e568f066edcfb@194.83.224.1>
References:  <40e792e94c57e8fc779e568f066edcfb@194.83.224.1>



On Mon, Oct 13, 2003 at 04:19:58PM +0200, Hani Mouneimne wrote:
> Hey all,
>
> I was wondering if you could help with this issue.
>
> Every time I run a make/compile on my FreeBSD 4.8 p10 system it has a
> complete spaz and reboots. Usually it cores, and sometimes it gives no
> messages at all in the logfiles.
> Here is the latest output of a makeworld I am doing:
> ="sh /usr/src/tools/install.sh"
> PATH=/usr/obj/usr/src/i386/usr/sbin:/usr/obj/usr/src/i386/usr/bin:/usr/obj/usr/src/i386/usr/games:/sbin:/bin:/usr/sbin:/usr/bin
> make -f Makefile.inc1 par-depend
> *** Signal 11
> *** Signal 11
> Killed

I assume you've read the FAQ entry on Sig11:

    http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#SIGNAL11

Signal 11, especially if it occurs in an unpredictable place during
compiles or other heavy weight operations, is a clear sign of hardware
problems, but I think you know that from what you say next.
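One way to apply the "unpredictable place" rule of thumb is to repeat the same failing build several times from the same clean starting state and compare where each Signal 11 occurs: a crash at the same step every time points at least partly towards software, while a crash that wanders around points towards hardware. The Python sketch below is illustrative only. The helper names (`failure_point`, `collect_failures`, `diagnose`) are hypothetical, and it assumes make echoes each command before running it, so the line just above `*** Signal 11` roughly identifies the step that crashed. Under a parallel make (`-j`) that attribution is fuzzy.

```python
import subprocess
import sys
from collections import Counter


def failure_point(output):
    """Return the line just before the first 'Signal 11' line, or None.

    Assumption: make echoes each command before running it, so the
    preceding line approximates the step that crashed (fuzzy under -j).
    """
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if "Signal 11" in line:
            return lines[i - 1].strip() if i > 0 else "<start of output>"
    return None


def collect_failures(cmd, runs=5):
    """Run cmd `runs` times and count how often each failure point occurs."""
    points = Counter()
    for _ in range(runs):
        # Merge stderr into stdout so make's "*** Signal 11" is captured.
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              encoding="utf-8", errors="replace")
        point = failure_point(proc.stdout)
        if point is not None:
            points[point] += 1
    return points


def diagnose(points):
    """Rough verdict following the 'same place vs. random place' heuristic."""
    total = sum(points.values())
    if total == 0:
        return "no Signal 11 seen"
    if total == 1:
        return "only one crash: run more times to compare"
    if len(points) == 1:
        return "same place every time: consider software as well"
    return "different places: suspect hardware"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the command to repeat.
    pts = collect_failures(sys.argv[1:], runs=5)
    for where, count in pts.most_common():
        print(f"{count}x after: {where}")
    print(diagnose(pts))
```

The command being repeated must put the tree back into the same state on every run, or later runs will simply resume where the previous one stopped; for example `python3 sig11_check.py sh -c "make clean && make"` (the script name is hypothetical). This is a heuristic, not a proof: a deterministic compiler bug and a single bad memory cell hit at the same address can both look "repeatable".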
> This is just one of many crashes of similar scale; segfaulting is also
> common.
> I have changed the entire server hardware including the hard drive and it is
> still doing this. It was fine with FreeBSD p0 so I am wondering if it could
> be some code issue.

Tricky.  Are you sure you've swapped out *all* of the hardware?  SEGVs
are typically due to memory or CPUs going bad, but there are several
other considerations.

    - memory can be marginal: tests such as memtest86 won't
      necessarily pick up all failure cases, although when they do
      find a problem they are generally right.

      If the memory timing isn't quite in spec, or if there's a
      problem that only occurs when the memory stick heats up due to
      high activity, then you may not pick it up except under load.

    - SEGVs can also occur due to bad memory in such devices as RAID
      or graphics controllers, or even in the CPU cache.

    - Overheating will generally cause stressed components to fail in
      this sort of way.  Such failures will definitely be correlated
      with high system activity.  CPUs generally do have thermal
      cutouts that just halt the machine, but thermal problems in
      other components can crash the system as you've seen.
      Northbridge and Southbridge chipsets on the motherboard can be
      an Achilles' heel in this respect.

      Check that all of the fans are working correctly, and that all
      of the ventilation holes/dust filters are clear and that there
      is sufficient room around the machine to permit free flow of
      air.  If you've added extra components inside the system, is the
      cooling airflow still adequate?

    - PSUs are also capable of causing such symptoms, especially if
      they aren't actually quite powerful enough to drive all your
      hardware.  If the system voltages aren't properly stable then
      all sorts of undefined behaviour can occur.  Modern 1GHz+ boxes
      generally need a 300W PSU, and the PSU tends to be both one of
      the least reliable parts of the system and one of the items
      where box manufacturers will be most aggressive on price when
      sourcing components.

    - Even the machine *case* can cause this sort of problem.  I've
      seen a machine where all of the electronics, PSU, fans etc. were
      swapped out, but the machine still keeled over when the case was
      screwed back together.  Turned out that the case itself was a
      bit distorted, and screwing the case on resulted in bending the
      motherboard in a way that was clearly not good for it,
      especially when it warmed up a bit as well.  Changing out the
      case produced a working system...

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                       26 The Paddocks
                                                      Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey         Marlow
Tel: +44 1628 476614                                  Bucks., SL7 1TH UK



