Date:      Mon, 21 Nov 2005 16:46:48 -0500
From:      Kris Kennaway <kris@obsecurity.org>
To:        Walter Roberts <wroberts@securenym.net>
Cc:        freebsd-bugs@FreeBSD.org
Subject:   Re: misc/89103: gcc segmentation fault errors
Message-ID:  <20051121214648.GC7696@xor.obsecurity.org>
In-Reply-To: <200511180600.jAI60WtR048667@freefall.freebsd.org>
References:  <200511180600.jAI60WtR048667@freefall.freebsd.org>


--HG+GLK89HZ1zG0kk
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Nov 18, 2005 at 06:00:32AM +0000, Walter Roberts wrote:
> The following reply was made to PR misc/89103; it has been noted by GNATS.
>
> From: "Walter Roberts" <wroberts@securenym.net>
> To: <bug-followup@FreeBSD.org>, <wroberts@securenym.net>
> Cc:
> Subject: Re: misc/89103: gcc segmentation fault errors
> Date: Fri, 18 Nov 2005 00:55:21 -0500
>
>  Ruled out hardware issue:
>
>  1.  Ran memtest 86 -- 7 full cycles (18 hours +/-).
>  2.  Reduced memory from 512Mb to 256Mb, repeated with different memory
>  chip.
>  3.  Ran full burncpu, passed.
>
>  Power supplies operating at nominal voltages.
>
>  System is apparently not using swap space for this process.
>
>  Replaced AMD K6  200 with old K6 slow processor
>
>  Same failure.  CPU temps are <33C in all cases.  I don't know the exact
>  numbers, but it's typically around 28C.
>
>  This simply does not smell like a hardware problem

[Snip historical anecdotes]

>   I'm willing to believe you,
>  but I'd like to know why you're so convinced this is a hardware issue.

Because I've been answering these questions for years, and I've seen
dozens of people start out saying "I'm convinced it's not a hardware
problem" and then working their way around to "it was a hardware
problem, sorry for wasting your time".

>  The factors pointing against a hardware issue are:  1.  The machine runs
>  everything else without a problem.  2.  The machine ran non-stop
>  (non-reboot) on a UPS for over half a year without a glitch (take
>  that, NT), and it seems to run f90 ok, and most cc's ok.  3.  The system
>  runs very compute/memory intensive monte carlo high energy physics code
>  that stores lots and lots of numbers to be written to files at the end
>  of the day and works consistently.  I would expect that if it weren't
>  working properly, something would be amiss elsewhere and would expect a
>  panic at some point, or the system to just plain stop working.  4.  From
>  the archives it appears that more than one of us is having a similar
>  problem.

Not that I've seen.  Where are these other reports?

>  5.  This exact system ran for years without a glitch running
>  FreeBSD 2.2 and FreeBSD 3.2.

This kind of problem can be *very* workload-specific, i.e. everything
will work fine except one task that tickles the machine in exactly the
right way to trigger the hardware failure.

Yes, I've seen exactly this scenario happen many times.

>  Is it safe to upgrade to GCC 4?  Would that solve the problem?  I'd be
>  happy to get it from gnu and try it, if it won't break anything.  I
>  don't have the time I used to have to go messing in operating system
>  innards, much as I'd like to.

It won't fix a hardware problem, naturally.  You can't use a
non-system compiler to compile FreeBSD, although you could compile
your own code with it.

>  It is certainly possible that a pointer is misprogrammed (or perhaps the
>  fixed point  register in the AMD chip doesn't work right??) and picks up
>  something funny that causes the compiler to have the "segmentation
>  fault 11".  That fault is consistent!

I'm sure it's consistent on this machine, but you're really reaching
by suggesting that it's a CPU bug affecting thousands of users :-)

Kris

P.S. Did you say in a previous email that the machine worked fine when
it was running at a site at high altitude, but stopped working when
you moved it and then upgraded it?  That's a big clue that says
something broke at that point (or before, but was masked by lower
ambient temperatures, or something).

--HG+GLK89HZ1zG0kk
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD8DBQFDgkBIWry0BWjoQKURApoEAKCf8k8Rr7BmCSdba5re6bb815q9hACdHVsO
UTFTHF+G/NJsWx7rQQp3ZFE=
=9GkX
-----END PGP SIGNATURE-----

--HG+GLK89HZ1zG0kk--


