Date: Mon, 21 Nov 2005 16:46:48 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: Walter Roberts <wroberts@securenym.net>
Cc: freebsd-bugs@FreeBSD.org
Subject: Re: misc/89103: gcc segmentation fault errors
Message-ID: <20051121214648.GC7696@xor.obsecurity.org>
In-Reply-To: <200511180600.jAI60WtR048667@freefall.freebsd.org>
References: <200511180600.jAI60WtR048667@freefall.freebsd.org>

On Fri, Nov 18, 2005 at 06:00:32AM +0000, Walter Roberts wrote:
> The following reply was made to PR misc/89103; it has been noted by GNATS.
>
> From: "Walter Roberts" <wroberts@securenym.net>
> To: <bug-followup@FreeBSD.org>, <wroberts@securenym.net>
> Cc:
> Subject: Re: misc/89103: gcc segmentation fault errors
> Date: Fri, 18 Nov 2005 00:55:21 -0500
>
> Ruled out hardware issue:
>
> 1. Ran memtest 86 -- 7 full cycles (18 hours +/-).
> 2. Reduced memory from 512 MB to 256 MB, repeated with a different
>    memory chip.
> 3. Ran full burncpu, passed.
>
> Power supplies operating at nominal voltages.
>
> System is apparently not using swap space for this process.
>
> Replaced AMD K6 200 with an old, slower K6 processor.
>
> Same failure. CPU temps are <33C in all cases. I don't know the exact
> numbers, but it's typically around 28C.
>
> This simply does not smell like a hardware problem

[Snip historical anecdotes]

> I'm willing to believe you, but I'd like to know why you're so
> convinced this is a hardware issue.

Because I've been answering these questions for years, and I've seen
dozens of people start out saying "I'm convinced it's not a hardware
problem" and then work their way around to "it was a hardware
problem, sorry for wasting your time".

> The factors pointing against a hardware issue are:
> 1. The machine runs everything else without a problem.
> 2. The machine ran non-stop (non-reboot) on a UPS for over half a
>    year without a glitch (take that, NT), and it seems to run f90 OK,
>    and most cc's OK.
> 3. The system runs very compute/memory-intensive Monte Carlo high
>    energy physics code that stores lots and lots of numbers to be
>    written to files at the end of the day, and it works consistently.
>    I would expect that if it weren't working properly, something
>    would be amiss elsewhere, and I would expect a panic at some point,
>    or the system to just plain stop working.
> 4. From the archives it appears that more than one of us is having a
>    similar problem.

Not that I've seen. Where are these other reports?

> 5. This exact system ran for years without a glitch running
>    FreeBSD 2.2 and FreeBSD 3.2.

This kind of problem can be *very* workload-specific, i.e. everything
will work fine except one task that tickles the machine in exactly the
right way to trigger the hardware failure. Yes, I've seen exactly
this scenario happen many times.

> Is it safe to upgrade to GCC 4? Would that solve the problem? I'd be
> happy to get it from gnu and try it, if it won't break anything. I
> don't have the time I used to have to go messing in operating system
> innards, much as I'd like to.

It won't fix a hardware problem, naturally. You can't use a
non-system compiler to compile FreeBSD, although you could compile
your own code with it.

> It is certainly possible that a pointer is misprogrammed (or perhaps
> the fixed point register in the AMD chip doesn't work right??) and
> picks up something funny that causes the compiler to have the
> "segmentation fault 11". That fault is consistent!

I'm sure it's consistent on this machine, but you're really reaching
by suggesting that it's a CPU bug affecting thousands of users :-)

Kris

P.S. Did you say in a previous email that the machine worked fine when
it was running at a site at high altitude, but stopped working when
you moved it and then upgraded it? That's a big clue that says
something broke at that point (or before, but was masked by lower
ambient temperatures, or something).