Date:      Thu, 21 Sep 2000 12:49:49 -0500 (CDT)
From:      Chris Dillon <cdillon@wolves.k12.mo.us>
To:        Michael Allman <msa@dinosauricon.com>
Cc:        BSD <bsd@shell-server.com>, stable@FreeBSD.ORG
Subject:   Re: Constant panics on 4.1-STABLE!
Message-ID:  <Pine.BSF.4.21.0009211229300.31769-100000@mail.wolves.k12.mo.us>
In-Reply-To: <Pine.BSF.4.21.0009211252260.17806-100000@dinosaur.umbc.edu>

On Thu, 21 Sep 2000, Michael Allman wrote:

> On Thu, 21 Sep 2000, Chris Dillon wrote:
> 
> > On Thu, 21 Sep 2000, Michael Allman wrote:
> > 
> > > I am having problems with random panics/reboots as well.  I am using two
> > > sticks of Corsair 128MB ECC memory.  My motherboard uses the GX chipset.  
> > > Crashes occur whether I use both sticks or either one alone.
> > > Considering that I have been using this memory reliably for about a year I
> > > find it hard to believe that both sticks would go bad simultaneously.  I
> > > have been using CAS3, ECC settings in my BIOS.
> > 
> > It probably isn't the memory, then (Corsair is pretty good).
> > 
> > > > BTW, crash dumps will be meaningless if this really is a hardware
> > > > problem.
> > > 
> > > Equivalent to this statement is the following.  If the crash dumps are not
> > > meaningless (meaningful?), then this is not a hardware problem.  I would
> > > say it is still worthwhile to look at crash dumps.
> > 
> > Wrong.  You have no way of knowing just by looking at a crashdump if
> > the problem was caused by random memory corruption, CPU flakiness, or
> > whatever, or if it was a real software problem.  Crashdumps are only
> > useful if you _know_ flaky hardware wasn't the culprit.  If you hand a
> > developer a crashdump caused by hardware flakiness, you are going to
> > send them on a wild goose chase and they will never find a real
> > problem with the code where the failure supposedly occurred.  If
> > they're really lucky, they'll look at a crashdump and say "It is not
> > at all possible for this to have happened because of software.  It
> > must have been caused by hardware".  I wouldn't put that burden on any
> > of these developers, however.  This has already happened at least a
> > few times, and usually the developer wastes days or weeks looking for
> > a non-existent problem until the original finder of the problem comes
> > back and says "Duh, I'm REALLY sorry guys, but I found the culprit, it
> > was my hardware".  You can find at least a few of these archived in
> > our mailing lists.
> 
> Let's wait and see what the other guy who's having these problems comes up
> with (Bart, I think).  Also, I think I know why I'm not getting crash
> dumps sometimes.  When it starts to take a dump, if you press a key on the
> keyboard it aborts, yes?  Since I use my computer for application work
> perhaps my typing at the keyboard is aborting the dump before it finishes.

Yes, it is possible that is what is preventing the crashdumps.  If you
are in X whenever you get these unexplained reboots, it might help to
hook up a serial console so that you can monitor the goings-on even if
the primary display never makes it back to the real world when the
crash occurs.

> > > I have ECC RAM with ECC enabled.  I get crashes anyway.  Would you say
> > > then that it's not the RAM?
> > 
> > Then it most likely isn't the RAM.  That does not, however, rule out
> > the CPU, support chipsets, or even a weird expansion card that is
> > spewing enough RF noise to cause data corruption on nearby devices.
> 
> I have tried using another CPU to no avail.  This other CPU is
> currently in use in another system without problems.  I have
> swapped out every one of my expansion cards, and then some.  One
> thing that comes to mind is that I haven't tried a different
> ethernet card (my ethernet is on the motherboard).  I will try
> that.  I am also not excluding the possibility of a bad chipset.  
> I may try using a different motherboard.  It's really just a
> matter of finding the time to do these things.

Try a new power supply, too.  This won't make you feel any better, but
I recently fought a system whose problems I never managed to track
down.  I had gone through many combinations of motherboards, CPUs
(both Intel and AMD; this was a Socket 7 system), power supplies,
ethernet cards, video cards, RAM, hard drives, CDROM drives, etc.
NOTHING was the same when I was done, yet I kept having the same
problems.  It was almost as if the case itself was possessed (or maybe
I was cursed), as that was the only thing I hadn't changed.  I knew it
was a hardware problem because the weirdness occurred regularly in
both FreeBSD and NT4 Server, even during initial installations of
FreeBSD or NT4.  I finally gave up and just got an entirely new
system.  I haven't had a single problem with it.


-- Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
   FreeBSD: The fastest and most stable server OS on the planet.
   For Intel x86 and Alpha architectures. ( http://www.freebsd.org )







