Date:      Sun, 15 Jun 1997 11:11:46 -0500 (CDT)
From:      Nick Johnson <spatula@gulf.net>
To:        freebsd-questions@freebsd.org
Subject:   Reasonable diagnosis?
Message-ID:  <Pine.BSI.3.96.970615110245.11599B-100000@pompano.pcola.gulf.net>

Greetings all,

   For quite some time, my system has been unstable, and I've been trying
to find the solution.  I think that I finally have, but I'd like to run
my diagnosis by more people to see if I'm more or less correct, as I am no
hardware expert.

   The biggest problem I have is the ominous "Page fault while in kernel
mode" bomb.  This happens all the time (at least once a day) during
various processes.  I've tried things like removing all unnecessary cards,
disabling cache, etc., but the faults continued.

   Last week, my /var partition crashed (bad superblock).  It made me
fairly sad, especially when fsck died with a floating point exception when
I tried to repair the problem (!).  I was able to run newfs on /var and
repair things.

   Last night, my system went fruitcake.  Every so often two columns of
video shifted up a couple rows and random letters started blinking.  When
I rebooted, I found that I had a bad superblock on /.  Again, I was sad,
but reinstalled.

   Today while building my kernel the compile failed where it shouldn't
have, complaining of a non-digit where a digit should have been.  Sure
enough, there was a letter "s" stuck in the middle of a number where it
had not been the last time I tried to compile.

   My analysis of these events leads me to believe that most of my
problems are the result of a flaky disk controller; the page faults could
very well have been a result of reading bad data off the swap partition on
the disk, which could conceivably make the OS go berserk, resulting in any
number of strange things happening, such as the video spasm I got last
night (which had also happened a few months ago, before either partition
crashed; at that time it looked like a program mistakenly thought its
stack belonged in my video RAM).
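
   One rough way to probe the "bad data coming back off the disk" part of
this hypothesis is a write-then-verify test: write a file of known data,
record its checksum, then re-read it several times and see whether the
checksum ever changes.  The following is only a sketch under stated
assumptions, not a definitive diagnostic.  The function names
(write_pattern, read_digest, count_mismatches) are made up for
illustration.  A mismatch does not by itself single out the disk
controller, since flaky RAM or cache could corrupt the data just as well,
and unless the file is larger than physical memory the re-reads may be
served from the buffer cache without touching the disk at all.

```python
import hashlib
import os


def write_pattern(path, size_bytes, chunk=1 << 20):
    """Write size_bytes of random data to path; return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            f.write(block)
            digest.update(block)
            remaining -= len(block)
        # Ask the OS to push the data out to the device before we re-read it.
        f.flush()
        os.fsync(f.fileno())
    return digest.hexdigest()


def read_digest(path, chunk=1 << 20):
    """Re-read the whole file and return the SHA-256 hex digest of what came back."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def count_mismatches(path, expected, passes):
    """Return how many of `passes` full re-reads did not match `expected`."""
    return sum(1 for _ in range(passes) if read_digest(path) != expected)
```

   To use it, call write_pattern() on a scratch file on the suspect
partition, keep the digest it returns, and pass both to count_mismatches()
with some number of passes.  A result of zero is not proof that the
hardware is healthy, but a nonzero result means the same bytes did not
come back twice, which matches the kind of corruption described above.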

   The disk controller is a WDC AC31600H on a WD Caviar drive.  I've been
told that the WD controllers beginning with a 3 cause problems.  Can
anyone share experiences of this?

   Any insight or corrections to my hypothesis are most welcome and
appreciated.
  
   Nick

--
"...some people without brains do an awful lot of talking"
	-- the Scarecrow (The Wizard of Oz)
Nick Johnson, version 1.0 http://www.pcola.gulf.net/~spatula/



