Date:      Thu, 25 Apr 1996 13:58:20 -0600
From:      Nate Williams <nate@sri.MT.net>
To:        "Marc G. Fournier" <scrappy@ki.net>
Cc:        Nate Williams <nate@sri.MT.net>, current@FreeBSD.org
Subject:   Re: MotherBoard Jumper Settings... 
Message-ID:  <199604251958.NAA19541@rocky.sri.MT.net>
In-Reply-To: <Pine.NEB.3.93.960425153450.2323E-100000@freebsd.ki.net>
References:  <199604251926.NAA19429@rocky.sri.MT.net> <Pine.NEB.3.93.960425153450.2323E-100000@freebsd.ki.net>

> > Does that mean your box is now stable?  If so, that's *great* news.
> >
> 	Nope, just that I'm going to submit new ones now that I
> think I've gone over everything with a fine toothed comb and caught
> any hardware mis-configurations I can find :)

OK, here's some advice.  Generally speaking, most folks *shouldn't* have
to go through this many steps, but in Marc's case, where he's having
problems that no one else is seeing, this might be helpful.

Step 0:
- Remove *ALL* NFS and DOS mounts from your system.  The NFS and DOS
  filesystems are slightly broken and can cause weird problems.

Step 1:
- Disable *ALL* caches on your machine in the BIOS.  Set the memory
  wait states to the highest setting and your bus speed to ~8 MHz (for
  ISA/EISA boxes).

Test, test, test, test, test.

Do the errors still occur?  If so, move on to step 2; otherwise, assume
it's a hardware problem, probably involving the L2 cache (motherboard
and/or memory) or the BIOS setup.

[ Leave the caches disabled, just in case they are *also* a problem ]
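The message doesn't say what "test" should mean.  One way to make each
round repeatable, and comparable from step to step, is a loop that runs
a heavy workload many times and counts failures.  This is a minimal
sketch assuming Python 3 is available; `run_stress` is a hypothetical
helper name, and the workload is whatever reproduces your crashes.

```python
import subprocess
import sys
from typing import Sequence


def run_stress(cmd: Sequence[str], rounds: int) -> int:
    """Run cmd `rounds` times back to back and return the number of failed runs.

    A non-zero exit status counts as a failure.  A panic or spontaneous
    reboot will stop the loop altogether, which is also a result worth noting.
    """
    failures = 0
    for n in range(1, rounds + 1):
        # Discard the workload's own output; only its exit status matters here.
        result = subprocess.run(
            list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
            print(f"round {n}: exit status {result.returncode}", file=sys.stderr)
    print(f"{failures}/{rounds} rounds failed", file=sys.stderr)
    return failures
```

For example, `run_stress(["make", "clean", "all"], 20)` from inside a
source tree you don't mind rebuilding repeatedly, assuming its Makefile
provides `clean` and `all` targets.  Using the same workload and round
count at every step keeps the results comparable.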

Step 2:
- Make *SURE* (!!!!) that your SCSI cables are good and everything is
  terminated correctly.  This means that there should be exactly two
  termination points, one at each end of the SCSI chain.  Also, if you
  have external devices, remove them and terminate your SCSI card, just
  to rule out bad external SCSI cables (very common).  If you've got a
  scanner, remove it.  (Scanners are notorious for screwing things up
  under load.)

Test, test, test, test, test.

Do the problems still occur?  If so, move on to step 3; otherwise,
assume it's a hardware problem with SCSI termination and/or cabling.

Step 3:
- Remove *ALL* non-essential hardware from the system.  This means
  leaving only a disk big enough for the OS and some sources, plus the
  necessary cards.  Ultimately, that means only a video card, a
  hard/floppy controller card, and possibly an Ethernet card.

Test, test, test, test, test.

Do the problems still occur?  If so, it's still possible that it's
hardware, so move on to step 4; otherwise, assume it's a misconfigured
card.

Step 4:
- Swap out your memory with known-good memory, your disk with a
  known-good disk, and your controller with a known-good controller.
  (Heck, go IDE at this point.)  Re-install FreeBSD to make sure none
  of the bits are corrupted from the previous bad hardware setup.

Test, test, test, test, test.

It *should* work now: given the consistency and frequency of your
problems, it was most likely a hardware problem in the first place.
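The elimination order above can be summarized as a small decision
procedure: changes accumulate, and the first test round in which the
errors stop points at the component changed last.  This is a toy model,
not anything from the original message; `CHECKPOINTS`, `isolate` and the
`errors_persist` callback are hypothetical names, and the callback stands
in for a real round of testing on the machine.

```python
from typing import Callable, List, Optional, Tuple

# Each checkpoint pairs the (cumulative) changes made before a test round
# with the suspect implicated if the errors stop after those changes.
CHECKPOINTS: List[Tuple[str, str]] = [
    ("steps 0-1: drop NFS/DOS mounts; disable caches, raise wait states, slow the bus",
     "L2 cache (motherboard and/or memory) or BIOS setup"),
    ("step 2: fix SCSI cabling and termination; remove external devices and scanners",
     "SCSI termination and/or cabling"),
    ("step 3: remove all non-essential hardware",
     "a misconfigured card"),
    ("step 4: known-good memory, disk and controller; reinstall FreeBSD",
     "the replaced memory, disk, controller, or the old (corrupted) install"),
]


def isolate(errors_persist: Callable[[int], bool]) -> Optional[str]:
    """Walk the checkpoints in order and return the first implicated suspect.

    errors_persist(i) represents a test round after checkpoint i has been
    applied and returns True if the errors still occur.  Returns None if
    the errors survive every checkpoint.
    """
    for i, (_changes, suspect) in enumerate(CHECKPOINTS):
        if not errors_persist(i):
            return suspect
    return None
```

For instance, if the errors persist through steps 0-1 but vanish once
the SCSI chain is fixed, `isolate(lambda i: i < 1)` returns
"SCSI termination and/or cabling".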

Quick history note:

The original 'interim' (pre-FreeBSD, pre-WC) development was hosted on a
486/33 box while I was a student at Montana State University.  This box
(which is still in service today as my home box) would occasionally get
NMIs from faulty hardware under heavy load.  Most of the time it worked,
but it was annoying.

Almost 3 years after I got the box, I finally got tired of it and
decided to replace the motherboard.  Unfortunately, the board I got was
DOA, but I noticed that the new board had faster cache RAM than my
original motherboard.  On a whim, I swapped out the cache on my old
(working but NMI-prone) motherboard with the cache from the DOA board,
and it seemed to work.  From that day on, I have been unable to produce
an NMI on that board no matter *what* the load.  The machine has been
rock stable ever since I re-installed FreeBSD on it.

However, before I re-installed FreeBSD on it, I got random crashes
because of filesystem corruption.  Binaries, directories, inodes, and
all sorts of other files were corrupt from the previous hardware
misconfiguration.  So, even after I fixed my hardware problems, I still
got *random* crashes.  I backed up what I could of the data (using tar
to avoid FS corruptions), then re-installed and restored all my
previous files, and I haven't had a crash on it yet.  The only reboots
occur when I turn on my DAT drive to do backups, and then reboot to
turn it off since I don't like to leave it on.




Nate


