From owner-freebsd-questions  Fri Sep 27 00:39:30 1996
From: grog@lemis.de (Greg Lehey)
Organisation: LEMIS, Schellnhausen 2, 36325 Feldatal, Germany
Phone: +49-6637-919123
Fax:   +49-6637-919122
Message-Id: <199609270610.IAA00555@allegro.lemis.de>
Subject: Re: 2.1.5-RELEASE halts for no apparent reason URGENT!!
To: mcellroy@jxm203.rh.psu.edu (Jonathan A. McEllroy)
Date: Fri, 27 Sep 1996 08:10:25 +0200 (MET DST)
Cc: questions@FreeBSD.ORG
In-Reply-To: <199609261501.LAA00324@jxm203.rh.psu.edu> from "Jonathan A. McEllroy" at Sep 26, 96 11:01:29 am

Jonathan A. McEllroy writes:
>
> The problem is that about 2-7 hours after boot the machine will just halt. It becomes wedged so
> hard that I can't even get through from my terminal on the serial port. Also there are no
> messages of any kind in syslog, and the hard drive is not active. This can happen under extreme
> load or no load at all, when people are logged in or no one is logged in.

Try adding this line to your config file:

options 	DDB			# kernel debugger

Then run 'config -g MYKERNEL' and build a new kernel with debugging
symbols.  Save a copy of it as /var/crash/kernel.gdb.  After booting
the new kernel, confirm that pressing CTRL-ALT-ESCAPE drops you into
the kernel debugger.  Don't stay in the debugger: it halts normal
system operation.  You can get out again with the 'c' (continue)
command.
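
In case the build steps aren't familiar, here's a rough sketch.  The
source tree location (/usr/src/sys), the i386 paths and the make
targets are my assumptions about a stock 2.1.5 system, and MYKERNEL
is only a placeholder for your own config file.  By default the
function just prints the commands instead of running them:

```shell
#!/bin/sh
# Sketch only: SYSDIR and the make targets are assumptions about a
# stock source tree; KERNCONF is a placeholder for your config name.
KERNCONF=${KERNCONF-MYKERNEL}
SYSDIR=${SYSDIR-/usr/src/sys}
RUN=${RUN-echo}         # dry run by default; an empty RUN executes

build_debug_kernel() {
    $RUN cd "$SYSDIR/i386/conf" &&
    $RUN config -g "$KERNCONF" &&
    $RUN cd "$SYSDIR/compile/$KERNCONF" &&
    $RUN make depend &&
    $RUN make &&
    $RUN mkdir -p /var/crash &&
    # Keep the copy with symbols where gdb will look for it later.
    $RUN cp kernel /var/crash/kernel.gdb &&
    $RUN make install
}

# Usage:
#   build_debug_kernel          print the commands
#   RUN= build_debug_kernel     run them (as root)
```

Look over the printed commands first, and adjust the paths if your
sources live somewhere else.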

Also, create a directory /var/crash if it doesn't exist, and ensure
that you have something like this in your /etc/sysconfig:

   # Set to the name of the device for kernel crashdumps, or `off' to
   # disable any statically configured dumpdev, or NO for no change.
   # The device should normally be one of the swap devices specified
   # in /etc/fstab.
   dumpdev=/dev/wd0s1b
   
   # Set to YES if you want kernel crashdumps to be saved for debugging
   savecore=YES
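
If you want a quick check that both settings are really in effect,
something like the following should do.  It's a sketch that assumes
the file is plain sh variable assignments, as in the excerpt above;
the function name check_crash_config is made up for this example:

```shell
#!/bin/sh
# check_crash_config [FILE]: report whether FILE (default
# /etc/sysconfig) sets dumpdev to a device and savecore to YES.
check_crash_config() {
    conf=${1:-/etc/sysconfig}
    # Read the file in a subshell so its settings don't leak out.
    (
        [ -r "$conf" ] || { echo "cannot read $conf"; exit 2; }
        dumpdev=
        savecore=
        . "$conf"
        case "$dumpdev" in
            ""|NO|off)
                echo "dumpdev is not set to a device: '$dumpdev'"
                exit 1 ;;
        esac
        if [ "$savecore" != YES ]; then
            echo "savecore is '$savecore', should be YES"
            exit 1
        fi
        echo "ok: dumpdev=$dumpdev savecore=$savecore"
    )
}
```

Called without an argument it looks at /etc/sysconfig; give it a full
path to check a different file.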

Then, when the system hangs, press CTRL-ALT-ESCAPE.  If there's no
reaction, the machine's really hosed, and a hardware problem becomes
more likely.  Otherwise:

1.  Enter 't' (stack trace) and write down the output.  Sorry, there's
    no way to save this output automatically.
2.  Enter 'pa' (panic) and let it write its dump and reboot.  You may
    have to press return a couple of times before it writes its dump.
3.  After reboot, do:

    $ cd /var/crash
    $ ls				(look at the names of the files)
    $ gdb -k kernel.gdb vmcore.0	(vmcore.x is the dump you want to look at)
    (gdb) bt				(do a backtrace)

Send me or the list the backtrace and we'll at least be able to point
you in the right direction.
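
If several dumps have piled up in /var/crash, this sketch picks the
highest-numbered one and prints the gdb command for it.  It assumes
the dumps are named vmcore.N as in step 3 and that kernel.gdb sits
in the same directory; latest_dump is just a name I made up:

```shell
#!/bin/sh
# latest_dump [DIR]: print the gdb command for the highest-numbered
# vmcore.N in DIR (default /var/crash).
latest_dump() {
    dir=${1:-/var/crash}
    newest=-1
    for f in "$dir"/vmcore.*; do
        [ -f "$f" ] || continue         # no match: glob stays literal
        n=${f##*.}
        case "$n" in
            ""|*[!0-9]*) continue ;;    # skip non-numeric suffixes
        esac
        if [ "$n" -gt "$newest" ]; then
            newest=$n
        fi
    done
    if [ "$newest" -lt 0 ]; then
        echo "no vmcore.N files in $dir" >&2
        return 1
    fi
    echo "gdb -k $dir/kernel.gdb $dir/vmcore.$newest"
}
```

It only prints the command, so you can check it picked the dump you
expect before running gdb yourself.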

Various things can go wrong along the way, too many to describe here.
If they do, again, let me know.

Greg