From owner-freebsd-hackers Mon Feb 20 10:59:58 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.9/8.6.6) id KAA01769 for hackers-outgoing; Mon, 20 Feb 1995 10:59:58 -0800
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19]) by freefall.cdrom.com (8.6.9/8.6.6) with ESMTP id KAA01758 for ; Mon, 20 Feb 1995 10:59:52 -0800
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.9/8.6.9) id FAA29796; Tue, 21 Feb 1995 05:55:18 +1100
Date: Tue, 21 Feb 1995 05:55:18 +1100
From: Bruce Evans
Message-Id: <199502201855.FAA29796@godzilla.zeta.org.au>
To: freebsd-hackers@FreeBSD.org, mark@communica.oz.au
Subject: Re: 2.0-950210-SNAP hangs
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

>Now, the problem: At periods ranging from every 6 hours through to every
>2 days, the system hangs for no apparent reason. When it hangs, it goes
>completely catatonic: It doesn't respond to pings from other hosts on my
>ethernet, the console doesn't work, all disk activity stops; nothing can
>get any response out of it.

>Now, normally this wouldn't be a problem; debugging things like this is
>what kernel debuggers are for, right? Well, no, not really -- When it
>hangs, it is obviously splx()'ed to a priority higher than the console,
>'cos I can't jump to the debugger (or, indeed, get anything else on the
>console happening). If it is splx()'ed to a value like that, that would
>tend to suggest that the root cause is either something to do with the
>network or something to do with the disks (it could be anything with
>a higher priority than the console, I know, but those two seem most
>likely to me).

The disk priority is actually independent of the console priority.
I have used methods like `w/x tty_imask 0' to clear the tty interrupt
mask and allow entry to ddb at all times. This is too dangerous on a
busy machine.
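
As a rough illustration of why clearing the mask lets the keyboard through,
here is a minimal C model of the masking scheme. It is a sketch under
assumptions, not kernel source: the names model_spltty, model_splx,
irq_deliverable and deliverable_at_spltty are hypothetical, the serial
IRQ number is an arbitrary example, and the model assumes that spltty()
works by OR-ing tty_imask into the current priority mask.

```c
#include <assert.h>

/* Hypothetical model for illustration; this is not FreeBSD kernel source. */
#define IRQ_BIT(n) (1u << (n))

/* Keyboard is IRQ 1; the serial IRQ is only an assumed example. */
enum { KBD_IRQ = 1, SIO_IRQ = 4 };

static unsigned cpl;    /* a set bit means that IRQ is currently blocked */
static unsigned tty_imask = IRQ_BIT(KBD_IRQ) | IRQ_BIT(SIO_IRQ);

/* Assumed behaviour of spltty(): block every tty-class IRQ, return the old level. */
static unsigned model_spltty(void)
{
    unsigned old = cpl;
    cpl |= tty_imask;
    return old;
}

/* Assumed behaviour of splx(): restore a previously saved level. */
static void model_splx(unsigned old)
{
    cpl = old;
}

/* An IRQ can be delivered only while its bit is clear in cpl. */
static int irq_deliverable(int irq)
{
    assert(irq >= 0 && irq < 16);    /* two cascaded PICs give 16 lines */
    return (cpl & IRQ_BIT(irq)) == 0;
}

/* Report whether `irq` would get through inside an spltty() section. */
static int deliverable_at_spltty(int irq)
{
    unsigned saved = model_spltty();
    int ok = irq_deliverable(irq);
    model_splx(saved);
    return ok;
}
```

With the mask intact, deliverable_at_spltty(KBD_IRQ) returns 0 in this
model. After `tty_imask = 0', which mirrors the effect of the ddb write,
it returns 1, but so does every other tty-class IRQ. That loss of
protection for all tty code at once is one plausible reading of why the
trick is hazardous on a busy machine.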

A variant of this should be safe enough: allow keyboard interrupts at
almost all times and don't use the keyboard or screen until the system
hangs. To allow the interrupts, enter ddb and disassemble the keyboard
interrupt handler (Xintr1) to find the `testb $0x2,%al' instruction and
replace the 0x02 by 0 using `w/b'. Here the byte to be replaced is at
Xintr1+0x2c and the replacement command is `w/b xintr1+0x2c 0'.

Bruce