Date:      Tue, 2 Jan 2007 10:04:38 -0800
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Gavin Atkinson <gavin.atkinson@ury.york.ac.uk>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Interrupt (SCSI?) hang on 4.x
Message-ID:  <20070102180438.GA81454@icarus.home.lan>
In-Reply-To: <1167755991.84652.6.camel@buffy.york.ac.uk>
References:  <20070102153608.GA78405@icarus.home.lan> <1167755991.84652.6.camel@buffy.york.ac.uk>

On Tue, Jan 02, 2007 at 04:39:51PM +0000, Gavin Atkinson wrote:
> On Tue, 2007-01-02 at 07:36 -0800, Jeremy Chadwick wrote:
> > # vmstat -i
> > ata0 irq14                      6          0
> > fxp0 irq10                  14874         28
> > mux irq11                   65028        125
> > fdc0 irq6                       1          0
> > sio0 irq4                     948          1
> > clk irq0                   516187        998
> > rtc irq8                    66071        127
> > Total                      663115       1282
> 
> Do any of these numbers continue to increase after the hang?  You may
> find that if you are already logged in over the serial port before the
> hang and have run vmstat recently, it'll still be runnable due to it
> being cached.

When this problem is happening, at the login: prompt (via serial
console) once one types "root" and hits enter, one never gets a
Password: prompt.  This is likely because getpwent(3) and friends
attempt to read passwd/master.passwd from the disk, and that read
obviously hangs because of the SCSI controller.

Therefore, one cannot log in and run any commands.

> If the serial port is dead, you will probably still find you can get
> output from the serial port, so start "date; vmstat -i" in a loop over
> the serial port before it hangs, and watch the output once it wedges.

Once the machine is hung as described, this won't work, because
running shell commands (date, vmstat, even spawning sh itself)
involves disk I/O.  If date and vmstat could be cached in memory
somewhere, this might work, but I don't know how one would do that.
(A memory filesystem could work, but pretty much all of / would
have to be there for this to work...)

The best I could do would be to have a cron job, or a process running
in a screen session, which runs date && vmstat -i over and over,
appending to a log file, and then examine that log after the machine
has hung as described.  This wouldn't tell us if the numbers were
increasing/fluctuating *after* the hang, though.  :-(
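
A minimal sketch of the logging loop described above, for illustration
only.  The names LOGFILE, INTERVAL, SNAPSHOT_CMD, RUN_LOOP and
log_snapshot are hypothetical, as are the default log path and the
5-second interval; none of them come from the original report.  The
sync call is an assumption meant to get each snapshot onto the disk
before a hang, and it will itself block once the controller wedges, so
the log simply ends at the last successful write.

```sh
#!/bin/sh
# Sketch only: all variable and function names here are assumptions.

LOGFILE="${LOGFILE:-/var/log/vmstat-i.log}"    # hypothetical log path
INTERVAL="${INTERVAL:-5}"                      # seconds between snapshots
SNAPSHOT_CMD="${SNAPSHOT_CMD:-vmstat -i}"      # command whose output is logged

# Append one timestamped snapshot to the file given as $1.
log_snapshot() {
    {
        date
        $SNAPSHOT_CMD
        echo "----"
    } >> "$1" 2>&1
    # Ask the kernel to flush buffered writes so the entry is more
    # likely to be on disk if the machine has to be reset.
    sync
}

# Loop only when RUN_LOOP=1, so the function can also be sourced and
# called by hand (for example from a cron job).
if [ "${RUN_LOOP:-0}" = "1" ]; then
    while :; do
        log_snapshot "$LOGFILE"
        sleep "$INTERVAL"
    done
fi
```

Run inside a screen session as, for example, RUN_LOOP=1 sh ./script.sh
(the script name is arbitrary).  As noted above, this only records the
counters up to the hang, not after it.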

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |



