Date:      Tue, 13 Jun 2000 10:58:19 -0400 (EDT)
From:      Luoqi Chen <luoqi@watermarkgroup.com>
To:        current@FreeBSD.ORG, dillon@FreeBSD.ORG, jruigrok@via-net-works.nl
Cc:        ps@FreeBSD.ORG, stable@FreeBSD.ORG, wpaul@FreeBSD.ORG
Subject:   Re:  Weird 4.0-STABLE problem, might be related to 5.0 as well
Message-ID:  <200006131458.e5DEwJb03585@lor.watermarkgroup.com>

> This is the third time this happened to a 4.0-STABLE host of ours.
> 
> The problem starts with having a number of processes which are unable to
> be killed.  So we want to reboot the box.
> 
> All goes well, bufdaemon and syncer stop normally.
> 
> Then it gets to
> 
> syncing disks
> done.
> 
> And there it hangs.  At this point only the NIC is reachable on its IP
> address for ping.
> 
At this point the kernel is trying to unmount all filesystems. It probably
hangs because it's waiting for locks that those unkillable processes hold.

> So I break into DDB and get this from a trace:
> 
The trace didn't reveal anything wrong: xl is updating its status during a
scheduled timeout.

The best way to diagnose the problem is to work on the live system when
the same symptom occurs (unkillable processes): find out which channels
these processes are sleeping on and why they're not being woken up (a
hardware failure might contribute to it).

A `ps axl' report would be very helpful. For those unkillable processes,
you might also want to report the backtrace for each. Here's how to get them:
	# gdb -k /kernel /dev/mem
	(kgdb) proc <pid>
	(kgdb) bt
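
To pick out the candidate processes before taking backtraces, something
like the following might help. It is only a rough sketch: the function
name dstate_procs is made up for illustration, and it assumes the ps
output has the state in the second column, with "D" marking an
uninterruptible wait, as with `ps -ax -o pid,stat,wchan,command'.

```sh
# dstate_procs (hypothetical helper): read ps output on stdin and keep
# the header line plus every process whose STAT field starts with "D"
# (uninterruptible wait). Assumes STAT is the second column.
dstate_procs() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage (assumes ps accepts these -o keywords):
#   ps -ax -o pid,stat,wchan,command | dstate_procs
```

The WCHAN column of the surviving lines is the wait channel each process
is sleeping on; the PID column gives the values to feed to `proc <pid>'.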

-lq

