From owner-freebsd-questions  Thu Jan  6 12:27:17 2000
Delivered-To: freebsd-questions@freebsd.org
Received: from cc942873-a.ewndsr1.nj.home.com (cc942873-a.ewndsr1.nj.home.com [24.2.89.207])
	by hub.freebsd.org (Postfix) with ESMTP id BA79F1570C
	for <freebsd-questions@FreeBSD.ORG>; Thu,  6 Jan 2000 12:27:13 -0800 (PST)
	(envelope-from cjc@cc942873-a.ewndsr1.nj.home.com)
Received: (from cjc@localhost)
	by cc942873-a.ewndsr1.nj.home.com (8.9.3/8.9.3) id PAA20493
	for freebsd-questions@FreeBSD.ORG; Thu, 6 Jan 2000 15:31:40 -0500 (EST)
	(envelope-from cjc)
From: "Crist J. Clark" <cjc@cc942873-a.ewndsr1.nj.home.com>
Message-Id: <200001062031.PAA20493@cc942873-a.ewndsr1.nj.home.com>
Subject: Hung NFS Mount
To: freebsd-questions@FreeBSD.ORG (FreeBSD Questions)
Date: Thu, 6 Jan 2000 15:31:40 -0500 (EST)
Reply-To: cjclark@home.com
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

A machine of mine had some SCSI hardware problems yesterday. The
machine is an NFS server for several others, and the exported
filesystems are on the drives that were having problems. This caused
processes to hang locally on the server as well as on the NFS
clients.

Eventually, I was forced to reboot the machine with the hardware
problems. Now, the NFS exports are clean. Most client machines that
had problems noticed the server go down and come back up. They
responded with 'stale NFS handle' messages at access attempts. A
simple umount/mount of the filesystem fixed this.

However, one machine is still having problems. It tried to access
files on the failing server while the NFS daemon was alive but unable
to serve the files due to the hardware problems. These processes are
still hanging. Although the server has gone down and come back up and
is now alive and well, I cannot get the processes to "unhang."

Here are some of them:

root     15083  0.0  0.1   288   16  p0- D    11:51AM    0:00.04 umount /usr/ports
postman  15288  0.0  2.2   740  488  p1  Ds   12:08PM    0:00.40 -tcsh (tcsh)
root     15312  0.0  0.1   288   16  p2- D    12:09PM    0:00.03 umount /usr/ports
root     15820  0.0  0.1   224   16  p2- D    12:42PM    0:00.02 mount /usr/ports
root     16223  0.0  1.3   240  288  p2- D     1:05PM    0:00.43 / (find)
root     17693  0.0  0.2   288   36  p0- D     2:53PM    0:00.03 umount -f /usr/ports
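
For anyone wanting to pull the same kind of list on their own client,
here is a minimal sketch that filters ps output for processes whose
state starts with 'D' (disk wait, which is where the stuck NFS
processes sit). It assumes the 11-column `ps aux` layout shown above;
the function names are made up for illustration.

```python
import subprocess


def parse_disk_wait(ps_text):
    """Return (pid, stat, command) for every process in 'D' state.

    Assumes `ps aux` columns:
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
    """
    hung = []
    for line in ps_text.splitlines():
        # Split into at most 11 fields so COMMAND keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something unparseable
        stat = fields[7]
        if stat.startswith("D"):
            hung.append((int(fields[1]), stat, fields[10]))
    return hung


def list_disk_wait():
    """Run `ps aux` on the local machine and filter its output."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return parse_disk_wait(result.stdout)
```

Calling list_disk_wait() on the affected client should return the
hung umount, mount, and find processes, among others.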

I would really rather not reboot the machine this is happening
on (and I wonder if the shutdown would even be clean). However, these
are just a few of the hung processes. I've already had 'file table
full' errors which I believe are caused by all of the hung processes
keeping files open.
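
One way to check that suspicion is to compare the number of open
files against the system limit. A rough sketch, assuming the
kern.openfiles and kern.maxfiles sysctl variables are available on
the client; the helper names are hypothetical.

```python
import subprocess


def file_table_usage(openfiles, maxfiles):
    """Return the percentage of the system file table in use."""
    return 100.0 * openfiles / maxfiles


def read_sysctl(name):
    """Read one integer sysctl value using `sysctl -n <name>`."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    try:
        # Assumed variable names; adjust if the system differs.
        used = read_sysctl("kern.openfiles")
        limit = read_sysctl("kern.maxfiles")
        print("file table: %d of %d (%.1f%%)"
              % (used, limit, file_table_usage(used, limit)))
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        print("could not read kern.openfiles/kern.maxfiles on this system")
```

A figure that stays close to 100% while the hung processes exist
would be consistent with them holding the files open.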

I know that hard NFS errors like this are very tough, if not
impossible, to clear, but I'd try just about anything. I'd build raw
packets to throw from the NFS server if I thought it would spoof the
client out of the hangs.

Any ideas would be great. (But I really think I'll need to
reboot... after 160 days up too... *sigh*)
-- 
Crist J. Clark                           cjclark@home.com

