From owner-freebsd-questions Fri Jan 7 8:15: 0 2000
Delivered-To: freebsd-questions@freebsd.org
Received: from cc942873-a.ewndsr1.nj.home.com (cc942873-a.ewndsr1.nj.home.com [24.2.89.207])
    by hub.freebsd.org (Postfix) with ESMTP id 8EEEF157B0
    for ; Fri, 7 Jan 2000 08:14:46 -0800 (PST)
    (envelope-from cjc@cc942873-a.ewndsr1.nj.home.com)
Received: (from cjc@localhost)
    by cc942873-a.ewndsr1.nj.home.com (8.9.3/8.9.3) id LAA23340
    for freebsd-questions@FreeBSD.ORG; Fri, 7 Jan 2000 11:19:16 -0500 (EST)
    (envelope-from cjc)
From: "Crist J. Clark"
Message-Id: <200001071619.LAA23340@cc942873-a.ewndsr1.nj.home.com>
Subject: Re: Hung NFS Mount
In-Reply-To: <200001062031.PAA20493@cc942873-a.ewndsr1.nj.home.com> from "Crist J. Clark" at "Jan 6, 2000 03:31:40 pm"
To: freebsd-questions@FreeBSD.ORG (FreeBSD Questions)
Date: Fri, 7 Jan 2000 11:19:16 -0500 (EST)
Reply-To: cjclark@home.com
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I have a little more info about my hung up system. I'm hoping against
all hope someone out there might know how to help me with this.

The client with unkillable processes is hung up, but still alive. It
has been sending out NFS packets every ten seconds for a day now,
trying to get the file it wants from the server. I'm beginning to
wonder whether this is a client or a server issue.

I've caught the packet it keeps sending,

  # tcpdump -v \( host newmail \&\& port nfs \)
  10:40:04.592755 newmail.mydom.org.3388292815 > backmail.mydom.org.nfs: 104 lookup [|nfs] (ttl 64, id 15687)
  10:40:14.838168 newmail.mydom.org.3388292815 > backmail.mydom.org.nfs: 104 lookup [|nfs] (ttl 64, id 15736)
  10:40:25.088713 newmail.mydom.org.3388292815 > backmail.mydom.org.nfs: 104 lookup [|nfs] (ttl 64, id 15737)
  10:40:35.339210 newmail.mydom.org.3388292815 > backmail.mydom.org.nfs: 104 lookup [|nfs] (ttl 64, id 15768)
  ^C
  40 packets received by filter
  0 packets dropped by kernel

More detail on one of these packets,

  00:16:41.324996 newmail.mydom.org.3388292815 > backmail.mydom.org.nfs: 104 lookup fh 25,0/713065181 "net" (ttl 64, id 30022)

The full packet is shown at the bottom.

I have tried a few things. I have scanned the source port on the
client from the NFS port of the server (nmap reports it open anyway).
I have set "unreach" rules on the server's firewall on the NFS port
with a variety of unreachable responses. None of these stopped the
packets from continuing or unhung the processes on the client.

I have also mounted the remote filesystem elsewhere on the system and
then unmounted it with no problem. That is,

  # mount -t nfs     # show the hung FS
  backmail:/u1/FreeBSD-3S/ports on /usr/ports
  # mount backmail:/u1/FreeBSD-3S/ports /mnt
  # ls /mnt
  .cvsignore   archivers    deskutils    mail         sysutils
  INDEX        astro        devel        math         textproc
  LEGAL        audio        distfiles    mbone        www
  Makefile     benchmarks   editors      misc         x11
  Mk           biology      emulators    net          x11-clocks
  README       cad          ftp          news         x11-fm
  Templates    comms        games        print        x11-fonts
  Tools        converters   graphics     security     x11-toolkits
  YEAR2000     databases    lang         shells       x11-wm
  # umount backmail:/u1/FreeBSD-3S/ports
  umount: /usr/ports: Device busy
  # umount /mnt
  # ls /mnt
  #

I am kind of curious why the server is not responding at all to this.
I'm used to getting "stale file handle" messages when a server
disappears and comes back up in a new state, and I would think that
would be the server's response in this case. Does anyone know how to
prompt the server to give a response? Or how to build a packet to send
to the client to get it out of this rut?

Any ideas would be much appreciated.

Here is a hexdump -C of the NFS packet it keeps sending (IP addresses
masked),

  # hexdump -C nfs.tcpdump
  00000000  d4 c3 b2 a1 02 00 04 00  00 00 00 00 00 00 00 00  |................|
  00000010  00 01 00 00 01 00 00 00  b9 76 75 38 84 f5 04 00  |.........vu8....|
  00000020  92 00 00 00 92 00 00 00  00 aa 00 bb 1e 42 00 aa  |.............B..|
  00000030  00 6f d7 28 08 00 45 00  00 84 75 46 00 00 40 11  |.o.(..E...uF..@.|
  00000040  ae 07 xx xx xx xx xx xx  xx xx 03 fd 08 01 00 70  |...............p|
  00000050  30 32 c9 f5 3e cf 00 00  00 00 00 00 00 02 00 01  |02..>...........|
  00000060  86 a3 00 00 00 03 00 00  00 03 00 00 00 01 00 00  |................|
  00000070  00 18 00 00 00 00 00 00  00 00 00 00 ff fe 00 00  |................|
  00000080  ff fe 00 00 00 01 00 00  ff fe 00 00 00 00 00 00  |................|
  00000090  00 00 00 00 00 1c 00 19  00 00 dd 82 80 2a 0c 00  |.............*..|
  000000a0  00 00 00 d9 00 00 0c b0  82 65 00 00 00 00 00 00  |.........e......|
  000000b0  00 00 00 00 00 03 6e 65  74 00                    |......net.|
  000000ba

Crist J. Clark wrote,
> A machine of mine had some SCSI hardware problems yesterday. The
> machine does NFS serving to several others. The filesystems exported
> are on drives that were experiencing problems. This was causing local
> hung processes on the machine as well as hung processes on the NFS
> clients.
>
> Eventually, I was forced to reboot the machine with hardware
> problems. Now, the NFS exports are clean. Most machines that had
> problems noticed the server go down and come up. They responded with
> 'stale NFS handle' messages at access attempts. A simple umount/mount
> of the filesystem fixed this.
>
> However, one machine is still having problems. It tried to access
> files on the failing server while the NFS daemon was alive, but unable
> to get the files due to the hardware problems. These processes are
> still hanging. Despite the server going up and down and the fact it is
> now alive and well, I cannot get the processes to "unhang."
>
> Here are some of them,
>
> root     15083  0.0  0.1   288   16  p0- D    11:51AM   0:00.04 umount /usr/ports
> postman  15288  0.0  2.2   740  488  p1  Ds   12:08PM   0:00.40 -tcsh (tcsh)
> root     15312  0.0  0.1   288   16  p2- D    12:09PM   0:00.03 umount /usr/ports
> root     15820  0.0  0.1   224   16  p2- D    12:42PM   0:00.02 mount /usr/ports
> root     16223  0.0  1.3   240  288  p2- D     1:05PM   0:00.43 / (find)
> root     17693  0.0  0.2   288   36  p0- D     2:53PM   0:00.03 umount -f /usr/ports
>
> I would really rather not reboot the machine this is happening
> on (and I wonder if the shutdown would even be clean). However, these
> are just a few of the hung processes. I've already had 'file table
> full' errors which I believe are caused by all of the hung processes
> keeping files open.
>
> I know that hard NFS errors like this are very tough, if not
> impossible, to clear, but I'd try just about anything. I'd build raw
> packets to throw from the NFS server if I thought it would spoof the
> client out of the hangs.
>
> Any ideas would be great. (But I really think I'll need to
> reboot... after 160 days up too... *sigh*)
> --
> Crist J. Clark                           cjclark@home.com
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message
>
-- 
Crist J. Clark                           cjclark@home.com
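
For anyone who wants to pick the packet apart without reading hex by
eye, here is a minimal Python sketch that decodes the RPC call carried
in the UDP payload of the hexdump above (the bytes from offset 0x52
onward, which contain none of the masked address bytes). It assumes the
standard ONC RPC call header followed by NFSv3 LOOKUP arguments in XDR
encoding. The names RPC_HEX, xdr_pad and parse_rpc_call are made up for
this illustration and do not come from any tool.

```python
import struct

# UDP payload (the RPC message) copied from offset 0x52 of the hexdump.
RPC_HEX = """
c9 f5 3e cf 00 00 00 00 00 00 00 02 00 01 86 a3
00 00 00 03 00 00 00 03 00 00 00 01 00 00 00 18
00 00 00 00 00 00 00 00 00 00 ff fe 00 00 ff fe
00 00 00 01 00 00 ff fe 00 00 00 00 00 00 00 00
00 00 00 1c 00 19 00 00 dd 82 80 2a 0c 00 00 00
00 d9 00 00 0c b0 82 65 00 00 00 00 00 00 00 00
00 00 00 03 6e 65 74 00
"""
RPC_BYTES = bytes.fromhex("".join(RPC_HEX.split()))


def xdr_pad(n):
    """Round a length up to the 4-byte boundary XDR uses for opaque data."""
    return (n + 3) & ~3


def parse_rpc_call(data):
    """Decode an RPC call header plus NFSv3 LOOKUP args (assumed layout)."""
    # Fixed part of the call: six big-endian 32-bit words.
    xid, msg_type, rpcvers, prog, vers, proc = struct.unpack_from(">6I", data, 0)
    off = 24

    # Credential and verifier: flavor, length, then padded opaque body.
    cred_flavor, cred_len = struct.unpack_from(">2I", data, off)
    off += 8 + xdr_pad(cred_len)
    verf_flavor, verf_len = struct.unpack_from(">2I", data, off)
    off += 8 + xdr_pad(verf_len)

    # LOOKUP arguments: directory file handle, then the name to look up.
    (fh_len,) = struct.unpack_from(">I", data, off)
    off += 4
    fh = data[off:off + fh_len]
    off += xdr_pad(fh_len)
    (name_len,) = struct.unpack_from(">I", data, off)
    off += 4
    name = data[off:off + name_len].decode("ascii")

    return {
        "xid": xid, "msg_type": msg_type, "rpcvers": rpcvers,
        "prog": prog, "vers": vers, "proc": proc,
        "cred_flavor": cred_flavor, "verf_flavor": verf_flavor,
        "fh": fh, "name": name,
    }


call = parse_rpc_call(RPC_BYTES)
for key, value in call.items():
    print(key, "=", value.hex() if isinstance(value, bytes) else value)
```

The decoded XID is the same number tcpdump prints after
newmail.mydom.org on every captured line, which suggests the client is
retransmitting one outstanding LOOKUP for "net" rather than issuing new
requests.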