Date: Sat, 01 Oct 2005 22:16:37 -0700
From: Frank Mayhar <frank@exit.com>
To: hackers@freebsd.org
Subject: Very weird NFS-related hang in 6-beta5.
Message-ID: <1128230197.63551.1.camel@realtime.exit.com>
I mount my /usr/ports, /usr/src, et al. from an NFS server. Everything seems to work fine except on one system, where I've been seeing repeated hangs. Of course the system in question is my main desktop one, sigh.

At first I was using gigabit Ethernet (Intel Pro/1000, 82545GM chipset), but the interface kept wedging hard, also on this system (and _not_ on the server, just on this one). I upgraded the system to 6.0-beta5 to see if the interface hangs went away. (I upgraded by NFS-mounting /usr/src over my parallel 100BaseTX network rather than the gigabit network.) The upgrade worked fine but the hangs didn't disappear. I planned to swap out the gigabit card to see if the hardware was the problem, but in the interim (not having a spare card lying around) I decided to do a complete portupgrade using the 100BaseTX network.

This is where it gets weird. Because of all the hangs I've run into, at some point I made all the NFS mounts soft mounts. I've been watching these port builds, and from time to time, with no obvious pattern that I can discern, NFS hangs. The server seems perfectly healthy, and in fact the _interface_ seems healthy, but the particular I/O in question just hangs until it eventually times out due to the soft mount. After it finally times out, things pick up and keep going again. NFS works fine for a while, then it hangs again.
I captured one of the hangs; this is from the client machine:

16:17:53.642822 IP realtime.exit.com.560259720 > jill.exit.com.nfs: 132 read fh 1070,983185/1114384 8192 bytes @ 1925120
16:17:53.643541 IP jill.exit.com.nfs > realtime.exit.com.560259720: reply ok 1472 read
16:18:11.679433 IP realtime.exit.com.560259720 > jill.exit.com.nfs: 132 read fh 1070,983185/1114384 8192 bytes @ 1925120
16:18:11.680142 IP jill.exit.com.nfs > realtime.exit.com.560259720: reply ok 1472 read

So the server gets the read and replies, but the client apparently never sees the reply (despite the fact that it is coming in on the interface and gets picked up by tcpdump).

I've attached the dmesg from the client, if it helps, but I doubt it will. I can't imagine that this is hardware, although I guess it _might_ be. It's just very weird.

Any hints as to cause or further steps I can take to diagnose it would be appreciated.
-- 
Frank Mayhar frank@exit.com http://www.exit.com/
Exit Consulting http://www.gpsclock.com/
http://www.exit.com/blog/frank/
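
One way to pick this pattern out of a longer capture is to look for NFS requests that reappear with the same RPC transaction ID (XID), which tcpdump prints in place of the client port. What follows is a minimal Python sketch and not part of the original message: the function name `find_retransmits`, the regular expression, and the `SAMPLE` variable are assumptions for illustration. It also assumes the tcpdump text format shown in the capture above and a capture that does not cross midnight.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches NFS *request* lines of the form
#   HH:MM:SS.uuuuuu IP <client>.<xid> > <server>.nfs: <len> <op> ...
# Reply lines (<server>.nfs > <client>.<xid>) deliberately do not match.
REQUEST = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<client>\S+)\.(?P<xid>\d+) > (?P<server>\S+)\.nfs: "
    r"\d+ (?P<op>\w+)"
)

# The two request/reply pairs from the capture in the message.
SAMPLE = """\
16:17:53.642822 IP realtime.exit.com.560259720 > jill.exit.com.nfs: 132 read fh 1070,983185/1114384 8192 bytes @ 1925120
16:17:53.643541 IP jill.exit.com.nfs > realtime.exit.com.560259720: reply ok 1472 read
16:18:11.679433 IP realtime.exit.com.560259720 > jill.exit.com.nfs: 132 read fh 1070,983185/1114384 8192 bytes @ 1925120
16:18:11.680142 IP jill.exit.com.nfs > realtime.exit.com.560259720: reply ok 1472 read
"""


def find_retransmits(lines):
    """Return (client, xid, op, gaps) for every request seen more than once.

    gaps holds the seconds between consecutive sightings of the same XID.
    Timestamps carry no date, so a capture spanning midnight is not handled.
    """
    seen = defaultdict(list)
    for line in lines:
        m = REQUEST.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
            seen[(m["client"], m["xid"], m["op"])].append(ts)

    retransmits = []
    for (client, xid, op), stamps in seen.items():
        if len(stamps) > 1:
            gaps = [
                round((later - earlier).total_seconds(), 3)
                for earlier, later in zip(stamps, stamps[1:])
            ]
            retransmits.append((client, xid, op, gaps))
    return retransmits


if __name__ == "__main__":
    for client, xid, op, gaps in find_retransmits(SAMPLE.splitlines()):
        print(f"{client} xid {xid} ({op}) retransmitted after {gaps} s")
```

Run against the four lines above, this flags XID 560259720 as sent twice, roughly 18 seconds apart, even though a reply to the first request is visible on the wire. That is consistent with the client never processing the reply and retrying the same request.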