From: Frank Mayhar
To: hackers@freebsd.org
Organization: Exit Consulting
Date: Sat, 01 Oct 2005 22:16:37 -0700
Message-Id: <1128230197.63551.1.camel@realtime.exit.com>
Subject: Very weird NFS-related hang in 6-beta5.
List-Id: Technical Discussions relating to FreeBSD

I mount my /usr/ports, /usr/src, et al. from an NFS server. Everything seems to work fine except on one system, where I've been seeing repeated hangs. Of course the system in question is my main desktop one, sigh.

At first I was using gigabit Ethernet (Intel Pro/1000, 82545GM chipset), but the interface kept wedging hard, again only on this system (and _not_ on the server, just on this one). I upgraded the system to 6.0-beta5 to see if the interface hangs went away. (I upgraded by NFS-mounting /usr/src over my parallel 100BaseTX network rather than the gigabit network.) The upgrade worked fine but the hangs didn't disappear. I planned to swap out the gigabit card to see whether the hardware was the problem, but in the interim (not having a spare card lying around) I decided to do a complete portupgrade using the 100BaseTX network.

This is where it gets weird. Because of all the hangs I've run into, at some point I made all the NFS mounts soft mounts. I've been watching these port builds, and from time to time, with no obvious pattern that I can discern, NFS hangs. The server seems perfectly healthy, and in fact the _interface_ seems healthy, but the particular I/O in question just hangs until it eventually times out because of the soft mount. After it finally times out, things pick up and keep going again. NFS works fine for a while, then it hangs again.
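For anyone who wants to pull stalls like these out of a longer trace, here is a minimal sketch in Python (it was not used for this report). It assumes tcpdump's one-line text output in the same shape as the capture in this message, and it treats a request that shows up again verbatim from the same source as a retransmission. The names `find_retransmits` and `SAMPLE` are made up for illustration, and the two sample lines are the read requests from that capture.

```python
import re
from datetime import datetime

# One tcpdump summary line: "<HH:MM:SS.usec> IP <src> > <dst>: <rest>"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) IP (\S+) > (\S+): (.*)$")

# Sample input: the two client read requests from the capture in this message.
SAMPLE = [
    "16:17:53.642822 IP realtime.exit.com.560259720 > jill.exit.com.nfs: "
    "132 read fh 1070,983185/1114384 8192 bytes @ 1925120",
    "16:18:11.679433 IP realtime.exit.com.560259720 > jill.exit.com.nfs: "
    "132 read fh 1070,983185/1114384 8192 bytes @ 1925120",
]


def find_retransmits(lines):
    """Return (source, request, gap_seconds) for every repeated request.

    Only packets addressed to the ".nfs" service are considered, so server
    replies are ignored. Timestamps carry no date, so a trace that crosses
    midnight is not handled by this sketch.
    """
    last_seen = {}
    repeats = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a line in the expected format
        stamp, src, dst, request = match.groups()
        if not dst.endswith(".nfs"):
            continue  # a reply, or unrelated traffic
        when = datetime.strptime(stamp, "%H:%M:%S.%f")
        key = (src, dst, request)
        if key in last_seen:
            gap = (when - last_seen[key]).total_seconds()
            repeats.append((src, request, gap))
        last_seen[key] = when
    return repeats


for src, request, gap in find_retransmits(SAMPLE):
    print(f"{src} repeated '{request}' after {gap:.3f}s")
```

Feeding it the saved text output of tcpdump, one line per list element, gives the gap between each original request and its repeat, which makes it easier to see whether the stalls all last about the same time.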
I captured one of the hangs; this is from the client machine:

    16:17:53.642822 IP realtime.exit.com.560259720 > jill.exit.com.nfs: 132 read fh 1070,983185/1114384 8192 bytes @ 1925120
    16:17:53.643541 IP jill.exit.com.nfs > realtime.exit.com.560259720: reply ok 1472 read
    16:18:11.679433 IP realtime.exit.com.560259720 > jill.exit.com.nfs: 132 read fh 1070,983185/1114384 8192 bytes @ 1925120
    16:18:11.680142 IP jill.exit.com.nfs > realtime.exit.com.560259720: reply ok 1472 read

So the server gets the read and replies, but the client apparently never sees the reply (despite the fact that it is coming in on the interface and gets picked up by tcpdump).

I've attached the dmesg from the client, if it helps, but I doubt it will. I can't imagine that this is hardware, although I guess it _might_ be. It's just very weird.

Any hints as to cause or further steps I can take to diagnose it would be appreciated.
-- 
Frank Mayhar frank@exit.com     http://www.exit.com/
Exit Consulting                 http://www.gpsclock.com/
                                http://www.exit.com/blog/frank/