From: Joan Picanyol
Date: Mon, 25 Oct 2004 02:20:08 +0200
To: freebsd-stable@freebsd.org
cc: freebsd-fs@freebsd.org
cc: freebsd-net@freebsd.org
Subject: process stuck in nfsfsync state
Message-ID: <20041025002008.GA36161@grummit.biaix.org>
Mail-Followup-To: freebsd-stable@freebsd.org

[please honour Mail-Followup-To:, no need to keep the crosspost]

This is a repost of
http://docs.FreeBSD.org/cgi/mid.cgi?20041014110752.GA57541, with some
additional information. I've updated the client to RC1, and the problem
still persists.

In short, a 5.3-RC1 client mounting /home off a 4.10-p3 server can no
longer use the NFS filesystem when trying to start GNOME, since gconfd
and gnome-session are stuck in the nfsfsync state. Any process accessing
the filesystem hangs, and the console fills up with

nfs server grummit:/fs/home/mount: not responding

messages, even though the client can still ping the server and other
mount points are still available. AFAICT, nfsd and friends are running
on both the client and the server, and the client can use RPC properly
(checked via rpcinfo).

Also, doing 'tcpdump -vv -s 192 port nfs' on the client and the server
seems to support the hypothesis of a locking issue, since I see a write
request for the same fh repeating over and over.

The trace of gnome-session is as follows:

db> tr 610
sched_switch(c180b4b0,0,1,11d,27b8ea4) at sched_switch+0x190
mi_switch(1,0,c063d701,19d,2) at mi_switch+0x2ac
sleepq_switch(c216d23c,c0639f0f,18e,2,da518a5c) at sleepq_switch+0x134
sleepq_wait(c216d23c,0,c063b2f5,db,0) at sleepq_wait+0x41
msleep(c216d23c,c216d210,4d,c1906703,0) at msleep+0x3b5
nfs_flush(c216d210,c17fed00,1,c180b4b0,0) at nfs_flush+0x961
nfs_close(da518b8c,1,c0643a5e,140,c0681da0) at nfs_close+0x7e
vn_close(c216d210,2,c17fed00,c180b4b0,c0692c20) at vn_close+0x67
vn_closefile(c1c2b6e8,c180b4b0,c0637a98,829,c1c2b6e8) at vn_closefile+0xc4
fdrop_locked(c1c2b6e8,c180b4b0,c0637a98,768) at fdrop_locked+0xb4
fdrop(c1c2b6e8,c180b4b0,3,c180b4b0,da518c98) at fdrop+0x3c
closef(c1c2b6e8,c180b4b0,c0637a98,3e3,0) at closef+0x21c
close(c180b4b0,da518d14,4,431,1) at close+0x135
syscall(2f,2f,2f,0,28d38ec0) at syscall+0x272
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (6, FreeBSD ELF32, close), eip = 0x28ca1e6f, esp = 0xbfbfe52c, ebp = 0xbfbfe538 ---

I have a debugging kernel and a console attached, so feel free to ask
for any other information of interest.

This is driving me nuts, and I'm surely not the only one using GNOME
over NFS. Is anyone else seeing this? What exactly is going on? How can
I fix it?

It might be that the problem appeared going from BETA3 to BETA6, but
I've been unable to "downgrade" the workstation; where can I get a copy
of BETA3 to test this?

tks
-- 
pica