From owner-freebsd-current  Thu Jul 29 17: 5:16 1999
Delivered-To: freebsd-current@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
	by hub.freebsd.org (Postfix) with ESMTP id ACDD415742
	for <current@FreeBSD.ORG>; Thu, 29 Jul 1999 17:05:11 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id RAA80452;
	Thu, 29 Jul 1999 17:05:02 -0700 (PDT)
	(envelope-from dillon)
Date: Thu, 29 Jul 1999 17:05:02 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199907300005.RAA80452@apollo.backplane.com>
To: Bill Paul <wpaul@skynet.ctr.columbia.edu>
Cc: peter@netplex.com.au, crossd@cs.rpi.edu, current@FreeBSD.ORG
Subject: readdirplus client side fix (was Re: IRIX 6.5.4 NFS v3 TCP client + FreeBSD server = bewm)
References:  <199907292322.TAA17429@skynet.ctr.columbia.edu>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:And here is something even scarier: readdirplus from the client side
:doesn't appear to work correctly either. This time, you don't need an
:IRIX machine to trigger the problem (though it helps :). Do the following
:
:client# mount -o nfsv3,tcp,rdirplus server:/somefs /mnt
:client# ls /mnt; du /mnt; etc...
:
:Seems okay so far, right? Ah, but now try to unmount the filesystem:
:
:# umount /mnt
:<process wedges, can't be killed, can't log in, other processes wedge, etc..>
:...
:-Bill

    But, on the bright side, readdirplus is somewhat experimental in that
    it is not used by default, so very little testing of it has been done
    to date.  Thus the bug is not unexpected :-).  At least the bugs we
    are getting now tend to be in the 'outlying areas' of NFS and not so much
    with the core code.

    Another area that is probably full of bugs:  nqleasing.

    --

    Ok, I was able to reproduce the above bug and fix it.  The problem on
    the FreeBSD client is in nfs_readdirplusrpc() in nfs/nfs_vnops.c.  It 
    can obtain the vnode being used to populate the additional directory 
    info in one of two ways.  When it gets the vnode via nfs_nget(), the
    returned vnode is locked.  When it gets it via a hit against NFS_CMPFH()
    (which I presume is for '.'), it simply VREF()'s the vnode.

    In the one case the vnode is returned locked, in the other it is not.

    However, the internal loop vrele()'s the vnode rather than vput()'s it,
    so the vnodes in the directory scan are never unlocked.  This leads to
    the lockup.
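
    To make the imbalance concrete, here is a rough userland sketch of
    the two acquisition paths and the two release styles.  It is only a
    toy model, not kernel code: struct toy_vnode and every toy_*() and
    release_*() name are made up for illustration, and the assumption
    that the directory vnode stays locked by the caller during the scan
    is mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a vnode: a reference count and a lock flag.
 * NOT the kernel's struct vnode; all names here are hypothetical. */
struct toy_vnode {
	int  refs;	/* reference count */
	bool locked;	/* true while the lock is held */
};

/* Models VREF(): take a reference, lock state untouched. */
static void toy_vref(struct toy_vnode *vp) { vp->refs++; }

/* Models vrele(): drop a reference, lock state untouched. */
static void toy_vrele(struct toy_vnode *vp)
{
	assert(vp->refs > 0);
	vp->refs--;
}

/* Models vput(): unlock and drop a reference. */
static void toy_vput(struct toy_vnode *vp)
{
	assert(vp->locked);
	vp->locked = false;
	toy_vrele(vp);
}

/* Models the nfs_nget() path: vnode comes back referenced AND locked. */
static void toy_nget(struct toy_vnode *vp)
{
	assert(!vp->locked);
	vp->refs++;
	vp->locked = true;
}

/* Old loop behaviour: always vrele(), so an nget'd vnode stays locked. */
static void release_old(struct toy_vnode *vp, struct toy_vnode *newvp)
{
	(void)vp;
	toy_vrele(newvp);
}

/* Patched behaviour: vrele() for the directory itself, vput() otherwise. */
static void release_fixed(struct toy_vnode *vp, struct toy_vnode *newvp)
{
	if (newvp == vp)
		toy_vrele(newvp);
	else
		toy_vput(newvp);
}

/* Scan '.' plus one other entry; return how many entry locks leak. */
static int toy_leaked_locks(bool fixed)
{
	struct toy_vnode dir = { 1, true };	/* assumed locked by caller */
	struct toy_vnode child = { 0, false };
	struct toy_vnode *entries[2] = { &dir, &child };
	size_t i;

	for (i = 0; i < 2; i++) {
		struct toy_vnode *newvp = entries[i];

		if (newvp == &dir)
			toy_vref(newvp);	/* the NFS_CMPFH() hit */
		else
			toy_nget(newvp);	/* the nfs_nget() path */
		if (fixed)
			release_fixed(&dir, newvp);
		else
			release_old(&dir, newvp);
	}
	return child.locked ? 1 : 0;
}
```

    In this model toy_leaked_locks(false) leaves the child vnode locked
    and toy_leaked_locks(true) does not, and the directory's own lock is
    left alone in both cases.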

    If you could test and then commit this patch (w/ me as the submitter),
    I would appreciate it!  It seems to fix the problem for me.  This patch
    is relative to CURRENT.  The fix ought to be MFCable to STABLE.

    The funny thing is that the error termination code actually got it
    right and the loop got it wrong.  Usually it's the other way around. 

    --

    Presumably this will not fix the SGI client.  I've no idea what the
    problem there is.  There may be a bug in the SGI client or there may
    be a bug in the client & server implementation of the protocol in FreeBSD.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


Index: nfs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/nfs/nfs_vnops.c,v
retrieving revision 1.135
diff -u -r1.135 nfs_vnops.c
--- nfs_vnops.c	1999/07/01 13:32:54	1.135
+++ nfs_vnops.c	1999/07/29 23:57:06
@@ -2367,7 +2367,10 @@
 			    nfsm_adv(nfsm_rndup(i));
 			}
 			if (newvp != NULLVP) {
-			    vrele(newvp);
+			    if (newvp == vp)
+				vrele(newvp);
+			    else
+				vput(newvp);
 			    newvp = NULLVP;
 			}
 			nfsm_dissect(tl, u_int32_t *, NFSX_UNSIGNED);

