Date:      Thu, 29 Jul 1999 23:06:35 -0400 (EDT)
From:      Bill Paul <wpaul@skynet.ctr.columbia.edu>
To:        dillon@apollo.backplane.com (Matthew Dillon)
Cc:        peter@netplex.com.au, crossd@cs.rpi.edu, current@freebsd.org
Subject:   Re: readdirplus client side fix (was Re: IRIX 6.5.4 NFS v3 TCP client + FreeBSD server = bewm)
Message-ID:  <199907300306.XAA17720@skynet.ctr.columbia.edu>
In-Reply-To: <199907300005.RAA80452@apollo.backplane.com> from "Matthew Dillon" at Jul 29, 99 05:05:02 pm

Of all the gin joints in all the towns in all the world, Matthew Dillon 
had to walk into mine and say:

> :And here is something even scarier: readdirplus from the client side
> :doesn't appear to work correctly either. This time, you don't need an
> :IRIX machine to trigger the problem (though it helps :). Do the following:
> :
> :client# mount -o nfsv3,tcp,rdirplus server:/somefs /mnt
> :client# ls /mnt; du /mnt; etc...
> :
> :Seems okay so far, right? Ah, but now try to unmount the filesystem:
> :
> :# umount /mnt
> :<process wedges, can't be killed, can't log in, other processes wedge, etc..>
> :...
> :-Bill
> 
>     But, on the bright side, readdirplus is somewhat experimental in that
>     it is not used by default, so very little testing of it has been done
>     to date.  Thus the bug is not unexpected :-).  At least the bugs we
>     are getting now tend to be in the 'outlying areas' of NFS and not so much
>     with the core code.

Well, IRIX is using it by default, and option or not, it's documented
and implemented, so it should work.

>     Another area that is probably full of bugs:  nqleasing.

Well, the problem there is: what commercial UNIXes implement NQNFS?
I stumbled over these problems because I was testing things with a
commercial implementation of NFS.

>     --
> 
>     Ok, I was able to reproduce the above bug and fix it.  The problem on
>     the FreeBSD client is in nfs_readdirplusrpc() in nfs/nfs_vnops.c.  It 
>     can obtain the vnode being used to populate the additional directory 
>     info in one of two ways.  When it gets the vnode via nfs_nget(), the
>     returned vnode is locked.  When it gets it via a hit against NFS_CMPFH()
>     (which I presume is for '.'), it simply VREF()'s the vnode.
> 
>     In the one case the vnode is returned locked, in the other it is not.
> 
>     However, the internal loop vrele()'s the vnode rather than vput()'s it,
>     so the vnodes in the directory scan are never unlocked.  This leads to
>     the lockup.

Uh, yeah.

One of these days I'll be able to understand everything that you just
said. But not today.
 
>     If you could test and then commit this patch (w/ me as the submitter),
>     I would appreciate it!  It seems to fix the problem for me.  This patch
>     is relative to CURRENT.  The fix ought to be MFCable to STABLE.

Close, but not quite. You didn't beat up on it hard enough. The secret
is to think like a kid with a new toy, or more precisely, a sysadmin with
a new toy (amounts to the same thing :). The first thing any sysadmin
wants to do when you hand him a new gizmo is to push the buttons, turn
the knobs and flip the switches, in order to try out all those great
new features he's heard about. That's how you find the bugs.

Anyway, in this case, I found another problem: with your patch applied,
I mounted a filesystem from a 3.2-RELEASE server (which I fixed today
with the readdirplus server side patch) which happened to have a
directory containing the unpacked source code for Ghostscript 5.50,
plus objects left over from a build. There are a crapload of files
in the gs 5.50 distribution, plus another crapload created by compiling
it. I did the following:

client# mount -o nfsv3,tcp,rdirplus server:/fs /mnt
client# cd /mnt
client# ls
client# du
<lots of stuff printed, until the gs5.50 directory is reached>
<bang! another panic>

There seems to be another problem in nfs_readdirplusrpc(). The following
diff shows the changes I made to stop the panic:



 
>     The funny thing is that the error termination code actually got it
>     right and the loop got it wrong.  Usually it's the other way around. 
> 
>     --
> 
>     Presumably this will not fix the SGI client.  I've no idea what the
>     problem there is.  There may be a bug in the SGI client or there may
>     be a bug in the client & server implementation of the protocol in FreeBSD.
> 
> 					-Matt
> 					Matthew Dillon 
> 					<dillon@backplane.com>
> 
> 
> Index: nfs_vnops.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/nfs/nfs_vnops.c,v
> retrieving revision 1.135
> diff -u -r1.135 nfs_vnops.c
> --- nfs_vnops.c	1999/07/01 13:32:54	1.135
> +++ nfs_vnops.c	1999/07/29 23:57:06
> @@ -2367,7 +2367,10 @@
>  			    nfsm_adv(nfsm_rndup(i));
>  			}
>  			if (newvp != NULLVP) {
> -			    vrele(newvp);
> +			    if (newvp == vp)
> +				vrele(newvp);
> +			    else
> +				vput(newvp);
>  			    newvp = NULLVP;
>  			}
>  			nfsm_dissect(tl, u_int32_t *, NFSX_UNSIGNED);
> 


-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================

