Date:      Mon, 22 Mar 2010 12:00:41 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Steve Polyack <korvus@comcast.net>
Cc:        freebsd-fs@freebsd.org, User Questions <freebsd-questions@freebsd.org>, bseklecki@noc.cfi.pgh.pa.us
Subject:   Re: FreeBSD NFS client goes into infinite retry loop
Message-ID:  <201003221200.41607.jhb@freebsd.org>
In-Reply-To: <4BA7911F.5060905@comcast.net>
References:  <4BA3613F.4070606@comcast.net> <4BA78444.4040707@comcast.net> <4BA7911F.5060905@comcast.net>

On Monday 22 March 2010 11:47:43 am Steve Polyack wrote:
> On 03/22/10 10:52, Steve Polyack wrote:
> > On 3/19/2010 11:27 PM, Rick Macklem wrote:
> >> On Fri, 19 Mar 2010, Steve Polyack wrote:
> >>
> >> [good stuff snipped]
> >>>
> >>> This makes sense.  According to wireshark, the server is indeed 
> >>> transmitting "Status: NFS3ERR_IO (5)".  Perhaps this should be STALE 
> >>> instead; it sounds more correct than marking it a general IO error.  
> >>> Also, the NFS server is serving its share off of a ZFS filesystem, 
> >>> if it makes any difference.  I suppose ZFS could be talking to the 
> >>> NFS server threads with some mismatched language, but I doubt it.
> >>>
> >> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return
> >> ESTALE when the file no longer exists, the NFS server returns whatever
> >> error it has returned.
> >>
> >> So, either VFS_FHTOVP() succeeds after the file has been deleted, which
> >> would be a problem that needs to be fixed within ZFS
> >> OR
> >> ZFS returns an error other than ESTALE when it doesn't exist.
> >>
> >> Try the following patch on the server (which just makes any error
> >> returned by VFS_FHTOVP() into ESTALE) and see if that helps.
> >>
> >> --- nfsserver/nfs_srvsubs.c.sav    2010-03-19 22:06:43.000000000 -0400
> >> +++ nfsserver/nfs_srvsubs.c    2010-03-19 22:07:22.000000000 -0400
> >> @@ -1127,6 +1127,8 @@
> >>          }
> >>      }
> >>      error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp);
> >> +    if (error != 0)
> >> +        error = ESTALE;
> >>      vfs_unbusy(mp);
> >>      if (error)
> >>          goto out;
> >>
> >> Please let me know if the patch helps, rick
> >>
> >>
> > The patch seems to fix the bad behavior.  Running with the patch, I 
> > see the following output from my patch (return code of nfs_doio from 
> > within nfsiod):
> > nfssvc_iod: iod 0 nfs_doio returned errno: 70
> >
> > Furthermore, when inspecting the transaction with Wireshark, after 
> > deleting the file on the NFS server it looks like there is only a 
> > single error.  This time it is a reply to a V3 Lookup call that 
> > contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server.  
> > The client also does not repeatedly try to complete the failed request.
> >
> > Any suggestions on the next step here?  Based on what you said it 
> > looks like ZFS is falsely reporting an IO error to VFS instead of 
> > ESTALE / NOENT.  I tried looking around zfs_fhtovp() and only saw 
> > returns of EINVAL, but I'm not even sure I'm looking in the right place.
> 
> Further on down the rabbit hole... here's the piece in zfs_fhtovp() 
> where it's kicking out EINVAL instead of ESTALE - the following patch 
> corrects the behavior, but of course also suggests further digging 
> within the zfs_zget() function to ensure that _it_ is returning the 
> correct thing and whether or not it needs to be handled there or within 
> zfs_fhtovp().
> 
> --- src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c    2010-03-22 11:41:21.000000000 -0400
> +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c    2010-03-22 16:25:21.000000000 -0400
> @@ -1246,7 +1246,7 @@
>       dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask);
>       if (err = zfs_zget(zfsvfs, object, &zp)) {
>           ZFS_EXIT(zfsvfs);
> -        return (err);
> +        return (ESTALE);
>       }
>       zp_gen = zp->z_phys->zp_gen & gen_mask;
>       if (zp_gen == 0)

So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if VFS_VGET() 
(which calls ffs_vget()) fails; it only returns ESTALE if the generation count 
doesn't match.
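
To make the two behaviors concrete, here is a rough userland sketch, not the
actual kernel code. Everything prefixed with mock_ and both fhtovp_*
functions are hypothetical stand-ins, and the EINVAL returned by the mock
lookup is only an assumption taken from what Steve saw in zfs_fhtovp().
The first function passes a lookup failure through unchanged and reports
ESTALE only on a generation mismatch; the second folds every failure into
ESTALE, which is the effect of the patches quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode/znode; not a kernel structure. */
struct mock_node {
	unsigned long ino;	/* object number */
	unsigned int gen;	/* generation count */
};

/* Hypothetical stand-in for the fid carried inside an NFS file handle. */
struct mock_fid {
	unsigned long ino;
	unsigned int gen;
};

/* Two live objects; any other object number counts as "deleted". */
static struct mock_node mock_table[] = {
	{ 1, 7 },
	{ 2, 3 },
};

/* Plays the role of VFS_VGET()/zfs_zget(); the EINVAL is an assumption. */
static int
mock_vget(unsigned long ino, struct mock_node **out)
{
	size_t i;

	assert(out != NULL);
	for (i = 0; i < sizeof(mock_table) / sizeof(mock_table[0]); i++) {
		if (mock_table[i].ino == ino) {
			*out = &mock_table[i];
			return (0);
		}
	}
	*out = NULL;
	return (EINVAL);
}

/* Lookup errors pass through; ESTALE only on a generation mismatch. */
static int
fhtovp_passthrough(const struct mock_fid *fid, struct mock_node **out)
{
	int error;

	error = mock_vget(fid->ino, out);
	if (error != 0)
		return (error);
	if ((*out)->gen != fid->gen) {
		*out = NULL;
		return (ESTALE);
	}
	return (0);
}

/* Any failure to resolve the handle is reported as ESTALE. */
static int
fhtovp_stale(const struct mock_fid *fid, struct mock_node **out)
{

	return (fhtovp_passthrough(fid, out) != 0 ? ESTALE : 0);
}
```

With a fid naming an object that is not in the table, fhtovp_passthrough()
hands back the lookup's own error, while fhtovp_stale() hands back ESTALE,
which is what the NFS server needs in order to tell the client the handle
is dead.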

-- 
John Baldwin