From: John Baldwin <jhb@freebsd.org>
To: Steve Polyack
Cc: freebsd-fs@freebsd.org, Rick Macklem, User Questions, bseklecki@noc.cfi.pgh.pa.us
Date: Mon, 22 Mar 2010 13:39:37 -0400
Subject: Re: FreeBSD NFS client goes into infinite retry loop
Message-Id: <201003221339.37169.jhb@freebsd.org>
In-Reply-To: <4BA79E54.5030504@comcast.net>
References: <4BA3613F.4070606@comcast.net> <201003221200.41607.jhb@freebsd.org> <4BA79E54.5030504@comcast.net>
List-Id: User questions
On Monday 22 March 2010 12:44:04 pm Steve Polyack wrote:
> On 03/22/10 12:00, John Baldwin wrote:
> > On Monday 22 March 2010 11:47:43 am Steve Polyack wrote:
> >> On 03/22/10 10:52, Steve Polyack wrote:
> >>> On 3/19/2010 11:27 PM, Rick Macklem wrote:
> >>>> On Fri, 19 Mar 2010, Steve Polyack wrote:
> >>>>
> >>>> [good stuff snipped]
> >>>>
> >>>>> This makes sense. According to Wireshark, the server is indeed
> >>>>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE
> >>>>> instead; it sounds more correct than marking it a general I/O error.
> >>>>> Also, the NFS server is serving its share off of a ZFS filesystem,
> >>>>> if it makes any difference. I suppose ZFS could be talking to the
> >>>>> NFS server threads with some mismatched language, but I doubt it.
> >>>>>
> >>>> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return
> >>>> ESTALE when the file no longer exists, the NFS server returns whatever
> >>>> error it has returned.
> >>>>
> >>>> So, either VFS_FHTOVP() succeeds after the file has been deleted, which
> >>>> would be a problem that needs to be fixed within ZFS,
> >>>> OR
> >>>> ZFS returns an error other than ESTALE when the file doesn't exist.
> >>>>
> >>>> Try the following patch on the server (which just turns any error
> >>>> returned by VFS_FHTOVP() into ESTALE) and see if that helps.
> >>>>
> >>>> --- nfsserver/nfs_srvsubs.c.sav	2010-03-19 22:06:43.000000000 -0400
> >>>> +++ nfsserver/nfs_srvsubs.c	2010-03-19 22:07:22.000000000 -0400
> >>>> @@ -1127,6 +1127,8 @@
> >>>>  		}
> >>>>  	}
> >>>>  	error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp);
> >>>> +	if (error != 0)
> >>>> +		error = ESTALE;
> >>>>  	vfs_unbusy(mp);
> >>>>  	if (error)
> >>>>  		goto out;
> >>>>
> >>>> Please let me know if the patch helps, rick
> >>>>
> >>> The patch seems to fix the bad behavior.
> >>> Running with the patch, I see the following output from my patch
> >>> (the return code of nfs_doio from within nfsiod):
> >>>
> >>> nfssvc_iod: iod 0 nfs_doio returned errno: 70
> >>>
> >>> Furthermore, when inspecting the transaction with Wireshark, after
> >>> deleting the file on the NFS server it looks like there is only a
> >>> single error. This time it is a reply to a V3 Lookup call that
> >>> contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server.
> >>> The client also does not repeatedly try to complete the failed request.
> >>>
> >>> Any suggestions on the next step here? Based on what you said, it
> >>> looks like ZFS is falsely reporting an I/O error to VFS instead of
> >>> ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw
> >>> returns of EINVAL, but I'm not even sure I'm looking in the right place.
> >>>
> >> Further on down the rabbit hole... here's the piece in zfs_fhtovp()
> >> where it kicks out EINVAL instead of ESTALE. The following patch
> >> corrects the behavior, but of course it also suggests further digging
> >> within zfs_zget() to ensure that _it_ is returning the correct thing,
> >> and whether the error needs to be handled there or within zfs_fhtovp().
> >>
> >> --- src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c	2010-03-22 11:41:21.000000000 -0400
> >> +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c	2010-03-22 16:25:21.000000000 -0400
> >> @@ -1246,7 +1246,7 @@
> >>  	dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask);
> >>  	if (err = zfs_zget(zfsvfs, object, &zp)) {
> >>  		ZFS_EXIT(zfsvfs);
> >> -		return (err);
> >> +		return (ESTALE);
> >>  	}
> >>  	zp_gen = zp->z_phys->zp_gen & gen_mask;
> >>  	if (zp_gen == 0)
> >>
> > So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if
> > VFS_VGET() (which calls ffs_vget()) fails; it only returns ESTALE if
> > the generation count doesn't match.
> It looks like it also returns ESTALE when the inode is invalid
> (< ROOTINO || > max inodes?) - would an unlinked file in FFS referenced
> at a later time report an invalid inode?
>
> But back to your point, zfs_zget() seems to be failing and returning the
> EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen.
> I'm trying to get some more details through the use of gratuitous
> dprintf()s, but they don't seem to be making it to any logs or the
> console, even with vfs.zfs.debug=1 set. Any pointers on how to get these
> dprintf() calls working?

That I have no idea on. Maybe Rick can chime in?

I'm actually not sure why we would want to treat a FHTOVP failure as
anything but an ESTALE error in the NFS server, to be honest.

-- 
John Baldwin