From owner-freebsd-fs@freebsd.org Wed Jul 1 23:49:04 2015
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	by mailman.ysv.freebsd.org (Postfix) with ESMTP id 08812992748
	for ; Wed, 1 Jul 2015 23:49:04 +0000 (UTC)
	(envelope-from delphij@delphij.net)
Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified))
	by mx1.freebsd.org (Postfix) with ESMTPS id E668B2827
	for ; Wed, 1 Jul 2015 23:49:03 +0000 (UTC)
	(envelope-from delphij@delphij.net)
Received: from zeta.ixsystems.com (unknown [12.229.62.2])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by anubis.delphij.net (Postfix) with ESMTPSA id 8DBC61A4E9;
	Wed, 1 Jul 2015 16:49:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis;
	t=1435794543; x=1435808943;
	bh=mU51GQiBCcWPpLcttJHV5yoKiv5edOluXoZToisCqRw=;
	h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To;
	b=oI+J7TrxMrfgmEoxZzvLJ/m73E9nrTyc8KBceM1YyulGqnXbRjncEITvByMiQ59h6
	 fiWBGW+6JYYqhSIhd1qUTeXya9HEQB9qIwewjcJiqRGsIHkLTI+PlCdfCwXFF3xtjR
	 PbbxBx48+9WlM71hQmS+KErIhXRO1vv0jjsyoSYI=
Message-ID: <55947C6E.5060409@delphij.net>
Date: Wed, 01 Jul 2015 16:49:02 -0700
From: Xin Li
Reply-To: d@delphij.net
Organization: The FreeBSD Project
MIME-Version: 1.0
To: Ahmed Kamal , Rick Macklem
CC: freebsd-fs@freebsd.org
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca>
In-Reply-To:
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 01 Jul 2015 23:49:04 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 07/01/15 16:44, Ahmed Kamal via freebsd-fs wrote:
> The not so great news is, after updating sysctl and rebooting the
> nas box, I still saw a few (NFS: v4 server nas returned a bad
> sequence-id error!) lines in logs. Users have already left, so I
> don't know how bad it is.
>
> Could you share more info on what this error means? RedHat seems to
> think the client can skip-by-1 and choose larger IDs and that would
> be totally fine? Also how serious is this error, would it cause
> NFS session stall like that?

I wonder if this would help; it loosens the check:

Index: sys/fs/nfsserver/nfs_nfsdstate.c
===================================================================
- --- sys/fs/nfsserver/nfs_nfsdstate.c	(revision 285016)
+++ sys/fs/nfsserver/nfs_nfsdstate.c	(working copy)
@@ -3805,7 +3805,8 @@ nfsrv_checkseqid(struct nfsrv_descript *nd, u_int3
 			printf("refcnt=%d\n", stp->ls_op->rc_refcnt);
 			panic("nfsrvstate op refcnt");
 		}
- -	if ((stp->ls_seq + 1) == seqid) {
+	if ((stp->ls_seq + 1) == seqid ||
+	    (stp->ls_seq + 2) == seqid) {
 		if (stp->ls_op)
 			nfsrvd_derefcache(stp->ls_op);
 		stp->ls_op = op;

Personally I don't quite buy the "skip-by-1 is okay" argument, but it
seems that the RFC text can be interpreted that way.

Cheers,

> On Thu, Jul 2, 2015 at 1:36 AM, Rick Macklem
> wrote:
>
>> Ahmed Kamal wrote:
>>> Hi all,
>>>
>>> I'm a refugee from linux land. I just set up my first freebsd
>>> 10.1 zfs box,
>>> sharing /home over nfs.
>>> Since every home directory is its own zfs dataset,
>>> I chose to use nfsv4 to enable recursively sharing/mounting any
>>> directory under /home (I understand nfs4 is a must in this
>>> scenario!)
>>>
>>> I'm able to mount from linux (rhel5 latest kernel)
>>> successfully. Users are working fine. However every now and
>>> then a user screams that his session is frozen. Usually the
>>> processes are stuck in nfs_wait or rpc_* state. I tried using a
>>> much newer linux kernel (3.2); however, it still faced the same
>>> problem. The errors in Linux log files are mostly:
>>> Jul 1 17:41:47 mammoth kernel: NFS: v4 server nas returned a *bad sequence-id error*!
>>> Jul 1 17:52:32 mammoth kernel: nfs4_reclaim_locks: unhandled error -11. Zeroing state
>>> Jul 1 17:52:32 mammoth kernel: nfs4_reclaim_open_state: Lock reclaim failed!
>>>
>> Btw, a client should only do "reclaim" operations after the
>> server has replied with NFS4ERR_STALE_CLIENTID or
>> NFS4ERR_STALE_STATEID. I am pretty certain that the FreeBSD NFSv4
>> server only generates these replies after it has rebooted, so
>> assuming the server didn't reboot, I have no idea why the client
>> would attempt these and am not surprised they failed.
>>
>> I'm guessing that the DRC constipation somehow caused the Linux
>> client to go into recovery mode?
>>
>> rick
>>
>>> My search led me to
>>> (https://access.redhat.com/solutions/1328073) a detailed
>>> analysis of the issue, which you can read over here
>>> https://dl.dropboxusercontent.com/u/51939288/nfs4-bad-seq.pdf
>>> .. NetApp confirmed this was a bug for them (I'm wondering if
>>> this is still in FreeBSD?!)
>>>
>>> PS: Right before sending this, I saw dmesg on the freebsd box
>>> advising increasing vfs.nfsd.tcphighwater .. So I up'ed that to
>>> 64000. I also up'ed the number of nfs server threads (-t) from
>>> 10 to 60 (we're roughly 40 linux machines)
>>>
>>> Any advice is most appreciated!
>>>
>>> Thanks
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to
>>> "freebsd-fs-unsubscribe@freebsd.org"
>>>
>>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

- --
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.1.5 (FreeBSD)

iQIcBAEBCgAGBQJVlHxuAAoJEJW2GBstM+nskvsP/ire8QyTfL6mF1njMNZwI/k5
AQ+BwWs5r8LzcRN/4v7/gelbS+lXnYVbVHMl8q6j+HzUzQ3yId4ZGlJWpJtHDNnj
+gV8kmFt/og1QTrQRbN81i4GEr914SlKWmo7LsxrWmEhAiKsN0sYsjELD/mH5BZX
1wRe3vTvyrMwm+6u1krqT8ZrxRANBFBmNqiFb8sag7B3oJQZsGhAyUSsJvUhb00o
ozwC2NT5y8Jv0QcZdC/wGeYc8FmRNQTAjE22WkzbsUey/e7FxL7vflCGgngYCIxE
zbZNW65xThZO8fti5MxiepJ27VPa5ocX0CQihBFYp5haG6fzWBGalV/ggAOwYL44
nz1caLhdKIj9JSd8QwLdTArq8+6H8Sx4jp4iGzQnppNo8PtG/AlHlw9uDKaUF4iw
H+tMb6qMu2FQJ9X+phtplzvjZxCbBbwY205GeTm5eElOkYzIyYvqIvZasos02ze0
v3SQXtpIHjrnndXMVNRJOkhYquGxVFxUm5IJ7o+0wrgVJp1V3cBKd4vs0o84Mgu5
EPGKCyt8x/B6ujCxkunODpNOb+sFyq6aqsDLAO6JSih5HfQntpxoZTjm8p4KjsG6
nPqXQXmi2NoOd6WPOunp7w/y+fKA4YdLAhPC7rbXQwpLL81UqNH141BrtscN0ovi
pyRlJ4r3Zs75qUwVSkzL
=3/OG
-----END PGP SIGNATURE-----
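
The check that the patch in this message loosens can be sketched as a small standalone C function. This is a simplified model under stated assumptions, not the code in nfs_nfsdstate.c: the names seq_state, seq_result, seq_check and max_step are hypothetical, and the replay case here looks only at the seqid, whereas a real server would also need to confirm that the request matches the cached one. With max_step = 1 it models the strict "next seqid only" rule; with max_step = 2 it models accepting a skip-by-1, as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch; not the kernel's names. */
enum seq_result { SEQ_NEXT, SEQ_REPLAY, SEQ_BAD };

/* Hypothetical per-owner state: only the last accepted seqid is kept. */
struct seq_state {
	uint32_t last_seq;
};

/*
 * Classify an incoming seqid against the last accepted one.
 * max_step = 1 models the strict check (only last + 1 is accepted);
 * max_step = 2 models the loosened check (last + 1 or last + 2).
 * The arithmetic is on uint32_t, so it wraps modulo 2^32 in this sketch.
 */
static enum seq_result
seq_check(struct seq_state *st, uint32_t seqid, uint32_t max_step)
{
	uint32_t delta;

	assert(st != NULL);
	delta = seqid - st->last_seq;	/* distance ahead, modulo 2^32 */
	if (delta == 0)
		return (SEQ_REPLAY);	/* same seqid as the last request */
	if (delta <= max_step) {
		st->last_seq = seqid;	/* accept and advance */
		return (SEQ_NEXT);
	}
	return (SEQ_BAD);		/* would map to a bad sequence-id error */
}
```

With last_seq = 6, a request carrying seqid 8 is classified SEQ_BAD when max_step is 1 and SEQ_NEXT when max_step is 2, which is the behavioural difference the patch introduces.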