Date:      Wed, 01 Jul 2015 16:49:02 -0700
From:      Xin Li <delphij@delphij.net>
To:        Ahmed Kamal <email.ahmedkamal@googlemail.com>,  Rick Macklem <rmacklem@uoguelph.ca>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID:  <55947C6E.5060409@delphij.net>
In-Reply-To: <CANzjMX7xKBvnzJhQhB_ZrUnyE2m_FJXXy4fm_RFnuZfBDyDm2A@mail.gmail.com>
References:  <CANzjMX45QaC8yZx2nHPAohJRvQjmUOHuhMQWP9nX%2BsrJs707Hg@mail.gmail.com> <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <CANzjMX7xKBvnzJhQhB_ZrUnyE2m_FJXXy4fm_RFnuZfBDyDm2A@mail.gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 07/01/15 16:44, Ahmed Kamal via freebsd-fs wrote:
> The not so great news is, after updating sysctl and rebooting the
> nas box, I still saw a few (NFS: v4 server nas  returned a bad
> sequence-id error!) lines in logs. Users have already left, so I
> don't know how bad it is.
>
> Could you share more info on what this error means? RedHat seems to
> think the client can skip-by-1 and choose larger IDs and that would
> be totally fine? Also, how serious is this error? Would it cause an
> NFS session stall like that?

I wonder if this patch, which loosens the check, would help:

Index: sys/fs/nfsserver/nfs_nfsdstate.c
===================================================================
--- sys/fs/nfsserver/nfs_nfsdstate.c	(revision 285016)
+++ sys/fs/nfsserver/nfs_nfsdstate.c	(working copy)
@@ -3805,7 +3805,8 @@ nfsrv_checkseqid(struct nfsrv_descript *nd, u_int3
 		printf("refcnt=%d\n", stp->ls_op->rc_refcnt);
 		panic("nfsrvstate op refcnt");
 	}
-	if ((stp->ls_seq + 1) == seqid) {
+	if ((stp->ls_seq + 1) == seqid ||
+	    (stp->ls_seq + 2) == seqid) {
 		if (stp->ls_op)
 			nfsrvd_derefcache(stp->ls_op);
 		stp->ls_op = op;


Personally, I don't quite buy the argument that skip-by-1 is okay, but
it seems that the RFC text can be interpreted that way.
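
For illustration only, here is a minimal standalone sketch of the
comparison before and after the patch. It is not the kernel code: the
function names seqid_ok_strict() and seqid_ok_loose() are hypothetical,
and it assumes the seqid is a 32-bit unsigned value that wraps around,
so the additions are done in uint32_t arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the check before the patch:
 * only the seqid immediately following the last one is accepted.
 */
bool
seqid_ok_strict(uint32_t last, uint32_t seqid)
{
	return ((uint32_t)(last + 1u) == seqid);
}

/*
 * Hypothetical helper mirroring the check after the patch:
 * a seqid that skipped exactly one value is accepted as well.
 */
bool
seqid_ok_loose(uint32_t last, uint32_t seqid)
{
	return ((uint32_t)(last + 1u) == seqid ||
	    (uint32_t)(last + 2u) == seqid);
}

/* Small self-check showing where the two variants differ. */
void
seqid_demo(void)
{
	assert(seqid_ok_strict(41, 42));	/* next seqid: both accept */
	assert(seqid_ok_loose(41, 42));
	assert(!seqid_ok_strict(41, 43));	/* skip-by-1: strict rejects */
	assert(seqid_ok_loose(41, 43));		/* skip-by-1: loose accepts */
	assert(!seqid_ok_loose(41, 44));	/* larger gaps still rejected */
	assert(seqid_ok_loose(UINT32_MAX, 0));	/* wraparound is handled */
}
```

With the loosened variant, a client that skipped one seqid would no
longer get a bad sequence-id error, while any larger gap would still
be rejected.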

Cheers,


> On Thu, Jul 2, 2015 at 1:36 AM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
> 
>> Ahmed Kamal wrote:
>>> Hi all,
>>>
>>> I'm a refugee from linux land. I just set up my first freebsd
>>> 10.1 zfs box, sharing /home over nfs. Since every home directory
>>> is its own zfs dataset, I chose to use nfsv4 to enable recursively
>>> sharing/mounting any directory under /home (I understand nfs4 is a
>>> must in this scenario!)
>>>
>>> I'm able to mount from linux (rhel5 latest kernel) successfully.
>>> Users are working fine. However every now and then a user screams
>>> that his session is frozen. Usually the processes are stuck in
>>> nfs_wait or rpc_* state. I tried using a much newer linux kernel
>>> (3.2), however it still faced the same problem. The errors in
>>> Linux log files are mostly:
>>>
>>> Jul  1 17:41:47 mammoth kernel: NFS: v4 server nas  returned a *bad sequence-id error*!
>>> Jul  1 17:52:32 mammoth kernel: nfs4_reclaim_locks: unhandled error -11. Zeroing state
>>> Jul  1 17:52:32 mammoth kernel: nfs4_reclaim_open_state: Lock reclaim failed!
>>>
>> Btw, a client should only do "reclaim" operations after the
>> server has replied with NFS4ERR_STALE_CLIENTID or
>> NFS4ERR_STALE_STATEID. I am pretty certain that the FreeBSD NFSv4
>> server only generates these replies after it has rebooted, so
>> assuming the server didn't reboot, I have no idea why the client
>> would attempt these and am not surprised they failed.
>> 
>> I'm guessing that the DRC constipation somehow caused the Linux
>> client to go into recovery mode?
>> 
>> rick
>> 
>>> My search led me to
>>> (https://access.redhat.com/solutions/1328073) a detailed
>>> analysis of the issue, which you can read over here 
>>> https://dl.dropboxusercontent.com/u/51939288/nfs4-bad-seq.pdf
>>> .. NetApp confirmed this was a bug for them (I'm wondering if
>>> this is still in FreeBSD?!)
>>> 
>>> PS: Right before sending this, I saw dmesg on the freebsd box
>>> advising increasing vfs.nfsd.tcphighwater .. So I up'ed that to
>>> 64000. I also up'ed the number of nfs server threads (-t) from 10
>>> to 60 (we're roughly 40 linux machines)
>>> 
>>> Any advice is most appreciated!
>>> 
>>> Thanks
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>> 
>> 


- -- 
Xin LI <delphij@delphij.net>    https://www.delphij.net/
FreeBSD - The Power to Serve!           Live free or die
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.1.5 (FreeBSD)

iQIcBAEBCgAGBQJVlHxuAAoJEJW2GBstM+nskvsP/ire8QyTfL6mF1njMNZwI/k5
AQ+BwWs5r8LzcRN/4v7/gelbS+lXnYVbVHMl8q6j+HzUzQ3yId4ZGlJWpJtHDNnj
+gV8kmFt/og1QTrQRbN81i4GEr914SlKWmo7LsxrWmEhAiKsN0sYsjELD/mH5BZX
1wRe3vTvyrMwm+6u1krqT8ZrxRANBFBmNqiFb8sag7B3oJQZsGhAyUSsJvUhb00o
ozwC2NT5y8Jv0QcZdC/wGeYc8FmRNQTAjE22WkzbsUey/e7FxL7vflCGgngYCIxE
zbZNW65xThZO8fti5MxiepJ27VPa5ocX0CQihBFYp5haG6fzWBGalV/ggAOwYL44
nz1caLhdKIj9JSd8QwLdTArq8+6H8Sx4jp4iGzQnppNo8PtG/AlHlw9uDKaUF4iw
H+tMb6qMu2FQJ9X+phtplzvjZxCbBbwY205GeTm5eElOkYzIyYvqIvZasos02ze0
v3SQXtpIHjrnndXMVNRJOkhYquGxVFxUm5IJ7o+0wrgVJp1V3cBKd4vs0o84Mgu5
EPGKCyt8x/B6ujCxkunODpNOb+sFyq6aqsDLAO6JSih5HfQntpxoZTjm8p4KjsG6
nPqXQXmi2NoOd6WPOunp7w/y+fKA4YdLAhPC7rbXQwpLL81UqNH141BrtscN0ovi
pyRlJ4r3Zs75qUwVSkzL
=3/OG
-----END PGP SIGNATURE-----


