Date:      Fri, 3 Jul 2015 01:21:00 +0200
From:      Ahmed Kamal <email.ahmedkamal@googlemail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Julian Elischer <julian@freebsd.org>, freebsd-fs@freebsd.org, Xin LI <d@delphij.net>
Subject:   Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID:  <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com>
In-Reply-To: <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv%2BOyApK2UhJg@mail.gmail.com>
References:  <CANzjMX45QaC8yZx2nHPAohJRvQjmUOHuhMQWP9nX%2BsrJs707Hg@mail.gmail.com> <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <CANzjMX7xKBvnzJhQhB_ZrUnyE2m_FJXXy4fm_RFnuZfBDyDm2A@mail.gmail.com> <55947C6E.5060409@delphij.net> <1491630362.2785531.1435799383802.JavaMail.zimbra@uoguelph.ca> <5594B008.10202@freebsd.org> <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <CANzjMX5eN1FsnHMf6KGZe_b3vwxxF=dy3fJUHxeGO4BXuNzfPA@mail.gmail.com> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv%2BOyApK2UhJg@mail.gmail.com>

PS: Today (after adjusting tcp.highwater) I didn't get any screaming
reports from users about hung VNC sessions. So maybe, just maybe, the Linux
clients are somehow able to recover from these bad-sequence errors. I
could still see the bad sequence-id error message in the logs, though.

Why isn't the highwater tunable set to something better by default? I mean,
this server is certainly not under high or unusual load (only 40 PCs
mount from it).

On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com> wrote:

> Thanks all. I understand now that we're doing the "right thing". However,
> if mounting keeps wedging, I will have to solve it somehow: either by using
> Xin's patch or by upgrading RHEL to 6.x and using NFSv4.1.
>
> Regarding Xin's patch, is it possible to build the patched nfsd code as a
> kernel module? I'm looking to minimize my delta from upstream.
>
> Also, would adopting Xin's patch and hiding it behind a
> kern.nfs.allow_linux_broken_client tunable be an option? (I'm probably not
> the last person on earth to hit this.)
>
> Thanks a lot for all the help!
>
> On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
>> Ahmed Kamal wrote:
>> > Appreciating the fruitful discussion! Can someone please explain to me
>> > what would happen in the current situation (the Linux client doing this
>> > skip-by-1 thing and FreeBSD not doing it)? What is the effect of that?
>> Well, as you've seen, the Linux client doesn't function correctly against
>> the FreeBSD server (and probably others that don't support this "skip-by-1"
>> case).
>>
>> > What do users see? Any chances of data loss?
>> Hmm. Mostly it will cause Opens to fail, but I can't guess what the Linux
>> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the guy
>> observing it.
>>
>> >
>> > Also, I find it strange that NetApp has acknowledged this is a bug on
>> > their side, which has since been fixed!
>> Yeah, I think NetApp screwed up. For some reason their server allowed this,
>> then was fixed to not allow it, and then someone decided that was broken
>> and reversed it.
>>
>> > I also find it strange that I'm the first to hit this :) Is no one
>> > running NFSv4 yet?
>> >
>> Well, it seems to be slowly catching on. I suspect that the Linux client
>> mounting a NetApp is the most common use of it. Since it appears that they
>> flip-flopped w.r.t. whose bug this is, it has probably persisted.
>>
>> It may turn out that the Linux client has been fixed or it may turn out
>> that most servers allowed this "skip-by-1" even though David Noveck (one
>> of the main authors of the protocol) seems to agree with me that it should
>> not be allowed.
>>
>> It is possible that others have bumped into this, but it wasn't isolated
>> (I wouldn't have guessed it, so it was good you pointed to the RedHat
>> discussion) and they worked around it by reverting to NFSv3 or similar.
>> The protocol is rather complex in this area and changed completely for
>> NFSv4.1, so many have also probably moved on to NFSv4.1, where this won't
>> be an issue. (NFSv4.1 uses sessions to provide exactly-once RPC semantics
>> and doesn't use these seqid fields.)
>>
>> This is all just mho, rick
>>
>> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>> >
>> > > Julian Elischer wrote:
>> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
>> > > > > I am going to post to nfsv4@ietf.org to see what they say. Please
>> > > > > let me know if Xin Li's patch resolves your problem, even though I
>> > > > > don't believe it is correct except for the UINT32_MAX case. Good
>> > > > > luck with it, rick
>> > > > and please keep us all in the loop as to what they say!
>> > > >
>> > > > the general N+2 bit sounds like bullshit to me; it's always N+1 in a
>> > > > number field that has a bit of slack at wrap time (probably due to
>> > > > some ambiguity in the original spec).
>> > > >
>> > > Actually, since N is the lock op already done, N + 1 is the next lock
>> > > operation in order. Since lock ops need to be strictly ordered, allowing
>> > > N + 2 (which means N + 2 would be done before N + 1) makes no sense.
>> > >
>> > > I think the author of the RFC meant that N + 2 or greater fails, but it
>> > > was poorly worded.
>> > >
>> > > I will pass along whatever I get from nfsv4@ietf.org. (There is an archive
>> > > of it somewhere, but I can't remember where. ;-)
>> > >
>> > > rick
>> > > _______________________________________________
>> > > freebsd-fs@freebsd.org mailing list
>> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>> > >
>> >
>>
>
>
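The NFSv4.0 seqid rule Rick describes in the thread (N is the last operation done, N + 1 is the next one in order, a repeat of N is a retransmission, and anything else fails) can be sketched as a small classifier. This is a minimal Python sketch under stated assumptions, not FreeBSD's nfsd code: it assumes seqid arithmetic is mod 2^32 (so UINT32_MAX is followed by 0), and the allow_skip_by_one flag is a hypothetical stand-in for a lenient knob like the proposed kern.nfs.allow_linux_broken_client, not an existing FreeBSD option.

```python
from enum import Enum

SEQID_MOD = 2 ** 32  # assumption: seqid arithmetic wraps mod 2^32


class SeqidVerdict(Enum):
    NEXT = "process the new request"
    REPLAY = "retransmission: resend the cached reply"
    BAD_SEQID = "reject with NFS4ERR_BAD_SEQID"


def check_seqid(last_seqid: int, request_seqid: int,
                allow_skip_by_one: bool = False) -> SeqidVerdict:
    """Classify a request's seqid against the last one processed.

    allow_skip_by_one is HYPOTHETICAL: it models a lenient server that
    also accepts N + 2 (the Linux "skip-by-1" case). It is not a real
    FreeBSD sysctl or option.
    """
    # N + 1: the next operation in strict order.
    if request_seqid == (last_seqid + 1) % SEQID_MOD:
        return SeqidVerdict.NEXT
    # N again: the client is retransmitting the last request.
    if request_seqid == last_seqid:
        return SeqidVerdict.REPLAY
    # Optional leniency for N + 2 (hypothetical knob, off by default).
    if allow_skip_by_one and request_seqid == (last_seqid + 2) % SEQID_MOD:
        return SeqidVerdict.NEXT
    # Anything else is out of order.
    return SeqidVerdict.BAD_SEQID
```

With the default strict behaviour, check_seqid(41, 43) returns SeqidVerdict.BAD_SEQID, which is the situation a skip-by-1 request runs into; only the hypothetical lenient mode accepts it.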
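For comparison, the NFSv4.1 approach Rick mentions replaces per-owner seqids with sessions. Each session slot remembers the last sequence ID it executed and the reply it sent, so a retransmission is answered from that cache instead of being executed twice. The Python below is a heavily simplified sketch of that idea, loosely following the slot rules in RFC 5661 (next ID executes, same ID replays, anything else is NFS4ERR_SEQ_MISORDERED); SessionSlotTable, Slot and handle() are hypothetical names, not any real server's API, and many protocol details (slot table sizing, uncached replies, error codes beyond the one shown) are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

SEQ_MOD = 2 ** 32  # assumption: slot sequence IDs wrap mod 2^32


@dataclass
class Slot:
    seqid: int = 0                      # last sequence ID executed on this slot
    cached_reply: Optional[str] = None  # reply kept for retransmissions


class SessionSlotTable:
    """HYPOTHETICAL, simplified NFSv4.1-style slot table."""

    def __init__(self, nslots: int) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(nslots)]

    def handle(self, slot_id: int, seqid: int,
               execute: Callable[[], str]) -> str:
        slot = self.slots[slot_id]
        if seqid == (slot.seqid + 1) % SEQ_MOD:
            # New request: run it exactly once and cache the reply.
            reply = execute()
            slot.seqid = seqid
            slot.cached_reply = reply
            return reply
        if seqid == slot.seqid and slot.cached_reply is not None:
            # Retransmission: answer from the cache, do not re-execute.
            return slot.cached_reply
        # Anything else is out of order for this slot.
        return "NFS4ERR_SEQ_MISORDERED"
```

Because ordering is tracked per slot rather than per open-owner, a client can have several requests in flight at once, and a retried request is never executed a second time.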


