Date:      Tue, 21 Jul 2015 05:52:19 +0200
From:      Ahmed Kamal <email.ahmedkamal@googlemail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Graham Allan <allan@physics.umn.edu>,  Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID:  <CANzjMX6e4dZF_pzvujwXB6u8scfzh6Z1nQ8OPLYUmc28hbZvkg@mail.gmail.com>
In-Reply-To: <CANzjMX4NmxBErtEu=e5yEGJ6gAJBF4_ar_aPdNDO2-tUcePqTQ@mail.gmail.com>
References:  <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <5594B008.10202@freebsd.org> <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <CANzjMX5eN1FsnHMf6KGZe_b3vwxxF=dy3fJUHxeGO4BXuNzfPA@mail.gmail.com> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv%2BOyApK2UhJg@mail.gmail.com> <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> <CANzjMX4NmxBErtEu=e5yEGJ6gAJBF4_ar_aPdNDO2-tUcePqTQ@mail.gmail.com>

More info .. I just noticed nfsd is spinning the CPU at 500% :( I just ran
dtrace with:

dtrace -n 'profile-1001 { @[stack()] = count(); }'
The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)

Since rebooting the nfs server didn't fix it .. I imagine I'd have to
reboot all NFS clients .. This would be really sad .. Any advice is most
appreciated .. Thanks


On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <
email.ahmedkamal@googlemail.com> wrote:

> Hi folks,
>
> I've upgraded a test client to rhel6 today, and I'll keep an eye on it to
> see what happens.
>
> During the process, I made the mistake (I guess) of doing a zfs send | recv
> to a locally attached USB disk for backup purposes .. long story short, the
> sharenfs property on the received filesystem was causing some nfs/mountd
> errors in the logs .. I wasn't too happy with what I got .. I destroyed the
> backup datasets and eventually the whole pool .. and then rebooted the whole
> NAS box .. After the reboot my logs are still flooded with
>
> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> Jul 21 05:13:07 nas last message repeated 7536 times
> Jul 21 05:15:08 nas last message repeated 29664 times
>
> Not sure what that means .. or how it can be stopped .. Anyway, will keep
> you posted on progress.
>
> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
>
>> Graham Allan wrote:
>> > I'm curious how things are going for you with this?
>> >
>> > Reading your thread did pique my interest since we have a lot of
>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant
>> > to glance through our logs for signs of the same issue, but today I
>> > started investigating a machine which appeared to have hung processes,
>> > high rpciod load, and high traffic to the NFS server. Of course it is
>> > exactly this issue.
>> >
>> > The affected machine is running SL5 though most of our server nodes are
>> > now SL6. I can see errors from most of them but the SL6 systems appear
>> > less affected - I see a stream of the sequence-id errors in their logs but
>> > things in general keep working. The one SL5 machine I'm looking at
>> > has a single sequence-id error in today's logs, but then goes into a
>> > stream of "state recovery failed" then "Lock reclaim failed". It's
>> > probably partly related to the particular workload on this machine.
>> >
>> > I would try switching our SL6 machines to NFS 4.1 to see if the
>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in
>> > 10.1?).
>> >
>> Btw, I've done some testing against a fairly recent Fedora and haven't seen
>> the problem. If either of you guys could load a recent Fedora on a test client
>> box, it would be interesting to see if it suffers from this. (My experience is
>> that the Fedora distros have more up to date Linux NFS clients.)
>>
>> rick
>>
>> > At the NFS servers, most of the sysctl settings are already tuned
>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300,
>> > 128-256 nfs kernel threads.
>> >
>> > Graham
>> >
>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs wrote:
>> > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
>> > > reports from users about hung vnc sessions. So maybe just maybe, linux
>> > > clients are able to somehow recover from these bad sequence messages. I
>> > > could still see the bad sequence error message in logs though
>> > >
>> > > Why isn't the highwater tunable set to something better by default ? I mean
>> > > this server is certainly not under a high or unusual load (it's only 40 PCs
>> > > mounting from it)
>> > >
>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal
>> > > <email.ahmedkamal@googlemail.com> wrote:
>> > >
>> > > > Thanks all .. I understand now we're doing the "right thing" .. Although
>> > > > if mounting keeps wedging, I will have to solve it somehow! Either using
>> > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1.
>> > > >
>> > > > Regarding Xin's patch, is it possible to build the patched nfsd code, as a
>> > > > kernel module ? I'm looking to minimize my delta to upstream.
>> > > >
>> > > > Also would adopting Xin's patch and hiding it behind a
>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the last
>> > > > person on earth to hit this) ?
>> > > >
>> > > > Thanks a lot for all the help!
>> > > >
>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
>> > > > wrote:
>> > > >
>> > > >> Ahmed Kamal wrote:
>> > > >> > Appreciating the fruitful discussion! Can someone please explain to me,
>> > > >> > what would happen in the current situation (linux client doing this
>> > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect of that?
>> > > >> Well, as you've seen, the Linux client doesn't function correctly against
>> > > >> the FreeBSD server (and probably others that don't support this
>> > > >> "skip-by-1" case).
>> > > >>
>> > > >> > What do users see? Any chances of data loss?
>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the Linux
>> > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the guy
>> > > >> observing it.
>> > > >>
>> > > >> >
>> > > >> > Also, I find it strange that netapp have acknowledged this is a bug on
>> > > >> > their side, which has been fixed since then!
>> > > >> Yea, I think Netapp screwed up. For some reason their server allowed this,
>> > > >> then was fixed to not allow it and then someone decided that was broken and
>> > > >> reversed it.
>> > > >>
>> > > >> > I also find it strange that I'm the first to hit this :) Is no one running
>> > > >> > nfs4 yet!
>> > > >> >
>> > > >> Well, it seems to be slowly catching on. I suspect that the Linux client
>> > > >> mounting a Netapp is the most common use of it. Since it appears that they
>> > > >> flip flopped w.r.t. whose bug this is, it has probably persisted.
>> > > >>
>> > > >> It may turn out that the Linux client has been fixed or it may turn out
>> > > >> that most servers allowed this "skip-by-1" even though David Noveck (one
>> > > >> of the main authors of the protocol) seems to agree with me that it should
>> > > >> not be allowed.
>> > > >>
>> > > >> It is possible that others have bumped into this, but it wasn't isolated
>> > > >> (I wouldn't have guessed it, so it was good you pointed to the RedHat
>> > > >> discussion) and they worked around it by reverting to NFSv3 or similar.
>> > > >> The protocol is rather complex in this area and changed completely for
>> > > >> NFSv4.1, so many have also probably moved onto NFSv4.1 where this won't be
>> > > >> an issue. (NFSv4.1 uses sessions to provide exactly once RPC semantics and
>> > > >> doesn't use these seqid fields.)
>> > > >>
>> > > >> This is all just mho, rick
>> > > >>
>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>> > > >> >
>> > > >> > > Julian Elischer wrote:
>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say. Please
>> > > >> > > > > let me know if Xin Li's patch resolves your problem, even though I
>> > > >> > > > > don't believe it is correct except for the UINT32_MAX case. Good
>> > > >> > > > > luck with it, rick
>> > > >> > > > and please keep us all in the loop as to what they say!
>> > > >> > > >
>> > > >> > > > the general N+2 bit sounds like bullshit to me.. it's always N+1 in a
>> > > >> > > > number field that has a bit of slack at wrap time (probably due to
>> > > >> > > > some ambiguity in the original spec).
>> > > >> > > >
>> > > >> > > Actually, since N is the lock op already done, N + 1 is the next lock
>> > > >> > > operation in order. Since lock ops need to be strictly ordered, allowing
>> > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no sense.
>> > > >> > >
>> > > >> > > I think the author of the RFC meant that N + 2 or greater fails, but it
>> > > >> > > was poorly worded.
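
For readers following the N / N + 1 / N + 2 argument above, here is a rough sketch of the strict-ordering check being described. It is only an illustration, not the FreeBSD nfsd code: the names check_seqid and SeqidResult are made up for this example, and the plain 32-bit wraparound is an assumption (the wrap case is exactly where the thread says the spec is ambiguous).

```python
from enum import Enum

SEQID_MODULUS = 2 ** 32  # seqid is an unsigned 32-bit counter


class SeqidResult(Enum):
    NEXT = "next"            # in-order request: the server should execute it
    REPLAY = "replay"        # retransmission of the last request: resend cached reply
    BAD_SEQID = "bad_seqid"  # anything else: answer with NFS4ERR_BAD_SEQID


def check_seqid(last_seqid: int, request_seqid: int) -> SeqidResult:
    """Classify a request seqid against the last seqid the server processed (N)."""
    # Assumption: N + 1 wraps with ordinary 32-bit arithmetic.
    expected = (last_seqid + 1) % SEQID_MODULUS
    if request_seqid == expected:
        return SeqidResult.NEXT
    if request_seqid == last_seqid:
        return SeqidResult.REPLAY
    # N + 2 or greater (the "skip-by-1" case) lands here.
    return SeqidResult.BAD_SEQID


if __name__ == "__main__":
    for seqid in (5, 6, 7):
        print(seqid, check_seqid(5, seqid).name)
```

With N = 5, a request carrying 6 is accepted, 5 is treated as a retransmission, and 7 (the skip-by-1 the Linux client is reported to send) is rejected, which matches the behaviour discussed in this thread.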
>> > > >> > >
>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There is an archive
>> > > >> > > of it somewhere, but I can't remember where.;-)
>> > > >> > >
>> > > >> > > rick
>> > > >> > > _______________________________________________
>> > > >> > > freebsd-fs@freebsd.org mailing list
>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> > > >> > > To unsubscribe, send any mail to
>> > > >> > > "freebsd-fs-unsubscribe@freebsd.org"
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> >
>> > --
>> > -------------------------------------------------------------------------
>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040
>> > School of Physics and Astronomy - University of Minnesota
>> > -------------------------------------------------------------------------
>> >
>>
>
>


