From owner-freebsd-fs@freebsd.org  Thu Jul 23 17:55:20 2015
MIME-Version: 1.0
X-Received: by 10.107.41.146 with SMTP id p140mr15286418iop.58.1437669439640; 
 Thu, 23 Jul 2015 09:37:19 -0700 (PDT)
Received: by 10.65.15.33 with HTTP; Thu, 23 Jul 2015 09:37:19 -0700 (PDT)
In-Reply-To: <CANzjMX6z39CSZgo7ag+MhWqF-m2=nC85XqwunBXMYDRYG92qPw@mail.gmail.com>
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv+OyApK2UhJg@mail.gmail.com>
 <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com>
 <20150716235022.GF32479@physics.umn.edu>
 <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX4NmxBErtEu=e5yEGJ6gAJBF4_ar_aPdNDO2-tUcePqTQ@mail.gmail.com>
 <CANzjMX6e4dZF_pzvujwXB6u8scfzh6Z1nQ8OPLYUmc28hbZvkg@mail.gmail.com>
 <CANzjMX5RA4eSQ8sk1n5hG0AaeThDJqw4x7iJu6kQEV_3+QAXpQ@mail.gmail.com>
 <1474771205.1788105.1437648059578.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX6z39CSZgo7ag+MhWqF-m2=nC85XqwunBXMYDRYG92qPw@mail.gmail.com>
Date: Thu, 23 Jul 2015 09:37:19 -0700
Message-ID: <CAOgwaMsF+O+0ObAR6L52kjB6vucaAM9aaHHmjUazQ+74XKES9w@mail.gmail.com>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
From: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Cc: Rick Macklem <rmacklem@uoguelph.ca>, 
 Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Jul 2015 17:55:21 -0000

On Thu, Jul 23, 2015 at 8:26 AM, Ahmed Kamal via freebsd-fs <
freebsd-fs@freebsd.org> wrote:

> Well .. The problem is now gone, so I guess I can't collect more data till
> it happens (or hopefully doesn't :) happen again .. So as I described, I
> had to restart the FreeBSD NFS server box first .. maybe this caused linux
> clients to give up after 5 mins, and attempt to destroy the session ? When
> the NFS server was back up .. It was being bombarded (50Mbps traffic) with
> rpc traffic, probably saying this "destroy session" message.
>
> What I don't understand however is, why doesn't this end. What does FreeBSD
> reply with? Shouldn't it say, Okay, I don't know anything about this
> session, so consider it destroyed .. suit yourself linux .. or does it
> refuse to destroy, causing Linux to keep on retrying like crazy ?
>
>

My opinion is that there is a problem in the latest Linux NFS client: it
takes too long to communicate with the Linux server. For that reason, I
have switched back to Fedora 18 as the client. With the newer client, the
"fighting" with the server is visible from the switch lights, and a
response arrives only after a long burst of activity. That much activity
to get one response makes no sense.

The server is Fedora 19.

Mehmet Erol Sanliturk


> On Thu, Jul 23, 2015 at 12:40 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
>
> > Ahmed Kamal wrote:
> > > rhel6 servers logs were flooded with errors like:
> > > http://paste2.org/EwLGcGF6
> > > The Freebsd box was being pounded with 40Mbps of nfs traffic ..
> > > probably Linux was retrying too hard ?! I had to reboot all PCs and
> > > after the last one, nfsd CPU usage dropped immediately to zero
> > >
> > Btw, it would be interesting to know what triggers these things
> > (overload of the nfs server resulting in very slow response or ???).
> > Basically Destroy_session isn't an operation that a client would
> > normally do. I have no idea why the Linux client would do it. (A
> > session is what achieves the "exactly once" semantics for the RPCs. It
> > should really be in the RPC layer, but the NFSv4 working group put it
> > in NFSv4.1 because they didn't want to replace Sun RPC. I can't think
> > of a reason to destroy a session except on dismount. Maybe if the
> > client thinks the session is broken for some reason??)
> >
> > Maybe something like "vmstat -m", "vmstat -z" and "nfsstat -s -e"
> > running repeatedly (once/sec with timestamps via "date" or similar) so
> > that you can see what was happening just before the meltdowns.
> >
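
The once-per-second sampling suggested above could be scripted roughly as
follows. This is only a sketch, assuming Python 3.7 or later is installed
on the FreeBSD server. The log file name and the function names are made
up for illustration; only the three commands come from the thread.

```python
import subprocess
import time
from datetime import datetime

# The three commands named in the thread; everything else is an assumption.
COMMANDS = [
    ["vmstat", "-m"],
    ["vmstat", "-z"],
    ["nfsstat", "-s", "-e"],
]


def format_sample(timestamp, command, output):
    """Return one log record: a timestamped header line, then the output."""
    header = "=== {} {} ===".format(timestamp.isoformat(), " ".join(command))
    return header + "\n" + output.rstrip("\n") + "\n"


def sample_once(log, run=subprocess.run):
    """Run every command once and append timestamped records to `log`."""
    for command in COMMANDS:
        result = run(command, capture_output=True, text=True)
        log.write(format_sample(datetime.now(), command, result.stdout))
    log.flush()


def main(path="nfs-meltdown.log", interval=1.0):
    """Sample until interrupted; `path` is a hypothetical log file name."""
    with open(path, "a") as log:
        while True:
            sample_once(log)
            time.sleep(interval)
```

Calling main() starts the loop, and Ctrl-C stops it. The log grows without
bound, so it would need to be trimmed or rotated on a long run.
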
> > A raw packet trace of just when the meltdown starts would be useful,
> > but I can't think of how you'd get one of reasonable size. Maybe having
> > "tcpdump -s 0 -w <file>.pcap <client-host>" run for 1sec and then
> > kill/restart it repeatedly with different file names, so you might get
> > a useful 1sec capture at the critical time?
> >
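
The kill/restart idea for one-second captures might look like the sketch
below, assuming Python 3 and tcpdump on the server. The file name pattern
and the function names are invented here, and nothing in it deletes old
capture files.

```python
import subprocess
import time


def capture_command(index, client_host, prefix="nfs"):
    """Build the tcpdump argument list for one capture slice."""
    filename = "{}-{:06d}.pcap".format(prefix, index)
    # The thread passes the bare host name; the explicit "host" keyword
    # is an addition here and selects the same traffic.
    return ["tcpdump", "-s", "0", "-w", filename, "host", client_host]


def rotate_captures(client_host, slices, seconds=1.0):
    """Run tcpdump for `seconds`, stop it, then restart with a new file."""
    for index in range(slices):
        proc = subprocess.Popen(capture_command(index, client_host))
        time.sleep(seconds)
        proc.terminate()  # SIGTERM; tcpdump normally closes the file cleanly
        proc.wait()
```

For example, rotate_captures("client1", 600) would cover roughly ten
minutes. tcpdump also has its own rotation options (-G, -C, -W), which
might replace the loop if the installed build supports them.
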
> > Anyhow, good luck with it, rick
> >
> > > On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <
> > > email.ahmedkamal@googlemail.com> wrote:
> > >
> > > > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just
> > > > did the dtrace with:
> > > >
> > > > dtrace -n 'profile-1001 { @[stack()] = count(); }'
> > > > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)
> > > >
> > > > Since rebooting the nfs server didn't fix it .. I imagine I'd have to
> > > > reboot all NFS clients .. This would be really sad .. Any advice is
> > > > most appreciated .. Thanks
> > > >
> > > >
> > > > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <
> > > > email.ahmedkamal@googlemail.com> wrote:
> > > >
> > > >> Hi folks,
> > > >>
> > > >> I've upgraded a test client to rhel6 today, and I'll keep an eye on
> > > >> it to see what happens.
> > > >>
> > > >> During the process, I made the (I guess mistake) of zfs send | recv
> > > >> to a locally attached usb disk for backup purposes .. long story
> > > >> short, sharenfs property on the received filesystem was causing some
> > > >> nfs/mountd errors in logs .. I wasn't too happy with what I got .. I
> > > >> destroyed the backup datasets and the whole pool eventually .. and
> > > >> then rebooted the whole nas box .. After reboot my logs are still
> > > >> flooded with
> > > >>
> > > >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > >> Jul 21 05:13:07 nas last message repeated 7536 times
> > > >> Jul 21 05:15:08 nas last message repeated 29664 times
> > > >>
> > > >> Not sure what that means .. or how it can be stopped .. Anyway, will
> > > >> keep you posted on progress.
> > > >>
> > > >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > > >> wrote:
> > > >>
> > > >>> Graham Allan wrote:
> > > >>> > I'm curious how things are going for you with this?
> > > >>> >
> > > >>> > Reading your thread did pique my interest since we have a lot of
> > > >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I
> > > >>> > meant to glance through our logs for signs of the same issue, but
> > > >>> > today I started investigating a machine which appeared to have hung
> > > >>> > processes, high rpciod load, and high traffic to the NFS server. Of
> > > >>> > course it is exactly this issue.
> > > >>> >
> > > >>> > The affected machine is running SL5 though most of our server nodes
> > > >>> > are now SL6. I can see errors from most of them but the SL6 systems
> > > >>> > appear less affected - I see a stream of the sequence-id errors in
> > > >>> > their logs but things in general keep working. The one SL5 machine
> > > >>> > I'm looking at has a single sequence-id error in today's logs, but
> > > >>> > then goes into a stream of "state recovery failed" then "Lock
> > > >>> > reclaim failed". It's probably partly related to the particular
> > > >>> > workload on this machine.
> > > >>> >
> > > >>> > I would try switching our SL6 machines to NFS 4.1 to see if the
> > > >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is
> > > >>> > it in 10.1?).
> > > >>> >
> > > >>> Btw, I've done some testing against a fairly recent Fedora and
> > > >>> haven't seen the problem. If either of you guys could load a recent
> > > >>> Fedora on a test client box, it would be interesting to see if it
> > > >>> suffers from this. (My experience is that the Fedora distros have
> > > >>> more up to date Linux NFS clients.)
> > > >>>
> > > >>> rick
> > > >>>
> > > >>> > At the NFS servers, most of the sysctl settings are already tuned
> > > >>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300,
> > > >>> > 128-256 nfs kernel threads.
> > > >>> >
> > > >>> > Graham
> > > >>> >
> > > >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via
> > > >>> > freebsd-fs wrote:
> > > >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any
> > > >>> > > screaming reports from users about hung vnc sessions. So maybe
> > > >>> > > just maybe, linux clients are able to somehow recover from this
> > > >>> > > bad sequence messages. I could still see the bad sequence error
> > > >>> > > message in logs though
> > > >>> > >
> > > >>> > > Why isn't the highwater tunable set to something better by
> > > >>> > > default ? I mean this server is certainly not under a high or
> > > >>> > > unusual load (it's only 40 PCs mounting from it)
> > > >>> > >
> > > >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal
> > > >>> > > <email.ahmedkamal@googlemail.com> wrote:
> > > >>> > >
> > > >>> > > > Thanks all .. I understand now we're doing the "right thing" ..
> > > >>> > > > Although if mounting keeps wedging, I will have to solve it
> > > >>> > > > somehow! Either using Xin's patch .. or Upgrading RHEL to 6.x
> > > >>> > > > and using NFS4.1.
> > > >>> > > >
> > > >>> > > > Regarding Xin's patch, is it possible to build the patched nfsd
> > > >>> > > > code, as a kernel module ? I'm looking to minimize my delta to
> > > >>> > > > upstream.
> > > >>> > > >
> > > >>> > > > Also would adopting Xin's patch and hiding it behind a
> > > >>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably
> > > >>> > > > not the last person on earth to hit this) ?
> > > >>> > > >
> > > >>> > > > Thanks a lot for all the help!
> > > >>> > > >
> > > >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem
> > > >>> > > > <rmacklem@uoguelph.ca> wrote:
> > > >>> > > >
> > > >>> > > >> Ahmed Kamal wrote:
> > > >>> > > >> > Appreciating the fruitful discussion! Can someone please
> > > >>> > > >> > explain to me, what would happen in the current situation
> > > >>> > > >> > (linux client doing this skip-by-1 thing, and freebsd not
> > > >>> > > >> > doing it) ? What is the effect of that?
> > > >>> > > >> Well, as you've seen, the Linux client doesn't function
> > > >>> > > >> correctly against the FreeBSD server (and probably others that
> > > >>> > > >> don't support this "skip-by-1" case).
> > > >>> > > >>
> > > >>> > > >> > What do users see? Any chances of data loss?
> > > >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what
> > > >>> > > >> the Linux client behaviour is after receiving
> > > >>> > > >> NFS4ERR_BAD_SEQID. You're the guy observing it.
> > > >>> > > >>
> > > >>> > > >> >
> > > >>> > > >> > Also, I find it strange that netapp have acknowledged this is
> > > >>> > > >> > a bug on their side, which has been fixed since then!
> > > >>> > > >> Yea, I think Netapp screwed up. For some reason their server
> > > >>> > > >> allowed this, then was fixed to not allow it and then someone
> > > >>> > > >> decided that was broken and reversed it.
> > > >>> > > >>
> > > >>> > > >> > I also find it strange that I'm the first to hit this :) Is
> > > >>> > > >> > no one running nfs4 yet!
> > > >>> > > >> >
> > > >>> > > >> Well, it seems to be slowly catching on. I suspect that the
> > > >>> > > >> Linux client mounting a Netapp is the most common use of it.
> > > >>> > > >> Since it appears that they flip flopped w.r.t. who's bug this
> > > >>> > > >> is, it has probably persisted.
> > > >>> > > >>
> > > >>> > > >> It may turn out that the Linux client has been fixed or it may
> > > >>> > > >> turn out that most servers allowed this "skip-by-1" even though
> > > >>> > > >> David Noveck (one of the main authors of the protocol) seems to
> > > >>> > > >> agree with me that it should not be allowed.
> > > >>> > > >>
> > > >>> > > >> It is possible that others have bumped into this, but it wasn't
> > > >>> > > >> isolated (I wouldn't have guessed it, so it was good you
> > > >>> > > >> pointed to the RedHat discussion) and they worked around it by
> > > >>> > > >> reverting to NFSv3 or similar.
> > > >>> > > >> The protocol is rather complex in this area and changed
> > > >>> > > >> completely for NFSv4.1, so many have also probably moved onto
> > > >>> > > >> NFSv4.1 where this won't be an issue.
> > > >>> > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics
> > > >>> > > >> and doesn't use these seqid fields.)
> > > >>> > > >>
> > > >>> > > >> This is all just mho, rick
> > > >>> > > >>
> > > >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem
> > > >>> > > >> > <rmacklem@uoguelph.ca> wrote:
> > > >>> > > >> >
> > > >>> > > >> > > Julian Elischer wrote:
> > > >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> > > >>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they
> > > >>> > > >> > > > > say. Please let me know if Xin Li's patch resolves your
> > > >>> > > >> > > > > problem, even though I don't believe it is correct
> > > >>> > > >> > > > > except for the UINT32_MAX case. Good luck with it, rick
> > > >>> > > >> > > > and please keep us all in the loop as to what they say!
> > > >>> > > >> > > >
> > > >>> > > >> > > > the general N+2 bit sounds like bullshit to me.. its
> > > >>> > > >> > > > always N+1 in a number field that has a bit of slack at
> > > >>> > > >> > > > wrap time (probably due to some ambiguity in the original
> > > >>> > > >> > > > spec).
> > > >>> > > >> > > >
> > > >>> > > >> > > Actually, since N is the lock op already done, N + 1 is the
> > > >>> > > >> > > next lock operation in order. Since lock ops need to be
> > > >>> > > >> > > strictly ordered, allowing N + 2 (which means N + 2 would
> > > >>> > > >> > > be done before N + 1) makes no sense.
> > > >>> > > >> > >
> > > >>> > > >> > > I think the author of the RFC meant that N + 2 or greater
> > > >>> > > >> > > fails, but it was poorly worded.
> > > >>> > > >> > >
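
The strict N + 1 ordering described above can be written as a small check.
This is a sketch, not the FreeBSD nfsd code. It assumes plain 32-bit
wraparound for the counter, and it treats a repeat of N as a
retransmission, which is how I understand NFSv4.0 to work. The function
name and the returned labels are hypothetical.

```python
UINT32_MASK = 0xFFFFFFFF


def classify_seqid(last_seqid, request_seqid):
    """Classify a request seqid against N, the last seqid the server processed."""
    if request_seqid == last_seqid:
        return "replay"      # same seqid again: retransmission of the last op
    if request_seqid == (last_seqid + 1) & UINT32_MASK:
        return "next"        # N + 1: the only new operation accepted
    return "bad_seqid"       # anything else, including N + 2 ("skip-by-1")
```

With N = 5, a request carrying 6 is accepted, 5 is a replay, and 7 would
get NFS4ERR_BAD_SEQID under this strict reading.
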
> > > >>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org.
> > > >>> > > >> > > (There is an archive of it somewhere, but I can't remember
> > > >>> > > >> > > where.;-)
> > > >>> > > >> > >
> > > >>> > > >> > > rick
> > > >>> > > >> > > _______________________________________________
> > > >>> > > >> > > freebsd-fs@freebsd.org mailing list
> > > >>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > >>> > > >> > > To unsubscribe, send any mail to
> > > >>> > > >> > > "freebsd-fs-unsubscribe@freebsd.org"
> > > >>> > > >> > >
> > > >>> > > >> >
> > > >>> > > >>
> > > >>> > > >
> > > >>> > > >
> > > >>> >
> > > >>> > --
> > > >>> > -------------------------------------------------------------------------
> > > >>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040
> > > >>> > School of Physics and Astronomy - University of Minnesota
> > > >>> > -------------------------------------------------------------------------