Date:      Wed, 29 Jul 2015 07:35:24 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Ahmed Kamal <email.ahmedkamal@googlemail.com>
Cc:        Graham Allan <allan@physics.umn.edu>,  Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID:  <1603742210.4824721.1438169724361.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <CANzjMX7J7wm4f8nCXqjGe7F+ZgwYFBtFtKTFOzrPJN-MeXJ5EA@mail.gmail.com>
References:  <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <55B12EB7.6030607@physics.umn.edu> <1935759160.2320694.1437688383362.JavaMail.zimbra@uoguelph.ca> <CANzjMX48F1gAVwqq64q=yALfTBNEc7iMbKAK1zi6aUfoF3WpOw@mail.gmail.com> <576106597.2326662.1437688749018.JavaMail.zimbra@uoguelph.ca> <CANzjMX5Q4TNLBxrAm6R2F6oUdfgRD8dX1LRZiniJA4M4HTN_=w@mail.gmail.com> <1089316279.4709692.1438126760802.JavaMail.zimbra@uoguelph.ca> <CANzjMX7J7wm4f8nCXqjGe7F+ZgwYFBtFtKTFOzrPJN-MeXJ5EA@mail.gmail.com>

Ahmed Kamal wrote:
> hmm, if I understand you correctly, this time_seconds value is the number
> of seconds since the box booted?
No, it is the number of seconds at the moment the box booted. Originally it was
supposed to be the number of seconds since Jan. 1, 1970, but I don't know if that
is still the case. (For my i386, it is about 1.4 billion when it boots, so I'd
guess they still use Jan. 1, 1970: 3600*24*365*45 = 1419120000. Yeah, I didn't
bother with leap years, etc.)
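
For anyone who wants to check that arithmetic, here is a quick standalone
sanity test (my own illustration, nothing from the kernel sources):

    /* If time(3) still counts from Jan. 1, 1970, then in mid-2015 it
     * should return a bit more than 45 years' worth of seconds. */
    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
        time_t now = time(NULL);
        long long secs45yrs = 3600LL * 24 * 365 * 45; /* 1419120000 */

        printf("time(NULL) = %lld\n", (long long)now);
        printf("45 years   = %lld\n", secs45yrs);
        return (0);
    }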

Now, I don't know if the clock could somehow set it to a value > 4 billion when
the nfsd starts (it copies time_seconds to nfsrvboottime as it starts up), but the
current clock code is pretty convoluted stuff, so who knows?
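
To make the suspected failure mode concrete, here is a minimal standalone
sketch (the names mirror the server code, but the program and the boot-time
value are made up for illustration):

    /* A 64-bit nfsrvboottime compared against the 32 bits echoed back
     * in a clientid: the test fails for any boot time > 32 bits. */
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t nfsrvboottime = 0x100000001ULL; /* hypothetical, > 4 billion */
        uint32_t lval0 = (uint32_t)nfsrvboottime; /* low 32 bits in clientid */

        if (nfsrvboottime != lval0) /* lval0 widens to 1; boottime is 2^32+1 */
            printf("bogus NFSERR_STALECLIENTID\n");
        return (0);
    }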

rick
ps: From the NFSv4 server's point of view, it only needs a number that is unique and
    changes every time the server reboots. As such, using the low order 32 bits of
    it would be sufficient, even if the full value exceeds 4 billion. However, without
    the patch, the code incorrectly assumes it won't exceed 4*1024*1024*1024 - 1.
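pps: The idea of the fix is just to compare the low order 32 bits on both
     sides. Roughly this (my paraphrase of the comparison, not the actual
     attached diff):

     /* Truncate the boot time to 32 bits before comparing, since the
      * value only needs to differ across server reboots. */
     if ((uint32_t)nfsrvboottime != clientid.lval[0])
          return (NFSERR_STALECLIENTID);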

> If so, I guess this is not really the
> cause of what we're seeing as the box is only up for 8 days
> 
> bsd# uptime
> 11:28AM  up 8 days,  6:20, 6 users, load averages: 0.94, 0.91, 0.84
> 
> The NFS client box's uptime is
> linux# uptime
>  11:31:39 up 8 days,  5:51, 11 users,  load average: 87.74, 87.43, 87.35
> 
> and yes the huge load is most likely due to this NFS bug
> 
> On Wed, Jul 29, 2015 at 1:39 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> 
> > Ahmed Kamal wrote:
> > > Hi again Rick,
> > >
> > > Seems that I'm still being unlucky with nfs :/ I caught one of the newly
> > > installed RHEL6 boxes having high CPU usage, and bombarding the BSD NFS box
> > > with 10Mbps traffic .. I caught a tcpdump as you mentioned .. You can
> > > download it here:
> > >
> > > https://dl.dropboxusercontent.com/u/51939288/nfs41-high-client-cpu.pcap.bz2
> > >
> > Ok, the packet trace suggests that the NFSv4 server is broken (it is
> > replying with NFS4ERR_STALE_CLIENTID for a recently generated ClientID).
> > Now, I can't be sure, but the only explanation I can come up with is...
> > - For some arches (I only have i386, so I wouldn't have seen this during
> >   testing), time_t is 64 bits (uint64_t).
> >   --> If time_seconds somehow doesn't fit in the low order 32 bits, then
> >       the code would be busted for these arches, because nfsrvboottime is
> >       set to time_seconds when the server is started and then there are
> >       comparisons like:
> >       if (nfsrvboottime != clientid.lval[0])
> >            return (NFSERR_STALECLIENTID);
> >       /* where clientid.lval[0] is a uint32_t */
> > Anyhow, if this is what is happening, the attached simple patch should
> > fix it.
> > (I don't know how time_seconds would exceed 4 billion, but the clock code
> >  is pretty convoluted, so I can't say if it can possibly happen?)
> >
> > rick
> > ps: Hmm, on i386 time_seconds ends up at 1438126486, so maybe it can
> >     exceed 4*1024*1024*1024 - 1 on amd64?
> >
> > > I didn't restart the client yet .. so if you catch me in the next few
> > > hours and want me to run any diagnostics, let me know. Thanks a lot all
> > > for helping
> > >
> > > On Thu, Jul 23, 2015 at 11:59 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > > wrote:
> > >
> > > > Ahmed Kamal wrote:
> > > > > Can you please let me know the ultimate packet trace command I'd
> > > > > need to run in case of any nfs4 troubles .. I guess this should be
> > > > > comprehensive even at the expense of a larger output size (which we
> > > > > can trim later).. Thanks a lot for the help!
> > > > >
> > > > tcpdump -s 0 -w <file>.pcap host <client-host-name>
> > > > (<file> refers to a file name you choose and <client-host-name>
> > > >  refers to the host name of a client generating traffic.)
> > > > --> But you won't be able to allow this to run for long during the
> > > >     storm or the file will be huge.
> > > >
> > > > Then you look at <file>.pcap in wireshark, which knows NFS.
> > > >
> > > > rick
> > > >
> > > > > On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > > > > wrote:
> > > > >
> > > > > > Graham Allan wrote:
> > > > > > > For our part, the user whose code triggered the pathological
> > > > > > > behaviour on SL5 reran it on SL6 without incident - I still see
> > > > > > > lots of sequence-id errors in the logs, but nothing bad happened.
> > > > > > >
> > > > > > > I'd still like to ask them to rerun again on SL5 to see if the
> > > > > > > "accept skipped seqid" patch had any effect, though I think we
> > > > > > > expect not. Maybe it would be nice if I could get set up to
> > > > > > > capture rolling tcpdumps of the nfs traffic before they run that
> > > > > > > though...
> > > > > > >
> > > > > > > Graham
> > > > > > >
> > > > > > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote:
> > > > > > > > Hi folks,
> > > > > > > >
> > > > > > > > I've upgraded a test client to rhel6 today, and I'll keep an
> > > > > > > > eye on it to see what happens.
> > > > > > > >
> > > > > > > > During the process, I made the (I guess mistake) of zfs send |
> > > > > > > > recv to a locally attached usb disk for backup purposes .. long
> > > > > > > > story short, sharenfs property on the received filesystem was
> > > > > > > > causing some nfs/mountd errors in logs .. I wasn't too happy
> > > > > > > > with what I got .. I destroyed the backup datasets and the
> > > > > > > > whole pool eventually .. and then rebooted the whole nas box ..
> > > > > > > > After reboot my logs are still flooded with
> > > > > > > >
> > > > > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > > > > > > Jul 21 05:13:07 nas last message repeated 7536 times
> > > > > > > > Jul 21 05:15:08 nas last message repeated 29664 times
> > > > > > > >
> > > > > > > > Not sure what that means .. or how it can be stopped .. Anyway,
> > > > > > > > will keep you posted on progress.
> > > > > > >
> > > > > > Oh, I didn't see the part about "reboot" before. Unfortunately, it
> > > > > > sounds like the client isn't recovering after the session is lost.
> > > > > > When the server reboots, the client(s) will get NFS4ERR_BAD_SESSION
> > > > > > errors back because the server reboot has deleted all sessions. The
> > > > > > NFS4ERR_BAD_SESSION should trigger state recovery on the client.
> > > > > > (It doesn't sound like the clients went into recovery, starting
> > > > > > with a Create_session operation, but without a packet trace, I
> > > > > > can't be sure?)
> > > > > >
> > > > > > rick
> > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > -------------------------------------------------------------------------
> > > > > > > Graham Allan - gta@umn.edu - allan@physics.umn.edu
> > > > > > > School of Physics and Astronomy - University of Minnesota
> > > > > > >
> > > > > > > -------------------------------------------------------------------------
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 


