Date:      Wed, 29 Jul 2015 18:47:42 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Ahmed Kamal <email.ahmedkamal@googlemail.com>
Cc:        Graham Allan <allan@physics.umn.edu>,  Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID:  <399737523.5288680.1438210062715.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <CANzjMX5N_NiQpnTMXSbaDk4e1XFm4gA8h_UgEv+C9VdPtMMTew@mail.gmail.com>
References:  <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <CANzjMX48F1gAVwqq64q=yALfTBNEc7iMbKAK1zi6aUfoF3WpOw@mail.gmail.com> <576106597.2326662.1437688749018.JavaMail.zimbra@uoguelph.ca> <CANzjMX5Q4TNLBxrAm6R2F6oUdfgRD8dX1LRZiniJA4M4HTN_=w@mail.gmail.com> <1089316279.4709692.1438126760802.JavaMail.zimbra@uoguelph.ca> <CANzjMX7J7wm4f8nCXqjGe7F+ZgwYFBtFtKTFOzrPJN-MeXJ5EA@mail.gmail.com> <1603742210.4824721.1438169724361.JavaMail.zimbra@uoguelph.ca> <CANzjMX5N_NiQpnTMXSbaDk4e1XFm4gA8h_UgEv+C9VdPtMMTew@mail.gmail.com>

Ahmed Kamal wrote:
> hmm Thanks Rick ..
> 
> You mentioned the error appears when nfsrvboottime != clientid.lval[0] .. I
> understand nfsrvboottime is the number of seconds since the epoch (1970) ..
> Can you please explain what clientid.lval[0] is, and (if it comes from the
> client?) what guarantees it should equal nfsrvboottime ?
> 
> Apart from trying to understand the problem, can you send me a small C
> program that runs the same code that computes nfsrvboottime and writes that
> to the terminal window? I would like to avoid testing a kernel patch on
> this system since it runs in production. And last time I rebooted the nfs
> server, I ended up having to reboot all clients (every single workstation),
> so that was painful .. So if we just want to know whether the number is
> bigger than 4 billion or not, I think this small app can help us get this
> value right ?
> 
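To answer the quick check question first: you don't need any kernel code for that.
nfsrvboottime is just a copy of time_seconds (the wall clock time in seconds since
Jan. 1, 1970) taken when the nfsd starts up, so a trivial userland program that
prints the current time and compares it against 2^32 - 1 will tell you whether the
value could even be bigger than 4billion. Something like this (just a sketch typed
in here, not code taken from the server):

	#include <inttypes.h>
	#include <stdio.h>
	#include <time.h>

	int
	main(void)
	{
		/*
		 * time(NULL) returns the same seconds-since-1970 count that the
		 * kernel keeps in time_seconds and copies into nfsrvboottime
		 * when the nfsd is started.
		 */
		time_t now = time(NULL);

		printf("seconds since 1970: %jd\n", (intmax_t)now);
		if ((uint64_t)now > UINT32_MAX)
			printf("does not fit in 32bits\n");
		else
			printf("fits in 32bits, so the low order 32bits are fine\n");
		return (0);
	}

If it prints something around 1.4billion, as I expect it will, the size of the value
isn't your problem, which brings me to what I actually found:
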
Ok, I took a closer look at the packet trace (thanks for creating it) and I saw
that the clientid value being returned by the server was bogus. (The field that
shouldn't change except when the server is rebooted was changing.)
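
To tie that back to your first question about clientid.lval[0]: the clientid is a
64bit value the server constructs out of two 32bit words. Roughly speaking (the
program below is only a toy sketch of the idea, not the server code), it is built
and checked like this:

	#include <stdint.h>
	#include <stdio.h>
	#include <time.h>

	/* Toy stand-in for the kernel's two-word clientid, just to show the layout. */
	static uint32_t lval[2];

	int
	main(void)
	{
		time_t nfsrvboottime = time(NULL);	/* time_seconds at nfsd startup */

		/*
		 * Handing out a clientid: the boot instance goes in one word and a
		 * per-client value in the other ("1" is a made up stand-in here).
		 */
		lval[0] = (uint32_t)nfsrvboottime;
		lval[1] = 1;

		/* Every later request carrying the clientid is checked like this: */
		if (nfsrvboottime != lval[0])
			printf("NFS4ERR_STALE_CLIENTID\n");
		else
			printf("clientid accepted\n");
		return (0);
	}

The client never computes this value; it just hands back whatever the server gave it,
and that is what guarantees the match with nfsrvboottime. So when the word holding the
boot time starts changing in the server's replies (the garbage in #1 below), the client
looks "stale" to the server, which matches the NFS4ERR_STALE_CLIENTID replies and the
spinning you saw.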

Given the above, I think I did find the bug and the attached patch should fix it.
(This is NFSv4.1 specific and has nothing to do with the NFSv4.0 seqid issue.)
The patch actually fixes 3 things, although I don't think the other 2 would affect
you in practice:
1 - When a confirmed clientid already exists, nfsrv_setclient() wasn't setting the
    clientidp argument, so the reply included garbage off the stack.
    I think this is what caused your problem.
2 - If the high order 32bits of nfsrvboottime are non-zero, the comparisons with
    clientid.lval[0] would fail. I don't think this can actually happen until the
    year 2106 (when the count of seconds since 1970 no longer fits in 32bits), but
    it is fixed in the patch. (There is a small sketch of this just below the list.)
3 - It was possible to leave an unconfirmed duplicate clientid structure in the list
    when the match in nfsrv_setclient() found it in the last hash list. I do not think
    this would have caused any problem, since the new one would be at the head of the
    list. The old one would eventually have been scavenged and cleared out, although
    it would have occupied storage until then.
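
To expand on #2 a bit: on amd64, time_t (and therefore nfsrvboottime) is 64bits while
clientid.lval[0] is a uint32_t, so once the boot time has any of the upper 32bits set
the unpatched comparison can never succeed. A stand-alone illustration of just that
comparison (again a sketch, not the server code):

	#include <inttypes.h>
	#include <stdio.h>

	int
	main(void)
	{
		uint64_t boottime = UINT64_C(0x100000001);	/* > 4billion, i.e. after 2106 */
		uint32_t lval0 = (uint32_t)boottime;		/* what the clientid stores: 1 */

		/* Old check: lval0 is promoted to 64bits, so every request looks stale. */
		printf("old check says stale: %d\n", boottime != lval0);
		/* Patched check: only the low order 32bits are compared. */
		printf("new check says stale: %d\n", (uint32_t)boottime != lval0);
		return (0);
	}

All the (uint32_t) casts in the attached patch do is make the comparison happen on the
low order 32bits, which is all the server needs anyway.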

The attached patch fixes all of these and I think will fix your problem. Maybe you
can create a patched kernel and then find a time to reboot the server someday?

If any other reader is using the NFSv4 server, please test this patch if possible.

Thanks for creating the packet trace and sorry about this bug causing you grief, rick
ps: I'd guess you are one of the first serious users of the NFSv4.1 server, but hopefully
    it will behave ok for you with this patch.


> On Wed, Jul 29, 2015 at 1:35 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> 
> > Ahmed Kamal wrote:
> > > hmm, if I understand you correctly, this time_seconds value is the number
> > > of seconds since the box booted ?
> > No, it is the number of seconds when the box booted. Once, it was supposed
> > to
> > be the number of seconds since Jan. 1, 1970, but I don't know if that is still
> > the
> > case. (For my i386, it is about 1.4billion when it boots, so I'd guess they
> > still use Jan. 1, 1970. 3600*24*365*45 = 1419120000. Yea, I didn't bother
> > with
> > leap years, etc.)
> >
> > Now, I don't know if the clock could somehow set it to a value > 4billion
> > when
> > the nfsd starts (it copies time_seconds to nfsrvboottime as it starts up),
> > but the
> > current clock code is pretty convoluted stuff, so??
> >
> > rick
> > ps: From the NFSv4 server's point of view, it only needs a number that is
> > unique and
> >     changes every time the server reboots. As such, using the low order
> > 32bits of
> >     it would be sufficient, even if it exceeds 4billion. However, the code
> > incorrectly
> >     assumes it won't exceed 4*1024*1024*1024 - 1 unless you apply the
> > patch.
> >
> > > If so, I guess this is not really the
> > > cause of what we're seeing as the box is only up for 8 days
> > >
> > > bsd# uptime
> > > 11:28AM  up 8 days,  6:20, 6 users, load averages: 0.94, 0.91, 0.84
> > >
> > > The NFS client box's uptime is
> > > linux# uptime
> > >  11:31:39 up 8 days,  5:51, 11 users,  load average: 87.74, 87.43, 87.35
> > >
> > > and yes the huge load is most likely due to this NFS bug
> > >
> > > On Wed, Jul 29, 2015 at 1:39 AM, Rick Macklem <rmacklem@uoguelph.ca>
> > wrote:
> > >
> > > > Ahmed Kamal wrote:
> > > > > Hi again Rick,
> > > > >
> > > > > Seems that I'm still being unlucky with nfs :/ I caught one of the
> > newly
> > > > > installed RHEL6 boxes having high CPU usage, and bombarding the BSD
> > NFS
> > > > box
> > > > > with 10Mbps traffic .. I caught a tcpdump as you mentioned .. You can
> > > > > download it here:
> > > > >
> > > > >
> > > >
> > https://dl.dropboxusercontent.com/u/51939288/nfs41-high-client-cpu.pcap.bz2
> > > > >
> > > > Ok, the packet trace suggests that the NFSv4 server is broken (it is
> > > > replying
> > > > with NFS4ERR_STALE_CLIENTID for a recently generated ClientID).
> > > > Now, I can't be sure, but the only explanation I can come up with is...
> > > > - For some arches (I only have i386, so I wouldn't have seen this
> > during
> > > > testing),
> > > >   time_t is 64bits (uint64_t).
> > > >   --> If time_seconds somehow doesn't fit in the low order 32bits, then
> > > > the code
> > > >       would be busted for these arches because nfsrvboottime is set to
> > > > time_seconds
> > > >       when the server is started and then there are comparisons like:
> > > >       if (nfsrvboottime != clientid.lval[0])
> > > >            return (NFSERR_STALECLIENTID);
> > > >        /* where clientid.lval[0] is a uint32_t */
> > > > Anyhow, if this is what is happening, the attached simple patch should
> > fix
> > > > it.
> > > > (I don't know how time_seconds would exceed 4billion, but the clock
> > code is
> > > >  pretty convoluted, so I can't say if it can possibly happen?)
> > > >
> > > > rick
> > > > ps: Hmm, on i386 time_seconds ends up at 1438126486, so maybe it can
> > exceed
> > > >     4*1024*1024*1024 - 1 on amd64?
> > > >
> > > > > I didn't restart the client yet .. so if you catch me in the next few
> > > > hours
> > > > > and want me to run any diagnostics, let me know. Thanks a lot all for
> > > > > helping
> > > > >
> > > > > On Thu, Jul 23, 2015 at 11:59 PM, Rick Macklem <rmacklem@uoguelph.ca
> > >
> > > > wrote:
> > > > >
> > > > > > Ahmed Kamal wrote:
> > > > > > > Can you please let me know the ultimate packet trace command I'd
> > > > need to
> > > > > > > run in case of any nfs4 troubles .. I guess this should be
> > > > comprehensive
> > > > > > > even at the expense of a larger output size (which we can trim
> > > > later)..
> > > > > > > Thanks a lot for the help!
> > > > > > >
> > > > > > tcpdump -s 0 -w <file>.pcap host <client-host-name>
> > > > > > (<file> refers to a file name you choose and <client-host-name>
> > refers
> > > > to
> > > > > >  the host name of a client generating traffic.)
> > > > > > --> But you won't be able to allow this to run for long during the
> > > > storm
> > > > > > or the
> > > > > >     file will be huge.
> > > > > >
> > > > > > Then you look at <file>.pcap in wireshark, which knows NFS.
> > > > > >
> > > > > > rick
> > > > > >
> > > > > > > On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem <
> > rmacklem@uoguelph.ca
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Graham Allan wrote:
> > > > > > > > > For our part, the user whose code triggered the pathological
> > > > > > behaviour
> > > > > > > > > on SL5 reran it on SL6 without incident - I still see lots of
> > > > > > > > > sequence-id errors in the logs, but nothing bad happened.
> > > > > > > > >
> > > > > > > > > I'd still like to ask them to rerun again on SL5 to see if
> > the
> > > > > > "accept
> > > > > > > > > skipped seqid" patch had any effect, though I think we expect
> > > > not.
> > > > > > Maybe
> > > > > > > > > it would be nice if I could get set up to capture rolling
> > > > tcpdumps of
> > > > > > > > > the nfs traffic before they run that though...
> > > > > > > > >
> > > > > > > > > Graham
> > > > > > > > >
> > > > > > > > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote:
> > > > > > > > > > Hi folks,
> > > > > > > > > >
> > > > > > > > > > I've upgraded a test client to rhel6 today, and I'll keep
> > an
> > > > eye
> > > > > > on it
> > > > > > > > > > to see what happens.
> > > > > > > > > >
> > > > > > > > > > During the process, I made the (I guess mistake) of zfs
> > send |
> > > > > > recv to
> > > > > > > > a
> > > > > > > > > > locally attached usb disk for backup purposes .. long story
> > > > short,
> > > > > > > > > > sharenfs property on the received filesystem was causing
> > some
> > > > > > > > nfs/mountd
> > > > > > > > > > errors in logs .. I wasn't too happy with what I got .. I
> > > > > > destroyed the
> > > > > > > > > > backup datasets and the whole pool eventually .. and then
> > > > rebooted
> > > > > > the
> > > > > > > > > > whole nas box .. After reboot my logs are still flooded
> > with
> > > > > > > > > >
> > > > > > > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > > > > > > > > Jul 21 05:13:07 nas last message repeated 7536 times
> > > > > > > > > > Jul 21 05:15:08 nas last message repeated 29664 times
> > > > > > > > > >
> > > > > > > > > > Not sure what that means .. or how it can be stopped ..
> > Anyway,
> > > > > > will
> > > > > > > > > > keep you posted on progress.
> > > > > > > > >
> > > > > > > > Oh, I didn't see the part about "reboot" before.
> > Unfortunately, it
> > > > > > sounds
> > > > > > > > like the
> > > > > > > > client isn't recovering after the session is lost. When the
> > server
> > > > > > > > reboots, the
> > > > > > > > client(s) will get NFS4ERR_BAD_SESSION errors back because the
> > > > server
> > > > > > > > reboot has
> > > > > > > > deleted all sessions. The NFS4ERR_BAD_SESSION should trigger
> > state
> > > > > > > > recovery on the client.
> > > > > > > > (It doesn't sound like the clients went into recovery, starting
> > > > with a
> > > > > > > > Create_session
> > > > > > > >  operation, but without a packet trace, I can't be sure?)
> > > > > > > >
> > > > > > > > rick
> > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > >
> > > >
> > -------------------------------------------------------------------------
> > > > > > > > > Graham Allan - gta@umn.edu - allan@physics.umn.edu
> > > > > > > > > School of Physics and Astronomy - University of Minnesota
> > > > > > > > >
> > > > > >
> > > >
> > -------------------------------------------------------------------------
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 

Content-Type: text/x-patch; name=nfsv41exch.patch
Content-Disposition: attachment; filename=nfsv41exch.patch

--- fs/nfsserver/nfs_nfsdstate.c.sav	2015-07-28 18:54:06.561454000 -0400
+++ fs/nfsserver/nfs_nfsdstate.c	2015-07-29 18:07:53.000000000 -0400
@@ -220,7 +220,8 @@ nfsrv_setclient(struct nfsrv_descript *n
 			break;
 		}
 	    }
-	    i++;
+	    if (gotit == 0)
+		i++;
 	}
 	if (!gotit ||
 	    (clp->lc_flags & (LCL_NEEDSCONFIRM | LCL_ADMINREVOKED))) {
@@ -400,9 +401,12 @@ nfsrv_setclient(struct nfsrv_descript *n
 	}
 
 	/* For NFSv4.1, mark that we found a confirmed clientid. */
-	if ((nd->nd_flag & ND_NFSV41) != 0)
+	if ((nd->nd_flag & ND_NFSV41) != 0) {
+		clientidp->lval[0] = clp->lc_clientid.lval[0];
+		clientidp->lval[1] = clp->lc_clientid.lval[1];
+		confirmp->lval[0] = 0;	/* Ignored by client */
 		confirmp->lval[1] = 1;
-	else {
+	} else {
 		/*
 		 * id and verifier match, so update the net address info
 		 * and get rid of any existing callback authentication
@@ -487,7 +491,7 @@ nfsrv_getclient(nfsquad_t clientid, int 
 	if (clpp)
 		*clpp = NULL;
 	if ((nd == NULL || (nd->nd_flag & ND_NFSV41) == 0 ||
-	    opflags != CLOPS_RENEW) && nfsrvboottime != clientid.lval[0]) {
+	    opflags != CLOPS_RENEW) && (uint32_t)nfsrvboottime != clientid.lval[0]) {
 		error = NFSERR_STALECLIENTID;
 		goto out;
 	}
@@ -683,7 +687,7 @@ nfsrv_destroyclient(nfsquad_t clientid, 
 	struct nfsclienthashhead *hp;
 	int error = 0, i, igotlock;
 
-	if (nfsrvboottime != clientid.lval[0]) {
+	if ((uint32_t)nfsrvboottime != clientid.lval[0]) {
 		error = NFSERR_STALECLIENTID;
 		goto out;
 	}
@@ -3996,11 +4000,11 @@ nfsrv_checkrestart(nfsquad_t clientid, u
 	 */
 	if (flags &
 	    (NFSLCK_OPEN | NFSLCK_TEST | NFSLCK_RELEASE | NFSLCK_DELEGPURGE)) {
-		if (clientid.lval[0] != nfsrvboottime) {
+		if (clientid.lval[0] != (uint32_t)nfsrvboottime) {
 			ret = NFSERR_STALECLIENTID;
 			goto out;
 		}
-	} else if (stateidp->other[0] != nfsrvboottime &&
+	} else if (stateidp->other[0] != (uint32_t)nfsrvboottime &&
 	    specialid == 0) {
 		ret = NFSERR_STALESTATEID;
 		goto out;


