Date:      Fri, 17 Jul 2015 11:05:34 -0500
From:      Graham Allan <allan@physics.umn.edu>
To:        Ahmed Kamal <email.ahmedkamal@googlemail.com>
Cc:        Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID:  <55A927CE.5010505@physics.umn.edu>
In-Reply-To: <CANzjMX43dsKkdvnnBaX5qsb2XbHpRKftRKyZ8QrZkAaR2wFVVg@mail.gmail.com>
References:  <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <CANzjMX7xKBvnzJhQhB_ZrUnyE2m_FJXXy4fm_RFnuZfBDyDm2A@mail.gmail.com> <55947C6E.5060409@delphij.net> <1491630362.2785531.1435799383802.JavaMail.zimbra@uoguelph.ca> <5594B008.10202@freebsd.org> <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <CANzjMX5eN1FsnHMf6KGZe_b3vwxxF=dy3fJUHxeGO4BXuNzfPA@mail.gmail.com> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv%2BOyApK2UhJg@mail.gmail.com> <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com> <20150716235022.GF32479@physics.umn.edu> <CANzjMX43dsKkdvnnBaX5qsb2XbHpRKftRKyZ8QrZkAaR2wFVVg@mail.gmail.com>

I have maintenance scheduled this weekend, so I may try applying Xin
Li's patch to one of our 9.3 servers to see whether the sequence-id
messages diminish (even though it didn't help for you; possibly SL6
will behave differently).

As for SL6/NFS4 being more tolerant, I suspect the problem depends on
the specific job. This is the first time I have seen it at all (that
is, with the stuck processes and high rpciod load), and I only see one
person running this code. That said, looking back ~60 days in the logs,
I can see the sequence-id messages occurring all over the place from
other machines, apparently without incident.

For the more intense users who are running on 200 servers at once, I
wonder if they are not hitting the NFS server in the same way; possibly
they are mostly writing somewhere else, such as Hadoop, and only reading
from NFS. However, our compute farm conversions to SL6 and NFSv4 are
fairly recent, so something may yet show up.

I wonder if we have any avenue to file a bug with Red Hat. I have a very
basic subscription which only lets me look at their KB. I could upgrade
it, but as I'm running a clone product I probably don't have a viable
report.

Graham

On 7/17/2015 6:21 AM, Ahmed Kamal wrote:
> Hi Graham,
>
> So my RHEL5 boxes certainly have trouble with nfs4. I'm running about
> 20 boxes and almost all of them develop a choking process every day or
> two. I'm now in the process of upgrading our RHEL boxes to v6.x. This
> is motivated by wanting to migrate to NFS4.1, although now that you say
> NFS4 is more tolerant on EL6, I might just remain on that. So far I did
> one week of basic testing of a VM on el6 with nfs4.1 against my FreeBSD
> 10.1, and I didn't hit problems (although the testing was light). Next
> week, I'll probably upgrade one of our production machines to el6 and
> see how it fares.
>
> PS: I had upgraded our el5 box with the elrepo kernel (v3.2), which I
> thought would be much newer (even newer than el6), but I still had
> trouble with it, so I reverted to the stock el5 kernel! Not sure if
> this means Linux is not the only component at fault?!
>


-- 
-------------------------------------------------------------------------
Graham Allan - gta@umn.edu - allan@physics.umn.edu
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------