From: Rick Macklem
To: Ahmed Kamal
Cc: Julian Elischer, freebsd-fs@freebsd.org, Xin LI
Date: Wed, 8 Jul 2015 19:30:47 -0400 (EDT)
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID: <502673468.6406432.1436398247559.JavaMail.zimbra@uoguelph.ca>

Ahmed Kamal wrote:
> Hi folks,
>
> I have tested Xin's patches .. Unfortunately the problem didn't go away :/
> Many users are still reporting hung processes. If it would help, can you
> show me how to dump a network trace that would help you identify the issue ?
>
Oops, I didn't see this. Ignore my comment w.r.t. testing it in the other
post.

rick

> Also, is it possible in any way to have my trusted nfs3, handle the case
> where every zfs /home folder is its own dataset ?
>
These would all need to be separate mounts. If the # of mounts is very large,
maybe using an automounter would be helpful? (As far as I know, there is no
limit to the # of mounts, so I don't see why you can't just mount them all.)

rick

> On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem wrote:
>
> > Ahmed Kamal wrote:
> > > Hi folks,
> > >
> > > Just a quick update. I did not test Xin's patches yet ..
> > > What I did so far is to increase the tcp highwater tunable and increase
> > > nfsd threads to 60. Today (a working day) I noticed I only got one bad
> > > sequence error message! Check this:
> > >
> > > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c
> > >   1 messages:Jul5
> > >  39 messages.1:Jun28
> > >  15 messages.1:Jun29
> > >   4 messages.1:Jun30
> > >   9 messages.1:Jul1
> > >  23 messages.1:Jul2
> > >   1 messages.1:Jul4
> > >   1 messages.2:Jun28
> > >
> > > So there seems to be an improvement! Not sure if the Linux nfs4 client is
> > > able to somehow recover from those bad-sequence situations or not .. I did
> > > get some user complaints that running "ls -l" is sometimes slow and takes
> > > a couple of seconds to finish.
> > >
> > > One final question .. Do you folks think nfs4.1 is more reliable in general
> > > than nfs4 .. I've always only used nfs3 (I guess it can't work here with
> > > /home/* being separate zfs filesystems) .. So should I go through the pain
> > > of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do you
> > > expect the protocol to be more solid ? I know it's a fluffy question, just
> > > give me your thoughts. Thanks a lot!
> > >
> > All I can say is that the "bad seqid" errors should not occur with NFSv4.1,
> > since it doesn't use the seqid#s to order RPCs.
> >
> > Also I would say that a correctly implemented NFSv4.1 protocol should
> > function "more correctly" since all RPCs are performed "exactly once". (How
> > much effect this will have in practice, I can't say.)
> >
> > On the other hand, NFSv4.1 is a newer protocol (with an RFC of over 500
> > pages), so it is hard to say how mature the implementations are.
> > I think only testing will give you the answer.
> >
> > I would suggest that you test Xin Li's patch that allows the "seqid + 2" case
> > and see if that makes the "bad seqid" errors go away.
> > (Even though I think this would indicate a client bug, adding this in a way
> > that it can be enabled via a sysctl seems reasonable.)
> >
> > Btw, I haven't seen any additional posts from nfsv4@ietf.org on this, rick
> >
> > >
> > > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem wrote:
> > >
> > > > Ahmed Kamal wrote:
> > > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
> > > > > reports from users about hung vnc sessions. So maybe just maybe, linux
> > > > > clients are able to somehow recover from these bad sequence messages. I
> > > > > could still see the bad sequence error message in logs though
> > > > >
> > > > > Why isn't the highwater tunable set to something better by default ? I
> > > > > mean this server is certainly not under a high or unusual load (it's
> > > > > only 40 PCs mounting from it)
> > > > >
> > > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal
> > > > > <email.ahmedkamal@googlemail.com> wrote:
> > > > >
> > > > > > Thanks all .. I understand now we're doing the "right thing" ..
> > > > > > Although if mounting keeps wedging, I will have to solve it somehow!
> > > > > > Either using Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1.
> > > > > >
> > > > > > Regarding Xin's patch, is it possible to build the patched nfsd code,
> > > > > > as a kernel module ? I'm looking to minimize my delta to upstream.
> > > > > >
> > > > Yes, you can build the nfsd as a module. If your kernel config does not
> > > > include "options NFSD" the module will get loaded/used. It is also
> > > > possible to replace the module without rebooting, but you need to kill
> > > > off the nfsd daemon, then kldunload nfsd.ko and replace nfsd.ko with the
> > > > new one. (In /boot/.)
> > > >
> > > > > > Also would adopting Xin's patch and hiding it behind a
> > > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the
> > > > > > last person on earth to hit this) ?
> > > > > >
> > > > If it fixes your problem, I think this is reasonable.
> > > > I'm also hoping that someone that works on the Linux client reports
> > > > if/when this was changed.
> > > >
> > > > rick
> > > >
> > > > > > Thanks a lot for all the help!
> > > > > >
> > > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > > > > > wrote:
> > > > > >
> > > > > >> Ahmed Kamal wrote:
> > > > > >> > Appreciating the fruitful discussion! Can someone please explain to
> > > > > >> > me, what would happen in the current situation (linux client doing
> > > > > >> > this skip-by-1 thing, and freebsd not doing it) ? What is the effect
> > > > > >> > of that?
> > > > > >> Well, as you've seen, the Linux client doesn't function correctly
> > > > > >> against the FreeBSD server (and probably others that don't support
> > > > > >> this "skip-by-1" case).
> > > > > >>
> > > > > >> > What do users see? Any chances of data loss?
> > > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the
> > > > > >> Linux client behaviour is after receiving NFS4ERR_BAD_SEQID. You're
> > > > > >> the guy observing it.
> > > > > >>
> > > > > >> >
> > > > > >> > Also, I find it strange that netapp have acknowledged this is a bug
> > > > > >> > on their side, which has been fixed since then!
> > > > > >> Yea, I think Netapp screwed up. For some reason their server allowed
> > > > > >> this, then was fixed to not allow it and then someone decided that
> > > > > >> was broken and reversed it.
> > > > > >>
> > > > > >> > I also find it strange that I'm the first to hit this :) Is no one
> > > > > >> > running nfs4 yet!
> > > > > >> >
> > > > > >> Well, it seems to be slowly catching on. I suspect that the Linux
> > > > > >> client mounting a Netapp is the most common use of it. Since it
> > > > > >> appears that they flip flopped w.r.t. whose bug this is, it has
> > > > > >> probably persisted.
> > > > > >>
> > > > > >> It may turn out that the Linux client has been fixed or it may turn
> > > > > >> out that most servers allowed this "skip-by-1" even though David
> > > > > >> Noveck (one of the main authors of the protocol) seems to agree with
> > > > > >> me that it should not be allowed.
> > > > > >>
> > > > > >> It is possible that others have bumped into this, but it wasn't
> > > > > >> isolated (I wouldn't have guessed it, so it was good you pointed to
> > > > > >> the RedHat discussion) and they worked around it by reverting to
> > > > > >> NFSv3 or similar.
> > > > > >> The protocol is rather complex in this area and changed completely
> > > > > >> for NFSv4.1, so many have also probably moved onto NFSv4.1 where this
> > > > > >> won't be an issue.
> > > > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and
> > > > > >> doesn't use these seqid fields.)
> > > > > >>
> > > > > >> This is all just mho, rick
> > > > > >>
> > > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > Julian Elischer wrote:
> > > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> > > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say.
> > > > > >> > > > > Please let me know if Xin Li's patch resolves your problem,
> > > > > >> > > > > even though I don't believe it is correct except for the
> > > > > >> > > > > UINT32_MAX case.
> > > > > >> > > > > Good luck with it, rick
> > > > > >> > > > and please keep us all in the loop as to what they say!
> > > > > >> > > >
> > > > > >> > > > the general N+2 bit sounds like bullshit to me.. it's always N+1
> > > > > >> > > > in a number field that has a bit of slack at wrap time (probably
> > > > > >> > > > due to some ambiguity in the original spec).
> > > > > >> > > >
> > > > > >> > > Actually, since N is the lock op already done, N + 1 is the next
> > > > > >> > > lock operation in order. Since lock ops need to be strictly
> > > > > >> > > ordered, allowing N + 2 (which means N + 2 would be done before
> > > > > >> > > N + 1) makes no sense.
> > > > > >> > >
> > > > > >> > > I think the author of the RFC meant that N + 2 or greater fails,
> > > > > >> > > but it was poorly worded.
> > > > > >> > >
> > > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There is
> > > > > >> > > an archive of it somewhere, but I can't remember where.;-)
> > > > > >> > >
> > > > > >> > > rick
> > > > > >> > > _______________________________________________
> > > > > >> > > freebsd-fs@freebsd.org mailing list
> > > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > > > >> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
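
For readers following the N + 1 versus N + 2 argument in this thread, the ordering rule can be sketched in a few lines. This is only an illustration and not the FreeBSD nfsd code or Xin Li's patch. The function name classify_seqid, the allow_skip flag (standing in for the sysctl proposed above) and the modulo 2**32 wrap-around are all assumptions made for the example. Treating a seqid equal to N as a replay of the last request is my understanding of NFSv4.0 and is not stated anywhere in the thread.

```python
UINT32_MASK = 0xFFFFFFFF  # assumption: seqids are 32-bit unsigned and wrap modulo 2**32


def classify_seqid(last, received, allow_skip=False):
    """Classify a received seqid against the last one already processed.

    Hypothetical helper for illustration only. `last` is N, the seqid of
    the open/lock operation already done for this owner.
    """
    delta = (received - last) & UINT32_MASK  # how far ahead of N, modulo 2**32
    if delta == 0:
        return "replay"     # duplicate of the last request (assumed behaviour)
    if delta == 1:
        return "next"       # N + 1: the next operation in strict order
    if delta == 2 and allow_skip:
        return "next"       # relaxed "skip-by-1" case, disabled by default
    return "bad_seqid"      # everything else would map to NFS4ERR_BAD_SEQID
```

With the default allow_skip=False, a request carrying N + 2 is classified as "bad_seqid", which corresponds to the strict behaviour the Linux clients are described as hitting. Passing allow_skip=True models the relaxed behaviour that the thread suggests hiding behind a sysctl.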