From: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Date: Thu, 23 Jul 2015 17:26:50 +0200
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
To: Rick Macklem
Cc: Graham Allan, Ahmed Kamal via freebsd-fs

Well .. the problem is now gone, so I guess I can't collect more data until it
happens (or hopefully doesn't :) happen again. As I described, I had to restart
the FreeBSD NFS server box first .. maybe this caused the Linux clients to give
up after 5 minutes and attempt to destroy the session? When the NFS server came
back up, it was being bombarded with RPC traffic (50Mbps), probably carrying
this "destroy session" request. What I don't understand, however, is why this
never ends. What does FreeBSD reply with? Shouldn't it say "okay, I don't know
anything about this session, so consider it destroyed .. suit yourself, Linux",
or does it refuse to destroy it, causing Linux to keep retrying like crazy?

On Thu, Jul 23, 2015 at 12:40 PM, Rick Macklem wrote:

> Ahmed Kamal wrote:
> > rhel6 servers' logs were flooded with errors like: http://paste2.org/EwLGcGF6
> > The FreeBSD box was being pounded with 40Mbps of nfs traffic .. probably
> > Linux was retrying too hard ?!
> > I had to reboot all PCs and after the last
> > one, nfsd CPU usage dropped immediately to zero
>
> Btw, it would be interesting to know what triggers these things (overload of
> the nfs server resulting in very slow response or ???). Basically Destroy_session
> isn't an operation that a client would normally do. I have no idea why the Linux
> client would do it. (A session is what achieves the "exactly once" semantics for
> the RPCs. It should really be in the RPC layer, but the NFSv4 working group put
> it in NFSv4.1 because they didn't want to replace Sun RPC. I can't think of a
> reason to destroy a session except on dismount. Maybe if the client thinks the
> session is broken for some reason??)
>
> Maybe something like "vmstat -m", "vmstat -z" and "nfsstat -s -e" running
> repeatedly (once/sec with timestamps via "date" or similar) so that you can see
> what was happening just before the meltdowns.
>
> A raw packet trace of just when the meltdown starts would be useful, but I can't
> think of how you'd get one of reasonable size. Maybe having "tcpdump -s 0 -w .pcap"
> run for 1sec and then kill/restart it repeatedly with different file names, so you
> might get a useful 1sec capture at the critical time?
>
> Anyhow, good luck with it, rick
>
> > On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <
> > email.ahmedkamal@googlemail.com> wrote:
> >
> > > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just did
> > > the dtrace with:
> > >
> > > dtrace -n 'profile-1001 { @[stack()] = count(); }'
> > > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)
> > >
> > > Since rebooting the nfs server didn't fix it .. I imagine I'd have to
> > > reboot all NFS clients .. This would be really sad .. Any advice is most
> > > appreciated .. Thanks
> > >
> > > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <
> > > email.ahmedkamal@googlemail.com> wrote:
> > >
> > >> Hi folks,
> > >>
> > >> I've upgraded a test client to rhel6 today, and I'll keep an eye on it to
> > >> see what happens.
> > >>
> > >> During the process, I made the (I guess) mistake of zfs send | recv to a
> > >> locally attached usb disk for backup purposes .. long story short, the
> > >> sharenfs property on the received filesystem was causing some nfs/mountd
> > >> errors in the logs .. I wasn't too happy with what I got .. I destroyed the
> > >> backup datasets and eventually the whole pool .. and then rebooted the whole
> > >> nas box .. After the reboot my logs are still flooded with
> > >>
> > >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > >> Jul 21 05:13:07 nas last message repeated 7536 times
> > >> Jul 21 05:15:08 nas last message repeated 29664 times
> > >>
> > >> Not sure what that means .. or how it can be stopped .. Anyway, will keep
> > >> you posted on progress.
> > >>
> > >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem
> > >> wrote:
> > >>
> > >>> Graham Allan wrote:
> > >>> > I'm curious how things are going for you with this?
> > >>> >
> > >>> > Reading your thread did pique my interest since we have a lot of
> > >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant
> > >>> > to glance through our logs for signs of the same issue, but today I
> > >>> > started investigating a machine which appeared to have hung processes,
> > >>> > high rpciod load, and high traffic to the NFS server. Of course it is
> > >>> > exactly this issue.
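A minimal sketch of the once-per-second monitoring loop and the rotating
1-second packet captures Rick suggests above. The log path, capture
directory, the "em0" interface name, and the 60-iteration limit are
illustrative assumptions, not values from the thread:

    #!/bin/sh
    # Sample server state once per second with timestamps, appending to one log.
    i=0
    while [ "$i" -lt 60 ]; do
        {
            date
            vmstat -m
            vmstat -z
            nfsstat -s -e
        } >> /var/tmp/nfs-monitor.log
        i=$((i + 1))
        sleep 1
    done

    # Run separately: rotating 1-second full-packet captures, so one small
    # file covers the moment a meltdown starts ("em0" is an assumed NIC).
    i=0
    while [ "$i" -lt 60 ]; do
        tcpdump -i em0 -s 0 -w "/var/tmp/nfs-$i.pcap" &
        sleep 1
        kill "$!"
        i=$((i + 1))
    done

Either loop could be left running in the background and the resulting files
inspected after the next meltdown.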
> > >>> >
> > >>> > The affected machine is running SL5 though most of our server nodes are
> > >>> > now SL6. I can see errors from most of them but the SL6 systems appear
> > >>> > less affected - I see a stream of the sequence-id errors in their logs but
> > >>> > things in general keep working. The one SL5 machine I'm looking at
> > >>> > has a single sequence-id error in today's logs, but then goes into a
> > >>> > stream of "state recovery failed" then "Lock reclaim failed". It's
> > >>> > probably partly related to the particular workload on this machine.
> > >>> >
> > >>> > I would try switching our SL6 machines to NFS 4.1 to see if the
> > >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in
> > >>> > 10.1?).
> > >>> >
> > >>> Btw, I've done some testing against a fairly recent Fedora and haven't seen
> > >>> the problem. If either of you guys could load a recent Fedora on a test
> > >>> client box, it would be interesting to see if it suffers from this. (My
> > >>> experience is that the Fedora distros have more up to date Linux NFS clients.)
> > >>>
> > >>> rick
> > >>>
> > >>> > At the NFS servers, most of the sysctl settings are already tuned
> > >>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300,
> > >>> > 128-256 nfs kernel threads.
> > >>> >
> > >>> > Graham
> > >>> >
> > >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs wrote:
> > >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
> > >>> > > reports from users about hung vnc sessions. So maybe just maybe, linux
> > >>> > > clients are able to somehow recover from this bad sequence messages. I
> > >>> > > could still see the bad sequence error message in logs though
> > >>> > >
> > >>> > > Why isn't the highwater tunable set to something better by default ? I mean
> > >>> > > this server is certainly not under a high or unusual load (it's only 40 PCs
> > >>> > > mounting from it)
> > >>> > >
> > >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal
> > >>> > > wrote:
> > >>> > >
> > >>> > > > Thanks all .. I understand now we're doing the "right thing" .. Although
> > >>> > > > if mounting keeps wedging, I will have to solve it somehow! Either using
> > >>> > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1.
> > >>> > > >
> > >>> > > > Regarding Xin's patch, is it possible to build the patched nfsd code, as a
> > >>> > > > kernel module ? I'm looking to minimize my delta to upstream.
> > >>> > > >
> > >>> > > > Also would adopting Xin's patch and hiding it behind a
> > >>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the last
> > >>> > > > person on earth to hit this) ?
> > >>> > > >
> > >>> > > > Thanks a lot for all the help!
> > >>> > > >
> > >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > >>> > > > wrote:
> > >>> > > >
> > >>> > > >> Ahmed Kamal wrote:
> > >>> > > >> > Appreciating the fruitful discussion! Can someone please explain to
> > >>> > > >> > me what would happen in the current situation (linux client doing this
> > >>> > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect of
> > >>> > > >> > that?
> > >>> > > >> Well, as you've seen, the Linux client doesn't function correctly against
> > >>> > > >> the FreeBSD server (and probably others that don't support this "skip-by-1"
> > >>> > > >> case).
> > >>> > > >>
> > >>> > > >> > What do users see? Any chances of data loss?
> > >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the Linux
> > >>> > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the guy
> > >>> > > >> observing it.
> > >>> > > >>
> > >>> > > >> > Also, I find it strange that netapp have acknowledged this is a bug on
> > >>> > > >> > their side, which has been fixed since then!
> > >>> > > >> Yea, I think Netapp screwed up. For some reason their server allowed this,
> > >>> > > >> then it was fixed to not allow it, and then someone decided that was broken
> > >>> > > >> and reversed it.
> > >>> > > >>
> > >>> > > >> > I also find it strange that I'm the first to hit this :) Is no one running
> > >>> > > >> > nfs4 yet!
> > >>> > > >>
> > >>> > > >> Well, it seems to be slowly catching on. I suspect that the Linux client
> > >>> > > >> mounting a Netapp is the most common use of it. Since it appears that they
> > >>> > > >> flip-flopped w.r.t. whose bug this is, it has probably persisted.
> > >>> > > >>
> > >>> > > >> It may turn out that the Linux client has been fixed, or it may turn out
> > >>> > > >> that most servers allowed this "skip-by-1", even though David Noveck (one
> > >>> > > >> of the main authors of the protocol) seems to agree with me that it should
> > >>> > > >> not be allowed.
> > >>> > > >>
> > >>> > > >> It is possible that others have bumped into this, but it wasn't isolated
> > >>> > > >> (I wouldn't have guessed it, so it was good you pointed to the RedHat
> > >>> > > >> discussion) and they worked around it by reverting to NFSv3 or similar.
> > >>> > > >> The protocol is rather complex in this area and changed completely for
> > >>> > > >> NFSv4.1, so many have also probably moved on to NFSv4.1, where this won't
> > >>> > > >> be an issue. (NFSv4.1 uses sessions to provide exactly once RPC semantics
> > >>> > > >> and doesn't use these seqid fields.)
> > >>> > > >>
> > >>> > > >> This is all just mho, rick
> > >>> > > >>
> > >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > >>> > > >> > wrote:
> > >>> > > >> >
> > >>> > > >> > > Julian Elischer wrote:
> > >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> > >>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say.
> > >>> > > >> > > > > Please let me know if Xin Li's patch resolves your problem, even
> > >>> > > >> > > > > though I don't believe it is correct except for the UINT32_MAX
> > >>> > > >> > > > > case. Good luck with it, rick
> > >>> > > >> > > > and please keep us all in the loop as to what they say!
> > >>> > > >> > > >
> > >>> > > >> > > > the general N+2 bit sounds like bullshit to me ..
> > >>> > > >> > > > it's always N+1 in a
> > >>> > > >> > > > number field that has a
> > >>> > > >> > > > bit of slack at wrap time (probably due to some ambiguity in the
> > >>> > > >> > > > original spec).
> > >>> > > >> > > >
> > >>> > > >> > > Actually, since N is the lock op already done, N + 1 is the next lock
> > >>> > > >> > > operation in order. Since lock ops need to be strictly ordered, allowing
> > >>> > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no sense.
> > >>> > > >> > >
> > >>> > > >> > > I think the author of the RFC meant that N + 2 or greater fails, but it
> > >>> > > >> > > was poorly worded.
> > >>> > > >> > >
> > >>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There is an
> > >>> > > >> > > archive of it somewhere, but I can't remember where. ;-)
> > >>> > > >> > >
> > >>> > > >> > > rick
> > >>> >
> > >>> > --
> > >>> > -------------------------------------------------------------------------
> > >>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040
> > >>> > School of Physics and Astronomy - University of Minnesota
> > >>> > -------------------------------------------------------------------------
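A minimal sketch of the server-side cache tuning Graham and Ahmed refer to
above as "tcp.highwater" and "vfs.nfsd.tcpcachetimeo". It assumes the
highwater tunable being abbreviated is vfs.nfsd.tcphighwater on a FreeBSD
9.3/10.x server, and simply reuses the values quoted in the thread; they are
not universal defaults:

    # Inspect the current NFS duplicate-request-cache settings on the server.
    sysctl vfs.nfsd.tcphighwater vfs.nfsd.tcpcachetimeo

    # Raise them to the values mentioned in the thread (about 40 clients).
    sysctl vfs.nfsd.tcphighwater=100000
    sysctl vfs.nfsd.tcpcachetimeo=300

    # To keep the settings across reboots, the same assignments would
    # normally go into /etc/sysctl.conf.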