Date: Tue, 19 Jul 2011 09:15:27 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Mike Shultz <mike@votesmart.org>
Cc: freebsd-fs@freebsd.org, Clinton Adams <clinton@votesmart.org>
Subject: Re: nfsd server cache flooded, try to increase nfsrc_floodlevel
Message-ID: <752938116.734332.1311081327632.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4E24B266.9050108@votesmart.org>
Mike Shultz wrote:
> I ran into an issue today of our server thinking that it was being
> flooded and locking our nfs users out. Got a LOT of these messages:
>
> Jul 12 16:08:22 xxxxx kernel: nfsd server cache flooded, try to increase nfsrc_floodlevel
>
> Our server(`uname -a`): FreeBSD xxxxx 8.2-RELEASE-p2 FreeBSD
> 8.2-RELEASE-p2 #0: Tue Jun 21 16:52:27 MDT 2011
> yyy@xxxxx:/usr/obj/usr/src/sys/XXXXX amd64
>
> I could find no information on nfsrc_floodlevel other than source code
> which didn't explain too much about it. I don't know if it's a kernel
> config var, or what.
>
> `nfsstat -e` did show this:
>
> CacheSize TCPPeak
> 16385 16385
>
> So I'm guessing that that is the current cache limit.
>
> The source code and this output suggest that we're just running into the
> limit. However, a comment in that source does suggest that "The cache
> will still function over flood level" but that doesn't seem to be the
> case. I ended up having to revoke the clients and restarting nfsd to get
> it operational again.
>
Since you were seeing the "...try to increase nfsrc_floodlevel" messages, it
means that at least some of your client(s) are using NFSv4. For NFSv4, the
client will get NFS4ERR_RESOURCE back as a reply at this point. My guess is
that the client(s) just kept sending retries of the RPCs and, since the cache
size didn't decrease, just kept getting NFS4ERR_RESOURCE.

The real question becomes "how did it hit the flood level?".

Hmm, there was a recent SMP related cache problem that is fixed by this patch:
http://people.freebsd.org/~rmacklem/cache.patch

I'd suggest you try this patch and see if the problem occurs again. It seems
unlikely you would hit the flood level unless there is a bug (or very weird
client behaviour), but it's conceivable. You can increase it by editing
sys/fs/nfs/nfs.h and increasing the value, then rebuilding the kernel/modules.

Also, what client(s) are mounting the server and how many/how busy are they?
Hopefully the SMP patch will fix this for you, although it's hard to predict
what behaviour could be observed without the patch. (Btw, the patch is in head
and stable/8, but not releng 8.2.) I only have single core hardware for
testing, so I'd never see these kinds of bugs myself.

> I would appreciate anyone that could clarify what nfsrc_floodlevel is
> and how to go about changing it.
>
This is mostly a "sanity check" and it's hard to imagine hitting the limit of
16K. To hit this without some sort of bug or client/server interoperability
issue would take something like 4000 TCP mounts against the server. However,
the only issue with respect to increasing it is running out of mbufs.

You can try increasing it, as above, and then use `nfsstat -e -s` to monitor
how it grows. If it keeps growing, then there is definitely a bug or
interoperability problem. If it just seems to peak at some level, I'd like to
hear what that level is and what kind of load the server would have at that
time.

(The only other thing that I can think of that might result in hitting the
limit is a TCP stack which allows a large amount of unACKed data, since a
cache entry is discarded when the client's TCP layer ACKs past the TCP seq#
for the reply sent to the client.)

Good luck with it and please let me know how it goes, rick
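
The monitoring suggestion above (watch the cache size with `nfsstat -e -s` and see whether it keeps growing or levels off) could be sketched roughly as follows. This is only an illustration, not part of the original message: the function names `parse_cache_stats` and `classify_growth` are hypothetical, and the parser assumes the `CacheSize` and `TCPPeak` columns appear as a whitespace-separated header row with the numbers on the following line, as in the output quoted earlier in the thread. The five-sample window is an arbitrary choice.

```python
def parse_cache_stats(nfsstat_output):
    """Extract CacheSize and TCPPeak from nfsstat text.

    Assumes (not verified against every nfsstat version) a header row
    containing both column names, followed by a row of integers.
    Returns a dict, or None if no such pair of rows is found.
    """
    lines = nfsstat_output.splitlines()
    for i, line in enumerate(lines[:-1]):
        headers = line.split()
        if "CacheSize" in headers and "TCPPeak" in headers:
            values = lines[i + 1].split()
            if len(values) != len(headers):
                continue
            try:
                row = dict(zip(headers, (int(v) for v in values)))
            except ValueError:
                continue
            return {"CacheSize": row["CacheSize"], "TCPPeak": row["TCPPeak"]}
    return None


def classify_growth(sizes, window=5):
    """Classify a series of CacheSize samples (oldest first).

    'growing' means every one of the last `window` samples exceeded
    the one before it; anything else is reported as 'levelled off'.
    """
    if len(sizes) < window:
        return "insufficient data"
    recent = sizes[-window:]
    if all(later > earlier for earlier, later in zip(recent, recent[1:])):
        return "growing"
    return "levelled off"
```

On the server, the text passed to `parse_cache_stats` would come from running `nfsstat -e -s` at a fixed interval (for example once a minute) and appending each `CacheSize` to a list. A result of "growing" over a long run would point to the bug or interoperability case described above, while "levelled off" gives the peak value worth reporting back.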