Date:      Tue, 19 Jul 2011 09:15:27 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Mike Shultz <mike@votesmart.org>
Cc:        freebsd-fs@freebsd.org, Clinton Adams <clinton@votesmart.org>
Subject:   Re: nfsd server cache flooded, try to increase nfsrc_floodlevel
Message-ID:  <752938116.734332.1311081327632.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4E24B266.9050108@votesmart.org>

Mike Shultz wrote:
> I ran into an issue today of our server thinking that it was being
> flooded and locking our nfs users out. Got a LOT of these messages:
> 
> Jul 12 16:08:22 xxxxx kernel: nfsd server cache flooded, try to increase nfsrc_floodlevel
> 
> Our server(`uname -a`): FreeBSD xxxxx 8.2-RELEASE-p2 FreeBSD
> 8.2-RELEASE-p2 #0: Tue Jun 21 16:52:27 MDT 2011
> yyy@xxxxx:/usr/obj/usr/src/sys/XXXXX amd64
> 
> I could find no information on nfsrc_floodlevel other than source code
> which didn't explain too much about it. I don't know if it's a kernel
> config var, or what.
> 
> `nfsstat -e` did show this:
> 
> CacheSize TCPPeak
> 16385 16385
> 
> So I'm guessing that that is the current cache limit.
> 
> The source code and this output suggest that we're just running into the
> limit. However, a comment in that source does suggest that "The cache
> will still function over flood level", but that doesn't seem to be the
> case. I ended up having to revoke the clients and restart nfsd to get
> it operational again.
> 
Since you were seeing the "...try to increase nfsrc_floodlevel" messages, at
least some of your clients are using NFSv4. For NFSv4, the client gets
NFS4ERR_RESOURCE back as the reply once the flood level is reached. My guess
is that the clients just kept retrying the RPCs and, since the cache size
didn't decrease, kept getting NFS4ERR_RESOURCE.
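A simplified model of the behaviour described above, for illustration only:
once the number of cached replies reaches the flood level, new NFSv4 requests
are answered with NFS4ERR_RESOURCE (status 10018 in the NFSv4 RFCs) and nothing
is added to the cache, so retries cannot succeed until some entries are
released. The struct and function names are hypothetical and this is not the
FreeBSD nfsd code.

```c
#include <assert.h>
#include <stddef.h>

/* NFSv4 status codes (values from RFC 3530 / RFC 7530). */
#define NFS4_OK           0
#define NFS4ERR_RESOURCE  10018

/* Hypothetical model of the server's duplicate request cache (DRC).
 * Names are illustrative only, not the real nfsd data structures. */
struct drc_model {
    size_t cached_replies;  /* replies currently held in the cache */
    size_t flood_level;     /* sanity limit, e.g. 16384 ("16K") */
};

/* Decide how to answer a new NFSv4-over-TCP request.
 * At or above the flood level the request is refused with
 * NFS4ERR_RESOURCE and the cache does not grow. */
static int drc_admit(struct drc_model *c)
{
    if (c->cached_replies >= c->flood_level)
        return NFS4ERR_RESOURCE;
    c->cached_replies++;    /* reply is cached until the client ACKs it */
    return NFS4_OK;
}

/* Called when the client's TCP layer ACKs past n cached replies. */
static void drc_release(struct drc_model *c, size_t n)
{
    assert(n <= c->cached_replies);
    c->cached_replies -= n;
}
```

With nothing calling drc_release(), every retry gets NFS4ERR_RESOURCE again,
which matches the lockout that needed a client revoke and an nfsd restart.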

The real question becomes "how did it hit the flood level?".

Hmm, there was a recent SMP-related cache problem that is fixed by this patch:
   http://people.freebsd.org/~rmacklem/cache.patch

I'd suggest you try this patch and see if the problem occurs again.

It seems unlikely you would hit the flood level unless there is a bug (or very
weird client behaviour), but it's conceivable.

You can increase it by editing sys/fs/nfs/nfs.h and increasing the value, then
rebuilding the kernel/modules.

Also, what client(s) are mounting the server and how many/how busy are they?

Hopefully the SMP patch will fix this for you, although it's hard to predict
what behaviour could be observed without the patch. (Btw, the patch is in head
and stable/8, but not releng 8.2.) I only have single core hardware for testing,
so I'd never see these kinds of bugs myself.

> I would appreciate anyone that could clarify what nfsrc_floodlevel is
> and how to go about changing it.
> 
This is mostly a "sanity check" and it's hard to imagine hitting the limit
of 16K. To hit it without some sort of bug or client/server interoperability
issue would take something like 4000 TCP mounts against the server. That
said, the only risk in increasing it is running out of mbufs. You can try
increasing it, as above, and then use `nfsstat -e -s` to monitor how it
grows. If it keeps growing, then there is definitely a bug or
interoperability problem. If it just seems to peak at some level, I'd like
to hear what that level is and what kind of load the server has at that
time.
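One way to apply the "keeps growing vs. peaks" test is to record CacheSize
from `nfsstat -e -s` at regular intervals and compare the most recent samples
against the earlier ones. The helper below is a hypothetical sketch of that
comparison: the window rule, the enum and the function name are assumptions,
and the samples are supplied by hand rather than parsed from nfsstat output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of CacheSize samples taken periodically
 * from `nfsstat -e -s`. The rule below is an illustrative heuristic,
 * not something nfsstat itself reports. */
enum cache_trend { TREND_UNKNOWN, TREND_GROWING, TREND_PLATEAU };

/* GROWING: each of the last `window` samples is larger than the one before.
 * PLATEAU: none of the last `window` samples exceeds the earlier maximum.
 * Anything else, or too few samples, is UNKNOWN (keep watching). */
static enum cache_trend classify(const long *s, size_t n, size_t window)
{
    assert(s != NULL);
    if (window == 0 || n <= window)
        return TREND_UNKNOWN;

    size_t start = n - window;          /* start >= 1 because n > window */
    int growing = 1;
    for (size_t i = start; i < n; i++)
        if (s[i] <= s[i - 1]) { growing = 0; break; }
    if (growing)
        return TREND_GROWING;

    long prior_max = s[0];              /* maximum before the window */
    for (size_t i = 1; i < start; i++)
        if (s[i] > prior_max) prior_max = s[i];
    for (size_t i = start; i < n; i++)
        if (s[i] > prior_max) return TREND_UNKNOWN;
    return TREND_PLATEAU;
}
```

A steadily climbing series points at the bug/interoperability case; a series
that levels off gives the peak value worth reporting along with the load.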

The only other thing I can think of that might result in hitting the limit
is a TCP stack that allows a large amount of unACKed data, since a cache
entry is not discarded until the client's TCP layer ACKs past the TCP seq#
of the reply sent to the client.
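To see why unACKed data matters, here is a hedged sketch of ACK-driven cache
trimming for a single TCP connection, assuming each cached reply records the
sequence number just past its last byte. The names are hypothetical and this
is not the FreeBSD implementation; the wraparound-safe comparison follows the
same idea as the BSD SEQ_LEQ macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modulo-2^32 comparison of TCP sequence numbers (same idea as BSD
 * SEQ_LEQ): true if a is at or before b, even across wraparound. */
static int seq_leq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) <= 0;
}

/* Hypothetical per-connection list of cached replies. reply_end[i] is the
 * sequence number just past the last byte of reply i. */
#define MAX_PENDING 64
struct conn_cache {
    uint32_t reply_end[MAX_PENDING];
    size_t   count;
};

static void cache_reply(struct conn_cache *c, uint32_t end_seq)
{
    assert(c->count < MAX_PENDING);
    c->reply_end[c->count++] = end_seq;
}

/* On an ACK from the client (ack = next byte it expects), discard every
 * entry whose reply has been fully acknowledged. Returns entries freed. */
static size_t on_ack(struct conn_cache *c, uint32_t ack)
{
    size_t kept = 0, freed = 0;
    for (size_t i = 0; i < c->count; i++) {
        if (seq_leq(c->reply_end[i], ack))
            freed++;
        else
            c->reply_end[kept++] = c->reply_end[i];
    }
    c->count = kept;
    return freed;
}
```

If the client's ACKs stop advancing, on_ack() frees nothing and the count for
that connection only grows, which is how a large unACKed window could push the
server toward the flood level.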

Good luck with it and please let me know how it goes, rick


