Date:      Sat, 14 Apr 2018 01:49:33 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Niels Kobschätzki <niels@kobschaetzki.net>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: High rate of NFS cache misses after upgrading from 10.3-prerelease to 11.1-release
Message-ID:  <YQBPR0101MB1042D2F0CE2575EB4F17588ADDB20@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <ce3712c0-626e-c8f2-3bba-933cf359bcef@kobschaetzki.net>
References:  <ce3712c0-626e-c8f2-3bba-933cf359bcef@kobschaetzki.net>

Niels Kobschätzki wrote:
>Sorry for the cross-posting, but so far I have had no real luck on the forum
>or on questions, so I want to try my luck here as well.
I read email lists but don't do the other stuff, so I only saw this yesterday.
Short answer: I haven't a clue why the cache hit rate would have changed.

The code that decides whether there is a hit or a miss in the attribute cache
is in ncl_getattrcache(), and that code hasn't changed between 10.3 and 11.1,
except that the old code did a mtx_lock(&Giant). I can't imagine how that
would affect the hit/miss decision.
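To make the hit/miss decision concrete, here is a simplified user-space model in C. It assumes, without having been checked line by line against the 11.1 source, that ncl_getattrcache() derives "timeo" as one tenth of the file's age (time_second minus the modification time), clamps it between a minimum and a maximum, and counts a miss once the cached attributes are at least "timeo" seconds old. The MODEL_* limits and the model_* functions are hypothetical, and the limit values are illustrative only; the real code presumably takes its limits from the mount's acregmin/acregmax/acdirmin/acdirmax settings and checks further conditions that are not modelled here.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, illustrative limits in seconds (not read from any mount). */
#define MODEL_MINATTRTIMO     3
#define MODEL_MAXATTRTIMO    60
#define MODEL_MINDIRATTRTIMO 30
#define MODEL_MAXDIRATTRTIMO 60

/* Model of "timeo": one tenth of the file's age, clamped to [min, max]. */
long model_attr_timeout(time_t now, time_t mtime, int is_dir)
{
    long minto = is_dir ? MODEL_MINDIRATTRTIMO : MODEL_MINATTRTIMO;
    long maxto = is_dir ? MODEL_MAXDIRATTRTIMO : MODEL_MAXATTRTIMO;
    long timeo;

    assert(minto <= maxto);
    timeo = (long)((now - mtime) / 10);
    if (timeo < minto)
        timeo = minto;
    else if (timeo > maxto)
        timeo = maxto;
    return timeo;
}

/*
 * Model of the miss test: returns 1 (miss) once the cached attributes,
 * stamped at "attrstamp", are at least "timeo" seconds old; 0 (hit) otherwise.
 */
int model_attrcache_miss(time_t now, time_t mtime, time_t attrstamp, int is_dir)
{
    return (now - attrstamp) >= model_attr_timeout(now, mtime, is_dir);
}
```

Under this model, a modification time that lies in the client's future makes the age negative, so "timeo" is clamped to the minimum and cached attributes expire sooner. That is one way a bogus "n_mtime" could inflate the miss count, and it is the kind of thing the printf() suggestion below is meant to expose.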

You might want to run:
# sysctl -a | fgrep vfs.nfs
on both the 10.3 and 11.1 systems, to check whether any defaults have somehow
changed. (I don't recall any being changed, but it's worth checking.)
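Comparing two long sysctl listings by eye is error-prone. The following C sketch assumes you have saved the output of the command above from each system and loaded each capture into a string of "name: value" lines; sysctl_lookup() and sysctl_diff() are hypothetical helpers, not part of FreeBSD. It prints every entry from the first capture whose value differs in, or is missing from, the second.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 256

/*
 * Find the line "name: value" in buf and copy value (leading spaces
 * skipped, truncated to fit) into out. Returns 1 if found, 0 otherwise.
 */
int sysctl_lookup(const char *buf, const char *name, char *out, size_t outlen)
{
    size_t nlen = strlen(name);
    const char *p = buf;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        /* Require the ':' right after the name so "a.b" does not match "a.bc". */
        if (len > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == ':') {
            const char *v = p + nlen + 1;
            size_t vlen = len - nlen - 1;

            while (vlen > 0 && *v == ' ') {
                v++;
                vlen--;
            }
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = eol ? eol + 1 : p + len;
    }
    return 0;
}

/*
 * Print "name: old -> new" for every entry of old_buf whose value differs
 * in new_buf (or is absent from it). Entries only in new_buf are not
 * reported. Returns the number of differences.
 */
int sysctl_diff(const char *old_buf, const char *new_buf)
{
    int diffs = 0;
    const char *p = old_buf;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        const char *colon = memchr(p, ':', len);

        if (colon != NULL && (size_t)(colon - p) < MAX_LINE) {
            char name[MAX_LINE], oldv[MAX_LINE], newv[MAX_LINE];
            size_t nlen = (size_t)(colon - p);

            memcpy(name, p, nlen);
            name[nlen] = '\0';
            sysctl_lookup(old_buf, name, oldv, sizeof(oldv));
            if (!sysctl_lookup(new_buf, name, newv, sizeof(newv)))
                strcpy(newv, "(missing)");
            if (strcmp(oldv, newv) != 0) {
                printf("%s: %s -> %s\n", name, oldv, newv);
                diffs++;
            }
        }
        p = eol ? eol + 1 : p + len;
    }
    return diffs;
}
```

Passing the 10.3 capture first and the 11.1 capture second lists each changed setting in the 10.3 -> 11.1 direction; a return value of 0 means no vfs.nfs value from the 10.3 system differs on the 11.1 system.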

You could go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c}
and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the
top, where it calculates "timeo" from them.
Running this hacked kernel might show you whether either of these fields is
bogus.
(You could then printf() "timeo" and "np->n_attrtimeo" just before the "if"
clause that increments "attrcache_misses", which is where the cache misses
happen, to see why it is missing the cache.)
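A sketch of the two printf() calls described above, assuming the variable names given in this message and assuming each is an integer type (hence the casts to intmax_t for %jd). It is a fragment to paste into ncl_getattrcache(), not a standalone program, and the exact placement relative to the existing code is an assumption. Because it prints on every call, expect a lot of console output on a busy client.

```c
/* Near the top of ncl_getattrcache(), right after "timeo" is computed: */
printf("ncl_getattrcache: time_second=%jd n_mtime=%jd\n",
    (intmax_t)time_second, (intmax_t)np->n_mtime.tv_sec);

/* Just before the "if" clause that increments attrcache_misses: */
printf("ncl_getattrcache: timeo=%jd n_attrtimeo=%jd\n",
    (intmax_t)timeo, (intmax_t)np->n_attrtimeo);
```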
If you could do this for the 10.3 kernel as well, it might indicate why the
miss rate has increased.

>I upgraded a machine from 10.3-Prerelease (custom kernel with
>tcp_fastopen added) to 11.1-Release (standard kernel) with
>freebsd-update. I have two other machines that are still on
>10.3-Prerelease. Those machines mount an NFS export from a
>Linux NFS server and use NFSv3. In munin, the upgraded machine now shows
>far more getattr cache misses than the 10.3 machines (we are talking
>about a factor of 100). munin also shows many more cache misses for
>other metrics like biow, biorl, biod, etc. (Where can I find what those
>metrics mean? Currently I don't even understand what they are.)
>
>Can anybody help me debug this problem, or does anyone have an idea what
>could cause it? As a result of this behavior, this machine performs
>worse than the others, and I cannot upgrade the other machines until
>I have fixed this bug.
I haven't run a 10.x system in quite a while. When I get home in a few days,
I might be able to reproduce this. If I can, I can poke at it, but it would be
at least a week before I might have an answer, and I may not figure it out for
a long time.

rick


