Date:      Thu, 24 Nov 2016 22:45:51 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Alan Somers <asomers@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   Re: NFSv4 performance degradation with 12.0-CURRENT client
Message-ID:  <YTXPR01MB0189C3E11821E4F7B7DF1814DDB60@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CAOtMX2hBXAJN_udED-u5%2B6UznR2%2BW88xgb=RqKSZL65Z3%2BcKOw@mail.gmail.com>
References:  <CAOtMX2jJ2XoQyVG1c04QL7NTJn1pg38s=XEgecE38ea0QoFAOw@mail.gmail.com> <20161124090811.GO54029@kib.kiev.ua> <YTXPR01MB0189E0B1DB5B16EE6B388B7DDDB60@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>, <CAOtMX2hBXAJN_udED-u5%2B6UznR2%2BW88xgb=RqKSZL65Z3%2BcKOw@mail.gmail.com>

asomers@gmail.com wrote:
[stuff snipped]
>I've reproduced the issue on stock FreeBSD 12, and I've also learned
>that nullfs is a required factor.  Doing the buildworld directly on
>the NFS mount doesn't cause any slowdown, but doing a buildworld on
>the nullfs copy of the NFS mount does.  The slowdown affects the base
>NFS mount as well as the nullfs copy.  Here is the nfsstat output for
>both server and client during "ls -al" on the client:
>
>nfsstat -e -s -z
If you do this again, avoid using "-z" and I think you'll see the Opens (below Server:)
going up and up...
>
>Server Info:
>  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>      800         0       121         0         0         2         0         0
>   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>        0         0         0         0         0         0         0         8
>    Mknod    Fsstat    Fsinfo  PathConf    Commit   LookupP   SetClId SetClIdCf
>        0         0         0         0         1         3         0         0
>     Open  OpenAttr OpenDwnGr  OpenCfrm DelePurge   DeleRet     GetFH      Lock
>        0         0         0         0         0         0       123         0
>    LockT     LockU     Close    Verify   NVerify     PutFH  PutPubFH PutRootFH
>        0         0         0         0         0       674         0         0
>    Renew RestoreFH    SaveFH   Secinfo RelLckOwn  V4Create
>        0         0         0         0         0         0
>Server:
>Retfailed    Faults   Clients
>        0         0         0
>OpenOwner     Opens LockOwner     Locks    Delegs
>        0         0         0         0         0
Oops, I think this is an nfsstat bug. I don't normally use "-z", so I didn't notice
that it clears these counts. It probably should not, since they are counts of how many
of these are currently allocated.
I'll check this. (Not relevant to this issue, but it needs fixing. ;-)
>Server Cache Stats:
>   Inprog      Idem  Non-idem    Misses CacheSize   TCPPeak
>        0         0         0       674     16738     16738
>
>nfsstat -e -c -z
>Client Info:
>Rpc Counts:
>  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>       60         0       119         0         0         0         0         0
>   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>        0         0         0         0         0         0         0         3
>    Mknod    Fsstat    Fsinfo  PathConf    Commit   SetClId SetClIdCf      Lock
>        0         0         0         0         0         0         0         0
>    LockT     LockU      Open   OpenCfr
>        0         0         0         0
>OpenOwner     Opens LockOwner     Locks    Delegs  LocalOwn LocalOpen LocalLOwn
>     5638    141453         0         0         0         0         0         0
Ok, I think this shows us the problem. 141453 opens is a lot, and the client would have
to check these every time another open is done (there goes all that CPU ;-).
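A toy illustration of why a huge number of outstanding Opens burns CPU, assuming the client locates an existing Open by walking a flat list. This is a simplification: the real FreeBSD NFSv4 client data structures may differ, and the names OpenState and find_open are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical stand-in for one NFSv4 Open held by the client:
# a (file handle, open owner) pair. Not FreeBSD's actual structure.
@dataclass(frozen=True)
class OpenState:
    fh: int
    owner: int


def find_open(opens, fh, owner):
    """Linear search over all Opens; returns (match or None, entries examined)."""
    examined = 0
    for op in opens:
        examined += 1
        if op.fh == fh and op.owner == owner:
            return op, examined
    return None, examined


if __name__ == "__main__":
    # 141453 is the Opens count reported by "nfsstat -e -c" above.
    opens = [OpenState(fh=i, owner=1) for i in range(141453)]
    # Opening a file that has no existing Open must examine every entry.
    match, examined = find_open(opens, fh=-1, owner=1)
    print(match, examined)  # None 141453
```

Under this assumption, every new open pays a cost proportional to the number of Opens already held, so the cost keeps growing as Opens accumulate.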

Now, why has this occurred?
Well, the NFSv4 client can't close NFSv4 Opens on a vnode until that vnode's
v_usecount goes to 0. This is because mmap'd files might do I/O after the file
descriptor is closed.
Now, hopefully Kostik will know something about nullfs and can help with this.
My guess is that nullfs ends up acquiring a refcnt on the NFS vnode so the
v_usecount doesn't go to 0 and, therefore, the client never closes the NFSv4 Opens.
Kostik, do you know if this is the case and whether or not it can be changed?
>LocalLock
>        0
>Rpc Info:
>TimedOut   Invalid X Replies   Retries  Requests
>        0         0         0         0       662
>Cache Info:
>Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
>     1275        58       837       121         0         0         0         0
>BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
>        1         0         6         0         1         0
>
[more stuff snipped]
>What role could nullfs be playing?
As noted above, my hunch is that it is acquiring a refcnt on the NFS client vnode such
that the v_usecount doesn't go to zero (at least for a long time), and without
a VOP_INACTIVE() on the NFSv4 vnode, the NFSv4 Opens don't get closed and
accumulate.
(If that isn't correct, it is somehow interfering with the client Closing the NFSv4 Opens
in some other way.)
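A toy model of the hypothesis above, written in Python for brevity. Everything in it is hypothetical: it is not kernel code, the names (NfsVnode, access_file, outstanding_opens) are invented for illustration, and whether nullfs really holds a reference like this is exactly the open question. The model only encodes the rule stated above: an Open is Closed when the vnode's use count drops to 0 (the VOP_INACTIVE() path). Under that rule, an extra reference held by an upper layer keeps every Open alive.

```python
class NfsVnode:
    """Hypothetical stand-in for an NFS client vnode."""

    def __init__(self):
        self.usecount = 0
        self.open_held = False  # is an NFSv4 Open still outstanding?

    def vref(self):
        self.usecount += 1

    def vrele(self):
        self.usecount -= 1
        if self.usecount == 0:
            self.inactive()

    def inactive(self):
        # Stand-in for VOP_INACTIVE(): only now is it safe to Close,
        # since mmap'd I/O may outlive the file descriptor.
        self.open_held = False


def access_file(vp):
    """One open(2)/close(2) cycle on a file."""
    vp.vref()            # open(2) takes a use reference
    vp.open_held = True  # client acquires an NFSv4 Open
    vp.vrele()           # close(2) drops the reference


def outstanding_opens(nfiles, via_nullfs):
    vnodes = [NfsVnode() for _ in range(nfiles)]
    for vp in vnodes:
        if via_nullfs:
            # Hypothesis under test: the nullfs upper vnode keeps a
            # reference on the lower NFS vnode and does not release it.
            vp.vref()
        access_file(vp)
    return sum(vp.open_held for vp in vnodes)


if __name__ == "__main__":
    print(outstanding_opens(1000, via_nullfs=False),
          outstanding_opens(1000, via_nullfs=True))  # 0 1000
```

With via_nullfs=False every Open is Closed on the last vrele(). With via_nullfs=True none are, which would match the growing Opens count on the client.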

rick



