Date:      Sat, 14 Oct 2006 16:06:53 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        fs@freebsd.org
Cc:        mohans@freebsd.org
Subject:   Re: lost dotdot caching pessimizes nfs especially
Message-ID:  <20061014143825.F1264@epsplex.bde.org>
In-Reply-To: <20061006050913.Y5250@epsplex.bde.org>
References:  <20061006050913.Y5250@epsplex.bde.org>

On Fri, 6 Oct 2006, Bruce Evans wrote:

> This change:
>
> % Index: vfs_cache.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v
> % retrieving revision 1.102
> % retrieving revision 1.103
> % diff -u -2 -r1.102 -r1.103
> % --- vfs_cache.c	13 Jun 2005 05:59:59 -0000	1.102
> % +++ vfs_cache.c	17 Jun 2005 01:05:13 -0000	1.103
> % ...
>
> is responsible for about half of the performance loss since RELENG_4
> for building kernels over nfs (/usr and sys trees on nfs).  The kernel
> build uses "../../" a lot, and the above change apparently results in
> lots of network activity for things that should be cached locally.
>
> Some times for building a RELENG_4 kernel under conditions invariant
> except for the host kernel (after "make clean; sleep 2; make depend;
> make; make clean; sleep 2; make depend" to warm up caches):
>
> kernel:
> RELENG_4                 77.51 real        60.62 user         4.36 sys
> current.2004.07.01       ~78.5 (lost details)
> current.2005.01.01       ~79 (lost details)
> current.2005.06.17       82.42 real        62.50 user         4.71 sys
> current.2005.06.19       89.53 real        62.18 user         5.44 sys
> current.2005.06.17+      ~89.5 (lost details)
>               .17+ = .17 plus above change
> current.2005.06.17+*     86.08 real        62.43 user         5.13 sys
>               .17+* = .17+ with ../.. in Makefile avoided using a symlink
> 			    @ -> <path to sys not using ..>
> RELENG_6                 91.14 real        62.04 user         5.71 sys
> current                  similar to RELENG_6 (lost details)
>
> The total performance loss is about 18%.
>
> The total performance loss for a local sys tree (/usr still on nfs) is much
> smaller (about 4%):
>
> RELENG_4                 65.19 real        60.50 user         3.95 sys
> current.2005.06.17       67.49 real        62.13 user         4.27 sys
> RELENG_6                 67.83 real        61.84 user         4.71 sys
> current                  similar to RELENG_6 (lost details)
>
> The nfs performance for building of things that should be entirely
> cached locally is very dependent on network latency.  Not caching
> things very well causes lots of unnecessary network traffic for Getattr
> and Lookup.  The packets are small, so throughput is unimportant and
> latency dominates.  For building over nfs without -j, the dead time
> (real - user - sys) is almost directly proportional to the latency.
> My usual local network has fairly low latency (~100uS unloaded) and
> the ~14 seconds dead time in the above is for it.  Switching to a 1
> Gbps network with lower quality NICs gives an unloaded latency of ~160uS
> and a dead time of ~21 seconds.  Building with -j helps even for UP,
> at the cost of extra CPU, by letting some processes advance using cached
> stuff while others are waiting for the network.  Building with -j helps
> even more on FreeBSD cluster machines, more because they have a much
> higher network latency than because they are SMP.
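The dead time and percentage figures quoted above are simple arithmetic
on time(1) output.  A minimal sketch of that arithmetic follows; the
struct and function names are made up for illustration and are not from
any tool used for the measurements:

```c
/* One row of time(1) output, in seconds. */
struct build_time {
	double real;	/* wall-clock time */
	double user;	/* user CPU time */
	double sys;	/* system CPU time */
};

/* Dead time as defined above: real - user - sys. */
static double
dead_time(struct build_time t)
{
	return (t.real - t.user - t.sys);
}

/* Increase in real time of cur relative to base, as a percentage. */
static double
slowdown_percent(struct build_time base, struct build_time cur)
{
	return (100.0 * (cur.real - base.real) / base.real);
}
```

Feeding it the RELENG_4 and RELENG_6 rows for the nfs sys tree gives a
slowdown of a little under 18%, and the rows for the local sys tree give
about 4%.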

I finished finding almost all the lost performance.  As indicated above,
it was almost all in nfs.

This change:

% Index: nfs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/nfsclient/nfs_vnops.c,v
% retrieving revision 1.235
% retrieving revision 1.236
% diff -u -2 -r1.235 -r1.236
% --- nfs_vnops.c	6 Dec 2004 18:52:28 -0000	1.235
% +++ nfs_vnops.c	6 Dec 2004 19:18:00 -0000	1.236
% @@ -418,10 +418,11 @@
%  		if (error)
%  			return (error);
% -		np->n_mtime = vattr.va_mtime.tv_sec;
% +		np->n_mtime = vattr.va_mtime;
%  	} else {
% +		np->n_attrstamp = 0;
    		^^^^^^^^^^^^^^^^^^^^
%  		error = VOP_GETATTR(vp, &vattr, ap->a_cred, ap->a_td);
%  		if (error)
%  			return (error);
% -		if (np->n_mtime != vattr.va_mtime.tv_sec) {
% +		if (NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
%  			if (vp->v_type == VDIR)
%  				np->n_direofoffset = 0;

and associated changes give silly behaviour that almost doubles the
number of Access RPCs.  One of the associated changes clears n_attrstamp
on close().  Then on open(), since lookup() is called before the above
is reached, nfs_access_otw() has always just been called, and the above
forces another call.
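
The interaction can be sketched with a toy model.  This is not the
nfs client code: only the field name n_attrstamp comes from the diff
above, everything prefixed with toy_ is hypothetical, and attribute
cache timeouts are ignored.  It assumes that a zero n_attrstamp means
"cached attributes are invalid" and that each refresh costs one Access
RPC:

```c
struct toy_nfsnode {
	long n_attrstamp;	/* 0 means cached attributes are invalid */
	long access_rpcs;	/* number of simulated Access RPCs */
};

/* Simulated over-the-wire Access call; it also refreshes the attributes. */
static void
toy_access_otw(struct toy_nfsnode *np, long now)
{
	np->access_rpcs++;
	np->n_attrstamp = now;
}

/* Go over the wire only if the cached attributes are invalid. */
static void
toy_getattr(struct toy_nfsnode *np, long now)
{
	if (np->n_attrstamp == 0)
		toy_access_otw(np, now);
}

static void
toy_open(struct toy_nfsnode *np, long now, int clear_in_open)
{
	toy_getattr(np, now);		/* lookup() runs first */
	if (clear_in_open)
		np->n_attrstamp = 0;	/* the clearing marked above */
	toy_getattr(np, now);		/* stands in for VOP_GETATTR() in open */
}

/* One of the associated changes: invalidate on close(). */
static void
toy_close(struct toy_nfsnode *np)
{
	np->n_attrstamp = 0;
}

/* Count Access RPCs for a number of open()/close() cycles on one file. */
static long
toy_count_access_rpcs(int cycles, int clear_in_open)
{
	struct toy_nfsnode node = { 0, 0 };
	long now;

	for (now = 1; now <= cycles; now++) {
		toy_open(&node, now, clear_in_open);
		toy_close(&node);
	}
	return (node.access_rpcs);
}
```

In this model every open()/close() cycle costs two Access RPCs with the
clearing in open and one without it.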

Counting RPCs gives a good metric for the pessimizations.  Removing the
above clearing in RELENG_6 gives the following improvement:

Before:
        89.90 real        62.16 user         5.50 sys
  Lookup Read Write Create Access Fsstat Setattr Other   Total
   60010 2410  5353    442  43785   1742    5194     6  118942
After:
        86.46 real        62.22 user         5.21 sys
  Lookup Read Write Create Access Fsstat Setattr Other   Total
   59986 2410  5353    442  20935   1742    5194     6   96068

Note the RPC delta-counts barely changed except for the Access one.
About 20000 Access calls were avoided.  Just removing the clearing
is not correct but is close.
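
For anyone repeating the comparison, here is a rough sketch of the
per-type subtraction.  The enum, array layout and function names are
assumptions made for this example; the counts themselves would be
copied by hand from tables like the ones above:

```c
#include <stdio.h>

/* Columns in the order used by the tables above. */
enum rpc_type {
	RPC_LOOKUP, RPC_READ, RPC_WRITE, RPC_CREATE,
	RPC_ACCESS, RPC_FSSTAT, RPC_SETATTR, RPC_OTHER,
	RPC_NTYPES
};

static const char *rpc_names[RPC_NTYPES] = {
	"Lookup", "Read", "Write", "Create",
	"Access", "Fsstat", "Setattr", "Other"
};

/* Sum of all columns; should match the Total column. */
static long
rpc_total(const long counts[RPC_NTYPES])
{
	long total = 0;
	int i;

	for (i = 0; i < RPC_NTYPES; i++)
		total += counts[i];
	return (total);
}

/* delta[i] = before[i] - after[i], i.e. the number of RPCs avoided. */
static void
rpc_delta(const long before[RPC_NTYPES], const long after[RPC_NTYPES],
    long delta[RPC_NTYPES])
{
	int i;

	for (i = 0; i < RPC_NTYPES; i++)
		delta[i] = before[i] - after[i];
}

/* Print only the columns that changed. */
static void
rpc_print_delta(const long delta[RPC_NTYPES])
{
	int i;

	for (i = 0; i < RPC_NTYPES; i++)
		if (delta[i] != 0)
			printf("%-8s %ld\n", rpc_names[i], delta[i]);
}
```

With the Before and After rows above, the only nonzero deltas are 24
for Lookup and 22850 for Access.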

The pessimization in vfs_cache.c 1.103 is now easy to quantify.  It
triples the number of Lookup RPCs.  Removing it in addition to the
above gives a much larger improvement:

        79.24 real        61.87 user         5.04 sys
  Lookup Read Write Create Access Fsstat Setattr Other   Total
   19548 2410  5353    442  20922   1742    5194     6   55617

Note the RPC delta-counts barely changed except for the Lookup one.
About 40000 Lookup calls were avoided.  Just removing the change in
vfs_cache.c 1.103 is not close to being correct.
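
To illustrate why uncached ".." entries hurt so much when paths like
"../../" are looked up repeatedly, here is a toy model.  It is not the
algorithm in vfs_cache.c: the cache is just a list of path prefixes,
every miss is counted as one Lookup RPC, and all names are
hypothetical:

```c
#include <string.h>

#define TOY_CACHE_MAX	64
#define TOY_KEY_MAX	128

struct toy_namecache {
	char keys[TOY_CACHE_MAX][TOY_KEY_MAX];	/* cached path prefixes */
	int nkeys;
	int cache_dotdot;	/* nonzero: ".." entries may be cached */
	long lookup_rpcs;	/* simulated Lookup RPCs */
};

static int
toy_cache_has(const struct toy_namecache *nc, const char *key)
{
	int i;

	for (i = 0; i < nc->nkeys; i++)
		if (strcmp(nc->keys[i], key) == 0)
			return (1);
	return (0);
}

/* Resolve path one component at a time, counting cache misses. */
static void
toy_lookup_path(struct toy_namecache *nc, const char *path)
{
	char prefix[TOY_KEY_MAX];
	size_t len, plen = 0;
	const char *p = path;
	int cacheable;

	while (*p != '\0') {
		len = strcspn(p, "/");
		if (len > 0) {
			if (plen + len + 2 > sizeof(prefix))
				return;		/* too long for this toy */
			cacheable = nc->cache_dotdot ||
			    !(len == 2 && p[0] == '.' && p[1] == '.');
			prefix[plen++] = '/';
			memcpy(prefix + plen, p, len);
			plen += len;
			prefix[plen] = '\0';
			if (!cacheable || !toy_cache_has(nc, prefix)) {
				nc->lookup_rpcs++;
				if (cacheable && nc->nkeys < TOY_CACHE_MAX)
					strcpy(nc->keys[nc->nkeys++], prefix);
			}
		}
		p += len;
		if (*p == '/')
			p++;
	}
}

/* Look up the same path repeatedly and return the RPC count. */
static long
toy_count_lookups(const char *path, int repeats, int cache_dotdot)
{
	struct toy_namecache nc;
	int i;

	memset(&nc, 0, sizeof(nc));
	nc.cache_dotdot = cache_dotdot;
	for (i = 0; i < repeats; i++)
		toy_lookup_path(&nc, path);
	return (nc.lookup_rpcs);
}
```

In this model, resolving "../../kern/vfs_cache.c" ten times costs 4
simulated RPCs when ".." is cacheable and 22 when it is not.  The real
ratio depends on the workload; the measured one above is about 3.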

The last major pessimization is another silly one.  The changes to
mark atimes on exec() and mmap() cause a silly null Setattr RPC for
every exec() (more for interpreters?) and every mmap().  This is
easy to fix (almost) correctly.  VOP_SETATTR() is assumed to do
nothing for requests that it doesn't understand, but nfs_setattr()
does null RPCs instead.  The following fix:

% diff -c2 ./nfsclient/nfs_vnops.c~ ./nfsclient/nfs_vnops.c
% *** ./nfsclient/nfs_vnops.c~	Sun Oct  8 23:08:57 2006
% --- ./nfsclient/nfs_vnops.c	Fri Oct 13 09:58:12 2006
% ***************
% *** 669,675 ****
% 
%   	/*
% ! 	 * Setting of flags is not supported.
%   	 */
% ! 	if (vap->va_flags != VNOVAL)
%   		return (EOPNOTSUPP);
% 
% --- 677,684 ----
% 
%   	/*
% ! 	 * Setting of flags and marking of atimes are not supported.
%   	 */
% ! 	if (vap->va_flags != VNOVAL ||
% ! 	    ((bdefix & 4) && (vap->va_vaflags & VA_MARK_ATIME)))
%   		return (EOPNOTSUPP);
%

in addition to the removals gives the following improvement with
bdefix set to 7:

        78.14 real        62.03 user         4.79 sys
  Lookup Read Write Create Access Fsstat Other   Total
   19556 2410  5353    442  19581   1738    14   49094
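
The shape of the check in the fix can be shown outside the kernel with
a small stand-alone sketch.  Everything prefixed with toy_ or TOY_ is a
stand-in invented for this example, including the numeric values of the
constants; toy_fixmask plays the role of bdefix, and only its bit 4 is
modelled since the other bits are not shown in the diff:

```c
#include <errno.h>

#define TOY_VNOVAL		(-1L)	/* stand-in for VNOVAL */
#define TOY_VA_MARK_ATIME	0x04	/* placeholder value for VA_MARK_ATIME */

struct toy_vattr {
	long va_flags;
	int va_vaflags;
};

static int toy_fixmask = 7;	/* stand-in for bdefix */
static long toy_setattr_rpcs;	/* simulated Setattr RPCs */

/*
 * Reject unsupported requests before anything is sent.  Without the
 * second test, a request that only marks the atime falls through and
 * costs a null RPC.
 */
static int
toy_setattr(const struct toy_vattr *vap)
{
	if (vap->va_flags != TOY_VNOVAL ||
	    ((toy_fixmask & 4) && (vap->va_vaflags & TOY_VA_MARK_ATIME)))
		return (EOPNOTSUPP);
	toy_setattr_rpcs++;	/* stands in for the Setattr RPC */
	return (0);
}
```

With toy_fixmask set to 7 a mark-atime request returns EOPNOTSUPP and
the counter stays put; with the bit clear the same request is counted
as an RPC.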

Bruce


