Date:      Mon, 15 Apr 2013 22:19:23 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ?
Message-ID:  <1236177219.867591.1366078763224.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20130415184639.V1081@besplex.bde.org>

Bruce Evans wrote:
> On Sun, 14 Apr 2013, Rick Macklem wrote:
> 
> > Paul van der Zwan wrote:
> >> On 14 Apr 2013, at 5:00 , Rick Macklem <rmacklem@uoguelph.ca>
> >> wrote:
> >>
> >> Thanks for taking the effort to send such an extensive reply.
> >>
> >>> Paul van der Zwan wrote:
> >>>> On 12 Apr 2013, at 16:28 , Paul van der Zwan
> >>>> <paulz@vanderzwan.org>
> >>>> wrote:
> > ...
> >>> In NFSv3, each RPC is a separately defined procedure and usually
> >>> includes attributes for files before and after the operation
> >>> (implicit getattrs not counted in the RPC counts reported by
> >>> nfsstat).
> >>>
> >>> For NFSv4, every RPC is a compound built up of a list of
> >>> Operations like Getattr. Since the NFSv4 server doesn't know what
> >>> the compound is doing, nfsstat reports the counts of Operations for
> >>> the NFSv4 server, so the counts will be much higher than with NFSv3,
> >>> but do not reflect the number of RPCs being done.
> >>> To get NFSv4 nfsstat output that can be compared to NFSv3, you need
> >>> to run the command on the client(s), and even then it is only
> >>> roughly the same.
> >>> (I just realized this should be documented in man nfsstat.)
> >>>
> >> I ran nfsstat -s -v 4 on the server and saw the number of requests
> >> being done. They were on the order of a few thousand per second for
> >> a single FreeBSD 9.1 client doing a make buildworld.
> >>
> > Yes, but as I noted above, for NFSv4, these are counts of
> > operations, not RPCs. Each RPC in NFSv4 consists of several
> > operations. For example, for read it is something like:
> > - PutFH, Read, Getattr
> >
> > As such, you need to do "nfsstat -e -c" on the client in order to
> > see how many RPCs are happening.
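To make the operations-versus-RPCs distinction concrete, here is a minimal Python sketch. It assumes a toy trace in which each RPC is one COMPOUND given as a list of operation names; the Read compound follows the "PutFH, Read, Getattr" example above, while the Lookup compound (PutFH, Lookup, GetFH, Getattr) is an illustrative assumption, not a captured trace. Counting compounds gives the client-style RPC count; counting the operations inside them gives the server-style per-operation totals.

```python
from collections import Counter

# Hypothetical trace: each entry is one NFSv4 RPC (a COMPOUND), written as
# the list of operations it carries. Illustrative only, not a real capture.
compounds = [
    ["PutFH", "Read", "Getattr"],
    ["PutFH", "Read", "Getattr"],
    ["PutFH", "Lookup", "GetFH", "Getattr"],  # assumed Lookup compound
]

# What client-side counting sees: one count per RPC.
rpc_count = len(compounds)

# What a per-operation server tally sees: every operation counted separately.
op_counts = Counter(op for compound in compounds for op in compound)
total_ops = sum(op_counts.values())

print(f"RPCs: {rpc_count}")
print(f"Operations: {total_ops}")
for op, n in sorted(op_counts.items()):
    print(f"  {op}: {n}")
```

With only three RPCs the per-operation tally already reaches ten, which is why server-side NFSv4 numbers look much larger than NFSv3 RPC counts for the same work.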
> 
> Does it show the number of physical RPC or only "roughly the same"?
> 
Yes, for NFSv4, the client-side counts are for the RPCs. The "roughly"
referred to the fact that an NFSv4 compound doesn't do exactly the
same thing as the corresponding NFSv3 RPC, although they tend to be
very similar.

> >>> For the FreeBSD NFSv4 client, the compounds include Getattr
> >>> operations similar to what NFSv3 does. It doesn't do a Getattr on
> >>> the directory for Lookup, because that would have made the compound
> >>> much more complex. I don't think this will have a significant
> >>> performance impact, but it will result in some additional Getattr
> >>> RPCs.
> >>>
> >> I ran snoop on port 2049 on the server and I saw a large number of
> >> lookups. A lot of them seem to be for directories which are part of
> >> the filenames of the compiler and include files, which are on the
> >> NFS-mounted /usr/obj. The same names keep reappearing, so it looks
> >> like there is no caching being done on the client.
> 
> When I worked on this in ~2007, unnecessary RPCs for lookup was a
> large cause of slowness. This was fixed in at least nfsv3. Almost
> all RPCs for makeworld (closer to 99% than 90%) should now be for open
> of the excessively layered and polluted include files, since they are
> opened so often compared with other files and every open goes to the
> server (except "nocto" should fix this). There are lots of lookups
> for the include files too, but the lookups are properly cached.
> 
> >> I tried the nocto option in /etc/fstab, but it does not show when
> >> mount shows the mounted filesystems, so I am not sure if it is being
> >> used.
> > Head (and I think stable9) is patched so that ``nfsstat -m`` shows
> > all the options actually being used. For 9.1, you just have to trust
> > that it has been set.
> 
> This doesn't work on ref10-amd64 running 10.0-CURRENT Apr 5.
> nfsstat -m gives null output. Plain nfsstat confirms that there are
> some nfs mounts, with so much activity on them that many of the cache
> counts are negative after 9 days of uptime.
> 
If both the kernel and the nfsstat binary are from Apr. 5, I think it
should work.
(It will only do the new/default NFS mounts, not oldnfs ones.)

I'll take another look, in case something got missed for the commit.

rick

> > ...
> >> I tried a make buildworld buildkernel with /usr/obj on a local FS in
> >> the Vbox VM, and it completed in about 2 hours. With /usr/obj on an
> >> NFSv4 filesystem it takes about a day. A twelvefold increase in
> >> elapsed time makes NFSv4 unusable for this use case.
> 
> That is extremely slow. Here I am unhappy with the makeworld time over
> nfs staying at about 13 minutes despite attempts to improve this, but
> I only have old slow hardware (2 core 2GHz Turion laptop). I also have
> a modified FreeBSD-5, which avoids some of the bloat in -current. My
> best time without excessive tuning was:
> 
> @ --------------------------------------------------------------
> @ >>> make world completed on Fri Nov 2 23:35:11 EST 2007
> @ (started Fri Nov 2 23:21:27 EST 2007)
> @ --------------------------------------------------------------
> @ 823.53 real 1295.80 user 192.46 sys
> @
> @  Lookup    Read  Access  Fsstat   Other   Total
> @  127134   23214  624060   24764      99  799271
> 
> The kernel was current at the time, but userland was ~5.2. Newer
> kernels (1-2 years old) are only a bit slower and don't require any
> modifications to get similar RPC counts (with Getattr instead of
> Access). /usr, including /usr/bin and /usr/src, was on nfs, but /bin
> and /usr/obj were local. Everything fit in RAM caches, so there was no
> disk activity except for new reads and new writes. Network latency
> was tuned to 60 usec (min for ping).
> 
> When nfs was pessimized, the above RPC counts blew out to no more than
> 2 million. Suppose you have 2 million RPCs with a latency of just 65
> usec. That gives a total latency of 130 seconds. Not too bad, but
> large compared with 823 seconds. The latency is amortized by having
> more than 1 CPU and/or building concurrently. Then progress can
> usually be made in some threads while others are blocked waiting for
> the RPCs. However, many networks have latencies much larger than 65
> usec. On the freebsd cluster now, the min latency is about 250 usec,
> and since it has multiple users the latency is sometimes over 1 msec.
> 2 million RPCs with a latency of 1 msec take 2000 seconds, which is a
> lot compared with a build time of 823 seconds.
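The arithmetic above (total wait = number of RPCs x per-RPC latency) can be checked with a short Python sketch. It assumes every RPC is waited for serially, so the results are upper bounds; as noted above, multiple CPUs or parallel jobs hide part of this. The RPC counts and latencies are the ones quoted in this thread; applying the 60 usec latency to the 2007 run's total is an extrapolation, not a measured figure.

```python
# Rough serialized-latency estimate: total wait = RPC count * per-RPC latency.
# Assumes no overlap between RPCs (no parallel make, single CPU), so the
# results are upper bounds on time spent blocked on the server.

BUILD_TIME_S = 823.53  # "real" time of the 2007 makeworld quoted above

# Per-type RPC counts from that run: Lookup, Read, Access, Fsstat, Other.
counts_2007 = [127_134, 23_214, 624_060, 24_764, 99]
total_2007 = sum(counts_2007)  # should equal the "Total" column


def serialized_rpc_time(rpcs: int, latency_s: float) -> float:
    """Seconds spent blocked if each of `rpcs` RPCs waits `latency_s` seconds."""
    return rpcs * latency_s


scenarios = {
    "2007 run, 60 usec": (total_2007, 60e-6),
    "pessimized, 65 usec": (2_000_000, 65e-6),
    "busy cluster, 1 msec": (2_000_000, 1e-3),
}

for name, (rpcs, latency) in scenarios.items():
    wait = serialized_rpc_time(rpcs, latency)
    print(f"{name}: {rpcs} RPCs -> {wait:.0f} s "
          f"({wait / BUILD_TIME_S:.0%} of the {BUILD_TIME_S} s build)")
```

For the 2007 run this gives roughly 48 s of serialized wait, about 6% of the build time, which is roughly consistent with the 5% to 10% NFS overhead Bruce reports; the 1 msec case exceeds the whole build time more than twice over.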
> 
> I consider "nocto" to be excessive tuning, since although it would
> help makeworld benchmarks it is unsafe in general. Of course I tried
> my version of it in the above. (The above RPC counts are with the
> following critical modifications that weren't in FreeBSD at the time:
> - negative caching
> - fix for broken dotdot caching
> - fix for broken "cto". It did twice as many RPCs as needed.)
> Adding the equivalent of "nocto" reduced the RPC counts significantly,
> but only reduced the real time by about 20 (?) seconds.
> 
> > Source builds on NFS mounts are notoriously slow. A big part of this is
> 
> Only when misconfigured. The nfs build time in the above is between 5%
> and 10% slower than the local build time.
> 
> > the synchronous writes that get done because there is only one dirty
> > byte range for a block and the loader loves to write small
> > non-contiguous areas of its output file.
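The effect of keeping only one dirty byte range per block can be illustrated with a deliberately simplified Python model. This is an assumption-laden sketch, not the FreeBSD NFS client's buffer code: the `Buffer` class and its merge rule are hypothetical. A write that overlaps or touches the current dirty range is merged into it; a disjoint write forces the old range to be written to the server synchronously first. Sequential appends never trigger such a flush, while small scattered writes, like the loader's (ld's) writes into its output file, trigger one almost every time.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Buffer:
    """Toy model of one cached block that can track only ONE dirty range.

    Hypothetical illustration; not the actual FreeBSD buffer cache code.
    """

    dirty: Optional[Tuple[int, int]] = None  # half-open byte range [start, end)
    sync_flushes: int = 0  # synchronous writes to the server forced so far

    def write(self, start: int, length: int) -> None:
        end = start + length
        if self.dirty is None:
            self.dirty = (start, end)
            return
        d_start, d_end = self.dirty
        if start <= d_end and end >= d_start:
            # Overlapping or adjacent: grow the single dirty range.
            self.dirty = (min(start, d_start), max(end, d_end))
        else:
            # Disjoint: the existing range must be pushed to the server
            # synchronously before the new range can be recorded.
            self.sync_flushes += 1
            self.dirty = (start, end)


# Sequential 512-byte appends within a 4 KB block: always contiguous.
sequential = Buffer()
for off in range(0, 4096, 512):
    sequential.write(off, 512)

# Small writes scattered across the same block.
scattered = Buffer()
for off in (0, 2048, 512, 3072, 1024):
    scattered.write(off, 16)

print("sequential:", sequential.sync_flushes, "sync flushes")
print("scattered: ", scattered.sync_flushes, "sync flushes")
```

The model ignores the other points at which a client writes data back (for example on close), so it only isolates the single-range effect described above.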
> 
> Writing to nfs would be slow, but I made /usr/obj local to avoid it.
> Also, in other (kernel build) tests where object files are written to
> the current directory, which is on nfs, the non-separate object
> directory is mounted async on the server, so it is fast enough. Now my
> reference is building a FreeBSD-4 kernel. My best times were:
> - 32+ seconds (src and obj on nfs, async, -j4)
> - 30- seconds (src and obj on ffs, async, -j4)
> - 64+ (?) seconds (src and obj on nfs, async, -j1)
> - 58 (?) seconds (src and obj on ffs, async, -j1)
> (/usr on nfs, /bin on ffs). Without parallelism, everything has to
> wait for the RPCs, and even with low network latency this costs 5-10%.
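The relative cost of NFS in these kernel-build timings is simple to compute. A small Python sketch, using the times listed above as plain numbers (the "+", "-" and "(?)" qualifiers mean they are approximate, so the percentages are too):

```python
# Best FreeBSD-4 kernel build times quoted above, in seconds (approximate).
times = {
    "-j4": {"nfs": 32.0, "ffs": 30.0},
    "-j1": {"nfs": 64.0, "ffs": 58.0},
}


def nfs_overhead(nfs_s: float, local_s: float) -> float:
    """Fractional slowdown of the NFS build relative to the local ffs build."""
    return (nfs_s - local_s) / local_s


for jobs, t in times.items():
    print(f"{jobs}: NFS build is {nfs_overhead(t['nfs'], t['ffs']):.1%} slower")
```

This works out to about 6.7% with -j4 and about 10.3% with -j1, so the serial build sits at the top of the quoted 5-10% range, as expected when nothing overlaps the RPC waits.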
> 
> >> Too bad the server hangs when I use an nfsv3 mount for /usr/obj.
> > Try this mount command:
> > mount -t nfs -o nfsv3,nolockd ...
> > (I do builds of the src tree NFS mounted, so the only reason I can
> > think of that it would hang would be an rpc.lockd issue.)
> > If this works, I suspect it will still be slow, but it would be nice
> > to find out how much slower NFSv4 is for your case.
> 
> That is needed anyway, to localize the slowness. It might be just in
> the server.
> 
> Bruce


