Date:      Tue, 29 Mar 2016 10:59:26 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        fs@freebsd.org, rmacklem@freebsd.org, glebius@freebsd.org
Subject:   Re: nfs pessimized by vnode pages changes
Message-ID:  <20160329090209.Q1020@besplex.bde.org>
In-Reply-To: <20160327144755.Y4269@besplex.bde.org>
References:  <20160327144755.Y4269@besplex.bde.org>

On Sun, 27 Mar 2016, Bruce Evans wrote:

> I debugged another pessimization of nfs.
>
> ncl_getpages() is now almost always called with a count of 1 page, due
> to the change changing the count from faultcount to 1 in r292373 in
> vm_fault().  The only exception seems to be for the initial pagein for
> exec -- this is still normally the nfs rsize.  ncl_getpages() doesn't
> do any readahead stuff like vnode_pager_generic_getpages() does, so it 
> normally does read RPCs of size 1 page instead of the nfs rsize.
> This gives the following increases in read RPCs for makeworld of an
> old world:
> - with rsize = 16K, from 24k to 39k (the worst case would be 4 times as many)
> - with rsize = 8K, from 39k to 44k (the worst case would be 2 times as many).
>
> Also, nfs_getpages() has buggy logic which works accidentally if the
> count is 1:
>
> X diff -c2 ./fs/nfsclient/nfs_clbio.c~ ./fs/nfsclient/nfs_clbio.c
> X *** ./fs/nfsclient/nfs_clbio.c~	Sun Mar 27 01:31:38 2016
> X --- ./fs/nfsclient/nfs_clbio.c	Sun Mar 27 02:35:32 2016
> X ***************
> X *** 135,140 ****
> X   	 */
> X   	VM_OBJECT_WLOCK(object);
> X ! 	if (pages[npages - 1]->valid != 0 && --npages == 0)
> X   		goto out;
> X   	VM_OBJECT_WUNLOCK(object);
> X
> X --- 135,155 ----
> X   	 */
> X   	VM_OBJECT_WLOCK(object);
> X ! #if 0
> X ! 	/* This matches the comment, but doesn't work (has little effect). */
> X ! 	if (pages[0]->valid != 0)
> X   		goto out;
>
> The comment still says that the code checks the requested page, but
> that is no longer passed to the function in a_reqpage.  The first page
> is a better guess of the requested page than the last one, but when
> npages is 1 these pages are the same.
>
> X + #else
> X + 	if (pages[0]->valid != 0)
> X + 		printf("ncl_getpages: page 0 valid; npages %d\n", npages);
> X + 	for (i = 0; i < npages; i++)
> X + 		if (pages[i]->valid != 0)
> X + 			printf("ncl_getpages: page %d valid; npages %d\n",
> X + 			    i, npages);
> X + 	for (i = 0; i < npages; i++)
> X + 		if (pages[i]->valid != 0)
> X + 			npages = i;
> X + 	if (npages == 0)
> X + 		goto out;
> X + #endif
>
> Debugging and more forceful guessing code.  This makes little difference
> except of course to spam the console.
>
> X   	VM_OBJECT_WUNLOCK(object);
> X
> X ***************
> X *** 199,202 ****
> X --- 214,220 ----
> X   			KASSERT(m->dirty == 0,
> X   			    ("nfs_getpages: page %p is dirty", m));
> X + 			printf("ncl_getpages: partial page %d of %d %s\n",
> X + 			    i, npages,
> X + 			    pages[i]->valid != 0 ? "valid" : "invalid");
> X   		} else {
> X   			/*
> X ***************
> X *** 210,215 ****
> X --- 228,239 ----
> X   			 */
> X   			;
> X + 			printf("ncl_getpages: short page %d of %d %s\n",
> X + 			    i, npages,
> X + 			    pages[i]->valid != 0 ? "valid" : "invalid");
> X   		}
> X   	}
> X + 	for (i = 0; i < npages; i++)
> X + 		printf("ncl_getpages: page %d of %d %s\n",
> X + 		    i, npages, pages[i]->valid != 0 ? "valid" : "invalid");
> X   out:
> X   	VM_OBJECT_WUNLOCK(object);
>
> Further debugging code.  Similar debugging code in the old working version
> shows that normal operation for paging in a 15K file with an rsize of 16K is:
> - call here with npages = 4
> - page in 3 full pages and 1 partial page using 1 RPC
> - call here again with npages = 1 for the partial page
> - use the optimization of returning early for this page -- don't do another
>  RPC
>
> The buggy version does:
> - call here with npages = 1; page in 1 full page using 1 RPC
> - call here with npages = 1; page in 1 full page using 1 RPC
> - call here with npages = 1; page in 1 full page using 1 RPC
> - call here with npages = 1; page in 1 partial page using 1 RPC
> - call here again with npages = 1 for the partial page; the optimization
>  works as before.
>
> The partial page isn't handled very well, but at least there is no extra
> physical i/o for it, at least if it is at EOF.  vfs clustering handles
> partial pages even worse than this.
>
> Bruce

This seems to be very difficult to fix.  Even vnode_pager_generic_getpages()
does input 1 page at a time unless the fs supports bmap, and nfs doesn't
support bmap.

Memory mapped input was already much slower than read().  I get the following
speeds for reading an 8MB file with a cold cache on the client and a warm
cache on the server (8MB is the limit below which cp(1) does "optimized" i/o
using mmap):
- read(): 64MB/sec (doesn't depend much on rsize; the network can't go
   much faster than 70MB/sec; probably does depend on low network latency
   (~100 usec))
- FreeBSD-10 mmap(), rsize=16K: 40MB/sec
- FreeBSD-10 mmap(), rsize=8K: 32MB/sec
- FreeBSD-9 mmap(), rsize=8K: 16MB/sec

vnode_pager_generic_getpages() has always had an even better pessimization
if the fs does support bmap but the fs block size is smaller than PAGE_SIZE.
Then it calls vnode_pager_input_smlfs() once for each page to reduce the
input size to the fs block size which can be as low as 512.  However, if
the fs doesn't support bmap, then it calls vnode_pager_input_old() once
for each page.  vnode_pager_input_old() uses VOP_READ(), so it is not much
slower than read().

vnode_pager_input_old() has a chance of being better than the specialized
pager methods in all cases, since read() should do vfs clustering which
does more than read-ahead.  vfs clustering doesn't do read-behind AFAIR,
but perhaps its better read-ahead is more important.

All methods have problems guessing how much read-ahead is good.  The
read-ahead and read-behind should certainly be enough to fill out an
fs-block.

I couldn't get full-fs-block input to work right in ncl_getpages().  In
the old version, vm_fault_hold() almost always calls it with the pages
for exactly 1 full block, because vm_fault_additional_pages() somehow
finds this many pages.

nfs doesn't use vfs clustering and wouldn't benefit from it since its
block size is the same as its maximum i/o size.  The block size can
now be MAXBCACHEBUF which can be larger than MAXBSIZE, but I think using
that is a wrong way to do things -- a default small block size plus
clustering works better by not using large blocks for everything.
ncl_getpages() could probably expand the read size to MAXPHYS (or whatever
the pbuf size limit is) just as easily as to the rsize.

Bruce
