From owner-freebsd-hackers  Mon Oct 14  2:42:21 2002
Date: Mon, 14 Oct 2002 02:42:17 -0700
From: David Schultz <dschultz@uclink.Berkeley.EDU>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Peter Wemm <peter@wemm.org>, Sean Kelly <smkelly@zombie.org>,
	hackers@FreeBSD.ORG
Subject: Re: swapoff?
Message-ID: <20021014094217.GA228@HAL9000.homeunix.com>
Mail-Followup-To: Matthew Dillon <dillon@apollo.backplane.com>,
	Peter Wemm <peter@wemm.org>, Sean Kelly <smkelly@zombie.org>,
	hackers@FreeBSD.ORG
References: <20020713071911.GA1558@HAL9000.wox.org> <20020713073404.9869A3811@overcee.wemm.org> <20020713115746.GA2162@HAL9000.wox.org> <200207131636.g6DGaoqh081285@apollo.backplane.com> <20021007153845.GA371@HAL9000.homeunix.com> <200210072347.g97Nl3Zo049415@apollo.backplane.com> <20021008113614.GA319@HAL9000.homeunix.com> <200210081745.g98Hjkam078883@apollo.backplane.com> <20021011130154.GA16549@HAL9000.homeunix.com> <200210111814.g9BIEbah040688@apollo.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200210111814.g9BIEbah040688@apollo.backplane.com>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>

[ Latest patches at http://csua.berkeley.edu/~das/swapoff.patch3 ]

Thus spake Matthew Dillon <dillon@apollo.backplane.com>:
> :I'm worried that vm_proc_swapin_all() has a similar race with the
> :swapout daemon.  Presently I assume that my references to the
> :UPAGES object and the associated pages remain valid after the
> :faultin(), and that I can use swap_pager_freeswapspace() to free
> :the correct metadata, instead of calling swap_pager_unswapped() on
> :each page.  Should I just hold the process lock until the metadata
> :are freed?
> 
>     Hmm.  Well, the proc lock is not held during vm_proc_swapin()
>     (but the PS_SWAPPINGIN flag is set).  The proc lock is held during
>     vm_proc_swapout().
> 
>     In your vm_proc_swapin_all() you seem to be doing the right thing
>     in regards to the mutexes and retry, and you have already marked
>     the device as SW_CLOSING, so if something does get in there and
>     tries to swap the process back in it shouldn't allocate swap you are
>     trying to free.
> 
>     I think you may be ok.

The concern was that there could be a race where the process is
swapped out again after I have swapped it back in but before I can
dirty its pages.  (Perhaps I need to hold the process lock a bit
longer.)  Under heavy swapping load, swapoff() is failing to find
a single page about one time out of ten, and I thought that might
be the cause.

I have tweaked swap_pager.c as you suggested earlier.  swapoff()
now runs about an order of magnitude slower under load, since it
does a vm_object_pip_wait() on every swap-backed object in the
system that's currently paging, even objects that are paging to a
different swap device.  Unless you have a better idea, I think
one way to improve performance would be to skip the busy objects
and, after the whole hash has been scanned, rescan starting at
the first index that was skipped.  Of course, it would have to
wait for at least one object on each iteration so it doesn't get
into a tight loop.
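
To make the skip-and-rescan idea concrete, here is a rough
user-space model in Python.  It is not kernel code and not what
the patch does.  SwapObject, scan_skipping_busy(), process() and
wait_until_idle() are made-up names; wait_until_idle() stands in
for vm_object_pip_wait(), and the model assumes an object stays
idle once it has been waited on, which the real system does not
guarantee.

```python
from dataclasses import dataclass


@dataclass
class SwapObject:
    """Hypothetical stand-in for a swap-backed VM object."""
    name: str
    paging_in_progress: int = 0   # nonzero means the object is busy
    done: bool = False            # already handled by an earlier pass


def scan_skipping_busy(buckets, process, wait_until_idle):
    """Handle every object in the hash, deferring busy ones.

    buckets is a list of lists (the hash).  Each pass skips busy
    objects and remembers the first one it skipped.  Before the next
    pass it blocks on that one object, so every pass makes progress
    and the loop cannot spin.  Returns the number of passes made.
    """
    start = 0
    passes = 0
    while start is not None:
        passes += 1
        first_skipped = None      # (bucket index, object) or None
        for index in range(start, len(buckets)):
            for obj in buckets[index]:
                if obj.done:
                    continue
                if obj.paging_in_progress > 0:
                    if first_skipped is None:
                        first_skipped = (index, obj)
                    continue
                process(obj)
                obj.done = True
        if first_skipped is None:
            start = None          # nothing was skipped, scan complete
        else:
            # Rescan from the first skipped index, after waiting for
            # at least one busy object.
            start, blocker = first_skipped
            wait_until_idle(blocker)
    return passes


if __name__ == "__main__":
    def fake_wait(obj):
        obj.paging_in_progress = 0    # pretend the paging finished

    table = [[SwapObject("a")], [SwapObject("b", 2)], [SwapObject("c")]]
    print(scan_skipping_busy(table, lambda o: print(o.name), fake_wait))
```

In this model each rescan costs one blocking wait instead of one
wait per busy object, which is the saving being hoped for.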

Another important optimization is to page in the entire block at
once, rather than doing it a page at a time.  I tried to do this
with the following algorithm:

	- grab SWAP_META_PAGES pages
	- note which ones are already in core using a bitmap
	- call getpages() to retrieve the entire range
	- re-lookup all of the pages at the appropriate offset
	  within the object in case they've changed or gone away
	- dirty them, move them to the appropriate queue (based
	  on the values in the bitmap computed earlier), and
	  remove their backing store

This didn't work, and it produced all sorts of interesting panics
for reasons I haven't yet figured out.  My latest patch has some
remnants of some of my attempts in swp_pager_force_pagein(), but
I'll probably leave that optimization for another day unless you
can see an obvious flaw in my approach.
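
For reference, the block page-in steps listed above can be
modelled in user space roughly as follows.  This is a Python
sketch of the intended logic only, with no locking or busy-page
handling, so it says nothing about where the panics come from.
Page, force_pagein_block() and read_from_swap() are made-up
names, the value of SWAP_META_PAGES here is illustrative, and the
queue choice (newly read pages go to "inactive", pages already in
core stay where they were) is an assumption.

```python
from dataclasses import dataclass

SWAP_META_PAGES = 8   # illustrative block size for this model only


@dataclass
class Page:
    """Hypothetical stand-in for a resident page."""
    data: bytes
    dirty: bool = False
    queue: str = "active"


def read_from_swap(swap, wanted):
    """Stand-in for one getpages() call covering the whole range."""
    return {idx: swap[idx] for idx in wanted}


def force_pagein_block(resident, swap, base, read_block=read_from_swap):
    """Page in one block of an object and free its swap.

    resident maps page index -> Page (pages in core); swap maps
    page index -> bytes (the backing store).  Returns the number
    of swap entries freed.
    """
    indices = range(base, base + SWAP_META_PAGES)

    # Note which pages are already in core, one bit per page.
    in_core = 0
    for bit, idx in enumerate(indices):
        if idx in resident:
            in_core |= 1 << bit

    # Retrieve every missing swap-backed page with a single read.
    wanted = [i for i in indices if i in swap and i not in resident]
    for idx, data in read_block(swap, wanted).items():
        resident.setdefault(idx, Page(data))

    freed = 0
    for bit, idx in enumerate(indices):
        page = resident.get(idx)      # re-lookup: it may have gone away
        if page is None or idx not in swap:
            continue
        page.dirty = True             # the swap copy is about to vanish
        if not in_core & (1 << bit):
            page.queue = "inactive"   # assumption: new pages go here
        del swap[idx]                 # remove the backing store
        freed += 1
    return freed
```

A page that is both resident and swap-backed keeps its in-core
contents; only its swap copy is dropped.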

BTW, thanks for all of your help!
