From owner-svn-src-user@freebsd.org Thu Mar 1 17:31:37 2018
From: Mark Johnston
Date: Thu, 1 Mar 2018 17:31:36 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r330244 - user/markj/vm-playground

Author: markj
Date: Thu Mar 1 17:31:36 2018
New Revision: 330244
URL: https://svnweb.freebsd.org/changeset/base/330244

Log:
  Branch jeff's numa branch for some further sys/vm scalability work.
Added:
   - copied from r330243, user/jeff/numa/
Directory Properties:
  user/markj/vm-playground/   (props changed)

From owner-svn-src-user@freebsd.org Thu Mar 1 18:11:04 2018
From: Mark Johnston
Date: Thu, 1 Mar 2018 18:11:03 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r330246 - user/markj/vm-playground/sys/vm

Author: markj
Date: Thu Mar 1 18:11:03 2018
New Revision: 330246
URL: https://svnweb.freebsd.org/changeset/base/330246

Log:
  Revert changes to batch the insertion of pages into page queues. It
  will be replaced by a more general mechanism in a future commit.

  The approach in this change has a number of disadvantages:

  - It bloats the per-domain structure quite a bit: we keep a batch
    queue per page lock per page queue, for a total of
    PQ_COUNT*PA_LOCK_COUNT queues per domain. We'd like to be able to
    increase PA_LOCK_COUNT without incurring bloat.
  - It only improves scalability for enqueue operations; threads which
    wish to dequeue or requeue pages still must acquire the page queue
    lock. Thus, the page queue lock remains a bottleneck in certain
    workloads. Builds, for example, involve frequent removal of pages
    from PQ_ACTIVE as short-lived VM objects are destroyed.
  - The page daemon still needs to acquire the page queue lock once per
    page during a queue scan.
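For reference, a minimal userland sketch of the batching scheme being reverted is shown below. The struct and macro names mirror the diff that follows, but the constants, the page representation, and the pthread locking are simplified assumptions rather than the kernel definitions (it builds against a BSD <sys/queue.h>). It also illustrates the second disadvantage above: only the enqueue path can defer taking the central queue lock, while dequeues and requeues must still find the page on the shared list under that lock.

/*
 * Minimal userland sketch of the reverted layout, for illustration only.
 * Constants, locking, and the page representation are simplified
 * assumptions, not the kernel definitions.
 */
#include <sys/queue.h>
#include <pthread.h>
#include <stdio.h>

#define PQ_COUNT    4   /* hypothetical number of paging queues */
#define BPQ_COUNT   32  /* stands in for PA_LOCK_COUNT */
#define BPQ_LIMIT   8   /* pages batched before spilling */

struct page {
    TAILQ_ENTRY(page) link;
    unsigned long pfn;
};

/* One batch queue per page-lock bucket, per paging queue. */
struct batchqueue {
    TAILQ_HEAD(, page) pl;
    int cnt;
};

struct pagequeue {
    pthread_mutex_t lock;   /* the contended central lock */
    TAILQ_HEAD(, page) pl;
    int cnt;
    struct batchqueue bpq[BPQ_COUNT];
};

/* PQ_COUNT * BPQ_COUNT batch queues per domain: the bloat noted above. */
static struct pagequeue domain_queues[PQ_COUNT];

static void
enqueue(struct pagequeue *pq, struct page *m)
{
    /* Hash the page to a batch queue, as BPQ_IDX() does in the diff. */
    struct batchqueue *bpq = &pq->bpq[m->pfn % BPQ_COUNT];

    TAILQ_INSERT_TAIL(&bpq->pl, m, link);
    if (++bpq->cnt < BPQ_LIMIT)
        return;             /* deferred: no central lock taken */

    /* Spill the whole batch with a single lock acquisition. */
    pthread_mutex_lock(&pq->lock);
    TAILQ_CONCAT(&pq->pl, &bpq->pl, link);
    pq->cnt += bpq->cnt;
    bpq->cnt = 0;
    pthread_mutex_unlock(&pq->lock);
}

int
main(void)
{
    struct page p = { .pfn = 42 };
    struct pagequeue *pq = &domain_queues[0];

    pthread_mutex_init(&pq->lock, NULL);
    TAILQ_INIT(&pq->pl);
    for (int i = 0; i < BPQ_COUNT; i++)
        TAILQ_INIT(&pq->bpq[i].pl);

    enqueue(pq, &p);
    printf("pages deferred in bucket %lu: %d\n",
        p.pfn % BPQ_COUNT, pq->bpq[p.pfn % BPQ_COUNT].cnt);
    return (0);
}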
Modified: user/markj/vm-playground/sys/vm/vm_object.c user/markj/vm-playground/sys/vm/vm_page.c user/markj/vm-playground/sys/vm/vm_pageout.c user/markj/vm-playground/sys/vm/vm_pagequeue.h Modified: user/markj/vm-playground/sys/vm/vm_object.c ============================================================================== --- user/markj/vm-playground/sys/vm/vm_object.c Thu Mar 1 17:47:28 2018 (r330245) +++ user/markj/vm-playground/sys/vm/vm_object.c Thu Mar 1 18:11:03 2018 (r330246) @@ -723,6 +723,7 @@ vm_object_terminate_pages(vm_object_t object) vm_page_t p, p_next; struct mtx *mtx, *mtx1; struct vm_pagequeue *pq, *pq1; + int dequeued; VM_OBJECT_ASSERT_WLOCKED(object); @@ -747,6 +748,7 @@ vm_object_terminate_pages(vm_object_t object) if (mtx != NULL) mtx_unlock(mtx); if (pq != NULL) { + vm_pagequeue_cnt_add(pq, dequeued); vm_pagequeue_unlock(pq); pq = NULL; } @@ -764,19 +766,27 @@ vm_object_terminate_pages(vm_object_t object) "page %p is not queued", p)); pq1 = vm_page_pagequeue(p); if (pq != pq1) { - if (pq != NULL) + if (pq != NULL) { + vm_pagequeue_cnt_add(pq, dequeued); vm_pagequeue_unlock(pq); + } pq = pq1; vm_pagequeue_lock(pq); + dequeued = 0; } + p->queue = PQ_NONE; + TAILQ_REMOVE(&pq->pq_pl, p, plinks.q); + dequeued--; } if (vm_page_free_prep(p, true)) continue; unlist: TAILQ_REMOVE(&object->memq, p, listq); } - if (pq != NULL) + if (pq != NULL) { + vm_pagequeue_cnt_add(pq, dequeued); vm_pagequeue_unlock(pq); + } if (mtx != NULL) mtx_unlock(mtx); Modified: user/markj/vm-playground/sys/vm/vm_page.c ============================================================================== --- user/markj/vm-playground/sys/vm/vm_page.c Thu Mar 1 17:47:28 2018 (r330245) +++ user/markj/vm-playground/sys/vm/vm_page.c Thu Mar 1 18:11:03 2018 (r330246) @@ -74,13 +74,6 @@ * * The page daemon can acquire and hold any pair of page queue * locks in any order. * - * * Batch queues are used to defer insertions of pages into the - * main paging queues. The aim is to reduce contention at the - * entry point of the queue by inserting multiple pages in an - * O(1) operation. This comes at the expense of strict LRU. - * Only a page lock is required to insert a page into a batch - * queue. - * * - The object lock is required when inserting or removing * pages from an object (vm_page_insert() or vm_page_remove()). * @@ -443,7 +436,7 @@ vm_page_domain_init(int domain) { struct vm_domain *vmd; struct vm_pagequeue *pq; - int i, j; + int i; vmd = VM_DOMAIN(domain); bzero(vmd, sizeof(*vmd)); @@ -465,15 +458,6 @@ vm_page_domain_init(int domain) TAILQ_INIT(&pq->pq_pl); mtx_init(&pq->pq_mutex, pq->pq_name, "vm pagequeue", MTX_DEF | MTX_DUPOK); - - /* - * The batch queue limits are set in vm_pageout_init() once - * we've set the paging targets. - */ - for (j = 0; j < BPQ_COUNT; j++) { - TAILQ_INIT(&pq->pq_bpqs[j].bpq_pl); - pq->pq_bpqs[j].bpq_lim = 1; - } } mtx_init(&vmd->vmd_free_mtx, "vm page free queue", NULL, MTX_DEF); mtx_init(&vmd->vmd_pageout_mtx, "vm pageout lock", NULL, MTX_DEF); @@ -3040,30 +3024,6 @@ vm_page_pagequeue(vm_page_t m) } /* - * vm_page_enqueue_batch: - * - * Concatenate the pages in a batch queue to their corresponding paging - * queue. - * - * The pagequeue must be locked. 
- */ -static void -vm_page_enqueue_batch(struct vm_pagequeue *pq, u_int idx) -{ - struct vm_batchqueue *bpq; - - KASSERT(idx < BPQ_COUNT, ("invalid batch queue index %u", idx)); - vm_pagequeue_assert_locked(pq); - - bpq = &pq->pq_bpqs[idx]; - if (bpq->bpq_cnt != 0) { - TAILQ_CONCAT(&pq->pq_pl, &bpq->bpq_pl, plinks.q); - vm_pagequeue_cnt_add(pq, bpq->bpq_cnt); - bpq->bpq_cnt = 0; - } -} - -/* * vm_page_dequeue: * * Remove the given page from its current page queue. @@ -3081,7 +3041,6 @@ vm_page_dequeue(vm_page_t m) pq = vm_page_pagequeue(m); vm_pagequeue_lock(pq); m->queue = PQ_NONE; - vm_page_enqueue_batch(pq, BPQ_IDX(m)); TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); vm_pagequeue_cnt_dec(pq); vm_pagequeue_unlock(pq); @@ -3102,7 +3061,6 @@ vm_page_dequeue_locked(vm_page_t m) vm_page_lock_assert(m, MA_OWNED); pq = vm_page_pagequeue(m); vm_pagequeue_assert_locked(pq); - vm_page_enqueue_batch(pq, BPQ_IDX(m)); m->queue = PQ_NONE; TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); vm_pagequeue_cnt_dec(pq); @@ -3118,7 +3076,6 @@ vm_page_dequeue_locked(vm_page_t m) static void vm_page_enqueue(uint8_t queue, vm_page_t m) { - struct vm_batchqueue *bpq; struct vm_pagequeue *pq; vm_page_lock_assert(m, MA_OWNED); @@ -3126,14 +3083,11 @@ vm_page_enqueue(uint8_t queue, vm_page_t m) ("vm_page_enqueue: invalid queue %u request for page %p", queue, m)); pq = &vm_pagequeue_domain(m)->vmd_pagequeues[queue]; + vm_pagequeue_lock(pq); m->queue = queue; - bpq = &pq->pq_bpqs[BPQ_IDX(m)]; - TAILQ_INSERT_TAIL(&bpq->bpq_pl, m, plinks.q); - if (bpq->bpq_cnt++ >= bpq->bpq_lim) { - vm_pagequeue_lock(pq); - vm_page_enqueue_batch(pq, BPQ_IDX(m)); - vm_pagequeue_unlock(pq); - } + TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q); + vm_pagequeue_cnt_inc(pq); + vm_pagequeue_unlock(pq); } /* @@ -3153,7 +3107,6 @@ vm_page_requeue(vm_page_t m) ("vm_page_requeue: page %p is not queued", m)); pq = vm_page_pagequeue(m); vm_pagequeue_lock(pq); - vm_page_enqueue_batch(pq, BPQ_IDX(m)); TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q); vm_pagequeue_unlock(pq); @@ -3171,12 +3124,10 @@ vm_page_requeue_locked(vm_page_t m) { struct vm_pagequeue *pq; - vm_page_lock_assert(m, MA_OWNED); KASSERT(m->queue != PQ_NONE, ("vm_page_requeue_locked: page %p is not queued", m)); pq = vm_page_pagequeue(m); vm_pagequeue_assert_locked(pq); - vm_page_enqueue_batch(pq, BPQ_IDX(m)); TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q); } @@ -3481,7 +3432,6 @@ vm_page_unwire_noq(vm_page_t m) static inline void _vm_page_deactivate(vm_page_t m, boolean_t noreuse) { - struct vm_batchqueue *bpq; struct vm_pagequeue *pq; int queue; @@ -3502,17 +3452,9 @@ _vm_page_deactivate(vm_page_t m, boolean_t noreuse) } else { if (queue != PQ_NONE) vm_page_dequeue(m); - bpq = &pq->pq_bpqs[BPQ_IDX(m)]; - if (bpq->bpq_cnt < bpq->bpq_lim) { - bpq->bpq_cnt++; - m->queue = PQ_INACTIVE; - TAILQ_INSERT_TAIL(&bpq->bpq_pl, m, plinks.q); - return; - } vm_pagequeue_lock(pq); } m->queue = PQ_INACTIVE; - vm_page_enqueue_batch(pq, BPQ_IDX(m)); if (noreuse) TAILQ_INSERT_BEFORE( &vm_pagequeue_domain(m)->vmd_inacthead, m, Modified: user/markj/vm-playground/sys/vm/vm_pageout.c ============================================================================== --- user/markj/vm-playground/sys/vm/vm_pageout.c Thu Mar 1 17:47:28 2018 (r330245) +++ user/markj/vm-playground/sys/vm/vm_pageout.c Thu Mar 1 18:11:03 2018 (r330246) @@ -1952,7 +1952,6 @@ vm_pageout_init_domain(int domain) { struct vm_domain *vmd; struct sysctl_oid *oid; - int lim, i, j; vmd = 
VM_DOMAIN(domain); vmd->vmd_interrupt_free_min = 2; @@ -1991,22 +1990,6 @@ vm_pageout_init_domain(int domain) */ vmd->vmd_background_launder_target = (vmd->vmd_free_target - vmd->vmd_free_min) / 10; - - /* - * Set batch queue limits for paging queues. - * - * We want these to be small relative to the amount of system memory. - * Roughly v_page_count / PA_LOCK_COUNT pages are mapped to a given - * batch queue; ensure that no more than 0.1% of them may be queued in - * the batch queue for a particular page queue. Then no more than - * 0.1% * PQ_COUNT can be queued across all page queues. This gives a - * per-page queue batch limit of 1 page per GB of memory on amd64. - */ - - lim = MAX(vmd->vmd_page_count / 1000 / BPQ_COUNT, 8); - for (i = 0; i < PQ_COUNT; i++) - for (j = 0; j < BPQ_COUNT; j++) - vmd->vmd_pagequeues[i].pq_bpqs[j].bpq_lim = lim; /* Initialize the pageout daemon pid controller. */ pidctrl_init(&vmd->vmd_pid, hz / VM_INACT_SCAN_RATE, Modified: user/markj/vm-playground/sys/vm/vm_pagequeue.h ============================================================================== --- user/markj/vm-playground/sys/vm/vm_pagequeue.h Thu Mar 1 17:47:28 2018 (r330245) +++ user/markj/vm-playground/sys/vm/vm_pagequeue.h Thu Mar 1 18:11:03 2018 (r330246) @@ -66,23 +66,11 @@ #define _VM_PAGEQUEUE_ #ifdef _KERNEL - -#define BPQ_COUNT PA_LOCK_COUNT -#define BPQ_IDX(m) (pa_index(VM_PAGE_TO_PHYS(m)) % BPQ_COUNT) - -struct vm_batchqueue { - struct pglist bpq_pl; - int bpq_cnt; - int bpq_lim; -} __aligned(CACHE_LINE_SIZE); - struct vm_pagequeue { struct mtx pq_mutex; struct pglist pq_pl; int pq_cnt; const char * const pq_name; - char _pq_pad[0] __aligned(CACHE_LINE_SIZE); - struct vm_batchqueue pq_bpqs[BPQ_COUNT]; } __aligned(CACHE_LINE_SIZE); #include From owner-svn-src-user@freebsd.org Thu Mar 1 18:19:15 2018 Return-Path: Delivered-To: svn-src-user@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B5D89F40FD6 for ; Thu, 1 Mar 2018 18:19:15 +0000 (UTC) (envelope-from markj@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6B26A734D4; Thu, 1 Mar 2018 18:19:15 +0000 (UTC) (envelope-from markj@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 47EA74769; Thu, 1 Mar 2018 18:19:15 +0000 (UTC) (envelope-from markj@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w21IJFAl013919; Thu, 1 Mar 2018 18:19:15 GMT (envelope-from markj@FreeBSD.org) Received: (from markj@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w21IJFBk013918; Thu, 1 Mar 2018 18:19:15 GMT (envelope-from markj@FreeBSD.org) Message-Id: <201803011819.w21IJFBk013918@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: markj set sender to markj@FreeBSD.org using -f From: Mark Johnston Date: Thu, 1 Mar 2018 18:19:15 +0000 (UTC) To: src-committers@freebsd.org, svn-src-user@freebsd.org Subject: svn commit: r330247 - user/markj/vm-playground/sys/vm X-SVN-Group: user X-SVN-Commit-Author: markj X-SVN-Commit-Paths: 
user/markj/vm-playground/sys/vm X-SVN-Commit-Revision: 330247 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-user@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "SVN commit messages for the experimental " user" src tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Mar 2018 18:19:15 -0000 Author: markj Date: Thu Mar 1 18:19:14 2018 New Revision: 330247 URL: https://svnweb.freebsd.org/changeset/base/330247 Log: Temporarily remove page-clustering code from the PQ_INACTIVE scan. It will be reintegrated after some page queue locking changes go in. Modified: user/markj/vm-playground/sys/vm/vm_pageout.c Modified: user/markj/vm-playground/sys/vm/vm_pageout.c ============================================================================== --- user/markj/vm-playground/sys/vm/vm_pageout.c Thu Mar 1 18:11:03 2018 (r330246) +++ user/markj/vm-playground/sys/vm/vm_pageout.c Thu Mar 1 18:19:14 2018 (r330247) @@ -1094,104 +1094,6 @@ dolaundry: } } -struct pgo_pglist { - struct pglist pgl; - int count; -}; - -static void -vm_pageout_pglist_init(struct pgo_pglist *pglist) -{ - - TAILQ_INIT(&pglist->pgl); - pglist->count = 0; -} - -static void -vm_pageout_pglist_append(struct pgo_pglist *pglist, vm_page_t m) -{ - if (vm_page_free_prep(m, false)) { - m->flags &= ~PG_ZERO; - TAILQ_INSERT_TAIL(&pglist->pgl, m, listq); - pglist->count++; - } -} - -static void -vm_pageout_pglist_flush(struct pgo_pglist *pglist, bool force) -{ - if (pglist->count > 64 || (force && pglist->count != 0)) { - vm_page_free_phys_pglist(&pglist->pgl); - vm_pageout_pglist_init(pglist); - } -} - - -static int -vm_pageout_free_pages(struct pgo_pglist *pglist, vm_object_t object, - vm_page_t m) -{ - vm_page_t p, pp; - struct mtx *mtx; - vm_pindex_t start; - int pcount, count; - - pcount = MAX(object->iosize / PAGE_SIZE, 1); - mtx = vm_page_lockptr(m); - count = 1; - if (pcount == 1) { - vm_pageout_pglist_append(pglist, m); - goto out; - } - - /* Find the first page in the block. */ - start = m->pindex - (m->pindex % pcount); - for (p = m; p->pindex > start && (pp = vm_page_prev(p)) != NULL; - p = pp); - - /* Free the original page so we don't validate it twice. */ - if (p == m) - p = vm_page_next(m); - vm_pageout_pglist_append(pglist, m); - /* Iterate through the block range and free compatible pages. */ - for (m = p; m != NULL && m->pindex < start + pcount; m = p) { - p = TAILQ_NEXT(m, listq); - if (mtx != vm_page_lockptr(m)) { - mtx_unlock(mtx); - mtx = vm_page_lockptr(m); - mtx_lock(mtx); - } - if (vm_page_held(m) || vm_page_busied(m) || - m->queue != PQ_INACTIVE) - continue; - if (m->valid == 0) - goto free_page; - if ((m->aflags & PGA_REFERENCED) != 0) - continue; - if (object->ref_count != 0) { - if (pmap_ts_referenced(m)) { - vm_page_aflag_set(m, PGA_REFERENCED); - continue; - } - vm_page_test_dirty(m); - if (m->dirty == 0) - pmap_remove_all(m); - } - if (m->dirty) - continue; -free_page: - vm_pageout_pglist_append(pglist, m); - count++; - } -out: - mtx_unlock(mtx); - VM_OBJECT_WUNLOCK(object); - vm_pageout_pglist_flush(pglist, false); - VM_CNT_ADD(v_dfree, count); - - return (count); -} - /* * vm_pageout_scan does the dirty work for the pageout daemon. 
* @@ -1204,7 +1106,6 @@ out: static bool vm_pageout_scan(struct vm_domain *vmd, int pass, int shortage) { - struct pgo_pglist pglist; vm_page_t m, next; struct vm_pagequeue *pq; vm_object_t object; @@ -1260,7 +1161,6 @@ vm_pageout_scan(struct vm_domain *vmd, int pass, int s */ pq = &vmd->vmd_pagequeues[PQ_INACTIVE]; maxscan = pq->pq_cnt; - vm_pageout_pglist_init(&pglist); vm_pagequeue_lock(pq); queue_locked = TRUE; for (m = TAILQ_FIRST(&pq->pq_pl); @@ -1425,17 +1325,15 @@ unlock_page: */ if (m->dirty == 0) { free_page: - page_shortage -= vm_pageout_free_pages(&pglist, - object, m); - goto lock_queue; + vm_page_free(m); + VM_CNT_INC(v_dfree); + --page_shortage; } else if ((object->flags & OBJ_DEAD) == 0) vm_page_launder(m); drop_page: vm_page_unlock(m); VM_OBJECT_WUNLOCK(object); -lock_queue: if (!queue_locked) { - vm_pageout_pglist_flush(&pglist, false); vm_pagequeue_lock(pq); queue_locked = TRUE; } @@ -1443,7 +1341,6 @@ lock_queue: TAILQ_REMOVE(&pq->pq_pl, &vmd->vmd_marker, plinks.q); } vm_pagequeue_unlock(pq); - vm_pageout_pglist_flush(&pglist, true); /* * Wake up the laundry thread so that it can perform any needed From owner-svn-src-user@freebsd.org Fri Mar 2 18:12:27 2018 Return-Path: Delivered-To: svn-src-user@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 97329F39C4F for ; Fri, 2 Mar 2018 18:12:26 +0000 (UTC) (envelope-from markj@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 572406F589; Fri, 2 Mar 2018 18:12:26 +0000 (UTC) (envelope-from markj@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 51EB41AF12; Fri, 2 Mar 2018 18:12:26 +0000 (UTC) (envelope-from markj@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w22ICQ3i037416; Fri, 2 Mar 2018 18:12:26 GMT (envelope-from markj@FreeBSD.org) Received: (from markj@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w22ICPur037409; Fri, 2 Mar 2018 18:12:25 GMT (envelope-from markj@FreeBSD.org) Message-Id: <201803021812.w22ICPur037409@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: markj set sender to markj@FreeBSD.org using -f From: Mark Johnston Date: Fri, 2 Mar 2018 18:12:25 +0000 (UTC) To: src-committers@freebsd.org, svn-src-user@freebsd.org Subject: svn commit: r330288 - user/markj/vm-playground/sys/vm X-SVN-Group: user X-SVN-Commit-Author: markj X-SVN-Commit-Paths: user/markj/vm-playground/sys/vm X-SVN-Commit-Revision: 330288 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-user@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "SVN commit messages for the experimental " user" src tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Mar 2018 18:12:27 -0000 Author: markj Date: Fri Mar 2 18:12:25 2018 New Revision: 330288 URL: https://svnweb.freebsd.org/changeset/base/330288 Log: Add vm_page_alloc_pages_after(). 
This is a new page allocation which is intended to complement vm_page_grab_pages(). It permits the allocation of multiple pages, contiguous in the pindex space, with a single call. When VM_ALLOC_{NOWAIT,WAITFAIL} are specified, the returned run may be shorter than the one requested. In support of this function, vm_reserv_extend() and vm_reserv_alloc_page() may now optionally return a run of contiguous pages from the same reservation, and the new vm_phys_alloc_npages() function is used to allocate pages from the physical memory allocator. Modified: user/markj/vm-playground/sys/vm/swap_pager.c user/markj/vm-playground/sys/vm/vm_page.c user/markj/vm-playground/sys/vm/vm_page.h user/markj/vm-playground/sys/vm/vm_pagequeue.h user/markj/vm-playground/sys/vm/vm_reserv.c user/markj/vm-playground/sys/vm/vm_reserv.h user/markj/vm-playground/sys/vm/vnode_pager.c Modified: user/markj/vm-playground/sys/vm/swap_pager.c ============================================================================== --- user/markj/vm-playground/sys/vm/swap_pager.c Fri Mar 2 17:07:08 2018 (r330287) +++ user/markj/vm-playground/sys/vm/swap_pager.c Fri Mar 2 18:12:25 2018 (r330288) @@ -1097,10 +1097,10 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma, int *rahead) { struct buf *bp; - vm_page_t mpred, msucc, p; + vm_page_t mpred, msucc; vm_pindex_t pindex; daddr_t blk; - int i, j, maxahead, maxbehind, reqcount, shift; + int i, maxahead, maxbehind, reqcount, shift; reqcount = count; @@ -1136,39 +1136,27 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma, /* * Allocate readahead and readbehind pages. */ - shift = rbehind != NULL ? *rbehind : 0; - if (shift != 0) { - for (i = 1; i <= shift; i++) { - p = vm_page_alloc(object, ma[0]->pindex - i, - VM_ALLOC_NORMAL); - if (p == NULL) { - /* Shift allocated pages to the left. */ - for (j = 0; j < i - 1; j++) - bp->b_pages[j] = - bp->b_pages[j + shift - i + 1]; - break; - } - bp->b_pages[shift - i] = p; - } - shift = i - 1; - *rbehind = shift; - } + if (rbehind != NULL && *rbehind > 0) { + shift = vm_page_alloc_pages_after(object, + ma[0]->pindex - *rbehind, VM_ALLOC_NORMAL, &bp->b_pages[0], + *rbehind, mpred); + if (shift != *rbehind) { + /* Drop a partially allocated run. */ + for (i = 0; i < shift; i++) + vm_page_free(bp->b_pages[i]); + shift = *rbehind = 0; + } else + count += *rbehind; + } else + shift = 0; for (i = 0; i < reqcount; i++) bp->b_pages[i + shift] = ma[i]; - if (rahead != NULL) { - for (i = 0; i < *rahead; i++) { - p = vm_page_alloc(object, - ma[reqcount - 1]->pindex + i + 1, VM_ALLOC_NORMAL); - if (p == NULL) - break; - bp->b_pages[shift + reqcount + i] = p; - } - *rahead = i; - } - if (rbehind != NULL) - count += *rbehind; - if (rahead != NULL) + if (rahead != NULL && *rahead > 0) { + *rahead = vm_page_alloc_pages_after(object, + ma[reqcount - 1]->pindex + 1, VM_ALLOC_NORMAL, + &bp->b_pages[reqcount + shift], *rahead, ma[reqcount - 1]); count += *rahead; + } vm_object_pip_add(object, count); Modified: user/markj/vm-playground/sys/vm/vm_page.c ============================================================================== --- user/markj/vm-playground/sys/vm/vm_page.c Fri Mar 2 17:07:08 2018 (r330287) +++ user/markj/vm-playground/sys/vm/vm_page.c Fri Mar 2 18:12:25 2018 (r330288) @@ -1696,9 +1696,10 @@ vm_page_alloc_after(vm_object_t object, vm_pindex_t pi * for the request class and false otherwise. 
*/ int -vm_domain_allocate(struct vm_domain *vmd, int req, int npages) +vm_domain_allocate(struct vm_domain *vmd, int req, int npages, bool partial) { u_int limit, old, new; + int avail; req = req & VM_ALLOC_CLASS_MASK; @@ -1707,6 +1708,7 @@ vm_domain_allocate(struct vm_domain *vmd, int req, int */ if (curproc == pageproc && req != VM_ALLOC_INTERRUPT) req = VM_ALLOC_SYSTEM; + if (req == VM_ALLOC_INTERRUPT) limit = 0; else if (req == VM_ALLOC_SYSTEM) @@ -1719,9 +1721,12 @@ vm_domain_allocate(struct vm_domain *vmd, int req, int */ do { old = vmd->vmd_free_count; - new = old - npages; - if (new < limit) + if (old <= limit) return (0); + avail = min(old - limit, (u_int)npages); + if (avail != npages && !partial) + return (0); + new = old - avail; } while (atomic_cmpset_int(&vmd->vmd_free_count, old, new) == 0); /* Wake the page daemon if we've crossed the threshold. */ @@ -1733,7 +1738,7 @@ vm_domain_allocate(struct vm_domain *vmd, int req, int (old >= vmd->vmd_free_severe && new < vmd->vmd_free_severe)) vm_domain_set(vmd); - return (1); + return (avail); } vm_page_t @@ -1764,21 +1769,22 @@ again: * Can we allocate the page from a reservation? */ if (vm_object_reserv(object) && - ((m = vm_reserv_extend(req, object, pindex, domain, mpred)) != NULL || - (m = vm_reserv_alloc_page(req, object, pindex, domain, mpred)) != NULL)) { + ((m = vm_reserv_extend(req, object, pindex, domain, mpred, NULL)) != + NULL || + (m = vm_reserv_alloc_page(req, object, pindex, domain, mpred, + NULL)) != NULL)) { domain = vm_phys_domain(m); vmd = VM_DOMAIN(domain); goto found; } #endif vmd = VM_DOMAIN(domain); - if (object != NULL && !vm_object_reserv(object) && - vmd->vmd_pgcache != NULL) { + if (!vm_object_reserv(object) && vmd->vmd_pgcache != NULL) { m = uma_zalloc(vmd->vmd_pgcache, M_NOWAIT); if (m != NULL) goto found; } - if (vm_domain_allocate(vmd, req, 1)) { + if (vm_domain_allocate(vmd, req, 1, false) == 1) { /* * If not, allocate it from the free page queues. */ @@ -1828,7 +1834,7 @@ found: m->busy_lock = VPB_SINGLE_EXCLUSIVER; if ((req & VM_ALLOC_SBUSY) != 0) m->busy_lock = VPB_SHARERS_WORD(1); - if (req & VM_ALLOC_WIRED) { + if ((req & VM_ALLOC_WIRED) != 0) { /* * The page lock is not required for wiring a page until that * page is inserted into the object. @@ -1869,7 +1875,191 @@ found: return (m); } +int +vm_page_alloc_pages_after(vm_object_t object, vm_pindex_t pindex, int req, + vm_page_t *ma, int nreq, vm_page_t mpred) +{ + struct vm_domainset_iter di; + int domain, n; + + vm_domainset_iter_page_init(&di, object, &domain, &req); + do { + n = vm_page_alloc_pages_domain_after(object, pindex, domain, + req, ma, nreq, mpred); + if (n > 0) + break; + } while (vm_domainset_iter_page(&di, &domain, &req) == 0); + + return (n); +} + /* + * vm_page_alloc_pages_after: + * + * Allocate a range of pages, contiguous in the pindex space. The + * number of pages actually allocated is returned and may be smaller + * than the number requested unless VM_ALLOC_WAITOK is specified. + * This function is otherwise identical to vm_page_alloc(). 
+ */ +int +vm_page_alloc_pages_domain_after(vm_object_t object, vm_pindex_t pindex, + int domain, int req, vm_page_t *ma, int nreq, vm_page_t mpred) +{ + struct vm_domain *vmd; + vm_page_t m; + int avail, i, nalloc, pool; + u_int busy_lock, flags, oflags; + + KASSERT(nreq > 0, ("invalid nreq %d", nreq)); + KASSERT((object != NULL) == ((req & VM_ALLOC_NOOBJ) == 0) && + (object != NULL || (req & VM_ALLOC_SBUSY) == 0) && + ((req & (VM_ALLOC_NOBUSY | VM_ALLOC_SBUSY)) != + (VM_ALLOC_NOBUSY | VM_ALLOC_SBUSY)), + ("inconsistent object(%p)/req(%x)", object, req)); + KASSERT(object == NULL || (req & VM_ALLOC_WAITOK) == 0, + ("Can't sleep and retry object insertion.")); + KASSERT(mpred == NULL || mpred->pindex < pindex, + ("mpred %p doesn't precede pindex 0x%jx", mpred, + (uintmax_t)pindex)); + if (object != NULL) + VM_OBJECT_ASSERT_WLOCKED(object); + + nalloc = 0; + +#if VM_NRESERVLEVEL > 0 + if (vm_object_reserv(object)) { + avail = nreq; + m = vm_reserv_extend(req, object, pindex, domain, mpred, + &avail); + if (m == NULL) + m = vm_reserv_alloc_page(req, object, pindex, domain, + mpred, &avail); + if (m != NULL) { + domain = vm_phys_domain(m); + while (nalloc < avail) + ma[nalloc++] = m++; + + /* + * We might have gotten a short run back because we + * reached the end of a reservation. If so, declare + * success now rather than trying to fill the rest of + * the array, in the hope that a subsequent allocation + * attempt will allocate a new reservation. + */ + if (nalloc == nreq || (req & VM_ALLOC_WAITOK) == 0) + goto done; + } + } +#endif + +again: + vmd = VM_DOMAIN(domain); + if ((avail = vm_domain_allocate(vmd, req, nreq - nalloc, true)) > 0) { + pool = object != NULL ? VM_FREEPOOL_DEFAULT : + VM_FREEPOOL_DIRECT; + vm_domain_free_lock(vmd); + do { + i = vm_phys_alloc_npages(domain, pool, &m, + avail - nalloc); + if (i == 0) { + vm_domain_freecnt_inc(vmd, avail - nalloc); + break; + } + for (; i > 0; i--) + ma[nalloc++] = m++; + } while (nalloc < avail); + vm_domain_free_unlock(vmd); + } + if (nalloc == 0 || (nalloc < nreq && (req & VM_ALLOC_WAITOK) != 0)) { +#if VM_NRESERVLEVEL > 0 + if (vm_reserv_reclaim_inactive(domain)) + goto again; +#endif + + /* + * We failed to allocate at least one page, or the caller + * requested a blocking allocation and we weren't able to + * scrounge enough pages in the latest attempt. + */ + if (vm_domain_alloc_fail(vmd, object, req)) + goto again; + return (0); + } + +done: + for (i = 0; i < nalloc; i++) + vm_page_alloc_check(ma[i]); + + /* + * Initialize the pages. Only the PG_ZERO flag is inherited. + */ + flags = 0; + if ((req & VM_ALLOC_ZERO) != 0) + flags |= PG_ZERO; + if ((req & VM_ALLOC_NODUMP) != 0) + flags |= PG_NODUMP; + oflags = (object == NULL || (object->flags & OBJ_UNMANAGED) != 0) ? + VPO_UNMANAGED : 0; + busy_lock = VPB_UNBUSIED; + if ((req & (VM_ALLOC_NOBUSY | VM_ALLOC_NOOBJ | VM_ALLOC_SBUSY)) == 0) + busy_lock = VPB_SINGLE_EXCLUSIVER; + if ((req & VM_ALLOC_SBUSY) != 0) + busy_lock = VPB_SHARERS_WORD(1); + + for (i = 0; i < nalloc; i++) { + m = ma[i]; + + m->flags = (m->flags | PG_NODUMP) & flags; + m->aflags = 0; + m->oflags = oflags; + m->busy_lock = busy_lock; + if ((req & VM_ALLOC_WIRED) != 0) { + /* + * The page lock is not required for wiring a page + * until that page is inserted into the object. 
+ */ + m->wire_count = 1; + } + m->act_count = 0; + + if (object != NULL) { + if (vm_page_insert_after(m, object, pindex + i, + mpred)) { + avail = i; + for (; i < nalloc; i++) { + m = ma[i]; + m->busy_lock = VPB_UNBUSIED; + m->oflags = VPO_UNMANAGED; + m->wire_count = 0; + KASSERT(m->object == NULL, + ("page %p has object", m)); + /* Don't change PG_ZERO. */ + vm_page_free_toq(m); + } + if ((req & VM_ALLOC_WAITFAIL) != 0) { + VM_OBJECT_WUNLOCK(object); + vm_radix_wait(); + VM_OBJECT_WLOCK(object); + } + nalloc = avail; + break; + } + + /* Ignore device objects; the pager sets "memattr" for them. */ + if (object->memattr != VM_MEMATTR_DEFAULT && + (object->flags & OBJ_FICTITIOUS) == 0) + pmap_page_set_memattr(m, object->memattr); + } else + m->pindex = pindex + i; + mpred = m; + } + if ((req & VM_ALLOC_WIRED) != 0) + VM_CNT_ADD(v_wire_count, nalloc); + + return (nalloc); +} + +/* * vm_page_alloc_contig: * * Allocate a contiguous set of physical pages of the given size "npages" @@ -1981,7 +2171,7 @@ again: #endif m_ret = NULL; vmd = VM_DOMAIN(domain); - if (vm_domain_allocate(vmd, req, npages)) { + if (vm_domain_allocate(vmd, req, npages, false) == npages) { /* * allocate them from the free page queues. */ @@ -2139,7 +2329,7 @@ vm_page_alloc_freelist_domain(int domain, int freelist */ vmd = VM_DOMAIN(domain); again: - if (vm_domain_allocate(vmd, req, 1)) { + if (vm_domain_allocate(vmd, req, 1, false) == 1) { vm_domain_free_lock(vmd); m = vm_phys_alloc_freelist_pages(domain, freelist, VM_FREEPOOL_DIRECT, 0); @@ -2191,7 +2381,7 @@ vm_page_import(void *arg, void **store, int cnt, int d MIN(n, cnt-i)); if (n == 0) break; - if (!vm_domain_allocate(vmd, VM_ALLOC_NORMAL, n)) { + if (vm_domain_allocate(vmd, VM_ALLOC_NORMAL, n, false) == 0) { vm_phys_free_contig(m, n); break; } @@ -3189,14 +3379,14 @@ vm_page_free_prep(vm_page_t m, bool pagequeue_locked) if ((m->oflags & VPO_UNMANAGED) == 0) { vm_page_lock_assert(m, MA_OWNED); KASSERT(!pmap_page_is_mapped(m), - ("vm_page_free_toq: freeing mapped page %p", m)); + ("vm_page_free_prep: freeing mapped page %p", m)); } else KASSERT(m->queue == PQ_NONE, - ("vm_page_free_toq: unmanaged page %p is queued", m)); + ("vm_page_free_prep: unmanaged page %p is queued", m)); VM_CNT_INC(v_tfree); if (vm_page_sbusied(m)) - panic("vm_page_free: freeing busy page %p", m); + panic("vm_page_free_prep: freeing busy page %p", m); vm_page_remove(m); @@ -3222,11 +3412,11 @@ vm_page_free_prep(vm_page_t m, bool pagequeue_locked) vm_page_undirty(m); if (m->wire_count != 0) - panic("vm_page_free: freeing wired page %p", m); + panic("vm_page_free_prep: freeing wired page %p", m); if (m->hold_count != 0) { m->flags &= ~PG_ZERO; KASSERT((m->flags & PG_UNHOLDFREE) == 0, - ("vm_page_free: freeing PG_UNHOLDFREE page %p", m)); + ("vm_page_free_prep: freeing PG_UNHOLDFREE page %p", m)); m->flags |= PG_UNHOLDFREE; return (false); } @@ -3703,9 +3893,8 @@ int vm_page_grab_pages(vm_object_t object, vm_pindex_t pindex, int allocflags, vm_page_t *ma, int count) { - vm_page_t m, mpred; - int pflags; - int i; + vm_page_t m, mpred, msucc; + int i, pflags, run; bool sleep; VM_OBJECT_ASSERT_WLOCKED(object); @@ -3717,6 +3906,7 @@ vm_page_grab_pages(vm_object_t object, vm_pindex_t pin KASSERT((allocflags & VM_ALLOC_SBUSY) == 0 || (allocflags & VM_ALLOC_IGN_SBUSY) != 0, ("vm_page_grab_pages: VM_ALLOC_SBUSY/IGN_SBUSY mismatch")); + if (count == 0) return (0); pflags = allocflags & ~(VM_ALLOC_NOWAIT | VM_ALLOC_WAITOK | @@ -3728,10 +3918,14 @@ retrylookup: m = vm_radix_lookup_le(&object->rtree, pindex 
+ i); if (m == NULL || m->pindex != pindex + i) { mpred = m; + msucc = mpred != NULL ? TAILQ_NEXT(mpred, listq) : + TAILQ_FIRST(&object->memq); m = NULL; - } else + } else { mpred = TAILQ_PREV(m, pglist, listq); - for (; i < count; i++) { + msucc = TAILQ_NEXT(m, listq); + } + while (i < count) { if (m != NULL) { sleep = (allocflags & VM_ALLOC_IGN_SBUSY) != 0 ? vm_page_xbusied(m) : vm_page_busied(m); @@ -3761,21 +3955,41 @@ retrylookup: vm_page_xbusy(m); if ((allocflags & VM_ALLOC_SBUSY) != 0) vm_page_sbusy(m); + if (m->valid == 0 && + (allocflags & VM_ALLOC_ZERO) != 0) { + if ((m->flags & PG_ZERO) == 0) + pmap_zero_page(m); + m->valid = VM_PAGE_BITS_ALL; + } + ma[i++] = m; } else { - m = vm_page_alloc_after(object, pindex + i, - pflags | VM_ALLOC_COUNT(count - i), mpred); - if (m == NULL) { + /* + * Try to allocate multiple consecutive pages. Use the + * succeeding page, if any, to bound the length of the + * requested run. + */ + run = msucc == NULL || msucc->pindex >= pindex + count ? + count - i : msucc->pindex - (pindex + i); + run = vm_page_alloc_pages_after(object, pindex + i, + pflags | VM_ALLOC_COUNT(run), ma + i, run, mpred); + if (run == 0) { if ((allocflags & VM_ALLOC_NOWAIT) != 0) break; goto retrylookup; } + if ((allocflags & VM_ALLOC_ZERO) != 0) { + for (; run != 0; run--, i++) { + m = ma[i]; + if ((m->flags & PG_ZERO) == 0) + pmap_zero_page(m); + m->valid = VM_PAGE_BITS_ALL; + } + } else + i += run; + m = ma[i - 1]; } - if (m->valid == 0 && (allocflags & VM_ALLOC_ZERO) != 0) { - if ((m->flags & PG_ZERO) == 0) - pmap_zero_page(m); - m->valid = VM_PAGE_BITS_ALL; - } - ma[i] = mpred = m; + mpred = m; + msucc = TAILQ_NEXT(m, listq); m = vm_page_next(m); } return (i); Modified: user/markj/vm-playground/sys/vm/vm_page.h ============================================================================== --- user/markj/vm-playground/sys/vm/vm_page.h Fri Mar 2 17:07:08 2018 (r330287) +++ user/markj/vm-playground/sys/vm/vm_page.h Fri Mar 2 18:12:25 2018 (r330288) @@ -467,6 +467,10 @@ vm_page_t vm_page_alloc_domain(vm_object_t, vm_pindex_ vm_page_t vm_page_alloc_after(vm_object_t, vm_pindex_t, int, vm_page_t); vm_page_t vm_page_alloc_domain_after(vm_object_t, vm_pindex_t, int, int, vm_page_t); +int vm_page_alloc_pages_after(vm_object_t, vm_pindex_t, int, vm_page_t *, int, + vm_page_t); +int vm_page_alloc_pages_domain_after(vm_object_t, vm_pindex_t, int, int, + vm_page_t *, int, vm_page_t); vm_page_t vm_page_alloc_contig(vm_object_t object, vm_pindex_t pindex, int req, u_long npages, vm_paddr_t low, vm_paddr_t high, u_long alignment, vm_paddr_t boundary, vm_memattr_t memattr); Modified: user/markj/vm-playground/sys/vm/vm_pagequeue.h ============================================================================== --- user/markj/vm-playground/sys/vm/vm_pagequeue.h Fri Mar 2 17:07:08 2018 (r330287) +++ user/markj/vm-playground/sys/vm/vm_pagequeue.h Fri Mar 2 18:12:25 2018 (r330288) @@ -183,7 +183,8 @@ vm_pagequeue_cnt_add(struct vm_pagequeue *pq, int adde void vm_domain_set(struct vm_domain *vmd); void vm_domain_clear(struct vm_domain *vmd); -int vm_domain_allocate(struct vm_domain *vmd, int req, int npages); +int vm_domain_allocate(struct vm_domain *vmd, int req, int npages, + bool partial); /* * vm_pagequeue_domain: Modified: user/markj/vm-playground/sys/vm/vm_reserv.c ============================================================================== --- user/markj/vm-playground/sys/vm/vm_reserv.c Fri Mar 2 17:07:08 2018 (r330287) +++ user/markj/vm-playground/sys/vm/vm_reserv.c Fri Mar 2 18:12:25 
2018 (r330288) @@ -643,7 +643,7 @@ vm_reserv_extend_contig(int req, vm_object_t object, v if (popmap_is_set(rv->popmap, index + i)) goto out; } - if (!vm_domain_allocate(vmd, req, npages)) + if (vm_domain_allocate(vmd, req, npages, false) == 0) goto out; for (i = 0; i < npages; i++) vm_reserv_populate(rv, index + i); @@ -792,7 +792,7 @@ vm_reserv_alloc_contig(int req, vm_object_t object, vm */ m = NULL; vmd = VM_DOMAIN(domain); - if (vm_domain_allocate(vmd, req, allocpages)) { + if (vm_domain_allocate(vmd, req, allocpages, false) == allocpages) { vm_domain_free_lock(vmd); m = vm_phys_alloc_contig(domain, allocpages, low, high, ulmax(alignment, VM_LEVEL_0_SIZE), @@ -839,8 +839,10 @@ vm_reserv_alloc_contig(int req, vm_object_t object, vm } /* - * Attempts to extend an existing reservation and allocate the page to the - * object. + * Attempts to extend an existing reservation and allocate the request page to + * the object. Opportunistically returns up to "*countp" contiguous pages if + * the caller so requests. The number of pages allocated is returned in + * "*countp". * * The page "mpred" must immediately precede the offset "pindex" within the * specified object. @@ -849,12 +851,12 @@ vm_reserv_alloc_contig(int req, vm_object_t object, vm */ vm_page_t vm_reserv_extend(int req, vm_object_t object, vm_pindex_t pindex, int domain, - vm_page_t mpred) + vm_page_t mpred, int *countp) { struct vm_domain *vmd; vm_page_t m, msucc; vm_reserv_t rv; - int index; + int avail, index, nalloc; VM_OBJECT_ASSERT_WLOCKED(object); @@ -886,10 +888,30 @@ vm_reserv_extend(int req, vm_object_t object, vm_pinde m = NULL; goto out; } - if (vm_domain_allocate(vmd, req, 1) == 0) - m = NULL; - else + + /* + * If the caller is prepared to accept multiple pages, try to allocate + * them. We are constrained by: + * 1) the number of pages the caller can accept, + * 2) the number of free pages in the reservation succeeding "index", + * 3) the number of available free pages in the domain. + */ + nalloc = countp != NULL ? imin(VM_LEVEL_0_NPAGES - index, *countp) : 1; + if ((avail = vm_domain_allocate(vmd, req, nalloc, true)) > 0) { vm_reserv_populate(rv, index); + if (countp != NULL) { + for (nalloc = 1; nalloc < avail; nalloc++) { + if (popmap_is_set(rv->popmap, ++index)) + break; + vm_reserv_populate(rv, index); + } + if (nalloc < avail) + /* Return leftover pages. */ + vm_domain_freecnt_inc(vmd, avail - nalloc); + *countp = nalloc; + } + } else + m = NULL; out: vm_reserv_unlock(rv); @@ -897,22 +919,25 @@ out: } /* - * Allocates a page from an existing reservation. + * Allocates a new reservation for the object, and returns a page from that + * reservation. Opportunistically returns up to *"countp" contiguous pages if + * the caller so requests. The number of pages allocated is returned in + * "*countp". * * The page "mpred" must immediately precede the offset "pindex" within the * specified object. * - * The object and free page queue must be locked. + * The object and per-domain free page queues must be locked. */ vm_page_t vm_reserv_alloc_page(int req, vm_object_t object, vm_pindex_t pindex, int domain, - vm_page_t mpred) + vm_page_t mpred, int *countp) { struct vm_domain *vmd; vm_page_t m, msucc; vm_pindex_t first, leftcap, rightcap; vm_reserv_t rv; - int index; + int avail, index, nalloc; VM_OBJECT_ASSERT_WLOCKED(object); @@ -981,16 +1006,24 @@ vm_reserv_alloc_page(int req, vm_object_t object, vm_p /* * Allocate and populate the new reservation. 
+ * + * If the caller is prepared to accept multiple pages, try to allocate + * them. We are constrained by: + * 1) the number of pages the caller can accept, + * 2) the number of free pages in the reservation succeeding "index", + * 3) the number of available free pages in the domain. */ + index = VM_RESERV_INDEX(object, pindex); m = NULL; + nalloc = countp != NULL ? imin(VM_LEVEL_0_NPAGES - index, *countp) : 1; vmd = VM_DOMAIN(domain); - if (vm_domain_allocate(vmd, req, 1)) { + if ((avail = vm_domain_allocate(vmd, req, nalloc, true)) > 0) { vm_domain_free_lock(vmd); m = vm_phys_alloc_pages(domain, VM_FREEPOOL_DEFAULT, VM_LEVEL_0_ORDER); vm_domain_free_unlock(vmd); if (m == NULL) { - vm_domain_freecnt_inc(vmd, 1); + vm_domain_freecnt_inc(vmd, avail); return (NULL); } } else @@ -1000,11 +1033,22 @@ vm_reserv_alloc_page(int req, vm_object_t object, vm_p KASSERT(rv->pages == m, ("vm_reserv_alloc_page: reserv %p's pages is corrupted", rv)); vm_reserv_insert(rv, object, first); - index = VM_RESERV_INDEX(object, pindex); vm_reserv_populate(rv, index); + m = &rv->pages[index]; + if (countp != NULL) { + for (nalloc = 1; nalloc < avail; nalloc++) { + if (popmap_is_set(rv->popmap, ++index)) + break; + vm_reserv_populate(rv, index); + } + if (nalloc < avail) + /* Return leftover pages. */ + vm_domain_freecnt_inc(vmd, avail - nalloc); + *countp = nalloc; + } vm_reserv_unlock(rv); - return (&rv->pages[index]); + return (m); } /* @@ -1227,15 +1271,16 @@ vm_reserv_reclaim(vm_reserv_t rv) /* * Breaks the reservation at the head of the partially populated reservation - * queue, releasing its free pages to the physical memory allocator. Returns - * TRUE if a reservation is broken and FALSE otherwise. + * queue, releasing its free pages to the physical memory allocator, and + * returns the number of pages released. * * The free page queue lock must be held. 
*/ -boolean_t +int vm_reserv_reclaim_inactive(int domain) { vm_reserv_t rv; + int freed; while ((rv = TAILQ_FIRST(&vm_rvq_partpop[domain])) != NULL) { vm_reserv_lock(rv); @@ -1243,11 +1288,12 @@ vm_reserv_reclaim_inactive(int domain) vm_reserv_unlock(rv); continue; } + freed = VM_LEVEL_0_NPAGES - rv->popcnt; vm_reserv_reclaim(rv); vm_reserv_unlock(rv); - return (TRUE); + return (freed); } - return (FALSE); + return (0); } /* Modified: user/markj/vm-playground/sys/vm/vm_reserv.h ============================================================================== --- user/markj/vm-playground/sys/vm/vm_reserv.h Fri Mar 2 17:07:08 2018 (r330287) +++ user/markj/vm-playground/sys/vm/vm_reserv.h Fri Mar 2 18:12:25 2018 (r330288) @@ -54,10 +54,12 @@ vm_page_t vm_reserv_extend_contig(int req, vm_object_t vm_pindex_t pindex, int domain, u_long npages, vm_paddr_t low, vm_paddr_t high, u_long alignment, vm_paddr_t boundary, vm_page_t mpred); -vm_page_t vm_reserv_alloc_page(int req, vm_object_t object, vm_pindex_t pindex, - int domain, vm_page_t mpred); +vm_page_t vm_reserv_alloc_page(int req, vm_object_t object, + vm_pindex_t pindex, int domain, vm_page_t mpred, + int *countp); vm_page_t vm_reserv_extend(int req, vm_object_t object, - vm_pindex_t pindex, int domain, vm_page_t mpred); + vm_pindex_t pindex, int domain, vm_page_t mpred, + int *countp); void vm_reserv_break_all(vm_object_t object); boolean_t vm_reserv_free_page(vm_page_t m); void vm_reserv_init(void); @@ -67,7 +69,7 @@ int vm_reserv_level_iffullpop(vm_page_t m); boolean_t vm_reserv_reclaim_contig(int domain, u_long npages, vm_paddr_t low, vm_paddr_t high, u_long alignment, vm_paddr_t boundary); -boolean_t vm_reserv_reclaim_inactive(int domain); +int vm_reserv_reclaim_inactive(int domain); void vm_reserv_rename(vm_page_t m, vm_object_t new_object, vm_object_t old_object, vm_pindex_t old_object_offset); int vm_reserv_size(int level); Modified: user/markj/vm-playground/sys/vm/vnode_pager.c ============================================================================== --- user/markj/vm-playground/sys/vm/vnode_pager.c Fri Mar 2 17:07:08 2018 (r330287) +++ user/markj/vm-playground/sys/vm/vnode_pager.c Fri Mar 2 18:12:25 2018 (r330288) @@ -897,35 +897,27 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_page /* * Fill in the bp->b_pages[] array with requested and optional - * read behind or read ahead pages. Read behind pages are looked - * up in a backward direction, down to a first cached page. Same - * for read ahead pages, but there is no need to shift the array - * in case of encountering a cached page. + * read behind or read ahead pages. */ i = bp->b_npages = 0; - if (rbehind) { - vm_pindex_t startpindex, tpindex; - vm_page_t p; + if (rbehind > 0) { + vm_pindex_t startpindex; + vm_page_t mpred; VM_OBJECT_WLOCK(object); startpindex = m[0]->pindex - rbehind; - if ((p = TAILQ_PREV(m[0], pglist, listq)) != NULL && - p->pindex >= startpindex) - startpindex = p->pindex + 1; + if ((mpred = TAILQ_PREV(m[0], pglist, listq)) != NULL && + mpred->pindex >= startpindex) + startpindex = mpred->pindex + 1; - /* tpindex is unsigned; beware of numeric underflow. */ - for (tpindex = m[0]->pindex - 1; - tpindex >= startpindex && tpindex < m[0]->pindex; - tpindex--, i++) { - p = vm_page_alloc(object, tpindex, VM_ALLOC_NORMAL); - if (p == NULL) { - /* Shift the array. 
*/ - for (int j = 0; j < i; j++) - bp->b_pages[j] = bp->b_pages[j + - tpindex + 1 - startpindex]; - break; - } - bp->b_pages[tpindex - startpindex] = p; + i = vm_page_alloc_pages_after(object, startpindex, + VM_ALLOC_NORMAL, &bp->b_pages[0], + m[0]->pindex - startpindex, mpred); + if (i < m[0]->pindex - startpindex) { + /* We have to drop the partially allocated run. */ + for (int j = 0; j < i; j++) + vm_page_free(bp->b_pages[j]); + i = 0; } bp->b_pgbefore = i; @@ -939,29 +931,24 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_page bp->b_pages[i] = m[j]; bp->b_npages += count; - if (rahead) { - vm_pindex_t endpindex, tpindex; - vm_page_t p; + if (rahead > 0) { + vm_pindex_t endpindex, startpindex; + vm_page_t msucc; if (!VM_OBJECT_WOWNED(object)) VM_OBJECT_WLOCK(object); - endpindex = m[count - 1]->pindex + rahead + 1; - if ((p = TAILQ_NEXT(m[count - 1], listq)) != NULL && - p->pindex < endpindex) - endpindex = p->pindex; + startpindex = m[count - 1]->pindex + 1; + endpindex = startpindex + rahead; + if ((msucc = TAILQ_NEXT(m[count - 1], listq)) != NULL && + msucc->pindex < endpindex) + endpindex = msucc->pindex; if (endpindex > object->size) endpindex = object->size; - for (tpindex = m[count - 1]->pindex + 1; - tpindex < endpindex; i++, tpindex++) { - p = vm_page_alloc(object, tpindex, VM_ALLOC_NORMAL); - if (p == NULL) - break; - bp->b_pages[i] = p; - } - - bp->b_pgafter = i - bp->b_npages; - bp->b_npages = i; + bp->b_pgafter = vm_page_alloc_pages_after(object, startpindex, + VM_ALLOC_NORMAL, &bp->b_pages[i], endpindex - startpindex, + m[count - 1]); + bp->b_npages += bp->b_pgafter; } else bp->b_pgafter = 0; From owner-svn-src-user@freebsd.org Fri Mar 2 21:50:03 2018 Return-Path: Delivered-To: svn-src-user@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4BEA3F24E25 for ; Fri, 2 Mar 2018 21:50:03 +0000 (UTC) (envelope-from markj@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id F3BD27AD02; Fri, 2 Mar 2018 21:50:02 +0000 (UTC) (envelope-from markj@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id E9FB81D185; Fri, 2 Mar 2018 21:50:02 +0000 (UTC) (envelope-from markj@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w22Lo20B044951; Fri, 2 Mar 2018 21:50:02 GMT (envelope-from markj@FreeBSD.org) Received: (from markj@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w22Lo2FO044943; Fri, 2 Mar 2018 21:50:02 GMT (envelope-from markj@FreeBSD.org) Message-Id: <201803022150.w22Lo2FO044943@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: markj set sender to markj@FreeBSD.org using -f From: Mark Johnston Date: Fri, 2 Mar 2018 21:50:02 +0000 (UTC) To: src-committers@freebsd.org, svn-src-user@freebsd.org Subject: svn commit: r330296 - in user/markj/vm-playground/sys: amd64/include kern vm X-SVN-Group: user X-SVN-Commit-Author: markj X-SVN-Commit-Paths: in user/markj/vm-playground/sys: amd64/include kern vm X-SVN-Commit-Revision: 330296 
Author: markj
Date: Fri Mar 2 21:50:02 2018
New Revision: 330296
URL: https://svnweb.freebsd.org/changeset/base/330296

Log:
  Add queues for batching page queue operations and page frees.

  As an alternative to the approach taken in r328860, relax the locking
  protocol for the queue field of struct vm_page and introduce per-CPU
  batch queues for each page queue and for each free queue. This
  approach reduces lock contention for enqueue, dequeue, and requeue
  operations by separating logical and physical queue state.

  In general, logical queue state is protected by the page lock, while
  physical queue state is protected by a page queue lock. Queue state
  is encoded in the queue and aflags fields. When performing a queue
  operation on a page, the logical operation is performed first, with
  the page lock held, and the physical operation is deferred using a
  batch queue. Physical operations may be deferred indefinitely (in
  particular, until after the page has been freed), but the number of
  pages whose logical and physical queue states do not match is bounded
  by a small number.

  The queue state of pages is also now decoupled from the allocation
  state: pages may be freed without having been physically dequeued
  (though they must be logically dequeued). The page allocators ensure
  that pages have been physically dequeued before a page is reused. One
  consequence of this is that page queue locks must now be leaf locks.
  As a result, active queue scanning is modified to work the same way
  that inactive and laundry queue scanning do, so the active queue lock
  is not held when calling into the pmap layer during a scan.

  The queue field now encodes the logical queue state of the page, and
  the new PGA_ENQUEUED flag indicates whether the page is physically
  enqueued. To update the queue field of a page, the queue lock for its
  old value must be held: the page queue lock if the value is not
  PQ_NONE, and the page lock otherwise. When performing such an update,
  one of the new or old values must be PQ_NONE. To enqueue a page, the
  queue field is updated to the index of the queue; later, the page is
  physically enqueued while the page queue lock is held, and
  PGA_ENQUEUED is set. The PGA_ENQUEUED flag may only be set or cleared
  with the corresponding page queue lock held. Logical dequeues and
  requeues are requested using the PGA_DEQUEUE and PGA_REQUEUE flags,
  respectively. Both must be set with the page lock held, and can only
  be cleared once the corresponding physical operation has been
  performed, with the page queue lock held. As mentioned above, pages
  must be at least logically dequeued before being freed.

  The inactive queue scanning algorithm is changed to exploit the
  relaxed locking protocol. Rather than acquire the inactive queue lock
  once per page during a scan, we collect and physically dequeue a
  batch of pages, which is then processed using only the page and
  object locks. The fact that logical queue state is encoded in the
  page's atomic flags allows the page daemon to synchronize with a
  thread which is simultaneously freeing pages from the object.
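  To make the deferred-queue protocol concrete, the following is a
  simplified, single-threaded sketch of the state encoding and of how
  one batch might be drained while the page queue lock is held. The
  flag values, field names, and the helper itself are illustrative
  assumptions modeled on the description above and on
  vm_pqbatch_process() in the diff below; they are not the kernel
  definitions, the real aflags updates are atomic, and the locking is
  omitted.

/*
 * Sketch of the logical/physical queue-state split. Flag values, field
 * names, and BATCH_SIZE are assumptions; real aflags updates are atomic.
 */
#include <sys/queue.h>
#include <stdint.h>

#define PQ_NONE         255
#define PGA_ENQUEUED    0x01    /* page is physically linked on pq_pl */
#define PGA_DEQUEUE     0x02    /* deferred physical dequeue requested */
#define PGA_REQUEUE     0x04    /* deferred requeue requested */

struct page {
    TAILQ_ENTRY(page) q;
    uint8_t queue;              /* logical queue index, or PQ_NONE */
    uint8_t aflags;
};

struct pagequeue {
    TAILQ_HEAD(, page) pq_pl;
    int pq_cnt;
};

#define BATCH_SIZE 31           /* cf. VM_BATCHQUEUE_SIZE for amd64 */
struct batchqueue {
    struct page *bq_pa[BATCH_SIZE];
    int bq_cnt;
};

/* Runs with the page queue lock held; only here is pq_pl touched. */
static void
batch_process(struct pagequeue *pq, struct batchqueue *bq, uint8_t queue)
{
    for (int i = 0; i < bq->bq_cnt; i++) {
        struct page *m = bq->bq_pa[i];

        if (m->queue != queue)
            continue;           /* page already moved on; nothing to do */
        if (m->aflags & PGA_DEQUEUE) {
            if (m->aflags & PGA_ENQUEUED) {
                TAILQ_REMOVE(&pq->pq_pl, m, q);
                pq->pq_cnt--;
            }
            m->aflags &= ~(PGA_ENQUEUED | PGA_DEQUEUE | PGA_REQUEUE);
            m->queue = PQ_NONE;
        } else if (!(m->aflags & PGA_ENQUEUED)) {
            TAILQ_INSERT_TAIL(&pq->pq_pl, m, q);
            pq->pq_cnt++;
            m->aflags |= PGA_ENQUEUED;
            m->aflags &= ~PGA_REQUEUE;
        } else if (m->aflags & PGA_REQUEUE) {
            TAILQ_REMOVE(&pq->pq_pl, m, q);
            TAILQ_INSERT_TAIL(&pq->pq_pl, m, q);
            m->aflags &= ~PGA_REQUEUE;
        }
    }
    bq->bq_cnt = 0;
}

int
main(void)
{
    struct pagequeue pq;
    struct page m = { .queue = 1, .aflags = 0 };
    struct batchqueue bq = { .bq_pa = { &m }, .bq_cnt = 1 };

    TAILQ_INIT(&pq.pq_pl);
    pq.pq_cnt = 0;

    batch_process(&pq, &bq, 1);         /* physically enqueues the page */
    return (pq.pq_cnt == 1 && (m.aflags & PGA_ENQUEUED) != 0 ? 0 : 1);
}

  The key property the sketch shows is that pq_pl and pq_cnt are only
  modified in the drain step, under the page queue lock, while the
  logical state (the queue field and the PGA_* requests) was set
  earlier under the page lock.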
While this approach brings with it considerable complexity, it also allows some simplification of existing code. For instance, the page queue lock dance in vm_object_terminate() goes away since the dequeues are automatically batched. We also no longer need to use vm_pageout_fallback_object_lock() in the inactive queue scan. The laundry queue scan still uses it, but it is not required. Modified: user/markj/vm-playground/sys/amd64/include/vmparam.h user/markj/vm-playground/sys/kern/subr_witness.c user/markj/vm-playground/sys/vm/vm_object.c user/markj/vm-playground/sys/vm/vm_page.c user/markj/vm-playground/sys/vm/vm_page.h user/markj/vm-playground/sys/vm/vm_pageout.c user/markj/vm-playground/sys/vm/vm_pagequeue.h user/markj/vm-playground/sys/vm/vm_phys.c Modified: user/markj/vm-playground/sys/amd64/include/vmparam.h ============================================================================== --- user/markj/vm-playground/sys/amd64/include/vmparam.h Fri Mar 2 21:26:48 2018 (r330295) +++ user/markj/vm-playground/sys/amd64/include/vmparam.h Fri Mar 2 21:50:02 2018 (r330296) @@ -227,4 +227,10 @@ #define ZERO_REGION_SIZE (2 * 1024 * 1024) /* 2MB */ +/* + * Use a fairly large batch size since we expect amd64 systems to have + * lots of memory. + */ +#define VM_BATCHQUEUE_SIZE 31 + #endif /* _MACHINE_VMPARAM_H_ */ Modified: user/markj/vm-playground/sys/kern/subr_witness.c ============================================================================== --- user/markj/vm-playground/sys/kern/subr_witness.c Fri Mar 2 21:26:48 2018 (r330295) +++ user/markj/vm-playground/sys/kern/subr_witness.c Fri Mar 2 21:50:02 2018 (r330296) @@ -601,7 +601,6 @@ static struct witness_order_list_entry order_lists[] = * CDEV */ { "vm map (system)", &lock_class_mtx_sleep }, - { "vm pagequeue", &lock_class_mtx_sleep }, { "vnode interlock", &lock_class_mtx_sleep }, { "cdev", &lock_class_mtx_sleep }, { NULL, NULL }, @@ -611,11 +610,11 @@ static struct witness_order_list_entry order_lists[] = { "vm map (user)", &lock_class_sx }, { "vm object", &lock_class_rw }, { "vm page", &lock_class_mtx_sleep }, - { "vm pagequeue", &lock_class_mtx_sleep }, { "pmap pv global", &lock_class_rw }, { "pmap", &lock_class_mtx_sleep }, { "pmap pv list", &lock_class_rw }, { "vm page free queue", &lock_class_mtx_sleep }, + { "vm pagequeue", &lock_class_mtx_sleep }, { NULL, NULL }, /* * kqueue/VFS interaction Modified: user/markj/vm-playground/sys/vm/vm_object.c ============================================================================== --- user/markj/vm-playground/sys/vm/vm_object.c Fri Mar 2 21:26:48 2018 (r330295) +++ user/markj/vm-playground/sys/vm/vm_object.c Fri Mar 2 21:50:02 2018 (r330296) @@ -721,14 +721,11 @@ static void vm_object_terminate_pages(vm_object_t object) { vm_page_t p, p_next; - struct mtx *mtx, *mtx1; - struct vm_pagequeue *pq, *pq1; - int dequeued; + struct mtx *mtx; VM_OBJECT_ASSERT_WLOCKED(object); mtx = NULL; - pq = NULL; /* * Free any remaining pageable pages. This also removes them from the @@ -738,60 +735,23 @@ vm_object_terminate_pages(vm_object_t object) */ TAILQ_FOREACH_SAFE(p, &object->memq, listq, p_next) { vm_page_assert_unbusied(p); - if ((object->flags & OBJ_UNMANAGED) == 0) { + if ((object->flags & OBJ_UNMANAGED) == 0) /* * vm_page_free_prep() only needs the page * lock for managed pages. 
*/ - mtx1 = vm_page_lockptr(p); - if (mtx1 != mtx) { - if (mtx != NULL) - mtx_unlock(mtx); - if (pq != NULL) { - vm_pagequeue_cnt_add(pq, dequeued); - vm_pagequeue_unlock(pq); - pq = NULL; - } - mtx = mtx1; - mtx_lock(mtx); - } - } + vm_page_change_lock(p, &mtx); p->object = NULL; if (p->wire_count != 0) - goto unlist; + continue; VM_CNT_INC(v_pfree); p->flags &= ~PG_ZERO; - if (p->queue != PQ_NONE) { - KASSERT(p->queue < PQ_COUNT, ("vm_object_terminate: " - "page %p is not queued", p)); - pq1 = vm_page_pagequeue(p); - if (pq != pq1) { - if (pq != NULL) { - vm_pagequeue_cnt_add(pq, dequeued); - vm_pagequeue_unlock(pq); - } - pq = pq1; - vm_pagequeue_lock(pq); - dequeued = 0; - } - p->queue = PQ_NONE; - TAILQ_REMOVE(&pq->pq_pl, p, plinks.q); - dequeued--; - } - if (vm_page_free_prep(p, true)) - continue; -unlist: - TAILQ_REMOVE(&object->memq, p, listq); + + vm_page_free(p); } - if (pq != NULL) { - vm_pagequeue_cnt_add(pq, dequeued); - vm_pagequeue_unlock(pq); - } if (mtx != NULL) mtx_unlock(mtx); - vm_page_free_phys_pglist(&object->memq); - /* * If the object contained any pages, then reset it to an empty state. * None of the object's fields, including "resident_page_count", were @@ -1974,7 +1934,6 @@ vm_object_page_remove(vm_object_t object, vm_pindex_t { vm_page_t p, next; struct mtx *mtx; - struct pglist pgl; VM_OBJECT_ASSERT_WLOCKED(object); KASSERT((object->flags & OBJ_UNMANAGED) == 0 || @@ -1983,7 +1942,6 @@ vm_object_page_remove(vm_object_t object, vm_pindex_t if (object->resident_page_count == 0) return; vm_object_pip_add(object, 1); - TAILQ_INIT(&pgl); again: p = vm_page_find_least(object, start); mtx = NULL; @@ -2038,12 +1996,10 @@ again: if ((options & OBJPR_NOTMAPPED) == 0 && object->ref_count != 0) pmap_remove_all(p); p->flags &= ~PG_ZERO; - if (vm_page_free_prep(p, false)) - TAILQ_INSERT_TAIL(&pgl, p, listq); + vm_page_free(p); } if (mtx != NULL) mtx_unlock(mtx); - vm_page_free_phys_pglist(&pgl); vm_object_pip_wakeup(object); } Modified: user/markj/vm-playground/sys/vm/vm_page.c ============================================================================== --- user/markj/vm-playground/sys/vm/vm_page.c Fri Mar 2 21:26:48 2018 (r330295) +++ user/markj/vm-playground/sys/vm/vm_page.c Fri Mar 2 21:50:02 2018 (r330296) @@ -131,13 +131,11 @@ extern int uma_startup_count(int); extern void uma_startup(void *, int); extern int vmem_startup_count(void); -/* - * Associated with page of user-allocatable memory is a - * page structure. - */ - struct vm_domain vm_dom[MAXMEMDOM]; +static DPCPU_DEFINE(struct vm_batchqueue, pqbatch[MAXMEMDOM][PQ_COUNT]); +static DPCPU_DEFINE(struct vm_batchqueue, freeqbatch[MAXMEMDOM]); + struct mtx_padalign __exclusive_cache_line pa_lock[PA_LOCK_COUNT]; /* The following fields are protected by the domainset lock. 
*/ @@ -176,7 +174,7 @@ static uma_zone_t fakepg_zone; static void vm_page_alloc_check(vm_page_t m); static void vm_page_clear_dirty_mask(vm_page_t m, vm_page_bits_t pagebits); -static void vm_page_enqueue(uint8_t queue, vm_page_t m); +static void vm_page_enqueue_lazy(vm_page_t m, uint8_t queue); static void vm_page_free_phys(struct vm_domain *vmd, vm_page_t m); static void vm_page_init(void *dummy); static int vm_page_insert_after(vm_page_t m, vm_object_t object, @@ -1814,6 +1812,7 @@ again: KASSERT(m != NULL, ("missing page")); found: + vm_page_dequeue(m); vm_page_alloc_check(m); /* @@ -1987,8 +1986,10 @@ again: } done: - for (i = 0; i < nalloc; i++) + for (i = 0; i < nalloc; i++) { + vm_page_dequeue(ma[i]); vm_page_alloc_check(ma[i]); + } /* * Initialize the pages. Only the PG_ZERO flag is inherited. @@ -2195,8 +2196,10 @@ again: #if VM_NRESERVLEVEL > 0 found: #endif - for (m = m_ret; m < &m_ret[npages]; m++) + for (m = m_ret; m < &m_ret[npages]; m++) { + vm_page_dequeue(m); vm_page_alloc_check(m); + } /* * Initialize the pages. Only the PG_ZERO flag is inherited. @@ -2273,6 +2276,8 @@ vm_page_alloc_check(vm_page_t m) KASSERT(m->object == NULL, ("page %p has object", m)); KASSERT(m->queue == PQ_NONE, ("page %p has unexpected queue %d", m, m->queue)); + KASSERT((m->aflags & PGA_QUEUE_STATE_MASK) == 0, + ("page %p has unexpected queue state", m)); KASSERT(!vm_page_held(m), ("page %p is held", m)); KASSERT(!vm_page_busied(m), ("page %p is busy", m)); KASSERT(m->dirty == 0, ("page %p is dirty", m)); @@ -2342,6 +2347,7 @@ again: goto again; return (NULL); } + vm_page_dequeue(m); vm_page_alloc_check(m); /* @@ -2534,6 +2540,7 @@ retry: vm_reserv_size(level)) - pa); #endif } else if (object->memattr == VM_MEMATTR_DEFAULT && + /* XXX need to check PGA_DEQUEUE */ m->queue != PQ_NONE && !vm_page_busied(m)) { /* * The page is allocated but eligible for @@ -2686,6 +2693,7 @@ retry: else if (object->memattr != VM_MEMATTR_DEFAULT) error = EINVAL; else if (m->queue != PQ_NONE && !vm_page_busied(m)) { + /* XXX need to check PGA_DEQUEUE */ KASSERT(pmap_page_get_memattr(m) == VM_MEMATTR_DEFAULT, ("page %p has an unexpected memattr", m)); @@ -3213,113 +3221,288 @@ vm_page_pagequeue(vm_page_t m) return (&vm_pagequeue_domain(m)->vmd_pagequeues[m->queue]); } +static struct mtx * +vm_page_pagequeue_lockptr(vm_page_t m) +{ + + if (m->queue == PQ_NONE) + return (NULL); + return (&vm_page_pagequeue(m)->pq_mutex); +} + +static void +vm_pqbatch_process(struct vm_pagequeue *pq, struct vm_batchqueue *bq, + uint8_t queue) +{ + vm_page_t m; + int delta; + uint8_t aflags; + + vm_pagequeue_assert_locked(pq); + + delta = 0; + VM_BATCHQ_FOREACH(bq, m) { + if (__predict_false(m->queue != queue)) + continue; + + aflags = m->aflags; + if ((aflags & PGA_DEQUEUE) != 0) { + if (__predict_true((aflags & PGA_ENQUEUED) != 0)) { + TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); + delta--; + } + + /* + * Synchronize with the page daemon, which may be + * simultaneously scanning this page with only the page + * lock held. We must be careful to avoid leaving the + * page in a state where it appears to belong to a page + * queue. 
+ */ + m->queue = PQ_NONE; + atomic_thread_fence_rel(); + vm_page_aflag_clear(m, PGA_QUEUE_STATE_MASK); + } else if ((aflags & PGA_ENQUEUED) == 0) { + TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q); + delta++; + vm_page_aflag_set(m, PGA_ENQUEUED); + if (__predict_false((aflags & PGA_REQUEUE) != 0)) + vm_page_aflag_clear(m, PGA_REQUEUE); + } else if ((aflags & PGA_REQUEUE) != 0) { + TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); + TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q); + vm_page_aflag_clear(m, PGA_REQUEUE); + } + } + vm_batchqueue_init(bq); + vm_pagequeue_cnt_add(pq, delta); +} + /* - * vm_page_dequeue: + * vm_page_dequeue_lazy: * - * Remove the given page from its current page queue. + * Request removal of the given page from its current page + * queue. Physical removal from the queue may be deferred + * arbitrarily, and may be cancelled by later queue operations + * on that page. * * The page must be locked. */ -void -vm_page_dequeue(vm_page_t m) +static void +vm_page_dequeue_lazy(vm_page_t m) { + struct vm_batchqueue *bq; struct vm_pagequeue *pq; + int domain, queue; vm_page_assert_locked(m); - KASSERT(m->queue < PQ_COUNT, ("vm_page_dequeue: page %p is not queued", - m)); - pq = vm_page_pagequeue(m); - vm_pagequeue_lock(pq); - m->queue = PQ_NONE; - TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); - vm_pagequeue_cnt_dec(pq); + + queue = m->queue; + if (queue == PQ_NONE) + return; + domain = vm_phys_domain(m); + pq = &VM_DOMAIN(domain)->vmd_pagequeues[queue]; + + vm_page_aflag_set(m, PGA_DEQUEUE); + + critical_enter(); + bq = DPCPU_PTR(pqbatch[domain][queue]); + if (vm_batchqueue_insert(bq, m)) { + critical_exit(); + return; + } + if (!vm_pagequeue_trylock(pq)) { + critical_exit(); + vm_pagequeue_lock(pq); + critical_enter(); + bq = DPCPU_PTR(pqbatch[domain][queue]); + } + vm_pqbatch_process(pq, bq, queue); + + /* + * The page may have been dequeued by another thread before we + * acquired the page queue lock. However, since we hold the + * page lock, the page's queue field cannot change a second + * time and we can safely clear PGA_DEQUEUE. + */ + KASSERT(m->queue == queue || m->queue == PQ_NONE, + ("%s: page %p migrated between queues", __func__, m)); + if (m->queue == queue) { + (void)vm_batchqueue_insert(bq, m); + vm_pqbatch_process(pq, bq, queue); + } else + vm_page_aflag_clear(m, PGA_DEQUEUE); vm_pagequeue_unlock(pq); + critical_exit(); } /* * vm_page_dequeue_locked: * - * Remove the given page from its current page queue. + * Remove the page from its page queue, which must be locked. + * If the page lock is not held, there is no guarantee that the + * page will not be enqueued by another thread before this function + * returns. In this case, it is up to the caller to ensure that + * no other threads hold a reference to the page. * - * The page and page queue must be locked. + * The page queue lock must be held. If the page is not already + * logically dequeued, the page lock must be held as well. 
*/ void vm_page_dequeue_locked(vm_page_t m) { struct vm_pagequeue *pq; - vm_page_lock_assert(m, MA_OWNED); - pq = vm_page_pagequeue(m); - vm_pagequeue_assert_locked(pq); + KASSERT(m->queue != PQ_NONE, + ("%s: page %p queue field is PQ_NONE", __func__, m)); + vm_pagequeue_assert_locked(vm_page_pagequeue(m)); + KASSERT((m->aflags & PGA_DEQUEUE) != 0 || + mtx_owned(vm_page_lockptr(m)), + ("%s: queued unlocked page %p", __func__, m)); + + if ((m->aflags & PGA_ENQUEUED) != 0) { + pq = vm_page_pagequeue(m); + TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); + vm_pagequeue_cnt_dec(pq); + } + + /* + * Synchronize with the page daemon, which may be simultaneously + * scanning this page with only the page lock held. We must be careful + * to avoid leaving the page in a state where it appears to belong to a + * page queue. + */ m->queue = PQ_NONE; - TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); - vm_pagequeue_cnt_dec(pq); + atomic_thread_fence_rel(); + vm_page_aflag_clear(m, PGA_QUEUE_STATE_MASK); } /* - * vm_page_enqueue: + * vm_page_dequeue: * - * Add the given page to the specified page queue. + * Remove the page from whichever page queue it's in, if any. + * If the page lock is not held, there is no guarantee that the + * page will not be enqueued by another thread before this function + * returns. In this case, it is up to the caller to ensure that + * no other threads hold a reference to the page. + */ +void +vm_page_dequeue(vm_page_t m) +{ + struct mtx *lock, *lock1; + + lock = vm_page_pagequeue_lockptr(m); + for (;;) { + if (lock == NULL) + return; + mtx_lock(lock); + if ((lock1 = vm_page_pagequeue_lockptr(m)) == lock) + break; + mtx_unlock(lock); + lock = lock1; + } + KASSERT(lock == vm_page_pagequeue_lockptr(m), + ("%s: page %p migrated directly between queues", __func__, m)); + vm_page_dequeue_locked(m); + mtx_unlock(lock); +} + +/* + * vm_page_enqueue_lazy: * + * Schedule the given page for insertion into the specified page queue. + * Physical insertion of the page may be deferred indefinitely. + * * The page must be locked. */ static void -vm_page_enqueue(uint8_t queue, vm_page_t m) +vm_page_enqueue_lazy(vm_page_t m, uint8_t queue) { + struct vm_batchqueue *bq; struct vm_pagequeue *pq; + int domain; - vm_page_lock_assert(m, MA_OWNED); - KASSERT(queue < PQ_COUNT, - ("vm_page_enqueue: invalid queue %u request for page %p", - queue, m)); + vm_page_assert_locked(m); + KASSERT(m->queue == PQ_NONE && (m->aflags & PGA_QUEUE_STATE_MASK) == 0, + ("%s: page %p is already enqueued", __func__, m)); + + domain = vm_phys_domain(m); pq = &vm_pagequeue_domain(m)->vmd_pagequeues[queue]; - vm_pagequeue_lock(pq); + + /* + * The queue field might be changed back to PQ_NONE by a concurrent + * call to vm_page_dequeue(). In that case the batch queue entry will + * be a no-op. + */ m->queue = queue; - TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q); - vm_pagequeue_cnt_inc(pq); + + critical_enter(); + bq = DPCPU_PTR(pqbatch[domain][queue]); + if (__predict_true(vm_batchqueue_insert(bq, m))) { + critical_exit(); + return; + } + if (!vm_pagequeue_trylock(pq)) { + critical_exit(); + vm_pagequeue_lock(pq); + critical_enter(); + bq = DPCPU_PTR(pqbatch[domain][queue]); + } + vm_pqbatch_process(pq, bq, queue); + (void)vm_batchqueue_insert(bq, m); + vm_pqbatch_process(pq, bq, queue); vm_pagequeue_unlock(pq); + critical_exit(); } /* * vm_page_requeue: * - * Move the given page to the tail of its current page queue. + * Schedule a requeue of the given page. * * The page must be locked. 
*/ void vm_page_requeue(vm_page_t m) { + struct vm_batchqueue *bq; struct vm_pagequeue *pq; + int domain, queue; vm_page_lock_assert(m, MA_OWNED); KASSERT(m->queue != PQ_NONE, - ("vm_page_requeue: page %p is not queued", m)); + ("%s: page %p is not enqueued", __func__, m)); + + domain = vm_phys_domain(m); + queue = m->queue; pq = vm_page_pagequeue(m); - vm_pagequeue_lock(pq); - TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); - TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q); - vm_pagequeue_unlock(pq); -} -/* - * vm_page_requeue_locked: - * - * Move the given page to the tail of its current page queue. - * - * The page queue must be locked. - */ -void -vm_page_requeue_locked(vm_page_t m) -{ - struct vm_pagequeue *pq; + if (queue == PQ_NONE) + return; - KASSERT(m->queue != PQ_NONE, - ("vm_page_requeue_locked: page %p is not queued", m)); - pq = vm_page_pagequeue(m); - vm_pagequeue_assert_locked(pq); - TAILQ_REMOVE(&pq->pq_pl, m, plinks.q); - TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q); + vm_page_aflag_set(m, PGA_REQUEUE); + critical_enter(); + bq = DPCPU_PTR(pqbatch[domain][queue]); + if (__predict_true(vm_batchqueue_insert(bq, m))) { + critical_exit(); + return; + } + if (!vm_pagequeue_trylock(pq)) { + critical_exit(); + vm_pagequeue_lock(pq); + critical_enter(); + bq = DPCPU_PTR(pqbatch[domain][queue]); + } + vm_pqbatch_process(pq, bq, queue); + KASSERT(m->queue == queue || m->queue == PQ_NONE, + ("%s: page %p migrated between queues", __func__, m)); + if (m->queue == queue) { + (void)vm_batchqueue_insert(bq, m); + vm_pqbatch_process(pq, bq, queue); + } else + vm_page_aflag_clear(m, PGA_REQUEUE); + vm_pagequeue_unlock(pq); + critical_exit(); } /* @@ -3337,18 +3520,18 @@ vm_page_activate(vm_page_t m) int queue; vm_page_lock_assert(m, MA_OWNED); - if ((queue = m->queue) != PQ_ACTIVE) { - if (m->wire_count == 0 && (m->oflags & VPO_UNMANAGED) == 0) { - if (m->act_count < ACT_INIT) - m->act_count = ACT_INIT; - if (queue != PQ_NONE) - vm_page_dequeue(m); - vm_page_enqueue(PQ_ACTIVE, m); - } - } else { - if (m->act_count < ACT_INIT) + + if ((queue = m->queue) == PQ_ACTIVE || m->wire_count > 0 || + (m->oflags & VPO_UNMANAGED) != 0) { + if (queue == PQ_ACTIVE && m->act_count < ACT_INIT) m->act_count = ACT_INIT; + return; } + + vm_page_remque(m); + if (m->act_count < ACT_INIT) + m->act_count = ACT_INIT; + vm_page_enqueue_lazy(m, PQ_ACTIVE); } /* @@ -3359,11 +3542,10 @@ vm_page_activate(vm_page_t m) * the page to the free list only if this function returns true. * * The object must be locked. The page must be locked if it is - * managed. For a queued managed page, the pagequeue_locked - * argument specifies whether the page queue is already locked. + * managed. */ bool -vm_page_free_prep(vm_page_t m, bool pagequeue_locked) +vm_page_free_prep(vm_page_t m) { #if defined(DIAGNOSTIC) && defined(PHYS_TO_DMAP) @@ -3402,12 +3584,14 @@ vm_page_free_prep(vm_page_t m, bool pagequeue_locked) return (false); } - if (m->queue != PQ_NONE) { - if (pagequeue_locked) - vm_page_dequeue_locked(m); - else - vm_page_dequeue(m); - } + /* + * Pages need not be dequeued before they are returned to the physical + * memory allocator, but they must at least be marked for a deferred + * dequeue. 
+ */ + if ((m->oflags & VPO_UNMANAGED) == 0) + vm_page_dequeue_lazy(m); + m->valid = 0; vm_page_undirty(m); @@ -3443,6 +3627,12 @@ static void vm_page_free_phys(struct vm_domain *vmd, vm_page_t m) { +#if 0 + /* XXX racy */ + KASSERT((m->aflags & PGA_DEQUEUE) != 0 || m->queue == PQ_NONE, + ("%s: page %p has lingering queue state", __func__, m)); +#endif + vm_domain_free_assert_locked(vmd); #if VM_NRESERVLEVEL > 0 @@ -3451,36 +3641,6 @@ vm_page_free_phys(struct vm_domain *vmd, vm_page_t m) vm_phys_free_pages(m, 0); } -void -vm_page_free_phys_pglist(struct pglist *tq) -{ - struct vm_domain *vmd; - vm_page_t m; - int cnt; - - if (TAILQ_EMPTY(tq)) - return; - vmd = NULL; - cnt = 0; - TAILQ_FOREACH(m, tq, listq) { - if (vmd != vm_pagequeue_domain(m)) { - if (vmd != NULL) { - vm_domain_free_unlock(vmd); - vm_domain_freecnt_inc(vmd, cnt); - cnt = 0; - } - vmd = vm_pagequeue_domain(m); - vm_domain_free_lock(vmd); - } - vm_page_free_phys(vmd, m); - cnt++; - } - if (vmd != NULL) { - vm_domain_free_unlock(vmd); - vm_domain_freecnt_inc(vmd, cnt); - } -} - /* * vm_page_free_toq: * @@ -3493,15 +3653,32 @@ vm_page_free_phys_pglist(struct pglist *tq) void vm_page_free_toq(vm_page_t m) { + struct vm_batchqueue *cpubq, bq; struct vm_domain *vmd; + int domain; - if (!vm_page_free_prep(m, false)) + if (!vm_page_free_prep(m)) return; - vmd = vm_pagequeue_domain(m); + + domain = vm_phys_domain(m); + vmd = VM_DOMAIN(domain); + + critical_enter(); + cpubq = DPCPU_PTR(freeqbatch[domain]); + if (vm_batchqueue_insert(cpubq, m)) { + critical_exit(); + return; + } + memcpy(&bq, cpubq, sizeof(bq)); + vm_batchqueue_init(cpubq); + critical_exit(); + vm_domain_free_lock(vmd); vm_page_free_phys(vmd, m); + VM_BATCHQ_FOREACH(&bq, m) + vm_page_free_phys(vmd, m); vm_domain_free_unlock(vmd); - vm_domain_freecnt_inc(vmd, 1); + vm_domain_freecnt_inc(vmd, bq.bq_cnt + 1); } /* @@ -3558,22 +3735,25 @@ vm_page_unwire(vm_page_t m, uint8_t queue) KASSERT(queue < PQ_COUNT || queue == PQ_NONE, ("vm_page_unwire: invalid queue %u request for page %p", queue, m)); + if ((m->oflags & VPO_UNMANAGED) == 0) + vm_page_assert_locked(m); unwired = vm_page_unwire_noq(m); - if (unwired && (m->oflags & VPO_UNMANAGED) == 0 && m->object != NULL) { - if (m->queue == queue) { + if (!unwired || (m->oflags & VPO_UNMANAGED) != 0 || m->object == NULL) + return (unwired); + + if (m->queue == queue) { + if (queue == PQ_ACTIVE) + vm_page_reference(m); + else if (queue != PQ_NONE) + vm_page_requeue(m); + } else { + vm_page_dequeue(m); + if (queue != PQ_NONE) { + vm_page_enqueue_lazy(m, queue); if (queue == PQ_ACTIVE) - vm_page_reference(m); - else if (queue != PQ_NONE) - vm_page_requeue(m); - } else { - vm_page_remque(m); - if (queue != PQ_NONE) { - vm_page_enqueue(queue, m); - if (queue == PQ_ACTIVE) - /* Initialize act_count. */ - vm_page_activate(m); - } + /* Initialize act_count. */ + vm_page_activate(m); } } return (unwired); @@ -3620,7 +3800,7 @@ vm_page_unwire_noq(vm_page_t m) * The page must be locked. */ static inline void -_vm_page_deactivate(vm_page_t m, boolean_t noreuse) +_vm_page_deactivate(vm_page_t m, bool noreuse) { struct vm_pagequeue *pq; int queue; @@ -3629,31 +3809,34 @@ _vm_page_deactivate(vm_page_t m, boolean_t noreuse) /* * Ignore if the page is already inactive, unless it is unlikely to be - * reactivated. + * reactivated. Note that the test of m->queue is racy since the + * inactive queue lock is not held. 
*/ if ((queue = m->queue) == PQ_INACTIVE && !noreuse) return; - if (m->wire_count == 0 && (m->oflags & VPO_UNMANAGED) == 0) { - pq = &vm_pagequeue_domain(m)->vmd_pagequeues[PQ_INACTIVE]; - /* Avoid multiple acquisitions of the inactive queue lock. */ - if (queue == PQ_INACTIVE) { - vm_pagequeue_lock(pq); - vm_page_dequeue_locked(m); - } else { - if (queue != PQ_NONE) - vm_page_dequeue(m); - vm_pagequeue_lock(pq); - } + if (m->wire_count > 0 || (m->oflags & VPO_UNMANAGED) != 0) + return; + + /* + * XXX we can do this with only one lock acquisition if m is already + * in PQ_INACTIVE + */ + vm_page_remque(m); + + pq = &vm_pagequeue_domain(m)->vmd_pagequeues[PQ_INACTIVE]; + if (noreuse) { + /* This is a slow path. */ + vm_pagequeue_lock(pq); m->queue = PQ_INACTIVE; - if (noreuse) - TAILQ_INSERT_BEFORE( - &vm_pagequeue_domain(m)->vmd_inacthead, m, - plinks.q); - else - TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q); + TAILQ_INSERT_BEFORE(&vm_pagequeue_domain(m)->vmd_inacthead, m, + plinks.q); vm_pagequeue_cnt_inc(pq); + vm_page_aflag_set(m, PGA_ENQUEUED); + if ((m->aflags & PGA_REQUEUE) != 0) + vm_page_aflag_clear(m, PGA_REQUEUE); vm_pagequeue_unlock(pq); - } + } else + vm_page_enqueue_lazy(m, PQ_INACTIVE); } /* @@ -3665,7 +3848,7 @@ void vm_page_deactivate(vm_page_t m) { - _vm_page_deactivate(m, FALSE); + _vm_page_deactivate(m, false); } /* @@ -3678,7 +3861,7 @@ void vm_page_deactivate_noreuse(vm_page_t m) { - _vm_page_deactivate(m, TRUE); + _vm_page_deactivate(m, true); } /* @@ -3692,15 +3875,13 @@ vm_page_launder(vm_page_t m) int queue; vm_page_assert_locked(m); - if ((queue = m->queue) != PQ_LAUNDRY) { - if (m->wire_count == 0 && (m->oflags & VPO_UNMANAGED) == 0) { - if (queue != PQ_NONE) - vm_page_dequeue(m); - vm_page_enqueue(PQ_LAUNDRY, m); - } else - KASSERT(queue == PQ_NONE, - ("wired page %p is queued", m)); - } + + if ((queue = m->queue) == PQ_LAUNDRY || m->wire_count > 0 || + (m->oflags & VPO_UNMANAGED) != 0) + return; + + vm_page_remque(m); + vm_page_enqueue_lazy(m, PQ_LAUNDRY); } /* @@ -3715,9 +3896,9 @@ vm_page_unswappable(vm_page_t m) vm_page_assert_locked(m); KASSERT(m->wire_count == 0 && (m->oflags & VPO_UNMANAGED) == 0, ("page %p already unswappable", m)); - if (m->queue != PQ_NONE) - vm_page_dequeue(m); - vm_page_enqueue(PQ_UNSWAPPABLE, m); + + vm_page_remque(m); + vm_page_enqueue_lazy(m, PQ_UNSWAPPABLE); } /* Modified: user/markj/vm-playground/sys/vm/vm_page.h ============================================================================== --- user/markj/vm-playground/sys/vm/vm_page.h Fri Mar 2 21:26:48 2018 (r330295) +++ user/markj/vm-playground/sys/vm/vm_page.h Fri Mar 2 21:50:02 2018 (r330296) @@ -94,7 +94,9 @@ * In general, operations on this structure's mutable fields are * synchronized using either one of or a combination of the lock on the * object that the page belongs to (O), the pool lock for the page (P), - * or the lock for either the free or paging queue (Q). If a field is + * the per-domain lock for the free queues (F), or the page's queue + * lock (Q). The queue lock for a page depends on the value of its + * queue field and described in detail below. If a field is * annotated below with two of these locks, then holding either lock is * sufficient for read access, but both locks are required for write * access. An annotation of (C) indicates that the field is immutable. @@ -143,6 +145,28 @@ * causing the thread to block. 
vm_page_sleep_if_busy() can be used to * sleep until the page's busy state changes, after which the caller * must re-lookup the page and re-evaluate its state. + * + * The queue field is the index of the page queue containing the + * page, or PQ_NONE if the page is not enqueued. The queue lock of a + * page is the page queue lock corresponding to the page queue index, + * or the page lock (P) for the page. To modify the queue field, the + * queue lock for the old value of the field must be held. It is + * invalid for a page's queue field to transition between two distinct + * page queue indices. That is, when updating the queue field, either + * the new value or the old value must be PQ_NONE. + * + * To avoid contention on page queue locks, page queue operations + * (enqueue, dequeue, requeue) are batched using per-CPU queues. + * A deferred operation is requested by inserting an entry into a + * batch queue; the entry is simply a pointer to the page, and the + * request type is encoded in the page's aflags field using the values + * in PGA_QUEUE_STATE_MASK. The type-stability of struct vm_pages is + * crucial to this scheme since the processing of entries in a given + * batch queue may be deferred indefinitely. In particular, a page + * may be freed before its pending batch queue entries have been + * processed. The page lock (P) must be held to schedule a batched + * queue operation, and the page queue lock must be held in order to + * process batch queue entries for the page queue. */ #if PAGE_SIZE == 4096 @@ -174,7 +198,7 @@ struct vm_page { TAILQ_ENTRY(vm_page) listq; /* pages in same object (O) */ vm_object_t object; /* which object am I in (O,P) */ vm_pindex_t pindex; /* offset into object (O,P) */ - vm_paddr_t phys_addr; /* physical address of page */ + vm_paddr_t phys_addr; /* physical address of page (C) */ struct md_page md; /* machine dependent stuff */ u_int wire_count; /* wired down maps refs (P) */ volatile u_int busy_lock; /* busy owners lock */ @@ -182,11 +206,11 @@ struct vm_page { uint16_t flags; /* page PG_* flags (P) */ uint8_t aflags; /* access is atomic */ uint8_t oflags; /* page VPO_* flags (O) */ - uint8_t queue; /* page queue index (P,Q) */ + uint8_t queue; /* page queue index (Q) */ int8_t psind; /* pagesizes[] index (O) */ int8_t segind; /* vm_phys segment index (C) */ - uint8_t order; /* index of the buddy queue */ - uint8_t pool; /* vm_phys freepool index (Q) */ + uint8_t order; /* index of the buddy queue (F) */ + uint8_t pool; /* vm_phys freepool index (F) */ u_char act_count; /* page usage count (P) */ /* NOTE that these must support one bit per DEV_BSIZE in a page */ /* so, on normal X86 kernels, they must be at least 8 bits wide */ @@ -314,11 +338,33 @@ extern struct mtx_padalign pa_lock[]; * * PGA_EXECUTABLE may be set by pmap routines, and indicates that a page has * at least one executable mapping. It is not consumed by the MI VM layer. + * + * PGA_ENQUEUED is set and cleared when a page is inserted into or removed + * from a page queue, respectively. It determines whether the plinks.q field + * of the page is valid. To set or clear this flag, the queue lock for the + * page must be held: the page queue lock corresponding to the page's "queue" + * field if its value is not PQ_NONE, and the page lock otherwise. + * + * PGA_DEQUEUE is set when the page is scheduled to be dequeued from a page + * queue, and cleared when the dequeue request is processed. 
A page may + * have PGA_DEQUEUE set and PGA_ENQUEUED cleared, for instance if a dequeue + * is requested after the page is scheduled to be enqueued but before it is + * actually inserted into the page queue. The page lock must be held to set + * this flag, and the queue lock for the page must be held to clear it. + * + * PGA_REQUEUE is set when the page is scheduled to be requeued in its page + * queue. The page lock must be held to set this flag, and the queue lock + * for the page must be held to clear it. */ #define PGA_WRITEABLE 0x01 /* page may be mapped writeable */ #define PGA_REFERENCED 0x02 /* page has been referenced */ #define PGA_EXECUTABLE 0x04 /* page may be mapped executable */ +#define PGA_ENQUEUED 0x08 /* page is enqueued in a page queue */ +#define PGA_DEQUEUE 0x10 /* page is due to be dequeued */ +#define PGA_REQUEUE 0x20 /* page is due to be requeued */ +#define PGA_QUEUE_STATE_MASK (PGA_ENQUEUED | PGA_DEQUEUE | PGA_REQUEUE) + /* * Page flags. If changed at any other time than page allocation or * freeing, the modification must be protected by the vm_page lock. @@ -490,7 +536,7 @@ void vm_page_dequeue(vm_page_t m); void vm_page_dequeue_locked(vm_page_t m); vm_page_t vm_page_find_least(vm_object_t, vm_pindex_t); void vm_page_free_phys_pglist(struct pglist *tq); -bool vm_page_free_prep(vm_page_t m, bool pagequeue_locked); +bool vm_page_free_prep(vm_page_t m); vm_page_t vm_page_getfake(vm_paddr_t paddr, vm_memattr_t memattr); void vm_page_initfake(vm_page_t m, vm_paddr_t paddr, vm_memattr_t memattr); int vm_page_insert (vm_page_t, vm_object_t, vm_pindex_t); Modified: user/markj/vm-playground/sys/vm/vm_pageout.c ============================================================================== --- user/markj/vm-playground/sys/vm/vm_pageout.c Fri Mar 2 21:26:48 2018 (r330295) *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
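The diff is truncated before reaching sys/vm/vm_pagequeue.h, which defines the
new vm_batchqueue type and the helpers used in the vm_page.c hunks above
(vm_batchqueue_init(), vm_batchqueue_insert(), VM_BATCHQ_FOREACH, and the
machine-dependent VM_BATCHQUEUE_SIZE override). For orientation only, a
hypothetical definition consistent with how those helpers are called might look
like the sketch below; the field name bq_pa and the default batch size are
guesses, whereas bq_cnt, the helper names, and the amd64 override to 31 appear
in the hunks above.

/*
 * Hypothetical reconstruction for illustration; the real definitions live
 * in sys/vm/vm_pagequeue.h, past the truncation point of this diff.
 */
#ifndef VM_BATCHQUEUE_SIZE
#define	VM_BATCHQUEUE_SIZE	7	/* guessed default; amd64 uses 31 */
#endif

struct vm_batchqueue {
	vm_page_t	bq_pa[VM_BATCHQUEUE_SIZE];	/* bq_pa is a guessed name */
	int		bq_cnt;
};

static inline void
vm_batchqueue_init(struct vm_batchqueue *bq)
{

	bq->bq_cnt = 0;
}

/* Returns true if the page was stored, false if the batch is full. */
static inline bool
vm_batchqueue_insert(struct vm_batchqueue *bq, vm_page_t m)
{

	if (bq->bq_cnt < VM_BATCHQUEUE_SIZE) {
		bq->bq_pa[bq->bq_cnt++] = m;
		return (true);
	}
	return (false);
}

/* Iterate over the pages currently stored in a batch queue. */
#define	VM_BATCHQ_FOREACH(bq, m)					\
	for (int __i = 0;						\
	    __i < (bq)->bq_cnt && ((m) = (bq)->bq_pa[__i], 1);		\
	    __i++)

With definitions along these lines, the DPCPU_DEFINE() arrays and the
vm_pqbatch_process() loop shown above fit together: insertion into the per-CPU
batch requires only a critical section, and the page queue lock is taken only
when a batch fills up or must be flushed.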