Date:      Wed, 9 Nov 2016 18:48:37 +0000 (UTC)
From:      Alan Cox <alc@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r308474 - in head/sys: sys vm
Message-ID:  <201611091848.uA9ImbLT013831@repo.freebsd.org>

Author: alc
Date: Wed Nov  9 18:48:37 2016
New Revision: 308474
URL: https://svnweb.freebsd.org/changeset/base/308474

Log:
  Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
  pages, specifically dirty pages that have passed once through the inactive
  queue.  A new, dedicated thread is responsible for both deciding when to
  launder pages and actually laundering them.  The new policy uses the
  relative sizes of the inactive and laundry queues to determine whether to
  launder pages at a given point in time.  In general, this leads to more
  intelligent swapping behavior, since the laundry thread will avoid pageouts
  when the marginal benefit of doing so is low.  Previously, without a
  dedicated queue for dirty pages, the page daemon didn't have the information
  to determine whether pageout provided any benefit to the system.  Thus, the
  previous policy often resulted in small but steadily increasing amounts of
  swap usage when the system was under memory pressure, even when the inactive
  queue consisted mostly of clean pages.  This change addresses that issue,
  and also paves the way for some future virtual memory system improvements by
  removing the last source of object-cached clean pages, i.e., PG_CACHE pages.
  
  The new laundry thread sleeps while waiting for a request from the page
  daemon thread(s).  A request is raised by setting the variable
  vm_laundry_request and waking the laundry thread.  We request launderings
  for two reasons: to try to balance the inactive and laundry queue sizes
  ("background laundering"), and to quickly make up for a shortage of free
  pages and clean inactive pages ("shortfall laundering").  When background
  laundering is requested, the laundry thread computes the number of page
  daemon wakeups that have taken place since the last laundering.  If this
  number is large enough relative to the ratio of the laundry and (global)
  inactive queue sizes, we will launder vm_background_launder_target pages at
  vm_background_launder_rate KB/s.  Otherwise, the laundry thread goes back
  to sleep without doing any work.  When scanning the laundry queue during
  background laundering, reactivated pages are counted towards the laundry
  thread's target.
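  
  To make the trigger concrete, here is a minimal userspace sketch of the
  background check described above.  The committed heuristic lives in the
  laundry worker in vm_pageout.c; the exact comparison below is one
  plausible reading of the policy, not the committed code:
  
	#include <stdbool.h>
  
	static bool
	background_launder_due(unsigned wakeups_since_last,
	    unsigned inactive_cnt, unsigned laundry_cnt)
	{
  
		if (laundry_cnt == 0)
			return (false);		/* nothing to launder */
		/*
		 * The larger the laundry queue is relative to the inactive
		 * queue, the fewer page daemon wakeups are required before
		 * a background laundering begins.  Cross-multiply to avoid
		 * dividing by a small laundry count.
		 */
		return (wakeups_since_last * laundry_cnt >= inactive_cnt);
	}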
  
  In contrast, shortfall laundering is requested when an inactive queue scan
  fails to meet its target.  In this case, the laundry thread attempts to
  launder enough pages to meet v_free_target within 0.5s, which is the
  inactive queue scan period.
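  
  Given the rates defined in vm_pageout.c below (VM_LAUNDER_RATE is 10
  laundering cycles per second, VM_INACT_SCAN_RATE is 2 scans per second),
  the 0.5s scan period spans five laundering cycles.  A sketch of how a
  shortfall could be split evenly across those cycles; the rounding and the
  helper name are illustrative, not the committed code:
  
	#define	VM_LAUNDER_RATE		10
	#define	VM_INACT_SCAN_RATE	2
  
	/*
	 * Pages to launder in one cycle so that "shortfall" pages are
	 * written out within a single inactive queue scan period.
	 */
	static int
	shortfall_launder_quantum(int shortfall)
	{
		int cycles;
  
		cycles = VM_LAUNDER_RATE / VM_INACT_SCAN_RATE;	/* 5 */
		return ((shortfall + cycles - 1) / cycles);	/* round up */
	}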
  
  A laundry request can be latched while another is currently being
  serviced.  In particular, a shortfall request will immediately preempt a
  background laundering.
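  
  The latch is the vm_laundry_request enum added to vm_pageout.c in this
  commit.  A standalone sketch of the priority rule; the helper name is
  hypothetical, and in the kernel the store would be followed by waking the
  laundry thread:
  
	#include <stdbool.h>
  
	static enum {
		VM_LAUNDRY_IDLE,
		VM_LAUNDRY_BACKGROUND,
		VM_LAUNDRY_SHORTFALL
	} vm_laundry_request = VM_LAUNDRY_IDLE;
  
	/*
	 * Latch a laundering request.  A shortfall request overwrites a
	 * pending background request, but a background request never
	 * downgrades a pending shortfall.
	 */
	static void
	laundry_request(bool shortfall)
	{
  
		if (shortfall)
			vm_laundry_request = VM_LAUNDRY_SHORTFALL;
		else if (vm_laundry_request == VM_LAUNDRY_IDLE)
			vm_laundry_request = VM_LAUNDRY_BACKGROUND;
	}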
  
  This change also redefines the meaning of vm_cnt.v_reactivated and removes
  the functions vm_page_cache() and vm_page_try_to_cache().  The new meaning
  of vm_cnt.v_reactivated better reflects its name: it is the number of
  inactive or laundry pages that are returned to the active queue on account
  of a reference.
  
  In collaboration with:	markj
  Reviewed by:	kib
  Tested by:	pho
  Sponsored by:	Dell EMC Isilon
  Differential Revision:	https://reviews.freebsd.org/D8302

Modified:
  head/sys/sys/vmmeter.h
  head/sys/vm/swap_pager.c
  head/sys/vm/vm_fault.c
  head/sys/vm/vm_meter.c
  head/sys/vm/vm_object.c
  head/sys/vm/vm_page.c
  head/sys/vm/vm_page.h
  head/sys/vm/vm_pageout.c

Modified: head/sys/sys/vmmeter.h
==============================================================================
--- head/sys/sys/vmmeter.h	Wed Nov  9 18:42:30 2016	(r308473)
+++ head/sys/sys/vmmeter.h	Wed Nov  9 18:48:37 2016	(r308474)
@@ -75,9 +75,10 @@ struct vmmeter {
 	u_int v_vnodepgsin;	/* (p) vnode_pager pages paged in */
 	u_int v_vnodepgsout;	/* (p) vnode pager pages paged out */
 	u_int v_intrans;	/* (p) intransit blocking page faults */
-	u_int v_reactivated;	/* (f) pages reactivated from free list */
+	u_int v_reactivated;	/* (p) pages reactivated by the pagedaemon */
 	u_int v_pdwakeups;	/* (p) times daemon has awaken from sleep */
 	u_int v_pdpages;	/* (p) pages analyzed by daemon */
+	u_int v_pdshortfalls;	/* (p) page reclamation shortfalls */
 
 	u_int v_tcached;	/* (p) total pages cached */
 	u_int v_dfree;		/* (p) pages freed by daemon */
@@ -96,6 +97,7 @@ struct vmmeter {
 	u_int v_active_count;	/* (q) pages active */
 	u_int v_inactive_target; /* (c) pages desired inactive */
 	u_int v_inactive_count;	/* (q) pages inactive */
+	u_int v_laundry_count;	/* (q) pages eligible for laundering */
 	u_int v_cache_count;	/* (f) pages on cache queue */
 	u_int v_pageout_free_min;   /* (c) min pages reserved for kernel */
 	u_int v_interrupt_free_min; /* (c) reserved pages for int code */
@@ -111,7 +113,6 @@ struct vmmeter {
 	u_int v_vforkpages;	/* (p) VM pages affected by vfork() */
 	u_int v_rforkpages;	/* (p) VM pages affected by rfork() */
 	u_int v_kthreadpages;	/* (p) VM pages affected by fork() by kernel */
-	u_int v_spare[2];
 };
 #ifdef _KERNEL
 
@@ -184,6 +185,25 @@ vm_paging_needed(void)
 	    vm_pageout_wakeup_thresh);
 }
 
+/*
+ * Return the number of pages we need to launder.
+ * A positive number indicates that we have a shortfall of clean pages.
+ */
+static inline int
+vm_laundry_target(void)
+{
+
+	return (vm_paging_target());
+}
+
+/*
+ * Obtain the value of a per-CPU counter.
+ */
+#define	VM_METER_PCPU_CNT(member)					\
+	vm_meter_cnt(__offsetof(struct vmmeter, member))
+
+u_int	vm_meter_cnt(size_t);
+
 #endif
 
 /* systemwide totals computed every five seconds */

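The VM_METER_PCPU_CNT() macro added above pairs with vm_meter_cnt() from
the vm_meter.c change further down in this diff: it sums the named global
vmmeter field with each CPU's private copy.  A hedged usage sketch; the
wrapper function is hypothetical, while v_pdwakeups is an existing vmmeter
field:

	/* Accurate-in-aggregate snapshot of page daemon wakeups. */
	static u_int
	pdwakeups_snapshot(void)
	{

		return (VM_METER_PCPU_CNT(v_pdwakeups));
	}
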
Modified: head/sys/vm/swap_pager.c
==============================================================================
--- head/sys/vm/swap_pager.c	Wed Nov  9 18:42:30 2016	(r308473)
+++ head/sys/vm/swap_pager.c	Wed Nov  9 18:48:37 2016	(r308474)
@@ -1549,17 +1549,18 @@ swp_pager_async_iodone(struct buf *bp)
 			 * For write success, clear the dirty
 			 * status, then finish the I/O ( which decrements the
 			 * busy count and possibly wakes waiter's up ).
+			 * A page is only written to swap after a period of
+			 * inactivity.  Therefore, we do not expect it to be
+			 * reused.
 			 */
 			KASSERT(!pmap_page_is_write_mapped(m),
 			    ("swp_pager_async_iodone: page %p is not write"
 			    " protected", m));
 			vm_page_undirty(m);
+			vm_page_lock(m);
+			vm_page_deactivate_noreuse(m);
+			vm_page_unlock(m);
 			vm_page_sunbusy(m);
-			if (vm_page_count_severe()) {
-				vm_page_lock(m);
-				vm_page_try_to_cache(m);
-				vm_page_unlock(m);
-			}
 		}
 	}
 
@@ -1635,12 +1636,15 @@ swap_pager_isswapped(vm_object_t object,
 /*
  * SWP_PAGER_FORCE_PAGEIN() - force a swap block to be paged in
  *
- *	This routine dissociates the page at the given index within a
- *	swap block from its backing store, paging it in if necessary.
- *	If the page is paged in, it is placed in the inactive queue,
- *	since it had its backing store ripped out from under it.
- *	We also attempt to swap in all other pages in the swap block,
- *	we only guarantee that the one at the specified index is
+ *	This routine dissociates the page at the given index within an object
+ *	from its backing store, paging it in if it does not reside in memory.
+ *	If the page is paged in, it is marked dirty and placed in the laundry
+ *	queue.  The page is marked dirty because it no longer has backing
+ *	store.  It is placed in the laundry queue because it has not been
+ *	accessed recently.  Otherwise, it would already reside in memory.
+ *
+ *	We also attempt to swap in all other pages in the swap block.
+ *	However, we only guarantee that the one at the specified index is
  *	paged in.
  *
  *	XXX - The code to page the whole block in doesn't work, so we
@@ -1669,7 +1673,7 @@ swp_pager_force_pagein(vm_object_t objec
 	vm_object_pip_wakeup(object);
 	vm_page_dirty(m);
 	vm_page_lock(m);
-	vm_page_deactivate(m);
+	vm_page_launder(m);
 	vm_page_unlock(m);
 	vm_page_xunbusy(m);
 	vm_pager_page_unswapped(m);

Modified: head/sys/vm/vm_fault.c
==============================================================================
--- head/sys/vm/vm_fault.c	Wed Nov  9 18:42:30 2016	(r308473)
+++ head/sys/vm/vm_fault.c	Wed Nov  9 18:48:37 2016	(r308474)
@@ -290,12 +290,13 @@ int
 vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
     int fault_flags, vm_page_t *m_hold)
 {
-	vm_prot_t prot;
-	vm_object_t next_object;
 	struct faultstate fs;
 	struct vnode *vp;
+	vm_object_t next_object, retry_object;
 	vm_offset_t e_end, e_start;
 	vm_page_t m;
+	vm_pindex_t retry_pindex;
+	vm_prot_t prot, retry_prot;
 	int ahead, alloc_req, behind, cluster_offset, error, era, faultcount;
 	int locked, map_generation, nera, result, rv;
 	u_char behavior;
@@ -946,10 +947,6 @@ readrest:
 	 * lookup.
 	 */
 	if (!fs.lookup_still_valid) {
-		vm_object_t retry_object;
-		vm_pindex_t retry_pindex;
-		vm_prot_t retry_prot;
-
 		if (!vm_map_trylock_read(fs.map)) {
 			release_page(&fs);
 			unlock_and_deallocate(&fs);

Modified: head/sys/vm/vm_meter.c
==============================================================================
--- head/sys/vm/vm_meter.c	Wed Nov  9 18:42:30 2016	(r308473)
+++ head/sys/vm/vm_meter.c	Wed Nov  9 18:48:37 2016	(r308474)
@@ -216,29 +216,37 @@ vmtotal(SYSCTL_HANDLER_ARGS)
 }
 
 /*
- * vcnt() -	accumulate statistics from all cpus and the global cnt
- *		structure.
+ * vm_meter_cnt() -	accumulate statistics from all cpus and the global cnt
+ *			structure.
  *
  *	The vmmeter structure is now per-cpu as well as global.  Those
  *	statistics which can be kept on a per-cpu basis (to avoid cache
  *	stalls between cpus) can be moved to the per-cpu vmmeter.  Remaining
  *	statistics, such as v_free_reserved, are left in the global
  *	structure.
- *
- * (sysctl_oid *oidp, void *arg1, int arg2, struct sysctl_req *req)
  */
-static int
-vcnt(SYSCTL_HANDLER_ARGS)
+u_int
+vm_meter_cnt(size_t offset)
 {
-	int count = *(int *)arg1;
-	int offset = (char *)arg1 - (char *)&vm_cnt;
+	struct pcpu *pcpu;
+	u_int count;
 	int i;
 
+	count = *(u_int *)((char *)&vm_cnt + offset);
 	CPU_FOREACH(i) {
-		struct pcpu *pcpu = pcpu_find(i);
-		count += *(int *)((char *)&pcpu->pc_cnt + offset);
+		pcpu = pcpu_find(i);
+		count += *(u_int *)((char *)&pcpu->pc_cnt + offset);
 	}
-	return (SYSCTL_OUT(req, &count, sizeof(int)));
+	return (count);
+}
+
+static int
+cnt_sysctl(SYSCTL_HANDLER_ARGS)
+{
+	u_int count;
+
+	count = vm_meter_cnt((char *)arg1 - (char *)&vm_cnt);
+	return (SYSCTL_OUT(req, &count, sizeof(count)));
 }
 
 SYSCTL_PROC(_vm, VM_TOTAL, vmtotal, CTLTYPE_OPAQUE|CTLFLAG_RD|CTLFLAG_MPSAFE,
@@ -253,8 +261,8 @@ SYSCTL_NODE(_vm_stats, OID_AUTO, misc, C
 
 #define	VM_STATS(parent, var, descr) \
 	SYSCTL_PROC(parent, OID_AUTO, var, \
-	    CTLTYPE_UINT | CTLFLAG_RD | CTLFLAG_MPSAFE, &vm_cnt.var, 0, vcnt, \
-	    "IU", descr)
+	    CTLTYPE_UINT | CTLFLAG_RD | CTLFLAG_MPSAFE, &vm_cnt.var, 0,	\
+	    cnt_sysctl, "IU", descr)
 #define	VM_STATS_VM(var, descr)		VM_STATS(_vm_stats_vm, var, descr)
 #define	VM_STATS_SYS(var, descr)	VM_STATS(_vm_stats_sys, var, descr)
 
@@ -278,9 +286,10 @@ VM_STATS_VM(v_vnodeout, "Vnode pager pag
 VM_STATS_VM(v_vnodepgsin, "Vnode pages paged in");
 VM_STATS_VM(v_vnodepgsout, "Vnode pages paged out");
 VM_STATS_VM(v_intrans, "In transit page faults");
-VM_STATS_VM(v_reactivated, "Pages reactivated from free list");
+VM_STATS_VM(v_reactivated, "Pages reactivated by pagedaemon");
 VM_STATS_VM(v_pdwakeups, "Pagedaemon wakeups");
 VM_STATS_VM(v_pdpages, "Pages analyzed by pagedaemon");
+VM_STATS_VM(v_pdshortfalls, "Page reclamation shortfalls");
 VM_STATS_VM(v_tcached, "Total pages cached");
 VM_STATS_VM(v_dfree, "Pages freed by pagedaemon");
 VM_STATS_VM(v_pfree, "Pages freed by exiting processes");
@@ -295,6 +304,7 @@ VM_STATS_VM(v_wire_count, "Wired pages")
 VM_STATS_VM(v_active_count, "Active pages");
 VM_STATS_VM(v_inactive_target, "Desired inactive pages");
 VM_STATS_VM(v_inactive_count, "Inactive pages");
+VM_STATS_VM(v_laundry_count, "Pages eligible for laundering");
 VM_STATS_VM(v_cache_count, "Pages on cache queue");
 VM_STATS_VM(v_pageout_free_min, "Min pages reserved for kernel");
 VM_STATS_VM(v_interrupt_free_min, "Reserved pages for interrupt code");

Modified: head/sys/vm/vm_object.c
==============================================================================
--- head/sys/vm/vm_object.c	Wed Nov  9 18:42:30 2016	(r308473)
+++ head/sys/vm/vm_object.c	Wed Nov  9 18:48:37 2016	(r308474)
@@ -2329,9 +2329,9 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARG
 			 * sysctl is only meant to give an
 			 * approximation of the system anyway.
 			 */
-			if (m->queue == PQ_ACTIVE)
+			if (vm_page_active(m))
 				kvo.kvo_active++;
-			else if (m->queue == PQ_INACTIVE)
+			else if (vm_page_inactive(m))
 				kvo.kvo_inactive++;
 		}
 

Modified: head/sys/vm/vm_page.c
==============================================================================
--- head/sys/vm/vm_page.c	Wed Nov  9 18:42:30 2016	(r308473)
+++ head/sys/vm/vm_page.c	Wed Nov  9 18:48:37 2016	(r308474)
@@ -390,6 +390,10 @@ vm_page_domain_init(struct vm_domain *vm
 	    "vm active pagequeue";
 	*__DECONST(u_int **, &vmd->vmd_pagequeues[PQ_ACTIVE].pq_vcnt) =
 	    &vm_cnt.v_active_count;
+	*__DECONST(char **, &vmd->vmd_pagequeues[PQ_LAUNDRY].pq_name) =
+	    "vm laundry pagequeue";
+	*__DECONST(int **, &vmd->vmd_pagequeues[PQ_LAUNDRY].pq_vcnt) =
+	    &vm_cnt.v_laundry_count;
 	vmd->vmd_page_count = 0;
 	vmd->vmd_free_count = 0;
 	vmd->vmd_segs = 0;
@@ -1730,9 +1734,7 @@ vm_page_alloc(vm_object_t object, vm_pin
 		    ("vm_page_alloc: cached page %p is PG_ZERO", m));
 		KASSERT(m->valid != 0,
 		    ("vm_page_alloc: cached page %p is invalid", m));
-		if (m->object == object && m->pindex == pindex)
-			vm_cnt.v_reactivated++;
-		else
+		if (m->object != object || m->pindex != pindex)
 			m->valid = 0;
 		m_object = m->object;
 		vm_page_cache_remove(m);
@@ -2254,7 +2256,7 @@ retry:
 			}
 			KASSERT((m->flags & PG_UNHOLDFREE) == 0,
 			    ("page %p is PG_UNHOLDFREE", m));
-			/* Don't care: PG_NODUMP, PG_WINATCFLS, PG_ZERO. */
+			/* Don't care: PG_NODUMP, PG_ZERO. */
 			if (object->type != OBJT_DEFAULT &&
 			    object->type != OBJT_SWAP &&
 			    object->type != OBJT_VNODE)
@@ -2450,7 +2452,7 @@ retry:
 			}
 			KASSERT((m->flags & PG_UNHOLDFREE) == 0,
 			    ("page %p is PG_UNHOLDFREE", m));
-			/* Don't care: PG_NODUMP, PG_WINATCFLS, PG_ZERO. */
+			/* Don't care: PG_NODUMP, PG_ZERO. */
 			if (object->type != OBJT_DEFAULT &&
 			    object->type != OBJT_SWAP &&
 			    object->type != OBJT_VNODE)
@@ -2778,7 +2780,10 @@ struct vm_pagequeue *
 vm_page_pagequeue(vm_page_t m)
 {
 
-	return (&vm_phys_domain(m)->vmd_pagequeues[m->queue]);
+	if (vm_page_in_laundry(m))
+		return (&vm_dom[0].vmd_pagequeues[m->queue]);
+	else
+		return (&vm_phys_domain(m)->vmd_pagequeues[m->queue]);
 }
 
 /*
@@ -2840,7 +2845,10 @@ vm_page_enqueue(uint8_t queue, vm_page_t
 	KASSERT(queue < PQ_COUNT,
 	    ("vm_page_enqueue: invalid queue %u request for page %p",
 	    queue, m));
-	pq = &vm_phys_domain(m)->vmd_pagequeues[queue];
+	if (queue == PQ_LAUNDRY)
+		pq = &vm_dom[0].vmd_pagequeues[queue];
+	else
+		pq = &vm_phys_domain(m)->vmd_pagequeues[queue];
 	vm_pagequeue_lock(pq);
 	m->queue = queue;
 	TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q);
@@ -3124,11 +3132,8 @@ vm_page_unwire(vm_page_t m, uint8_t queu
 		if (m->wire_count == 0) {
 			atomic_subtract_int(&vm_cnt.v_wire_count, 1);
 			if ((m->oflags & VPO_UNMANAGED) == 0 &&
-			    m->object != NULL && queue != PQ_NONE) {
-				if (queue == PQ_INACTIVE)
-					m->flags &= ~PG_WINATCFLS;
+			    m->object != NULL && queue != PQ_NONE)
 				vm_page_enqueue(queue, m);
-			}
 			return (TRUE);
 		} else
 			return (FALSE);
@@ -3181,7 +3186,6 @@ _vm_page_deactivate(vm_page_t m, boolean
 		} else {
 			if (queue != PQ_NONE)
 				vm_page_dequeue(m);
-			m->flags &= ~PG_WINATCFLS;
 			vm_pagequeue_lock(pq);
 		}
 		m->queue = PQ_INACTIVE;
@@ -3221,24 +3225,25 @@ vm_page_deactivate_noreuse(vm_page_t m)
 }
 
 /*
- * vm_page_try_to_cache:
+ * vm_page_launder
  *
- * Returns 0 on failure, 1 on success
+ * 	Put a page in the laundry.
  */
-int
-vm_page_try_to_cache(vm_page_t m)
+void
+vm_page_launder(vm_page_t m)
 {
+	int queue;
 
-	vm_page_lock_assert(m, MA_OWNED);
-	VM_OBJECT_ASSERT_WLOCKED(m->object);
-	if (m->dirty || m->hold_count || m->wire_count ||
-	    (m->oflags & VPO_UNMANAGED) != 0 || vm_page_busied(m))
-		return (0);
-	pmap_remove_all(m);
-	if (m->dirty)
-		return (0);
-	vm_page_cache(m);
-	return (1);
+	vm_page_assert_locked(m);
+	if ((queue = m->queue) != PQ_LAUNDRY) {
+		if (m->wire_count == 0 && (m->oflags & VPO_UNMANAGED) == 0) {
+			if (queue != PQ_NONE)
+				vm_page_dequeue(m);
+			vm_page_enqueue(PQ_LAUNDRY, m);
+		} else
+			KASSERT(queue == PQ_NONE,
+			    ("wired page %p is queued", m));
+	}
 }
 
 /*
@@ -3265,112 +3270,6 @@ vm_page_try_to_free(vm_page_t m)
 }
 
 /*
- * vm_page_cache
- *
- * Put the specified page onto the page cache queue (if appropriate).
- *
- * The object and page must be locked.
- */
-void
-vm_page_cache(vm_page_t m)
-{
-	vm_object_t object;
-	boolean_t cache_was_empty;
-
-	vm_page_lock_assert(m, MA_OWNED);
-	object = m->object;
-	VM_OBJECT_ASSERT_WLOCKED(object);
-	if (vm_page_busied(m) || (m->oflags & VPO_UNMANAGED) ||
-	    m->hold_count || m->wire_count)
-		panic("vm_page_cache: attempting to cache busy page");
-	KASSERT(!pmap_page_is_mapped(m),
-	    ("vm_page_cache: page %p is mapped", m));
-	KASSERT(m->dirty == 0, ("vm_page_cache: page %p is dirty", m));
-	if (m->valid == 0 || object->type == OBJT_DEFAULT ||
-	    (object->type == OBJT_SWAP &&
-	    !vm_pager_has_page(object, m->pindex, NULL, NULL))) {
-		/*
-		 * Hypothesis: A cache-eligible page belonging to a
-		 * default object or swap object but without a backing
-		 * store must be zero filled.
-		 */
-		vm_page_free(m);
-		return;
-	}
-	KASSERT((m->flags & PG_CACHED) == 0,
-	    ("vm_page_cache: page %p is already cached", m));
-
-	/*
-	 * Remove the page from the paging queues.
-	 */
-	vm_page_remque(m);
-
-	/*
-	 * Remove the page from the object's collection of resident
-	 * pages.
-	 */
-	vm_radix_remove(&object->rtree, m->pindex);
-	TAILQ_REMOVE(&object->memq, m, listq);
-	object->resident_page_count--;
-
-	/*
-	 * Restore the default memory attribute to the page.
-	 */
-	if (pmap_page_get_memattr(m) != VM_MEMATTR_DEFAULT)
-		pmap_page_set_memattr(m, VM_MEMATTR_DEFAULT);
-
-	/*
-	 * Insert the page into the object's collection of cached pages
-	 * and the physical memory allocator's cache/free page queues.
-	 */
-	m->flags &= ~PG_ZERO;
-	mtx_lock(&vm_page_queue_free_mtx);
-	cache_was_empty = vm_radix_is_empty(&object->cache);
-	if (vm_radix_insert(&object->cache, m)) {
-		mtx_unlock(&vm_page_queue_free_mtx);
-		if (object->type == OBJT_VNODE &&
-		    object->resident_page_count == 0)
-			vdrop(object->handle);
-		m->object = NULL;
-		vm_page_free(m);
-		return;
-	}
-
-	/*
-	 * The above call to vm_radix_insert() could reclaim the one pre-
-	 * existing cached page from this object, resulting in a call to
-	 * vdrop().
-	 */
-	if (!cache_was_empty)
-		cache_was_empty = vm_radix_is_singleton(&object->cache);
-
-	m->flags |= PG_CACHED;
-	vm_cnt.v_cache_count++;
-	PCPU_INC(cnt.v_tcached);
-#if VM_NRESERVLEVEL > 0
-	if (!vm_reserv_free_page(m)) {
-#else
-	if (TRUE) {
-#endif
-		vm_phys_free_pages(m, 0);
-	}
-	vm_page_free_wakeup();
-	mtx_unlock(&vm_page_queue_free_mtx);
-
-	/*
-	 * Increment the vnode's hold count if this is the object's only
-	 * cached page.  Decrement the vnode's hold count if this was
-	 * the object's only resident page.
-	 */
-	if (object->type == OBJT_VNODE) {
-		if (cache_was_empty && object->resident_page_count != 0)
-			vhold(object->handle);
-		else if (!cache_was_empty && object->resident_page_count == 0)
-			vdrop(object->handle);
-	}
-}
-
-/*
  * vm_page_advise
  *
  * 	Deactivate or do nothing, as appropriate.
@@ -3413,11 +3312,13 @@ vm_page_advise(vm_page_t m, int advice)
 	/*
 	 * Place clean pages near the head of the inactive queue rather than
 	 * the tail, thus defeating the queue's LRU operation and ensuring that
-	 * the page will be reused quickly.  Dirty pages are given a chance to
-	 * cycle once through the inactive queue before becoming eligible for
-	 * laundering.
+	 * the page will be reused quickly.  Dirty pages not already in the
+	 * laundry are moved there.
 	 */
-	_vm_page_deactivate(m, m->dirty == 0);
+	if (m->dirty == 0)
+		vm_page_deactivate_noreuse(m);
+	else
+		vm_page_launder(m);
 }
 
 /*
@@ -3926,6 +3827,7 @@ DB_SHOW_COMMAND(page, vm_page_print_page
 	db_printf("vm_cnt.v_cache_count: %d\n", vm_cnt.v_cache_count);
 	db_printf("vm_cnt.v_inactive_count: %d\n", vm_cnt.v_inactive_count);
 	db_printf("vm_cnt.v_active_count: %d\n", vm_cnt.v_active_count);
+	db_printf("vm_cnt.v_laundry_count: %d\n", vm_cnt.v_laundry_count);
 	db_printf("vm_cnt.v_wire_count: %d\n", vm_cnt.v_wire_count);
 	db_printf("vm_cnt.v_free_reserved: %d\n", vm_cnt.v_free_reserved);
 	db_printf("vm_cnt.v_free_min: %d\n", vm_cnt.v_free_min);
@@ -3940,12 +3842,14 @@ DB_SHOW_COMMAND(pageq, vm_page_print_pag
 	db_printf("pq_free %d pq_cache %d\n",
 	    vm_cnt.v_free_count, vm_cnt.v_cache_count);
 	for (dom = 0; dom < vm_ndomains; dom++) {
-		db_printf("dom %d page_cnt %d free %d pq_act %d pq_inact %d\n",
+		db_printf(
+	    "dom %d page_cnt %d free %d pq_act %d pq_inact %d pq_laund %d\n",
 		    dom,
 		    vm_dom[dom].vmd_page_count,
 		    vm_dom[dom].vmd_free_count,
 		    vm_dom[dom].vmd_pagequeues[PQ_ACTIVE].pq_cnt,
-		    vm_dom[dom].vmd_pagequeues[PQ_INACTIVE].pq_cnt);
+		    vm_dom[dom].vmd_pagequeues[PQ_INACTIVE].pq_cnt,
+		    vm_dom[dom].vmd_pagequeues[PQ_LAUNDRY].pq_cnt);
 	}
 }
 

Modified: head/sys/vm/vm_page.h
==============================================================================
--- head/sys/vm/vm_page.h	Wed Nov  9 18:42:30 2016	(r308473)
+++ head/sys/vm/vm_page.h	Wed Nov  9 18:48:37 2016	(r308474)
@@ -206,7 +206,8 @@ struct vm_page {
 #define	PQ_NONE		255
 #define	PQ_INACTIVE	0
 #define	PQ_ACTIVE	1
-#define	PQ_COUNT	2
+#define	PQ_LAUNDRY	2
+#define	PQ_COUNT	3
 
 TAILQ_HEAD(pglist, vm_page);
 SLIST_HEAD(spglist, vm_page);
@@ -228,6 +229,7 @@ struct vm_domain {
 	boolean_t vmd_oom;
 	int vmd_oom_seq;
 	int vmd_last_active_scan;
+	struct vm_page vmd_laundry_marker;
 	struct vm_page vmd_marker; /* marker for pagedaemon private use */
 	struct vm_page vmd_inacthead; /* marker for LRU-defeating insertions */
 };
@@ -236,6 +238,7 @@ extern struct vm_domain vm_dom[MAXMEMDOM
 
 #define	vm_pagequeue_assert_locked(pq)	mtx_assert(&(pq)->pq_mutex, MA_OWNED)
 #define	vm_pagequeue_lock(pq)		mtx_lock(&(pq)->pq_mutex)
+#define	vm_pagequeue_lockptr(pq)	(&(pq)->pq_mutex)
 #define	vm_pagequeue_unlock(pq)		mtx_unlock(&(pq)->pq_mutex)
 
 #ifdef _KERNEL
@@ -327,7 +330,6 @@ extern struct mtx_padalign pa_lock[];
 #define	PG_FICTITIOUS	0x0004		/* physical page doesn't exist */
 #define	PG_ZERO		0x0008		/* page is zeroed */
 #define	PG_MARKER	0x0010		/* special queue marker page */
-#define	PG_WINATCFLS	0x0040		/* flush dirty page on inactive q */
 #define	PG_NODUMP	0x0080		/* don't include this page in a dump */
 #define	PG_UNHOLDFREE	0x0100		/* delayed free of a held page */
 
@@ -451,10 +453,8 @@ vm_page_t vm_page_alloc_contig(vm_object
     vm_paddr_t boundary, vm_memattr_t memattr);
 vm_page_t vm_page_alloc_freelist(int, int);
 vm_page_t vm_page_grab (vm_object_t, vm_pindex_t, int);
-void vm_page_cache(vm_page_t);
 void vm_page_cache_free(vm_object_t, vm_pindex_t, vm_pindex_t);
 void vm_page_cache_transfer(vm_object_t, vm_pindex_t, vm_object_t);
-int vm_page_try_to_cache (vm_page_t);
 int vm_page_try_to_free (vm_page_t);
 void vm_page_deactivate (vm_page_t);
 void vm_page_deactivate_noreuse(vm_page_t);
@@ -465,6 +465,7 @@ vm_page_t vm_page_getfake(vm_paddr_t pad
 void vm_page_initfake(vm_page_t m, vm_paddr_t paddr, vm_memattr_t memattr);
 int vm_page_insert (vm_page_t, vm_object_t, vm_pindex_t);
 boolean_t vm_page_is_cached(vm_object_t object, vm_pindex_t pindex);
+void vm_page_launder(vm_page_t m);
 vm_page_t vm_page_lookup (vm_object_t, vm_pindex_t);
 vm_page_t vm_page_next(vm_page_t m);
 int vm_page_pa_tryrelock(pmap_t, vm_paddr_t, vm_paddr_t *);
@@ -697,5 +698,26 @@ vm_page_replace_checked(vm_page_t mnew, 
 	(void)mret;
 }
 
+static inline bool
+vm_page_active(vm_page_t m)
+{
+
+	return (m->queue == PQ_ACTIVE);
+}
+
+static inline bool
+vm_page_inactive(vm_page_t m)
+{
+
+	return (m->queue == PQ_INACTIVE);
+}
+
+static inline bool
+vm_page_in_laundry(vm_page_t m)
+{
+
+	return (m->queue == PQ_LAUNDRY);
+}
+
 #endif				/* _KERNEL */
 #endif				/* !_VM_PAGE_ */

Modified: head/sys/vm/vm_pageout.c
==============================================================================
--- head/sys/vm/vm_pageout.c	Wed Nov  9 18:42:30 2016	(r308473)
+++ head/sys/vm/vm_pageout.c	Wed Nov  9 18:48:37 2016	(r308474)
@@ -119,7 +119,7 @@ __FBSDID("$FreeBSD$");
 /* the kernel process "vm_pageout"*/
 static void vm_pageout(void);
 static void vm_pageout_init(void);
-static int vm_pageout_clean(vm_page_t m);
+static int vm_pageout_clean(vm_page_t m, int *numpagedout);
 static int vm_pageout_cluster(vm_page_t m);
 static bool vm_pageout_scan(struct vm_domain *vmd, int pass);
 static void vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
@@ -154,6 +154,9 @@ static struct kproc_desc vm_kp = {
 SYSINIT(vmdaemon, SI_SUB_KTHREAD_VM, SI_ORDER_FIRST, kproc_start, &vm_kp);
 #endif
 
+/* Pagedaemon activity rates, in subdivisions of one second. */
+#define	VM_LAUNDER_RATE		10
+#define	VM_INACT_SCAN_RATE	2
 
 int vm_pageout_deficit;		/* Estimated number of pages deficit */
 u_int vm_pageout_wakeup_thresh;
@@ -161,6 +164,13 @@ static int vm_pageout_oom_seq = 12;
 bool vm_pageout_wanted;		/* Event on which pageout daemon sleeps */
 bool vm_pages_needed;		/* Are threads waiting for free pages? */
 
+/* Pending request for dirty page laundering. */
+static enum {
+	VM_LAUNDRY_IDLE,
+	VM_LAUNDRY_BACKGROUND,
+	VM_LAUNDRY_SHORTFALL
+} vm_laundry_request = VM_LAUNDRY_IDLE;
+
 #if !defined(NO_SWAPPING)
 static int vm_pageout_req_swapout;	/* XXX */
 static int vm_daemon_needed;
@@ -168,9 +178,7 @@ static struct mtx vm_daemon_mtx;
 /* Allow for use by vm_pageout before vm_daemon is initialized. */
 MTX_SYSINIT(vm_daemon, &vm_daemon_mtx, "vm daemon", MTX_DEF);
 #endif
-static int vm_max_launder = 32;
 static int vm_pageout_update_period;
-static int defer_swap_pageouts;
 static int disable_swap_pageouts;
 static int lowmem_period = 10;
 static time_t lowmem_uptime;
@@ -193,9 +201,6 @@ SYSCTL_INT(_vm, OID_AUTO, pageout_wakeup
 	CTLFLAG_RW, &vm_pageout_wakeup_thresh, 0,
 	"free page threshold for waking up the pageout daemon");
 
-SYSCTL_INT(_vm, OID_AUTO, max_launder,
-	CTLFLAG_RW, &vm_max_launder, 0, "Limit dirty flushes in pageout");
-
 SYSCTL_INT(_vm, OID_AUTO, pageout_update_period,
 	CTLFLAG_RW, &vm_pageout_update_period, 0,
 	"Maximum active LRU update period");
@@ -215,9 +220,6 @@ SYSCTL_INT(_vm, OID_AUTO, swap_idle_enab
 	CTLFLAG_RW, &vm_swap_idle_enabled, 0, "Allow swapout on idle criteria");
 #endif
 
-SYSCTL_INT(_vm, OID_AUTO, defer_swapspace_pageouts,
-	CTLFLAG_RW, &defer_swap_pageouts, 0, "Give preference to dirty pages in mem");
-
 SYSCTL_INT(_vm, OID_AUTO, disable_swapspace_pageouts,
 	CTLFLAG_RW, &disable_swap_pageouts, 0, "Disallow swapout of dirty pages");
 
@@ -229,6 +231,25 @@ SYSCTL_INT(_vm, OID_AUTO, pageout_oom_se
 	CTLFLAG_RW, &vm_pageout_oom_seq, 0,
 	"back-to-back calls to oom detector to start OOM");
 
+static int act_scan_laundry_weight = 3;
+SYSCTL_INT(_vm, OID_AUTO, act_scan_laundry_weight, CTLFLAG_RW,
+    &act_scan_laundry_weight, 0,
+    "weight given to clean vs. dirty pages in active queue scans");
+
+static u_int vm_background_launder_target;
+SYSCTL_UINT(_vm, OID_AUTO, background_launder_target, CTLFLAG_RW,
+    &vm_background_launder_target, 0,
+    "background laundering target, in pages");
+
+static u_int vm_background_launder_rate = 4096;
+SYSCTL_UINT(_vm, OID_AUTO, background_launder_rate, CTLFLAG_RW,
+    &vm_background_launder_rate, 0,
+    "background laundering rate, in kilobytes per second");
+
+static u_int vm_background_launder_max = 20 * 1024;
+SYSCTL_UINT(_vm, OID_AUTO, background_launder_max, CTLFLAG_RW,
+    &vm_background_launder_max, 0, "background laundering cap, in kilobytes");
+
 #define VM_PAGEOUT_PAGE_COUNT 16
 int vm_pageout_page_count = VM_PAGEOUT_PAGE_COUNT;
 
@@ -236,7 +257,11 @@ int vm_page_max_wired;		/* XXX max # of 
 SYSCTL_INT(_vm, OID_AUTO, max_wired,
 	CTLFLAG_RW, &vm_page_max_wired, 0, "System-wide limit to wired page count");
 
+static u_int isqrt(u_int num);
 static boolean_t vm_pageout_fallback_object_lock(vm_page_t, vm_page_t *);
+static int vm_pageout_launder(struct vm_domain *vmd, int launder,
+    bool in_shortfall);
+static void vm_pageout_laundry_worker(void *arg);
 #if !defined(NO_SWAPPING)
 static void vm_pageout_map_deactivate_pages(vm_map_t, long);
 static void vm_pageout_object_deactivate_pages(pmap_t, vm_object_t, long);
@@ -387,7 +412,7 @@ vm_pageout_cluster(vm_page_t m)
 
 	/*
 	 * We can cluster only if the page is not clean, busy, or held, and
-	 * the page is inactive.
+	 * the page is in the laundry queue.
 	 *
 	 * During heavy mmap/modification loads the pageout
 	 * daemon can really fragment the underlying file
@@ -413,7 +438,7 @@ more:
 			break;
 		}
 		vm_page_lock(p);
-		if (p->queue != PQ_INACTIVE ||
+		if (!vm_page_in_laundry(p) ||
 		    p->hold_count != 0) {	/* may be undergoing I/O */
 			vm_page_unlock(p);
 			ib = 0;
@@ -439,7 +464,7 @@ more:
 		if (p->dirty == 0)
 			break;
 		vm_page_lock(p);
-		if (p->queue != PQ_INACTIVE ||
+		if (!vm_page_in_laundry(p) ||
 		    p->hold_count != 0) {	/* may be undergoing I/O */
 			vm_page_unlock(p);
 			break;
@@ -519,23 +544,33 @@ vm_pageout_flush(vm_page_t *mc, int coun
 		    ("vm_pageout_flush: page %p is not write protected", mt));
 		switch (pageout_status[i]) {
 		case VM_PAGER_OK:
+			vm_page_lock(mt);
+			if (vm_page_in_laundry(mt))
+				vm_page_deactivate_noreuse(mt);
+			vm_page_unlock(mt);
+			/* FALLTHROUGH */
 		case VM_PAGER_PEND:
 			numpagedout++;
 			break;
 		case VM_PAGER_BAD:
 			/*
-			 * Page outside of range of object. Right now we
-			 * essentially lose the changes by pretending it
-			 * worked.
+			 * The page is outside the object's range.  We pretend
+			 * that the page out worked and clean the page, so the
+			 * changes will be lost if the page is reclaimed by
+			 * the page daemon.
 			 */
 			vm_page_undirty(mt);
+			vm_page_lock(mt);
+			if (vm_page_in_laundry(mt))
+				vm_page_deactivate_noreuse(mt);
+			vm_page_unlock(mt);
 			break;
 		case VM_PAGER_ERROR:
 		case VM_PAGER_FAIL:
 			/*
-			 * If page couldn't be paged out, then reactivate the
-			 * page so it doesn't clog the inactive list.  (We
-			 * will try paging out it again later).
+			 * If the page couldn't be paged out, then reactivate
+			 * it so that it doesn't clog the laundry and inactive
+			 * queues.  (We will try paging it out again later).
 			 */
 			vm_page_lock(mt);
 			vm_page_activate(mt);
@@ -617,10 +652,10 @@ vm_pageout_object_deactivate_pages(pmap_
 					act_delta = 1;
 				vm_page_aflag_clear(p, PGA_REFERENCED);
 			}
-			if (p->queue != PQ_ACTIVE && act_delta != 0) {
+			if (!vm_page_active(p) && act_delta != 0) {
 				vm_page_activate(p);
 				p->act_count += act_delta;
-			} else if (p->queue == PQ_ACTIVE) {
+			} else if (vm_page_active(p)) {
 				if (act_delta == 0) {
 					p->act_count -= min(p->act_count,
 					    ACT_DECLINE);
@@ -636,7 +671,7 @@ vm_pageout_object_deactivate_pages(pmap_
 						p->act_count += ACT_ADVANCE;
 					vm_page_requeue(p);
 				}
-			} else if (p->queue == PQ_INACTIVE)
+			} else if (vm_page_inactive(p))
 				pmap_remove_all(p);
 			vm_page_unlock(p);
 		}
@@ -739,7 +774,7 @@ vm_pageout_map_deactivate_pages(map, des
  * Returns 0 on success and an errno otherwise.
  */
 static int
-vm_pageout_clean(vm_page_t m)
+vm_pageout_clean(vm_page_t m, int *numpagedout)
 {
 	struct vnode *vp;
 	struct mount *mp;
@@ -797,7 +832,7 @@ vm_pageout_clean(vm_page_t m)
 		 * (3) reallocated to a different offset, or
 		 * (4) cleaned.
 		 */
-		if (m->queue != PQ_INACTIVE || m->object != object ||
+		if (!vm_page_in_laundry(m) || m->object != object ||
 		    m->pindex != pindex || m->dirty == 0) {
 			vm_page_unlock(m);
 			error = ENXIO;
@@ -821,7 +856,7 @@ vm_pageout_clean(vm_page_t m)
 	 * laundry.  If it is still in the laundry, then we
 	 * start the cleaning operation. 
 	 */
-	if (vm_pageout_cluster(m) == 0)
+	if ((*numpagedout = vm_pageout_cluster(m)) == 0)
 		error = EIO;
 
 unlock_all:
@@ -840,11 +875,390 @@ unlock_mp:
 }
 
 /*
+ * Attempt to launder the specified number of pages.
+ *
+ * Returns the number of pages successfully laundered.
+ */
+static int
+vm_pageout_launder(struct vm_domain *vmd, int launder, bool in_shortfall)
+{
+	struct vm_pagequeue *pq;
+	vm_object_t object;
+	vm_page_t m, next;
+	int act_delta, error, maxscan, numpagedout, starting_target;
+	int vnodes_skipped;
+	bool pageout_ok, queue_locked;
+
+	starting_target = launder;
+	vnodes_skipped = 0;
+
+	/*
+	 * Scan the laundry queue for pages eligible to be laundered.  We stop
+	 * once the target number of dirty pages have been laundered, or once
+	 * we've reached the end of the queue.  A single iteration of this loop
+	 * may cause more than one page to be laundered because of clustering.
+	 *
+	 * maxscan ensures that we don't re-examine requeued pages.  Any
+	 * additional pages written as part of a cluster are subtracted from
+	 * maxscan since they must be taken from the laundry queue.
+	 */
+	pq = &vmd->vmd_pagequeues[PQ_LAUNDRY];
+	maxscan = pq->pq_cnt;
+
+	vm_pagequeue_lock(pq);
+	queue_locked = true;
+	for (m = TAILQ_FIRST(&pq->pq_pl);
+	    m != NULL && maxscan-- > 0 && launder > 0;
+	    m = next) {
+		vm_pagequeue_assert_locked(pq);
+		KASSERT(queue_locked, ("unlocked laundry queue"));
+		KASSERT(vm_page_in_laundry(m),
+		    ("page %p has an inconsistent queue", m));
+		next = TAILQ_NEXT(m, plinks.q);
+		if ((m->flags & PG_MARKER) != 0)
+			continue;
+		KASSERT((m->flags & PG_FICTITIOUS) == 0,
+		    ("PG_FICTITIOUS page %p cannot be in laundry queue", m));
+		KASSERT((m->oflags & VPO_UNMANAGED) == 0,
+		    ("VPO_UNMANAGED page %p cannot be in laundry queue", m));
+		if (!vm_pageout_page_lock(m, &next) || m->hold_count != 0) {
+			vm_page_unlock(m);
+			continue;
+		}
+		object = m->object;
+		if ((!VM_OBJECT_TRYWLOCK(object) &&
+		    (!vm_pageout_fallback_object_lock(m, &next) ||
+		    m->hold_count != 0)) || vm_page_busied(m)) {
+			VM_OBJECT_WUNLOCK(object);
+			vm_page_unlock(m);
+			continue;
+		}
+
+		/*
+		 * Unlock the laundry queue, invalidating the 'next' pointer.
+		 * Use a marker to remember our place in the laundry queue.
+		 */
+		TAILQ_INSERT_AFTER(&pq->pq_pl, m, &vmd->vmd_laundry_marker,
+		    plinks.q);
+		vm_pagequeue_unlock(pq);
+		queue_locked = false;
+
+		/*
+		 * Invalid pages can be easily freed.  They cannot be
+		 * mapped; vm_page_free() asserts this.
+		 */
+		if (m->valid == 0)
+			goto free_page;
+
+		/*
+		 * If the page has been referenced and the object is not dead,
+		 * reactivate or requeue the page depending on whether the
+		 * object is mapped.
+		 */
+		if ((m->aflags & PGA_REFERENCED) != 0) {
+			vm_page_aflag_clear(m, PGA_REFERENCED);
+			act_delta = 1;
+		} else
+			act_delta = 0;
+		if (object->ref_count != 0)
+			act_delta += pmap_ts_referenced(m);
+		else {
+			KASSERT(!pmap_page_is_mapped(m),
+			    ("page %p is mapped", m));
+		}
+		if (act_delta != 0) {
+			if (object->ref_count != 0) {
+				PCPU_INC(cnt.v_reactivated);
+				vm_page_activate(m);
+
+				/*
+				 * Increase the activation count if the page
+				 * was referenced while in the laundry queue.
+				 * This makes it less likely that the page will
+				 * be returned prematurely to the inactive
+				 * queue.
+ 				 */
+				m->act_count += act_delta + ACT_ADVANCE;
+
+				/*
+				 * If this was a background laundering, count
+				 * activated pages towards our target.  The
+				 * purpose of background laundering is to ensure
+				 * that pages are eventually cycled through the
+				 * laundry queue, and an activation is a valid
+				 * way out.
+				 */
+				if (!in_shortfall)
+					launder--;
+				goto drop_page;
+			} else if ((object->flags & OBJ_DEAD) == 0)
+				goto requeue_page;
+		}
+
+		/*
+		 * If the page appears to be clean at the machine-independent
+		 * layer, then remove all of its mappings from the pmap in
+		 * anticipation of freeing it.  If, however, any of the page's
+		 * mappings allow write access, then the page may still be
+		 * modified until the last of those mappings are removed.
+		 */
+		if (object->ref_count != 0) {
+			vm_page_test_dirty(m);
+			if (m->dirty == 0)
+				pmap_remove_all(m);
+		}
+
+		/*
+		 * Clean pages are freed, and dirty pages are paged out unless
+		 * they belong to a dead object.  Requeueing dirty pages from
+		 * dead objects is pointless, as they are being paged out and
+		 * freed by the thread that destroyed the object.
+		 */
+		if (m->dirty == 0) {
+free_page:
+			vm_page_free(m);
+			PCPU_INC(cnt.v_dfree);
+		} else if ((object->flags & OBJ_DEAD) == 0) {
+			if (object->type != OBJT_SWAP &&
+			    object->type != OBJT_DEFAULT)
+				pageout_ok = true;
+			else if (disable_swap_pageouts)
+				pageout_ok = false;
+			else
+				pageout_ok = true;
+			if (!pageout_ok) {
+requeue_page:
+				vm_pagequeue_lock(pq);
+				queue_locked = true;
+				vm_page_requeue_locked(m);
+				goto drop_page;
+			}
+
+			/*
+			 * Form a cluster with adjacent, dirty pages from the
+			 * same object, and page out that entire cluster.
+			 *
+			 * The adjacent, dirty pages must also be in the
+			 * laundry.  However, their mappings are not checked
+			 * for new references.  Consequently, a recently
+			 * referenced page may be paged out.  However, that
+			 * page will not be prematurely reclaimed.  After page
+			 * out, the page will be placed in the inactive queue,
+			 * where any new references will be detected and the
+			 * page reactivated.
+			 */
+			error = vm_pageout_clean(m, &numpagedout);
+			if (error == 0) {

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***


