From: Konstantin Belousov
Date: Mon, 16 Nov 2015 06:02:12 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r290917 - head/sys/vm

Author: kib
Date: Mon Nov 16 06:02:11 2015
New Revision: 290917
URL: https://svnweb.freebsd.org/changeset/base/290917

Log:
  Do not use vmspace_resident_count() for the OOM process selection.

  The residency count tracks the number of PTEs installed into the
  current pmap, which does not reflect the consumption of physical
  memory by the address map.  Due to several mechanisms, such as pv
  entry reclamation and copy-on-write, the resident PTE count may be
  much less than the amount of physical memory kept by the process.

  Provide the OOM-specific vm_pageout_oom_pagecount() function, which
  estimates the amount of reclaimable memory that could be stolen if
  the process is killed.

  Reported and tested by: pho
  Reviewed by: alc
  Comments text by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 weeks

Modified:
  head/sys/vm/vm_pageout.c

Modified: head/sys/vm/vm_pageout.c
==============================================================================
--- head/sys/vm/vm_pageout.c	Mon Nov 16 06:02:09 2015	(r290916)
+++ head/sys/vm/vm_pageout.c	Mon Nov 16 06:02:11 2015	(r290917)
@@ -1510,6 +1510,65 @@ vm_pageout_mightbe_oom(struct vm_domain 
 	atomic_subtract_int(&vm_pageout_oom_vote, 1);
 }
 
+/*
+ * The OOM killer is the page daemon's action of last resort when
+ * memory allocation requests have been stalled for a prolonged period
+ * of time because it cannot reclaim memory. This function computes
+ * the approximate number of physical pages that could be reclaimed if
+ * the specified address space is destroyed.
+ *
+ * Private, anonymous memory owned by the address space is the
+ * principal resource that we expect to recover after an OOM kill.
+ * Since the physical pages mapped by the address space's COW entries
+ * are typically shared pages, they are unlikely to be released and so
+ * they are not counted.
+ *
+ * To get to the point where the page daemon runs the OOM killer, its
+ * efforts to write-back vnode-backed pages may have stalled. This
+ * could be caused by a memory allocation deadlock in the write path
+ * that might be resolved by an OOM kill. Therefore, physical pages
+ * belonging to vnode-backed objects are counted, because they might
+ * be freed without being written out first if the address space holds
+ * the last reference to an unlinked vnode.
+ *
+ * Similarly, physical pages belonging to OBJT_PHYS objects are
+ * counted because the address space might hold the last reference to
+ * the object.
+ */
+static long
+vm_pageout_oom_pagecount(struct vmspace *vmspace)
+{
+	vm_map_t map;
+	vm_map_entry_t entry;
+	vm_object_t obj;
+	long res;
+
+	map = &vmspace->vm_map;
+	KASSERT(!map->system_map, ("system map"));
+	sx_assert(&map->lock, SA_LOCKED);
+	res = 0;
+	for (entry = map->header.next; entry != &map->header;
+	    entry = entry->next) {
+		if ((entry->eflags & MAP_ENTRY_IS_SUB_MAP) != 0)
+			continue;
+		obj = entry->object.vm_object;
+		if (obj == NULL)
+			continue;
+		if ((entry->eflags & MAP_ENTRY_NEEDS_COPY) != 0 &&
+		    obj->ref_count != 1)
+			continue;
+		switch (obj->type) {
+		case OBJT_DEFAULT:
+		case OBJT_SWAP:
+		case OBJT_PHYS:
+		case OBJT_VNODE:
+			res += obj->resident_page_count;
+			break;
+		}
+	}
+	return (res);
+}
+
 void
 vm_pageout_oom(int shortage)
 {
@@ -1583,12 +1642,13 @@ vm_pageout_oom(int shortage)
 		}
 		PROC_UNLOCK(p);
 		size = vmspace_swap_count(vm);
-		vm_map_unlock_read(&vm->vm_map);
 		if (shortage == VM_OOM_MEM)
-			size += vmspace_resident_count(vm);
+			size += vm_pageout_oom_pagecount(vm);
+		vm_map_unlock_read(&vm->vm_map);
 		vmspace_free(vm);
+
 		/*
-		 * if the this process is bigger than the biggest one
+		 * If this process is bigger than the biggest one,
 		 * remember it.
 		 */
 		if (size > bigsize) {