From owner-svn-src-head@FreeBSD.ORG Tue Jun 23 20:45:23 2009
Return-Path:
Delivered-To: svn-src-head@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4EDD61065694;
	Tue, 23 Jun 2009 20:45:23 +0000 (UTC)
	(envelope-from kib@FreeBSD.org)
Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c])
	by mx1.freebsd.org (Postfix) with ESMTP id 3A8D38FC0C;
	Tue, 23 Jun 2009 20:45:23 +0000 (UTC)
	(envelope-from kib@FreeBSD.org)
Received: from svn.freebsd.org (localhost [127.0.0.1])
	by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id n5NKjNwp089676;
	Tue, 23 Jun 2009 20:45:23 GMT
	(envelope-from kib@svn.freebsd.org)
Received: (from kib@localhost)
	by svn.freebsd.org (8.14.3/8.14.3/Submit) id n5NKjMMC089652;
	Tue, 23 Jun 2009 20:45:22 GMT
	(envelope-from kib@svn.freebsd.org)
Message-Id: <200906232045.n5NKjMMC089652@svn.freebsd.org>
From: Konstantin Belousov
Date: Tue, 23 Jun 2009 20:45:22 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
	svn-src-head@freebsd.org
X-SVN-Group: head
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc:
Subject: svn commit: r194766 - in head/sys: dev/md fs/procfs fs/tmpfs kern
	security/mac_biba security/mac_lomac sys vm
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 23 Jun 2009 20:45:23 -0000

Author: kib
Date: Tue Jun 23 20:45:22 2009
New Revision: 194766
URL: http://svn.freebsd.org/changeset/base/194766

Log:
  Implement global and per-uid accounting of anonymous memory. Add
  rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
  for the uid.
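  To make the two-level scheme concrete, here is a minimal userland
  model in C. It is an illustration only. It assumes a 4 KiB page and
  always-enforced limits. The names swap_state, uid_state,
  model_reserve() and model_release() are hypothetical. They only
  loosely mirror swap_reserve_by_uid() and swap_release_by_uid() in the
  swap_pager.c hunk, and they leave out locking, the privilege checks,
  and the vm.overcommit sysctl.

```c
/*
 * Userland model of a two-level swap reservation: a global counter
 * (like swap_reserved vs. swap_total) and a per-uid counter (like
 * ui_vmsize vs. RLIMIT_SWAP). Both limits are always enforced here,
 * which is an assumption; the kernel makes enforcement optional.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL           /* assumed page size */
#define PAGE_MASK (PAGE_SIZE - 1)

struct swap_state {                 /* hypothetical stand-in for the globals */
	uint64_t total;                 /* like swap_total */
	uint64_t reserved;              /* like swap_reserved */
};

struct uid_state {                  /* hypothetical stand-in for struct uidinfo */
	uint64_t vmsize;                /* like ui_vmsize */
	uint64_t limit;                 /* like lim_cur(..., RLIMIT_SWAP) */
};

/* Reserve incr bytes for one uid; returns true on success. */
static bool
model_reserve(struct swap_state *g, struct uid_state *u, uint64_t incr)
{
	assert((incr & PAGE_MASK) == 0);    /* the kernel panics instead */

	/* Global check first. */
	if (g->reserved + incr > g->total)
		return (false);
	g->reserved += incr;

	/* Per-uid check; undo the global charge if it fails. */
	if (u->vmsize + incr > u->limit) {
		g->reserved -= incr;
		return (false);
	}
	u->vmsize += incr;
	return (true);
}

/* Return a previous reservation to both counters. */
static void
model_release(struct swap_state *g, struct uid_state *u, uint64_t decr)
{
	assert((decr & PAGE_MASK) == 0);
	assert(g->reserved >= decr && u->vmsize >= decr);
	g->reserved -= decr;
	u->vmsize -= decr;
}
```

  The per-uid check runs after the global one. A failed per-uid check
  therefore has to undo the global increment, as the kernel code does
  by taking sw_dev_mtx a second time.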

  The accounting information (charge) is associated with either the map
  entry or the vm object backing the entry, assuming the object is the
  first one in the shadow chain and the entry does not require COW.
  The charge is moved from the entry to the object when the object is
  allocated, e.g. during mmap if the object is allocated at that point,
  or on the first page fault on the entry. It moves back to the entry
  on fork, due to COW setup.

  The per-entry granularity of accounting makes charging fair for
  processes that change uid during their lifetime, and ensures that the
  charge is decremented for the proper uid when the region is unmapped.

  The interface of vm_pager_allocate(9) is extended with a struct
  ucred * argument, which is used to charge the appropriate uid when the
  allocation is performed by the kernel, e.g. md(4).

  Several syscalls, among them fork(2), may now return ENOMEM when the
  global or per-uid limits are enforced.

  In collaboration with:	pho
  Reviewed by:	alc
  Approved by:	re (kensmith)

Modified:
  head/sys/dev/md/md.c
  head/sys/fs/procfs/procfs_map.c
  head/sys/fs/tmpfs/tmpfs_subr.c
  head/sys/kern/kern_fork.c
  head/sys/kern/kern_resource.c
  head/sys/kern/sys_process.c
  head/sys/kern/sysv_shm.c
  head/sys/kern/uipc_shm.c
  head/sys/security/mac_biba/mac_biba.c
  head/sys/security/mac_lomac/mac_lomac.c
  head/sys/sys/priv.h
  head/sys/sys/resource.h
  head/sys/sys/resourcevar.h
  head/sys/vm/default_pager.c
  head/sys/vm/device_pager.c
  head/sys/vm/phys_pager.c
  head/sys/vm/swap_pager.c
  head/sys/vm/vm.h
  head/sys/vm/vm_extern.h
  head/sys/vm/vm_fault.c
  head/sys/vm/vm_kern.c
  head/sys/vm/vm_map.c
  head/sys/vm/vm_map.h
  head/sys/vm/vm_mmap.c
  head/sys/vm/vm_object.c
  head/sys/vm/vm_object.h
  head/sys/vm/vm_pager.c
  head/sys/vm/vm_pager.h
  head/sys/vm/vnode_pager.c

Modified: head/sys/dev/md/md.c
==============================================================================
--- head/sys/dev/md/md.c	Tue Jun 23 20:45:12 2009	(r194765)
+++ head/sys/dev/md/md.c	Tue Jun 23 20:45:22 2009	(r194766)
@@ -1042,18 +1042,18 @@ mdcreate_swap(struct md_s *sc, struct md
 	if
(mdio->md_fwheads != 0) sc->fwheads = mdio->md_fwheads; sc->object = vm_pager_allocate(OBJT_SWAP, NULL, PAGE_SIZE * npage, - VM_PROT_DEFAULT, 0); + VM_PROT_DEFAULT, 0, td->td_ucred); if (sc->object == NULL) return (ENOMEM); sc->flags = mdio->md_options & MD_FORCE; if (mdio->md_options & MD_RESERVE) { if (swap_pager_reserve(sc->object, 0, npage) < 0) { - vm_object_deallocate(sc->object); - sc->object = NULL; - return (EDOM); + error = EDOM; + goto finish; } } error = mdsetcred(sc, td->td_ucred); + finish: if (error != 0) { vm_object_deallocate(sc->object); sc->object = NULL; Modified: head/sys/fs/procfs/procfs_map.c ============================================================================== --- head/sys/fs/procfs/procfs_map.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/fs/procfs/procfs_map.c Tue Jun 23 20:45:22 2009 (r194766) @@ -45,6 +45,7 @@ #include #include #include +#include #include #ifdef COMPAT_IA32 #include @@ -82,6 +83,7 @@ procfs_doprocmap(PFS_FILL_ARGS) vm_map_entry_t entry, tmp_entry; struct vnode *vp; char *fullpath, *freepath; + struct uidinfo *uip; int error, vfslocked; unsigned int last_timestamp; #ifdef COMPAT_IA32 @@ -134,6 +136,7 @@ procfs_doprocmap(PFS_FILL_ARGS) if (obj->shadow_count == 1) privateresident = obj->resident_page_count; } + uip = (entry->uip) ? entry->uip : (obj ? obj->uip : NULL); resident = 0; addr = entry->start; @@ -198,10 +201,11 @@ procfs_doprocmap(PFS_FILL_ARGS) /* * format: - * start, end, resident, private resident, cow, access, type. + * start, end, resident, private resident, cow, access, type, + * charged, charged uid. 
*/ error = sbuf_printf(sb, - "0x%lx 0x%lx %d %d %p %s%s%s %d %d 0x%x %s %s %s %s\n", + "0x%lx 0x%lx %d %d %p %s%s%s %d %d 0x%x %s %s %s %s %s %d\n", (u_long)e_start, (u_long)e_end, resident, privateresident, #ifdef COMPAT_IA32 @@ -215,7 +219,8 @@ procfs_doprocmap(PFS_FILL_ARGS) ref_count, shadow_count, flags, (e_eflags & MAP_ENTRY_COW)?"COW":"NCOW", (e_eflags & MAP_ENTRY_NEEDS_COPY)?"NC":"NNC", - type, fullpath); + type, fullpath, + uip ? "CH":"NCH", uip ? uip->ui_uid : -1); if (freepath != NULL) free(freepath, M_TEMP); Modified: head/sys/fs/tmpfs/tmpfs_subr.c ============================================================================== --- head/sys/fs/tmpfs/tmpfs_subr.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/fs/tmpfs/tmpfs_subr.c Tue Jun 23 20:45:22 2009 (r194766) @@ -142,7 +142,8 @@ tmpfs_alloc_node(struct tmpfs_mount *tmp case VREG: nnode->tn_reg.tn_aobj = - vm_pager_allocate(OBJT_SWAP, NULL, 0, VM_PROT_DEFAULT, 0); + vm_pager_allocate(OBJT_SWAP, NULL, 0, VM_PROT_DEFAULT, 0, + NULL /* XXXKIB - tmpfs needs swap reservation */); nnode->tn_reg.tn_aobj_pages = 0; break; Modified: head/sys/kern/kern_fork.c ============================================================================== --- head/sys/kern/kern_fork.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/kern/kern_fork.c Tue Jun 23 20:45:22 2009 (r194766) @@ -214,6 +214,7 @@ fork1(td, flags, pages, procp) struct thread *td2; struct sigacts *newsigacts; struct vmspace *vm2; + vm_ooffset_t mem_charged; int error; /* Can't copy and clear. */ @@ -274,6 +275,7 @@ norfproc_fail: * however it proved un-needed and caused problems */ + mem_charged = 0; vm2 = NULL; /* Allocate new proc. */ newproc = uma_zalloc(proc_zone, M_WAITOK); @@ -295,12 +297,24 @@ norfproc_fail: } } if ((flags & RFMEM) == 0) { - vm2 = vmspace_fork(p1->p_vmspace); + vm2 = vmspace_fork(p1->p_vmspace, &mem_charged); if (vm2 == NULL) { error = ENOMEM; goto fail1; } - } + if (!swap_reserve(mem_charged)) { + /* + * The swap reservation failed. 
The accounting + * from the entries of the copied vm2 will be + * substracted in vmspace_free(), so force the + * reservation there. + */ + swap_reserve_force(mem_charged); + error = ENOMEM; + goto fail1; + } + } else + vm2 = NULL; #ifdef MAC mac_proc_init(newproc); #endif Modified: head/sys/kern/kern_resource.c ============================================================================== --- head/sys/kern/kern_resource.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/kern/kern_resource.c Tue Jun 23 20:45:22 2009 (r194766) @@ -1213,6 +1213,8 @@ uifind(uid) } else { refcount_init(&uip->ui_ref, 0); uip->ui_uid = uid; + mtx_init(&uip->ui_vmsize_mtx, "ui_vmsize", NULL, + MTX_DEF); LIST_INSERT_HEAD(UIHASH(uid), uip, ui_hash); } } @@ -1269,6 +1271,10 @@ uifree(uip) if (uip->ui_proccnt != 0) printf("freeing uidinfo: uid = %d, proccnt = %ld\n", uip->ui_uid, uip->ui_proccnt); + if (uip->ui_vmsize != 0) + printf("freeing uidinfo: uid = %d, swapuse = %lld\n", + uip->ui_uid, (unsigned long long)uip->ui_vmsize); + mtx_destroy(&uip->ui_vmsize_mtx); free(uip, M_UIDINFO); return; } Modified: head/sys/kern/sys_process.c ============================================================================== --- head/sys/kern/sys_process.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/kern/sys_process.c Tue Jun 23 20:45:22 2009 (r194766) @@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #ifdef COMPAT_IA32 #include @@ -270,7 +271,10 @@ proc_rwmem(struct proc *p, struct uio *u */ error = vm_fault(map, pageno, reqprot, fault_flags); if (error) { - error = EFAULT; + if (error == KERN_RESOURCE_SHORTAGE) + error = ENOMEM; + else + error = EFAULT; break; } Modified: head/sys/kern/sysv_shm.c ============================================================================== --- head/sys/kern/sysv_shm.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/kern/sysv_shm.c Tue Jun 23 20:45:22 2009 (r194766) @@ -770,13 +770,10 @@ shmget_allocate_segment(td, uap, mode) * We make sure 
that we have allocated a pager before we need * to. */ - if (shm_use_phys) { - shm_object = - vm_pager_allocate(OBJT_PHYS, 0, size, VM_PROT_DEFAULT, 0); - } else { - shm_object = - vm_pager_allocate(OBJT_SWAP, 0, size, VM_PROT_DEFAULT, 0); - } + shm_object = vm_pager_allocate(shm_use_phys ? OBJT_PHYS : OBJT_SWAP, + 0, size, VM_PROT_DEFAULT, 0, cred); + if (shm_object == NULL) + return (ENOMEM); VM_OBJECT_LOCK(shm_object); vm_object_clear_flag(shm_object, OBJ_ONEMAPPING); vm_object_set_flag(shm_object, OBJ_NOSPLIT); Modified: head/sys/kern/uipc_shm.c ============================================================================== --- head/sys/kern/uipc_shm.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/kern/uipc_shm.c Tue Jun 23 20:45:22 2009 (r194766) @@ -110,7 +110,7 @@ static struct shmfd *shm_hold(struct shm static void shm_insert(char *path, Fnv32_t fnv, struct shmfd *shmfd); static struct shmfd *shm_lookup(char *path, Fnv32_t fnv); static int shm_remove(char *path, Fnv32_t fnv, struct ucred *ucred); -static void shm_dotruncate(struct shmfd *shmfd, off_t length); +static int shm_dotruncate(struct shmfd *shmfd, off_t length); static fo_rdwr_t shm_read; static fo_rdwr_t shm_write; @@ -167,8 +167,7 @@ shm_truncate(struct file *fp, off_t leng if (error) return (error); #endif - shm_dotruncate(shmfd, length); - return (0); + return (shm_dotruncate(shmfd, length)); } static int @@ -242,23 +241,26 @@ shm_close(struct file *fp, struct thread return (0); } -static void +static int shm_dotruncate(struct shmfd *shmfd, off_t length) { vm_object_t object; vm_page_t m; vm_pindex_t nobjsize; + vm_ooffset_t delta; object = shmfd->shm_object; VM_OBJECT_LOCK(object); if (length == shmfd->shm_size) { VM_OBJECT_UNLOCK(object); - return; + return (0); } nobjsize = OFF_TO_IDX(length + PAGE_MASK); /* Are we shrinking? If so, trim the end. */ if (length < shmfd->shm_size) { + delta = ptoa(object->size - nobjsize); + /* Toss in memory pages. 
*/ if (nobjsize < object->size) vm_object_page_remove(object, nobjsize, object->size, @@ -266,8 +268,11 @@ shm_dotruncate(struct shmfd *shmfd, off_ /* Toss pages from swap. */ if (object->type == OBJT_SWAP) - swap_pager_freespace(object, nobjsize, - object->size - nobjsize); + swap_pager_freespace(object, nobjsize, delta); + + /* Free the swap accounted for shm */ + swap_release_by_uid(delta, object->uip); + object->charge -= delta; /* * If the last page is partially mapped, then zero out @@ -307,6 +312,15 @@ shm_dotruncate(struct shmfd *shmfd, off_ vm_page_cache_free(object, OFF_TO_IDX(length), nobjsize); } + } else { + + /* Attempt to reserve the swap */ + delta = ptoa(nobjsize - object->size); + if (!swap_reserve_by_uid(delta, object->uip)) { + VM_OBJECT_UNLOCK(object); + return (ENOMEM); + } + object->charge += delta; } shmfd->shm_size = length; mtx_lock(&shm_timestamp_lock); @@ -315,6 +329,7 @@ shm_dotruncate(struct shmfd *shmfd, off_ mtx_unlock(&shm_timestamp_lock); object->size = nobjsize; VM_OBJECT_UNLOCK(object); + return (0); } /* @@ -332,7 +347,7 @@ shm_alloc(struct ucred *ucred, mode_t mo shmfd->shm_gid = ucred->cr_gid; shmfd->shm_mode = mode; shmfd->shm_object = vm_pager_allocate(OBJT_DEFAULT, NULL, - shmfd->shm_size, VM_PROT_DEFAULT, 0); + shmfd->shm_size, VM_PROT_DEFAULT, 0, ucred); KASSERT(shmfd->shm_object != NULL, ("shm_create: vm_pager_allocate")); VM_OBJECT_LOCK(shmfd->shm_object); vm_object_clear_flag(shmfd->shm_object, OBJ_ONEMAPPING); Modified: head/sys/security/mac_biba/mac_biba.c ============================================================================== --- head/sys/security/mac_biba/mac_biba.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/security/mac_biba/mac_biba.c Tue Jun 23 20:45:22 2009 (r194766) @@ -1830,6 +1830,8 @@ biba_priv_check(struct ucred *cred, int case PRIV_VM_MADV_PROTECT: case PRIV_VM_MLOCK: case PRIV_VM_MUNLOCK: + case PRIV_VM_SWAP_NOQUOTA: + case PRIV_VM_SWAP_NORLIMIT: /* * Allow some but not all network 
privileges. In general, dont allow Modified: head/sys/security/mac_lomac/mac_lomac.c ============================================================================== --- head/sys/security/mac_lomac/mac_lomac.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/security/mac_lomac/mac_lomac.c Tue Jun 23 20:45:22 2009 (r194766) @@ -1822,6 +1822,8 @@ lomac_priv_check(struct ucred *cred, int case PRIV_VM_MADV_PROTECT: case PRIV_VM_MLOCK: case PRIV_VM_MUNLOCK: + case PRIV_VM_SWAP_NOQUOTA: + case PRIV_VM_SWAP_NORLIMIT: /* * Allow some but not all network privileges. In general, dont allow Modified: head/sys/sys/priv.h ============================================================================== --- head/sys/sys/priv.h Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/sys/priv.h Tue Jun 23 20:45:22 2009 (r194766) @@ -283,6 +283,14 @@ #define PRIV_VM_MADV_PROTECT 360 /* Can set MADV_PROTECT. */ #define PRIV_VM_MLOCK 361 /* Can mlock(), mlockall(). */ #define PRIV_VM_MUNLOCK 362 /* Can munlock(), munlockall(). */ +#define PRIV_VM_SWAP_NOQUOTA 363 /* + * Can override the global + * swap reservation limits. + */ +#define PRIV_VM_SWAP_NORLIMIT 364 /* + * Can override the per-uid + * swap reservation limits. + */ /* * Device file system privileges. 
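The two privileges added to priv.h let a caller bypass the checks in swap_reserve_by_uid(). The sketch below restates that decision as two pure C functions, under these assumptions. The names struct vm_snapshot, global_ok() and uid_ok() are hypothetical. The boolean parameters stand in for a successful priv_check() call. The three overcommit bit values are copied from the swap_pager.c hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of the vm.overcommit sysctl, as defined in the swap_pager.c hunk. */
#define SWAP_RESERVE_FORCE_ON        (1 << 0)
#define SWAP_RESERVE_RLIMIT_ON       (1 << 1)
#define SWAP_RESERVE_ALLOW_NONWIRED  (1 << 2)

/* Hypothetical snapshot of the counters the kernel reads. */
struct vm_snapshot {
	uint64_t swap_total;     /* bytes of configured swap */
	uint64_t swap_reserved;  /* bytes already reserved */
	uint64_t page_count;     /* like cnt.v_page_count */
	uint64_t free_reserved;  /* like cnt.v_free_reserved */
	uint64_t wire_count;     /* like cnt.v_wire_count */
	uint64_t page_size;
};

/* Global check; has_noquota models holding PRIV_VM_SWAP_NOQUOTA. */
static bool
global_ok(const struct vm_snapshot *s, uint64_t incr, int overcommit,
    bool has_noquota)
{
	uint64_t limit = 0;

	/* Optionally count non-wired RAM on top of the swap devices. */
	if (overcommit & SWAP_RESERVE_ALLOW_NONWIRED)
		limit = (s->page_count - s->free_reserved - s->wire_count) *
		    s->page_size;
	limit += s->swap_total;

	/* Without FORCE_ON the global limit is not enforced at all. */
	return ((overcommit & SWAP_RESERVE_FORCE_ON) == 0 ||
	    s->swap_reserved + incr <= limit || has_noquota);
}

/* Per-uid check; has_norlimit models holding PRIV_VM_SWAP_NORLIMIT. */
static bool
uid_ok(uint64_t vmsize, uint64_t incr, uint64_t rlimit_cur, int overcommit,
    bool has_norlimit)
{
	/* As in the diff, a privileged caller or a limit of 0 is never refused. */
	uint64_t max = has_norlimit ? 0 : rlimit_cur;

	return (!(max != 0 && vmsize + incr > max &&
	    (overcommit & SWAP_RESERVE_RLIMIT_ON) != 0));
}
```

With the vm.overcommit sysctl left at its default of 0, neither check refuses anything. The counters are still maintained, but enforcement is opt-in.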
Modified: head/sys/sys/resource.h ============================================================================== --- head/sys/sys/resource.h Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/sys/resource.h Tue Jun 23 20:45:22 2009 (r194766) @@ -94,8 +94,9 @@ struct rusage { #define RLIMIT_VMEM 10 /* virtual process size (inclusive of mmap) */ #define RLIMIT_AS RLIMIT_VMEM /* standard name for RLIMIT_VMEM */ #define RLIMIT_NPTS 11 /* pseudo-terminals */ +#define RLIMIT_SWAP 12 /* swap used */ -#define RLIM_NLIMITS 12 /* number of resource limits */ +#define RLIM_NLIMITS 13 /* number of resource limits */ #define RLIM_INFINITY ((rlim_t)(((uint64_t)1 << 63) - 1)) /* XXX Missing: RLIM_SAVED_MAX, RLIM_SAVED_CUR */ @@ -119,6 +120,7 @@ static char *rlimit_ident[RLIM_NLIMITS] "sbsize", "vmem", "npts", + "swap", }; #endif Modified: head/sys/sys/resourcevar.h ============================================================================== --- head/sys/sys/resourcevar.h Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/sys/resourcevar.h Tue Jun 23 20:45:22 2009 (r194766) @@ -86,9 +86,12 @@ struct plimit { * (a) Constant from inception * (b) Lockless, updated using atomics * (c) Locked by global uihashtbl_mtx + * (d) Locked by the ui_vmsize_mtx */ struct uidinfo { LIST_ENTRY(uidinfo) ui_hash; /* (c) hash chain of uidinfos */ + struct mtx ui_vmsize_mtx; + vm_ooffset_t ui_vmsize; /* (d) swap reservation by uid */ long ui_sbsize; /* (b) socket buffer space consumed */ long ui_proccnt; /* (b) number of processes */ long ui_ptscnt; /* (b) number of pseudo-terminals */ @@ -96,6 +99,9 @@ struct uidinfo { u_int ui_ref; /* (b) reference count */ }; +#define UIDINFO_VMSIZE_LOCK(ui) mtx_lock(&((ui)->ui_vmsize_mtx)) +#define UIDINFO_VMSIZE_UNLOCK(ui) mtx_unlock(&((ui)->ui_vmsize_mtx)) + struct proc; struct rusage_ext; struct thread; Modified: head/sys/vm/default_pager.c ============================================================================== --- head/sys/vm/default_pager.c Tue Jun 23 
20:45:12 2009 (r194765) +++ head/sys/vm/default_pager.c Tue Jun 23 20:45:22 2009 (r194766) @@ -44,6 +44,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include @@ -53,7 +54,7 @@ __FBSDID("$FreeBSD$"); #include static vm_object_t default_pager_alloc(void *, vm_ooffset_t, vm_prot_t, - vm_ooffset_t); + vm_ooffset_t, struct ucred *); static void default_pager_dealloc(vm_object_t); static int default_pager_getpages(vm_object_t, vm_page_t *, int, int); static void default_pager_putpages(vm_object_t, vm_page_t *, int, @@ -76,12 +77,28 @@ struct pagerops defaultpagerops = { */ static vm_object_t default_pager_alloc(void *handle, vm_ooffset_t size, vm_prot_t prot, - vm_ooffset_t offset) + vm_ooffset_t offset, struct ucred *cred) { + vm_object_t object; + struct uidinfo *uip; + if (handle != NULL) panic("default_pager_alloc: handle specified"); - - return vm_object_allocate(OBJT_DEFAULT, OFF_TO_IDX(round_page(offset + size))); + if (cred != NULL) { + uip = cred->cr_ruidinfo; + if (!swap_reserve_by_uid(size, uip)) + return (NULL); + uihold(uip); + } + object = vm_object_allocate(OBJT_DEFAULT, + OFF_TO_IDX(round_page(offset + size))); + if (cred != NULL) { + VM_OBJECT_LOCK(object); + object->uip = uip; + object->charge = size; + VM_OBJECT_UNLOCK(object); + } + return (object); } /* Modified: head/sys/vm/device_pager.c ============================================================================== --- head/sys/vm/device_pager.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/vm/device_pager.c Tue Jun 23 20:45:22 2009 (r194766) @@ -54,7 +54,7 @@ __FBSDID("$FreeBSD$"); static void dev_pager_init(void); static vm_object_t dev_pager_alloc(void *, vm_ooffset_t, vm_prot_t, - vm_ooffset_t); + vm_ooffset_t, struct ucred *); static void dev_pager_dealloc(vm_object_t); static int dev_pager_getpages(vm_object_t, vm_page_t *, int, int); static void dev_pager_putpages(vm_object_t, vm_page_t *, int, @@ -97,7 +97,8 @@ dev_pager_init() * MPSAFE */ static vm_object_t 
-dev_pager_alloc(void *handle, vm_ooffset_t size, vm_prot_t prot, vm_ooffset_t foff) +dev_pager_alloc(void *handle, vm_ooffset_t size, vm_prot_t prot, + vm_ooffset_t foff, struct ucred *cred) { struct cdev *dev; vm_object_t object, object1; Modified: head/sys/vm/phys_pager.c ============================================================================== --- head/sys/vm/phys_pager.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/vm/phys_pager.c Tue Jun 23 20:45:22 2009 (r194766) @@ -60,7 +60,7 @@ phys_pager_init(void) */ static vm_object_t phys_pager_alloc(void *handle, vm_ooffset_t size, vm_prot_t prot, - vm_ooffset_t foff) + vm_ooffset_t foff, struct ucred *cred) { vm_object_t object, object1; vm_pindex_t pindex; Modified: head/sys/vm/swap_pager.c ============================================================================== --- head/sys/vm/swap_pager.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/vm/swap_pager.c Tue Jun 23 20:45:22 2009 (r194766) @@ -86,6 +86,8 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include +#include #include #include #include @@ -152,6 +154,127 @@ static int nswapdev; /* Number of swap int swap_pager_avail; static int swdev_syscall_active = 0; /* serialize swap(on|off) */ +static vm_ooffset_t swap_total; +SYSCTL_QUAD(_vm, OID_AUTO, swap_total, CTLFLAG_RD, &swap_total, 0, ""); +static vm_ooffset_t swap_reserved; +SYSCTL_QUAD(_vm, OID_AUTO, swap_reserved, CTLFLAG_RD, &swap_reserved, 0, ""); +static int overcommit = 0; +SYSCTL_INT(_vm, OID_AUTO, overcommit, CTLFLAG_RW, &overcommit, 0, ""); + +/* bits from overcommit */ +#define SWAP_RESERVE_FORCE_ON (1 << 0) +#define SWAP_RESERVE_RLIMIT_ON (1 << 1) +#define SWAP_RESERVE_ALLOW_NONWIRED (1 << 2) + +int +swap_reserve(vm_ooffset_t incr) +{ + + return (swap_reserve_by_uid(incr, curthread->td_ucred->cr_ruidinfo)); +} + +int +swap_reserve_by_uid(vm_ooffset_t incr, struct uidinfo *uip) +{ + vm_ooffset_t r, s, max; + int res, error; + static int curfail; + static struct timeval lastfail; 
+ + if (incr & PAGE_MASK) + panic("swap_reserve: & PAGE_MASK"); + + res = 0; + error = priv_check(curthread, PRIV_VM_SWAP_NOQUOTA); + mtx_lock(&sw_dev_mtx); + r = swap_reserved + incr; + if (overcommit & SWAP_RESERVE_ALLOW_NONWIRED) { + s = cnt.v_page_count - cnt.v_free_reserved - cnt.v_wire_count; + s *= PAGE_SIZE; + } else + s = 0; + s += swap_total; + if ((overcommit & SWAP_RESERVE_FORCE_ON) == 0 || r <= s || + (error = priv_check(curthread, PRIV_VM_SWAP_NOQUOTA)) == 0) { + res = 1; + swap_reserved = r; + } + mtx_unlock(&sw_dev_mtx); + + if (res) { + PROC_LOCK(curproc); + UIDINFO_VMSIZE_LOCK(uip); + error = priv_check(curthread, PRIV_VM_SWAP_NORLIMIT); + max = (error != 0) ? lim_cur(curproc, RLIMIT_SWAP) : 0; + if (max != 0 && uip->ui_vmsize + incr > max && + (overcommit & SWAP_RESERVE_RLIMIT_ON) != 0) + res = 0; + else + uip->ui_vmsize += incr; + UIDINFO_VMSIZE_UNLOCK(uip); + PROC_UNLOCK(curproc); + if (!res) { + mtx_lock(&sw_dev_mtx); + swap_reserved -= incr; + mtx_unlock(&sw_dev_mtx); + } + } + if (!res && ppsratecheck(&lastfail, &curfail, 1)) { + printf("uid %d, pid %d: swap reservation for %jd bytes failed\n", + curproc->p_pid, uip->ui_uid, incr); + } + + return (res); +} + +void +swap_reserve_force(vm_ooffset_t incr) +{ + struct uidinfo *uip; + + mtx_lock(&sw_dev_mtx); + swap_reserved += incr; + mtx_unlock(&sw_dev_mtx); + + uip = curthread->td_ucred->cr_ruidinfo; + PROC_LOCK(curproc); + UIDINFO_VMSIZE_LOCK(uip); + uip->ui_vmsize += incr; + UIDINFO_VMSIZE_UNLOCK(uip); + PROC_UNLOCK(curproc); +} + +void +swap_release(vm_ooffset_t decr) +{ + struct uidinfo *uip; + + PROC_LOCK(curproc); + uip = curthread->td_ucred->cr_ruidinfo; + swap_release_by_uid(decr, uip); + PROC_UNLOCK(curproc); +} + +void +swap_release_by_uid(vm_ooffset_t decr, struct uidinfo *uip) +{ + + if (decr & PAGE_MASK) + panic("swap_release: & PAGE_MASK"); + + mtx_lock(&sw_dev_mtx); + if (swap_reserved < decr) + panic("swap_reserved < decr"); + swap_reserved -= decr; + mtx_unlock(&sw_dev_mtx); + 
+ UIDINFO_VMSIZE_LOCK(uip); + if (uip->ui_vmsize < decr) + printf("negative vmsize for uid = %d\n", uip->ui_uid); + uip->ui_vmsize -= decr; + UIDINFO_VMSIZE_UNLOCK(uip); +} + static void swapdev_strategy(struct buf *, struct swdevt *sw); #define SWM_FREE 0x02 /* free, period */ @@ -198,7 +321,7 @@ static struct vm_object swap_zone_obj; */ static vm_object_t swap_pager_alloc(void *handle, vm_ooffset_t size, - vm_prot_t prot, vm_ooffset_t offset); + vm_prot_t prot, vm_ooffset_t offset, struct ucred *); static void swap_pager_dealloc(vm_object_t object); static int swap_pager_getpages(vm_object_t, vm_page_t *, int, int); static void swap_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *); @@ -440,13 +563,13 @@ swap_pager_swap_init(void) */ static vm_object_t swap_pager_alloc(void *handle, vm_ooffset_t size, vm_prot_t prot, - vm_ooffset_t offset) + vm_ooffset_t offset, struct ucred *cred) { vm_object_t object; vm_pindex_t pindex; + struct uidinfo *uip; pindex = OFF_TO_IDX(offset + PAGE_MASK + size); - if (handle) { mtx_lock(&Giant); /* @@ -457,21 +580,41 @@ swap_pager_alloc(void *handle, vm_ooffse */ sx_xlock(&sw_alloc_sx); object = vm_pager_object_lookup(NOBJLIST(handle), handle); - if (object == NULL) { + if (cred != NULL) { + uip = cred->cr_ruidinfo; + if (!swap_reserve_by_uid(size, uip)) { + sx_xunlock(&sw_alloc_sx); + mtx_unlock(&Giant); + return (NULL); + } + uihold(uip); + } object = vm_object_allocate(OBJT_DEFAULT, pindex); - object->handle = handle; - VM_OBJECT_LOCK(object); + object->handle = handle; + if (cred != NULL) { + object->uip = uip; + object->charge = size; + } swp_pager_meta_build(object, 0, SWAPBLK_NONE); VM_OBJECT_UNLOCK(object); } sx_xunlock(&sw_alloc_sx); mtx_unlock(&Giant); } else { + if (cred != NULL) { + uip = cred->cr_ruidinfo; + if (!swap_reserve_by_uid(size, uip)) + return (NULL); + uihold(uip); + } object = vm_object_allocate(OBJT_DEFAULT, pindex); - VM_OBJECT_LOCK(object); + if (cred != NULL) { + object->uip = uip; + 
object->charge = size; + } swp_pager_meta_build(object, 0, SWAPBLK_NONE); VM_OBJECT_UNLOCK(object); } @@ -2039,6 +2182,7 @@ swaponsomething(struct vnode *vp, void * TAILQ_INSERT_TAIL(&swtailq, sp, sw_list); nswapdev++; swap_pager_avail += nblks; + swap_total += (vm_ooffset_t)nblks * PAGE_SIZE; swp_sizecheck(); mtx_unlock(&sw_dev_mtx); } @@ -2143,6 +2287,7 @@ swapoff_one(struct swdevt *sp, struct uc swap_pager_avail -= blist_fill(sp->sw_blist, dvbase, dmmax); } + swap_total -= (vm_ooffset_t)nblks * PAGE_SIZE; mtx_unlock(&sw_dev_mtx); /* Modified: head/sys/vm/vm.h ============================================================================== --- head/sys/vm/vm.h Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/vm/vm.h Tue Jun 23 20:45:22 2009 (r194766) @@ -133,5 +133,12 @@ struct kva_md_info { extern struct kva_md_info kmi; extern void vm_ksubmap_init(struct kva_md_info *); +struct uidinfo; +int swap_reserve(vm_ooffset_t incr); +int swap_reserve_by_uid(vm_ooffset_t incr, struct uidinfo *uip); +void swap_reserve_force(vm_ooffset_t incr); +void swap_release(vm_ooffset_t decr); +void swap_release_by_uid(vm_ooffset_t decr, struct uidinfo *uip); + #endif /* VM_H */ Modified: head/sys/vm/vm_extern.h ============================================================================== --- head/sys/vm/vm_extern.h Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/vm/vm_extern.h Tue Jun 23 20:45:22 2009 (r194766) @@ -63,7 +63,7 @@ void vm_waitproc(struct proc *); int vm_mmap(vm_map_t, vm_offset_t *, vm_size_t, vm_prot_t, vm_prot_t, int, objtype_t, void *, vm_ooffset_t); void vm_set_page_size(void); struct vmspace *vmspace_alloc(vm_offset_t, vm_offset_t); -struct vmspace *vmspace_fork(struct vmspace *); +struct vmspace *vmspace_fork(struct vmspace *, vm_ooffset_t *); int vmspace_exec(struct proc *, vm_offset_t, vm_offset_t); int vmspace_unshare(struct proc *); void vmspace_exit(struct thread *); Modified: head/sys/vm/vm_fault.c 
============================================================================== --- head/sys/vm/vm_fault.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/vm/vm_fault.c Tue Jun 23 20:45:22 2009 (r194766) @@ -1163,7 +1163,11 @@ vm_fault_copy_entry(dst_map, src_map, ds VM_OBJECT_LOCK(dst_object); dst_entry->object.vm_object = dst_object; dst_entry->offset = 0; - + if (dst_entry->uip != NULL) { + dst_object->uip = dst_entry->uip; + dst_object->charge = dst_entry->end - dst_entry->start; + dst_entry->uip = NULL; + } prot = dst_entry->max_protection; /* Modified: head/sys/vm/vm_kern.c ============================================================================== --- head/sys/vm/vm_kern.c Tue Jun 23 20:45:12 2009 (r194765) +++ head/sys/vm/vm_kern.c Tue Jun 23 20:45:22 2009 (r194766) @@ -235,7 +235,8 @@ kmem_suballoc(vm_map_t parent, vm_offset *min = vm_map_min(parent); ret = vm_map_find(parent, NULL, 0, min, size, superpage_align ? - VMFS_ALIGNED_SPACE : VMFS_ANY_SPACE, VM_PROT_ALL, VM_PROT_ALL, 0); + VMFS_ALIGNED_SPACE : VMFS_ANY_SPACE, VM_PROT_ALL, VM_PROT_ALL, + MAP_ACC_NO_CHARGE); if (ret != KERN_SUCCESS) panic("kmem_suballoc: bad status return of %d", ret); *max = *min + size; @@ -422,6 +423,8 @@ kmem_alloc_wait(map, size) vm_offset_t addr; size = round_page(size); + if (!swap_reserve(size)) + return (0); for (;;) { /* @@ -434,12 +437,14 @@ kmem_alloc_wait(map, size) /* no space now; see if we can ever get space */ if (vm_map_max(map) - vm_map_min(map) < size) { vm_map_unlock(map); + swap_release(size); return (0); } map->needs_wakeup = TRUE; vm_map_unlock_and_wait(map, 0); } - vm_map_insert(map, NULL, 0, addr, addr + size, VM_PROT_ALL, VM_PROT_ALL, 0); + vm_map_insert(map, NULL, 0, addr, addr + size, VM_PROT_ALL, + VM_PROT_ALL, MAP_ACC_CHARGED); vm_map_unlock(map); return (addr); } Modified: head/sys/vm/vm_map.c ============================================================================== --- head/sys/vm/vm_map.c Tue Jun 23 20:45:12 2009 (r194765) +++ 
head/sys/vm/vm_map.c Tue Jun 23 20:45:22 2009 (r194766) @@ -149,6 +149,10 @@ static void vm_map_zdtor(void *mem, int static void vmspace_zdtor(void *mem, int size, void *arg); #endif +#define ENTRY_CHARGED(e) ((e)->uip != NULL || \ + ((e)->object.vm_object != NULL && (e)->object.vm_object->uip != NULL && \ + !((e)->eflags & MAP_ENTRY_NEEDS_COPY))) + /* * PROC_VMSPACE_{UN,}LOCK() can be a noop as long as vmspaces are type * stable. @@ -1076,6 +1080,8 @@ vm_map_insert(vm_map_t map, vm_object_t vm_map_entry_t prev_entry; vm_map_entry_t temp_entry; vm_eflags_t protoeflags; + struct uidinfo *uip; + boolean_t charge_prev_obj; VM_MAP_ASSERT_LOCKED(map); @@ -1103,6 +1109,7 @@ vm_map_insert(vm_map_t map, vm_object_t return (KERN_NO_SPACE); protoeflags = 0; + charge_prev_obj = FALSE; if (cow & MAP_COPY_ON_WRITE) protoeflags |= MAP_ENTRY_COW|MAP_ENTRY_NEEDS_COPY; @@ -1118,6 +1125,27 @@ vm_map_insert(vm_map_t map, vm_object_t if (cow & MAP_DISABLE_COREDUMP) protoeflags |= MAP_ENTRY_NOCOREDUMP; + uip = NULL; + KASSERT((object != kmem_object && object != kernel_object) || + ((object == kmem_object || object == kernel_object) && + !(protoeflags & MAP_ENTRY_NEEDS_COPY)), + ("kmem or kernel object and cow")); + if (cow & (MAP_ACC_NO_CHARGE | MAP_NOFAULT)) + goto charged; + if ((cow & MAP_ACC_CHARGED) || ((prot & VM_PROT_WRITE) && + ((protoeflags & MAP_ENTRY_NEEDS_COPY) || object == NULL))) { + if (!(cow & MAP_ACC_CHARGED) && !swap_reserve(end - start)) + return (KERN_RESOURCE_SHORTAGE); + KASSERT(object == NULL || (cow & MAP_ENTRY_NEEDS_COPY) || + object->uip == NULL, + ("OVERCOMMIT: vm_map_insert o %p", object)); + uip = curthread->td_ucred->cr_ruidinfo; + uihold(uip); + if (object == NULL && !(protoeflags & MAP_ENTRY_NEEDS_COPY)) + charge_prev_obj = TRUE; + } + +charged: if (object != NULL) { /* * OBJ_ONEMAPPING must be cleared unless this mapping @@ -1135,11 +1163,13 @@ vm_map_insert(vm_map_t map, vm_object_t (prev_entry->eflags == protoeflags) && (prev_entry->end == start) && 
(prev_entry->wired_count == 0) && - ((prev_entry->object.vm_object == NULL) || - vm_object_coalesce(prev_entry->object.vm_object, - prev_entry->offset, - (vm_size_t)(prev_entry->end - prev_entry->start), - (vm_size_t)(end - prev_entry->end)))) { + (prev_entry->uip == uip || + (prev_entry->object.vm_object != NULL && + (prev_entry->object.vm_object->uip == uip))) && + vm_object_coalesce(prev_entry->object.vm_object, + prev_entry->offset, + (vm_size_t)(prev_entry->end - prev_entry->start), + (vm_size_t)(end - prev_entry->end), charge_prev_obj)) { /* * We were able to extend the object. Determine if we * can extend the previous map entry to include the @@ -1152,6 +1182,8 @@ vm_map_insert(vm_map_t map, vm_object_t prev_entry->end = end; vm_map_entry_resize_free(map, prev_entry); vm_map_simplify_entry(map, prev_entry); + if (uip != NULL) + uifree(uip); return (KERN_SUCCESS); } @@ -1165,6 +1197,12 @@ vm_map_insert(vm_map_t map, vm_object_t offset = prev_entry->offset + (prev_entry->end - prev_entry->start); vm_object_reference(object); + if (uip != NULL && object != NULL && object->uip != NULL && + !(prev_entry->eflags & MAP_ENTRY_NEEDS_COPY)) { + /* Object already accounts for this uid. 
*/
+			uifree(uip);
+			uip = NULL;
+		}
 	}
 
 	/*
@@ -1179,6 +1217,7 @@ vm_map_insert(vm_map_t map, vm_object_t
 	new_entry = vm_map_entry_create(map);
 	new_entry->start = start;
 	new_entry->end = end;
+	new_entry->uip = NULL;
 
 	new_entry->eflags = protoeflags;
 	new_entry->object.vm_object = object;
@@ -1190,6 +1229,10 @@ vm_map_insert(vm_map_t map, vm_object_t
 	new_entry->max_protection = max;
 	new_entry->wired_count = 0;
 
+	KASSERT(uip == NULL || !ENTRY_CHARGED(new_entry),
+	    ("OVERCOMMIT: vm_map_insert leaks vm_map %p", new_entry));
+	new_entry->uip = uip;
+
 	/*
 	 * Insert the new entry into the list
 	 */
@@ -1398,7 +1441,8 @@ vm_map_simplify_entry(vm_map_t map, vm_m
 		    (prev->protection == entry->protection) &&
 		    (prev->max_protection == entry->max_protection) &&
 		    (prev->inheritance == entry->inheritance) &&
-		    (prev->wired_count == entry->wired_count)) {
+		    (prev->wired_count == entry->wired_count) &&
+		    (prev->uip == entry->uip)) {
 			vm_map_entry_unlink(map, prev);
 			entry->start = prev->start;
 			entry->offset = prev->offset;
@@ -1416,6 +1460,8 @@ vm_map_simplify_entry(vm_map_t map, vm_m
 			 */
 			if (prev->object.vm_object)
 				vm_object_deallocate(prev->object.vm_object);
+			if (prev->uip != NULL)
+				uifree(prev->uip);
 			vm_map_entry_dispose(map, prev);
 		}
 	}
@@ -1431,7 +1477,8 @@ vm_map_simplify_entry(vm_map_t map, vm_m
 		    (next->protection == entry->protection) &&
 		    (next->max_protection == entry->max_protection) &&
 		    (next->inheritance == entry->inheritance) &&
-		    (next->wired_count == entry->wired_count)) {
+		    (next->wired_count == entry->wired_count) &&
+		    (next->uip == entry->uip)) {
 			vm_map_entry_unlink(map, next);
 			entry->end = next->end;
 			vm_map_entry_resize_free(map, entry);
@@ -1441,6 +1488,8 @@ vm_map_simplify_entry(vm_map_t map, vm_m
 			 */
 			if (next->object.vm_object)
 				vm_object_deallocate(next->object.vm_object);
+			if (next->uip != NULL)
+				uifree(next->uip);
 			vm_map_entry_dispose(map, next);
 		}
 	}
@@ -1489,6 +1538,21 @@ _vm_map_clip_start(vm_map_t map, vm_map_
 		    atop(entry->end - entry->start));
 		entry->object.vm_object = object;
 		entry->offset = 0;
+		if (entry->uip != NULL) {
+			object->uip = entry->uip;
+			object->charge = entry->end - entry->start;
+			entry->uip = NULL;
+		}
+	} else if (entry->object.vm_object != NULL &&
+		   ((entry->eflags & MAP_ENTRY_NEEDS_COPY) == 0) &&
+		   entry->uip != NULL) {
+		VM_OBJECT_LOCK(entry->object.vm_object);
+		KASSERT(entry->object.vm_object->uip == NULL,
+		    ("OVERCOMMIT: vm_entry_clip_start: both uip e %p", entry));
+		entry->object.vm_object->uip = entry->uip;
+		entry->object.vm_object->charge = entry->end - entry->start;
+		VM_OBJECT_UNLOCK(entry->object.vm_object);
+		entry->uip = NULL;
 	}
 
 	new_entry = vm_map_entry_create(map);
@@ -1497,6 +1561,8 @@ _vm_map_clip_start(vm_map_t map, vm_map_
 	new_entry->end = start;
 	entry->offset += (start - entry->start);
 	entry->start = start;
+	if (new_entry->uip != NULL)
+		uihold(entry->uip);
 
 	vm_map_entry_link(map, entry->prev, new_entry);
 
@@ -1542,6 +1608,21 @@ _vm_map_clip_end(vm_map_t map, vm_map_en
 		    atop(entry->end - entry->start));
 		entry->object.vm_object = object;
 		entry->offset = 0;
+		if (entry->uip != NULL) {
+			object->uip = entry->uip;
+			object->charge = entry->end - entry->start;
+			entry->uip = NULL;
+		}
+	} else if (entry->object.vm_object != NULL &&
+		   ((entry->eflags & MAP_ENTRY_NEEDS_COPY) == 0) &&
+		   entry->uip != NULL) {
+		VM_OBJECT_LOCK(entry->object.vm_object);
+		KASSERT(entry->object.vm_object->uip == NULL,
+		    ("OVERCOMMIT: vm_entry_clip_end: both uip e %p", entry));
+		entry->object.vm_object->uip = entry->uip;
+		entry->object.vm_object->charge = entry->end - entry->start;
+		VM_OBJECT_UNLOCK(entry->object.vm_object);
+		entry->uip = NULL;
 	}
 
 	/*
@@ -1552,6 +1633,8 @@ _vm_map_clip_end(vm_map_t map, vm_map_en
 
 	new_entry->start = entry->end = end;
 	new_entry->offset += (end - entry->start);
+	if (new_entry->uip != NULL)
+		uihold(entry->uip);
 
 	vm_map_entry_link(map, entry, new_entry);
 
@@ -1724,6 +1807,8 @@ vm_map_protect(vm_map_t map, vm_offset_t
 {
 	vm_map_entry_t current;
 	vm_map_entry_t entry;
+	vm_object_t obj;
+	struct uidinfo *uip;
 
 	vm_map_lock(map);
 
@@ -1751,6 +1836,61 @@ vm_map_protect(vm_map_t map, vm_offset_t
 		current = current->next;
 	}
+
+	/*
+	 * Do an accounting pass for private read-only mappings that
+	 * now will do cow due to allowed write (e.g. debugger sets
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
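The clip and simplify hunks above follow one ownership rule for the uidinfo reference: before a split, a non-COW entry hands its charge (for the whole range) and its reference to the backing object; only when the charge must stay on the entries (the COW case) do both halves share it, which costs one extra uihold(). Simplification merges two entries only if they are charged to the same uid and drops the disposed entry's reference. The following is a minimal userspace model of that rule, not kernel code: toy_uidinfo, toy_object, toy_entry, toy_uihold(), toy_uifree(), toy_charge_object(), toy_clip_start() and toy_try_merge() are hypothetical stand-ins, and locking, object allocation for entries without an object, and the offset/protection checks of the real code are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-ins for struct uidinfo,
 * vm_object and vm_map_entry: only the fields the accounting uses.
 */
struct toy_uidinfo {
	int refcount;			/* references held by entries/objects */
};

struct toy_object {
	struct toy_uidinfo *uip;	/* uid charged via the object, or NULL */
	size_t charge;			/* bytes charged via the object */
};

struct toy_entry {
	size_t start, end;
	struct toy_object *obj;		/* backing object, may be NULL */
	int needs_copy;			/* models MAP_ENTRY_NEEDS_COPY */
	struct toy_uidinfo *uip;	/* uid charged via the entry, or NULL */
};

static void toy_uihold(struct toy_uidinfo *uip) { uip->refcount++; }
static void toy_uifree(struct toy_uidinfo *uip) { uip->refcount--; }

/*
 * Before a clip, move the entry's charge and its uidinfo reference to a
 * non-COW backing object, so the object is charged for the whole range.
 */
static void
toy_charge_object(struct toy_entry *e)
{
	if (e->obj != NULL && !e->needs_copy && e->uip != NULL) {
		assert(e->obj->uip == NULL);	/* never charged twice */
		e->obj->uip = e->uip;
		e->obj->charge = e->end - e->start;
		e->uip = NULL;
	}
}

/* Split [e->start, e->end) at 'at'; *front receives [e->start, at). */
static void
toy_clip_start(struct toy_entry *e, struct toy_entry *front, size_t at)
{
	toy_charge_object(e);
	*front = *e;
	front->end = at;
	e->start = at;
	if (front->uip != NULL)
		toy_uihold(e->uip);	/* two entries now share the uidinfo */
}

/*
 * Merge adjacent 'prev' into 'e' only when both are charged to the same
 * uid (or to none); the disposed entry's reference is dropped.
 */
static int
toy_try_merge(struct toy_entry *prev, struct toy_entry *e)
{
	if (prev->end != e->start || prev->obj != e->obj ||
	    prev->needs_copy != e->needs_copy || prev->uip != e->uip)
		return (0);
	e->start = prev->start;
	if (prev->uip != NULL)
		toy_uifree(prev->uip);
	return (1);
}
```

In this model a split of a non-COW entry leaves the uid's reference count unchanged, because the reference the entry held is the one the object takes over; a split of a COW entry raises it by one, and merging the two halves back brings it down again.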