From owner-cvs-all@FreeBSD.ORG Sun May 29 13:38:08 2005 Return-Path: X-Original-To: cvs-all@FreeBSD.org Delivered-To: cvs-all@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 16D5116A41C; Sun, 29 May 2005 13:38:08 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from repoman.freebsd.org (repoman.freebsd.org [216.136.204.115]) by mx1.FreeBSD.org (Postfix) with ESMTP id E0BB943D1D; Sun, 29 May 2005 13:38:07 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from repoman.freebsd.org (localhost [127.0.0.1]) by repoman.freebsd.org (8.13.1/8.13.1) with ESMTP id j4TDc7XF001598; Sun, 29 May 2005 13:38:07 GMT (envelope-from rwatson@repoman.freebsd.org) Received: (from rwatson@localhost) by repoman.freebsd.org (8.13.1/8.13.1/Submit) id j4TDc7FT001597; Sun, 29 May 2005 13:38:07 GMT (envelope-from rwatson) Message-Id: <200505291338.j4TDc7FT001597@repoman.freebsd.org> From: Robert Watson Date: Sun, 29 May 2005 13:38:07 +0000 (UTC) To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org X-FreeBSD-CVS-Branch: HEAD Cc: Subject: cvs commit: src/sys/sys malloc.h src/sys/kern kern_malloc.c X-BeenThere: cvs-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: CVS commit messages for the entire tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 May 2005 13:38:08 -0000 rwatson 2005-05-29 13:38:07 UTC FreeBSD src repository Modified files: sys/sys malloc.h sys/kern kern_malloc.c Log: Kernel malloc layers malloc_type allocation over one of two underlying allocators: a set of power-of-two UMA zones for small allocations, and the VM page allocator for large allocations. In order to maintain unified statistics for specific malloc types, kernel malloc maintains a separate per-type statistics pool, which can be monitored using vmstat -m. Prior to this commit, each pool of per-type statistics was protected using a per-type mutex associated with the malloc type. This change modifies kernel malloc to maintain per-CPU statistics pools for each malloc type, and protects writing those statistics using critical sections. It also moves to unsynchronized reads of per-CPU statistics when generating coalesced statistics. To do this, several changes are implemented: - In the previous world order, the statistics memory was allocated by the owner of the malloc type structure, allocated statically using MALLOC_DEFINE(). This embedded the definition of the malloc_type structure into all kernel modules. Move to a model in which a pointer within struct malloc_type points at a UMA-allocated malloc_type_internal data structure owned and maintained by kern_malloc.c, and not part of the exported ABI/API to the rest of the kernel. For the purposes of easing a possible MFC, re-use an existing pointer in 'struct malloc_type', and maintain the current malloc_type structure size, as well as layout with respect to the fields reused outside of the malloc subsystem (such as ks_shortdesc). There are several unused fields as a result of no longer requiring the mutex in malloc_type. - Struct malloc_type_internal contains an array of malloc_type_stats, of size MAXCPU. The structure defined above avoids hard-coding a kernel compile-time value of MAXCPU into kernel modules that interact with malloc. - When accessing per-cpu statistics for a malloc type, surround read - modify - update requests with critical_enter()/critical_exit() in order to avoid races during write. The per-CPU fields are written only from the CPU that owns them. - Per-CPU stats now maintained "allocated" and "freed" counters for number of allocations/frees and bytes allocated/freed, since there is no longer a coherent global notion of the totals. When coalescing malloc stats, accept a slight race between reading stats across CPUs, and avoid showing the user a negative allocation count for the type in the event of a race. The global high watermark is no longer maintained for a malloc type, as there is no global notion of the number of allocations. - While tearing up the sysctl() path, also switch to using sbufs. The current "export as text" sysctl format is retained with the same syntax. We may want to change this in the future to export more per-CPU information, such as how allocations and frees are balanced across CPUs. This change results in a substantial speedup of kernel malloc and free paths on SMP, as critical sections (where usable) out-perform mutexes due to avoiding atomic/bus-locked operations. There is also a minor improvement on UP due to the slightly lower cost of critical sections there. The cost of the change to this approach is the loss of a continuous notion of total allocations that can be exploited to track per-type high watermarks, as well as increased complexity when monitoring statistics. Due to carefully avoiding changing the ABI, as well as hardening the ABI against future changes, it is not necessary to recompile kernel modules for this change. However, MFC'ing this change to RELENG_5 will require also MFC'ing optimizations for soft critical sections, which may modify exposed kernel ABIs. The internal malloc API is changed, and modifications to vmstat in order to restore "vmstat -m" on core dumps will follow shortly. Several improvements from: bde Statistics approach discussed with: ups Tested by: scottl, others Revision Changes Path 1.140 +159 -130 src/sys/kern/kern_malloc.c 1.79 +72 -16 src/sys/sys/malloc.h