From: Jason Evans
Date: Thu, 07 Feb 2008 13:39:22 -0800
To: freebsd-current@freebsd.org
Subject: MALLOC_OPTIONS=H obsolete
Message-ID: <47AB7A8A.9020906@freebsd.org>

I've been working on jemalloc a bunch lately, as a result of working with the Mozilla folks to integrate it with Firefox. One of the problems we ran into is that on Windows, there is no good way to tell the resident set size of an individual application, unless we "decommit" unused pages. I won't go into the details of Windows memory management, but suffice it to say that the solution to this problem is also a reasonable solution to the problem of expensive madvise(... MADV_FREE) calls on FreeBSD.

I recently committed code to FreeBSD that tracks whether each page within a mapped chunk is unused and dirty.
If the number of such dirty pages exceeds a threshold, jemalloc sweeps downward through memory and calls madvise() on enough dirty pages to drop the dirty page count to no more than half of the threshold value. The default threshold value is currently 512 pages per arena (2 MB), but it can be tuned via MALLOC_OPTIONS=F or f. See the malloc(3) man page for details.

By sweeping downward through memory, jemalloc tends to call madvise() on pages that are less likely to be reused soon. Also, by delaying the madvise() calls, unused pages tend to coalesce, thus reducing the total number of calls.

Following are some statistics from a contrived test (repeatedly opening and closing a 36 MB file within vim):

===================================================================
dirty: 119 pages dirty, 45 sweeps, 117 madvises, 20479 pages purged

            allocated      nmalloc      ndalloc
small:         428216        64195        53915
large:         188416        41419        41404
total:         616632       105614        95319
===================================================================

I've been seeing madvise():pages purged ratios of 1:100+ for the tests I've run, so this mechanism appears to typically be pretty cheap.

Anyway, the reason I think this change to jemalloc matters is that it inexpensively puts pretty reasonable bounds on how much dirty unused memory the entire OS (all processes) has lying around, without requiring any complex interactions with the kernel. I've been thinking a lot about this problem since the discussion here last month (see "sbrk(2) broken" thread), and the 100% solutions like receiving notifications from the kernel are in my opinion prohibitively complex in the context of multi-threaded applications.

Jason
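
[Illustrative sketch, not taken from the jemalloc source.] The threshold-and-sweep policy described above can be modeled in a few lines. This is a toy simulation only: the class name ToyArena, its fields, and the page-index model are all hypothetical, and each "madvise" is just a counter increment standing in for one madvise(... MADV_FREE) call on a contiguous run of pages. It assumes the sweep starts at the highest page and that one call can cover a whole run of adjacent dirty pages, which is one plausible reading of the description.

```python
class ToyArena:
    """Toy model of per-arena dirty-page tracking (hypothetical, for illustration)."""

    def __init__(self, npages, threshold=512):
        self.npages = npages
        self.threshold = threshold      # 512 pages is the default quoted in the post
        self.dirty = [False] * npages   # dirty[i]: page i is unused and dirty
        self.ndirty = 0
        self.sweeps = 0
        self.madvises = 0               # simulated madvise() calls
        self.purged = 0                 # pages covered by those calls

    def free_page(self, i):
        """Mark page i unused-and-dirty; sweep if the threshold is exceeded."""
        if not self.dirty[i]:
            self.dirty[i] = True
            self.ndirty += 1
        if self.ndirty > self.threshold:
            self.sweep()

    def sweep(self):
        """Walk downward from the top page, purging dirty runs until the
        dirty count is no more than half the threshold."""
        self.sweeps += 1
        target = self.threshold // 2
        i = self.npages - 1
        while i >= 0 and self.ndirty > target:
            if not self.dirty[i]:
                i -= 1
                continue
            run_top = i
            # Extend the run downward across adjacent dirty pages.
            while i >= 0 and self.dirty[i] and self.ndirty > target:
                self.dirty[i] = False
                self.ndirty -= 1
                i -= 1
            # One simulated madvise() call covers the whole run.
            self.madvises += 1
            self.purged += run_top - i
```

In this model, the more the dirty pages have coalesced into contiguous runs by the time a sweep fires, the fewer simulated calls are needed per page purged, which is the effect the madvise():pages purged ratio above reflects.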