From: Jason Evans
Date: Fri, 23 Dec 2005 12:07:32 -0800
To: David Xu
Cc: Poul-Henning Kamp, freebsd-current@freebsd.org
Subject: Re: New malloc ready, take 42

On Dec 23, 2005, at 2:28 AM, David Xu wrote:
> I know what '>' does in phkmalloc. I found 'Q', he replaced '>' with
> 'Q', this is really strange to me. ;-)

Actually, the closest analogs to phkmalloc's '>' and '<' are 'C' and
'c'. However, they don't have quite the same meaning, so I thought
that changing the designators was appropriate. Here's a snippet from
the man page about some of the performance tuning flags supported by
jemalloc:

  C    Increase/decrease the size of the cache by a factor of two.
       The default cache size is 256 objects for each arena. This
       option can be specified multiple times.

  N    Increase/decrease the number of arenas by a factor of two.
       The default number of arenas is twice the number of CPUs, or
       one if there is a single CPU. This option can be specified
       multiple times.

  Q    Increase/decrease the size of the allocation quantum by a
       factor of two. The default quantum is the minimum allowed by
       the architecture (typically 8 or 16 bytes). This option can
       be specified multiple times.

The implications of each of these flags are described in some detail
later in the man page:

This allocator uses multiple arenas in order to reduce lock contention
for threaded programs on multi-processor systems. This works well with
regard to threading scalability, but incurs some costs. There is a
small fixed per-arena overhead, and additionally, arenas manage memory
completely independently of each other, which means a small fixed
increase in overall memory fragmentation. These overheads aren't
generally an issue, given the number of arenas normally used. Note that
using substantially more arenas than the default is not likely to
improve performance, mainly due to reduced cache performance. However,
it may make sense to reduce the number of arenas if an application does
not make much use of the allocation functions.

This allocator uses a novel approach to object caching.
For objects below a size threshold (use the ``P'' option to discover the
threshold), full deallocation and attempted coalescence with adjacent
memory regions are delayed. This is so that if the application requests
an allocation of that size soon thereafter, the request can be met much
more quickly. Most applications heavily use a small number of object
sizes, so this caching has the potential to have a large positive
performance impact. However, the effectiveness of the cache depends on
the cache being large enough to absorb typical fluctuations in the
number of allocated objects. If an application routinely fluctuates by
thousands of objects, then it may make sense to increase the size of the
cache. Conversely, if an application's memory usage fluctuates very
little, it may make sense to reduce the size of the cache, so that
unused regions can be coalesced sooner.

This allocator is very aggressive about tightly packing objects in
memory, even for objects much larger than the system page size. For
programs that allocate objects larger than half the system page size,
this has the potential to reduce memory footprint in comparison to other
allocators. However, it has some side effects that are important to keep
in mind. First, even multi-page objects can start at non-page-aligned
addresses, since the implementation only guarantees quantum alignment.
Second, this tight packing of objects can cause objects to share L1
cache lines, which can be a performance issue for multi-threaded
applications. There are two ways to approach these issues. First,
posix_memalign() provides the ability to align allocations as needed. By
aligning an allocation to at least the L1 cache line size, and padding
the allocation request by one L1 cache line unit, the programmer can
rest assured that no cache line sharing will occur for the object.
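
As a rough illustration of the posix_memalign() approach just described, the C sketch below aligns an allocation to a cache line and pads the request by one line. The helper name alloc_cache_isolated() and the 64-byte CACHE_LINE_SIZE are assumptions made for this example, not anything taken from the man page; the real line size is hardware-specific and should be checked for the target CPU.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* ask for the posix_memalign() declaration */
#endif

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed L1 cache line size in bytes; 64 is common but not universal. */
#define CACHE_LINE_SIZE ((size_t)64)

/*
 * Hypothetical helper: return `size` usable bytes that start on a cache
 * line boundary. The request is padded by one cache line, following the
 * man page's suggestion. Returns NULL on failure; release the memory
 * with free().
 */
static void *
alloc_cache_isolated(size_t size)
{
	void *p = NULL;

	/* posix_memalign() requires a power-of-two alignment. */
	assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0);

	/* Refuse requests whose padded size would overflow size_t. */
	if (size > SIZE_MAX - CACHE_LINE_SIZE)
		return (NULL);

	/* posix_memalign() returns 0 on success, an error number otherwise. */
	if (posix_memalign(&p, CACHE_LINE_SIZE, size + CACHE_LINE_SIZE) != 0)
		return (NULL);

	return (p);
}
```

A caller would treat the returned pointer like any malloc() result, for example giving each thread its own alloc_cache_isolated(sizeof(struct counters)) block, where struct counters stands in for whatever per-thread data the program keeps.
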
Second, the ``Q'' option can be used to force all allocations to be
aligned with the L1 cache lines. This approach should be used with care
though, because although easy to implement, it means that all
allocations must be at least as large as the quantum, which can cause
severe internal fragmentation if the application allocates many small
objects.

Jason
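
To put a number on the internal fragmentation warning above, here is a small C sketch of what rounding every request up to the quantum costs. It is a simplified model, not jemalloc's actual size-class logic, and quantum_round() and quantum_waste() are hypothetical helper names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model: round `size` up to the next multiple of `quantum`.
 * `quantum` must be a nonzero power of two. Overflow for sizes near the
 * top of the size_t range is not handled in this sketch.
 */
static size_t
quantum_round(size_t size, size_t quantum)
{
	assert(quantum != 0 && (quantum & (quantum - 1)) == 0);
	return ((size + quantum - 1) & ~(quantum - 1));
}

/* Bytes lost to internal fragmentation for a single request. */
static size_t
quantum_waste(size_t size, size_t quantum)
{
	return (quantum_round(size, quantum) - size);
}
```

Under this model, a 24-byte request wastes 8 bytes with a 16-byte quantum but 40 bytes with a 64-byte quantum, which is why raising the quantum to the cache line size can be expensive for programs that allocate many small objects.
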