From: Alan Cox
Reply-To: alc@freebsd.org
To: mdf@freebsd.org
Cc: freebsd-hackers@freebsd.org, lev@freebsd.org
Date: Sun, 2 Oct 2011 14:01:54 -0500
Subject: Re: Memory allocation in kernel -- what to use in which situation? What is the best for page-sized allocations?

On Sun, Oct 2, 2011 at 1:21 PM, mdf@freebsd.org wrote:
> 2011/10/2 Lev Serebryakov <lev@freebsd.org>:
> > Hello, freebsd-hackers.
> >
> > There are several memory-allocation mechanisms in the kernel. The two
> > I'm aware of are MALLOC_DEFINE()/malloc()/free() and uma_* (zone(9)).
> >
> > As far as I understand, malloc() is general-purpose, but it has a
> > fixed "transaction cost" (in terms of memory consumption) for each
> > allocated block, and is not very suitable for allocating many small
> > blocks, since a lot of memory is wasted on bookkeeping.
> >
> > The zone(9) allocator, on the other hand, has a very low per-block
> > cost, but can only hand out pre-configured, fixed-size blocks, so
> > it is ideal for allocating tons of small objects (and it provides
> > an API for reusing them, too!).
> >
> > Am I right?
>
> No one has quite answered this question, IMO, so here's my 2 cents.
>
> malloc(9) on smaller sizes (<= PAGE_SIZE) uses uma(9) under the
> covers. There is a set of uma zones for 16, 32, 64, 128, ...,
> PAGE_SIZE bytes, and malloc(9) looks up the requested size in a
> small array to determine which uma zone to allocate from.
>
> So malloc(9) on small sizes doesn't have the bookkeeping overhead,
> but it does pay for rounding up to the next malloc uma bucket. At
> $WORK we found, for example, that 48 bytes and 96 bytes were very
> common sizes, so I added uma zones there (and a few other odd sizes
> determined by using the malloc statistics option).
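(For concreteness, a dedicated zone for one of those common sizes
would look roughly like the sketch below. This is an untested
illustration, not the actual change mdf describes; "struct foo" and
its 48-byte payload are invented.)

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>     /* M_WAITOK, M_ZERO */
    #include <vm/uma.h>

    /* A hypothetical 48-byte object that gets allocated in bulk. */
    struct foo {
            char payload[48];
    };

    static uma_zone_t foo_zone;

    static void
    foo_zone_init(void)
    {
            /* One zone, one object size: no round-up waste. */
            foo_zone = uma_zcreate("foo", sizeof(struct foo),
                NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
    }

    static struct foo *
    foo_alloc(void)
    {
            return (uma_zalloc(foo_zone, M_WAITOK | M_ZERO));
    }

    static void
    foo_free(struct foo *fp)
    {
            uma_zfree(foo_zone, fp);
    }

This is the same uma_zalloc()/uma_zfree() path that malloc(9) itself
takes for the power-of-two buckets; the zone simply fits the object
exactly.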
> > But what if I need to allocate a lot (say, 16K-32K) of page-sized
> > blocks? Not in one chunk, for sure, but over the lifetime of my
> > kernel module. Which allocator should I use? It seems the best one
> > would be a very low-level, page-sized-only allocator. Is there one
> > in the kernel?
>
> 4k allocations, as has been pointed out, get a single kernel page in
> both the virtual space and the physical space. They (like all the
> larger allocations) use a field in the vm_page backing the virtual
> address to record info about the allocation.
>
> Any allocation of PAGE_SIZE or larger is rounded up to the next
> multiple of the page size and allocates whole pages. IMO the
> problems here are (1) as was pointed out, the TLB shootdown on
> free(9), and (2) the current algorithm for finding space in a
> kmem_map is a linear search that doesn't track where the fragmented
> chunks are, so it's not terribly efficient at finding larger sizes,
> and the PAGE_SIZE allocations will not fill in the fragmented areas.

Regarding #2, no, it is not linear; it is an amortized logarithmic
first fit. Every node in every vm map, including the kmem map, is
augmented with free-space information, which the first-fit traversal
uses to skip entire subtrees that contain insufficient space.

Regards,
Alan
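P.S. For anyone unfamiliar with augmented trees, here is a toy sketch
(emphatically not the real vm_map code) of the idea: each map entry
caches the largest free gap anywhere in its subtree, so a first-fit
walk can discard a whole subtree with one comparison.

    #include <stddef.h>

    struct node {
            struct node *left, *right;
            size_t gap;         /* free space just after this entry */
            size_t max_free;    /* largest gap in this whole subtree */
    };

    /* Recompute the cached summary after an insert/delete/rotation. */
    static void
    update(struct node *n)
    {
            n->max_free = n->gap;
            if (n->left != NULL && n->left->max_free > n->max_free)
                    n->max_free = n->left->max_free;
            if (n->right != NULL && n->right->max_free > n->max_free)
                    n->max_free = n->right->max_free;
    }

    /* Lowest-addressed entry whose following gap can hold "size". */
    static struct node *
    find_first_fit(struct node *n, size_t size)
    {
            struct node *hit;

            if (n == NULL || n->max_free < size)
                    return (NULL);  /* nothing big enough down here */
            /* Prefer the leftmost (lowest-address) fit. */
            hit = find_first_fit(n->left, size);
            if (hit != NULL)
                    return (hit);
            if (n->gap >= size)
                    return (n);
            return (find_first_fit(n->right, size));
    }

Because the max_free guard rejects an unsuitable subtree in O(1), the
search effectively descends a single root-to-leaf path, i.e. it costs
time proportional to the tree height rather than to the number of map
entries.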