From owner-freebsd-hackers@FreeBSD.ORG  Sun Oct  2 14:00:28 2011
In-Reply-To: <1393358703.20111002174545@serebryakov.spb.ru>
References: <358651269.20111002162109@serebryakov.spb.ru>
	<CACYV=-FNM-3fcYzFGc9eFajdoBmG1E-rWo6tq-OwBefGPADywA@mail.gmail.com>
	<1393358703.20111002174545@serebryakov.spb.ru>
Date: Sun, 2 Oct 2011 16:00:26 +0200
Message-ID: <CACYV=-EA4nG2MG73hBitgtoRQEk8d2CzwzzAf+bceHcqOJHuiw@mail.gmail.com>
From: Davide Italiano <davide.italiano@gmail.com>
To: lev@freebsd.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org
Subject: Re: Memory allocation in kernel -- what to use in which situation?
 What is the best for page-sized allocations?

2011/10/2 Lev Serebryakov <lev@freebsd.org>:
> Hello, Davide.
> You wrote on 2 October 2011, 16:57:48:
>
>>> But what if I need to allocate a lot (say, 16K-32K) of page-sized
>>> blocks? Not in one chunk, for sure, but over the lifetime of my kernel
>>> module. Which allocator should I use? It seems the best one would be
>>> a very low-level, page-sized-only allocator. Is there one in the kernel?
>
>> My 2cents:
>> Every time you request more than 4KB of memory using
>> kernel malloc(), it results in a direct call to uma_large_malloc().
>> Right now, uma_large_malloc() calls kmem_malloc() (i.e. the memory is
>> requested from the VM directly).
>> This kind of approach has two main drawbacks:
>> 1) it heavily fragments the kernel heap
>> 2) when free() is called on these multipage chunks, it in turn calls
>> uma_large_free(), which immediately calls the VM system to unmap and
>> free the chunk of memory. The unmapping requires a system-wide TLB
>> shootdown, i.e. a global action by every processor in the system.
>
>> I'm currently working, supervised by alc@, on an intermediate layer that
>> sits between UMA and the VM, whose goal is to efficiently satisfy
>> requests > 4KB (so, the one you want, considering you're asking for
>> 16KB-32KB), but the work is at an early stage.
> I was not very clear here. I'm talking about page-sized blocks, but
> many of them. 16K-32K is not a size in bytes, but the count of page-sized
> blocks my code needs :)
>
ok.

> BTW, I/O often requires big buffers, up to MAXPHYS (128KiB for
> now). Do you mean that any allocation of such memory has
> considerable performance penalties, especially on multi-core and
> multi-CPU systems?
>

In fact, the main client of this kind of allocation is the ZFS
filesystem (this is due to its caching mechanism, the Adaptive
Replacement Cache, ARC). AFAIK, at the time UMA was written,
allocations like the ones you describe were so infrequent that no
initial effort was made to optimize them.
People tried to address this issue by having ZFS create a large number
of UMA zones for large allocations of different sizes. Unfortunately,
one of the side effects of this approach was increased
fragmentation, so we're investigating the problem.
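
To make the idea of a caching layer between the allocator and the VM a
bit more concrete, here is a minimal userspace sketch. It is NOT FreeBSD
kernel code and not the layer mentioned above: every name in it
(page_cache, page_cache_alloc(), MODEL_PAGE_SIZE, CACHE_MAX) is
hypothetical, and plain malloc()/free() stand in for the VM. The only
point it illustrates is that keeping freed page-sized blocks on a free
list avoids going back to the backend (and, in the kernel case, avoids
the unmap and TLB shootdown) on every free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096 /* assumed page size for this model */
#define CACHE_MAX       64   /* hypothetical cap on cached free pages */

/* Userspace model of a cache of page-sized blocks (all names hypothetical). */
struct page_cache {
	void  *free_pages[CACHE_MAX]; /* stack of cached free pages */
	size_t nfree;                 /* number of pages currently cached */
	size_t backend_allocs;        /* allocations that reached the backend */
	size_t backend_frees;         /* frees that reached the backend */
};

void
page_cache_init(struct page_cache *pc)
{
	pc->nfree = 0;
	pc->backend_allocs = 0;
	pc->backend_frees = 0;
}

/* Return a page-sized block, reusing a cached one when possible. */
void *
page_cache_alloc(struct page_cache *pc)
{
	assert(pc != NULL);
	if (pc->nfree > 0)
		return (pc->free_pages[--pc->nfree]);
	/* Cache miss: go to the backend (malloc() stands in for the VM). */
	pc->backend_allocs++;
	return (malloc(MODEL_PAGE_SIZE));
}

/* Release a block; it is kept in the cache unless the cache is full. */
void
page_cache_free(struct page_cache *pc, void *page)
{
	assert(pc != NULL);
	if (page == NULL)
		return;
	if (pc->nfree < CACHE_MAX) {
		pc->free_pages[pc->nfree++] = page;
		return;
	}
	/* Cache full: this is the expensive path in the real system. */
	pc->backend_frees++;
	free(page);
}

/* Give every cached page back to the backend (e.g. on module unload). */
void
page_cache_drain(struct page_cache *pc)
{
	while (pc->nfree > 0) {
		free(pc->free_pages[--pc->nfree]);
		pc->backend_frees++;
	}
}
```

In this model, an alloc/free/alloc sequence touches the backend only
once, because the second allocation is served from the free list. The
price is memory held in the cache until it is drained, which is the same
kind of trade-off behind the fragmentation concerns above.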

> --
> // Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>
>
>