Date: Wed, 27 Feb 2002 09:58:09 -0800
From: Terry Lambert
To: Jeff Roberson
Cc: arch@freebsd.org
Subject: Re: Slab allocator

First, let me say OUTSTANDING WORK!

Jeff Roberson wrote:
> There are also per cpu queues of items, with a per cpu lock. This allows
> for very efficient allocation, and it also provides near linear
> performance as the number of cpus increases. I do still depend on Giant
> to talk to the back end page supplier (kmem_alloc, etc.). Once the VM is
> locked the allocator will not require Giant at all.

What is the per-CPU lock required for? I think it can be gotten rid
of, or at least taken out of the critical path, with more information.

> I would eventually like to pull other allocators into uma (the slab
> allocator). We could get rid of some of the kernel submaps and provide a
> much more dynamic amount of various resources. Something I had in mind
> were pbufs and mbufs, which could easily come from uma. This gives us the
> ability to redistribute memory to wherever it is needed, and not lock it
> in a particular place once it's there.

How do you handle interrupt-time allocation of mbufs, in this case?
zalloci() handles this by pre-creating the PTEs for the page mapping
in the KVA, and then only has to deal with grabbing free physical
pages to back them, which is a non-blocking operation that can occur
at interrupt time, and which, if it fails, is not fatal (i.e. it's
handled; I've considered doing the same for the page mapping and
PTEs, but that would make the time-to-run far less deterministic).

> There are a few things that need to be fixed right now. For one, the zone
> statistics don't reflect the items that are in the per cpu queues. I'm
> thinking about clean ways to collect this without locking every zone and
> per cpu queue when someone calls sysctl.

The easy way around this is to say that these values are snapshots.
You maintain the figures of merit on a per-CPU basis in the context
of the CPU doing the allocations and deallocations, and treat them as
read-only for the purposes of statistics reporting. This means that
you don't need locks to get the statistics.

For debugging, you could provide a rigid locked interface (e.g. by
only enabling locking for the statistics gathering via a sysctl that
defaults to "off").

> The other problem is with the per cpu buckets. They are a
> fixed size right now. I need to define several zones for
> the buckets to come from and a way to manage growing/shrinking
> the buckets.

I built a "chain" allocator that dealt with this issue, and also with
the object granularity issue.

Basically, it calculated the LCM of the object size (rounded up to a
MAX(sizeof(long), 8) boundary, for processor alignment sensitivity
reasons) and the page size (also for processor sensitivity reasons),
and then allocated a contiguous region from which it obtained objects
of that type. All in all, it meant zero unnecessary space wastage
(for 1,000,000 TCP connections, the savings were 1/4 of a Gigabyte
for one zone alone).
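Roughly, the sizing arithmetic looks like the sketch below (a
reconstruction for illustration only, not the allocator's actual
code; the names, the 4K page size, and the standalone main() are my
assumptions):

#include <stdio.h>

/* MAX(sizeof(long), 8) alignment boundary, as described above. */
#define ALIGN_BOUNDARY	((unsigned long)(sizeof(long) > 8 ? sizeof(long) : 8))
#define PAGE_SIZE	4096UL			/* assumed 4K pages */

static unsigned long
gcd(unsigned long a, unsigned long b)
{
	while (b != 0) {
		unsigned long t = a % b;
		a = b;
		b = t;
	}
	return (a);
}

static unsigned long
lcm(unsigned long a, unsigned long b)
{
	return (a / gcd(a, b) * b);
}

/*
 * Round the object size up to the alignment boundary, then take the
 * LCM of that and the page size: a chain of this many bytes holds a
 * whole number of objects and a whole number of pages, so there is
 * no slack at either end.
 */
static unsigned long
chain_size(unsigned long objsize)
{
	unsigned long rounded;

	rounded = (objsize + ALIGN_BOUNDARY - 1) & ~(ALIGN_BOUNDARY - 1);
	return (lcm(rounded, PAGE_SIZE));
}

int
main(void)
{
	/*
	 * A 192-byte object gives LCM(192, 4096) = 12288: exactly
	 * 3 pages holding exactly 64 objects, with zero waste.
	 */
	printf("chain size for 192-byte objects: %lu bytes\n",
	    chain_size(192));
	return (0);
}

At 1,000,000 connections that adds up fast: 1/4 GB spread over a
million objects works out to roughly 268 bytes of slack eliminated
per object.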
> There are two things that I would really like comments on.
>
> 1) Should I keep the uma_ prefixes on exported functions/types.

Think of an acceptable acronym and use that; if UMA is meaningful,
it's as good as any.

The real issue is to be able to rip out the old code and see where
the bleeders are, so that the switchover can be as painless as
possible.

> 2) How much of the malloc_type stats should I keep? They either require
> atomic ops or a lock in their current state. Also, non power of two
> malloc sizes break their usage tracking.

See above for the locks; I think they are unnecessary unless you are
debugging, and arguably unnecessary even then, unless the lock is
global to all CPUs.

For the power-of-two stats, we may lose them, but we gain a higher
granularity on zone identification for objects that right now get
rounded into the same zone. I think that's an acceptable trade-off,
if not a net win.

> 3) Should I rename the files to vm_zone.c, vm_zone.h, etc?

This should be last, I think.

And thanks again for the most excellent work!

-- Terry