Date: Wed, 27 Feb 2002 09:58:09 -0800
From: Terry Lambert
To: Jeff Roberson
Cc: arch@freebsd.org
Subject: Re: Slab allocator

First, let me say OUTSTANDING WORK!

Jeff Roberson wrote:
> There are also per cpu queues of items, with a per cpu lock. This allows
> for very efficient allocation, and it also provides near linear
> performance as the number of cpus increases. I do still depend on Giant
> to talk to the back end page supplier (kmem_alloc, etc.). Once the VM is
> locked the allocator will not require Giant at all.

What is the per-CPU lock required for? I think it can be gotten rid
of, or at least taken out of the critical path, with more information.

> I would eventually like to pull other allocators into uma (the slab
> allocator). We could get rid of some of the kernel submaps and provide a
> much more dynamic amount of various resources. Something I had in mind
> were pbufs and mbufs, which could easily come from uma. This gives us the
> ability to redistribute memory to wherever it is needed, and not lock it
> in a particular place once it's there.

How do you handle interrupt-time allocation of mbufs, in this case?
zalloci() handles this by pre-creating the PTEs for the page mapping
in the KVA, and then only has to deal with grabbing free physical
pages to back them, which is a non-blocking operation that can occur
at interrupt time, and which, if it fails, is not fatal (i.e. it's
handled; I've considered doing the same for the page mapping and
PTEs, but that would make the time-to-run far less deterministic).

> There are a few things that need to be fixed right now. For one, the zone
> statistics don't reflect the items that are in the per cpu queues. I'm
> thinking about clean ways to collect this without locking every zone and
> per cpu queue when someone calls sysctl.

The easy way around this is to say that these values are snapshots.
You maintain the figures of merit on a per-CPU basis in the context
of the CPU doing the allocations and deallocations, and treat them as
read-only for the purposes of statistics reporting. This means that
you don't need locks to get the statistics.

For debugging, you could provide a rigid locked interface (e.g. by
only enabling locking for the statistics gathering via a sysctl that
defaults to "off").

> The other problem is with the per cpu buckets. They are a
> fixed size right now. I need to define several zones for
> the buckets to come from and a way to manage growing/shrinking
> the buckets.

I built a "chain" allocator that dealt with this issue, and also with
the object granularity issue.

Basically, it calculated the LCM of the object size (rounded up to a
MAX(sizeof(long), 8) boundary, for processor alignment sensitivity
reasons) and the page size (also for processor sensitivity reasons),
and then allocated a contiguous region from which it obtained objects
of that type. All in all, it meant zero unnecessary space wastage
(for 1,000,000 TCP connections, the savings were 1/4 of a Gigabyte
for one zone alone).
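Roughly, the sizing arithmetic looks like the sketch below (a
reconstruction for illustration only, not the allocator's actual
code; the names, the 4K page size, and the standalone main() are my
assumptions):

#include <stdio.h>

/* MAX(sizeof(long), 8) alignment boundary, as described above. */
#define ALIGN_BOUNDARY	((unsigned long)(sizeof(long) > 8 ? sizeof(long) : 8))
#define PAGE_SIZE	4096UL			/* assumed 4K pages */

static unsigned long
gcd(unsigned long a, unsigned long b)
{
	while (b != 0) {
		unsigned long t = a % b;
		a = b;
		b = t;
	}
	return (a);
}

static unsigned long
lcm(unsigned long a, unsigned long b)
{
	return (a / gcd(a, b) * b);
}

/*
 * Round the object size up to the alignment boundary, then take the
 * LCM of that and the page size: a chain of this many bytes holds a
 * whole number of objects and a whole number of pages, so there is
 * no slack at either end.
 */
static unsigned long
chain_size(unsigned long objsize)
{
	unsigned long rounded;

	rounded = (objsize + ALIGN_BOUNDARY - 1) & ~(ALIGN_BOUNDARY - 1);
	return (lcm(rounded, PAGE_SIZE));
}

int
main(void)
{
	/*
	 * A 192-byte object gives LCM(192, 4096) = 12288: exactly
	 * 3 pages holding exactly 64 objects, with zero waste.
	 */
	printf("chain size for 192-byte objects: %lu bytes\n",
	    chain_size(192));
	return (0);
}

At 1,000,000 connections that adds up fast: 1/4 GB spread over a
million objects works out to roughly 268 bytes of slack eliminated
per object.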
> There are two things that I would really like comments on.
>
> 1) Should I keep the uma_ prefixes on exported functions/types.

Think of an acceptable acronym and use that; if UMA is meaningful,
it's as good as any.

The real issue is to be able to rip out the old code and see where
the bleeders are, so that the switchover can be as painless as
possible.

> 2) How much of the malloc_type stats should I keep? They either require
> atomic ops or a lock in their current state. Also, non power of two
> malloc sizes break their usage tracking.

See above for the locks; I think they are unnecessary unless you are
debugging, and arguably unnecessary even then, unless the lock is
global to all CPUs.

For the power-of-two stats, we may lose them, but we gain a higher
granularity on zone identification for objects that right now get
rounded into the same zone. I think that's an acceptable trade-off,
if not a net win.

> 3) Should I rename the files to vm_zone.c, vm_zone.h, etc?

This should be last, I think.

And thanks again for the most excellent work!

-- Terry