Date: Sat, 2 Feb 2008 09:59:44 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Alexander Motin
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Julian Elischer
Subject: Re: Memory allocation performance

On Sat, 2 Feb 2008, Alexander Motin wrote:

> Robert Watson wrote:
>> I guess the question is: where are the cycles going?  Are we suffering
>> excessive cache misses in managing the slabs?  Are you effectively
>> "cycling through" objects rather than using a smaller set that fits
>> better in the cache?
>
> In my test setup only a few objects from the zone are usually allocated
> at the same time, but they are allocated twice for every packet.
>
> To check the UMA dependency I have made a trivial one-element cache
> which in my test case avoids two of the four allocations per packet.

Avoiding unnecessary allocations is a good general principle, but
duplicating cache logic is a bad idea.  If you're able to structure the
below without using locking, it strikes me you'd do much better,
especially if it's in a single processing pass.  Can you not use a
per-thread/stack/session variable to avoid that?

> .....alloc.....
> -	item = uma_zalloc(ng_qzone, wait | M_ZERO);
> +	mtx_lock_spin(&itemcachemtx);
> +	item = itemcache;
> +	itemcache = NULL;
> +	mtx_unlock_spin(&itemcachemtx);

Why are you using spin locks?  They are quite a bit more expensive on
several hardware platforms, and any environment it's safe to call
uma_zalloc() from will be equally safe to use regular mutexes in (i.e.,
mutex-sleepable).
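For illustration, a minimal sketch of the same one-element cache under a
regular (MTX_DEF) mutex -- untested, and with ng_alloc_item_cached() and
ng_free_item_cached() as hypothetical wrapper names; ng_qzone and item_p
are the existing netgraph zone and item pointer type from
sys/netgraph/ng_base.c:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <vm/uma.h>
#include <netgraph/ng_message.h>
#include <netgraph/netgraph.h>

static struct mtx itemcachemtx;
static item_p itemcache;

MTX_SYSINIT(ng_itemcache, &itemcachemtx, "ng_itemcache", MTX_DEF);

/* Allocate an item, preferring the single cached one. */
static item_p
ng_alloc_item_cached(int wait)
{
	item_p item;

	mtx_lock(&itemcachemtx);	/* regular mutex, not spin */
	item = itemcache;
	itemcache = NULL;
	mtx_unlock(&itemcachemtx);
	if (item == NULL)
		item = uma_zalloc(ng_qzone, wait | M_ZERO);
	else
		bzero(item, sizeof(*item));
	return (item);
}

/* Free an item, parking it in the cache if the slot is empty. */
static void
ng_free_item_cached(item_p item)
{
	mtx_lock(&itemcachemtx);
	if (itemcache == NULL) {
		itemcache = item;
		item = NULL;
	}
	mtx_unlock(&itemcachemtx);
	if (item != NULL)
		uma_zfree(ng_qzone, item);
}

Even so, this just duplicates the caching UMA's per-CPU buckets already
do; carrying the item through the processing pass in a local or
per-thread variable, as suggested above, would avoid both the lock and
the duplication.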
> +	if (item == NULL)
> +		item = uma_zalloc(ng_qzone, wait | M_ZERO);
> +	else
> +		bzero(item, sizeof(*item));
> .....free.....
> -	uma_zfree(ng_qzone, item);
> +	mtx_lock_spin(&itemcachemtx);
> +	if (itemcache == NULL) {
> +		itemcache = item;
> +		item = NULL;
> +	}
> +	mtx_unlock_spin(&itemcachemtx);
> +	if (item)
> +		uma_zfree(ng_qzone, item);
> ...............
>
> To be sure that the test system is CPU-bound I have throttled it with
> sysctl to 1044 MHz.  With this patch my test PPPoE-to-PPPoE router
> throughput has grown from 17 to 21 Mbytes/s.  The profiling results I
> sent earlier promised similar gains.
>
>> Is some bit of debugging enabled that shouldn't be, perhaps due to a
>> failure of ifdefs?
>
> I have commented out all INVARIANTS and WITNESS options from the GENERIC
> kernel config.  What else should I check?

Hence my request for drilling down a bit on profiling -- the question I'm
asking is whether profiling shows things running or taking time that
shouldn't be.

Robert N M Watson
Computer Laboratory
University of Cambridge