From: Robert Watson
To: Kris Kennaway
Cc: freebsd-hackers@freebsd.org, Alexander Motin, freebsd-performance@freebsd.org, Julian Elischer
Date: Sat, 2 Feb 2008 22:51:12 +0000 (GMT)
Subject: Re: Memory allocation performance
Message-ID: <20080202224923.T66602@fledge.watson.org>
In-Reply-To: <47A4F1AF.9090306@FreeBSD.org>

On Sat, 2 Feb 2008, Kris Kennaway wrote:

> Alexander Motin wrote:
>> Robert Watson wrote:
>>> Hence my request for drilling down a bit on profiling -- the question I'm
>>> asking is whether profiling shows things running or taking time that
>>> shouldn't be.
>>
>> I have not yet understood why it happens, but hwpmc shows a huge number
>> of "p4-resource-stall"s in UMA functions:
>
>> At the moment I have come up with two possible explanations. One is that,
>> due to UMA's cyclic block allocation order, it does not fit in the CPU
>> caches; the other is that it is somehow related to critical_exit(), which
>> can possibly cause a context switch. Does anybody have a better explanation
>> of how a function that is so small and simple in this part can cause such
>> results?
>
> You can look at the raw output from pmcstat, which is a collection of
> instruction pointers that you can feed to e.g. addr2line to find out exactly
> where in those functions the events are occurring. This will often help to
> track down the precise causes.

There was, FYI, a report a few years ago of a measurable improvement from
allocating off the free bucket rather than maintaining separate alloc and
free buckets. It sounded good at the time, but I was never able to reproduce
the benefits in my test environment. Now might be a good time to try to
revalidate that. Basically, the goal would be to make the pcpu cache LIFO as
much as possible, as that maximizes the chances that the newly allocated
object already has lines in the cache. It's a fairly trivial tweak to the UMA
allocation code.

Robert N M Watson
Computer Laboratory
University of Cambridge
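
To illustrate the bucket question discussed above, here is a toy model in Python. It is a sketch under assumptions, not the UMA code: the class and function names are hypothetical, objects are plain integers, and "cache warmth" is approximated only by how many distinct objects a tight alloc-then-free loop cycles through. It compares allocating from a separate alloc bucket with allocating straight off the bucket that receives frees, so the most recently freed object is reused first.

```python
class TwoBucketCache:
    """Toy model (hypothetical): allocs pop from one bucket, frees push to another."""

    def __init__(self, objects):
        self.alloc_bucket = list(objects)
        self.free_bucket = []

    def alloc(self):
        if not self.alloc_bucket:
            # Alloc bucket drained: swap in the bucket that collected the frees.
            self.alloc_bucket, self.free_bucket = self.free_bucket, self.alloc_bucket
        return self.alloc_bucket.pop()

    def free(self, obj):
        self.free_bucket.append(obj)


class SingleBucketCache:
    """Toy model (hypothetical): allocs pop from the same bucket that frees push to."""

    def __init__(self, objects):
        self.bucket = list(objects)

    def alloc(self):
        # The most recently freed object sits on top, so it is handed out first.
        return self.bucket.pop()

    def free(self, obj):
        self.bucket.append(obj)


def distinct_objects_touched(cache, iterations):
    """Run an alloc-then-free loop and count the distinct objects it hands out."""
    seen = set()
    for _ in range(iterations):
        obj = cache.alloc()
        seen.add(obj)
        cache.free(obj)
    return len(seen)


if __name__ == "__main__":
    print(distinct_objects_touched(TwoBucketCache(range(8)), 100))     # 8
    print(distinct_objects_touched(SingleBucketCache(range(8)), 100))  # 1
```

In this model the two-bucket scheme walks through every cached object in turn, while the single-bucket scheme keeps returning the same one. Whether that translates into fewer stalls on real hardware is exactly what would need to be measured.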