From owner-freebsd-hackers@FreeBSD.ORG Sat Feb 2 22:05:43 2008 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC57616A419; Sat, 2 Feb 2008 22:05:43 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from cmail.optima.ua (cmail.optima.ua [195.248.191.121]) by mx1.freebsd.org (Postfix) with ESMTP id E712413C448; Sat, 2 Feb 2008 22:05:42 +0000 (UTC) (envelope-from mav@FreeBSD.org) X-Spam-Flag: SKIP X-Spam-Yversion: Spamooborona 1.7.0 Received: from [212.86.226.226] (account mav@alkar.net HELO [192.168.3.2]) by cmail.optima.ua (CommuniGate Pro SMTP 5.1.14) with ESMTPA id 72286663; Sun, 03 Feb 2008 00:05:42 +0200 Message-ID: <47A4E934.1050207@FreeBSD.org> Date: Sun, 03 Feb 2008 00:05:40 +0200 From: Alexander Motin User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: Robert Watson References: <47A25412.3010301@FreeBSD.org> <47A25A0D.2080508@elischer.org> <47A2C2A2.5040109@FreeBSD.org> <20080201185435.X88034@fledge.watson.org> <47A43873.40801@FreeBSD.org> <20080202095658.R63379@fledge.watson.org> In-Reply-To: <20080202095658.R63379@fledge.watson.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Julian Elischer Subject: Re: Memory allocation performance X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Feb 2008 22:05:44 -0000 Robert Watson wrote: > Hence my request for drilling down a bit on profiling -- the question > I'm asking is whether profiling shows things running or taking time that > shouldn't be. I have not yet understood why does it happend, but hwpmc shows huge amount of "p4-resource-stall"s in UMA functions: % cumulative self self total time seconds seconds calls ms/call ms/call name 45.2 2303.00 2303.00 0 100.00% uma_zfree_arg [1] 41.2 4402.00 2099.00 0 100.00% uma_zalloc_arg [2] 1.4 4472.00 70.00 0 100.00% uma_zone_exhausted_nolock [3] 0.9 4520.00 48.00 0 100.00% ng_snd_item [4] 0.8 4562.00 42.00 0 100.00% __qdivrem [5] 0.8 4603.00 41.00 0 100.00% ether_input [6] 0.6 4633.00 30.00 0 100.00% ng_ppp_prepend [7] Probably it explains why "p4-global-power-events" shows many hits into them % cumulative self self total time seconds seconds calls ms/call ms/call name 20.0 37984.00 37984.00 0 100.00% uma_zfree_arg [1] 17.8 71818.00 33834.00 0 100.00% uma_zalloc_arg [2] 4.0 79483.00 7665.00 0 100.00% ng_snd_item [3] 3.0 85256.00 5773.00 0 100.00% __mcount [4] 2.3 89677.00 4421.00 0 100.00% bcmp [5] 2.2 93853.00 4176.00 0 100.00% generic_bcopy [6] , while "p4-instr-retired" does not. % cumulative self self total time seconds seconds calls ms/call ms/call name 11.1 5351.00 5351.00 0 100.00% ng_apply_item [1] 7.9 9178.00 3827.00 0 100.00% legacy_pcib_alloc_msi [2] 4.1 11182.00 2004.00 0 100.00% init386 [3] 4.0 13108.00 1926.00 0 100.00% rn_match [4] 3.5 14811.00 1703.00 0 100.00% uma_zalloc_arg [5] 2.6 16046.00 1235.00 0 100.00% SHA256_Transform [6] 2.2 17130.00 1084.00 0 100.00% ng_add_hook [7] 2.0 18111.00 981.00 0 100.00% ng_rmhook_self [8] 2.0 19054.00 943.00 0 100.00% em_encap [9] For this moment I have invent two possible explanation. One is that due to UMA's cyclic block allocation order it does not fits CPU caches and another that it is somehow related to critical_exit(), which possibly can cause context switch. Does anybody have better explanation how such small and simple in this part function can cause such results? -- Alexander Motin