Date: Sun, 5 Dec 2004 19:08:35 +0000 (GMT)
From: Robert Watson
To: Sean McNeil
Cc: Barney Wolff, current@freebsd.org
Subject: Re: mbuf count negative
In-Reply-To: <1102273289.81612.1.camel@server.mcneil.com>
List-Id: Discussions about the use of FreeBSD-current

On Sun, 5 Dec 2004, Sean McNeil wrote:

> > It replaces non-atomic maintenance of the counters with atomic
> > maintenance.  However, this adds measurably to the cost of allocation, so
> > I've been reluctant to commit it.  The counters maintained by UMA are
> > likely sufficient to generate the desired mbuf output now that we have
> > mbuma, but I haven't had an opportunity to walk through the details of it.
> > I hope to do so once I get closer to merging patches to use critical
> > sections to protect UMA per-cpu caches, since I need to redo parts of the
> > sysctl code then anyway.
> > You might want to give this patch, or one much
> > like it, a spin to confirm that the race is the one I think it is.  The
> > race in updating mbuf allocator statistics is one I hope to get fixed
> > prior to 5.4.
>
> Since they appear to not be required for actual system use (by the fact
> that it being negative doesn't cause problems), could the counts be
> computed for display instead?

This is pretty much what UMA does with its per-CPU caches.  It pulls and
pushes statistics from the caches in a couple of situations:

- When pulling a new bucket into or out of the cache, it has to acquire
  the zone mutex, so also pushes statistics.

- When a timer fires every few seconds, all the caches are checked to
  update the global zone statistics.

- When the sysctl runs, it replicates the logic in the timer code to also
  update the zone statistics for display.

And you can already extract pretty much all of the interesting allocation
information for mbufs from vmstat -z as the mbufs are now stored using
UMA.

In the critical section protected version of the code, I haven't yet
decided if the timers should run per-cpu, and/or how the sysctl should
coalesce the information for display.  I hope to have much of this
resolved shortly.  My current leaning is that a small amount of localized
and temporary inconsistency in the stats isn't a problem, so simply doing
a set of lockless reads across the per-cpu caches to update stats for
presentation should be fine, and that we can probably drop the timer
updates of statistics since the cache bucket balancing keeps things pretty
in sync.

I haven't committed the move to critical sections yet as it's currently a
performance pessimization for the UP case, as entering a critical section
on UP is more expensive than acquiring a mutex.  John Baldwin has patches
that remedy this, but hasn't yet merged them (there's also an instability
with them I've seen).  I know that Stephen Uphoff has also been
investigating this issue.
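
To make the "compute the counts for display" idea concrete, here is a
minimal user-space sketch, not the actual UMA code.  All of the names
(NCPU, struct pcpu_stats, struct zone_stats, zone_count_alloc(),
zone_count_free(), zone_in_use()) are hypothetical and chosen only for
illustration.  It assumes each CPU increments only its own slot, which in
the kernel would need something like a critical section to keep the thread
on that CPU, and that the display path tolerates slightly stale values.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical fixed CPU count for this sketch */

/* Hypothetical per-CPU statistics slot; not the real UMA cache layout. */
struct pcpu_stats {
	uint64_t allocs;	/* allocations counted on this CPU */
	uint64_t frees;		/* frees counted on this CPU */
};

struct zone_stats {
	struct pcpu_stats cpu[NCPU];
};

/*
 * Fast path: only the owning CPU writes its slot, so the increment
 * needs no atomic instruction and no zone mutex.
 */
static void
zone_count_alloc(struct zone_stats *z, unsigned cpu)
{
	assert(cpu < NCPU);
	z->cpu[cpu].allocs++;
}

static void
zone_count_free(struct zone_stats *z, unsigned cpu)
{
	assert(cpu < NCPU);
	z->cpu[cpu].frees++;
}

/*
 * Display path: walk every slot without locking and sum the totals.
 * An object may be allocated on one CPU and freed on another, so only
 * the sum across all CPUs is meaningful; a single slot's
 * allocs - frees can legitimately be negative.
 */
static int64_t
zone_in_use(const struct zone_stats *z)
{
	uint64_t allocs = 0, frees = 0;
	unsigned i;

	for (i = 0; i < NCPU; i++) {
		allocs += z->cpu[i].allocs;
		frees += z->cpu[i].frees;
	}
	return ((int64_t)(allocs - frees));
}
```

With concurrent writers, the sum can be briefly off by whatever updates
land between reading one slot and the next, which is the localized and
temporary inconsistency mentioned above.  The sketch is written for
single-threaded use; a real multi-threaded version would have to address
how the slots are read safely.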

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research