From: Luke Marsden
To: Chuck Swiger
Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk, freebsd-stable@freebsd.org, freebsd-questions@freebsd.org
Subject: Re: FreeBSD 8.2 - active plus inactive memory leak!?
Date: Wed, 07 Mar 2012 00:36:21 +0000
Message-ID: <1331080581.2589.28.camel@pow>
In-Reply-To: <4F569DFF.8040807@mac.com>
References: <1331061203.2218.38.camel@pow> <4F569DFF.8040807@mac.com>

Thanks for your email, Chuck.

> > Conversely, if a page *does not* occur in the resident
> > memory of any process, it must not occupy any space in the active +
> > inactive lists.
>
> Hmm...if a process gets swapped out entirely, the pages for it will be moved
> to the cache list, flushed, and then reused as soon as the disk I/O completes.
> But there is a window where the process can be marked as swapped out (and
> considered no longer resident), but still has some of its pages in physical
> memory.

There's no swapping happening on these machines (intentionally so, because as soon as we hit swap everything goes tits up), so this window doesn't concern me.

I'm trying to confirm that, on a system with no pages swapped out, the following statement is true:

    a page is accounted for in active + inactive if and only if it
    corresponds to one or more of the pages accounted for in the
    resident memory lists of all the processes on the system (as per
    the output of 'top' and 'ps')

> > Therefore the active + inactive memory should always be less than or
> > equal to the sum of the resident memory of all the processes on the
> > system, right?
>
> No. If you've got a lot of process pages shared (ie, a webserver with lots of
> httpd children, or a database pulling in a large common shmem area), then your
> process resident sizes can be very large compared to the system-wide
> active+inactive count.

But that's what I'm saying...

    sum(process resident sizes) >= active + inactive

Or as I said it above, equivalently:

    active + inactive <= sum(process resident sizes)

The data I've got from this system, and what's killing us, shows the opposite:

    active + inactive > sum(process resident sizes)

by over 5GB now and growing, which is what keeps causing these machines to crash.

In particular:

    Mem: 13G Active, 1129M Inact, 7543M Wired, 120M Cache, 1553M Free

But the total sum of resident memories is 9457M (according to summing the output from ps or top).

    13G + 1129M = 14441M (active + inact) > 9457M (sum of res)

That's 4984M out, and that's almost enough to push us over the edge.

If my understanding of VM is correct, I don't see how this can happen. But it's happening, and it's causing real trouble here because our free memory keeps hitting zero and then we swap-spiral.

What can I do to investigate this discrepancy?
Are there some tools that I can use to debug the memory allocated in "active" to find out where it's going, if not to resident process memory?

Thanks,
Luke

-- 
CTO, Hybrid Logic
+447791750420 | +1-415-449-1165 | www.hybrid-cluster.com
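
For reference, the arithmetic in this message can be reproduced with a short script. This is only a minimal sketch, not a diagnostic tool: the helper names (to_mb, parse_top_mem, sum_rss_mb, live_rss_mb) are made up for illustration, the 'Mem:' line and the 9457M figure are copied from the message, and it assumes that `ps -axo rss` reports resident sizes in kilobytes. It quantifies the gap between active + inactive and the summed resident sizes; it says nothing about where the unaccounted pages live.

```python
import re
import subprocess

# Multipliers that turn top's size suffixes into megabytes.
UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def to_mb(field):
    """Convert a top-style size such as '13G' or '1129M' to megabytes."""
    return int(field[:-1]) * UNIT_MB[field[-1]]


def parse_top_mem(line):
    """Parse top's 'Mem:' summary line into a {label: megabytes} dict."""
    pairs = re.findall(r"(\d+[KMG]) (\w+)", line)
    return {label: to_mb(size) for size, label in pairs}


def sum_rss_mb(ps_output):
    """Sum an RSS column given in kilobytes; the 'RSS' header is skipped."""
    values = [int(tok) for tok in ps_output.split() if tok.isdigit()]
    return sum(values) / 1024


def live_rss_mb():
    """Hypothetical helper: sum RSS over all processes on the running host."""
    result = subprocess.run(
        ["ps", "-axo", "rss"], capture_output=True, text=True, check=True
    )
    return sum_rss_mb(result.stdout)


# Figures quoted in the message above.
mem = parse_top_mem(
    "Mem: 13G Active, 1129M Inact, 7543M Wired, 120M Cache, 1553M Free"
)
active_plus_inactive = mem["Active"] + mem["Inact"]
reported_rss_sum = 9457  # MB, from summing ps/top output

gap = active_plus_inactive - reported_rss_sum
print(active_plus_inactive, gap)  # prints "14441 4984"
```

Because pages shared between processes are counted once per process in the resident sizes, the summed figure tends to overstate process memory, which makes a positive gap all the more surprising. On a live machine, live_rss_mb() could stand in for the hard-coded 9457M.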