From: Chuck Swiger <cswiger@mac.com>
To: Luke Marsden, freebsd-questions@freebsd.org
Date: Tue, 06 Mar 2012 18:30:07 -0500
Subject: Re: FreeBSD 8.2 - active plus inactive memory leak!?
Message-ID: <4F569DFF.8040807@mac.com>
In-Reply-To: <1331061203.2218.38.camel@pow>

On 3/6/2012 2:13 PM, Luke Marsden wrote:
[ ... ]
> My current (probably quite simplistic) understanding of the FreeBSD
> virtual memory system is that, for each process as reported by top:
>
>      * Size corresponds to the total size of all the text pages for
>        the process (those belonging to code in the binary itself and
>        linked libraries) plus data pages (including stack and
>        malloc()'d but not-yet-written-to memory segments).

Size is the amount of the process's VM address space which has been
assigned; the things you mention are indeed the common consumers of
address space, but there are others, such as shared memory (i.e., SysV
shmem), memory-mapped hardware like a video card's VRAM buffer,
thread-local storage, etc.

>      * Resident corresponds to a subset of the pages above: those
>        pages which actually occupy physical/core memory. Notably,
>        pages may appear in Size but not in Resident -- for example,
>        read-only text pages from libraries which have not been used
>        yet, or pages which have been malloc()'d but not yet written
>        to.

Yes.
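If you want to see that Size-vs-Resident distinction directly, something
like the following should do it (a minimal, illustrative sketch, not
part of the original exchange; the 256 MB figure is arbitrary): an
anonymous mmap() enlarges the address space at once, but the pages only
become resident as they are touched.

    /*
     * Illustrative sketch: SIZE vs RES in top(1).  The anonymous
     * mapping shows up in SIZE immediately; RES only grows once the
     * pages are written to.
     */
    #include <sys/mman.h>

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define REGION  (256UL * 1024 * 1024)   /* 256 MB of address space */

    int
    main(void)
    {
            char *p = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_PRIVATE, -1, 0);

            if (p == MAP_FAILED) {
                    perror("mmap");
                    return (1);
            }

            printf("pid %d: mapped %lu MB; check SIZE vs RES in top, "
                "then press Enter\n", (int)getpid(), REGION >> 20);
            getchar();                      /* SIZE grew, RES did not */

            memset(p, 1, REGION / 2);       /* touch half of the pages */
            printf("touched %lu MB; RES should grow by about that much\n",
                (REGION / 2) >> 20);
            getchar();
            return (0);
    }

Run it and watch the process in top before and after each Enter.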
> My understanding for the values for the system as a whole (at the top
> in 'top') is as follows:
>
>      * Active / inactive memory is the same thing: resident memory
>        from processes in use. Being in the inactive as opposed to
>        active list simply indicates that the pages in question are
>        less recently used and therefore more likely to get swapped
>        out if the machine comes under memory pressure.

Well, they aren't exactly the same thing. The kernel implements a VM
working-set algorithm which periodically looks at all of the pages in
memory and notes whether a process has accessed each page recently. If
it has, the page is active; if the page has not been used for "some
time", it becomes inactive. If the system has plenty of memory, it will
not page or swap anything out. Under mild memory pressure, it will only
consider inactive or cache pages as candidates to page out. Only under
more severe memory pressure will it start looking to swap out entire
processes rather than just paging out individual pages.

[ Although the FreeBSD implementation supposedly tries to balance the
sizes of the active, inactive, and cache lists (or queues), so it does
look at the active list too -- but you don't want to page out an active
page unless you really have to, and if you have to do that, you might
as well free up the whole process and let something else have enough
room to run. ]

>      * Wired is mostly kernel memory.

It's normally all kernel memory; only a rare handful of userland
programs, such as crypto code like gnupg, ever ask for wired memory,
AFAIK.

>      * Cache is freed memory which the kernel has decided to keep in
>        case it corresponds to a useful page in future; it can be
>        cheaply evicted into the free list.

Sort of, although this description fits the "inactive" memory category
also. The major distinction is that the system is actively trying to
flush any dirty pages in the cache category, so that they are available
for reuse by something else immediately.

>      * Free memory is actually not being used for anything.

Yes, although the system likes to have at least a few pre-zeroed pages
handy in case an interrupt handler needs them.

> It seems that pages which occur in the active + inactive lists must
> occur in the resident memory of one or more processes ("or more"
> since processes can share pages in e.g. read-only shared libs or COW
> forked address space).

Everything in the active and inactive (and cache) lists is resident in
physical memory.

> Conversely, if a page *does not* occur in the resident memory of any
> process, it must not occupy any space in the active + inactive lists.

Hmm... if a process gets swapped out entirely, its pages will be moved
to the cache list, flushed, and then reused as soon as the disk I/O
completes. But there is a window where the process can be marked as
swapped out (and considered no longer resident) while it still has some
of its pages in physical memory.

> Therefore the active + inactive memory should always be less than or
> equal to the sum of the resident memory of all the processes on the
> system, right?

No. If you've got a lot of process pages shared (i.e., a webserver with
lots of httpd children, or a database pulling in a large common shmem
area), then your summed process resident sizes can be very large
compared to the system-wide active+inactive count.

> This "missing memory" is scary, because it seems to be increasing
> over time, and eventually when the system runs out of free memory,
> I'm certain it will crash in the same way described in my previous
> thread [1].

I don't have enough data to fully evaluate the interactions with ZFS;
you can easily get system panics by running out of KVA on a 32-bit
system, but that shouldn't apply to a 64-bit kernel -- and in any case
that's kernel memory, not system VM. What you've described sounds
pretty much like the classic load spiral experienced by pre-forking
webservers when the max # of children which can run isn't constrained
to something that fits in memory reasonably well without excessive
paging, much less swapping.
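If it helps while you chase the growth, the system-wide counters that
top summarizes are exported as sysctls, so you can log them over time.
Here's a minimal sketch (again mine, not from the original thread); the
vm.stats.vm.* names below are what FreeBSD 8.x uses, but double-check
them against `sysctl vm.stats.vm` on your box.

    /*
     * Sketch: print the page-queue counters that top(1) summarizes,
     * using sysctlbyname(3).
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>

    #include <stdio.h>
    #include <stdlib.h>

    /* Fetch one unsigned-integer sysctl value or exit on error. */
    static unsigned int
    get_count(const char *name)
    {
            unsigned int value;
            size_t len = sizeof(value);

            if (sysctlbyname(name, &value, &len, NULL, 0) == -1) {
                    perror(name);
                    exit(1);
            }
            return (value);
    }

    int
    main(void)
    {
            const char *queues[] = {
                    "vm.stats.vm.v_active_count",
                    "vm.stats.vm.v_inactive_count",
                    "vm.stats.vm.v_cache_count",
                    "vm.stats.vm.v_wire_count",
                    "vm.stats.vm.v_free_count",
            };
            unsigned int pagesize = get_count("vm.stats.vm.v_page_size");
            size_t i;

            for (i = 0; i < sizeof(queues) / sizeof(queues[0]); i++) {
                    unsigned int count = get_count(queues[i]);

                    printf("%-30s %10u pages %8llu MB\n", queues[i],
                        count,
                        (unsigned long long)count * pagesize /
                        (1024 * 1024));
            }
            return (0);
    }

Running `sysctl vm.stats.vm` from a shell gives the same numbers
without any code, if that's easier to script.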
> Is my understanding of the virtual memory system badly broken -- in
> which case please educate me ;-) -- or is there a real problem here?
> If so, how can I dig deeper to help uncover/fix it?

You've got a pretty good understanding of VM, but the devil is in the
details.

Regards,
-- 
-Chuck