Date:      Fri, 26 Oct 2018 19:40:09 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Robert <robert.ayrapetyan@gmail.com>
Cc:        Rozhuk Ivan <rozhuk.im@gmail.com>, FreeBSD <freebsd-hackers@freebsd.org>, Mark Johnston <markj@freebsd.org>
Subject:   Re: Sudden grow of memory in "Laundry" state
Message-ID:  <E2942CC3-68BF-4C2B-A5EA-90938221F55C@yahoo.com>
In-Reply-To: <42f6544f-830c-18c5-e1a8-0acc4c3f09cc@gmail.com>
References:  <55b0dd7d-19a3-b566-0602-762b783e8ff3@gmail.com> <20180911005411.GF2849@raichu> <ce38cbfa-e1c5-776e-ef2e-2b867c9a520f@gmail.com> <20180911150849.GD92634@raichu> <104be96a-c16b-7e7c-7d0d-00338ab5a106@gmail.com> <20180928152550.GA3609@raichu> <e705099c-ea42-4985-1012-50e9fa11addd@gmail.com> <20181024211237.302b72d9@gmail.com> <981C887D-78EB-46D2-AEE5-877E269AF066@yahoo.com> <c25e19a4-d3ef-e419-06f8-8a86082dbf31@gmail.com> <E4B508E7-04CC-41BD-934B-19EE69E85800@yahoo.com> <42f6544f-830c-18c5-e1a8-0acc4c3f09cc@gmail.com>


On 2018-Oct-26, at 3:07 PM, Robert <robert.ayrapetyan@gmail.com> wrote:

> Sorry, let me be more specific.
>
> Please look into: https://docs.google.com/spreadsheets/d/152qBBNokl4mJUc6T6wVTcxaWOskT4KhcvdpOL68gEUM/edit?usp=sharing (wait until charts fully loaded).

Thanks for giving folks access to the charts originally referred to.

> These are all memory states and mmap\munmap stats collected. Y axis is in MBs, X is a timeline.

Some things folks looking into this might want to know:

MAP_PRIVATE in use? Or MAP_SHARED?

MAP_NOSYNC in use or not?

MAP_ANON in use or not?

MAP_FIXED in use or not? (Probably not?)

But I cover MAP_NOSYNC, and another option that belongs
to a third call, below, and do not need such information
for what I've written.

> It's not a problem to understand which process produces allocations and is being swapped. I know this for sure.
>
> The issue is: I strongly believe that for some reason the FreeBSD kernel fails to reuse deallocated memory properly.
>
> Looking into the graphs we can see the following:
>
> 1. When the process allocates memory (mmap), "Active" memory increases, "Free" memory decreases (that's expected).
>
> 2. When the process deallocates memory (munmap), "Inactive" memory increases, "Active" memory decreases.
>
> Memory never returns into the "Free" state. That's kind of expected as well.

From the description of MAP_NOSYNC for mmap
(vs. the flushing behavior):

     . . . Without this option any VM pages you dirty may be flushed
     to disk every so often (every 30-60 seconds usually) which can
     create performance problems if you do not need that to occur
     (such as when you are using shared file-backed mmap regions for
     IPC purposes).  Dirty data will be flushed automatically when
     all mappings of an object are removed and all descriptors
     referencing the object are closed.  Note that VM/file system
     coherency is maintained whether you use MAP_NOSYNC or not.

Note the specified behavior for flushing out "dirty data"
unless MAP_NOSYNC is in use. (I note another alternative
later.)

As I understand it FreeBSD uses the swapping/paging code to do the
flush activity: part of the swap/page space is mapped into the
file in question and the flushing is a form of swapping/paging
out pages.

[Note: Top does not keep track of changes in swap space,
for example a "swapon" done after top has started
displaying things will not show an increased swap total
but the usage can show larger than the shown total.
Flushing out to a mapped file might be an example of
this for all I know.]

Also reported for flushing behavior is:

. . . The fsync(2) system call will flush all dirty data and
     metadata associated with a file, including dirty NOSYNC VM
     data, to physical media.  The sync(8) command and sync(2)
     system call generally do not flush dirty NOSYNC VM data.  The
     msync(2) system call is usually not needed since BSD implements
     a coherent file system buffer cache.  However, it may be used
     to associate dirty VM pages with file system buffers and thus
     cause them to be flushed to physical media sooner rather than
     later.


As for munmap: its description is that the address range is still
reserved afterwards, quoting the description:

     The munmap() system call deletes the mappings and guards for
     the specified address range, and causes further references to
     addresses within the range to generate invalid memory
     references.

That last is not equivalent to the address range being "free"
in that the range still counts against the process address space.
(So being precise about what concerns RAM availability vs. address
space usage/availability is important in order to avoid confusion.)

It would appear that forcing invalid memory references involves
keeping page descriptions around, but they would be inactive
rather than active. This is true whether or not RAM is still
associated. (So this could potentially lead to a form of extra
counting of RAM use, sort of like in my original note.) See later
below for another means of control . . .

Remember: "Dirty data will be flushed automatically when all mappings of
an object are removed and all descriptors referencing the object are
closed". So without MAP_NOSYNC the flushing is expected. But see below
for another means of control . . .

There is another call, madvise, that has an option tied
to enabling freeing pages and avoiding flushing the
pages:

     MADV_FREE      Gives the VM system the freedom to free pages,
                    and tells the system that information in the
                    specified page range is no longer important.
                    This is an efficient way of allowing malloc(3)
                    to free pages anywhere in the address space,
                    while keeping the address space valid.  The
                    next time that the page is referenced, the page
                    might be demand zeroed, or might contain the
                    data that was there before the MADV_FREE call.
                    References made to that address space range
                    will not make the VM system page the
                    information back in from backing store until
                    the page is modified again.

This is a way to let the system free page ranges and
allow later use of the address range in the process's
address space. No descriptions of page ranges that should
generate invalid memory references, so no need of such
"inactive pages".

MADV_FREE makes clear that your expectation of the meaning
of munmap does not seem to match FreeBSD's actual usage:
MADV_FREE must be used explicitly to get the behavior you
appear to be looking for. At least that is the way I read the
documentation's meaning. MAP_NOSYNC does not seem sufficient
to match the behavioral properties you are looking for, but it
appears possibly necessary up to the point where MADV_FREE can
be used.

> 3. At some point, when the sum of "Active" and "Inactive" memory exceeds some upper memory limit,
>
> the OS starts to push "Inactive" memory into "Laundry" and "Swap". This happens very quickly and unexpectedly.

This is the flushing activity documented as far as I can tell.

> Now why doesn't the OS reuse huge amounts of "Inactive" memory when calling mmap?

Without use of MADV_FREE the system does not have "the freedom
to free pages". Without MAP_NOSYNC as well, it is expected
to flush out some pages at various times as things go along.

> Or is my assumption about the availability of "Inactive" memory wrong? Which one is free for allocations then?

Pages that are inactive and dirty normally have to be
flushed out before the RAM for the page can be freed
for other uses. MADV_FREE is for indicating when this is
not the case and the usage of the RAM has reached a stage
where the RAM can be more directly freed (no longer tied
to the process).

At least that is my understanding.

Mark Johnston had already written about MADV_FREE but not
with such quoting of related documentation. If he and I
seem to contradict each other anywhere, believe Mark J.
I'm no FreeBSD expert. I'm just trying to reference and
understand the documentation.

> Thanks.
>
>
> On 10/24/18 11:58 AM, Mark Millard wrote:
>> On 2018-Oct-24, at 1:25 PM, Robert <robert.ayrapetyan@gmail.com> wrote:
>>
>>> Sorry, that wasn't my output, mine (related to the screenshot I've sent earlier) is:
>> No screen shot made it through the list back out to those that
>> get messages from the freebsd-hackers@freebsd.org reference
>> in the CC. The list limits itself to text as I understand.
>>
>>> Mem: 1701M Active, 20G Inact, 6225M Laundry, 2625M Wired, 280M Free
>>> ARC: 116M Total, 6907K MFU, 53M MRU, 544K Anon, 711K Header, 55M Other
>>>      6207K Compressed, 54M Uncompressed, 8.96:1 Ratio
>>> Swap: 32G Total, 15G Used, 17G Free, 46% Inuse
>> Relative to my limited point: I do not know the status of
>> mutually-exclusive categorizations vs. not for ZFS ARC and
>> Mem.
>>
>> Unfortunately, as I understand things, it is questionable if
>> adding -S to the top command gives you swap information that
>> can point to what makes up the 15G swapped out by totaling
>> the sizes listed. But you might at least be able to infer
>> what processes became swapped out even if you can not get
>> a good size for the swap space use for each.
>>
>> Using -ores does seem like it puts the top users of resident
>> memory at the top of top's process list.
>>=20
>> Sufficient Active RAM use by processes that stay active will
>> tend to cause inactive processes to be swapped out. FreeBSD
>> does not swap out processes that stay active: it pages those
>> as needed instead of swapping.
>>
>> So using top -Sores  might allow watching what active
>> process(es) grow and stay active and what inactive processes
>> are swapped out at the time of the activity.
>>
>> I do infer that the 15G Used for Swap is tied to processes
>> that were not active when swapped out.
>>
>>> I'm OK with a low "Free" memory if the OS can effectively allocate from "Inactive",
>>>
>>> but I'm worried about a sudden move of a huge piece of memory into "Swap" without any relevant mmap calls.
>>>
>>>
>>> My question is: what else (except mmap) may reduce "Free" memory and increase "Laundry"\"Swap" in the system?
>>>
>>> Thanks.
>>>
>>>
>>> On 10/24/18 9:34 AM, Mark Millard wrote:
>>>> On 2018-Oct-24, at 11:12 AM, Rozhuk Ivan <rozhuk.im@gmail.com> wrote:
>>>>
>>>>> On Wed, 24 Oct 2018 10:19:20 -0700
>>>>> Robert <robert.ayrapetyan@gmail.com> wrote:
>>>>>
>>>>>> So the issue is still happening. Please check attached screenshot.
>>>>>> The green area is "inactive + cached + free".
>>>>>>
>>>>>>  . . .
>>>>> +1
>>>>> Mem: 845M Active, 19G Inact, 4322M Laundry, 6996M Wired, 1569M Buf, 617M Free
>>>>> Swap: 112G Total, 19M Used, 112G Free
>>>> Just a limited point based on my understanding of "Buf" in
>>>> top's display . . .
>>>>
>>>> If "cached" means "Buf" in top's output, my understanding of Buf
>>>> is that it is not a distinct memory area. Instead it totals the
>>>> buffer space that is spread across multiple states: Active,
>>>> Inactive, Laundry, and possibly Wired(?).
>>>>=20
>>>> In other words: TotalMemory = Active+Inact+Laundry+Wired+Free.
>>>> If Buf is added to that then there is double counting of
>>>> everything included in Buf and the total will be larger
>>>> than the TotalMemory.
>>>>=20
>>>> Also Inact+Buf+Free may double count some of the Inact space,
>>>> the space that happens to be inactive buffer space.
>>>>=20
>>>> I may be wrong, but that is my understanding.
>>>>=20
>



