Date:      Sat, 4 Aug 2018 22:56:42 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        John-Mark Gurney <jmg@funkthat.com>
Cc:        Jamie Landeg-Jones <jamie@catflap.org>, bob prohaska <fbsd@www.zefox.net>, freebsd-arm <freebsd-arm@freebsd.org>, markj@freebsd.org
Subject:   Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-ID:  <AD5BF62D-DE8F-488A-8F10-75C7485D0061@yahoo.com>
In-Reply-To: <20180805014545.GK2884@funkthat.com>
References:  <6BFE7B77-A0E2-4FAF-9C68-81951D2F6627@yahoo.com> <20180802002841.GB99523@www.zefox.net> <20180802015135.GC99523@www.zefox.net> <EC74A5A6-0DF4-48EB-88DA-543FD70FEA07@yahoo.com> <201808030034.w730YURL034270@donotpassgo.dyslexicfish.net> <F788BDD8-80DC-441A-AA3E-2745F50C3B56@yahoo.com> <201808040355.w743tPsF039729@donotpassgo.dyslexicfish.net> <8CC5DF53-F950-495C-9DC8-56FCA0087259@yahoo.com> <20180804140816.GJ2884@funkthat.com> <16ABD9F0-C908-479C-960D-0C1AEDE89053@yahoo.com> <20180805014545.GK2884@funkthat.com>



On 2018-Aug-4, at 6:45 PM, John-Mark Gurney <jmg at funkthat.com> wrote:

> Mark Millard wrote this message on Sat, Aug 04, 2018 at 09:08 -0700:
>> On 2018-Aug-4, at 7:08 AM, John-Mark Gurney <jmg at funkthat.com> wrote:
>>
>>> Mark Millard via freebsd-arm wrote this message on Sat, Aug 04, 2018 at 00:14 -0700:
>>>> On 2018-Aug-3, at 8:55 PM, Jamie Landeg-Jones <jamie at catflap.org> wrote:
>>>>
>>>>> Mark Millard <marklmi at yahoo.com> wrote:
>>>>>
>>>>>> If Inact+Laundry+Buf(?)+Free was not enough to provide sufficient
>>>>>> additional RAM, I would have guessed that some Active Real Memory
>>>>>> should then have been paged/swapped out and so RAM would be made
>>>>>> available. (This requires the system to have left itself sufficient
>>>>>> room in RAM for that guessed activity.)
>>>>>>
>>>>>> But I'm no expert at the intent or actual operation.
>>>>>>
>>>>>> Bob P.'s reports (for having sufficient swap space)
>>>>>> also indicate the likes of:
>>>>>>
>>>>>> v_free_count: 5439, v_inactive_count: 1
>>>>>>
>>>>>> So all the examples have: "v_inactive_count: 1".
>>>>>> (So: vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt==1 )
>>>>>
>>>>> Thanks for the feedback. I'll do a few more runs and other stress tests
>>>>> to see if that result is consistent. I'm open to any other idea too!
>>>>>
>>>>
>>>> The book "The Design and Implementation of the FreeBSD Operating System"
>>>> (2nd edition, 2014) states (page labeled 296):
>>>>
>>>> QUOTE:
>>>> The FreeBSD swap-out daemon will not select a runnable process to swap
>>>> out. So, if the set of runnable processes does not fit in memory, the
>>>> machine will effectively deadlock. Current machines have enough memory
>>>> that this condition usually does not arise. If it does, FreeBSD avoids
>>>> deadlock by killing the largest process. If the condition begins to arise
>>>> in normal operation, the 4.4BSD algorithm will need to be restored.
>>>> END QUOTE.
>>>>
>>>> As near as I can tell, for the likes of rpi3's and rpi2's, the condition
>>>> is occurring during buildworld "normal operation" that tries to use the
>>>> available cores to advantage. (Your context does not have the I/O
>>>> problems that Bob P.'s have had in at least some of your OOM process
>>>> kill examples, if I understand right.)
>>>>
>>>> (4.4BSD used to swap out the runnable process that had been resident
>>>> the longest, followed by the processes taking turns being swapped out.
>>>> I'll not quote the exact text about such.)
>>>>
>>>> So I guess the question becomes: is there a reasonable way to enable
>>>> the 4.4BSD style of "Swapping" for "small" memory machines, in order to
>>>> avoid having to figure out how to not end up with OOM process kills
>>>> while also not just wasting cores by using -j1 for buildworld?
>>>>
>>>> In other words: enable swapping out active RAM when it eats nearly
>>>> all the non-wired RAM.
>>>>
>>>> But it might be discovered that the performance is no better than
>>>> using fewer cores during buildworld. (Experiments needed, and the
>>>> tradeoffs are possibly environment specific.) Avoiding having to
>>>> figure out the maximum -j? that avoids OOM process kills, without
>>>> just sticking to -j1, seems an advantage for some rpi3 and
>>>> rpi2 folks.
>>>
>>> Interesting observation, maybe playing w/:
>>> vm.swap_idle_threshold2: Time before a process will be swapped out
>>> vm.swap_idle_threshold1: Guaranteed swapped in time for a process
>>>
>>> will help things... lowering 2 will likely make the processes available
>>> for swap sooner...
>>
>> Looking up related information:
>>
>> https://www.freebsd.org/doc/handbook/configtuning-disk.html
>>
>> says vm.swap_idle_enabled is also involved with those two. In fact
>> it indicates the two are not even used until vm.swap_idle_enabled=1 .
>>
>> QUOTE
>> 11.10.1.4. vm.swap_idle_enabled
>> The vm.swap_idle_enabled sysctl(8) variable is useful in large
>> multi-user systems with many active login users and lots of idle
>> processes. Such systems tend to generate continuous pressure on free
>> memory reserves. Turning this feature on and tweaking the swapout
>> hysteresis (in idle seconds) via vm.swap_idle_threshold1 and
>> vm.swap_idle_threshold2 depresses the priority of memory pages
>> associated with idle processes more quickly than the normal pageout
>> algorithm. This gives a helping hand to the pageout daemon. Only turn
>> this option on if needed, because the tradeoff is essentially pre-paging
>> memory sooner rather than later, which eats more swap and disk bandwidth.
>> In a small system this option will have a determinable effect, but in a
>> large system that is already doing moderate paging, this option allows
>> the VM system to stage whole processes into and out of memory easily.
>> END QUOTE
>>
>> The defaults seem to be:
>>=20
>> # sysctl vm.swap_idle_enabled vm.swap_idle_threshold1 vm.swap_idle_threshold2
>> vm.swap_idle_enabled: 0
>> vm.swap_idle_threshold1: 2
>> vm.swap_idle_threshold2: 10
>>
>> Quoting the book again:
>>
>> QUOTE
>> If the swapping of idle processes is enabled and the pageout daemon can find any
>> processes that have been sleeping for more than 10 seconds (swap_idle_threshold2,
>> the cutoff for considering the time sleeping to be "a long time"), it will swap
>> them all out. [. . .] if none of these processes are available, the pageout
>> daemon will swap out all processes that have been sleeping for as briefly as 2
>> seconds (swap_idle_threshold1).
>> END QUOTE.
>>
>> I'd not normally expect a compile or link to sleep for such long periods
>> (unless I/O has long delays). Having, say, 4 such processes active at the
>> same time may be unlikely to have any of them swap out on the default scale.
>> (Clang is less I/O bound and more memory bound than GCC, as I remember
>> from what I've observed. That statement ignores paging/swapping by the system.)
>>
>> Such would likely be true on the scale of any positive integer seconds
>> figures?
>
> The point is to more aggressively swap out OTHER processes so that
> there is more memory available.
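For anyone wanting to experiment along these lines, enabling the idle-process swapout that the handbook text describes is just a matter of setting the sysctls. A minimal sketch (config fragment only; the threshold value of 5 is an arbitrary example for illustration, not a recommendation):

```shell
# Enable the idle-process swapout mechanism (off by default)
sysctl vm.swap_idle_enabled=1
# Example value only: treat processes sleeping longer than 5 seconds as
# swap-out candidates (the default cutoff is 10 seconds);
# vm.swap_idle_threshold1 is left at its default of 2 seconds
sysctl vm.swap_idle_threshold2=5
```

To make the settings persist across reboots, the same assignments would go in /etc/sysctl.conf.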

I guess I'm relying on what I've seen in top to indicate that most all
of the space from other processes has been paged out: not much of Active
is for non-compiles/non-links during the problem times.

For example, in http://www.catflap.org/jamie/rpi3/rpi3-mmc-swap-failure-stats.txt
it lists (last before the kill):

last pid: 30806;  load averages:  4.05,  4.04,  4.00  up 0+02:03:06    10:39:59
42 processes:  5 running, 37 sleeping
CPU: 88.5% user,  0.0% nice,  6.1% system,  0.4% interrupt,  5.0% idle
Mem: 564M Active, 2M Inact, 68M Laundry, 162M Wired, 97M Buf, 104M Free
Swap: 4G Total, 76M Used, 4G Free, 1% Inuse

  PID USERNAME    THR PRI NICE  SIZE   RES STATE    C   TIME    WCPU COMMAND
30762 root          1 101    0  175M  119M CPU2     2   0:39  99.07% c++
30613 root          1 101    0  342M  191M CPU0     0   2:02  95.17% c++
30689 root          1 101    0  302M  226M CPU3     3   1:28  94.48% c++
22226 root          1  20    0   19M    2M select   0   0:31   0.00% make
 1021 root          1  20    0   12M  340K wait     2   0:07   0.00% sh

Rule of thumb figures:
564M Active
vs. RES for the 3 c++'s:
119M+191M+226M = 536M for the 3 c++'s.

So: 564M - 536M = 28M (approx. active for other processes)

It appears to me that some c++ would likely need to swap out, given that
this context led to OOM kills.

(It might be that this rule of thumb is not good enough
for such judgments.)
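The rule-of-thumb arithmetic above can be written out as a tiny sketch (the figures are the ones from the top output quoted earlier; the assumption, as noted, is that Active minus the summed RES of the big compiler processes approximates the active memory of everything else):

```python
# Figures taken from the quoted top(1) snapshot
active_mib = 564                # "564M Active" from the Mem: line
cxx_res_mib = [119, 191, 226]   # RES of the three c++ processes

# Approximate active memory belonging to all other processes
other_mib = active_mib - sum(cxx_res_mib)
print(other_mib)  # 28
```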

[Personally I normally limit myself to -jN figures that have N*512 MiBytes
or more on the board. -j4 on a rpi3 or rpi2 has only 4*256 MiBytes.]
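That personal 512 MiBytes-per-job rule of thumb can be sketched as a hypothetical helper (the function name and the 512 MiB default are just my labels for the rule stated above, not anything from the build system):

```python
# Hypothetical helper: largest -jN such that each job gets at least
# per_job_mib of RAM, never going below -j1
def max_jobs(total_mib, per_job_mib=512):
    return max(1, total_mib // per_job_mib)

# An rpi3/rpi2 has 4*256 MiBytes = 1024 MiBytes on the board
print(max_jobs(4 * 256))  # 2, i.e. -j2 rather than -j4
```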



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



