Date:      Thu, 27 Jan 2022 22:00:15 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        Free BSD <freebsd-arm@freebsd.org>, Mark Johnston <markj@FreeBSD.org>
Subject:   Re: devel/llvm13 failed to reclaim memory on 8 GB Pi4 running -current [ZFS context: used the whole swap space]
Message-ID:  <54CD0806-3902-4B9C-AA30-5ED003DE4D41@yahoo.com>
In-Reply-To: <10B4E2F0-6219-4674-875F-A7B01CA6671C@yahoo.com>
References:  <20220127164512.GA51200@www.zefox.net> <C8BDF77F-5144-4234-A453-8DEC9EA9E227@yahoo.com> <2C7E741F-4703-4E41-93FE-72E1F16B60E2@yahoo.com> <20220127214801.GA51710@www.zefox.net> <5E861D46-128A-4E09-A3CF-736195163B17@yahoo.com> <20220127233048.GA51951@www.zefox.net> <6528ED25-A3C6-4277-B951-1F58ADA2D803@yahoo.com> <10B4E2F0-6219-4674-875F-A7B01CA6671C@yahoo.com>

On 2022-Jan-27, at 21:55, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Jan-27, at 17:43, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On 2022-Jan-27, at 15:30, bob prohaska <fbsd@www.zefox.net> wrote:
>>
>>> On Thu, Jan 27, 2022 at 02:21:44PM -0800, Mark Millard wrote:
>>>>
>>>> Okay. I just started a poudriere bulk devel/llvm13 build
>>>> in a ZFS context:
>>>>
>>>> . . .
>>>> [00:00:37] Pkg: +BE_AMDGPU -BE_FREEBSD +BE_NATIVE -BE_STANDARD +BE_WASM +CLANG +DOCS +EXTRAS -FLANG +LIT +LLD +LLDB +MLIR -OPENMP -PYCLANG
>>>> [00:00:37] New: +BE_AMDGPU -BE_FREEBSD -BE_NATIVE +BE_STANDARD +BE_WASM +CLANG +DOCS +EXTRAS +FLANG +LIT +LLD +LLDB +MLIR +OPENMP +PYCLANG
>>>> . . .
>>>> [00:01:27] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
>>>>
>>>
>>> Is this ARM hardware, or an emulator?
>>
>> 8 GiByte RPi4B, USB3 NVMe media with a ZFS partition. The content
>> is a slightly modified copy of the HoneyComb's PCIe slot Optane
>> media.
>>
>> The UFS-based 8 GiByte RPi4B is also based on copying from the
>> same Optane media, both for the system materials and various
>> ports/packages/poudriere-related materials. (Not, necessarily,
>> other things.)
>>
>>> I've been using plain old make in /usr/ports/devel,
>>> might it be informative to try a poudriere build as well?
>>
>> The Pkg:, New:, and llvm13 lines I listed are poudriere(-devel)
>> output. I am doing my builds via poudriere. ALLOW_PARALLEL_JOBS=
>> and USE_TMPFS="data" in use.
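>>
>> (As a sketch of where those live: both are settings in
>> /usr/local/etc/poudriere.conf. The ALLOW_PARALLEL_JOBS value
>> shown is an assumed illustrative one, not a copy of my
>> actual line:
>>
>> # allow parallel jobs in the builders (value assumed):
>> ALLOW_PARALLEL_JOBS=yes
>> # tmpfs for poudriere's cache/temporary build data, not
>> # for the ports' work directories:
>> USE_TMPFS="data"
>> )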
>>
>> I have a context in which almost all prerequisites had already
>> been built. (The change in options led to 2 very small ports
>> to build before devel/llvm13's started in a builder.)
>>
>> (You might not have a jail that already has the prerequisites.)
>>
>>> One would expect the added overhead to increase memory use.
>>>
>>
>> Well, from the context I started in, only devel/llvm13 is being
>> built once it starts. Once it gets to the build phase (after
>> dependencies and such are set up), there is not much overhead
>> because the only activity is the one builder and it is only
>> building llvm13 (via make) in the builder. At the end there
>> would be extra activity as poudriere finishes up. During the
>> build phase, I only expect minor overhead from poudriere
>> monitoring the build logs and such.
>>
>> I expect that the mere fact that a poudriere jail is in use
>> for the builder to execute in does not contribute to
>> significantly increasing the system's memory use or changing
>> the system's memory use pattern.
>>
>>
>> There are some other differences in my context. The instances of
>> main [so: 14] are non-debug builds (but with symbols). The
>> builds are optimized for the RPi4B (and others) via use of
>> -mcpu=cortex-a72. My /usr/main-src/ does have some
>> personal changes in it. (Some messaging about the kills is
>> part of that.)
>>
>> The RPi4B's are using:
>>
>> over_voltage=6
>> arm_freq=2000
>> sdram_freq_min=3200
>> force_turbo=1
>>
>> (There are heat-sinks, fans, and good power supplies.)
>>
>> The media in use are USB3 1 TB Samsung Portable SSD T7
>> Touch's. I'm unlikely to see "swap_pager: indefinite
>> wait buffer:" notices if the cause was based on the
>> media performance. (You have spinning rust, if I
>> remember right.)
>>
>> I do not have a monitoring script making a huge log file
>> during the build. So less is competing for media access
>> or leading to other overheads. (But, as I remember,
>> you have gotten the problem without having such a script
>> running.)
>
>
> ZFS context:
>
> Well, the ZFS example used up all the swap space, according
> to my patched top. This means that my setting of
> vm.pfault_oom_attempts is not appropriate for this context:
>
> # Delay when persistent low free RAM leads to
> # Out Of Memory killing of processes:
> vm.pageout_oom_seq=120
> #
> # For plenty of swap/paging space (will not
> # run out), avoid pageout delays leading to
> # Out Of Memory killing of processes:
> vm.pfault_oom_attempts=-1
> #
> # For possibly insufficient swap/paging space
> # (might run out), increase the pageout delay
> # that leads to Out Of Memory killing of
> # processes (showing defaults at the time):
> #vm.pfault_oom_attempts=3
> #vm.pfault_oom_wait=10
> # (The multiplication is the total but there
> # are other potential tradeoffs in the factors
> # multiplied, even for nearly the same total.)
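>
> (Those settings are in /etc/sysctl.conf form. As a sketch,
> assuming the knobs are writable at run time on the kernel
> involved, the same values can instead be applied live:
>
> sysctl vm.pageout_oom_seq=120
> sysctl vm.pfault_oom_attempts=-1
> )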
>
> I'll need to retest with something more like the
> commented out vm.pfault_oom_attempts and
> vm.pfault_oom_wait figures in order to see the
> intended handling of the test case.
>
> What are you using for each of:
> vm.pageout_oom_seq ?
> vm.pfault_oom_attempts ?
> vm.pfault_oom_wait ?
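>
> (As a sketch, a single sysctl(8) invocation reports all three:
>
> sysctl vm.pageout_oom_seq vm.pfault_oom_attempts vm.pfault_oom_wait
> )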
>
>
> For reference, for ZFS:
>
> last pid:   380;  load averages:   1.50,   3.07,   3.93  MaxObs:   5.71,   4.92,   4.76    up 0+07:23:14  21:23:43
> 68 threads:    1 running, 65 sleeping, 2 waiting, 19 MaxObsRunning
> CPU: 13.3% user,  0.0% nice,  4.9% system,  0.9% interrupt, 80.8% idle
> Mem: 4912Mi Active, 167936B Inact, 1193Mi Laundry, 1536Mi Wired, 40960B Buf, 33860Ki Free, 6179Mi MaxObsActive, 6476Mi MaxObsWired, 7820Mi MaxObs(Act+Wir+Lndry)
> ARC: 777086Ki Total, 132156Ki MFU, 181164Ki MRU, 147456B Anon, 5994Ki Header, 457626Ki Other
>     59308Ki Compressed, 254381Ki Uncompressed, 4.29:1 Ratio
> Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 19572Ki In, 3436Ki Out, 8192Mi MaxObsUsed, 14458Mi MaxObs(Act+Lndry+SwapUsed), 15993Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>
> Console:
> (Looks like I misremembered adjusting the "out of swap space"
> wording for the misnomer message.)
>
> swap_pager: out of swap space
> swp_pager_getswapspace(18): failed
> swap_pager: out of swap space
> swp_pager_getswapspace(1): failed
> swp_pager_getswapspace(1): failed
> swap_pager: out of swap space
> swp_pager_getswapspace(1): failed
> swp_pager_getswapspace(7): failed
> swp_pager_getswapspace(24): failed
> swp_pager_getswapspace(3): failed
> swp_pager_getswapspace(18): failed
> swp_pager_getswapspace(17): failed
> swp_pager_getswapspace(1): failed
> swp_pager_getswapspace(12): failed
> swp_pager_getswapspace(23): failed
> swp_pager_getswapspace(30): failed
> swp_pager_getswapspace(3): failed
> swp_pager_getswapspace(2): failed
>
> . . . Then a bunch of time with no messages . . .
>
> swp_pager_getswapspace(5): failed
> swp_pager_getswapspace(28): failed
>
> . . . Then a bunch of time with no messages . . .
>
>
> Top again:
>
> last pid:   382;  load averages:   0.73,   1.00,   2.40  MaxObs:   5.71,   4.92,   4.76    up 0+07:31:26  21:31:55
> 70 threads:    1 running, 65 sleeping, 4 waiting, 19 MaxObsRunning
> CPU:  0.1% user,  0.0% nice,  5.6% system,  0.0% interrupt, 94.3% idle
> Mem: 3499Mi Active, 4096B Inact, 2612Mi Laundry, 1457Mi Wired, 40960B Buf, 34676Ki Free, 6179Mi MaxObsActive, 6476Mi MaxObsWired, 7820Mi MaxObs(Act+Wir+Lndry)
> ARC: 777154Ki Total, 135196Ki MFU, 178330Ki MRU, 5995Ki Header, 457631Ki Other
>     59520Ki Compressed, 254231Ki Uncompressed, 4.27:1 Ratio
> Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 409600B In, 4096B Out, 8192Mi MaxObsUsed, 14458Mi MaxObs(Act+Lndry+SwapUsed), 15993Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>
>
> I then used top to kill ninja and the 4 large compiles
> that were going on. I'll change:
>
> vm.pfault_oom_attempts
> vm.pfault_oom_wait
>
> and reboot and start over.
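>
> (A sketch of the intended adjustment, in /etc/sysctl.conf
> form, using the defaults noted above:
>
> vm.pfault_oom_attempts=3
> vm.pfault_oom_wait=10
>
> then a reboot, e.g. via shutdown -r now.)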
>
>
> I expect that the ongoing UFS test will likely end up
> similarly and that similar adjustments and restarts
> will be needed because of actually running out of
> swap space.
>

I forgot to report:

[00:01:27] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
[07:49:17] [01] [07:47:50] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build

So the swap space filled up somewhat before that much
build time (about 7.8 hours) had passed.

===
Mark Millard
marklmi at yahoo.com



