Date:      Wed, 6 Dec 2017 20:47:12 -0500
From:      Laurent Cimon <laurent@nuxi.ca>
To:        Mark Millard <markmi@dsl-only.net>
Cc:        freebsd-arm@freebsd.org, freebsd-hackers@freebsd.org, freebsd-current@freebsd.org
Subject:   Re: rpi2 hangup during poudriere build: lots of pfault wmseg status
Message-ID:  <5014B6E6-68BA-4499-8728-EF80237F3269@nuxi.ca>
In-Reply-To: <36A8BDCC-4ECE-4187-8705-54A9E38E8AD5@dsl-only.net>
References:  <05BEA04B-249B-4E7D-855A-46DA1A0DEA16@dsl-only.net> <FEC2F023-58D2-423C-B17B-2CDBEA76E299@nuxi.ca> <36A8BDCC-4ECE-4187-8705-54A9E38E8AD5@dsl-only.net>

> On Dec 6, 2017, at 20:01, Mark Millard <markmi@dsl-only.net> wrote:
>
> On 2017-Dec-6, at 1:54 PM, Laurent Cimon <laurent at nuxi.ca> wrote:
>
>>> On Dec 6, 2017, at 00:57, Mark Millard <markmi at dsl-only.net> wrote:
>>>
>>> I tried to build some ports on a rpi2
>>> (via poudriere) but it hung up:
>>> Ethernet and normal console use. (Note:
>>> the root file system is on a USB SSD
>>> and the swap partition is also on that
>>> USB SSD.)
>>>
>>> But ~^b worked for getting to the db>
>>> prompt on the console.
>>>
>>> From there a ps suggests that it got hung
>>> up in pfault activity. (Possibly insufficient
>>> RAM+swap-partition space?) But it is not
>>> clear to me that it should end up hung up
>>> vs. killing processes or other such.
>>
>> Hi,
>>
>> From what I know the raspberry pis use the same controller for ethernet and
>> the USB hub on which you’re hosting an SSD. It seems like you make very heavy
>> use of the USB ports, and all of the resources used by poudriere except for the
>> CPU and the (very limited) memory that’s not in swap is attached to them. If you
>> really didn’t have enough memory and swap, the linkers would’ve been stopped.
>>
>> I think it might just be a swap death. Poudriere compiles and fetches in parallel
>> a lot, ethernet and disk I/O is slow because it’s very limited, so linking takes
>> longer. You end up linking a few very big binaries at the same time, and they
>> all fight for the memory, to get out of swap through page faults, but there
>> are too many page faults, all too big, requesting more CPU time than is
>> allowed to them.
>>
>> This would explain why you have 3 linkers waiting on a page fault out of the 4
>> CPUs poudriere allows builds on, on top of the awk processes. It would also
>> explain why you had easy access to the debugger: it was in memory already with
>> the kernel.
>>
>> I’d advise you to disable parallel builds and see if it happens again,
>> but it would make building much slower. Using makejobs would help if you
>> can afford watching the build. Otherwise be patient, it should resolve itself
>> eventually, but it will take a while and it will happen again.
>
> My post was more about how FreeBSD handled the
> heavy-use context and less about getting the
> builds to finish: it managed to get to a
> state of no-progress for processes and a loss
> of normal control as far as I could tell.
>
> I did a "c" to ddb and left it until just before
> this note then did ~ ^B again. Things looked the
> same. [I've finally rebooted the rpi2.]
>
> PARALLEL_JOBS=1 was already in use but
> ALLOW_MAKE_JOBS=yes was also in use.
> USE_TMPFS=no was already in use.
>
> While an ssh session was monitoring the
> build, Ethernet was not in heavy use.
> (No nfs mounts to its disks, for example.)
>
> I may try without ALLOW_MAKE_JOBS=yes and
> with ALLOW_MAKE_JOBS_PACKAGES empty/undefined
> to see if it can complete for such a context
> without having the same sort of problem.
>
> Ultimately I can cross-build and install from
> those materials when I really want updates. I
> have the context for such. This was more about
> seeing how well the rpi2 did for self-hosted.
> Classically I've used a BPI-M3 with 2 GiBytes
> of RAM and a proportionally bigger swap partition
> instead (approximately).
>
>
> FYI (rpi2 after rebooting):
>
> # swapinfo
> Device          1K-blocks     Used    Avail Capacity
> /dev/label/RPI2swap   1572860        0  1572860     0%
>
> # df -m
> Filesystem           1M-blocks  Used  Avail Capacity  Mounted on
> /dev/ufs/RPI2rootfs     195378 30791 148957    17%    /
> devfs                        0     0      0   100%    /dev
> /dev/label/RPI2Aboot        49    12     37    25%    /boot/msdos
>
>
> An rpi3 (aarch64) with the same amount of RAM,
> same type of USB SSD, etc., but well more swap
> completed building basically the same set of
> ports for the same poudriere settings just
> fine.
>
> Interestingly for the default kern.maxswzone:
> (Just to show the reported recommended maximum
> figures for swap.)
>
> rpi2: . . . exceeds maximum recommended amount (411488 pages).
> rpi3: . . . exceeds maximum recommended amount (925680 pages).
>
> (I was running with somewhat under those maximums for
> the tests.)
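
For scale, the page counts in those warnings can be converted to MiB. This is only a small sketch: it assumes 4 KiB pages on both boards (my assumption, the thread does not state the page size), and the helper name pages_to_mib is made up for illustration.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages on both the rpi2 and the rpi3


def pages_to_mib(pages, page_kib=PAGE_KIB):
    """Convert a page count to MiB for a given page size in KiB."""
    return pages * page_kib / 1024


# Page counts copied from the two warnings quoted above.
recommended_max_pages = {"rpi2": 411488, "rpi3": 925680}

for board, pages in recommended_max_pages.items():
    print(f"{board}: {pages} pages is about {pages_to_mib(pages):.1f} MiB")
```

The swapinfo figures in this message are in 1K-blocks, so dividing them by 1024 gives values in the same unit.
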
>
> # swapinfo
> Device          1K-blocks     Used    Avail Capacity
> /dev/gpt/RPI3swap   3702784        0  3702784     0%
>
> # df -m
> Filesystem           1M-blocks  Used  Avail Capacity  Mounted on
> /dev/ufs/RPI3rootfs     195378 14937 164811     8%    /
> devfs                        0     0      0   100%    /dev
> /dev/label/RPI3Aboot        49     7     42    15%    /boot/efi
>
> If I restricted the rpi3 to somewhat under what the
> rpi2 allows for swap, I do not know if it would also
> hang up vs. not.
>
> If having more swap makes the difference, then it
> would not seem to be being I/O-bound that would
> explain the hangup.
>
>
> ===
> Mark Millard
> markmi at dsl-only.net

There are a few factors that could have prevented this on your raspberry pi 3.

It has a faster, 64-bit CPU instead of the raspberry pi 2’s 32-bit CPU, and the
RAM is twice as fast. These make it less likely for this to happen, because they
make both building and linking faster, which reduces the odds of linking 2
binaries at once, let alone 3. Many things could also have gone differently in
the build, so that it did not end up linking 3 big binaries at the same time and
causing the same behaviour.

What I think happened on your raspberry pi 2 is likely just bad luck that could
also happen on your raspberry pi 3. The odds of 3 parallel builds needing so
much RAM to link at the exact same time are still very low, and lower still on
faster hardware.
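
To put rough numbers on those odds, here is a minimal sketch, not a measurement. It assumes each of the 4 jobs is independently in its link phase for some fraction p of the time; the values of p and the function name prob_at_least are hypothetical, picked only for illustration. It computes the binomial probability that at least 3 of the 4 are linking at the same moment.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n independent jobs are linking), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical link-time fractions; none of these come from real builds.
for p in (0.05, 0.10, 0.20):
    print(f"link fraction {p:.2f}: P(>=3 of 4 linking) = {prob_at_least(3, 4, p):.5f}")
```

If faster CPU and RAM shrink the share of time spent linking, p drops and the overlap probability drops much faster, since it scales roughly with p cubed. Real builds are not independent, so this is only an order-of-magnitude picture.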

Keep in mind that this is still entirely theoretical; I don’t present it as an
absolute explanation. It’s simply what I understand from this.

I’d be curious to see how a different operating system, using a system similar to
poudriere where builds are done on one CPU but in parallel, would handle this on
the rpi2. My understanding is that this is simply a mix of hardware limitations
and conceptual flaws with swap. And by flaws I mean that your operating system
cannot save you when you try to do something that your hardware cannot possibly
do.

Laurent


