Subject: Re: rpi2 hangup during poudriere build: lots of pfault wmseg status
From: Laurent Cimon
Date: Wed, 6 Dec 2017 20:47:12 -0500
Cc: freebsd-arm@freebsd.org, freebsd-hackers@freebsd.org, freebsd-current@freebsd.org
To: Mark Millard

> On Dec 6, 2017, at 20:01, Mark Millard wrote:
>
> On 2017-Dec-6, at 1:54 PM, Laurent Cimon wrote:
>
>>> On Dec 6, 2017, at 00:57, Mark Millard wrote:
>>>
>>> I tried to build some ports on a rpi2
>>> (via poudriere) but it hung up:
>>> Ethernet and normal console use.
>>> (Note: the root file system is on a USB SSD
>>> and the swap partition is also on that
>>> USB SSD.)
>>>
>>> But ~^b worked for getting to the db>
>>> prompt on the console.
>>>
>>> From there a ps suggests that it got hung
>>> up in pfault activity. (Possibly insufficient
>>> RAM+swap-partition space?) But it is not
>>> clear to me that it should end up hung up
>>> vs. killing processes or other such.
>>
>> Hi,
>>
>> From what I know, the Raspberry Pis use the same controller for Ethernet and
>> the USB hub on which you’re hosting an SSD. It seems like you make very heavy
>> use of the USB ports, and all of the resources used by poudriere, except for the
>> CPU and the (very limited) memory that’s not in swap, are attached to them. If you
>> really didn’t have enough memory and swap, the linkers would’ve been stopped.
>>
>> I think it might just be a swap death. Poudriere compiles and fetches a lot in
>> parallel, and Ethernet and disk I/O are slow because they’re very limited, so
>> linking takes longer. You end up linking a few very big binaries at the same
>> time, and they all fight for memory, trying to get out of swap through page
>> faults, but there are too many page faults, all too big, requesting more CPU
>> time than they’re allowed.
>>
>> This would explain why you have 3 linkers waiting on a page fault out of the 4
>> CPUs poudriere allows builds on, on top of the awk processes. It would also
>> explain why you had easy access to the debugger: it was already in memory with
>> the kernel.
>>
>> I’d advise you to disable parallel builds and see if it happens again,
>> but it would make building much slower. Using makejobs would help if you
>> can afford to watch the build. Otherwise be patient; it should resolve itself
>> eventually, but it will take a while and it will happen again.
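For reference, a minimal sketch of how that advice might look in poudriere.conf (normally /usr/local/etc/poudriere.conf). The variable names are poudriere's own; the values are only the conservative ones discussed in this thread, not a tested recommendation for the rpi2:

```sh
# Hypothetical low-memory settings for poudriere.conf on a 1 GiB board.
# poudriere.conf is read as plain sh variable assignments.

# Run a single builder, so only one port builds (and links) at a time.
PARALLEL_JOBS=1

# Leave ALLOW_MAKE_JOBS unset so each port's make does not run
# several compile/link jobs at once inside the builder.
#ALLOW_MAKE_JOBS=yes

# No packages exempted from the rule above.
#ALLOW_MAKE_JOBS_PACKAGES=""

# Keep work directories on disk rather than in RAM-backed tmpfs.
USE_TMPFS=no
```

This trades a much slower build for far fewer compilers and linkers competing for RAM and swap at the same moment.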
>
> My post was more about how FreeBSD handled the
> heavy-use context and less about getting the
> builds to finish: it managed to get to a
> state of no-progress for processes and a loss
> of normal control as far as I could tell.
>
> I did a "c" to ddb and left it until just before
> this note then did ~ ^B again. Things looked the
> same. [I've finally rebooted the rpi2.]
>
> PARALLEL_JOBS=1 was already in use but
> ALLOW_MAKE_JOBS=yes was also in use.
> USE_TMPFS=no was already in use.
>
> While an ssh session was monitoring the
> build, Ethernet was not in heavy use.
> (No nfs mounts to its disks, for example.)
>
> I may try without ALLOW_MAKE_JOBS=yes and
> with ALLOW_MAKE_JOBS_PACKAGES empty/undefined
> to see if it can complete for such a context
> without having the same sort of problem.
>
> Ultimately I can cross-build and install from
> those materials when I really want updates. I
> have the context for such. This was more about
> seeing how well the rpi2 did for self-hosted.
> Classically I've used a BPI-M3 with 2 GiBytes
> of RAM and a proportionally bigger swap partition
> instead (approximately).
>
>
> FYI (rpi2 after rebooting):
>
> # swapinfo
> Device              1K-blocks     Used    Avail Capacity
> /dev/label/RPI2swap   1572860        0  1572860     0%
>
> # df -m
> Filesystem           1M-blocks  Used  Avail Capacity  Mounted on
> /dev/ufs/RPI2rootfs     195378 30791 148957      17%  /
> devfs                        0     0      0     100%  /dev
> /dev/label/RPI2Aboot        49    12     37      25%  /boot/msdos
>
>
> An rpi3 (aarch64) with the same amount of RAM,
> same type of USB SSD, etc., but well more swap
> completed building basically the same set of
> ports for the same poudriere settings just
> fine.
>
> Interestingly for the default kern.maxswzone:
> (Just to show the reported recommended maximum
> figures for swap.)
>
> rpi2: . . . exceeds maximum recommended amount (411488 pages).
> rpi3: . . . exceeds maximum recommended amount (925680 pages).
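To compare the rpi2's swapinfo figure with that warning, the 1K-block total has to be converted to pages. A small sh sketch, assuming 4 KiB pages (the page size on 32-bit ARM) and treating the swapinfo total as the configured swap size; the helper name kb_to_pages is hypothetical, just for illustration:

```sh
#!/bin/sh
# Compare a swapinfo(8) total (1K-blocks) with the kernel's
# "maximum recommended amount", which is given in pages.
# Assumption: 4 KiB pages, as on the 32-bit ARM rpi2.

# Hypothetical helper: 1K-blocks -> 4 KiB pages.
kb_to_pages() {
    echo $(( $1 / 4 ))
}

rpi2_swap_kb=1572860      # "1K-blocks" column from swapinfo on the rpi2
rpi2_recommended=411488   # pages, from the rpi2 warning quoted above

rpi2_pages=$(kb_to_pages "$rpi2_swap_kb")
echo "rpi2 swap: $rpi2_pages pages, recommended maximum: $rpi2_recommended"

if [ "$rpi2_pages" -le "$rpi2_recommended" ]; then
    echo "rpi2 swap is within the recommended maximum"
else
    echo "rpi2 swap exceeds the recommended maximum"
fi
```

For the rpi2 this gives 393215 pages, a little under the 411488-page figure.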
>
> (I was running with somewhat under those maximums for
> the tests.)
>
> # swapinfo
> Device              1K-blocks     Used    Avail Capacity
> /dev/gpt/RPI3swap     3702784        0  3702784     0%
>
> # df -m
> Filesystem           1M-blocks  Used  Avail Capacity  Mounted on
> /dev/ufs/RPI3rootfs     195378 14937 164811       8%  /
> devfs                        0     0      0     100%  /dev
> /dev/label/RPI3Aboot        49     7     42      15%  /boot/efi
>
> If I restricted the rpi3 to somewhat under what the
> rpi2 allows for swap, I do not know if it would also
> hang up vs. not.
>
> If having more swap makes the difference, then it
> would not seem to be being I/O-bound that would
> explain the hangup.
>
>
> ===
> Mark Millard
> markmi at dsl-only.net

There are a few factors that could have prevented this on your Raspberry Pi 3.
It has a faster, 64-bit CPU instead of the Raspberry Pi 2’s 32-bit CPU, and its
RAM is twice as fast. These make it less likely for this to happen, because
they make both building and linking faster, which reduces the odds of linking
2 binaries at once, let alone 3. Many things could also have gone differently
during that build, so that it never ended up linking 3 big binaries at the same
time and causing the same behaviour.

What I think happened on your Raspberry Pi 2 is likely just bad luck that could
also happen on your Raspberry Pi 3. The odds of 3 parallel builds needing so
much RAM to link at the exact same time are very low, just not as low as on
faster hardware.

Keep in mind that this is still entirely theoretical; I don’t present it as a
definitive explanation. It’s simply what I understand from this.

I’d be curious to see how a different operating system would handle the rpi2
with a setup similar to poudriere, where each build runs on one CPU but several
builds run in parallel. My understanding is that this is simply a mix of
hardware limitations and conceptual flaws with swap.
And by flaws I mean that your operating system cannot save you when you try to
do something that your hardware cannot possibly do.

Laurent