From owner-freebsd-arm@freebsd.org Thu Sep  6 07:08:33 2018
Date: Thu, 6 Sep 2018 00:08:28 -0700
From: bob prohaska <fbsd@www.zefox.net>
To: Mark Millard
Cc: "Rodney W. Grimes", freebsd-arm@freebsd.org, bob prohaska
Subject: Re: RPI3 swap experiments (r338342 with vm.pageout_oom_seq="1024" and 6 GB swap)
Message-ID: <20180906070828.GC3482@www.zefox.net>
References: <20180906003829.GC818@www.zefox.net>
 <201809060243.w862hq7o058504@pdx.rh.CN85.dnsmgr.net>
 <20180906042353.GA3482@www.zefox.net>

On Wed, Sep 05, 2018 at 11:20:14PM -0700, Mark Millard wrote:
>
> On 2018-Sep-5, at 9:23 PM, bob prohaska wrote:
>
> > On Wed, Sep 05, 2018 at 07:43:52PM -0700, Rodney W. Grimes wrote:
> >>
> >> What makes you believe that the VM system has any concept of
> >> the speed of swap devices? IIRC it simply uses them in a round
> >> robin fashion with no knowledge of them being fast or slow, or
> >> shared with file systems or other stuff.
> >>
> >
> > Mostly the assertion that the OOMA kills happening while the system
> > had plenty of free swap were caused by the swap being "too slow". If
> > the machine knows that some swap is slow, it seems capable of
> > discerning that other swap is faster.
>
> If an RPI3 magically had a full-speed/low-latency Optane context
> as its swap space, it would still get process kills during buildworld
> buildkernel with vm.pageout_oom_seq=12 and -j4, as I understand
> things at this point. (This presumes still having 1 GiByte of RAM.)
>
> In other words: the long-latency issues in your RPI3 configuration
> may contribute to the detail of just when a build fails, but
> low-latency/high-speed I/O would be unlikely to prevent kills from
> eventually happening during the llvm parts of buildworld. Free RAM
> would still be low for "long periods". Increasing vm.pageout_oom_seq
> is essential from what I can tell.
>

Understood and accepted. I'm using vm.pageout_oom_seq=1024 at present.
The system struggles mightily, but it keeps going and finishes.
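For anyone who wants to repeat the experiment, the setting looks
something like this (a sketch of my own configuration, written from
memory rather than pasted verbatim):

  # /etc/sysctl.conf
  # Let the page daemon make many more passes trying to reclaim
  # memory before the OOM killer is invoked (the default is 12).
  vm.pageout_oom_seq=1024

The same knob can be changed on a running system with
"sysctl vm.pageout_oom_seq=1024"; no reboot is needed.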
> vm.pageout_oom_seq is about controlling "how long". -j1 builds are
> about keeping less RAM active. (That is also the intent of using
> LDFLAGS.lld+=-Wl,--no-threads.) Of course, for the workload
> involved, using a context with more RAM can avoid having "low RAM"
> for as long. An aarch64 board with 4 GiByte of RAM and 4 cores
> possibly has no problem with -j4 buildworld buildkernel for head at
> this point: free RAM might well never be low during such a build in
> such a context.
>
> (The quotes around "how long" are because I'm referring to the time
> consequences; the units involved are not time, but I'm avoiding that
> detail.)
>
> The kill criteria do not directly measure or test swapping I/O
> latencies or anything of the sort, as far as I know. Such things are
> involved only indirectly, via other consequences of the delays (when
> they are involved at all). That is my understanding.
>

Perhaps I'm being naive here, but when one sees two devices holding
swap, one at ~25% busy and one at ~150% busy, it seems to beg for a
little selective pressure to divert traffic from the busier device to
the less busy one. Maybe it's impossible, maybe it's more trouble than
the VM folks want to invest. Just maybe, it's doable and worthwhile,
to take advantage of a cheap, power-efficient platform.

I too am unsure of the metric for "too slow". From earlier discussion
I got the impression it was something like a count of how many cycles
of request and rejection (more likely, deferral) of swap space were
made; after a certain count is reached, OOMA is invoked. That picture
is sure to be simplistic, and may well be flat-out wrong.

If my picture is not wholly incorrect, it isn't a huge leap to ask for
swap device by device and accept swap from whichever device offers it
first. In the da0-versus-mmcsd0 case, ask for swap on each in turn:
the first to say yes gets the business. The busier device will be
beaten in the race by the more idle one, relieving the bottleneck to
the extent of the faster device's capacity. It isn't perfect, but it's
an improvement. (A toy sketch of the idea is in the P.S. below.)

Thanks for reading!

bob prohaska
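P.S. To make the "first to say yes gets the business" idea a bit more
concrete, here is a toy userland simulation in C. It is purely
illustrative: the device names, the made-up per-page costs, and the
backlog-based stand-in for "first to say yes" are my own inventions,
not kernel code or any real VM interface.

/*
 * Toy model: NPAGES page-outs are dispatched to two swap devices,
 * once round-robin (what the VM system reportedly does today) and
 * once "least busy first" (the idea above, approximated as shortest
 * backlog wins). Costs are made-up stand-ins for a USB disk (da0)
 * and a microSD card (mmcsd0).
 */
#include <stdio.h>

#define NDEV   2
#define NPAGES 1000

int
main(void)
{
	/* per-page service time in made-up ticks: dev 0 fast, dev 1 slow */
	const int cost[NDEV] = { 1, 4 };
	const char *name[NDEV] = { "da0", "mmcsd0" };
	long busy_rr[NDEV] = { 0, 0 };	/* backlog under round-robin */
	long busy_lb[NDEV] = { 0, 0 };	/* backlog under least-busy  */
	int i;

	for (i = 0; i < NPAGES; i++) {
		/* round-robin: devices take turns regardless of backlog */
		busy_rr[i % NDEV] += cost[i % NDEV];

		/* least-busy: the device with the shorter backlog is the
		 * one that would "say yes" first, so it gets the page */
		int pick = (busy_lb[0] <= busy_lb[1]) ? 0 : 1;
		busy_lb[pick] += cost[pick];
	}

	for (i = 0; i < NDEV; i++)
		printf("%-7s round-robin backlog %5ld  least-busy backlog %5ld\n",
		    name[i], busy_rr[i], busy_lb[i]);

	/* the largest backlog bounds when the last page-out completes;
	 * smaller is better */
	printf("finish: round-robin %ld ticks, least-busy %ld ticks\n",
	    busy_rr[0] > busy_rr[1] ? busy_rr[0] : busy_rr[1],
	    busy_lb[0] > busy_lb[1] ? busy_lb[0] : busy_lb[1]);
	return (0);
}

Compiled and run, the least-busy policy steers most pages to the fast
device and finishes the simulated page-outs in well under half the
round-robin time, which is exactly the relief I'm hoping for.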