From owner-freebsd-current@freebsd.org Wed Apr 4 04:43:05 2018 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1CCCFF8E981 for ; Wed, 4 Apr 2018 04:43:05 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from mx2.catspoiler.org (mx2.catspoiler.org [IPv6:2607:f740:16::d18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "amnesiac", Issuer "amnesiac" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id B61F788611; Wed, 4 Apr 2018 04:43:04 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org ([76.212.85.177]) by mx2.catspoiler.org (8.15.2/8.15.2) with ESMTPS id w344iA3B071297 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 4 Apr 2018 04:44:12 GMT (envelope-from truckman@FreeBSD.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTPS id w344gswt042547 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 Apr 2018 21:42:55 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Date: Tue, 3 Apr 2018 21:42:48 -0700 (PDT) From: Don Lewis Subject: Re: Strange ARC/Swap/CPU on yesterday's -CURRENT To: Andriy Gapon cc: Bryan Drewery , Peter Jeremy , Jeff Roberson , FreeBSD current In-Reply-To: Message-ID: References: <20180306173455.oacyqlbib4sbafqd@ler-imac.lerctr.org> <201803061816.w26IGaW5050053@pdx.rh.CN85.dnsmgr.net> <20180306193645.vv3ogqrhauivf2tr@ler-imac.lerctr.org> <20180306221554.uyshbzbboai62rdf@dx240.localdomain> <20180307103911.GA72239@kloomba> <20180311004737.3441dbf9@thor.intern.walstatt.dynvpn.de> <20180320070745.GA12880@server.rulingia.com> <2b3db2af-03c7-65ff-25e7-425cfd8815b6@FreeBSD.org> <1fd2b47b-b559-69f8-7e39-665f0f599c8f@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=us-ascii Content-Disposition: INLINE X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Apr 2018 04:43:05 -0000 On 3 Apr, Don Lewis wrote: > On 1 Apr, Don Lewis wrote: >> On 27 Mar, Andriy Gapon wrote: >>> On 24/03/2018 01:21, Bryan Drewery wrote: >>>> On 3/20/2018 12:07 AM, Peter Jeremy wrote: >>>>> >>>>> On 2018-Mar-11 10:43:58 -1000, Jeff Roberson wrote: >>>>>> Also, if you could try going back to r328953 or r326346 and let me know if >>>>>> the problem exists in either. That would be very helpful. If anyone is >>>>>> willing to debug this with me contact me directly and I will send some >>>>>> test patches or debugging info after you have done the above steps. >>>>> >>>>> I ran into this on 11-stable and tracked it to r326619 (MFC of r325851). >>>>> I initially got around the problem by reverting that commit but either >>>>> it or something very similar is still present in 11-stable r331053. >>>>> >>>>> I've seen it in my main server (32GB RAM) but haven't managed to reproduce >>>>> it in smaller VBox guests - one difficulty I faced was artificially filling >>>>> ARC. >>> >>> First, it looks like maybe several different issues are being discussed and >>> possibly conflated in this thread. I see reports related to ZFS, I see reports >>> where ZFS is not used at all. Some people report problems that appeared very >>> recently while others chime in with "yes, yes, I've always had this problem". >>> This does not help to differentiate between problems and to analyze them. >>> >>>> Looking at the ARC change you referred to from r325851 >>>> https://reviews.freebsd.org/D12163, I am convinced that ARC backpressure >>>> is completely broken. >>> >>> Does your being convinced come from the code review or from experiments? >>> If the former, could you please share your analysis? >>> >>>> On my 78GB RAM system with ARC limited to 40GB and >>>> doing a poudriere build of all LLVM and GCC packages at once in tmpfs I >>>> can get swap up near 50GB and yet the ARC remains at 40GB through it >>>> all. It's always been slow to give up memory for package builds but it >>>> really seems broken right now. >>> >>> Well, there are multiple angles. Maybe it's ARC that does not react properly, >>> or maybe it's not being asked properly. >>> >>> It would be useful to monitor the system during its transition to the state that >>> you reported. There are some interesting DTrace probes in ARC, specifically >>> arc-available_memory and arc-needfree are first that come to mind. Their >>> parameters and how frequently they are called are of interest. VM parameters >>> could be of interest as well. >>> >>> A rant. >>> >>> Basically, posting some numbers and jumping to conclusions does not help at all. >>> Monitoring, graphing, etc does help. >>> ARC is a complex dynamic system. >>> VM (pagedaemon, UMA caches) is a complex dynamic system. >>> They interact in a complex dynamic ways. >>> Sometimes a change in ARC is incorrect and requires a fix. >>> Sometimes a change in VM is incorrect and requires a fix. >>> Sometimes a change in VM requires a change in ARC. >>> These three kinds of problems can all appear as a "problem with ARC". >>> >>> For instance, when vm.lowmem_period was introduced you wouldn't find any mention >>> of ZFS/ARC. But it does affect period between arc_lowmem() calls. >>> >>> Also, pin-pointing a specific commit requires proper bisecting and proper >>> testing to correctly attribute systemic behavior changes to code changes. >> >> I just upgraded my main package build box (12.0-CURRENT, 8 cores, 32 GB >> RAM) from r327616 to r331716. I was seeing higher swap usage and larger >> ARC sizes before the upgrade than I remember from the distant past, but >> ARC was still at least somewhat responsive to memory pressure and I >> didn't notice any performance issues. >> >> After the upgrade, ARC size seems to be pretty unresponsive to memory >> demand. Currently the machine is near the end of a poudriere run to >> build my usual set of ~1800 ports. The only currently running build is >> chromium and the machine is paging heavily. Settings of interest are: >> USE_TMPFS="wrkdir data localbase" >> ALLOW_MAKE_JOBS=yes >> >> last pid: 96239; load averages: 1.86, 1.76, 1.83 up 3+14:47:00 12:38:11 >> 108 processes: 3 running, 105 sleeping >> CPU: 18.6% user, 0.0% nice, 2.4% system, 0.0% interrupt, 79.0% idle >> Mem: 129M Active, 865M Inact, 61M Laundry, 29G Wired, 1553K Buf, 888M Free >> ARC: 23G Total, 8466M MFU, 10G MRU, 5728K Anon, 611M Header, 3886M Other >> 17G Compressed, 32G Uncompressed, 1.88:1 Ratio >> Swap: 40G Total, 17G Used, 23G Free, 42% Inuse, 4756K In >> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAN >> 96239 nobody 1 76 0 140M 93636K CPU5 5 0:01 82.90% clang- >> 96238 nobody 1 75 0 140M 92608K CPU7 7 0:01 80.81% clang- >> 5148 nobody 1 20 0 590M 113M swread 0 0:31 0.29% clang- >> 57290 root 1 20 0 12128K 2608K zio->i 7 8:11 0.28% find >> 78958 nobody 1 20 0 838M 299M swread 0 0:23 0.19% clang- >> 97840 nobody 1 20 0 698M 140M swread 4 0:27 0.13% clang- >> 96066 nobody 1 20 0 463M 104M swread 1 0:32 0.12% clang- >> 11050 nobody 1 20 0 892M 154M swread 2 0:39 0.12% clang- >> >> Pre-upgrade I was running r327616, which is newer than either of the >> commits that Jeff mentioned above. It seems like there has been a >> regression since then. >> >> I also don't recall seeing this problem on my Ryzen box, though it has >> 2x the core count and 2x the RAM. The last testing that I did on it was >> with r329844. > > I reconfigured my Ryzen box to be more similar to my default package > builder by disabling SMT and half of the RAM, to limit it to 8 cores > and 32 GB and then started bisecting to try to track down the problem. > For each test, I first filled ARC by tarring /usr/ports/distfiles to > /dev/null. The commit range that I was searching was r329844 to > r331716. I narrowed the range to r329844 to r329904. With r329904 > and newer, ARC is totally unresponsive to memory pressure and the > machine pages heavily. I see ARC sizes of 28-29GB and 30GB of wired > RAM, so there is not much leftover for getting useful work done. Active > memory and free memory both hover under 1GB each. Looking at the > commit logs over this range, the most likely culprit is: > > r329882 | jeff | 2018-02-23 14:51:51 -0800 (Fri, 23 Feb 2018) | 13 lines > > Add a generic Proportional Integral Derivative (PID) controller algorithm and > use it to regulate page daemon output. > > This provides much smoother and more responsive page daemon output, anticipating > demand and avoiding pageout stalls by increasing the number of pages to match > the workload. This is a reimplementation of work done by myself and mlaier at > Isilon. > > > It is quite possible that the recent fixes to the PID controller will > fix the problem. Not that r329844 was trouble free ... I left tar > running over lunchtime to fill ARC and the OOM killer nuked top, tar, > ntpd, both of my ssh sessions into the machine, and multiple instances > of getty while I was away. I was able to log in again and successfully > run poudriere, and ARC did respond to the memory pressure and cranked > itself down to about 5 GB by the end of the run. I did not see the same > problem with tar when I did the same with r329904. I just tried r331966 and see no improvement. No OOM process kills during the tar run to fill ARC, but with ARC filled, the machine is thrashing itself at the start of the poudriere run while trying to build ports-mgmt/pkg (39 minutes so far). ARC appears to be unresponsive to memory demand. I've seen no decrease in ARC size or wired memory since starting poudriere. last pid: 75652; load averages: 0.46, 0.27, 0.25 up 0+02:03:48 21:33:00 48 processes: 1 running, 47 sleeping CPU: 0.7% user, 0.0% nice, 0.8% system, 0.1% interrupt, 98.4% idle Mem: 4196K Active, 328K Inact, 148K Laundry, 30G Wired, 506M Free ARC: 29G Total, 309M MFU, 28G MRU, 3767K Anon, 103M Header, 79M Other 27G Compressed, 29G Uncompressed, 1.05:1 Ratio Swap: 80G Total, 234M Used, 80G Free, 2348K In, 420K Out PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 1340 root 1 52 0 13048K 1024K nanslp 6 0:10 0.72% sh 41806 nobody 1 20 0 133M 1440K swread 6 0:17 0.31% cc 42554 nobody 1 20 0 129M 1048K swread 6 0:12 0.22% cc 934 dl 1 20 0 19124K 592K select 6 0:03 0.14% sshd 938 dl 1 20 0 13596K 740K CPU7 7 0:07 0.12% top 40784 nobody 1 20 0 5024K 104K select 5 0:00 0.03% make 41142 nobody 1 20 0 5024K 124K select 6 0:00 0.01% make 638 root 1 20 0 11344K 252K swread 7 0:00 0.01% syslogd 41428 nobody 1 20 0 5024K 124K select 4 0:00 0.01% make 40782 nobody 1 20 0 5024K 124K select 4 0:00 0.01% make 41164 nobody 1 20 0 5024K 124K select 7 0:00 0.01% make 1050 root 1 23 0 13032K 468K select 0 0:06 0.00% sh 848 root 1 20 0 11400K 492K nanslp 5 0:00 0.00% cron 36198 root 1 20 0 13048K 396K select 7 0:00 0.00% sh