From: Eugene Grosbein
To: Garrett Wollman, freebsd-stable@freebsd.org
Subject: Re: 11.2-STABLE kernel wired memory leak
Date: Wed, 13 Feb 2019 01:14:54 +0700
Message-ID: <7125e053-5adf-929a-bde6-a64fceae2aaa@grosbein.net>
In-Reply-To: <201902121757.x1CHve0h056876@hergotha.csail.mit.edu>

13.02.2019 0:57, Garrett Wollman wrote:

> In article
> eugen@grosbein.net writes:
>
>> Long story short: 11.2-STABLE/amd64 r335757 leaked over 4600MB kernel
>> wired memory over 81 days uptime
>> out of 8GB total RAM.
>
> Not a whole lot of evidence yet, but anecdotally I'm seeing the same
> thing on some huge-memory NFS servers running releng/11.2. They seem
> to run fine for a few weeks, then mysteriously start swapping
> continuously, a few hundred pages a second. This continues for hours
> at a time, and then stops just as mysteriously. Over time the total
> memory dedicated to ZFS ARC goes down but there's no decrease in wired
> memory. I've tried disabling swap, but this seems to make the server
> unstable. I have yet to find any obvious commonality (aside from the
> fact that these are all large-memory NFS servers which don't do much
> of anything else -- the only software running on them is related to
> managing and monitoring the NFS service).

I have started to understand the issue. FreeBSD 11 has the uma(9) zone
allocator for kernel subsystems, and vmstat -z shows some statistics for
UMA zones.
When a subsystem that uses UMA frees its memory (including networking
mbufs or the ZFS ARC), some kernel memory blocks move from the USED to
the FREE category inside the corresponding UMA zone (see vmstat -z
again), but this memory stays unavailable to other consumers, including
userland applications, until pagedaemon reclaims this "FREE" memory back
to the global free pool. This part seems to be broken in 11.2-STABLE.

Use the following command to see how much memory is wasted in your case.
It prints, for each zone, the item SIZE multiplied by the FREE count, in
megabytes, largest first:

vmstat -z | tr : , | awk -F , '{printf "%10s %s\n", $2*$5/1024/1024, $1}' | sort -k1,1 -rn | head
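
For anyone who wants the same numbers in a script, here is a rough Python sketch of the calculation described above: item size times the FREE count per UMA zone, in megabytes, sorted descending. It assumes the usual vmstat -z layout ("name: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP"). The function name free_mb_by_zone and the SAMPLE text are made up for illustration; the sample figures are not from a real machine.

```python
# Sketch only: assumes "name: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP" rows.
# SAMPLE is invented illustrative data, not output from a real system.
SAMPLE = """ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               208,      0,     227,      10,     227,   0,   0
mbuf_cluster:          2048, 506316,    1024,  524288, 1000000,   0,   0
zio_buf_131072:      131072,      0,     100,   16384,  500000,   0,   0
"""


def free_mb_by_zone(vmstat_z_output):
    """Return (megabytes, zone_name) pairs for FREE items, largest first."""
    rows = []
    for line in vmstat_z_output.splitlines():
        # The zone name ends at the last colon; the header has no colon.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) < 4:
            continue
        try:
            size = int(fields[0])   # SIZE: bytes per item
            free = int(fields[3])   # FREE: cached items not in use
        except ValueError:
            continue
        rows.append((size * free / 1024 / 1024, name.strip()))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


# Print the ten largest zones, in the same two-column layout as the one-liner.
for megabytes, zone in free_mb_by_zone(SAMPLE)[:10]:
    print(f"{megabytes:10.1f} {zone}")
```

On a live system the text could come from running vmstat -z (for example via subprocess.run(["vmstat", "-z"], capture_output=True, text=True).stdout) instead of SAMPLE.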