From: Ben Kelly <ben@wanderview.com>
Date: Tue, 28 Sep 2010 18:01:21 -0400
To: Andriy Gapon
Cc: stable@freebsd.org, Willem Jan Withagen, fs@freebsd.org, Jeremy Chadwick
Subject: Re: Still getting kmem exhausted panic

On Sep 28, 2010, at 5:30 PM, Andriy Gapon wrote:

<< snipped lots of good info here... probably won't have time to look at it in detail until the weekend >>

>> there seems to be a layering violation in that the buffer cache signals
>> directly to the upper page daemon layer to trigger page reclamation.)
>
> Umm, not sure if that is a fact.

I was referring to the code in vfs_bio.c that used to twiddle vm_pageout_deficit directly. That seems to have been replaced with a call to vm_page_grab().

>> The old (ancient) patch I tried previously to help reduce the arc working set
>> and allow it to shrink is here:
>>
>> http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff
>>
>> Unfortunately, there are a couple ideas on fighting fragmentation mixed into
>> that patch. See the part about arc_reclaim_pages(). This patch did seem to
>> allow my arc to stay under the target maximum even when under load that
>> previously caused the system to exceed the maximum. When I update this
>> weekend I'll try a stripped down version of the patch to see if it helps or
>> not with the latest zfs.
>>
>> Thanks for your help in understanding this stuff!
>
> The patch seems good, especially the part about taking into account the kmem
> fragmentation. But it also seems to be heavily tuned towards "tiny ARC" systems
> like yours, so I am not sure yet how suitable it is for "mainstream" systems.

Thanks. Yea, there is a lot of aggressive tuning there. In particular, the slow growth algorithm is somewhat dubious. What I found, though, was that the fragmentation jumped whenever the arc was reduced in size, so the slow growth was an attempt to make the arc size approach peak load gradually without overshooting (a rough sketch of the idea is below).

A better long-term solution would probably be to enhance UMA to support custom slab sizes on a zone-by-zone basis. That way all zfs/arc allocations could use 128k slabs (at a memory-efficiency penalty, of course). I prototyped this with a dumbed down block pool allocator at one point and was able to avoid most, if not all, of the fragmentation. Adding the support to UMA seemed non-trivial, though.
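To make the slow growth idea above a bit more concrete, here is a toy userland sketch. The names, the step size, and all the numbers are made up for illustration; this is not the code from the patch, just the shape of the approach:

/*
 * Toy sketch of "slow growth": instead of letting the arc target jump
 * straight to the demand peak (and then fragment kmem when it is forced
 * back down), nudge the target toward demand in small capped steps.
 * All names and constants here are illustrative only.
 */
#include <stdio.h>
#include <stdint.h>

#define GROWTH_STEP	(4ULL * 1024 * 1024)	/* grow at most 4 MB per pass */

static uint64_t arc_target = 64ULL * 1024 * 1024;	/* current target */

/* Called periodically; moves the target toward demand without overshoot. */
static void
arc_slow_grow(uint64_t demand, uint64_t arc_max)
{
	if (demand > arc_target) {
		uint64_t step = demand - arc_target;
		if (step > GROWTH_STEP)
			step = GROWTH_STEP;	/* cap growth per pass */
		arc_target += step;
	}
	if (arc_target > arc_max)
		arc_target = arc_max;		/* never exceed the hard cap */
}

int
main(void)
{
	/* Simulate a demand spike to 200 MB against a 128 MB cap. */
	for (int pass = 0; pass < 20; pass++) {
		arc_slow_grow(200ULL * 1024 * 1024, 128ULL * 1024 * 1024);
		printf("pass %2d: target = %3ju MB\n", pass,
		    (uintmax_t)(arc_target >> 20));
	}
	return (0);
}

The target climbs a few MB per pass and stops exactly at the cap, so a transient spike never drags the arc above a size it would later have to shrink back from.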
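The block pool idea, stripped to its core, looks something like the following. Again this is a toy userland sketch, not the actual prototype; malloc() stands in for grabbing a kmem slab, and the sizes are just examples:

/*
 * Toy sketch of a dumbed down block pool: carve fixed-size blocks out of
 * uniform 128k slabs and recycle them through a free list, so allocations
 * and frees always stay within large, uniformly sized chunks of address
 * space instead of fragmenting kmem.  Slabs are never returned here.
 */
#include <stdlib.h>
#include <stdio.h>

#define SLAB_SIZE	(128 * 1024)

struct block {			/* free blocks are chained through */
	struct block *next;	/* their own first word */
};

struct pool {
	size_t	      block_size;	/* fixed size served by this pool */
	struct block *free_list;	/* blocks available for reuse */
};

/* Carve a fresh 128k slab into blocks and push them on the free list. */
static int
pool_refill(struct pool *p)
{
	char *slab = malloc(SLAB_SIZE);	/* stand-in for a kmem slab */

	if (slab == NULL)
		return (-1);
	for (size_t off = 0; off + p->block_size <= SLAB_SIZE;
	    off += p->block_size) {
		struct block *b = (struct block *)(slab + off);

		b->next = p->free_list;
		p->free_list = b;
	}
	return (0);
}

static void *
pool_alloc(struct pool *p)
{
	struct block *b;

	if (p->free_list == NULL && pool_refill(p) != 0)
		return (NULL);
	b = p->free_list;
	p->free_list = b->next;
	return (b);
}

static void
pool_free(struct pool *p, void *v)
{
	struct block *b = v;

	b->next = p->free_list;	/* recycle instead of returning to kmem */
	p->free_list = b;
}

int
main(void)
{
	struct pool p = { .block_size = 16 * 1024, .free_list = NULL };
	void *a = pool_alloc(&p);
	void *b = pool_alloc(&p);

	pool_free(&p, a);
	void *c = pool_alloc(&p);	/* should reuse a's block */
	printf("a=%p b=%p c=%p (c reuses a: %s)\n", a, b, c,
	    c == a ? "yes" : "no");
	pool_free(&p, b);
	pool_free(&p, c);
	return (0);
}

The point being that since every slab is the same 128k and every block inside it the same size, a freed block is always exactly reusable by the next allocation, which is why most of the fragmentation went away.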
Thanks again for the information. I hope to get a chance to look at the code this weekend.

- Ben