From: Peter Wemm
To: Steven Hartland
Cc: src-committers@freebsd.org, Alan Cox, svn-src-all@freebsd.org,
    Dmitry Morozovsky, "Matthew D. Fuller", svn-src-head@freebsd.org
Subject: Re: svn commit: r270759 - in head/sys:
    cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys
    cddl/contrib/opensolaris/uts/common/fs/zfs vm
Date: Fri, 29 Aug 2014 14:20:44 -0700
Message-ID: <2714752.cWQfguSlQD@overcee.wemm.org>
In-Reply-To: <0B77E782B5004AEBA77E6A5D16924D83@multiplay.co.uk>
References: <201408281950.s7SJo90I047213@svn.freebsd.org>
    <64121723.0IFfex9X4X@overcee.wemm.org>
    <0B77E782B5004AEBA77E6A5D16924D83@multiplay.co.uk>

On Friday 29 August 2014 21:42:15 Steven Hartland wrote:
> ----- Original Message -----
> From: "Peter Wemm"
>
> > On Friday 29 August 2014 20:51:03 Steven Hartland wrote:
> snip..
>
> > > Does Karl's explanation as to why this doesn't work above change
> > > your mind?
> >
> > Actually no, I would expect the code as committed would *cause* the
> > undesirable behavior that Karl described.
> >
> > ie: access a few large files and cause them to reside in cache. Say
> > 50GB or so on a 200G ram machine. We now have the state where:
> >
> > v_cache = 50GB
> > v_free  = 1MB
> >
> > The rest of the vm system looks at vm_paging_needed(), which asks:
> > are we short of "v_cache + v_free"? Since there's 50.001GB free,
> > the answer is no. It'll let v_free run right down to v_free_min
> > because of the giant pool of v_cache just sitting there, waiting to
> > be used.
> >
> > The zfs change, as committed, will ignore all the free memory in the
> > form of v_cache.. and will be freaking out about how low v_free is
> > getting and will be sacrificing ARC in order to put more memory into
> > the v_free pool.
> >
> > As long as ARC keeps sacrificing itself this way, the free pages in
> > the v_cache pool won't get used. When ARC finally runs out of pages
> > to give up to v_free, the kernel will start using the free pages
> > from v_cache.
> > Eventually it'll run down that v_cache free pool and ARC will be in
> > a bare minimum state while this is happening.
> >
> > Meanwhile, ZFS ARC will be crippled. This has consequences - it does
> > RCU like things from ARC to keep fragmentation under control. With
> > ARC crippled, fragmentation will increase because there's less
> > opportunistic gathering of data from ARC.
> >
> > Granted, you have to get things freed from active/inactive to the
> > cache state, but once it's there, depending on the workload, it'll
> > mess with ARC.
>
> There's already a vm_paging_needed() check in there below so this will
> already be dealt with will it not?

No.

If you read the code that you changed, you won't get that far. The
v_free test comes before vm_paging_needed(), and if the v_free test
triggers then ARC will return pages and not look at the rest of the
function.

If this function returns non-zero, ARC is given back:

static int
arc_reclaim_needed(void)
{

	if (kmem_free_count() < zfs_arc_free_target) {
		return (1);
	}

	/*
	 * Cooperate with pagedaemon when it's time for it to scan
	 * and reclaim some pages.
	 */
	if (vm_paging_needed()) {
		return (1);
	}

ie: if v_free (ignoring v_cache free pages) gets below the threshold,
stop everything and discard ARC pages.

The vm_paging_needed() code is a NO-OP at this point. It can never
return true. Consider:

vm_cnt.v_free_target = 4 * vm_cnt.v_free_min + vm_cnt.v_free_reserved;
vs
vm_pageout_wakeup_thresh = (vm_cnt.v_free_min / 10) * 11;

zfs_arc_free_target defaults to vm_cnt.v_free_target, which is over 400%
of v_free_min, and compares it against the smaller v_free pool.

vm_paging_needed() compares the total free pool (v_free + v_cache)
against the smaller wakeup threshold - 110% of v_free_min.

The second test compares a larger value against a smaller threshold than
the first test did, so whenever the first test has not already returned,
the second cannot be true, unless you manually change the
arc_free_target sysctl.
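
To make the ordering argument concrete, here is a minimal user-space
sketch of the two comparisons. It is not the kernel code: struct
vm_model, first_test(), second_test() and count_second_only() are
hypothetical names made up for illustration, and the page counts passed
in are arbitrary. Only the two threshold formulas are taken from the
lines quoted above.

```c
/* Hypothetical stand-in for the relevant vm_cnt fields, in pages. */
struct vm_model {
	unsigned long v_free_min;
	unsigned long v_free_reserved;
	unsigned long v_free_count;
	unsigned long v_cache_count;
};

/* Models the first test: v_free alone against v_free_target. */
static int
first_test(const struct vm_model *m)
{
	unsigned long target = 4 * m->v_free_min + m->v_free_reserved;

	return (m->v_free_count < target);
}

/* Models the second test: v_free + v_cache against the wakeup threshold. */
static int
second_test(const struct vm_model *m)
{
	unsigned long thresh = (m->v_free_min / 10) * 11;

	return (m->v_free_count + m->v_cache_count < thresh);
}

/*
 * Sweep v_free and v_cache over [0, max_pages] in increments of step and
 * count the states where the second test fires but the first does not.
 */
static unsigned long
count_second_only(unsigned long v_free_min, unsigned long v_free_reserved,
    unsigned long max_pages, unsigned long step)
{
	struct vm_model m = { v_free_min, v_free_reserved, 0, 0 };
	unsigned long hits = 0;

	if (step == 0)
		step = 1;	/* avoid an endless loop on a zero step */
	for (m.v_free_count = 0; m.v_free_count <= max_pages;
	    m.v_free_count += step) {
		for (m.v_cache_count = 0; m.v_cache_count <= max_pages;
		    m.v_cache_count += step) {
			if (!first_test(&m) && second_test(&m))
				hits++;
		}
	}
	return (hits);
}
```

Calling count_second_only(1000, 200, 6000, 50) sweeps a grid of states
with a made-up v_free_min of 1000 pages. In this model the second test
never fires on its own, and the 50GB-cache / 1MB-free state described
earlier (13107200 and 256 pages at a 4K page size) trips the first test
only.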

Also, what about the magic numbers here:

u_int zfs_arc_free_target = (1 << 19); /* default before pagedaemon init only */

That's half a million pages, or 2GB of physical ram on a 4K page size
system. How is this going to work on early boot in the machines in the
cluster with less than 2GB of ram?

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…