From: Karl Denninger
Date: Sat, 20 Aug 2016 11:08:44 -0500
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS ARC under memory pressure
In-Reply-To: <20160820152225.GP83214@kib.kiev.ua>
Message-ID: <97f166f0-4d47-d5a3-ecb3-d15f1ecf9c1f@denninger.net>

On 8/20/2016 10:22, Konstantin Belousov wrote:
> On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote:
>> Paging *always* requires one I/O (to write the page(s) to the swap) and
>> MAY involve two (to later page it back in.)  It is never a "win" to
>> spend a *guaranteed* I/O when you can instead act in a way that *might*
>> cause you to (later) need to execute one.
> Why would pagedaemon need to write out clean page ?

If you are talking about the case of an executable in which part of the
text is evicted, you are correct. However, in that instance you are still
choosing to evict a page for which there will likely be a future demand,
and which will thus require an I/O (should that executable come back up
for execution), as opposed to one for which you have no idea how likely
future demand will be (a data page in the ARC.)

Since the ARC is opaque to the VM, other than its consumption of system
memory, the VM has no means of "coloring" the ARC as to how "useful"
(e.g. how often used, etc) a particular data item in it is, and so has no
information available on which to decide. However, the fact that an
executing process is in some sort of waiting state still likely trumps an
ARC data page in terms of likelihood of future access.

root@NewFS:/usr/src/sys/amd64/conf # pstat -s
Device               1K-blocks     Used    Avail Capacity
/dev/mirror/sw.eli    67108860   291356 66817504     0%

While this is not a large amount of page space used, I can assure you
that at no time since boot was all 32GB of memory in the machine consumed
with other-than-ARC data. As such, for the VM system to have decided to
evict pages to the swap file rather than pare back the ARC is
demonstrably wrong, since the result was the execution of I/Os on the
*speculative* bet that a page in the ARC would preferentially be
required.
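
As a rough illustration of the I/O accounting argued above, here is a minimal sketch in Python. It is a toy model only and does not describe how the FreeBSD pagedaemon or the ARC actually make decisions. The function names and the reuse probabilities are hypothetical, and the model assumes that a page sent to swap is dirty (so it must be written) and that an evictable ARC buffer is clean (so it can simply be dropped). The last few lines redo the arithmetic on the pstat figures quoted above.

```python
def expected_ios_swap_eviction(p_reuse: float) -> float:
    """Expected I/Os when a dirty page is pushed out to swap.

    One write is certain; a read follows only if the page is touched again.
    """
    return 1.0 + p_reuse


def expected_ios_arc_eviction(p_reuse: float) -> float:
    """Expected I/Os when a clean ARC buffer is dropped.

    Nothing is written; a read follows only if the data is wanted again.
    """
    return p_reuse


# Hypothetical reuse probabilities, chosen only for illustration.
p_process_page = 0.5   # chance the swapped-out page is needed again
p_arc_buffer = 0.9     # chance the dropped ARC buffer is needed again

swap_cost = expected_ios_swap_eviction(p_process_page)
arc_cost = expected_ios_arc_eviction(p_arc_buffer)
print(f"expected I/Os, evict to swap: {swap_cost:.2f}")
print(f"expected I/Os, shrink ARC:    {arc_cost:.2f}")

# Figures taken from the pstat -s output above (units are 1K blocks).
swap_total_kib = 67108860
swap_used_kib = 291356
swap_used_mib = swap_used_kib / 1024
swap_used_pct = 100.0 * swap_used_kib / swap_total_kib
print(f"swap in use: {swap_used_mib:.1f} MiB ({swap_used_pct:.2f}% of the device)")
```

Because a probability cannot exceed 1, the ARC figure in this model can never be higher than the swap figure, whatever values are plugged in; the best swap can do is tie. The model counts I/Os only and ignores latency, clustering of writes, and everything else a real VM system has to weigh.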

On 10.x, unpatched, there were fairly trivial "added" workload choices
that one might make on a routine basis (e.g. "make -j8 buildworld") on
this machine that, if you had a largish text file open in "vi", would
lead to user-perceived stalls exceeding 10 seconds, because that
process's working set had been evicted so as to keep ARC cache data!
While it might at first blush appear that the Postgres database consumers
on the same machine would be happy with this, when *their* RSS got paged
out and *they* took the resulting 10+ second stall as well, that
certainly was not the case!

11.x does exhibit far less pathology in this regard than did 10.x
(unpatched), and I've yet to see the "stall system to the point that it
appears it has crashed" behavior that I formerly could provoke with a
trivial test.

However, the fact remains that the same machine, with the same load,
running 10.x and my patches ran for months at a time with zero page space
consumed, a fully-utilized ARC and very little slack space (defined as
RAM in "Cache" + allocated-but-unused UMA); in other words, with no
displayed pathology at all.

The behavior of unpatched 11.x, while very materially better than
unpatched 10.x, IMHO does not meet this standard. In particular, there
are quite large quantities of UMA space out-but-unused on a regular
basis, and while *at present* the ARC looks pretty healthy, this is a
weekend when system load is quite low. During the week not only does the
UMA situation look far worse, so do the ARC size and efficiency, with the
ARC frequently winding up running at "half-mast" compared to where it
ought to be.

I believe FreeBSD 11.x can do better and intend to roll forward the 10.x
work in an attempt to implement that.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/