Date:      Sat, 20 Aug 2016 11:08:44 -0500
From:      Karl Denninger <karl@denninger.net>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS ARC under memory pressure
Message-ID:  <97f166f0-4d47-d5a3-ecb3-d15f1ecf9c1f@denninger.net>
In-Reply-To: <20160820152225.GP83214@kib.kiev.ua>
References:  <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <c3bc6c5a-961c-e3a4-2302-f0f7417bc34f@denninger.net> <20160819201840.GA12519@zxy.spb.ru> <bcb14d0b-bd6d-cb93-ea71-3656cfce8b3b@denninger.net> <20160820152225.GP83214@kib.kiev.ua>


On 8/20/2016 10:22, Konstantin Belousov wrote:
> On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote:
>> Paging *always* requires one I/O (to write the page(s) to the swap) and
>> MAY involve two (to later page it back in.)  It is never a "win" to
>> spend a *guaranteed* I/O when you can instead act in a way that *might*
>> cause you to (later) need to execute one.
> Why would pagedaemon need to write out clean page ?
If you are talking about the case of an executable in which part of the
text is evicted, you are correct: a clean page needs no write.  However,
in that instance you are still choosing to evict a page for which there
will likely be future demand, and thus an I/O (should that executable
come back up for execution), rather than a page whose likelihood of
future demand you know nothing about (a data page in the ARC.)

Since the ARC is opaque to the VM (other than its consumption of system
memory), the VM has no means of "coloring" it as to how "useful" (e.g.
how often used) a particular data item in the ARC is, and so it has no
information available on which to decide.  However, a page belonging to
an executing process, even one in some sort of waiting state, still
likely trumps an ARC data page in terms of likelihood of future access.
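
As a rough illustration of the I/O-counting argument above, here is a
minimal sketch in Python.  The function names and the reuse
probabilities are hypothetical, chosen only to show the shape of the
comparison; they are not measurements from this machine.

```python
def expected_ios_swap_out(p_reuse):
    """Evict a dirty anonymous page: one guaranteed write to swap,
    plus one read if the page is touched again."""
    return 1.0 + p_reuse


def expected_ios_drop_clean(p_reuse):
    """Drop a clean page (ARC data or executable text): no write now,
    one read from disk only if the page is needed again."""
    return p_reuse


# Hypothetical reuse probabilities, for illustration only.
p_anon = 0.5  # chance the swapped-out process page is touched again
p_arc = 0.9   # chance the dropped ARC page is read again

swap_cost = expected_ios_swap_out(p_anon)
arc_cost = expected_ios_drop_clean(p_arc)
print(f"swap out a process page: {swap_cost:.2f} expected I/Os")
print(f"drop a clean ARC page:   {arc_cost:.2f} expected I/Os")
```

Under this simple model the swap-out path costs at least one I/O and
the clean-drop path costs at most one, whatever probabilities are
plugged in.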

root@NewFS:/usr/src/sys/amd64/conf # pstat -s
Device          1K-blocks     Used    Avail Capacity
/dev/mirror/sw.eli  67108860   291356 66817504     0%

While this is not a large amount of page space used, I can assure you
that at no time since boot was all 32GB of memory in the machine
consumed by other-than-ARC data.  As such, the VM system's decision to
evict pages to the swap file rather than pare back the ARC is
demonstrably wrong, since the result was the execution of I/Os on the
*speculative* bet that a page in the ARC would preferentially be
required.
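
For scale, the pstat -s figures above can be converted as follows.  This
is only a small arithmetic sketch; the helper name swap_usage is
hypothetical, and the two numbers are copied from the
/dev/mirror/sw.eli line.

```python
def swap_usage(total_kb, used_kb):
    """Return (used MiB, used percent) for a swap device, given the
    1K-block counts that `pstat -s` reports."""
    used_mib = used_kb / 1024
    used_pct = 100.0 * used_kb / total_kb
    return used_mib, used_pct


# Values copied from the /dev/mirror/sw.eli line of the pstat output.
mib, pct = swap_usage(total_kb=67108860, used_kb=291356)
print(f"{mib:.1f} MiB paged out ({pct:.2f}% of swap)")
```

The "0%" in the Capacity column is a rounded figure; the Used column
shows that a few hundred MiB were in fact written to swap.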

On 10.x, unpatched, there were fairly trivial "added" workloads that one
might run on a routine basis (e.g. "make -j8 buildworld") on this
machine that, if you had a largish text file open in "vi", would lead to
user-perceived stalls exceeding 10 seconds, during which that process's
working set had been evicted so as to keep ARC cache data!  It might at
first blush appear that the Postgres database consumers on the same
machine would be happy with this, but *their* RSS got paged out and
*they* took the resulting 10+ second stall as well, so that certainly
was not the case!

11.x does exhibit far less pathology in this regard than did 10.x
(unpatched) and I've yet to see the "stall system to the point that it
appears it has crashed" behavior that I formerly could provoke with a
trivial test.

However, the fact remains that the same machine, with the same load,
running 10.x and my patches ran for months at a time with zero page
space consumed, a fully-utilized ARC and very little slack space
(defined as RAM in "Cache" + allocated-but-unused UMA)  -- in other
words, with no displayed pathology at all.
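
The "slack space" definition above can be written out as a small
calculation.  This is a sketch under stated assumptions: the function
name, the 4096-byte page size and every number in the example are
hypothetical, and the (item size, free items) pairs are assumed to
correspond to the SIZE and FREE columns of "vmstat -z".  Multiplying
the two ignores per-slab overhead, so treat the result as an
approximation.

```python
def slack_bytes(cache_pages, page_size, uma_zones):
    """Slack = RAM in "Cache" + allocated-but-unused UMA.

    uma_zones is a list of (item_size_bytes, free_items) pairs.
    """
    cache = cache_pages * page_size
    uma_unused = sum(size * free for size, free in uma_zones)
    return cache + uma_unused


# Hypothetical figures for illustration only, not from this machine.
zones = [(256, 4000), (4096, 1000), (131072, 50)]
total = slack_bytes(cache_pages=2048, page_size=4096, uma_zones=zones)
print(f"slack: {total / 2**20:.1f} MiB")
```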

The behavior of unpatched 11.x, while very materially better than
unpatched 10.x, IMHO does not meet this standard.  In particular, there
are quite large quantities of UMA space out-but-unused on a regular
basis, and while *at present* the ARC looks pretty healthy, this is a
weekend when system load is quite low.  During the week not only does
the UMA situation look far worse, so do the ARC size and efficiency,
which frequently wind up running at "half-mast" compared to where they
ought to be.

I believe FreeBSD 11.x can do better and intend to roll forward the 10.x
work in an attempt to implement that.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/



