From owner-freebsd-stable@FreeBSD.ORG Thu Mar 27 11:52:49 2014
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A2D59A51
 for ; Thu, 27 Mar 2014 11:52:49 +0000 (UTC)
Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "NewFS.denninger.net", Issuer "NewFS.denninger.net" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 4AE4C6C5
 for ; Thu, 27 Mar 2014 11:52:48 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
 by fs.denninger.net (8.14.8/8.14.8) with ESMTP id s2RBqhSu004142
 for ; Thu, 27 Mar 2014 06:52:43 -0500 (CDT)
 (envelope-from karl@denninger.net)
Received: from [127.0.0.1] (TLS/SSL) [192.168.1.40] by Spamblock-sys (LOCAL/AUTH);
 Thu Mar 27 06:52:43 2014
Message-ID: <53341106.4060101@denninger.net>
Date: Thu, 27 Mar 2014 06:52:38 -0500
From: Karl Denninger
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
References: <201403261230.s2QCU3vI095105@freefall.freebsd.org> <8659e58b9fabd9f553c8be5da5dc61fd@mail.mikej.com>
In-Reply-To: <8659e58b9fabd9f553c8be5da5dc61fd@mail.mikej.com>
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha1; boundary="------------ms060209040607030606090406"
X-Antivirus: avast!
 (VPS 140326-2, 03/26/2014), Outbound message
X-Antivirus-Status: Clean
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 27 Mar 2014 11:52:49 -0000

This is a cryptographically signed message in MIME format.

--------------ms060209040607030606090406
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable

On 3/27/2014 4:11 AM, mikej wrote:
> I've been running the latest patch now on r263711 and want to give it
> a +1
>
> No ZFS knobs set and I must go out of my way to have my system swap.
>
> I hope this patch gets a much wider review and can be put into the
> tree permanently.
>
> Karl, thanks for the working on this.
>
> Regards,
>
> Michael Jung

No problem; I was being driven insane by the stalls and related bad
behavior... and there's that old saw about complaining about something
without proposing a fix for it (I've done it!) being "less than optimum"
so.... :-)

Hopefully wider review (and, if the general consensus is similar to what
I've seen here and what you're reporting as well, inclusion in the
codebase) will come.

On my sandbox system I have to get truly abusive before I can get the
system to swap now, but that load is synthetic and we all know what
sometimes happens when you try to extrapolate from synthetic loads to
real production ones.

What really has my attention is the impact on systems running live
production loads.

It has entirely changed the character of those machines, working
equally-well for both pure ZFS machines and mixed UFS/ZFS systems.
One of these systems that gets pounded on pretty good and has a
moderately-large configuration (~10TB of storage, 2 Xeon quad-core
processors and 24GB of RAM serving a combination of Samba users
internally, a decently-large Postgres installation supporting an
externally-facing web forum and blog application, email and similar
things) has been completely transformed from being "frequently
challenged" by its workload to literally loafing 90%+ of the day.  DBMS
response times have seen their standard deviation drop by an order of
magnitude, with best-response times down for one of the most-common
query sequences (~30 separate ops) from ~180ms to ~140ms.

This particular machine has a separate pool for the system itself (root,
usr and var) which was formerly UFS because it had to be in order to
avoid the worst of the "stall" bad behavior.  It also has two other
pools on it, one for read-nearly-only data sets that are composed of
very large files that are almost archival in character and a second that
has the system's "working set" on it.  The latter has a separate intent
log; I had a cache SSD drive on it as well but have recently dropped
that because with these changes it no longer produces a material
improvement in performance.  I'm frankly not sure the intent log is
helping any more either but I've yet to drop it and instrument the
results -- it used to be *necessary* to avoid nasty problems during busy
periods.

I now have that machine set up booting from ZFS with the system on a
mirrored pool dedicated to system images, with lz4 *and* dedup on (for
that filesystem's root), which allows me to clone it almost instantly,
start a jail on the clone and then do a "buildworld buildkernel -j8"
while only allocating storage to actual changes.
Dedup ratio on that mirror set is 1.4x and lz4 is showing a net
compression ratio of 2.01x.  Even better, I cannot provoke misbehavior by
doing this sort of thing during the middle of the day where formerly
that was just begging for trouble; the impact on user-perceptible
performance during it is zero although I can see the degradation in
performance (a modest increase in system latency) in the stats.

Oh, did I mention that everything except the boot/root/usr/var
filesystems (including swap) is geli-encrypted on this machine as well
and that the nightly PC backup jobs bury the GIG-E interface to which
they're attached -- and sustain that performance against the ZFS disks
for the duration?  (The machine does have AESNI loaded....)

Finally, swap allocation remains at zero throughout all of this.

At present, coming off the overnight period (which has an activity spike
for routine in-house backup activity from connected PCs but is otherwise
the "low point" of activity), the machine shows 1GB of free memory and an
"auto-tuned" ARC of 12.9GB (with a maximum size of 22.3GB), and inactive
pages have remained stable.  Wired memory is almost 19GB with Postgres
using a sizable chunk of it.  Cache efficiency is claimed to be 98.9% (!)
That'll go down somewhat over the day but during the busiest part of the
day it remains well into the 90s which I'm sure has a heck of a lot to
do with the performance improvements....

Cross-posted over to -STABLE in the hope of expanding review and testing
by others.
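
For readers who want to compare the ARC figures above against their own
machines, here is a minimal Python sketch. It rests on two assumptions that
the message itself does not confirm: that the counters are exposed as
kstat.zfs.misc.arcstats.hits, .misses, .size and .c_max on the reader's
FreeBSD version, and that "cache efficiency" means hits / (hits + misses).
The function names are illustrative only.

```python
import subprocess

# Assumed sysctl names; check `sysctl kstat.zfs.misc.arcstats` on your system.
ARC_OIDS = [
    "kstat.zfs.misc.arcstats.hits",
    "kstat.zfs.misc.arcstats.misses",
    "kstat.zfs.misc.arcstats.size",
    "kstat.zfs.misc.arcstats.c_max",
]


def parse_sysctl(text):
    """Parse 'name: value' lines from sysctl output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def hit_ratio(hits, misses):
    """Fraction of ARC lookups served from cache (assumed efficiency metric)."""
    total = hits + misses
    return hits / total if total else 0.0


def fill_fraction(size, c_max):
    """Current ARC size as a fraction of its configured maximum."""
    return size / c_max if c_max else 0.0


def read_arcstats():
    """Query the live counters; only meaningful on a host that has them."""
    out = subprocess.run(
        ["sysctl"] + ARC_OIDS, capture_output=True, text=True, check=True
    ).stdout
    return parse_sysctl(out)
```

On a suitable host, calling read_arcstats() and passing the hits and misses
values to hit_ratio() gives a number comparable in spirit to the 98.9% quoted
above, and fill_fraction() relates the current ARC size to its maximum in the
same way as the 12.9GB and 22.3GB figures.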

--
-- Karl
karl@denninger.net

--------------ms060209040607030606090406--