Date:      Thu, 22 May 2014 07:52:03 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-fs@freebsd.org
Subject:   Re: Turn off RAID read and write caching with ZFS?
Message-ID:  <537DF2F3.10604@denninger.net>
In-Reply-To: <719056985.20140522033824@supranet.net>
References:  <719056985.20140522033824@supranet.net>

On 5/22/2014 5:38 AM, Jeff Chan wrote:
> As mentioned before we have a server with the LSI 2208 RAID chip which
> apparently doesn't seem to have HBA firmware available.  (If anyone
> knows of one, please let me know.)  Therefore we are running each drive
> as a separate, individual RAID0, and we've turned off the RAID hardware
> read and write caching on the claim that it performs better with ZFS,
> such as:
>
>
> http://forums.freenas.org/index.php?threads/disable-cache-flush.12253/
>
> " cyberjock, Apr 7, 2013
>
>      Ah. You have a RAID controller with on-card RAM. Based on my
> testing with 3 different RAID controllers that had RAM, in both
> benchmark and real-world tests, here are my recommended settings for
> ZFS users:
>
>      1. Disable your on-card write cache. Believe it or not, this
> improves write performance significantly. I was very disappointed with
> this choice, but it seems to be a universal truth. I upgraded one of
> the cards to 4GB of cache a few months before going to ZFS and I'm
> disappointed that I wasted my money. It helped a LOT on the Windows
> server, but in FreeBSD it's a performance killer. :(
>
>      2. If your RAID controller supports read-ahead cache, you should
> set it to either "disabled", the most "conservative" (smallest
> read-ahead), or "normal" (medium-size read-ahead). I found that
> "conservative" was better for random reads from lots of users and
> "normal" was better for things where you were constantly reading a
> file in order (such as copying a single very large file). If you choose
> anything else for the read-ahead size, the latency of your zpool will
> go way up, because any read by the zpool will be multiplied 100x as
> the RAID card constantly reads a bunch of sectors before and after the
> one sector or area requested."
>
>
>
> Does anyone have any comments or test results about this?  I have not
> attempted to test it independently.  Should we run with RAID hardware
> caching on or off?
>
That's mostly right.

Write-caching is very evil in a ZFS world, because ZFS checksums each
block.  If the filesystem gets back an "OK" for a block that is not
actually on the disk, ZFS will presume the checksum is good.  If that
assumption proves to be false down the road, you're going to have a very
bad day.
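
If you're driving the LSI 2208 with FreeBSD's mfi(4), the cache policy
can be set per volume with mfiutil(8).  A minimal sketch, assuming
one-drive RAID0 volumes named mfid0, mfid1, and so on (the names are
examples; "mfiutil show volumes" lists yours):

   # show the current cache policy for this volume
   mfiutil cache mfid0
   # don't return "OK" until the data is actually on the platter
   mfiutil cache mfid0 write-through
   # turn off the drive's own on-board write cache as well
   mfiutil cache mfid0 write-cache disable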

READ caching is not so simple.  The problem that comes about is that in
order to obtain the best speed from a spinning piece of rust you must
read whole tracks.  If you don't, you take a latency penalty every time
you want a sector, because you must wait for the rust to pass under the
head.  If you read a single sector and then come back to read a second
one, inter-sector gap sync is lost and you get to wait for another
rotation.
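
To put a number on that penalty, here is a back-of-the-envelope figure
assuming a garden-variety 7,200 RPM drive (the spindle speed is my
assumption, not something the controller reports):

   # one full rotation in milliseconds: (60 s/min * 1000 ms/s) / 7200 rev/min
   echo "scale=2; 60 * 1000 / 7200" | bc    # -> 8.33

Each missed inter-sector sync thus costs up to ~8.3 ms, versus the few
tens of microseconds the sector transfer itself takes.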

Therefore what you WANT for spinning rust in virtually all cases is for
all reads coming off the rust to be one full **TRACK** in size.  If you
wind up only using one sector of that track you still don't get hurt
materially, because you had to wait for the rotational latency anyway as
soon as you move the head.

Unfortunately, figuring this out with the sort of certainty you need to
best-optimize a workload stopped being easy in the disk drive world
quite a long time ago.  It used to be that ST506-style drives had 17
sectors per track and RLL 2,7 ones had 26.  Then areal density became
the limit and variable geometry showed up, frustrating any operating
system (or disk controller!) that tried, at the driver level, to issue
one DMA command per physical track in an attempt to capitalize on the
fact that all but the first sector read in a given rotation were
essentially "free".

Modern drives typically try to compensate for their variable-geometryness
through their own read-ahead cache, but the exact details of their
algorithms are not exposed.
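
You can at least peek at what the drive itself claims its cache is
doing.  A sketch, assuming the controller exposes the disks as
pass-through da(4) devices (da0 is an example name; drives hidden
entirely behind the RAID firmware may not be reachable this way):

   # dump SCSI mode page 8 (caching control); WCE is the write-cache
   # enable bit, RCD the read-cache-disable bit
   camcontrol modepage da0 -m 8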

What I would love to find is a "buffered" controller that recognizes all
of this and works as follows:

1. Writes, when committed, are committed, and no return is made until
storage has written the data and claims it's on the disk.  If the
sector(s) written are in the buffer memory (from a previous read in 2
below) then the write physically alters both the disk AND the buffer.

2. Reads are always one full track in size and go into the buffer memory
on an LRU basis.  A read for a sector already in the buffer memory
results in no physical I/O taking place.  The controller does not store
sectors per se in the buffer; it stores tracks.  This requires that the
adapter be able to discern the *actual* underlying geometry of the drive
so it knows where track boundaries are.  Yes, I know drive caches
themselves try to do this, but how well do they manage?  Evidence
suggests that it's not particularly effective.

Without this, read cache is a crapshoot that gets difficult to tune and
is very workload-dependent in terms of what delivers the best
performance.  All you can do is tune (if you're able to with a given
controller) and test.
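
If you do test, a rough sweep is easy to script.  A sketch, again
assuming mfiutil(8); the volume name is an example, and the read-ahead
keywords your firmware accepts may differ (see mfiutil(8) for the list):

   # try each read-ahead policy and time a large sequential raw read
   for ra in none adaptive always; do
       mfiutil cache mfid0 read-ahead $ra
       dd if=/dev/mfid0 of=/dev/null bs=1m count=4096    # ~4 GB
   done

Repeat with your actual workload (and a random-read test) before
settling on anything; the sequential number alone will flatter the
larger read-ahead settings.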


-- 
-- Karl
karl@denninger.net


