Date:      Tue, 7 May 2019 08:46:26 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: ZFS...
Message-ID:  <a82bfabe-a8c3-fd9a-55ec-52530d4eafff@denninger.net>
In-Reply-To: <A535026E-F9F6-4BBA-8287-87EFD02CF207@sorbs.net>
References:  <30506b3d-64fb-b327-94ae-d9da522f3a48@sorbs.net> <CAOtMX2gf3AZr1-QOX_6yYQoqE-H%2B8MjOWc=eK1tcwt5M3dCzdw@mail.gmail.com> <56833732-2945-4BD3-95A6-7AF55AB87674@sorbs.net> <3d0f6436-f3d7-6fee-ed81-a24d44223f2f@netfence.it> <17B373DA-4AFC-4D25-B776-0D0DED98B320@sorbs.net> <70fac2fe3f23f85dd442d93ffea368e1@ultra-secure.de> <70C87D93-D1F9-458E-9723-19F9777E6F12@sorbs.net> <CAGMYy3tYqvrKgk2c==WTwrH03uTN1xQifPRNxXccMsRE1spaRA@mail.gmail.com> <5ED8BADE-7B2C-4B73-93BC-70739911C5E3@sorbs.net> <d0118f7e-7cfc-8bf1-308c-823bce088039@denninger.net> <2e4941bf-999a-7f16-f4fe-1a520f2187c0@sorbs.net> <20190430102024.E84286@mulder.mintsol.com> <41FA461B-40AE-4D34-B280-214B5C5868B5@punkt.de> <20190506080804.Y87441@mulder.mintsol.com> <08E46EBF-154F-4670-B411-482DCE6F395D@sorbs.net> <33D7EFC4-5C15-4FE0-970B-E6034EF80BEF@gromit.dlib.vt.edu> <A535026E-F9F6-4BBA-8287-87EFD02CF207@sorbs.net>

On 5/7/2019 00:02, Michelle Sullivan wrote:
> The problem I see with that statement is that the zfs dev mailing lists
> constantly and consistently following the line of, the data is always
> right there is no need for a “fsck” (which I actually get) but it’s
> used to shut down every thread... the irony is I’m now installing
> windows 7 and SP1 on a usb stick (well it’s actually installed, but sp1
> isn’t finished yet) so I can install a zfs data recovery tool which
> reports to be able to “walk the data” to retrieve all the files...  the
> irony eh... install windows7 on a usb stick to recover a FreeBSD
> installed zfs filesystem...  will let you know if the tool works, but
> as it was recommended by a dev I’m hopeful... have another array (with
> zfs I might add) loaded and ready to go... if the data recovery is
> successful I’ll blow away the original machine and work out what OS and
> drive setup will be safe for the data in the future.  I might even put
> FreeBSD and zfs back on it, but if I do it won’t be in the current
> Zraid2 config.

Meh.

Hardware failure is, well, hardware failure.  Yes, power-related
failures are hardware failures.

Never mind the potential for /software/ failures.  Bugs are, well,
bugs.  And they're a real thing.  Never had the shortcomings of UFS bite
you on an "unexpected" power loss?  Well, I have.  Is ZFS absolutely
safe against any such event?  No, but it's safe*r*.

I've yet to have ZFS lose an entire pool due to something bad happening,
but the same basic risk (entire filesystem being gone) has occurred more
than once in my IT career with other filesystems -- including UFS, lowly
MS-DOS and NTFS, never mind their predecessors all the way back to floppy
disks and the first 5MB Winchesters.

I learned a long time ago that two is one and one is none when it comes
to data, and WHEN two becomes one you SWEAT, because that second failure
CAN happen at the worst possible time.

As for RaidZ2 vs. mirrored, it's not as simple as you might think.
Mirrored vdevs can only lose one member per mirror set, unless you use
three-member mirrors.  That sounds insane but actually it isn't in
certain circumstances, such as very-read-heavy and high-performance-read
environments.

The short answer is that a 2-way mirrored set is materially faster on
reads but has no acceleration on writes, and can lose one member per
mirror.  If the SECOND one fails before you can resilver (and that
resilver takes quite a long while if the disks are large) you're dead.
However, if you do six drives as a 2x3 way mirror (that is, 3 vdevs each
of a 2-way mirror) you now have three parallel data paths going at once
and potentially six for reads -- and performance is MUCH better.  A
3-way mirror can lose two members (and six drives could be organized as
3x2, that is, 2 vdevs each of a 3-way mirror) but obviously requires
lots of drive slots and 3x as much *power* per gigabyte stored (and you
pay for power twice: once to buy it and again to get the heat out of the
room where the machine is).
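
The resilver-window risk described above can be put in rough numbers.
What follows is only a sketch under loud assumptions: drives fail
independently at a constant rate, and the 3% annualized failure rate and
the resilver durations are hypothetical placeholders, not measurements.
Real failures tend to be correlated (same batch, same stress during the
resilver), so treat the result as a likely underestimate.

```python
import math


def p_failure_during(window_hours: float, afr: float, drives: int = 1) -> float:
    """Probability that at least one of `drives` fails within the window.

    Assumes independent failures at a constant rate derived from `afr`,
    the annualized failure rate of a single drive (0.03 means 3%).
    """
    hours_per_year = 365 * 24
    rate_per_hour = -math.log(1.0 - afr) / hours_per_year
    return 1.0 - math.exp(-rate_per_hour * drives * window_hours)


if __name__ == "__main__":
    afr = 0.03  # hypothetical annualized failure rate
    # In a degraded 2-way mirror only the surviving partner matters.
    for hours in (6, 24, 72):  # hypothetical resilver durations
        risk = p_failure_during(hours, afr, drives=1)
        print(f"resilver {hours:3d}h: chance the partner dies = {risk:.5%}")
```

The point of the sketch is the shape, not the digits: the exposure grows
with the length of the resilver, which is why large disks make the
second failure more worrying.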

RaidZ2 can also lose 2 drives without being dead.  However, it doesn't
get any of the read performance improvement *and* takes a write
performance penalty; Z2 has more write penalty than Z1 since it has to
compute and write two parity entries instead of one, although in theory
at least it can parallelize those parity writes -- albeit at the cost of
drive bandwidth congestion (e.g. interfering with other accesses to the
same disk at the same time).  In short, RaidZx performs about as "well"
as the *slowest* disk in the set.  So why use it (particularly Z2) at
all?  Because for "N" drives you get the protection of a 3-way mirror
and *much* more storage.  A six-member RaidZ2 setup returns ~4TB of
usable space (taking 1TB drives as the example), where 2-way mirrors
return 3TB and 3-way mirrors (which provide the same protection against
drive failure as Z2) give you only *half* the storage of Z2.  IMHO
ordinary RaidZ isn't worth the trade-offs, but Z2 frequently is.
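
The six-drive comparison above is simple arithmetic, and a short sketch
makes it easy to redo for other drive counts.  The `Layout` class and
its field names are made up for illustration, the drive size is assumed
to be 1TB, and the figures are nominal (no allowance for metadata,
padding or reserved space).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    vdevs: int            # top-level vdevs in the pool
    drives_per_vdev: int  # members in each vdev
    redundancy: int       # members each vdev can lose and keep running


def usable_tb(layout: Layout, drive_tb: float) -> float:
    """Nominal usable space: data-bearing members times drive size."""
    data_members = layout.drives_per_vdev - layout.redundancy
    return layout.vdevs * data_members * drive_tb


# Six drives arranged three ways (the names are illustrative labels only).
SIX_DRIVE_LAYOUTS = [
    Layout("RaidZ2, one 6-drive vdev", 1, 6, 2),
    Layout("2-way mirrors, three vdevs", 3, 2, 1),
    Layout("3-way mirrors, two vdevs", 2, 3, 2),
]

if __name__ == "__main__":
    for layout in SIX_DRIVE_LAYOUTS:
        print(f"{layout.name:28s} usable={usable_tb(layout, 1.0):.0f}TB  "
              f"survives {layout.redundancy} failure(s) per vdev")
```

Note that the redundancy figure is per vdev: a pool of mirrors can
survive more total failures if they land in different vdevs, but the
guaranteed number is what the field records.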

In addition, more spindles means more failures, all other things being
equal, so if you need "X" TB of storage and organize it as 3-way mirrors
you now have twice as many physical spindles as the six-drive Z2 layout
above would need, which means on average you'll take twice as many
faults.  If performance is more important then the choice is obvious.
If density is more important (that is, a lot or even most of the data is
rarely accessed at all) then the choice is fairly simple too.  In many
workloads you have some of both, and thus the correct choice is a hybrid
arrangement; that's what I do here, because I have a lot of data that is
rarely-to-never accessed and read-only but also have some data that is
frequently accessed and frequently written.  One size does not fit all
in such a workload.
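
To see where the "twice as many spindles" figure comes from, here is a
minimal sketch that counts drives for a fixed capacity target.  The 12TB
target, the 1TB drive size and the 6-wide Z2 vdev shape are assumptions
chosen to match the six-drive example earlier; the "expected faults"
ratio simply assumes identical drives that fail independently.

```python
import math


def drives_needed(target_tb: float, drive_tb: float,
                  drives_per_vdev: int, redundancy: int) -> int:
    """Drives required to reach target_tb using whole vdevs of one shape."""
    data_tb_per_vdev = (drives_per_vdev - redundancy) * drive_tb
    vdevs = math.ceil(target_tb / data_tb_per_vdev)
    return vdevs * drives_per_vdev


if __name__ == "__main__":
    target_tb, drive_tb = 12.0, 1.0  # hypothetical sizing target
    shapes = [
        ("RaidZ2 (6-wide)", 6, 2),
        ("2-way mirrors", 2, 1),
        ("3-way mirrors", 3, 2),
    ]
    baseline = drives_needed(target_tb, drive_tb, 6, 2)
    for label, width, redundancy in shapes:
        count = drives_needed(target_tb, drive_tb, width, redundancy)
        print(f"{label:16s} {count:3d} drives, "
              f"{count / baseline:.2f}x the expected faults of Z2")
```

A wider or narrower Z2 vdev changes the ratio, so the factor of two is
specific to the six-drive shape used as the example.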

MOST systems, by the way, have this sort of paradigm (a huge percentage
of the data is rarely read and never written) but it doesn't become
economic or sane to try to separate them until you get well into the
terabytes of storage range and a half-dozen or so physical volumes.
There's a very clean argument that below that point, but with more than
one drive, mirrored is always the better choice.

Note that if you have an *adapter* go insane (and as I've noted here
I've had it happen TWICE in my IT career!) then *all* of the data on the
disks served by that adapter is screwed.

It doesn't make a bit of difference what filesystem you're using in that
scenario and thus you had better have a backup scheme and make sure it
works as well, never mind software bugs or administrator stupidity ("dd"
as root to the wrong target, for example, will reliably screw you every
single time!)

For a single-disk machine ZFS is no *less* safe than UFS and provides a
number of advantages, with arguably the most important being easily-used
snapshots.  Not only does this simplify backups, since coherency during
the backup is never at issue and incremental backups become fast and
easily done; in addition, boot environments make roll-forward and even
*roll-back* reasonable to implement for software updates -- a critical
capability if you ever run an OS version update and something goes
seriously wrong with it.  If you've never had that happen then consider
yourself blessed; it's NOT fun to manage in a UFS environment and often
winds up leading to a "restore from backup" scenario.  (To be fair, it
can be with ZFS too if you're foolish enough to upgrade the pool before
being sure you're happy with the new OS rev.)

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/



