Date:      Tue, 9 Apr 2019 14:01:03 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
Message-ID:  <f87f32f2-b8c5-75d3-4105-856d9f4752ef@denninger.net>

I've run into something often -- and repeatably -- enough since updating
to 12-STABLE that I suspect there may be a code problem lurking in the
ZFS stack or in the driver and firmware compatibility with various HBAs
based on the LSI/Avago devices.

The scenario is this -- I have data sets that are RaidZ2 that are my
"normal" working set; one is comprised of SSD volumes and one of
spinning rust volumes.  These all are normal and scrubs never show
problems.  I've had physical failures with them over the years (although
none since moving to 12-STABLE as of yet) and have never had trouble
with resilvers or other misbehavior.

I also have a "backup" pool that is a 3-member mirror, to which the
volatile filesystems (that is, the zfs filesystems not set read-only)
are copied with zfs send.  Call the members backup-i, backup-e1 and
backup-e2.

All disks in these pools are geli-encrypted running on top of a
freebsd-zfs partition inside a GPT partition table using -s 4096 (4k)
geli "sectors".
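
As a rough illustration of that layering (GPT table, freebsd-zfs
partition, then geli with 4k sectors), the per-disk setup might look
like the command list built below.  The device name da5 and the GPT
label backup-e1 are hypothetical, and the geli key or passphrase options
are not described in this message, so they are left out.  This is a
sketch under those assumptions, not the exact commands used on this
machine; it only prints the commands and runs nothing.

```python
def layering_commands(disk, label):
    """Build (but do not run) the per-disk setup commands.

    disk and label are hypothetical placeholders, e.g. "da5" and
    "backup-e1"; they do not come from the original message.
    """
    part = "/dev/gpt/" + label
    return [
        # GPT partition table on the raw disk
        ["gpart", "create", "-s", "gpt", disk],
        # one labelled freebsd-zfs partition inside it
        ["gpart", "add", "-t", "freebsd-zfs", "-l", label, disk],
        # geli on top of the partition, with 4096-byte geli sectors
        ["geli", "init", "-s", "4096", part],
        # attaching creates the .eli provider that the pool is built on
        ["geli", "attach", part],
    ]


for argv in layering_commands("da5", "backup-e1"):
    print(" ".join(argv))
```

The pool member would then be the resulting .eli provider rather than
the raw disk or partition.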

Two of the backup mirror members are always in the machine; backup-i
(the base internal drive) is never removed.  The third is in a bank
vault.  Every week the vault drive is exchanged with the other, so that
the "first" member is never removed from the host, but the other two
(-e1 and -e2) alternate.  If the building burns I have a full copy of
all the volatile data in the vault.  (I also have mirrored copies, 2
each, of all the datasets that are operationally read-only in the vault
too; those get updated quarterly if there are changes to the
operationally read-only portion of the data store.)  The drive in the
vault is swapped weekly, so a problem should be detected almost
immediately before it can bugger me.

Before removing the disk intended to go to the vault I "offline" it,
spin it down (camcontrol standby), which issues a standby immediate to
the drive, ensuring that its cache is flushed and the spindle spun down,
and then pull it.  I go exchange them at the bank, insert the other one,
and "zpool online...." it, which automatically resilvers it.
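
The swap sequence above can be sketched as two short command lists.  The
pool name "backup", the member names gpt/backup-e1.eli and
gpt/backup-e2.eli, and the disk da5 are assumptions made for
illustration; any geli attach/detach steps are not spelled out in this
message and are omitted here.  The helper defaults to a dry run that
only prints what it would execute.

```python
import subprocess


def swap_out_commands(pool, member, disk):
    """Commands to run before pulling the vault-bound member."""
    return [
        ["zpool", "offline", pool, member],  # take the mirror member offline
        ["camcontrol", "standby", disk],     # put the drive in standby (spun down)
    ]


def swap_in_commands(pool, member):
    """Command to run after inserting the member returned from the vault."""
    return [
        ["zpool", "online", pool, member],   # ZFS resilvers it automatically
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Hypothetical names throughout; nothing is executed in dry-run mode.
run(swap_out_commands("backup", "gpt/backup-e1.eli", "da5"))
run(swap_in_commands("backup", "gpt/backup-e2.eli"))
```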

The disk resilvers and all is well -- no errors.

Or is it all ok?

If I run a scrub on the pool as soon as the resilver completes the disk
I just inserted will /invariably/ have a few checksum errors on it that
the scrub fixes.  It's not a large number, anywhere from a couple dozen
to a hundred or so, but it's not zero -- and it damn well should be as
the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S
IN USE AREA was examined, compared, and blocks not on the "new member"
or changed copied over.  The "-i" disk (the one that is never pulled)
NEVER is the one with the checksum errors on it -- it's ALWAYS the one I
just inserted and which was resilvered to.
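
For anyone wanting to watch for the same symptom, one way to spot which
member picked up checksum errors is to read the CKSUM column of
"zpool status" after the scrub finishes.  The function below is a
minimal sketch that assumes the usual NAME / STATE / READ / WRITE /
CKSUM table layout; it takes the status text as a string (how that text
is captured is left to the caller) and keeps the counter as a string so
abbreviated values are not misparsed.

```python
def members_with_checksum_errors(status_text):
    """Return {device_name: cksum_value} for rows with a non-zero CKSUM.

    status_text is the text printed by `zpool status <pool>`.
    """
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True          # the device table starts after this row
            continue
        if not in_config:
            continue
        if not fields:
            in_config = False         # a blank line ends the device table
            continue
        if len(fields) < 5:
            continue                  # section labels such as "logs" or "spares"
        name, _state, _read, _write, cksum = fields[:5]
        if cksum != "0":
            flagged[name] = cksum
    return flagged
```

Run after each post-resilver scrub, an empty result is the expected
answer; in the situation described here the freshly inserted member is
the one that shows up.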

If I zpool clear the errors and scrub again all is fine -- no errors.
If I scrub again before pulling the disk the next time to do the swap
all is fine as well.  I swap the two, resilver, and I'll get a few more
errors on the next scrub, ALWAYS on the disk I just put in.

Smartctl shows NO errors on the disk.  No ECC, no reallocated sectors,
no interface errors, no resets, nothing.  Smartd is running and never
posts any real-time complaints, other than the expected one a minute or
two after I yank the drive to take it to the bank.  There are no
CAM-related errors printing on the console either.  So ZFS says there's
a *silent* data error (bad checksum; never a read or write error) in a
handful of blocks but the disk says there have been no errors, the
driver does not report any errors, there have been no power failures as
the disk was in a bank vault and thus it COULDN'T have had a write-back
cache corruption event or similar occur.

I never had trouble with this under 11.1 or before and have been using
this paradigm for something on the order of five years running on this
specific machine without incident.  Now I'm seeing it repeatedly and
*reliably* under 12.0-STABLE.  I swapped the first disk that did it,
thinking it was physically defective -- the replacement did it on the
next swap.  In fact I've yet to record a swap-out on 12-STABLE that
*hasn't* done this and yet it NEVER happened under 11.1.  At the same
time I can run scrubs until the cows come home on the multiple Raidz2
packs on the same controller and never get any checksum errors on any of
them.

The firmware in the card was 19.00.00.00 -- again, this firmware *has
been stable for years.*

I have just rolled the firmware on the card forward to 20.00.07.00,
which is the "latest" available.  I had previously not moved to 20.x
because earlier versions had known issues (some severe and potentially
fatal to data integrity) and 19 had been working without problem -- I
thus had no reason to move to 20.00.07.00.

But there apparently are some fairly significant timing differences
between the driver code in 11.1 and 11.2/12.0, as I discovered when the
SAS expander I used to have in these boxes started returning timeout
errors that were false.  Again -- this same configuration was completely
stable under 11.1 and previous over a period of years.

With 20.00.07.00 I have yet to have this situation recur -- so far --
but I have limited time with 20.00.07.00 and as such my confidence that
the issue is in fact resolved by the card firmware change is only modest
at this point.  Over the next month or so, if it doesn't happen again,
my confidence will of course improve.

Checksum errors on ZFS volumes are extraordinarily uncool for the
obvious reason -- they imply the disk thinks the data is fine (since it
is not recording any errors on the interface or at the drive level) BUT
ZFS thinks the data off that particular record was corrupt as the
checksum fails.  Silent corruption is the worst sort in that it can hide
for months or even years before being discovered, long after your
redundant copies have been re-used or overwritten.

Assuming I do not see a recurrence with the 20.00.07.00 firmware I would
suggest that UPDATING, the Release Notes or Errata have an entry added
stating that, for 12.x forward, card firmware revisions prior to
20.00.07.00 carry *strong* cautions, and that those with these HBAs be
strongly urged to flash the card forward to 20.00.07.00 before upgrading
or installing.  If you get a surprise of this sort and have no second
copy that is not impacted you could find yourself severely hosed.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
