Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 6 Feb 2019 09:34:10 -0600
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: 9211 (LSI/SAS) issues on 11.2-STABLE
Message-ID:  <76c295f5-fc2b-2c9f-78b1-163939afb24a@denninger.net>
In-Reply-To: <1FFC1686-E70F-4649-B170-34F90B773918@sarenet.es>
References:  <7bb25f55-fa77-f67e-11f3-b2240b01e25a@denninger.net> <b50c527c-e7f7-3e64-af3a-e597ec77c021@denninger.net> <9ea70420-0c06-ad9d-e8b7-f9d92fed20d8@denninger.net> <57ddc2f4-681c-e0aa-0484-42cee3876a05@denninger.net> <1FFC1686-E70F-4649-B170-34F90B773918@sarenet.es>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a cryptographically signed message in MIME format.

--------------ms090408070206030201050706
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 2/6/2019 09:18, Borja Marcos wrote:
>> On 5 Feb 2019, at 23:49, Karl Denninger <karl@denninger.net> wrote:
>>
>> BTW under 12.0-STABLE (built this afternoon after the advisories came
>> out, with the patches) it's MUCH worse.  I get the same device resets
>> BUT it's followed by an immediate panic which I cannot dump as it
>> generates a page-fault (supervisor read data, page not present) in the=

>> mps *driver* at mpssas_send_abort+0x21.
>> This precludes a dump of course since attempting to do so gives you a
>> double-panic (I was wondering why I didn't get a crash dump!); I'll
>> re-jigger the box to stick a dump device on an internal SATA device so=
 I
>> can successfully get the dump when it happens and see if I can obtain =
a
>> proper crash dump on this.
>>
>> I think it's fair to assume that 12.0-STABLE should not panic on a dis=
k
>> problem (unless of course the problem is trying to page something back=

>> in -- it's not, the drive that aborts and resets is on a data pack doi=
ng
>> a scrub)
> It shouldn=E2=80=99t panic I imagine.
>
>>>>> mps0: Sending reset from mpssas_send_abort for target ID 37
>
>>> 0x06  =3D=3D=3D=3D=3D  =3D               =3D  =3D=3D=3D  =3D=3D Trans=
port Statistics (rev 1) =3D=3D
>>> 0x06  0x008  4               6  ---  Number of Hardware Resets
>>> 0x06  0x010  4               0  ---  Number of ASR Events
>>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>>                                 |||_ C monitored condition met
>>>                                 ||__ D supports DSN
>>>                                 |___ N normalized value
>>>
>>> 0x06  0x008  4               7  ---  Number of Hardware Resets
>>> 0x06  0x010  4               0  ---  Number of ASR Events
>>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>>                                 |||_ C monitored condition met
>>>                                 ||__ D supports DSN
>>>                                 |___ N normalized value
>>>
>>> Number of Hardware Resets has incremented.  There are no other errors=
 shown:
> What is _exactly_ that value? Is it related to the number of resets sen=
t from the HBA
> _or_ the device resetting by itself?
Good question.=C2=A0 What counts as a "reset"; UNIT ATTENTION is what the=

controller receives but whether that's a power reset, a reset *command*
from the HBA or a firmware crash (yikes!) in the disk I'm not certain.
>>> I'd throw possible shade at the backplane or cable /but I have alread=
y
>>> swapped both out for spares without any change in behavior./
> What about the power supply?=20
>
There are multiple other devices and the system board on that supply
(and thus voltage rails) but it too has been swapped out without
change.=C2=A0 In fact at this point other than the system board and RAM
(which is ECC, and is showing no errors in the system's BMC log)
/everything /in the server case (HBA, SATA expander, backplane, power
supply and cables) has been swapped for spares.=C2=A0 No change in behavi=
or.

Note that with 20.0.7.0 firmware in the HBA instead of a unit attention
I get a *controller* reset (!!!) which detaches some random number of
devices from ZFS's point of view before it comes back up (depending on
what's active at the time) which is potentially catastrophic if it hits
the system pool.=C2=A0 I immediately went back to 19.0.0.0 firmware on th=
e
HBA; I had upgraded to 20.0.7.0 since there had been good reports of
stability with it when I first saw this, thinking there was a drive
change that might have resulted in issues with it when running 19.0
firmware on the card.

This system was completely stable for over a year on 11.1-STABLE and in
fact hadn't been rebooted or logged a single "event" in over six months;
the problems started immediately upon upgrade to 11.2-STABLE and
persists on 12.0-STABLE.=C2=A0 The disks in question haven't changed eith=
er
(so it can't be a difference in firmware that is in a newer purchased
disk, for example.)

I'm thinking perhaps *something* in the codebase change made the HBA ->
SAS Expander combination trouble where it wasn't before; I've got a
couple of 16i HBAs on the way which will allow me to remove the SAS
expander to see if that causes the problem to disappear.=C2=A0 I've got a=

bunch of these Lenovo expanders and have been using them without any
sort of trouble in multiple machines; it's only when I went beyond 11.1
that I started having trouble with them.

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms090408070206030201050706
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC
DdgwggagMIIEiKADAgECAhMA5EiKghDOXrvfxYxjITXYDdhIMA0GCSqGSIb3DQEBCwUAMIGL
MQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJTmljZXZpbGxlMRkw
FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExITAf
BgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQTAeFw0xNzA4MTcxNjQyMTdaFw0yNzA4
MTUxNjQyMTdaMHsxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkwFwYDVQQKDBBD
dWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExJTAjBgNVBAMMHEN1
ZGEgU3lzdGVtcyBMTEMgMjAxNyBJbnQgQ0EwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIK
AoICAQC1aJotNUI+W4jP7xQDO8L/b4XiF4Rss9O0B+3vMH7Njk85fZ052QhZpMVlpaaO+sCI
KqG3oNEbuOHzJB/NDJFnqh7ijBwhdWutdsq23Ux6TvxgakyMPpT6TRNEJzcBVQA0kpby1DVD
0EKSK/FrWWBiFmSxg7qUfmIq/mMzgE6epHktyRM3OGq3dbRdOUgfumWrqHXOrdJz06xE9NzY
vc9toqZnd79FUtE/nSZVm1VS3Grq7RKV65onvX3QOW4W1ldEHwggaZxgWGNiR/D4eosAGFxn
uYeWlKEC70c99Mp1giWux+7ur6hc2E+AaTGh+fGeijO5q40OGd+dNMgK8Es0nDRw81lRcl24
SWUEky9y8DArgIFlRd6d3ZYwgc1DMTWkTavx3ZpASp5TWih6yI8ACwboTvlUYeooMsPtNa9E
6UQ1nt7VEi5syjxnDltbEFoLYcXBcqhRhFETJe9CdenItAHAtOya3w5+fmC2j/xJz29og1KH
YqWHlo3Kswi9G77an+zh6nWkMuHs+03DU8DaOEWzZEav3lVD4u76bKRDTbhh0bMAk4eXriGL
h4MUoX3Imfcr6JoyheVrAdHDL/BixbMH1UUspeRuqQMQ5b2T6pabXP0oOB4FqldWiDgJBGRd
zWLgCYG8wPGJGYgHibl5rFiI5Ix3FQncipc6SdUzOQIDAQABo4IBCjCCAQYwHQYDVR0OBBYE
FF3AXsKnjdPND5+bxVECGKtc047PMIHABgNVHSMEgbgwgbWAFBu1oRhUMNEzjODolDka5k4Q
EDBioYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJ
TmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5
c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYIJAKxAy1WBo2kY
MBIGA1UdEwEB/wQIMAYBAf8CAQAwDgYDVR0PAQH/BAQDAgGGMA0GCSqGSIb3DQEBCwUAA4IC
AQCB5686UCBVIT52jO3sz9pKuhxuC2npi8ZvoBwt/IH9piPA15/CGF1XeXUdu2qmhOjHkVLN
gO7XB1G8CuluxofOIUce0aZGyB+vZ1ylHXlMeB0R82f5dz3/T7RQso55Y2Vog2Zb7PYTC5B9
oNy3ylsnNLzanYlcW3AAfzZcbxYuAdnuq0Im3EpGm8DoItUcf1pDezugKm/yKtNtY6sDyENj
tExZ377cYA3IdIwqn1Mh4OAT/Rmh8au2rZAo0+bMYBy9C11Ex0hQ8zWcvPZBDn4v4RtO8g+K
uQZQcJnO09LJNtw94W3d2mj4a7XrsKMnZKvm6W9BJIQ4Nmht4wXAtPQ1xA+QpxPTmsGAU0Cv
HmqVC7XC3qxFhaOrD2dsvOAK6Sn3MEpH/YrfYCX7a7cz5zW3DsJQ6o3pYfnnQz+hnwLlz4MK
17NIA0WOdAF9IbtQqarf44+PEyUbKtz1r0KGeGLs+VGdd2FLA0e7yuzxJDYcaBTVwqaHhU2/
Fna/jGU7BhrKHtJbb/XlLeFJ24yvuiYKpYWQSSyZu1R/gvZjHeGb344jGBsZdCDrdxtQQcVA
6OxsMAPSUPMrlg9LWELEEYnVulQJerWxpUecGH92O06wwmPgykkz//UmmgjVSh7ErNvL0lUY
UMfunYVO/O5hwhW+P4gviCXzBFeTtDZH259O7TCCBzAwggUYoAMCAQICEwCg0WvVwekjGFiO
62SckFwepz0wDQYJKoZIhvcNAQELBQAwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3Jp
ZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBD
QTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExMQyAyMDE3IEludCBDQTAeFw0xNzA4MTcyMTIx
MjBaFw0yMjA4MTYyMTIxMjBaMFcxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkw
FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRswGQYDVQQDDBJrYXJsQGRlbm5pbmdlci5uZXQw
ggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQC+HVSyxVtJhy3Ohs+PAGRuO//Dha9A
16l5FPATr6wude9zjX5f2lrkRyU8vhCXTZW7WbvWZKpcZ8r0dtZmiK9uF58Ec6hhvfkxJzbg
96WHBw5Fumd5ahZzuCJDtCAWW8R7/KN+zwzQf1+B3MVLmbaXAFBuKzySKhKMcHbK3/wjUYTg
y+3UK6v2SBrowvkUBC+jxNg3Wy12GsTXcUS/8FYIXgVVPgfZZrbJJb5HWOQpvvhILpPCD3xs
YJFNKEPltXKWHT7Qtc2HNqikgNwj8oqOb+PeZGMiWapsatKm8mxuOOGOEBhAoTVTwUHlMNTg
6QUCJtuWFCK38qOCyk9Haj+86lUU8RG6FkRXWgMbNQm1mWREQhw3axgGLSntjjnznJr5vsvX
SYR6c+XKLd5KQZcS6LL8FHYNjqVKHBYM+hDnrTZMqa20JLAF1YagutDiMRURU23iWS7bA9tM
cXcqkclTSDtFtxahRifXRI7Epq2GSKuEXe/1Tfb5CE8QsbCpGsfSwv2tZ/SpqVG08MdRiXxN
5tmZiQWo15IyWoeKOXl/hKxA9KPuDHngXX022b1ly+5ZOZbxBAZZMod4y4b4FiRUhRI97r9l
CxsP/EPHuuTIZ82BYhrhbtab8HuRo2ofne2TfAWY2BlA7ExM8XShMd9bRPZrNTokPQPUCWCg
CdIATQIDAQABo4IBzzCCAcswPAYIKwYBBQUHAQEEMDAuMCwGCCsGAQUFBzABhiBodHRwOi8v
b2NzcC5jdWRhc3lzdGVtcy5uZXQ6ODg4ODAJBgNVHRMEAjAAMBEGCWCGSAGG+EIBAQQEAwIF
oDAOBgNVHQ8BAf8EBAMCBeAwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMEMDMGCWCG
SAGG+EIBDQQmFiRPcGVuU1NMIEdlbmVyYXRlZCBDbGllbnQgQ2VydGlmaWNhdGUwHQYDVR0O
BBYEFLElmNWeVgsBPe7O8NiBzjvjYnpRMIHKBgNVHSMEgcIwgb+AFF3AXsKnjdPND5+bxVEC
GKtc047PoYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UE
BwwJTmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRh
IFN5c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYITAORIioIQ
zl6738WMYyE12A3YSDAdBgNVHREEFjAUgRJrYXJsQGRlbm5pbmdlci5uZXQwDQYJKoZIhvcN
AQELBQADggIBAJXboPFBMLMtaiUt4KEtJCXlHO/3ZzIUIw/eobWFMdhe7M4+0u3te0sr77QR
dcPKR0UeHffvpth2Mb3h28WfN0FmJmLwJk+pOx4u6uO3O0E1jNXoKh8fVcL4KU79oEQyYkbu
2HwbXBU9HbldPOOZDnPLi0whi/sbFHdyd4/w/NmnPgzAsQNZ2BYT9uBNr+jZw4SsluQzXG1X
lFL/qCBoi1N2mqKPIepfGYF6drbr1RnXEJJsuD+NILLooTNf7PMgHPZ4VSWQXLNeFfygoOOK
FiO0qfxPKpDMA+FHa8yNjAJZAgdJX5Mm1kbqipvb+r/H1UAmrzGMbhmf1gConsT5f8KU4n3Q
IM2sOpTQe7BoVKlQM/fpQi6aBzu67M1iF1WtODpa5QUPvj1etaK+R3eYBzi4DIbCIWst8MdA
1+fEeKJFvMEZQONpkCwrJ+tJEuGQmjoQZgK1HeloepF0WDcviiho5FlgtAij+iBPtwMuuLiL
shAXA5afMX1hYM4l11JXntle12EQFP1r6wOUkpOdxceCcMVDEJBBCHW2ZmdEaXgAm1VU+fnQ
qS/wNw/S0X3RJT1qjr5uVlp2Y0auG/eG0jy6TT0KzTJeR9tLSDXprYkN2l/Qf7/nT6Q03qyE
QnnKiBXWAZXveafyU/zYa7t3PTWFQGgWoC4w6XqgPo4KV44OMYIFBzCCBQMCAQEwgZIwezEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM
TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM
QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBglghkgBZQMEAgMFAKCCAkUw
GAYJKoZIhvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcNMTkwMjA2MTUzNDEw
WjBPBgkqhkiG9w0BCQQxQgRA4N+IkEeaYlMQQBammZDI85N7PPn96QNOISquLsHtpDLlIsag
78vM4FRPA/sJ0Qg49FlmCaNXMqIJDfJuHgMUkjBsBgkqhkiG9w0BCQ8xXzBdMAsGCWCGSAFl
AwQBKjALBglghkgBZQMEAQIwCgYIKoZIhvcNAwcwDgYIKoZIhvcNAwICAgCAMA0GCCqGSIb3
DQMCAgFAMAcGBSsOAwIHMA0GCCqGSIb3DQMCAgEoMIGjBgkrBgEEAYI3EAQxgZUwgZIwezEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM
TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM
QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTCBpQYLKoZIhvcNAQkQAgsxgZWg
gZIwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lz
dGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0
ZW1zIExMQyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBgkqhkiG9w0BAQEF
AASCAgAhMubAE1EE8sx17lP3W5JOg/Ka5j9508aE+nraCu0R+4VEKX42cn7kqi0lxKC4Oyt6
dqQ9fo1JF08JOQX3IUxrFFbFQvx2rHKoiWqPIxxlYGL5l5nJVvEWKNorOCBHXOx6L5bZjKO/
JLErYpcxLwDdRPDQVkQpYp+IMw/5Gd6R80TzYs1EvJRypKbuuaEBneCG4EawWFWONshs9GVl
Un4urvGs1p5TRIkvKNf1BJv1eAzDiEszmaHeVcQUt3goaq6iZEmCKyvBpqbAixLwtPkSWWXL
Wbbdq9lUfBtYekTOJ2/SaeQKn5dv1NJNTySRYai30OCDlEtgqOGGd7+Y9+TPe+0C9eJk6qC2
X5C2uLO/nHtoblpQSQvuizU/KUJMYhjwJXvH8U0vfp7+NiKs0fNElxKYSLk0lJ1l7zkszxBj
/Ss5vPv9+i6UavRyJR5pPkHZPUoxxxjwILphjM/AK++SS+iNRU6xgS2QddPacPj+vmA80TpH
Iz5/lYm/cpTT7X2qtID1PAUwSP617SHi87oe5EexTz+2KTfDQaMKCFWYuPoYxuWj8h1EcYRW
Nsi7x9ROLd0aOeFzACP4D2utaPKyi6vLJhXCoxI2i5qPoiJYvm8p9CvNoUipBM+oZ6bpBsvm
uaJjlg9B7sqH82VrdRVfB5YTLmC/aEFC6r5rEXwZ+wAAAAAAAA==
--------------ms090408070206030201050706--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?76c295f5-fc2b-2c9f-78b1-163939afb24a>