Date:      Sun, 30 Oct 2016 07:52:00 -0400
From:      Jeremy Beker <gothmog@confusticate.com>
To:        freebsd-stable@freebsd.org
Subject:   FreeBSD 11.0 and LSI SAS3081E losing all devices
Message-ID:  <FF400F3A-350A-4133-BED1-78087F1657F3@confusticate.com>


Good Morning!

Since upgrading my home server from 10.3 to 11.0-RELEASE-p1 about a week
ago, I have twice had a serious problem where my LSI adapter is having
errors and dropping all the drives out of my ZFS pool.

Hardware:
- LSI SAS3081E-R PCI-E card with the IT firmware loaded
- 6x2TB WD Black drives
- 1 SSD
- Supermicro X10SLL-F MB (not sure that is relevant)

This system has been running with this exact hardware for about a year
with no problems under the 10.X versions of FreeBSD. Last weekend, I
upgraded the system to 11.0-RELEASE-p1. Since then, twice, all of the
drives have been marked as unavailable to ZFS after generating a stream
of errors.

The problems start with a number of errors like this:

Oct 26 03:28:29 rivendell kernel: mpt0: request 0xfffffe0000f73058:57643 timed out for ccb 0xfffff803456ea000 (req->ccb 0xfffff803456ea000)
Oct 26 03:28:29 rivendell kernel: mpt0: attempting to abort req 0xfffffe0000f73058:57643 function 0
Oct 26 03:28:29 rivendell kernel: mpt0: completing timedout/aborted req 0xfffffe0000f73058:57643
Oct 26 03:28:29 rivendell kernel: (da0:mpt0:0:10:0): READ(10). CDB: 28 00 04 c4 91 c0 00 00 08 00
Oct 26 03:28:29 rivendell kernel: (da0:mpt0:0:10:0): CAM status: CCB request terminated by the host
Oct 26 03:28:29 rivendell kernel: (da0:mpt0:0:10:0): mpt0: Retrying command
Oct 26 03:28:29 rivendell kernel: abort of req 0xfffffe0000f73058:0 completed
Oct 26 03:28:49 rivendell kernel: mpt0: request 0xfffffe0000f6c3b0:57658 timed out for ccb 0xfffff803456ea000 (req->ccb 0xfffff803456ea000)
Oct 26 03:28:49 rivendell kernel: mpt0: attempting to abort req 0xfffffe0000f6c3b0:57658 function 0
Oct 26 03:28:49 rivendell kernel: mpt0: completing timedout/aborted req 0xfffffe0000f6c3b0:57658
Oct 26 03:28:49 rivendell kernel: (da0:mpt0:0:10:0): READ(10). CDB: 28 00 04 c4 91 c0 00 00 08 00
Oct 26 03:28:49 rivendell kernel: (da0:mpt0:0:10:0): CAM status: CCB request terminated by the host
Oct 26 03:28:49 rivendell kernel: (da0:mpt0:0:10:0): Retrying command
Oct 26 03:28:49 rivendell kernel: mpt0: abort of req 0xfffffe0000f6c3b0:0 completed
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): READ(10). CDB: 28 00 04 c4 91 c0 00 00 08 00
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): CAM status: SCSI Status Error
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): SCSI status: Check Condition
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): Retrying command (per sense data)
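
For anyone reading along, the READ(10) CDB in these messages can be
unpacked by hand. Below is a minimal Python sketch, assuming the standard
10-byte READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5,
transfer length in bytes 7-8). The function name decode_read10 is made up
for illustration and is not part of any FreeBSD tool.

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) from a READ(10) CDB given as hex bytes.

    Assumes the standard 10-byte layout: opcode 0x28, LBA in bytes 2-5,
    transfer length in bytes 7-8, both big-endian.
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks


# CDB copied from the da0 messages above.
lba, blocks = decode_read10("28 00 04 c4 91 c0 00 00 08 00")
print("LBA:", lba, "blocks:", blocks)
```

All three da0 READ(10) messages above carry the same CDB, so it is the
same 8-block read at a single LBA that is being retried each time.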

Also these:

Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): CAM status: SCSI Status Error
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): SCSI status: Check Condition
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): Error 6, Retries exhausted
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): Invalidating pack

After a bunch of rounds of the errors above, I get this:

Oct 26 03:35:17 rivendell kernel: mpt0: request 0xfffffe0000f73350:62027 timed out for ccb 0xfffff800160ce000 (req->ccb 0xfffff800160ce000)
Oct 26 03:35:17 rivendell kernel: mpt0: attempting to abort req 0xfffffe0000f73350:62027 function 0
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_wait_req(1) timed out
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_cam_event: 0x0
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_cam_event: 0x0
Oct 26 03:35:18 rivendell kernel: mpt0: completing timedout/aborted req 0xfffffe0000f73350:62027
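
To see how many request timeouts pile up before the controller reset, the
messages can be tallied with a short script. This is only a sketch: the
regular expressions are written against the exact message wording quoted
in this mail, the sample lines are copied from the excerpts above, and the
function name summarize is hypothetical.

```python
import re
from collections import Counter

# Patterns match the message wording shown in the log excerpts above.
TIMEOUT_RE = re.compile(r"(mpt\d+): request \S+ timed out")
RESET_RE = re.compile(
    r"(mpt\d+): mpt_recover_commands: abort timed-out\. Resetting controller"
)


def summarize(lines):
    """Count request timeouts and controller resets per mpt controller."""
    timeouts = Counter()
    resets = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group(1)] += 1
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
    return timeouts, resets


# A few lines copied from the excerpts in this mail.
sample = [
    "Oct 26 03:28:29 rivendell kernel: mpt0: request 0xfffffe0000f73058:57643 "
    "timed out for ccb 0xfffff803456ea000 (req->ccb 0xfffff803456ea000)",
    "Oct 26 03:28:49 rivendell kernel: mpt0: request 0xfffffe0000f6c3b0:57658 "
    "timed out for ccb 0xfffff803456ea000 (req->ccb 0xfffff803456ea000)",
    "Oct 26 03:35:17 rivendell kernel: mpt0: request 0xfffffe0000f73350:62027 "
    "timed out for ccb 0xfffff800160ce000 (req->ccb 0xfffff800160ce000)",
    "Oct 26 03:35:18 rivendell kernel: mpt0: mpt_recover_commands: "
    "abort timed-out. Resetting controller",
]

timeouts, resets = summarize(sample)
print("timeouts:", dict(timeouts))
print("resets:", dict(resets))
```

On a real system the input would be the lines of the system log rather
than the hard-coded sample.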

After which all the drives seem to disappear and the system detaches all
of them:

Oct 26 03:35:33 rivendell kernel: da1 at mpt0 bus 0 scbus0 target 14 lun 0
Oct 26 03:35:33 rivendell kernel: da1: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01559141 detached
Oct 26 03:35:33 rivendell kernel: da2 at mpt0 bus 0 scbus0 target 15 lun 0
Oct 26 03:35:33 rivendell kernel: da2: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01603430 detached
Oct 26 03:35:33 rivendell kernel: da5 at mpt0 bus 0 scbus0 target 18 lun 0
Oct 26 03:35:33 rivendell kernel: da5: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01159727 detached
Oct 26 03:35:33 rivendell kernel: da6 at mpt0 bus 0 scbus0 target 19 lun 0
Oct 26 03:35:33 rivendell kernel: da6: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY02971691 detached
Oct 26 03:35:33 rivendell kernel: da4 at mpt0 bus 0 scbus0 target 17 lun 0
Oct 26 03:35:33 rivendell kernel: da4: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01470856 detached
Oct 26 03:35:33 rivendell kernel: da3 at mpt0 bus 0 scbus0 target 16 lun 0
Oct 26 03:35:33 rivendell kernel: da3: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01602648 detached

At this point I have had to reboot the server and then all the drives
magically reappear.

Any help would be greatly appreciated.

-Jeremy

-- 
Jeremy Beker - @gothmog
http://www.confusticate.com
Condensing fact from the vapor of nuance.




