Date:      Tue, 29 Sep 2015 06:51:43 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 191348] [mps] LSI2308 with WD3000FYYZ drives disappears after hotswapping
Message-ID:  <bug-191348-8-sYY7P9nIxC@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-191348-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-191348-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348

--- Comment #22 from Karli.Sjoberg@slu.se ---
No, it can't; it's not really fixed.

We have upgraded several of our systems to this driver and also flashed the
firmware of our HBAs to P19. We also tried flashing firmware 20.00.04.00 to
match the 20.00.00.00 driver, but then ZFS went nuts, displaying checksum
errors all over. Reverting to P19 fixed that.

I have captured what happened the last time a drive (WD40EZRX) went bye-bye:

Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 368 command timeout cm 0xfffffe0000cb8300 ccb 0xfffff80302ab4800
Sep 27 01:39:18 zfs1-1 kernel: (noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xfffffe0000cb8300
Sep 27 01:39:18 zfs1-1 kernel: mps0: Sending reset from mpssas_send_abort for target ID 16
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 3f d8 00 00 08 00 length 4096 SMID 411 command timeout cm 0xfffffe0000cbbb70 ccb 0xfffff80302355800
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4c d8 00 00 80 00 length 65536 SMID 378 command timeout cm 0xfffffe0000cb9020 ccb 0xfffff802e6b06000
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d d8 00 00 80 00 length 65536 SMID 404 command timeout cm 0xfffffe0000cbb240 ccb 0xfffff800670db000
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d 58 00 00 80 00 length 65536 SMID 885 command timeout cm 0xfffffe0000ce2990 ccb 0xfffff801eefbb000
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 38 00 00 08 00 length 4096 SMID 234 command timeout cm 0xfffffe0000cad320 ccb 0xfffff8022997c000
Sep 27 01:39:20 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8e 50 00 01 00 00 length 131072 SMID 79 command timeout cm 0xfffffe0000ca07b0 ccb 0xfffff801ee18f800
Sep 27 01:39:20 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 50 00 01 00 00 length 131072 SMID 218 command timeout cm 0xfffffe0000cabe20 ccb 0xfffff80067d7c800
Sep 27 01:39:21 zfs1-1 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 16
Sep 27 01:39:21 zfs1-1 kernel: da9 at mps0 bus 0 scbus0 target 16 lun 0
Sep 27 01:39:21 zfs1-1 kernel: da9: <ATA WDC WD40EZRX-00S 0A80> s/n      WD-WCC4E4YSXNYH detached
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8e 50 00 01 00 00 length 131072 SMID 79 terminated ioc 804b scsi 0 state c xfer 0
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 50 00 01 00 00 length 131072 SMID 218 terminated ioc 804b scsi 0 state c xf(da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8e 50 00 01 00 00
Sep 27 01:39:22 zfs1-1 kernel: er 0
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 38 00 00 08 00 length 4096 SMID 234 terminated ioc 804b scsi 0 state c xfer(da9: 0
Sep 27 01:39:22 zfs1-1 kernel: mps0:0:  (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d 58 00 00 80 00 length 65536 SMID 885 terminated ioc 804b scsi 0 state c xfer16: 0
Sep 27 01:39:22 zfs1-1 kernel: 0):      (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4c d8 00 00 80 00 length 65536 SMID 378 terminated ioc 804b scsi 0 state c xferError 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: 0
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 50 00 01 00 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d d8 00 00 80 00 length 65536 SMID 404 terminated ioc 804b scsi 0 state c xfer(da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: 0
Sep 27 01:39:22 zfs1-1 kernel: (da9:    (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 3f d8 00 00 08 00 length 4096 SMID 411 terminated ioc 804b scsi 0 state c xfermps0:0: 0
Sep 27 01:39:22 zfs1-1 kernel: 16:mps0: 0): IOCStatus = 0x4b while resetting device 0x14
Sep 27 01:39:22 zfs1-1 kernel: Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: mps0: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 38 00 00 08 00
Sep 27 01:39:22 zfs1-1 kernel: Unfreezing devq for target ID 16
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: mps0: (da9:Unfreezing devq for target ID 16
Sep 27 01:39:22 zfs1-1 kernel: mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d 58 00 00 80 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4c d8 00 00 80 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d d8 00 00 80 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 3f d8 00 00 08 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Sep 27 01:39:22 zfs1-1 kernel: ctl_datamove: tag 0x183baa00 on (0:9:0:0) aborted
Sep 27 01:39:22 zfs1-1 kernel: ctl_datamove: tag 0x433baa00 on (0:9:0:0) aborted
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Command timeout
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Periph destroyed
Sep 27 01:39:21 zfs1-1 devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=11769113696885915207 vdev_guid=10111278074591061297''
Sep 27 01:39:21 zfs1-1 ZFS: vdev is removed, pool_guid=11769113696885915207 vdev_guid=10111278074591061297
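
As a reading aid for the CDBs in the log above, here is a minimal Python
sketch that decodes a READ(10)/WRITE(10) CDB into its starting LBA and
transfer length. The function name decode_rw10 and the 512-byte logical
block size are assumptions made for illustration; the block size is at least
consistent with the logged lengths (0x0080 blocks x 512 = 65536 bytes).

```python
def decode_rw10(cdb_hex, block_size=512):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes.

    block_size is an assumption (512-byte logical blocks), not something
    the log states directly.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    op = "READ(10)" if cdb[0] == 0x28 else "WRITE(10)"
    # Bytes 2-5 hold the big-endian logical block address.
    lba = int.from_bytes(cdb[2:6], "big")
    # Bytes 7-8 hold the big-endian transfer length in blocks.
    blocks = int.from_bytes(cdb[7:9], "big")
    return op, lba, blocks, blocks * block_size


if __name__ == "__main__":
    for cdb in ("28 00 13 0e 4c d8 00 00 80 00",
                "2a 00 1e 5f 8e 50 00 01 00 00",
                "2a 00 1e 5f 3f d8 00 00 08 00"):
        op, lba, blocks, nbytes = decode_rw10(cdb)
        print(f"{op} lba=0x{lba:08x} blocks={blocks} bytes={nbytes}")
```

For instance, decode_rw10("28 00 13 0e 4c d8 00 00 80 00") yields a 128-block
read at LBA 0x130e4cd8, i.e. 65536 bytes, which matches the "length 65536"
logged for SMID 378.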

This server is running 10.1-STABLE r281643, close to 10.2-RELEASE. When
inserting a new SATA drive that has never previously been in the system,
nothing prints in the logs and that bay is "blocked" until you reboot the
server. We have also added 'dev.mps.0.spinup_wait_time="5"' to loader.conf,
and it hasn't made any difference. There are only 14 drives in this system,
so I don't think extending the timeout makes any difference in this case.

Oddly enough, I had the opportunity to test inserting a SAS drive and it
successfully showed up in the OS, so whatever happens is only affecting SATA.

Any way of further debugging this serious problem would be greatly appreciated!

/K

-- 
You are receiving this mail because:
You are the assignee for the bug.


