Date:      Sun, 25 Sep 2011 17:32:55 +0200
From:      Adam Nowacki <nowakpl@platinum.linux.pl>
To:        freebsd-fs@freebsd.org
Subject:   ZFS and 3ware controller resets
Message-ID:  <4E7F49A7.1020909@platinum.linux.pl>

I have a 20-disk storage system. Every now and then a disk dies and
causes the 3ware controller to reset because of disk timeouts. This cuts
ZFS off from all disks, even healthy ones, and the system requires a hard reset.
Two issues here:
1) Why does the controller have to reset? That's a completely insane way of
dealing with a drive timeout.
2) ZFS does not reopen the disks after the controller reset.

FreeBSD version: 8.1-RELEASE-p1

/c0 Driver Version = 3.80.06.003
/c0 Model = 9650SE-16ML
/c0 Available Memory = 224MB
/c0 Firmware Version = FE9X 4.10.00.007
/c0 Bios Version = BE9X 4.08.00.002
/c0 Boot Loader Version = BL9X 3.08.00.001

   pool: zp2
  state: ONLINE
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         zp2         ONLINE       0     0     0
           raidz2    ONLINE       0     0     0
             da1p1   ONLINE       0     0     0
             da2p1   ONLINE       0     0     0
             da3p1   ONLINE       0     0     0
             da4p1   ONLINE       0     0     0
             da5p1   ONLINE       0     0     0
             da6p1   ONLINE       0     0     0
             da7p1   ONLINE       0     0     0
             da9p1   ONLINE       0     0     0
             da8p1   ONLINE       0     0     0
             da10p1  ONLINE       0     0     0


Then, when a disk starts misbehaving:


twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a3 f4 e7 60 0 0 8 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 cb 7c 43 b8 0 0 10 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 ce e5 ca 30 0 0 20 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a4 2d 2d f8 0 0 8 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 cb 91 7c f8 0 0 20 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: Request 72 timed out!
twa0: INFO: (0x16: 0x1108): Resetting controller...:
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=3
twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
twa0: [ITHREAD]
(da1:twa0:0:1:0): lost device
(da2:twa0:0:2:0): lost device
(da3:twa0:0:3:0): lost device
(da4:twa0:0:4:0): lost device
(da5:twa0:0:5:0): lost device
(da6:twa0:0:6:0): lost device
(da7:twa0:0:7:0): lost device
(da8:twa0:0:8:0): lost device
(da9:twa0:0:9:0): lost device
(da10:twa0:0:10:0): lost device
(da11:twa0:0:11:0): lost device
(da12:twa0:0:12:0): lost device
(da13:twa0:0:13:0): lost device
(da1:twa0:0:1:0): removing device entry
da1 at twa0 bus 0 scbus0 target 1 lun 0
da1: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da1: 100.000MB/s transfers
da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da2:twa0:0:2:0): removing device entry
da2 at twa0 bus 0 scbus0 target 2 lun 0
da2: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da2: 100.000MB/s transfers
da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da3:twa0:0:3:0): removing device entry
da3 at twa0 bus 0 scbus0 target 3 lun 0
da3: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da3: 100.000MB/s transfers
da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da4:twa0:0:4:0): removing device entry
da4 at twa0 bus 0 scbus0 target 4 lun 0
da4: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da4: 100.000MB/s transfers
da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da5:twa0:0:5:0): removing device entry
da5 at twa0 bus 0 scbus0 target 5 lun 0
da5: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da5: 100.000MB/s transfers
da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da6:twa0:0:6:0): removing device entry
da6 at twa0 bus 0 scbus0 target 6 lun 0
da6: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da6: 100.000MB/s transfers
da6: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da7:twa0:0:7:0): removing device entry
da7 at twa0 bus 0 scbus0 target 7 lun 0
da7: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da7: 100.000MB/s transfers
da7: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da8:twa0:0:8:0): removing device entry
da8 at twa0 bus 0 scbus0 target 8 lun 0
da8: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da8: 100.000MB/s transfers
da8: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da9:twa0:0:9:0): removing device entry
da9 at twa0 bus 0 scbus0 target 9 lun 0
da9: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da9: 100.000MB/s transfers
da9: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da10:twa0:0:10:0): removing device entry
da10 at twa0 bus 0 scbus0 target 10 lun 0
da10: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da10: 100.000MB/s transfers
da10: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da11:twa0:0:11:0): removing device entry
da11 at twa0 bus 0 scbus0 target 11 lun 0
da11: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da11: 100.000MB/s transfers
da11: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da12:twa0:0:12:0): removing device entry
da12 at twa0 bus 0 scbus0 target 12 lun 0
da12: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da12: 100.000MB/s transfers
da12: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da13:twa0:0:13:0): removing device entry
da13 at twa0 bus 0 scbus0 target 13 lun 0
da13: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da13: 100.000MB/s transfers
da13: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
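
The kernel messages above can be summarised with a short script. What follows is a minimal sketch in Python, not part of the original report: it counts the "Drive timeout detected" lines per controller port and decodes each failed READ(10) CDB into a starting LBA and a block count (in a READ(10) CDB, bytes 2-5 hold the big-endian LBA and bytes 7-8 the transfer length). The function names and the two-line SAMPLE excerpt are illustrative assumptions; in practice the text would come from a saved copy of the console log.

```python
import re
from collections import Counter

# Patterns matched against the twa(4)/CAM console lines quoted above.
TIMEOUT_RE = re.compile(r"Drive timeout detected: port=(\d+)")
CDB_RE = re.compile(r"\((\w+):twa\d+:[\d:]+\): READ\(10\)\. CDB: ([0-9a-f ]+)")

# Two lines copied from the log above, used only as a demonstration input.
SAMPLE = """\
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a3 f4 e7 60 0 0 8 0
"""


def count_timeouts(log_text):
    """Return a Counter mapping controller port number -> timeout count."""
    return Counter(int(port) for port in TIMEOUT_RE.findall(log_text))


def decode_read10(cdb_text):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks). Raises ValueError if the CDB is not a
    10-byte command with opcode 0x28.
    """
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB: %r" % cdb_text)
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return lba, blocks


def failed_reads(log_text):
    """Return a list of (device, lba, blocks) for every READ(10) line."""
    return [(dev,) + decode_read10(cdb) for dev, cdb in CDB_RE.findall(log_text)]


if __name__ == "__main__":
    print(dict(count_timeouts(SAMPLE)))
    for dev, lba, blocks in failed_reads(SAMPLE):
        print(dev, "lba", lba, "blocks", blocks)
```

Run against the full log, this kind of summary makes it easy to confirm that every timeout is on port 2 and every failed read is on da3, and whether the failing LBAs cluster in one region of the disk.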

   pool: zp2
  state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
    see: http://www.sun.com/msg/ZFS-8000-HC
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         zp2         ONLINE       7    11     0
           raidz2    ONLINE      16    32     0
             da1p1   ONLINE       4    10     0
             da2p1   ONLINE       4    10     0
             da3p1   ONLINE       5   642     1
             da4p1   ONLINE       3     8     0
             da5p1   ONLINE       3    12     0
             da6p1   ONLINE       3    12     0
             da7p1   ONLINE       3    12     0
             da9p1   ONLINE       3    12     0
             da8p1   ONLINE       3    14     0
             da10p1  ONLINE       3    10     0

errors: 10 data errors, use '-v' for a list
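
To pick out which device accumulated the most errors in a 'zpool status' listing like the one above, the config table can be parsed mechanically. This is a minimal sketch in Python under stated assumptions, not part of the original report: it treats any line with five whitespace-separated fields whose last three are plain integers as a table row, and it assumes leaf devices are the ones whose names start with "da", as in this pool. Counters that zpool abbreviates (for example with a K suffix) would be skipped by this simple parser.

```python
def parse_vdev_errors(status_text):
    """Return (name, state, read, write, cksum) tuples from zpool status text."""
    rows = []
    for line in status_text.splitlines():
        fields = line.split()
        # A config row looks like: NAME STATE READ WRITE CKSUM
        if len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            name, state = fields[0], fields[1]
            read, write, cksum = (int(f) for f in fields[2:])
            rows.append((name, state, read, write, cksum))
    return rows


def worst_device(rows, leaf_prefix="da"):
    """Return the leaf row with the largest total error count.

    leaf_prefix is an assumption that fits this pool's da*p1 devices.
    """
    leaves = [r for r in rows if r[0].startswith(leaf_prefix)]
    return max(leaves, key=lambda r: r[2] + r[3] + r[4])


# A few rows copied from the status output above, as demonstration input.
SAMPLE_STATUS = """\
         NAME        STATE     READ WRITE CKSUM
         zp2         ONLINE       7    11     0
           raidz2    ONLINE      16    32     0
             da1p1   ONLINE       4    10     0
             da3p1   ONLINE       5   642     1
             da8p1   ONLINE       3    14     0
"""

if __name__ == "__main__":
    print(worst_device(parse_vdev_errors(SAMPLE_STATUS)))
```

In the listing above, da3p1 stands out with 642 write errors, while the small counts on the other members are consistent with all disks disappearing at the moment of the controller reset.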


