Date:      Fri, 22 Jan 2010 13:30:12 -0500
From:      Toby Burress <kurin@delete.org>
To:        freebsd-questions@freebsd.org
Subject:   Drive errors in raidz array
Message-ID:  <20100122183012.GD6476@lithium.delete.org>

I have a system with 24 drives in raidz2.  When testing with bonnie++
it seemed to work fine (although I had to raise the arc_max to
prevent kernel panics).  However, now we're copying data to it and
dmesg is showing many errors like:

mpt0: mpt_cam_event: 0x16
mpt0: request 0xffffff80005f3840:63495 timed out for ccb 0xffffff000988f800 (req->ccb 0xffffff000988f800)
mpt0: request 0xffffff80005f1f80:63496 timed out for ccb 0xffffff00098d0800 (req->ccb 0xffffff00098d0800)
mpt0: attempting to abort req 0xffffff80005f3840:63495 function 0
mpt0: request 0xffffff8000601ee0:63497 timed out for ccb 0xffffff011edaa800 (req->ccb 0xffffff011edaa800)
mpt0: request 0xffffff80005f4ec0:63498 timed out for ccb 0xffffff011eda5800 (req->ccb 0xffffff011eda5800)
mpt0: mpt_wait_req(1) timed out
mpt0: mpt_recover_commands: abort timed-out. Resetting controller
mpt0: mpt_cam_event: 0x0
mpt0: completing timedout/aborted req 0xffffff80005f3840:63495
mpt0: completing timedout/aborted req 0xffffff80005f1f80:63496
mpt0: completing timedout/aborted req 0xffffff8000601ee0:63497
mpt0: completing timedout/aborted req 0xffffff80005f4ec0:63498

followed by

(da0:mpt0:0:1:0): READ(10). CDB: 28 0 1 23 81 6f 0 0 2b 0 
(da0:mpt0:0:1:0): CAM Status: SCSI Status Error
(da0:mpt0:0:1:0): SCSI Status: Check Condition
(da0:mpt0:0:1:0): UNIT ATTENTION asc:29,0
(da0:mpt0:0:1:0): Power on, reset, or bus device reset occurred
(da0:mpt0:0:1:0): Retrying Command (per Sense Data)

for every drive in the array.  Additionally, zpool status reports:

 pool: backups
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Thu Jan 21 23:15:36 2010

I'm using 8.0-RELEASE-p2 on amd64.  One other thing that changed
between testing with bonnie++ and now is that I used glabel to label
the drives before I put them in the raidz array.

There is no raid controller.

Is this something anyone has seen before?  Googling around shows
some similar errors but no solutions.
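
To check whether the timeouts and resets really hit every drive, the
log can be tallied with a short script.  What follows is only a minimal
sketch in Python, assuming the dmesg output has been captured as text.
The function name summarize_dmesg and the three patterns are
illustrative, modeled on the lines quoted above, and may need adjusting
for other driver messages.

```python
import re
from collections import Counter

# Patterns modeled on the dmesg lines quoted in this message (assumed shapes).
TIMEOUT_RE = re.compile(r"^(mpt\d+): request \S+ timed out for ccb")
RESET_RE = re.compile(
    r"^(mpt\d+): mpt_recover_commands: abort timed-out\. Resetting controller"
)
UNIT_ATTENTION_RE = re.compile(r"^\((da\d+):mpt\d+:\d+:\d+:\d+\): UNIT ATTENTION")


def summarize_dmesg(text):
    """Count request timeouts and resets per controller, and
    UNIT ATTENTION events per disk, in captured dmesg text."""
    timeouts = Counter()
    resets = Counter()
    unit_attention = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        m = TIMEOUT_RE.match(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = RESET_RE.match(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = UNIT_ATTENTION_RE.match(line)
        if m:
            unit_attention[m.group(1)] += 1
    return timeouts, resets, unit_attention


# Small excerpt taken from the errors above, used as sample input.
sample = """\
mpt0: request 0xffffff80005f3840:63495 timed out for ccb 0xffffff000988f800 (req->ccb 0xffffff000988f800)
mpt0: request 0xffffff80005f1f80:63496 timed out for ccb 0xffffff00098d0800 (req->ccb 0xffffff00098d0800)
mpt0: mpt_recover_commands: abort timed-out. Resetting controller
(da0:mpt0:0:1:0): UNIT ATTENTION asc:29,0
"""

timeouts, resets, unit_attention = summarize_dmesg(sample)
print("timeouts per controller:", dict(timeouts))
print("resets per controller:", dict(resets))
print("UNIT ATTENTION per disk:", dict(unit_attention))
```

Feeding it the full saved log instead of the excerpt would show how
many disks report a UNIT ATTENTION after each controller reset, which
can then be compared against the 24 drives in the array.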


