Date: Fri, 22 Jan 2010 13:30:12 -0500
From: Toby Burress
To: freebsd-questions@freebsd.org
Message-ID: <20100122183012.GD6476@lithium.delete.org>
Subject: Drive errors in raidz array

I have a system with 24 drives in raidz2. When testing with bonnie++
it seemed to work fine (although I had to raise the arc_max to prevent
kernel panics). However, now we're copying data to it and dmesg is
showing many errors like:

mpt0: mpt_cam_event: 0x16
mpt0: request 0xffffff80005f3840:63495 timed out for ccb 0xffffff000988f800 (req->ccb 0xffffff000988f800)
mpt0: request 0xffffff80005f1f80:63496 timed out for ccb 0xffffff00098d0800 (req->ccb 0xffffff00098d0800)
mpt0: attempting to abort req 0xffffff80005f3840:63495 function 0
mpt0: request 0xffffff8000601ee0:63497 timed out for ccb 0xffffff011edaa800 (req->ccb 0xffffff011edaa800)
mpt0: request 0xffffff80005f4ec0:63498 timed out for ccb 0xffffff011eda5800 (req->ccb 0xffffff011eda5800)
mpt0: mpt_wait_req(1) timed out
mpt0: mpt_recover_commands: abort timed-out. Resetting controller
mpt0: mpt_cam_event: 0x0
mpt0: completing timedout/aborted req 0xffffff80005f3840:63495
mpt0: completing timedout/aborted req 0xffffff80005f1f80:63496
mpt0: completing timedout/aborted req 0xffffff8000601ee0:63497
mpt0: completing timedout/aborted req 0xffffff80005f4ec0:63498

followed by

(da0:mpt0:0:1:0): READ(10). CDB: 28 0 1 23 81 6f 0 0 2b 0
(da0:mpt0:0:1:0): CAM Status: SCSI Status Error
(da0:mpt0:0:1:0): SCSI Status: Check Condition
(da0:mpt0:0:1:0): UNIT ATTENTION asc:29,0
(da0:mpt0:0:1:0): Power on, reset, or bus device reset occurred
(da0:mpt0:0:1:0): Retrying Command (per Sense Data)

for every drive in the array.

Additionally, zpool scrub says:

  pool: backups
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Thu Jan 21 23:15:36 2010

I'm using 8.0-RELEASE-p2 on amd64.

One other thing that changed between testing with bonnie++ and now is
that I used glabel to label the drives before I put them in the raidz
array. There is no raid controller.

Is this something anyone has seen before? Googling around shows some
similar errors but no solutions.
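
For anyone trying to size up errors like the ones quoted above, here is a minimal sketch of a tally script. It counts request timeouts per mpt controller and UNIT ATTENTION resets per da device. The function name summarize and the SAMPLE text are illustrative assumptions, and the two patterns only match the message shapes shown in this post; other driver versions may word them differently.

```python
import re
from collections import Counter

# Patterns for the two message shapes quoted in this post (assumed wording).
TIMEOUT_RE = re.compile(r"^(mpt\d+): request \S+ timed out for ccb")
RESET_RE = re.compile(r"^\((da\d+):mpt\d+:\d+:\d+:\d+\): UNIT ATTENTION")


def summarize(lines):
    """Return (timeouts per controller, UNIT ATTENTION count per disk)."""
    timeouts = Counter()
    resets = Counter()
    for raw in lines:
        line = raw.strip()
        match = TIMEOUT_RE.match(line)
        if match:
            timeouts[match.group(1)] += 1
            continue
        match = RESET_RE.match(line)
        if match:
            resets[match.group(1)] += 1
    return timeouts, resets


# A few lines copied from the dmesg output above, used as sample input.
SAMPLE = """\
mpt0: mpt_cam_event: 0x16
mpt0: request 0xffffff80005f3840:63495 timed out for ccb 0xffffff000988f800 (req->ccb 0xffffff000988f800)
mpt0: request 0xffffff80005f1f80:63496 timed out for ccb 0xffffff00098d0800 (req->ccb 0xffffff00098d0800)
mpt0: mpt_wait_req(1) timed out
(da0:mpt0:0:1:0): SCSI Status: Check Condition
(da0:mpt0:0:1:0): UNIT ATTENTION asc:29,0
"""

timeouts, resets = summarize(SAMPLE.splitlines())
for name, count in sorted(timeouts.items()):
    print(f"{name}: {count} timed-out requests")
for name, count in sorted(resets.items()):
    print(f"{name}: {count} reset notices")
```

To run it against a real log, pass the lines of a saved dmesg capture to summarize() in place of SAMPLE. If every disk shows roughly the same number of reset notices, that agrees with the observation above that the errors appear for every drive in the array.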