Date: Fri, 22 Jan 2010 13:30:12 -0500
From: Toby Burress <kurin@delete.org>
To: freebsd-questions@freebsd.org
Message-ID: <20100122183012.GD6476@lithium.delete.org>
Subject: Drive errors in raidz array

I have a system with 24 drives in raidz2.  When testing with bonnie++
it seemed to work fine (although I had to raise vfs.zfs.arc_max to
prevent kernel panics).  However, now that we're copying data to it,
dmesg is showing many errors like:

mpt0: mpt_cam_event: 0x16
mpt0: request 0xffffff80005f3840:63495 timed out for ccb 0xffffff000988f800 (req->ccb 0xffffff000988f800)
mpt0: request 0xffffff80005f1f80:63496 timed out for ccb 0xffffff00098d0800 (req->ccb 0xffffff00098d0800)
mpt0: attempting to abort req 0xffffff80005f3840:63495 function 0
mpt0: request 0xffffff8000601ee0:63497 timed out for ccb 0xffffff011edaa800 (req->ccb 0xffffff011edaa800)
mpt0: request 0xffffff80005f4ec0:63498 timed out for ccb 0xffffff011eda5800 (req->ccb 0xffffff011eda5800)
mpt0: mpt_wait_req(1) timed out
mpt0: mpt_recover_commands: abort timed-out. Resetting controller
mpt0: mpt_cam_event: 0x0
mpt0: completing timedout/aborted req 0xffffff80005f3840:63495
mpt0: completing timedout/aborted req 0xffffff80005f1f80:63496
mpt0: completing timedout/aborted req 0xffffff8000601ee0:63497
mpt0: completing timedout/aborted req 0xffffff80005f4ec0:63498

followed by

(da0:mpt0:0:1:0): READ(10). CDB: 28 0 1 23 81 6f 0 0 2b 0 
(da0:mpt0:0:1:0): CAM Status: SCSI Status Error
(da0:mpt0:0:1:0): SCSI Status: Check Condition
(da0:mpt0:0:1:0): UNIT ATTENTION asc:29,0
(da0:mpt0:0:1:0): Power on, reset, or bus device reset occurred
(da0:mpt0:0:1:0): Retrying Command (per Sense Data)

for every drive in the array.  Additionally, after running zpool scrub,
zpool status says:

  pool: backups
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Thu Jan 21 23:15:36 2010
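
For reference, the CDB in the READ(10) line from the da0 errors above
can be decoded by hand.  A minimal sketch, assuming the standard 10-byte
READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer
length in bytes 7-8); the function name decode_read10 is only
illustrative:

```python
def decode_read10(cdb_text):
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    # Bytes 2-5: logical block address, big-endian.
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    # Bytes 7-8: transfer length in blocks, big-endian.
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return lba, blocks


# CDB copied from the da0 error quoted in this message.
lba, blocks = decode_read10("28 0 1 23 81 6f 0 0 2b 0")
print(f"LBA {lba}, {blocks} blocks")  # prints "LBA 19104111, 43 blocks"
```

As far as I can tell, the command being retried is an ordinary
43-block read at LBA 19104111.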

I'm using 8.0-RELEASE-p2 on amd64.  One other thing that changed
between testing with bonnie++ and now is that I used glabel to label
the drives before I put them in the raidz array.

There is no RAID controller.

Is this something anyone has seen before?  Googling around shows
some similar errors but no solutions.
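
In case it helps anyone compare against their own logs, here is a rough
sketch for tallying these messages from saved dmesg output.  The two
regular expressions are assumptions based only on the lines quoted in
this message, and the summarize helper is a made-up name, so other mpt
or CAM message variants may not match:

```python
import re
from collections import Counter

# Patterns are assumptions derived from the dmesg lines quoted above.
TIMEOUT_RE = re.compile(r"^(mpt\d+): request \S+ timed out for ccb")
RESET_RE = re.compile(r"^\((da\d+):mpt\d+:\d+:\d+:\d+\): UNIT ATTENTION")


def summarize(lines):
    """Count request timeouts per controller and UNIT ATTENTIONs per disk."""
    timeouts = Counter()
    resets = Counter()
    for line in lines:
        line = line.strip()
        m = TIMEOUT_RE.match(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = RESET_RE.match(line)
        if m:
            resets[m.group(1)] += 1
    return timeouts, resets


# A few lines copied from the output quoted in this message.
sample = [
    "mpt0: mpt_cam_event: 0x16",
    "mpt0: request 0xffffff80005f3840:63495 timed out for ccb "
    "0xffffff000988f800 (req->ccb 0xffffff000988f800)",
    "mpt0: request 0xffffff80005f1f80:63496 timed out for ccb "
    "0xffffff00098d0800 (req->ccb 0xffffff00098d0800)",
    "mpt0: attempting to abort req 0xffffff80005f3840:63495 function 0",
    "(da0:mpt0:0:1:0): SCSI Status: Check Condition",
    "(da0:mpt0:0:1:0): UNIT ATTENTION asc:29,0",
]

timeouts, resets = summarize(sample)
print(dict(timeouts), dict(resets))
```

Feeding it the full dmesg output instead of the sample list should show
whether the UNIT ATTENTION count really covers every da device or only
some of them.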