Date: Wed, 18 Apr 2007 16:41:03 +0200
From: Ståle Kristoffersen
To: Dag-Erling Smørgrav
Cc: fs@freebsd.org
Subject: Re: ZFS + replacing failing hard-drive.
Message-ID: <20070418144103.GB31727@eschew.pusen.org>
In-Reply-To: <86hcrdlqak.fsf@dwp.des.no>
References: <20070418104155.GA31727@eschew.pusen.org> <86hcrdlqak.fsf@dwp.des.no>

On 2007-04-18 at 16:18, Dag-Erling Smørgrav wrote:
> Ståle Kristoffersen writes:
> > I have been testing ZFS on my fileserver. The data I had in the zpool is
> > not that important so I do not have redundancy.
> > Unfortunately I got a bad hard-drive:
> > Apr 13 22:02:44 fs root: ZFS: vdev I/O failure, zpool=stash path=/dev/ad14s1d offset=336216064000 size=131072 error=5
> > Apr 13 22:02:56 fs kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=3013311
>
> I don't think you do. This appears to be a bug in the ata driver
> which ZFS is particularly good at triggering.

I first noticed the problems running UFS on the first partition, and I have
tried the drive on all of the following controllers:

atapci0: port 0xcf00-0xcf7f mem 0xfddff000-0xfddff07f,0xfddf8000-0xfddfbfff irq 19 at device 0.0 on pci4
atapci1: port 0xaf00-0xaf07,0xae00-0xae03,0xad00-0xad07,0xac00-0xac03,0xab00-0xab0f mem 0xfd9fe000-0xfd9fffff irq 17 at device 0.0 on pci6
atapci2: port 0xfa00-0xfa07,0xf900-0xf903,0xf800-0xf807,0xf700-0xf703,0xf600-0xf60f,0xf500-0xf50f irq 19 at device 31.2 on pci0
atapci3: port 0xf300-0xf307,0xf200-0xf203,0xf100-0xf107,0xf000-0xf003,0xef00-0xef0f,0xee00-0xee0f irq 19 at device 31.5 on pci0

Same problem on all. And to support my theory that the disk was bad, the new
disk does not behave badly, even after a zpool scrub.

> BTW, the message you show is harmless: see where it says "retrying"?
> No need to worry until it says "FAILURE - WRITE_DMA timed out".

I just had a quick peek in the logs and did not find any of them the last
time, but I do get them:

Apr 13 21:17:14 fs kernel: ad14: FAILURE - WRITE_DMA48 timed out LBA=719378349
Apr 13 21:22:23 fs kernel: ad14: FAILURE - WRITE_DMA48 status=51 error=10 LBA=719341415

Another issue is that even though all the drives support SATA300, and all
the controllers do as well, they still come up as SATA150 (except one).
(And yeah, I have removed that jumper.)

ad8: 305245MB at ata4-master SATA300
ad10: 381554MB at ata5-master SATA150
ad14: 305245MB at ata7-master SATA150
ad15: 305245MB at ata7-slave SATA150
ad16: 305245MB at ata8-master SATA150

but this is probably the wrong place for that.

-- 
Ståle Kristoffersen
staalebk@ifi.uio.no
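
The distinction drawn in this thread between "retrying" warnings and
"FAILURE" messages can be checked mechanically against a log. What follows
is only a minimal sketch: it assumes the three kernel message shapes quoted
in this mail are representative, and the names classify, ATA_LINE, and the
"retry"/"hard" labels are illustrative, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Pattern modeled only on the kernel lines quoted in this thread;
# other ata driver message shapes are not covered (assumption).
ATA_LINE = re.compile(
    r"kernel: (?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - "
    r"(?P<cmd>\w+) .*LBA=(?P<lba>\d+)"
)


def classify(lines):
    """Count retry warnings and hard failures per device."""
    counts = Counter()
    for line in lines:
        m = ATA_LINE.search(line)
        if m is None:
            continue  # not an ata TIMEOUT/FAILURE line, e.g. the ZFS vdev message
        severity = "hard" if m.group("kind") == "FAILURE" else "retry"
        counts[(m.group("dev"), severity)] += 1
    return counts


# Log lines copied from this thread.
SAMPLE = [
    "Apr 13 22:02:44 fs root: ZFS: vdev I/O failure, zpool=stash "
    "path=/dev/ad14s1d offset=336216064000 size=131072 error=5",
    "Apr 13 22:02:56 fs kernel: ad14: TIMEOUT - WRITE_DMA retrying "
    "(1 retry left) LBA=3013311",
    "Apr 13 21:17:14 fs kernel: ad14: FAILURE - WRITE_DMA48 timed out "
    "LBA=719378349",
    "Apr 13 21:22:23 fs kernel: ad14: FAILURE - WRITE_DMA48 status=51 "
    "error=10 LBA=719341415",
]

if __name__ == "__main__":
    for (dev, severity), n in sorted(classify(SAMPLE).items()):
        print(f"{dev}: {n} {severity}")
```

Feeding it the lines of /var/log/messages instead of SAMPLE would give a
per-disk tally; a disk that only ever shows "retry" entries matches the
harmless case described above, while "hard" entries are the ones to worry
about.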