Date:      Wed, 18 Apr 2007 16:41:03 +0200
From:      Ståle Kristoffersen <staalebk@ifi.uio.no>
To:        Dag-Erling Smørgrav <des@des.no>
Cc:        fs@freebsd.org
Subject:   Re: ZFS + replacing failing hard-drive.
Message-ID:  <20070418144103.GB31727@eschew.pusen.org>
In-Reply-To: <86hcrdlqak.fsf@dwp.des.no>
References:  <20070418104155.GA31727@eschew.pusen.org> <86hcrdlqak.fsf@dwp.des.no>

On 2007-04-18 at 16:18, Dag-Erling Smørgrav wrote:
> Ståle Kristoffersen <staalebk@ifi.uio.no> writes:
> > I have been testing ZFS on my fileserver. The data I had in the zpool is
> > not that important so I do not have redundancy. Unfortunately I got a bad
> > hard-drive:
> > Apr 13 22:02:44 fs root: ZFS: vdev I/O failure, zpool=stash path=/dev/ad14s1d offset=336216064000 size=131072 error=5
> > Apr 13 22:02:56 fs kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=3013311
> 
> I don't think you do.  This appears to be a bug in the ata driver
> which ZFS is particularly good at triggering.

I first noticed the problems running UFS on the first partition, and I have
tried the drive on all of the following controllers:
atapci0: <SiI 3132 SATA300 controller> port 0xcf00-0xcf7f mem 0xfddff000-0xfddff07f,0xfddf8000-0xfddfbfff irq 19 at device 0.0 on pci4
atapci1: <JMicron JMB363 SATA300 controller> port 0xaf00-0xaf07,0xae00-0xae03,0xad00-0xad07,0xac00-0xac03,0xab00-0xab0f mem 0xfd9fe000-0xfd9fffff irq 17 at device 0.0 on pci6
atapci2: <Intel ICH8 SATA300 controller> port 0xfa00-0xfa07,0xf900-0xf903,0xf800-0xf807,0xf700-0xf703,0xf600-0xf60f,0xf500-0xf50f irq 19 at device 31.2 on pci0
atapci3: <Intel ICH8 SATA300 controller> port 0xf300-0xf307,0xf200-0xf203,0xf100-0xf107,0xf000-0xf003,0xef00-0xef0f,0xee00-0xee0f irq 19 at device 31.5 on pci0

Same problem on all of them. And to support my theory that the disk was bad,
the new disk does not misbehave, even after a zpool scrub.

> BTW, the message you show is harmless: see where it says "retrying"?
> No need to worry until it says "FAILURE - WRITE_DMA timed out".

I only had a quick peek in the logs last time and did not find any of those,
but I do get them:
Apr 13 21:17:14 fs kernel: ad14: FAILURE - WRITE_DMA48 timed out LBA=719378349
Apr 13 21:22:23 fs kernel: ad14: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=719341415
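
A minimal sketch of how the "retrying" lines could be separated from the
hard "FAILURE" lines when scanning a log. The regular expression only
assumes the message layout seen in the lines quoted in this mail; the names
ATA_LINE, classify and hard_failures are made up for illustration and are
not part of any FreeBSD tool.

```python
import re

# Assumption: layout of the ata kernel messages as quoted in this mail.
# Other drivers or releases may format these lines differently.
ATA_LINE = re.compile(
    r"kernel: (?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\w+) (?P<rest>.*)"
)

def classify(line):
    """Return (device, command, severity) for an ata message, or None."""
    m = ATA_LINE.search(line)
    if m is None:
        return None
    # "TIMEOUT ... retrying" means the request will be retried;
    # "FAILURE" means the driver gave up on the request.
    severity = "hard" if m.group("kind") == "FAILURE" else "retry"
    return m.group("dev"), m.group("cmd"), severity

def hard_failures(lines):
    """Yield only the lines that report a hard failure."""
    for line in lines:
        result = classify(line)
        if result is not None and result[2] == "hard":
            yield line
```

Feeding it the lines of a log file (for example `hard_failures(open(path))`,
with `path` pointing at whichever file syslog writes kernel messages to)
would list only the entries worth worrying about.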

Another issue is that even though all the drives support SATA300, and all the
controllers do as well, they still come up as SATA150 (except one).
(And yes, I have removed that jumper.)
ad8: 305245MB <Seagate ST3320620AS 3.AAC> at ata4-master SATA300
ad10: 381554MB <Seagate ST3400620AS 3.AAK> at ata5-master SATA150
ad14: 305245MB <Seagate ST3320620AS 3.AAC> at ata7-master SATA150
ad15: 305245MB <Seagate ST3320620AS 3.AAC> at ata7-slave SATA150
ad16: 305245MB <Seagate ST3320620AS 3.AAE> at ata8-master SATA150

but this is probably the wrong place for that.

-- 
Ståle Kristoffersen
staalebk@ifi.uio.no


