From: Bartosz Stec
Date: Tue, 20 Jan 2009 09:47:35 +0100
To: Marc UBM
Cc: freebsd-stable@freebsd.org
Subject: Re: problems with sata disks (taskqueue timeout)
Message-ID: <49758FA7.3060606@kkip.pl>
In-Reply-To: <20090119202016.11f42e3b.ubm.freebsd@gmail.com>

Marc UBM wrote:
> Hiho!
> :-)
>
> Occasionally, especially when uploading a large number of files, the
> (brand-new, tested) sata disks in my fileserver spit out some of these
> errors:
>
> -----------------------
>
> Jan 19 19:51:14 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=882778752
> Jan 19 19:51:23 hamstor kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Jan 19 19:51:27 hamstor kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Jan 19 19:51:31 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> Jan 19 19:51:35 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue timeout - completing request directly
> Jan 19 19:51:35 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=882778752
> Jan 19 19:51:35 hamstor kernel: ad10: FAILURE - WRITE_DMA48 status=ff error=ff LBA=882778752
> Jan 19 19:51:35 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm path=/dev/ad10 offset=451982655488 size=131072 error=5
> Jan 19 19:51:41 hamstor kernel: ad10: FAILURE - SET_MULTI status=51 error=4
> Jan 19 19:51:41 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=882779008
> Jan 19 19:51:41 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=882779008
> Jan 19 19:51:50 hamstor kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Jan 19 19:51:54 hamstor kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Jan 19 19:51:58 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> Jan 19 19:52:02 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue timeout - completing request directly
> Jan 19 19:52:02 hamstor kernel:
> ad10: FAILURE - WRITE_DMA48 timed out LBA=882779008
> Jan 19 19:52:02 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm path=/dev/ad10 offset=451982786560 size=131072 error=5
>
> -----------------------
>
> I've fiddled with the cables, which seemed to help, but I've been
> unable to completely eliminate the errors. The disks are two Western
> Digital MyBooks Home Edition (1 TB per disk), connected to a Promise TX
> 4 SATA Controller:
>
> atapci0@pci0:1:6:0: class=0x018000 card=0x3d17105a chip=0x3d17105a rev=0x02 hdr=0x00
>     vendor = 'Promise Technology Inc'
>     device = 'PDC40718-GP SATA 300 TX4 Controller'
>     class = mass storage
>
> They're connected via 50cm esata cables.
>
> I've googled on the net and found some vague hints about problems with
> the Promise TX4, but nothing concrete.
>
> What I've found is
>
> http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
>
> basically telling me "these things happen, deal with it" :-)
>
> The problem is, I cannot produce these problems reliably, only thing I
> notice is that they *seem* to happen more often if a lot of large files
> are copied in succession.
>
> Can anybody tell me if upgrading to 7.2 or -current will help?
>
> I'm currently running
>
> 7.0-STABLE-200804 FreeBSD 7.0-STABLE-200804 #0: Wed Dec 10 15:29:03 CET
> 2008 ***@host:/usr/obj/usr/src/sys/GENERIC amd64
>
> Next step I'll try is upgrading to RELENG_7 to see if that helps.
>
>
> Greetings,
> Marc
>

Cheers Marc.

My personal experience makes me think that this issue is
controller/driver related. I have been using a SATA 300 TX4 controller
in my fileserver since 6.1-RELEASE (with 2 of its 4 ports in use), and I
saw a lot of exactly the same errors in the logs. Sometimes they were
harmless, but sometimes one of the disks would then disconnect from the
controller, and the only way to get it working again was to power the PC
off and on. That mostly happened under heavy I/O, such as dumping
filesystems.

The good news is that since 7.0-RELEASE I have seen such errors maybe
2-3 times, and not at all for at least the last 6 months. That is
probably because I rebuild my system about once a month to keep up with
the stable branch, and something was corrected in the sources during
that time. So I also advise upgrading to RELENG_7; you will probably get
rid of these errors.

Good luck!

-- 
Bartosz Stec
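
Since the errors are hard to reproduce, it may help to count them per device and per command before and after an upgrade. The following is only a rough sketch: the function name `tally_ata_events`, the regular expression, and the log path in the usage note are assumptions made for illustration, and the pattern is based solely on the message shapes quoted in this thread.

```python
import re
import sys
from collections import Counter

# Assumed line shape, taken from the excerpt quoted in this thread:
#   "Jan 19 19:51:14 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC error ..."
ATA_LINE = re.compile(
    r"kernel: (?P<dev>ad\d+): (?P<level>WARNING|TIMEOUT|FAILURE) - (?P<rest>.*)"
)

# Text that follows the command name in the quoted messages.
TRAILER = re.compile(
    r" (?:UDMA ICRC|taskqueue timeout|retrying|timed out|status=)"
)


def tally_ata_events(lines):
    """Count (device, level, command) triples in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if match is None:
            continue  # not a kernel disk message in the assumed format
        # Everything before the first trailer keyword is treated as the command.
        command = TRAILER.split(match["rest"])[0].strip()
        counts[(match["dev"], match["level"], command)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of a syslog file as the first argument.
    with open(sys.argv[1], errors="replace") as log:
        for (dev, level, command), n in sorted(tally_ata_events(log).items()):
            print(f"{dev} {level:8} {command}: {n}")
```

Running it against the system log (for example `python tally.py /var/log/messages`, where the script name is arbitrary) once before and once some time after the upgrade gives a simple way to see whether the warnings really became rarer.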