Date: Mon, 23 Jul 2007 21:42:08 -0700
From: Jeremy Chadwick
To: Bill Swingle
Cc: freebsd-stable@freebsd.org
Subject: Re: problems with Hitachi 1TB SATA drives
Message-ID: <20070724044208.GA79101@eos.sc1.parodius.com>
In-Reply-To: <46A56695.1000001@dub.net>
References: <46A54B6F.9010100@dub.net> <200707241128.19418.doconnor@gsoft.com.au> <46A56695.1000001@dub.net>

On Mon, Jul 23, 2007 at 07:40:21PM -0700, Bill Swingle wrote:
> Doh, I knew I forgot something in my original email.
> Here's the full dmesg: http://dub.net/rum.dub.net.dmesg

Actually you did include this in your original email.  I think Daniel
overlooked it.  :-)

After looking at your dmesg and your claim, I got confused because your
initial statement included the use of a 3ware card.
A verbose description of your configuration:

* ad0: 43979MB at ata0-master UDMA100
  -- hooked to:
  atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
  ata0: on atapci0
  ata1: on atapci0

* ad4: 953869MB at ata2-master SATA150
* ad6: 953869MB at ata3-master SATA150
  -- both hooked to:
  atapci1: port 0xec00-0xec07,0xe800-0xe803,0xe400-0xe407,0xe000-0xe003,0xdc00-0xdc0f irq 18 at device 31.2 on pci0
  ata2: on atapci1
  ata3: on atapci1

* twed0: on twe0
  twed0: 583440MB (1194885120 sectors)
  -- hooked to:
  twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0xb800-0xb80f mem 0xfeaffc00-0xfeaffc0f,0xfe000000-0xfe7fffff irq 17 at device 2.0 on pci3
  twe0: [GIANT-LOCKED]
  twe0: 4 ports, Firmware FE7X 1.05.00.063, BIOS BE7X 1.08.00.048

I have to assume that atapci0 is actually using IRQ 14 even though it's
not shown (weird...).  Additionally, your ICH5 SATA controller is
sharing an IRQ with a couple of other devices on the PCI bus; this
isn't bad, but I'm noting it here in case this turns out to be some
weird interrupt problem:

em0: port 0xac00-0xac1f mem 0xfd9e0000-0xfd9fffff irq 18 at device 1.0 on pci2
uhci2: port 0xd400-0xd41f irq 18 at device 29.2 on pci0

On to this:

> Jul 21 00:21:45 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54194911
> Jul 21 00:22:20 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=107260543
> Jul 21 00:22:57 rum kernel: ad4: FAILURE - device detached
> Jul 21 00:22:57 rum kernel: subdisk4: detached
> Jul 21 00:22:57 rum kernel: ad4: detached
> Jul 21 00:24:19 rum kernel: ad6: FAILURE - device detached
> Jul 21 00:24:19 rum kernel: subdisk6: detached
> Jul 21 00:24:19 rum kernel: ad6: detached
>
> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=1456106111
> ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=1456106111
> ad4: FAILURE - WRITE_DMA48 timed out LBA=1456106111
> ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54194911
> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=461407775
> ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=461407775
> ad4: FAILURE - WRITE_DMA48 timed out LBA=461407775

But then:

> When trying to newfs them both eventually failed with DMA READ or
> WRITE timeouts.

Now I'm confused.  :-)  I only see evidence of a DMA failure on ad4.
The ad6 disk disconnecting from the bus could be caused by the
controller getting wedged while waiting for certain transactions sent
to ad4 (which are failing).  I've seen this scenario happen many times.
The panic you got is probably also induced by the same issue.

Does the WRITE_DMA/DMA48 problem happen for you when newfs'ing a slice
on ad6?

> I've read that bad SATA cables could cause this, the cables I'm using
> are brand new but are probably pretty cheap.

For testing purposes, swap them out with some other cables.  It may not
be the cables at all, so keep the originals around.  You might also try
using some canned air to blow out any dust around the SATA connector
ends on the cables, drives, and motherboard.

Remaining questions I have:

Q: Is your ICH5 controller actually an ICH5R, and have you turned on
   some Intel RAID option in the BIOS?  Maybe turning it on but leaving
   the disks in a JBOD fashion (not defining an array)?  The reason I
   ask is that you said you're going to use the Hitachi drives as "a
   pair of 1TB synchronised drives", which implies RAID-1, yet I don't
   see use of gmirror or ccd or anything else.  :-)

Q: What motherboard and model is this?  Looks like an Intel.

Q: If an Intel, have you gone looking at Intel's site for BIOS updates
   for that board?  Intel is the one company that is thorough about
   documenting BIOS changes in their Release Notes.  It would not
   surprise me if this turned out to be some kind of weird BIOS bug.

Q: Some motherboards let you toggle certain "compatibility" mode stuff
   for the SATA controller in the BIOS.  You might want to flip that to
   see what happens (if it's set to compatibility, try the opposite,
   and vice versa of course).
Q: Have you searched Google for issues others have reported (such as in
   Linux) with the HDS721010KLA330 or similar (differently-sized)
   models?

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |
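
For anyone wanting to repeat the per-disk comparison made above (DMA
timeouts on ad4 versus a bare detach on ad6), here is a minimal Python
sketch that tallies ata(4) TIMEOUT/FAILURE messages per device.  The
sample lines are copied from the quoted log; the function name
summarize() and the labels "dma_error" and "detached" are made up for
illustration, and the regex only assumes the message shape seen in this
thread, not every format the driver can emit.

```python
import re
from collections import Counter

# Sample lines copied from the log quoted earlier in this message.
LOG = """\
Jul 21 00:21:45 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54194911
Jul 21 00:22:20 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=107260543
Jul 21 00:22:57 rum kernel: ad4: FAILURE - device detached
Jul 21 00:24:19 rum kernel: ad6: FAILURE - device detached
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=1456106111
ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=1456106111
ad4: FAILURE - WRITE_DMA48 timed out LBA=1456106111
"""

# Assumed message shape: "adN: TIMEOUT|FAILURE - <detail>", with or
# without a leading syslog timestamp.
EVENT = re.compile(r"\b(ad\d+): (?:TIMEOUT|FAILURE) - (.*)$")


def summarize(text):
    """Count DMA errors and detach events per device (hypothetical helper)."""
    counts = Counter()
    for line in text.splitlines():
        match = EVENT.search(line)
        if match is None:
            continue  # not a TIMEOUT/FAILURE line for an adN device
        device, detail = match.groups()
        if "_DMA" in detail:
            counts[(device, "dma_error")] += 1
        elif "device detached" in detail:
            counts[(device, "detached")] += 1
    return counts


if __name__ == "__main__":
    for (device, kind), n in sorted(summarize(LOG).items()):
        print(f"{device}: {kind} x{n}")
```

Feeding it the full /var/log/messages instead of the sample would show
whether ad6 ever logs a DMA error of its own or only ever detaches
after ad4 has started timing out.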