Date: Sat, 28 Feb 2009 14:48:52 -0500
From: Elliot Schlegelmilch
To: Gary Jennejohn
Message-ID: <20090228194852.GA62162@schlegelmilch.org>
References: <49A5A276.9080401@FreeBSD.org> <20090226122212.76077ed0@ernst.jennejohn.org>
In-Reply-To: <20090226122212.76077ed0@ernst.jennejohn.org>
Cc: FreeBSD-Current
Subject: Re: SATA disks suddenly stop working
List-Id: Discussions about the use of FreeBSD-current

On Thu, Feb 26, 2009 at 12:22:12PM +0100, Gary Jennejohn wrote:
> On Wed, 25 Feb 2009 21:56:38 +0200
> Alexander Motin wrote:
>
> > Gary Jennejohn wrote:
> > > I've been having lots of problems with SATA drives attached to higher
> > > port numbers, namely ata5 and ata6.
> > >
> > > I was installing Linux under qemu today and it had been running for
> > > several hours and had installed multi-gigabytes of data when qemu
> > > just stopped.
> > >
> > > I noticed that all I/O to the disk had ceased.
> > >
> > > Doing "atacontrol reinit" on the port (ata5) resulted in a message
> > > that the device was not configured, which was patently false since
> > > qemu had just been merrily writing to it.
> > >
> > > This with a kernel made from sources updated today at about 2 PM (GMT+1).
> > >
> > > I've also seen problems with a disk attached to ata6. It just sort
> > > of disappears after a while.
> > >
> > > Disks attached to ata2, ata3 and ata4 don't exhibit any problems.
> >
> > You have told much and same time gave nothing that can be used.
> >
>
> I was only interested in whether others have seen this problem. I was
> not looking for a solution.
>
> > What controller do you have? What drives on what channels? Is there any
> > kernel messages about the problem? Have you tried to enable verbose
> > messages to get additional details?
> >
>
> atapci0@pci0:0:17:0: class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x00 hdr=0x00
>     vendor = 'ATI Technologies Inc'
>     class = mass storage
>     subclass = SATA
>
> There were no kernel messages at all, the drive simply hung.
>
> I'll do a verbose boot and try to reproduce the disk hang later.
>
> > Reinit could return ENXIO if it already was in progress. Disappearing
> > drives are also can be related to that reinit. Can't it be just a real
> > hardware problem?
> >
>
> I should have mentioned that the error returned was about some IOCTL.
> Can't remember which one right now, but the error message did include
> that the device was not configured.
>
> I've also noticed several times in the past when the problem occurred
> that the BIOS could not enumerate the AHCI disks anymore. I had to
> do a POR. Seems that the controller was completely hosed such that
> a simple reset didn't reinitialize it sufficiently for it to work.
>
> This morning I booted the box and started a cvsup. My repository is
> on a ZFS mirror with the disks on ata3 and ata4.
> The system hung after
> the data from the server were received, although all the data were
> successfully written to the disks.
>
> I couldn't do anything at all - it looked like the root disk was not
> responding and the disk light was on solid red. I had to do a hard
> reset.
>
> This is the first time I've seen a problem with this port. The root
> disk is on ata2.
>
> I rebooted and turned off MSI. I'll monitor the situation to see
> whether that helps.

I don't mean to hijack your thread, but I've had problems with one of
my SATA disks falling off the bus. I could usually retrieve it with an
atacontrol detach / reattach. However, with a recent kernel all I'm
getting is this:

ata2: on atapci1
ata2: AHCI reset...: 2
ata2: SATA connect time=0ms
ata2: ready wait time=0ms52 (12272 MB)
ata2: software reset port 15...
ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
ata2: software reset set timeout
ata2: software reset port 0...
ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
ata2: software reset set timeout
ata2: SIGNATURE: ffffffff
ata2: Unknown signature, assuming disk device
ata2: AHCI reset done: devices=00000001
ata2: [MPSAFE]
ata2: [ITHREAD]

One for each channel, up to ata7.

atapci0@pci0:0:31:1: class=0x01018a card=0x948115d9 chip=0x269e8086 rev=0x09 hdr=0x00
    vendor = 'Intel Corporation'
    device = '631xESB/632xESB/3100 Ultra ATA Storage Controller'
    class = mass storage
    subclass = ATA

The last kernel known to work was from Dec 17, but a kernel rebuilt
from sources of that date doesn't see the SATA disks either (and with
the kernel that does see the disks, ZFS doesn't work). Or perhaps I'm
csup'ing incorrectly. I'm still trying to back up far enough to find
sources that work.