From owner-freebsd-stable@FreeBSD.ORG Fri May 14 22:42:38 2010 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DECFD106566B for ; Fri, 14 May 2010 22:42:37 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta14.emeryville.ca.mail.comcast.net (qmta14.emeryville.ca.mail.comcast.net [76.96.27.212]) by mx1.freebsd.org (Postfix) with ESMTP id C68E48FC16 for ; Fri, 14 May 2010 22:42:37 +0000 (UTC) Received: from omta15.emeryville.ca.mail.comcast.net ([76.96.30.71]) by qmta14.emeryville.ca.mail.comcast.net with comcast id HT921e0041Y3wxoAEaiebU; Fri, 14 May 2010 22:42:38 +0000 Received: from koitsu.dyndns.org ([98.248.46.159]) by omta15.emeryville.ca.mail.comcast.net with comcast id Haid1e0083S48mS8baidhu; Fri, 14 May 2010 22:42:38 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 289E89B419; Fri, 14 May 2010 15:42:36 -0700 (PDT) Date: Fri, 14 May 2010 15:42:36 -0700 From: Jeremy Chadwick To: Pieter de Boer Message-ID: <20100514224236.GA11680@icarus.home.lan> References: <4BED8B89.6010901@os3.nl> <20100514195346.GA8977@icarus.home.lan> <4BEDBC08.2040002@os3.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4BEDBC08.2040002@os3.nl> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-stable@freebsd.org Subject: Re: Read / write timeouts on SATA disks connected to ICH9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 May 2010 22:42:38 -0000 On Fri, May 14, 2010 at 11:09:28PM +0200, Pieter de Boer wrote: > The ad4 SMART output is showing errors, as this disk is indeed > broken now. It wasn't before and it is a replacement of another disk > that wasn't broken either. Grmbl, I now see reallocated sectors on > ad6 as well, in the smartctl output. So both disks look wonky; > although afaik that's not the main issue here. Lots to say about all of this. Focusing on drive ad4 (Western Digital): The disk has 1 uncorrected sector (Attribute 198). This means the drive tried to remap it and was not successful. This could have happened any time during the lifetime of the drive. There are no pending sector reallocations (Attribute 197) (meaning there aren't others which are bad which the drive is waiting to attempt remapping for), and there are no remapped sectors (Attribute 5). There have been no successful reallocation attempts during the drive's lifetime (Attribute 196). In general, I would say this is acceptable. If Attribute 198 was higher, or you had other pending sectors which needed to be remapped, I'd say replace the disk. UDMA/CRC error count (Attribute 199) is zero. That's good -- it means that most likely cabling issues can be ruled out, since the attribute tracks the number of communication errors between the controller and the disk PCB. Drive temperature looks good, so nothing to worry about there. The drive itself has detected numerous error conditions in the SMART error log during its lifetime -- a total of 48, but SMART only lists the most recent 5. The drive has been online for a total of 827 hours (Attribute 9), which we can use to determine how recent the drive experienced said errors. Let's examine the first 3: > Error 48 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours) > 40 51 00 9d 84 0e e0 Error: UNC at LBA = 0x000e849d = 951453 > c8 00 20 00 84 0e 00 00 00:45:18.204 READ DMA > > Error 47 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours) > 40 51 00 0c 9d 0e e0 Error: UNC at LBA = 0x000e9d0c = 957708 > c8 00 80 00 9b 0e 00 00 00:03:08.605 READ DMA > > Error 46 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours) > 40 51 00 9d 84 0e e0 Error: UNC at LBA = 0x000e849d = 951453 > c8 00 80 80 82 0e 00 00 00:03:05.176 READ DMA Okay, it's probably safe to assume these are all signs of the uncorrected sector. When a drive attempts a LBA remap -- which in this case it did, but failed -- it can spend quite a bit of time doing that; in some cases minutes, not seconds. The drive essentially "locks up" during this time (from the perspective of the SATA controller) -- it's literally spending all of its time trying to read and re-read the LBA/sector in different ways, hoping to get the data out of it (and/or correct it with ECC) so that it can be written to a spare block and then internally the bad LBA won't ever be used again. What the OS ends up seeing in this situation is disk timeouts. This is completely normal. The WD Caviar Black drives have a useful feature called TLER -- it's disabled by default, for reasons which I don't want to get into here -- which can force the drive to internally give up after X seconds (it's user-selectable) when dealing with such remapping/errors. The idea is to keep the drive from being deemed dead from the OS/controller's point of view. I believe Seagate, Hitachi, or Samsung (I forget which) have this feature as well, but it's not called TLER. Anyway, so this is probably the cause of one detachment/timeout you've seen FreeBSD report. Let's move on to the 2 remaining errors: > Error 45 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours) > 40 51 08 20 47 6c e0 Error: UNC at LBA = 0x006c4720 = 7096096 > c4 ff 08 ff 46 6c 00 00 00:01:09.459 READ MULTIPLE > > Error 44 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours) > 40 51 08 21 8e 67 e0 Error: UNC at LBA = 0x00678e21 = 6786593 > c4 ff 04 3f 2f 00 00 00 00:01:00.724 READ MULTIPLE These two happened around the same time (10 seconds within one another). I'm under the impression that these are *probably* the result of the above uncorrected sector issue, but I'm not 100% certain. Here's why I think that: - The errors occurred within the same hour mark (817) as the previous 3 errors, - The errors happened only 2 minutes prior to the preceding 3, - The drive was in the process of executing READ MULTIPLE (cmd 0xc4), which tells the disk to read multiple logical sectors within 1 pass. The ATA-8 specification states that READ MULTIPLE is a PIO command. I'm not sure how/why FreeBSD would be submitting this to a disk unless the communication protocol had been downgraded from DMA to PIO. mav@ might have some insights on this, as well as how to decode some of the SMART error data shown. It looks like the 48-bit read input block is written in reverse order (word 5 to word 0). If you want to find out the exact LBA that has the problem (there may be more than one), I can step you through performing a selective LBA scan using SMART, since this model of disk does support such. It's easy to do, easy to understand the results, and can be done while the drive is in operation (though I would recommend trying to keep disk I/O to a minimum during this test). Let me know. Focusing on drive ad6 (Seagate): This drive has 0 uncorrected sectors (Attribute 198); however, I don't know what Attribute 187 represents on this model of drive. The disk has successfully remapped 50 sectors (Attribute 5) during its lifetime. There are no pending sector reallocations (Attribute 197). Drive temperature looks good. I would recommend you consider replacing this drive "down the road" sometime, since it does indicate 50 reallocated sectors. One or two is fine, but 50 is a bit much. At least the drive was able to re-read them and reallocate the data successfully. It's very possible that said remappings caused the drive to time out during some I/O operations (from the perspective of the controller and OS); I went over this with regards to ad4. Finally, your vmstat -i output: > # vmstat -i > interrupt total rate > irq23: atapci0 371021299 10423 Good to know there's no IRQ sharing going on, but what does worry me is the interrupt rate (10K interrupts/second). That seems *extremely* high, but it also depends on what kind of disk I/O is happening on this system -- especially since you have 2 disks attached to the same controller. "iostat 1", "iostat -x 1", or "gstat" might come in handy to tell you what kind of disk I/O is going on. If actual I/O is very little, then something weird is going on with regards to the number of interrupts being seen on IRQ 23. mav@ might have some ideas, otherwise I'd recommend rebooting the machine and seeing if the number drops. If so, it may be that the OS has some sort of bug where a disk timing out or falling off the bus causes interrupt problems. (It's too bad you don't have AHCI on this system. It handles stuff like this much more elegantly...) -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |