From owner-freebsd-stable@FreeBSD.ORG Mon Jun 21 22:02:11 2010 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DD96106566B for ; Mon, 21 Jun 2010 22:02:11 +0000 (UTC) (envelope-from matt@bubblegen.co.uk) Received: from relay.pcl-ipout01.plus.net (relay.pcl-ipout01.plus.net [212.159.7.99]) by mx1.freebsd.org (Postfix) with ESMTP id 34C2C8FC24 for ; Mon, 21 Jun 2010 22:02:09 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Av0EAKd3H0xUXebj/2dsb2JhbACDHZtqcbIVkRyBJYEpEoFLcASQdg Received: from outmx01.plus.net ([84.93.230.227]) by relay.pcl-ipout01.plus.net with ESMTP; 21 Jun 2010 22:33:13 +0100 Received: from bubblegen.plus.com ([80.229.236.194] helo=[192.136.1.18]) by outmx01.plus.net with esmtp (Exim) id 1OQocC-0004e4-Ui; Mon, 21 Jun 2010 22:33:13 +0100 From: Matthew Lear To: Jeremy Chadwick In-Reply-To: <1276889330.2210.44.camel@almscliff.bubblegen.co.uk> References: <1276844904.7519.19.camel@almscliff.bubblegen.co.uk> <20100618082127.GA34578@icarus.home.lan> <1276876031.7519.39.camel@almscliff.bubblegen.co.uk> <20100618174208.GA47470@icarus.home.lan> <1276889330.2210.44.camel@almscliff.bubblegen.co.uk> Content-Type: text/plain; charset="UTF-8" Date: Mon, 21 Jun 2010 22:33:12 +0100 Message-ID: <1277155992.1860.3.camel@almscliff.bubblegen.co.uk> Mime-Version: 1.0 X-Mailer: Evolution 2.28.3 Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org Subject: Re: 7.2-RELEASE-p4, IO errors & RAID1 failure X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Jun 2010 22:02:11 -0000 Hello Jeremy. I just wondered if you had any further thoughts on the info below. Two new disks arrived over the weekend and I'm still unsure if I'm best to replace ad0 or not... Much appreciated indeed. -- Matt On Fri, 2010-06-18 at 20:28 +0100, Matthew Lear wrote: > On Fri, 2010-06-18 at 10:42 -0700, Jeremy Chadwick wrote: > > On Fri, Jun 18, 2010 at 04:47:11PM +0100, Matthew Lear wrote: > > > Hello Jeremy, > > > Thanks very much for the feedback. > > > > > > [snip] > > > > Could you please provide the full output from "smartctl -a /dev/ad0" > > > > here? Your drive may be completely fine and you may not have to swap it > > > > at all; hard to say. > > > > > > Sure. See below: > > > {snip} > > > > Your SMART statistics look completely OK. There's nothing there that > > indicates there were any write failures or otherwise. I'll explain near > > the end of the Email how to test a range of LBAs "just in case". > > Good. That's what I thought too :-) > > > I'll take a moment to point out that the error previously seen was a > > timeout during a write transaction (WRITE_DMA48). Recap: > > > > > > > ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=395032335 > > > > > ad0: FAILURE - WRITE_DMA48 status=51 error=10 LBA=395032335 > > > > > ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode > > > > The status codes shown (status=51 and error=10) are hexadecimal. I'm > > pointing this out because they aren't preceded by '0x' or '$' and it > > clarifies my next point: > > > > NID_NOT_FOUND (bit 4 set in the ATA error field) is referred to as IDNF > > per ATA6-ACS specification and onward, so I'll refer to it as that. > > (I've always wondered why FreeBSD calls this NID_NOT_FOUND; IDFN stands > > for ID Not Found, so what's with the extra "N"? I've always felt this > > is a typo...) > > > > Using the ATA8-ACS specification working draft (2007/05/21), since it's > > more recent, we see the following: > > > > Section 6.2 - Error field > > Section 6.2.4 - ID Not Found (IDNF) bit > > > > Error bit 4. The IDNF bit shall be set to one if a user-accessible > > address was not found. The IDNF bit shall be set to one if an > > address outside of the range of user-accessible addresses is > > requested when command aborted is not returned (see 4.11.3 and > > 6.2.1). > > > > Section 4.11 - Host Protected Area (HPA) feature set > > Section 4.11.3 - 28-bit and 48-bit HPA commands > > > > Any read or write command to an address above the maximum address > > specified by the SET MAX ADDRESS or SET MAX ADDRESS EXT command shall > > cause command completion with the IDNF bit set to one and ERR set to > > one, or command aborted. > > > > There's no definition of what "address" means in 6.2.4, but the most > > logical (pun intended) guess is an LBA. This error is returned by the > > disk (e.g. not a controller-induced error). I've mentioned this problem > > in the past: > > > > http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting > > > > I've always read IDNF to mean "OS requested access (read or write) to an > > LBA which is out of bounds", where "out of bounds" means "not between 0 > > and ". How exactly is that possible? Alexander, do you have > > any familiarity with this error code per ATA spec? > > > > Matthew, can you provide output from "atacontrol cap ad0"? Thanks. > > Sure thing. See below. > [root@meshuga /home/matt]# atacontrol cap ad0 > > Protocol SATA revision 2.x > device model WDC WD3200AAKS-00VYA0 > serial number WD-WCARW0164427 > firmware revision 12.01B02 > cylinders 16383 > heads 16 > sectors/track 63 > lba supported 268435455 sectors > lba48 supported 625142448 sectors > dma supported > overlap not supported > > Feature Support Enable Value Vendor > write cache yes yes > read ahead yes yes > Native Command Queuing (NCQ) yes - 31/0x1F > Tagged Command Queuing (TCQ) no no 31/0x1F > SMART yes yes > microcode download yes yes > security yes no > power management yes yes > advanced power management no no 0/0x00 > automatic acoustic management yes no 254/0xFE 128/0x80 > [root@meshuga /home/matt]# > > > > > > Now regarding the LBA tests -- "smartctl -t select,start-end" will do > > the trick. start should be a starting LBA, end should be an ending LBA. > > The OS claims that LBA 395032335 is what was requested to be accessed > > when the failure happened, so I would recommend picking start/end ranges > > around that area. Remember that a single sector encapsulates a very > > large number of blocks (especially given sizes of disks today), so it's > > wise to pick a very large range of LBAs. I would recommend this in your > > case: > > > > smartctl -t select,390000000,410000000 /dev/ad0 > > [root@meshuga /home/matt]# smartctl -t > select,390000000-410000000 /dev/ad0 > smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 7.2-RELEASE-p4 i386] (local > build) > Copyright (C) 2002-10 by Bruce Allen, > http://smartmontools.sourceforge.net > > === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION === > Sending command: "Execute SMART Selective self-test routine immediately > in off-line mode". > SPAN STARTING_LBA ENDING_LBA > 0 390000000 410000000 > Drive command "Execute SMART Selective self-test routine immediately in > off-line mode" successful. > Testing has begun. > You have mail in /var/mail/matt > [root@meshuga /home/matt]# > > ... some time passes polling status with smartctl -a /dev/ad0 ... > > [root@meshuga /home/matt]# smartctl -a /dev/ad0 > smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 7.2-RELEASE-p4 i386] (local > build) > Copyright (C) 2002-10 by Bruce Allen, > http://smartmontools.sourceforge.net > > === START OF INFORMATION SECTION === > Model Family: Western Digital Caviar Blue Serial ATA family > Device Model: WDC WD3200AAKS-00VYA0 > Serial Number: WD-WCARW0164427 > Firmware Version: 12.01B02 > User Capacity: 320,072,933,376 bytes > Device is: In smartctl database [for details use: -P show] > ATA Version is: 8 > ATA Standard is: Exact ATA specification draft version not indicated > Local Time is: Fri Jun 18 20:20:17 2010 BST > SMART support is: Available - device has SMART capability. > SMART support is: Enabled > > === START OF READ SMART DATA SECTION === > SMART overall-health self-assessment test result: PASSED > > General SMART Values: > Offline data collection status: (0x84) Offline data collection activity > was suspended by an interrupting command from host. > Auto Offline Data Collection: Enabled. > Self-test execution status: ( 0) The previous self-test routine > completed > without error or no self-test has ever > been run. > Total time to complete Offline > data collection: (8400) seconds. > Offline data collection > capabilities: (0x7b) SMART execute Offline immediate. > Auto Offline data collection on/off support. > Suspend Offline collection upon new > command. > Offline surface scan supported. > Self-test supported. > Conveyance Self-test supported. > Selective Self-test supported. > SMART capabilities: (0x0003) Saves SMART data before entering > power-saving mode. > Supports SMART auto save timer. > Error logging capability: (0x01) Error logging supported. > General Purpose Logging supported. > Short self-test routine > recommended polling time: ( 2) minutes. > Extended self-test routine > recommended polling time: ( 100) minutes. > Conveyance self-test routine > recommended polling time: ( 5) minutes. > SCT capabilities: (0x303f) SCT Status supported. > SCT Feature Control supported. > SCT Data Table supported. > > SMART Attributes Data Structure revision number: 16 > Vendor Specific SMART Attributes with Thresholds: > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE > UPDATED WHEN_FAILED RAW_VALUE > 1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always > - 0 > 3 Spin_Up_Time 0x0003 218 150 021 Pre-fail Always > - 2100 > 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always > - 118 > 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always > - 0 > 7 Seek_Error_Rate 0x000e 200 200 051 Old_age Always > - 0 > 9 Power_On_Hours 0x0032 088 088 000 Old_age Always > - 9320 > 10 Spin_Retry_Count 0x0012 100 100 051 Old_age Always > - 0 > 11 Calibration_Retry_Count 0x0012 100 100 051 Old_age Always > - 0 > 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always > - 116 > 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always > - 115 > 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always > - 118 > 194 Temperature_Celsius 0x0022 108 103 000 Old_age Always > - 39 > 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always > - 0 > 197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always > - 0 > 198 Offline_Uncorrectable 0x0010 200 200 000 Old_age > Offline - 0 > 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always > - 0 > 200 Multi_Zone_Error_Rate 0x0008 200 200 051 Old_age > Offline - 0 > > SMART Error Log Version: 1 > No Errors Logged > > SMART Self-test log structure revision number 1 > Num Test_Description Status Remaining > LifeTime(hours) LBA_of_first_error > # 1 Selective offline Completed without error 00% 9319 > - > # 2 Extended offline Completed without error 00% 9299 > - > # 3 Short offline Completed without error 00% 9298 > - > > SMART Selective self-test log data structure revision number 1 > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS > 1 390000000 410000000 Not_testing > 2 0 0 Not_testing > 3 0 0 Not_testing > 4 0 0 Not_testing > 5 0 0 Not_testing > Selective self-test flags (0x0): > After scanning selected spans, do NOT read-scan remainder of disk. > If Selective self-test is pending on power-up, resume after 0 minute > delay. > > > > I would highly recommend doing this with the disk not doing any I/O, > > though it won't hurt it (it'll just delay the scan). "smartctl -a" will > > show the state of things in the "SMART Selective self-test log" at the > > bottom, or somewhere else within the output (depends on the drive). > > > > This should, in my opinion, rule out whether or not there's a bad block > > or something along those lines within said range. Given what I believe > > IDNF represents, I would say your scan will probably come back clean. > > Also remember that the scan performed here is a *disk-level scan*; the > > disk firmware itself is doing it (the OS isn't involved). This helps > > rule out any sort of "weird" issues that the OS may be reporting ("hey > > man, LBA 8943943983492893428932489324 is bad!" "Yeah sure it is"). > > > > > The two devices in the array are on channels 0 and 1. There is indeed a > > > second drive on channel 0 (160G). As I said above, I use that as an > > > additional back up device but it's not part of the array. > > > > Okay, so executing "atacontrol detach ata0" will cause you to lose both > > ad0 and ad1. If you can live with that, then cool. > > > > > > What motherboard is this? Can you change the setting to either > > > > "Native", "Enhanced", or (even better) "AHCI"? I've seen some systems > > > > where the Serial ATA option in the BIOS has an "Auto" option, which does > > > > totally bizarre things at times. > > > > > > I think this has been covered in subsequent postings. I could try it but > > > as you say below, I'd like to resolve the disk issue first. > > > ... > > > > The atacontrol man page covers your situation: > > > > ... > > > I don't think this is the case for me since ad0 and ad2 are on seperate > > > ata channels. > > > ... > > > Indeed but my hw doesn't have hot-swap capability (at the moment!). > > > > That's the problem -- we're not sure if this really is a disk issue. > > It's been reported before, others have reported solving it by increasing > > ATA timeout values, etc... But the fact of the matter is, that error > > code is being returned by the device. > > Indeed. if I look back through my logs the error has happened quite a > few times over the course of several weeks before atacontrol took action > and offlined the drive: > > [root@meshuga /home/matt]# zcat /var/log/messages.0.bz2 | grep ad0 > Mar 14 21:17:33 meshuga kernel: ad0: 305245MB 12.01B02> at ata0-master SATA300 > Mar 14 21:17:33 meshuga kernel: GEOM_LABEL: Label for provider ad0s1a is > ufsid/47284adaeab4f771. > Mar 14 21:17:33 meshuga kernel: GEOM_LABEL: Label for provider ad0s1d is > ufsid/47284ae5159318bb. > Mar 14 21:17:33 meshuga kernel: GEOM_LABEL: Label for provider ad0s1e is > ufsid/47284ada8adb1df9. > Mar 14 21:17:33 meshuga kernel: GEOM_LABEL: Label for provider ad0s1f is > ufsid/47284adaa202995d. > Mar 14 21:17:33 meshuga kernel: ar0: disk0 READY (master) using ad0 at > ata0-master > Mar 28 17:59:59 meshuga kernel: ad0: 305245MB 12.01B02> at ata0-master SATA300 > Mar 28 17:59:59 meshuga kernel: GEOM_LABEL: Label for provider ad0s1a is > ufsid/47284adaeab4f771. > Mar 28 17:59:59 meshuga kernel: GEOM_LABEL: Label for provider ad0s1d is > ufsid/47284ae5159318bb. > Mar 28 17:59:59 meshuga kernel: GEOM_LABEL: Label for provider ad0s1e is > ufsid/47284ada8adb1df9. > Mar 28 17:59:59 meshuga kernel: GEOM_LABEL: Label for provider ad0s1f is > ufsid/47284adaa202995d. > Mar 28 17:59:59 meshuga kernel: ar0: disk0 READY (master) using ad0 at > ata0-master > May 1 03:03:23 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=20472527 > May 30 03:01:54 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=24996179 > Jun 1 03:02:47 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=27745599 > Jun 1 03:03:35 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=29127787 > Jun 1 03:03:41 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=29131871 > Jun 1 05:00:10 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=21646831 > Jun 2 03:02:02 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=25019047 > Jun 2 03:02:09 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=24262079 > Jun 2 03:02:25 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=26503067 > Jun 2 03:02:45 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=24991895 > Jun 2 03:03:22 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=26526099 > Jun 2 03:03:34 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=21229415 > Jun 2 03:03:46 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=20498839 > Jun 2 03:04:07 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=29996295 > Jun 2 03:04:14 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=27617575 > Jun 2 03:04:40 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=29939883 > Jun 2 03:04:48 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=22747195 > Jun 7 05:04:22 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=21220491 > [r > > And most recently: > > [root@meshuga /home/matt]# cat /var/log/messages | grep ad0 > Jun 13 03:01:09 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=28825859 > Jun 13 03:01:17 meshuga kernel: ad0: TIMEOUT - READ_DMA retrying (1 > retry left) LBA=30662831 > Jun 17 05:00:58 meshuga kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (1 > retry left) LBA=395032335 > Jun 17 05:00:58 meshuga kernel: ad0: FAILURE - WRITE_DMA48 > status=51 error=10 LBA=395032335 > Jun 17 05:02:20 meshuga kernel: ad0: WARNING - WRITE_DMA taskqueue > timeout - completing request directly > Jun 17 05:02:20 meshuga kernel: ad0: WARNING - WRITE_DMA48 freeing > taskqueue zombie request > > On first glance there was a single failure which has progressively got > worse... This did make me suspect there was indeed a problem with the > drive. > > > Speaking generally about disk replacements on your system -- when I say > > generally, I do mean generally and *not* in regards to the specific > > situation reported: > > > > Since there's no AHCI in use, we should just assume that a power-down of > > the system is the safest way to go about a disk replacement. Follow > > that procedure in the future and you should be fine. If you ever get a > > hot-swap backplane, you absolutely should use AHCI; hot-swap, especially > > on an Intel controller (FreeBSD is tested pretty thoroughly on Intel > > ICHxx and ESBx controllers), will work fine in that case. > > > > If you do go the AHCI route, and eventually upgrade to RELENG_8 down the > > line, I highly recommend you load kernel module ahci.ko (instead of the > > default/historic ataahci.ko). This will get you NCQ support amongst > > other things. > > > > I've actually got 8.0-RELEASE running on a similar box (no hot-swap back > plane) with AHCI. I think it's a newer generation of HP Proliant. That > runs RAID1 too. I had been contemplating upgrading motherboards to allow > hot-swap. I think I might have more than just a think about it now :-)