From: dweimer
Reply-To: dweimer@dweimer.net
Delivered-To: freebsd-questions@freebsd.org
Date: Tue, 24 Jul 2012 14:30:03 -0500
Subject: Re: Disk Errors
In-Reply-To: <20120724180421.GF38393@dan.emsphone.com>

On 2012-07-24 13:04, Dan Nelson wrote:
> In the last episode (Jul 24), dweimer said:
>> I have three 1TB disks I use for backup, two of them are Western Digital
>> drives I bought specifically for this purpose.  One is a Seagate drive
>> that came out of a barebones PC that I replaced with a couple smaller
>> drives in a stripe to gain performance.  I use the drives in an external
>> SATA dock, using geom eli encryption, the western digital drives give me
>> no problems, but the seagate drive gives me a lot of the following errors
>> under load.
>>
>> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=817755328
>> ad4: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=837397120
>> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=879786112
>> ad4: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=882931200
>> ad4: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=890542016
>> ad4: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=902767296
>> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=904071296
>
> If you install the sysutils/smartmontools port, you can run "smartctl -x
> /dev/ad4" to dump the drive's SMART attribute table and error logs.  Those
> should give you an indication of whether the drive is going bad.  If the
> drive is logging those write errors in its internal log, then you know it's
> not a cabling issue.  If it's not logging errors, I suppose you might have a
> loose SATA plug on the drive itself, which would explain why the problem
> follows the drive around.
>

I'm running a long test on the drive now.  Nothing in the output sticks out
at me as failing so far.

smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.0-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST31000528AS
Serial Number:    5VP7ST1C
LU WWN Device Id: 5 000c50 02f7a3bb4
Firmware Version: CC46
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Jul 24 14:29:08 2012 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     208 (intermediate), recommended: 208
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 248) Self-test routine in progress...
                                        80% of test remaining.
Total time to complete Offline
data collection:                 ( 600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 173) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   117   099   006    -    145191418
  3 Spin_Up_Time            PO----   095   095   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    114
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   078   060   030    -    77590473
  9 Power_On_Hours          -O--CK   090   090   000    -    9156
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    46
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   098   000    -    21475164202
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   062   052   045    -    38 (Min/Max 35/38)
194 Temperature_Celsius     -O---K   038   048   000    -    38 (0 23 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   025   023   000    -    145191418
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    833
240 Head_Flying_Hours       ------   100   253   000    -    96417720837162
241 Total_LBAs_Written      ------   100   253   000    -    1480696469
242 Total_LBAs_Read         ------   100   253   000    -    922627427
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not supported
Read GP Log Directory failed.
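
For anyone who wants to check the attribute table above against the
cabling-versus-drive question quoted earlier, here is a minimal Python sketch.
It pulls the raw values out of the table rows and separates interface-level
counters from media-level ones.  The grouping into LINK_ATTRS and MEDIA_ATTRS,
the function names, and the trimmed SAMPLE text are my own assumptions for
illustration; smartctl does not classify attributes this way, and a nonzero
counter is a hint rather than a diagnosis.

```python
import re

# Assumed grouping, for illustration only (not something smartctl reports):
# attribute 199 counts CRC errors on the SATA link; the others track bad media.
LINK_ATTRS = {"UDMA_CRC_Error_Count"}
MEDIA_ATTRS = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
               "Offline_Uncorrectable", "Reported_Uncorrect"}

# One attribute row: ID, name, six flag characters, VALUE, WORST, THRESH,
# FAIL, then the leading integer of RAW_VALUE (trailing notes are ignored).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S{6})\s+(\d{3})\s+(\d{3})\s+(\d{3})\s+(\S+)\s+(\d+)"
)

def parse_attributes(text):
    """Return {attribute_name: raw_value} for every attribute row in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = int(m.group(8))
    return attrs

def summarize(attrs):
    """Split the nonzero counters into link-level and media-level groups."""
    link = {k: v for k, v in attrs.items() if k in LINK_ATTRS and v > 0}
    media = {k: v for k, v in attrs.items() if k in MEDIA_ATTRS and v > 0}
    return link, media

# Rows copied from the table above, trimmed to the attributes of interest.
SAMPLE = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    833
"""

link, media = summarize(parse_attributes(SAMPLE))
print("link-level counters above zero: ", link)
print("media-level counters above zero:", media)
```

With these rows, only the link-level group is non-empty, which fits the
loose-plug or dock possibility raised in the quoted reply better than failing
platters, though it does not rule anything out on its own.
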

SMART Log Directory Version 1 [multi-sector log support]
SMART Log at address 0x00 has    1 sectors [Log Directory]
SMART Log at address 0x01 has    1 sectors [Summary SMART error log]
SMART Log at address 0x02 has    5 sectors [Comprehensive SMART error log]
SMART Log at address 0x06 has    1 sectors [SMART self-test log]
SMART Log at address 0x09 has    1 sectors [Selective self-test log]
SMART Log at address 0x80 has   16 sectors [Host vendor specific log]
SMART Log at address 0x81 has   16 sectors [Host vendor specific log]
SMART Log at address 0x82 has   16 sectors [Host vendor specific log]
SMART Log at address 0x83 has   16 sectors [Host vendor specific log]
SMART Log at address 0x84 has   16 sectors [Host vendor specific log]
SMART Log at address 0x85 has   16 sectors [Host vendor specific log]
SMART Log at address 0x86 has   16 sectors [Host vendor specific log]
SMART Log at address 0x87 has   16 sectors [Host vendor specific log]
SMART Log at address 0x88 has   16 sectors [Host vendor specific log]
SMART Log at address 0x89 has   16 sectors [Host vendor specific log]
SMART Log at address 0x8a has   16 sectors [Host vendor specific log]
SMART Log at address 0x8b has   16 sectors [Host vendor specific log]
SMART Log at address 0x8c has   16 sectors [Host vendor specific log]
SMART Log at address 0x8d has   16 sectors [Host vendor specific log]
SMART Log at address 0x8e has   16 sectors [Host vendor specific log]
SMART Log at address 0x8f has   16 sectors [Host vendor specific log]
SMART Log at address 0x90 has   16 sectors [Host vendor specific log]
SMART Log at address 0x91 has   16 sectors [Host vendor specific log]
SMART Log at address 0x92 has   16 sectors [Host vendor specific log]
SMART Log at address 0x93 has   16 sectors [Host vendor specific log]
SMART Log at address 0x94 has   16 sectors [Host vendor specific log]
SMART Log at address 0x95 has   16 sectors [Host vendor specific log]
SMART Log at address 0x96 has   16 sectors [Host vendor specific log]
SMART Log at address 0x97 has   16 sectors [Host vendor specific log]
SMART Log at address 0x98 has   16 sectors [Host vendor specific log]
SMART Log at address 0x99 has   16 sectors [Host vendor specific log]
SMART Log at address 0x9a has   16 sectors [Host vendor specific log]
SMART Log at address 0x9b has   16 sectors [Host vendor specific log]
SMART Log at address 0x9c has   16 sectors [Host vendor specific log]
SMART Log at address 0x9d has   16 sectors [Host vendor specific log]
SMART Log at address 0x9e has   16 sectors [Host vendor specific log]
SMART Log at address 0x9f has   16 sectors [Host vendor specific log]
SMART Log at address 0xa1 has   20 sectors [Device vendor specific log]
SMART Log at address 0xa8 has  129 sectors [Device vendor specific log]
SMART Log at address 0xa9 has    1 sectors [Device vendor specific log]
SMART Log at address 0xc0 has    1 sectors [Device vendor specific log]
SMART Log at address 0xe0 has    1 sectors [SCT Command/Status]
SMART Log at address 0xe1 has    1 sectors [SCT Data Transfer]

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log Version: 1
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress  80%        9156             -
# 2  Short offline       Completed without error        00%        9156             -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                 38 Celsius
Power Cycle Min/Max Temperature:     35/38 Celsius
Lifetime    Min/Max Temperature:     23/48 Celsius
Under/Over Temperature Limit Count:  0/0
SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        59 minutes
Min/Max recommended Temperature:     14/55 Celsius
Min/Max Temperature Limit:           10/60 Celsius
Temperature History Size (Index):    128 (53)

Index    Estimated Time   Temperature Celsius
  54    2012-07-19 09:24    35  ****************
 ...    ..(  3 skipped).    ..  ****************
  58    2012-07-19 13:20    35  ****************
  59    2012-07-19 14:19    34  ***************
 ...    ..(  3 skipped).    ..  ***************
  63    2012-07-19 18:15    34  ***************
  64    2012-07-19 19:14    35  ****************
  65    2012-07-19 20:13    35  ****************
  66    2012-07-19 21:12    35  ****************
  67    2012-07-19 22:11    36  *****************
  68    2012-07-19 23:10    36  *****************
  69    2012-07-20 00:09    35  ****************
 ...    ..( 11 skipped).    ..  ****************
  81    2012-07-20 11:57    35  ****************
  82    2012-07-20 12:56    34  ***************
 ...    ..(  5 skipped).    ..  ***************
  88    2012-07-20 18:50    34  ***************
  89    2012-07-20 19:49    35  ****************
  90    2012-07-20 20:48    35  ****************
  91    2012-07-20 21:47    36  *****************
  92    2012-07-20 22:46    37  ******************
  93    2012-07-20 23:45    36  *****************
  94    2012-07-21 00:44    36  *****************
  95    2012-07-21 01:43    35  ****************
  96    2012-07-21 02:42    35  ****************
  97    2012-07-21 03:41    35  ****************
  98    2012-07-21 04:40    36  *****************
  99    2012-07-21 05:39    36  *****************
 100    2012-07-21 06:38    36  *****************
 101    2012-07-21 07:37    35  ****************
 ...    ..(  6 skipped).    ..  ****************
 108    2012-07-21 14:30    35  ****************
 109    2012-07-21 15:29    34  ***************
 110    2012-07-21 16:28    35  ****************
 ...    ..(  6 skipped).    ..  ****************
 117    2012-07-21 23:21    35  ****************
 118    2012-07-22 00:20    34  ***************
 119    2012-07-22 01:19    34  ***************
 120    2012-07-22 02:18    34  ***************
 121    2012-07-22 03:17    35  ****************
 ...    ..( 14 skipped).    ..  ****************
   8    2012-07-22 18:02    35  ****************
   9    2012-07-22 19:01     ?  -
  10    2012-07-22 20:00    35  ****************
  11    2012-07-22 20:59    35  ****************
  12    2012-07-22 21:58    38  *******************
  13    2012-07-22 22:57    38  *******************
  14    2012-07-22 23:56    38  *******************
  15    2012-07-23 00:55    39  ********************
  16    2012-07-23 01:54    38  *******************
  17    2012-07-23 02:53    38  *******************
  18    2012-07-23 03:52    39  ********************
  19    2012-07-23 04:51    39  ********************
  20    2012-07-23 05:50    38  *******************
 ...    ..( 11 skipped).    ..  *******************
  32    2012-07-23 17:38    38  *******************
  33    2012-07-23 18:37    37  ******************
 ...    ..(  3 skipped).    ..  ******************
  37    2012-07-23 22:33    37  ******************
  38    2012-07-23 23:32    38  *******************
  39    2012-07-24 00:31     ?  -
  40    2012-07-24 01:30    25  ******
  41    2012-07-24 02:29    25  ******
  42    2012-07-24 03:28    36  *****************
  43    2012-07-24 04:27    36  *****************
  44    2012-07-24 05:26     ?  -
  45    2012-07-24 06:25    36  *****************
  46    2012-07-24 07:24    36  *****************
  47    2012-07-24 08:23    35  ****************
  48    2012-07-24 09:22    36  *****************
  49    2012-07-24 10:21    35  ****************
  50    2012-07-24 11:20    36  *****************
  51    2012-07-24 12:19    36  *****************
  52    2012-07-24 13:18    35  ****************
  53    2012-07-24 14:17    38  *******************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

SATA Phy Event Counters (GP Log 0x11) not supported

--
Thanks,
   Dean E. Weimer
   http://www.dweimer.net/