From: Larry Rosenman
Date: Sun, 10 Aug 2008 18:01:34 -0500 (CDT)
To: freebsd-stable@FreeBSD.org
Subject: ICRC's
Message-ID: <20080810175934.X2427@borg>

I'm getting the following on a zpool scrub:

ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=54817587
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=187521229
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=187522189
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=109095258
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=101327859
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=172911744
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=65393370
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=64741875
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=262496999
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=154593293

  pool: vault
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Sun Aug 10 16:20:30 2008
config:

        NAME        STATE     READ WRITE CKSUM
        vault       ONLINE       0     0     0
        raidz1      ONLINE       0     0     0
        ad6         ONLINE       0     0     0
        ad8         ONLINE       0     0    17
        ad10        ONLINE       0     0     0
        ad12        ONLINE       0     0     0
        ad14        ONLINE       0     0     0
        ad4s1f      ONLINE       0     0     0
        ad4s1e      ONLINE       0     0     0
        ad4s1d      ONLINE       0     0     0

errors: No known data errors

I replaced the drive at ad8 because the original one would get an ICRC
and then hang the bus.

Smart info:

smartctl version 5.38 [amd64-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630AS
Serial Number:    9QG19C2Q
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Aug 10 18:01:07 2008 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   100   006    Pre-fail  Always       -       9366477
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2364626
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       41
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       7
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   061   045    Old_age   Always       -       36 (Lifetime Min/Max 35/39)
194 Temperature_Celsius     0x0022   036   040   000    Old_age   Always       -       36 (0 32 0 0)
195 Hardware_ECC_Recovered  0x001a   068   064   000    Old_age   Always       -       207627383
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       94
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 110 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 110 occurred at disk power-on lifetime: 41 hours (1 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 0f fe e7 36 49  Error: ICRC, ABRT 15 sectors at LBA = 0x0936e7fe = 154593278

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 0d e7 36 49 00      01:23:46.872  READ DMA
  c8 00 00 0d e6 36 49 00      01:23:46.871  READ DMA
  c8 00 00 0d e5 36 49 00      01:23:46.871  READ DMA
  c8 00 00 0d e4 36 49 00      01:23:46.870  READ DMA
  c8 00 00 0d e3 36 49 00      01:23:46.853  READ DMA

Error 109 occurred at disk power-on lifetime: 40 hours (1 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 5f 88 62 a5 4f  Error: ICRC, ABRT 95 sectors at LBA = 0x0fa56288 = 262496904

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 e7 61 a5 4f 00      01:11:12.732  READ DMA
  c8 00 00 e7 60 a5 4f 00      01:11:12.730  READ DMA
  c8 00 00 e7 5f a5 4f 00      01:11:12.729  READ DMA
  c8 00 00 e7 5e a5 4f 00      01:11:12.727  READ DMA
  c8 00 00 e7 5d a5 4f 00      01:11:12.724  READ DMA

Error 108 occurred at disk power-on lifetime: 40 hours (1 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 1f d4 e1 db 43  Error: ICRC, ABRT 31 sectors at LBA = 0x03dbe1d4 = 64741844

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 b3 e1 db 43 00      01:10:40.553  READ DMA
  c8 00 40 73 e1 db 43 00      01:10:40.552  READ DMA
  c8 00 40 33 e1 db 43 00      01:10:40.487  READ DMA
  c8 00 00 33 e0 db 43 00      01:10:40.485  READ DMA
  c8 00 00 33 df db 43 00      01:10:40.484  READ DMA

Error 107 occurred at disk power-on lifetime: 40 hours (1 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 3f 8e d2 e5 43  Error: ICRC, ABRT 63 sectors at LBA = 0x03e5d28e = 65393294

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 cd d1 e5 43 00      00:52:56.221  READ DMA
  c8 00 40 5a d1 e5 43 00      00:52:56.218  READ DMA
  c8 00 00 5a d0 e5 43 00      00:52:56.217  READ DMA
  c8 00 00 5a cf e5 43 00      00:52:56.216  READ DMA
  c8 00 c0 67 ce e5 43 00      00:52:56.215  READ DMA

Error 106 occurred at disk power-on lifetime: 40 hours (1 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 2f 51 6c 4e 4a  Error: ICRC, ABRT 47 sectors at LBA = 0x0a4e6c51 = 172911697

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 80 00 6c 4e 4a 00      00:40:47.156  READ DMA
  c8 00 80 80 6b 4e 4a 00      00:40:47.156  READ DMA
  c8 00 80 00 6b 4e 4a 00      00:40:47.155  READ DMA
  c8 00 80 80 6a 4e 4a 00      00:40:47.155  READ DMA
  c8 00 80 00 6a 4e 4a 00      00:40:47.155  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        32         -
# 2  Short offline       Completed without error       00%        10         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Ideas?
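
As a cross-check on the error log above, here is a minimal Python sketch of how the LBA printed by smartctl follows from the SN/CL/CH/DH register bytes of each entry. It assumes 28-bit LBA addressing, which is what READ DMA (command c8) uses: the low four bits of DH carry LBA bits 27:24, then CH, CL and SN follow. The function name `lba28_from_registers` is made up for illustration and is not part of smartmontools.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from ATA task-file register bytes.

    Only the low nibble of DH belongs to the address; the high
    nibble holds the LBA-mode and device-select bits.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# (SN, CL, CH, DH) copied from the five error-log entries above.
entries = {
    110: (0xFE, 0xE7, 0x36, 0x49),
    109: (0x88, 0x62, 0xA5, 0x4F),
    108: (0xD4, 0xE1, 0xDB, 0x43),
    107: (0x8E, 0xD2, 0xE5, 0x43),
    106: (0x51, 0x6C, 0x4E, 0x4A),
}

decoded = {num: lba28_from_registers(*regs) for num, regs in entries.items()}

for num, lba in decoded.items():
    print(f"Error {num}: LBA = 0x{lba:08x} = {lba}")
```

The decoded values match the "LBA = ..." figures smartctl prints for each entry, so the log and the register dump agree with each other.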
This is on a SuperMicro SYS-7045-TR+

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683                 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893