Date:      Sun, 10 Aug 2008 18:01:34 -0500 (CDT)
From:      Larry Rosenman <ler@lerctr.org>
To:        freebsd-stable@FreeBSD.org
Subject:   ICRC's
Message-ID:  <20080810175934.X2427@borg>

I'm getting the following on a zpool scrub:

ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=54817587
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=187521229
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=187522189
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=109095258
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=101327859
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=172911744
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=65393370
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=64741875
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=262496999
ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=154593293
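Every warning above has the same shape: device, ATA command, and the LBA being retried. The minimal Python sketch below pulls those fields out of the kernel messages so they can be counted per device or compared with the LBAs in the SMART error log further down. The regular expression is an assumption inferred only from the lines quoted here. Other drivers or FreeBSD versions may word the message differently.

```python
import re
from collections import Counter

# Pattern inferred from the ad8 warnings quoted above (assumption: other
# drivers or FreeBSD releases may format this message differently).
ICRC_RE = re.compile(
    r"^(?P<dev>ad\d+): WARNING - (?P<cmd>\w+) UDMA ICRC error "
    r"\(retrying request\) LBA=(?P<lba>\d+)$"
)

def parse_icrc_warnings(lines):
    """Return (device, command, lba) tuples for every matching line."""
    hits = []
    for line in lines:
        m = ICRC_RE.match(line.strip())
        if m:
            hits.append((m["dev"], m["cmd"], int(m["lba"])))
    return hits

if __name__ == "__main__":
    sample = [
        "ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=54817587",
        "ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=154593293",
        "ad6: some unrelated message",
    ]
    hits = parse_icrc_warnings(sample)
    print(Counter(dev for dev, _, _ in hits))  # Counter({'ad8': 2})
    print(sorted(lba for _, _, lba in hits))   # [54817587, 154593293]
```

Lines that do not match the pattern are skipped rather than raising, so the function can be fed a whole `dmesg` or `/var/log/messages` dump.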


   pool: vault
  state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
 	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
 	using 'zpool clear' or replace the device with 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: scrub completed with 0 errors on Sun Aug 10 16:20:30 2008
config:

 	NAME        STATE     READ WRITE CKSUM
 	vault       ONLINE       0     0     0
 	  raidz1    ONLINE       0     0     0
 	    ad6     ONLINE       0     0     0
 	    ad8     ONLINE       0     0    17
 	    ad10    ONLINE       0     0     0
 	    ad12    ONLINE       0     0     0
 	    ad14    ONLINE       0     0     0
 	  ad4s1f    ONLINE       0     0     0
 	  ad4s1e    ONLINE       0     0     0
 	  ad4s1d    ONLINE       0     0     0

errors: No known data errors


I replaced the drive at ad8 because the original one would get an ICRC error and then hang the bus.

SMART info:

smartctl version 5.38 [amd64-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630AS
Serial Number:    9QG19C2Q
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Aug 10 18:01:07 2008 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
 					was completed without error.
 					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
 					without error or no self-test has ever
 					been run.
Total time to complete Offline 
data collection: 		 ( 430) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
 					Auto Offline data collection on/off support.
 					Suspend Offline collection upon new
 					command.
 					Offline surface scan supported.
 					Self-test supported.
 					No Conveyance Self-test supported.
 					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
 					power-saving mode.
 					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
 					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate     0x000f   105   100   006    Pre-fail  Always       -       9366477
   3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
   7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2364626
   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       41
  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       7
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   061   045    Old_age   Always       -       36 (Lifetime Min/Max 35/39)
194 Temperature_Celsius     0x0022   036   040   000    Old_age   Always       -       36 (0 32 0 0)
195 Hardware_ECC_Recovered  0x001a   068   064   000    Old_age   Always       -       207627383
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       94
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
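Attribute 199 (UDMA_CRC_Error_Count, raw value 94 here) is the counter that tracks interface CRC errors, so it is the one to watch alongside the kernel ICRC warnings. The sketch below extracts raw values from a table laid out like the one above. It assumes the ten-column layout printed by this smartctl 5.38 output; other smartctl versions may arrange the columns differently.

```python
def smart_raw_values(table_text):
    """Map attribute ID -> (name, raw value string) from an attribute table."""
    attrs = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # the "ID# ATTRIBUTE_NAME ..." header fails isdigit() and is skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            name = fields[1]
            # RAW_VALUE is everything from the 10th column on, because some
            # rows append text such as "(Lifetime Min/Max 35/39)".
            raw = " ".join(fields[9:])
            attrs[attr_id] = (name, raw)
    return attrs

TABLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   061   045    Old_age   Always       -       36 (Lifetime Min/Max 35/39)
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       94
"""

if __name__ == "__main__":
    attrs = smart_raw_values(TABLE)
    print(attrs[199])  # ('UDMA_CRC_Error_Count', '94')
```

Keeping the raw value as a string avoids guessing how each vendor encodes it; convert with `int()` only for attributes known to be plain counters.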

SMART Error Log Version: 1
ATA Error Count: 110 (device log contains only the most recent five errors)
 	CR = Command Register [HEX]
 	FR = Features Register [HEX]
 	SC = Sector Count Register [HEX]
 	SN = Sector Number Register [HEX]
 	CL = Cylinder Low Register [HEX]
 	CH = Cylinder High Register [HEX]
 	DH = Device/Head Register [HEX]
 	DC = Device Command Register [HEX]
 	ER = Error register [HEX]
 	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 110 occurred at disk power-on lifetime: 41 hours (1 days + 17 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   84 51 0f fe e7 36 49  Error: ICRC, ABRT 15 sectors at LBA = 0x0936e7fe = 154593278

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 00 0d e7 36 49 00      01:23:46.872  READ DMA
   c8 00 00 0d e6 36 49 00      01:23:46.871  READ DMA
   c8 00 00 0d e5 36 49 00      01:23:46.871  READ DMA
   c8 00 00 0d e4 36 49 00      01:23:46.870  READ DMA
   c8 00 00 0d e3 36 49 00      01:23:46.853  READ DMA

Error 109 occurred at disk power-on lifetime: 40 hours (1 days + 16 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   84 51 5f 88 62 a5 4f  Error: ICRC, ABRT 95 sectors at LBA = 0x0fa56288 = 262496904

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 00 e7 61 a5 4f 00      01:11:12.732  READ DMA
   c8 00 00 e7 60 a5 4f 00      01:11:12.730  READ DMA
   c8 00 00 e7 5f a5 4f 00      01:11:12.729  READ DMA
   c8 00 00 e7 5e a5 4f 00      01:11:12.727  READ DMA
   c8 00 00 e7 5d a5 4f 00      01:11:12.724  READ DMA

Error 108 occurred at disk power-on lifetime: 40 hours (1 days + 16 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   84 51 1f d4 e1 db 43  Error: ICRC, ABRT 31 sectors at LBA = 0x03dbe1d4 = 64741844

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 40 b3 e1 db 43 00      01:10:40.553  READ DMA
   c8 00 40 73 e1 db 43 00      01:10:40.552  READ DMA
   c8 00 40 33 e1 db 43 00      01:10:40.487  READ DMA
   c8 00 00 33 e0 db 43 00      01:10:40.485  READ DMA
   c8 00 00 33 df db 43 00      01:10:40.484  READ DMA

Error 107 occurred at disk power-on lifetime: 40 hours (1 days + 16 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   84 51 3f 8e d2 e5 43  Error: ICRC, ABRT 63 sectors at LBA = 0x03e5d28e = 65393294

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 00 cd d1 e5 43 00      00:52:56.221  READ DMA
   c8 00 40 5a d1 e5 43 00      00:52:56.218  READ DMA
   c8 00 00 5a d0 e5 43 00      00:52:56.217  READ DMA
   c8 00 00 5a cf e5 43 00      00:52:56.216  READ DMA
   c8 00 c0 67 ce e5 43 00      00:52:56.215  READ DMA

Error 106 occurred at disk power-on lifetime: 40 hours (1 days + 16 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   84 51 2f 51 6c 4e 4a  Error: ICRC, ABRT 47 sectors at LBA = 0x0a4e6c51 = 172911697

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 80 00 6c 4e 4a 00      00:40:47.156  READ DMA
   c8 00 80 80 6b 4e 4a 00      00:40:47.156  READ DMA
   c8 00 80 00 6b 4e 4a 00      00:40:47.155  READ DMA
   c8 00 80 80 6a 4e 4a 00      00:40:47.155  READ DMA
   c8 00 80 00 6a 4e 4a 00      00:40:47.155  READ DMA
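The summary smartctl prints for each error (for example "Error: ICRC, ABRT 15 sectors at LBA = 0x0936e7fe = 154593278") is derived from the raw register columns. In 28-bit LBA mode the address is spread across SN (bits 0-7), CL (bits 8-15), CH (bits 16-23) and the low nibble of DH (bits 24-27), and bit 6 of DH marks LBA addressing. The sketch below reproduces that decode. It labels only a few well-known Error register bits (0x80 ICRC, 0x40 UNC, 0x10 IDNF, 0x04 ABRT) and is not a full ATA decoder.

```python
# Subset of ATA Error register bits; other bits are deliberately not named.
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error during an Ultra DMA transfer
    0x40: "UNC",   # uncorrectable data error
    0x10: "IDNF",  # requested address not found
    0x04: "ABRT",  # command aborted
}

def decode_error_regs(er, sc, sn, cl, ch, dh):
    """Return (error flags, sector count, 28-bit LBA) for one error record."""
    if not dh & 0x40:
        raise ValueError("DH bit 6 clear: record uses CHS, not LBA")
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    flags = [name for bit, name in ERROR_BITS.items() if er & bit]
    return flags, sc, lba

def decode_row(row):
    """Parse the 'ER ST SC SN CL CH DH' hex columns of one smartctl row."""
    er, _st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in row.split()[:7])
    return decode_error_regs(er, sc, sn, cl, ch, dh)

if __name__ == "__main__":
    print(decode_row("84 51 0f fe e7 36 49"))  # (['ICRC', 'ABRT'], 15, 154593278)
```

Every record in this log decodes to Error register 0x84, i.e. ICRC plus ABRT, with no UNC bit set.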

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        32         -
# 2  Short offline       Completed without error       00%        10         -

SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Ideas?

This is on a SuperMicro SYS-7045-TR+.

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683                 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893


