From: Derek Kulinski
Date: Sat, 22 Dec 2012 01:01:10 -0800
To: Alex Povolotsky
Cc: freebsd-stable@freebsd.org, freebsd-hardware@freebsd.org
Subject: Re: Strange problem with... ZFS? Disk? Controller?
Message-ID: <1664598999.20121222010110@cs.ucla.edu>
In-Reply-To: <50D56D4B.4060709@webmail.sub.ru>

Hello Alex,

SMART values are collected by the disk itself (smartmontools only reads them). This would imply that the problem is between the disk and the controller.
Since you have tons of Hardware_ECC_Recovered and no UDMA_CRC_Error_Count errors, I would think that the problem is with the disk itself. I think the long waits are due to the disk trying to re-read a given sector multiple times.

Your drive is 2TB, and according to this thread, the bigger the drive, the more likely you are to run into problems like these:
http://forums.storagereview.com/index.php/topic/27994-smart-hardware-ecc-recovered-values/

I don't know how serious it is, but if you keep anything important there I would recommend a backup. You should also try the SMART self-tests.

Best regards,
 Derek

Saturday, December 22, 2012, 12:20:27 AM, you wrote:

> Hello,

> I'm running FreeBSD 9.0/amd64, pure ZFS setup, one Seagate disk
> ST2000NM0011 SN02 on LSI Logic (mpt) controller.

> Yes, I know that running one disk on RAID controller is a bit weird, I
> have to find yet if it is possible to connect disk to internal SATA
> controller.

> About two days ago, system became SLOW. Disk usage is constantly 100%,
> and sometimes I'm getting swap_pager: indefinite wait buffer error. I
> had to reset computer twice in two days.
> mptutil does not show any errors, and smartctl shows

> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   067   063   044    Pre-fail  Always       -       6218970
>   3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       14
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       21
>   7 Seek_Error_Rate         0x000f   091   060   030    Pre-fail  Always       -       1433294073
>   9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       8825
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       16
> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       12885098499
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   068   047   045    Old_age   Always       -       32 (Min/Max 31/32)
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       859
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       15
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       26
> 194 Temperature_Celsius     0x0022   032   053   000    Old_age   Always       -       32 (0 21 0 0 0)
> 195 Hardware_ECC_Recovered  0x001a   103   099   000    Old_age   Always       -       6218970
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

> SMART Error Log Version: 1
> No Errors Logged

> I have removed most of snapshots, it does not help.

> I have stopped all active processes, disk load did not decrease, same 100%.

> What can I check and/or replace to get the problem fixed? Any ideas?
> Alex

-- 
Best regards,
 Derek                            mailto:kulinski@cs.ucla.edu

If you choke a Smurf, what color does it turn?
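
P.S. The rule of thumb used in this reply (recovered ECC errors with a zero UDMA CRC count point at the disk rather than the cabling or controller) can be written down as a small script. This is only a rough sketch, not a diagnostic standard: the helper names parse_raw_values and guess_fault are made up for illustration, the sample rows are copied from the smartctl output quoted above, and the parser assumes the usual ten-column attribute table layout.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value. Only the leading integer of the raw value is kept.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_raw_values(text):
    """Map attribute name -> integer raw value for every row that matches."""
    raw = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            raw[m.group(2)] = int(m.group(3))
    return raw


def guess_fault(raw):
    """Hypothetical helper encoding the heuristic from this message only."""
    crc = raw.get("UDMA_CRC_Error_Count", 0)
    ecc = raw.get("Hardware_ECC_Recovered", 0)
    if crc > 0:
        return "link"      # CRC errors happen on the disk <-> controller path
    if ecc > 0:
        return "disk"      # errors corrected inside the drive itself
    return "unknown"


# Three rows taken from the quoted smartctl output.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       21
195 Hardware_ECC_Recovered  0x001a   103   099   000    Old_age   Always       -       6218970
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(guess_fault(parse_raw_values(SAMPLE)))  # prints "disk"
```

Raw values are vendor specific, so a result from a script like this is a hint about where to look first and nothing more.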