From: Ken Moffat
To: CK
Cc: freebsd-questions@freebsd.org
Date: Sat, 28 Mar 2015 06:13:57 +0000
Subject: Re: smartctl
Message-ID: <20150328061357.GA18597@milliways>
In-Reply-To: <0LzskF-1ZWnak3ftL-0150PB@mail.gmx.com>

On Fri, Mar 27, 2015 at 09:05:29PM -0800, CK wrote:
> Regarding the unexpected loss of files
> from the filesystem under various
> loads, is the appended 'smartctl' data sufficient to make the determination
> that the loss of files while the operating system is in use could be due to
> the condition of the drive?
>

Drives fail. Sometimes smartctl reports problems _if_ you run the
tests; other times they fail suddenly. The drive is old (only 40GB), so
although the hours are only 12540 (about 520 days) I suspect it might
have been running "round the clock". Apparently it is a 5400rpm PATA
drive - I used to use a pair of 5400rpm drives for RAID1 on a previous
server, but I think I bought those 6 or more years ago, and even then
they were 320GB. So old age seems a possible answer.

> I didn't think so at first, because:
>
> 1) I would expect a FreeBSD error to the effect of "unable to read/write
> /dev/ada0" or "block checksum does not match block data".
>
> 2) I would expect that all data read/written to from a drive is verified to be
> correct by FreeBSD with checksums, and that it is guaranteed to be correct
> if there are no serious and fatal errors reported by the operating system.

I cannot comment on that (except in VMs I'm a Linux user), but if the
drive's write cache is enabled then technically all bets are off - most
modern drives enable it to improve throughput. Files can also go
missing through filesystem errors, or an unfortunate use of 'rm -rf'.

>
> But I may be wrong in these assumptions. Anybody know for sure? I have never
> seen FreeBSD report any filesystem r/w errors. My past experience has only
> taught me that when a drive begins to make very bad noises, this generally
> accompanies obvious and serious problems; and that a drive fails when the
> mechanical parts fail, but not due to wear on heads/platters or other things
> that may cause failures that are not detected/reported by the operating
> system.
>

My experience is limited (starting with two or three machines, mostly
with one drive each, through to the current day where I have 4 desktop
machines with one drive each, and a machine used as a server with 3
drives). But recently I seem to have to replace at least one drive
every year (although the last one was "just in case" because the SMART
checks were often reporting unreadable sectors - not permanent errors,
and it was in RAID-1 so ok while the other one still worked), and I've
discarded others because they became too slow or too antiquated (IDE,
SATAv1).

But I would seriously suggest that if you have installed smartmontools
then you ought to run some of the tests - on a server I tend to run
long tests daily, at a time when I hope it is quiet, but on desktops
less frequently. For a laptop I probably only run them when I think
about it and know it will be on mains power.

> I can't see how the loss of files could occur without FreeBSD noticing it and
> reporting on it. Does FreeBSD just trust drives to do everything correctly
> at all times?
>
> --
>
> smartctl 6.2 2014-02-18 r3874 [FreeBSD 9.2-RELEASE i386] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar WDxxxAB
> Device Model:     WDC WD400AB-22CDB0
> Serial Number:    WD-WMA9T1222658
> Firmware Version: 22.04A22
> User Capacity:    40,020,664,320 bytes [40.0 GB]
> Sector Size:      512 bytes logical/physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ATA/ATAPI-5 (minor revision not indicated)
> Local Time is:    Fri Mar 27 20:35:32 2015 AKDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x84) Offline data collection activity
>                                         was suspended by an interrupting command from host.
>                                         Auto Offline Data Collection: Enabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 ( 2376) seconds.
> Offline data collection
> capabilities:                    (0x3b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         No Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         No General Purpose Logging support.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (  42) minutes.
> Conveyance self-test routine
> recommended polling time:        (   5) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   102   099   021    Pre-fail  Always       -       3975
>   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       58
>   5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       1

I've had recent drives which started to give problems (particularly,
unreadable sectors) around the time the Reallocated Sector Count became
non-zero.

>   7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       12540
>  10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       57
> 196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
>

I would try running some self-tests.

>
> Selective Self-tests/Logging not supported
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"

ĸen
-- 
Nanny Ogg usually went to bed early. After all, she was an old lady.
Sometimes she went to bed as early as 6 a.m.
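
P.S. For anyone who wants to keep an eye on the attribute table
without reading it by hand, here is a rough sketch in Python. It only
parses a few rows copied from the output quoted above; it does not call
smartctl itself. The function names and the WATCHED list are my own
choices for illustration, not anything smartmontools defines, and it
assumes the Power_On_Hours raw value really is in hours, as read above.

```python
# Rows copied from the smartctl attribute table quoted in this thread.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       1
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       12540
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
"""

# Assumption: an illustrative set of sector-related attributes to watch.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_rows(text):
    """Map attribute name -> integer raw value for each table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs


def nonzero_watched(attrs):
    """Return the watched attribute names whose raw value is above zero."""
    return sorted(name for name in WATCHED if attrs.get(name, 0) > 0)


attrs = parse_rows(SMART_ROWS)
flagged = nonzero_watched(attrs)
days = attrs["Power_On_Hours"] / 24

print("non-zero watched attributes:", flagged)
print("powered on for about", days, "days")
```

A non-zero count here is a hint to run the self-tests and watch the
trend, not proof that the drive caused the missing files.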