Date:      Sat, 17 May 2008 09:52:23 +0200
From:      Willy Offermans <Willy@Offermans.Rompen.nl>
To:        Roland Smith <rsmith@xs4all.nl>
Cc:        freebsd-stable@FreeBSD.ORG
Subject:   Re: g_vfs_done error third part--PLEASE HELP!
Message-ID:  <20080517075222.GA4250@wiz.vpn.offrom.nl>
In-Reply-To: <20080516190718.GA73178@slackbox.xs4all.nl>
References:  <20080421190403.GA4625@wiz.vpn.offrom.nl> <20080421201047.GB6884@slackbox.xs4all.nl> <20080516121414.GD4618@wiz.vpn.offrom.nl> <20080516190718.GA73178@slackbox.xs4all.nl>

Hello Roland and FreeBSD friends,

On Fri, May 16, 2008 at 09:07:18PM +0200, Roland Smith wrote:
> On Fri, May 16, 2008 at 02:14:14PM +0200, Willy Offermans wrote:
> 
> > Filesystem  1K-blocks     Used     Avail Capacity  Mounted on
> > /dev/ar0s1a  20308398   230438  18453290     1%    /
> > devfs               1        1         0   100%    /dev
> > /dev/ar0s1d  21321454  3814482  15801256    19%    /usr
> > /dev/ar0s1e  50777034  5331686  41383186    11%    /var
> > /dev/ar0s1f 101554150 18813760  74616058    20%    /home
> > /dev/ar0s1g 274977824 34564876 218414724    14%    /share
> > 
> > pretty normal I would say.
> 
> Yes.
> 
> > > Did you notice any file corruption in the filesystem on ar0s1g?
> > 
> > No, the two disks are brand new and I did not encounter any noticeable
> > file corruption. However, I assume that nowadays bad sectors on hard
> > disks are handled by the hardware and need no user interaction to
> > correct. But maybe I'm totally wrong.
> 
> Every ATA disk has spare sectors, and they usually don't report bad
> blocks until the spares are exhausted, in which case it is prudent to
> replace the disk.
> 
> > > Unmount the filesystem and run fsck(8) on it. Does it report any errors?
> > 
> > sun# fsck /dev/ar0s1g 
> > ** /dev/ar0s1g
> > ** Last Mounted on /share
> > ** Phase 1 - Check Blocks and Sizes
> > INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
> > CORRECT? [yn] y
> > 
> > INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
> > CORRECT? [yn] y
> > 
> > ** Phase 2 - Check Pathnames
> > ** Phase 3 - Check Connectivity
> > ** Phase 4 - Check Reference Counts
> > ** Phase 5 - Check Cyl groups
> > FREE BLK COUNT(S) WRONG IN SUPERBLK
> > SALVAGE? [yn] y
> > 
> > SUMMARY INFORMATION BAD
> > SALVAGE? [yn] y
> > 
> > BLK(S) MISSING IN BIT MAPS
> > SALVAGE? [yn] y
> > 
> > 182863 files, 17282440 used, 120206472 free (12448 frags, 15024253
> > blocks, 0.0% fragmentation)
> > 
> > ***** FILE SYSTEM MARKED CLEAN *****
> > 
> > ***** FILE SYSTEM WAS MODIFIED *****
> > 
> > The usual stuff I would say.
> 
> Disk corruption is never normal. It can be explained if the machine
> crashed or was power-cycled before the disks were unmounted, but it can
> also indicate hardware trouble.
> 
> > > > Any hints are very much appreciated.
> 
> > So I have to conclude that the write error message does make sense and
> > that something seems to be wrong with the disks. The next question is
> > what can I do about it? Should I return the disks to the shop and ask
> > for new ones?
> 
> Install sysutils/smartmontools, and run 'smartctl -A /dev/adX|less', where X
> is the number of each drive in the RAID array.
> 
> In the output, look at the raw values for Reallocated_Sector_Ct,
> Current_Pending_Sector and Offline_Uncorrectable; the raw value is the
> last number on each line.
> 
> A small number for Reallocated_Sector_Ct is allowable. But non-zero counts
> for Current_Pending_Sector or Offline_Uncorrectable mean it's time to
> get a new disk.

sun# atacontrol status ar0
ar0: ATA RAID1 status: READY
 subdisks:
   0 ad4  ONLINE
   1 ad6  ONLINE

So ad4 and ad6 are the HDs of the array.

sun# smartctl -A /dev/ad6 
smartctl version 5.38 [i386-portbld-freebsd7.0] Copyright (C) 2002-8
Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail Always      -       3
  3 Spin_Up_Time            0x0007   100   100   015    Pre-fail Always      -       7232
  4 Start_Stop_Count        0x0032   100   100   000    Old_age Always       -       31
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail Always      -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail Always      -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail Offline     -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age Always       -       1478
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail Always      -       0
 11 Calibration_Retry_Count 0x0012   253   253   000    Old_age Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age Always       -       31
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age Always       -       439070649
187 Reported_Uncorrect      0x0032   253   253   000    Old_age Always       -       0
188 Unknown_Attribute       0x0032   253   253   000    Old_age Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   060   000    Old_age Always       -       38
194 Temperature_Celsius     0x0022   124   115   000    Old_age Always       -       38
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age Always       -       439070649
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age Always       -       0
197 Current_Pending_Sector  0x0012   253   253   000    Old_age Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age Always       -       0
202 TA_Increase_Count       0x0032   253   253   000    Old_age Always       -       0

sun# smartctl -A /dev/ad4
smartctl version 5.38 [i386-portbld-freebsd7.0] Copyright (C) 2002-8
Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail Always      -       109
  3 Spin_Up_Time            0x0007   100   100   015    Pre-fail Always      -       7360
  4 Start_Stop_Count        0x0032   100   100   000    Old_age Always       -       32
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail Always      -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail Always      -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail Offline     -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age Always       -       1478
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail Always      -       0
 11 Calibration_Retry_Count 0x0012   253   253   000    Old_age Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age Always       -       31
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age Always       -       835531250
187 Reported_Uncorrect      0x0032   253   253   000    Old_age Always       -       0
188 Unknown_Attribute       0x0032   253   253   000    Old_age Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   060   000    Old_age Always       -       38
194 Temperature_Celsius     0x0022   124   118   000    Old_age Always       -       38
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age Always       -       835531250
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age Always       -       0
197 Current_Pending_Sector  0x0012   253   253   000    Old_age Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age Always       -       0
202 TA_Increase_Count       0x0032   253   253   000    Old_age Always       -       0

The critical values you have mentioned are all zero, but maybe you
notice some other oddities.
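
For anyone who wants to repeat this check on their own disks, here is a
minimal Python sketch of the rule of thumb quoted above. The names
raw_values, problems and realloc_limit are made up for illustration, and
the limit of 10 is only a guess at what a "small number" means; it does
not come from smartmontools or from Roland's mail. The sketch also assumes
that the raw value is the tenth column and a plain integer, as it is in
the output above.

```python
# Attribute names taken from the advice quoted above.
TOLERATED_IF_SMALL = "Reallocated_Sector_Ct"
MUST_BE_ZERO = ("Current_Pending_Sector", "Offline_Uncorrectable")

# Three rows copied from the `smartctl -A /dev/ad4` output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail Always      -       0
197 Current_Pending_Sector  0x0012   253   253   000    Old_age Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age Offline      -       0
"""


def raw_values(smartctl_output):
    """Map ATTRIBUTE_NAME to RAW_VALUE (as int) for each attribute row."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have ten columns;
        # the header and banner lines do not match and are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def problems(values, realloc_limit=10):
    """Return the attribute names that break the rule of thumb.

    realloc_limit is a hypothetical cut-off for "a small number".
    """
    found = []
    for name in MUST_BE_ZERO:
        if values.get(name, 0) != 0:
            found.append(name)
    if values.get(TOLERATED_IF_SMALL, 0) > realloc_limit:
        found.append(TOLERATED_IF_SMALL)
    return found


if __name__ == "__main__":
    print(problems(raw_values(SAMPLE)))
```

To use it on a live system, one would save the output of
'smartctl -A /dev/ad4' to a file and pass its contents to raw_values()
instead of SAMPLE. An empty list only means that these three counters look
fine; it says nothing about the other attributes.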

> 
> > However, other people that I have contacted and who had a similar
> > problem before have solved it by using a software RAID setup instead
> > of a hardware RAID setup. This seems to indicate that there is some
> > bug in the FreeBSD code.
> 
> The RAID support that you find on most desktop motherboards _is_
> software RAID. See ataraid(4).

Well, then read "motherboard-supported RAID" instead of "hardware RAID"!
What I meant was that Toomas noticed a similar problem and turned to
gmirror to ``solve'' the issue. But something weird is going on
somewhere. I'm not the first one to run into this, and it would be nice
to nail it down so that no one has to suffer from it in the future.

> 
> Roland
> -- 
> R.F.Smith                                   http://www.xs4all.nl/~rsmith/
> [plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
> pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)



-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,
De jrus wah,

Willy

*************************************
W.K. Offermans
Home:   +31 45 544 49 44
Mobile: +31 653 27 16 23
e-mail: Willy@Offermans.Rompen.nl

                                       Powered by ....

                                            (__)
                                         \\\'',)
                                           \/  \ ^
                                           .\._/_)

                                       www.FreeBSD.org


