Date:      Mon, 20 Oct 2008 15:07:30 -0200
From:      JoaoBR <joao@matik.com.br>
To:        Jeremy Chadwick <koitsu@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: constant zfs data corruption
Message-ID:  <200810201507.30778.joao@matik.com.br>
In-Reply-To: <20081020132208.GA3847@icarus.home.lan>
References:  <200810171530.45570.joao@matik.com.br> <200810200837.40451.joao@matik.com.br> <20081020132208.GA3847@icarus.home.lan>

On Monday 20 October 2008 11:22:08 you wrote:
> On Mon, Oct 20, 2008 at 08:37:40AM -0200, JoaoBR wrote:
> > On Friday 17 October 2008 15:39:59 Chuck Swiger wrote:
> > > On Oct 17, 2008, at 11:30 AM, JoaoBR wrote:
> > > > I constantly find data corruption on ZFS volumes, always from rrdtool;
> > > > this corrupt data happens on SATA disks, never seen on SCSI
> > >
> > > Presumably your SATA drives are correctly being reported by ZFS as
> > > corrupting data, and you should do something like replace cables, the
> > > drives themselves, perhaps try downgrading to SATA-150 rather than
> > > -300 if you are using the latter.  Also consider running a drive
> > > diagnostic utility from the mfgr (or smartmontools) and doing an
> > > extended self-test or destructive write surface check.
> >
> > well, the hardware seems to be OK and is not older than 6 months; this
> > also happens on more than one machine ... smartctl does not report any
> > hardware failures on the disks
> >
> > regarding jumpering the drives to 150: do you suspect a driver problem?
>
> It's not because of a driver problem.  There are known SATA chipsets
> which do not properly work with SATA300 (particularly VIA and SiS
> chipsets); they claim to support it, but data is occasionally corrupted.
> Capping the drive to SATA150 fixes this problem.
>
> http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
>
> There are also known problems with Silicon Image chipsets (on Linux,
> Windows, and FreeBSD).
>
> Because you didn't provide your smartctl output, I can't really tell if
> the drives are in "good shape" or not.  :-)
>

ok then here it comes

smartctl version 5.38 [amd64-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar T7K500
Device Model:     Hitachi HDT725025VLA380
Serial Number:    VFL101RK0A9SDP
Firmware Version: V5DOA7EA
User Capacity:    250.058.268.160 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Mon Oct 20 15:07:01 2008 BRST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (4949) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  83) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       3
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   117   117   024    Pre-fail  Always       -       316 (Average 322)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       800
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       69
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       69
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       46 (Lifetime Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
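
For anyone who wants to check an attribute table like the one above mechanically, here is a minimal Python sketch. It is not part of smartmontools; the function names, the regular expression and the choice of "watched" attribute IDs (5, 196, 197, 198, 199) are assumptions made for illustration. It flags a row when the normalized VALUE is at or below a non-zero THRESH, or when one of the watched raw counters is non-zero.

```python
import re

# A few rows copied from the smartctl output in this message,
# with each wrapped row rejoined onto one line.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       46 (Lifetime Min/Max 19/52)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
"""

# ID, name, flag, value, worst, thresh, type, updated, when_failed, leading raw number
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Assumption of this sketch: raw counters worth a look when non-zero.
WATCHED_RAW = {5, 196, 197, 198, 199}


def parse_attributes(text):
    """Return one dict per attribute row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "raw": int(m.group(10)),
            })
    return rows


def suspicious(rows):
    """Return (name, reason) pairs for rows that deserve a closer look."""
    flagged = []
    for r in rows:
        if r["thresh"] > 0 and r["value"] <= r["thresh"]:
            flagged.append((r["name"], "normalized value at or below threshold"))
        elif r["id"] in WATCHED_RAW and r["raw"] > 0:
            flagged.append((r["name"], "non-zero raw count"))
    return flagged


rows = parse_attributes(SAMPLE)
flagged = suspicious(rows)
print(flagged)
```

Nothing in the sample rows is flagged, which agrees with the PASSED self-assessment. A clean table only describes what the drive itself has recorded, so it does not rule out a controller or cabling problem.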

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.






> Also, do you not think it's a little odd that the only data corruption
> occurring for you are related to RRDtool?

yes, this I do think is suspicious


-- 

João






