Date:      Fri, 10 Jun 2011 02:33:18 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Karl Pielorz <kpielorz_lst@tdx.co.uk>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS scrub 'repaired' pool with no chksum or read errors?
Message-ID:  <20110610093318.GA39276@icarus.home.lan>
In-Reply-To: <729A0755FAEF480774EEF4AB@HexaDeca64.dmpriest.net.uk>
References:  <729A0755FAEF480774EEF4AB@HexaDeca64.dmpriest.net.uk>

On Fri, Jun 10, 2011 at 09:43:14AM +0100, Karl Pielorz wrote:
> I'm running FreeBSD-8.2R amd64 w/4Gb of ECC RAM on a machine used
> for 'offsite' backups (that are copied to it using zfs
> send/receive).
> 
> I scrub this machine every now and again (about once a month) -
> recently this resulted in the following output after the scrub
> completed:
> 
> "
> # zpool status
>  pool: vol
> state: ONLINE
> scrub: scrub completed after 2h49m with 0 errors on Thu Jun  9
> 17:09:31 2011
> config:
> 
>        NAME        STATE     READ WRITE CKSUM
>        vol         ONLINE       0     0     0
>          raidz1    ONLINE       0     0     0
>            ada0    ONLINE       0     0     0  256K repaired
>            ada1    ONLINE       0     0     0
>            ada2    ONLINE       0     0     0
> 
> errors: No known data errors
> "
> 
> Should I be worried there was 256k of 'repairs' done, even though
> there were no checksum errors, or read errors detected?
> 
> The console logged no errors - and nothing shows in syslog.
> 
> The machine is always cleanly shut down - and the drives all appear
> fine from a SMART point of view - I'm just a bit concerned as to
> where the repairs came from - as ZFS doesn't seem to know (or be
> able to tell me) either :)

ZFS experts, please correct me, but in my experience this means the
scrub itself found and repaired actual problems while reading all data
on the pool.  More specifically, I believe READ/WRITE/CKSUM are counters
for errors encountered during normal (read: non-scrub) operations, which
would explain why they stay at zero here.  It's been a while since I've
seen this happen, but I have seen it on our Solaris 10 machines at my
workplace.  I've never been sure what it means; possibly signs of "bit
rot"?
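
If you want to catch this case automatically after each monthly scrub,
here is a minimal sketch in Python.  It only scans "zpool status" text
for device rows that carry a "repaired" note while all three counters
are zero.  The function name and the regular expression are my own
(hypothetical) and assume the column layout shown in your output above;
other ZFS versions may format these rows differently.

```python
import re

# Matches a device row such as:
#   "ada0    ONLINE       0     0     0  256K repaired"
# The layout is an assumption based on the zpool status output quoted above.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)"
    r"(?:\s+(?P<note>.*\S))?\s*$"
)


def silent_repairs(status_text):
    """Return (device, note) pairs where a scrub repaired data
    but the READ/WRITE/CKSUM counters are all zero."""
    found = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m or not m.group("note"):
            continue  # header line, or a row without a trailing note
        if "repaired" not in m.group("note"):
            continue
        counters = (m.group("read"), m.group("write"), m.group("cksum"))
        if all(int(c) == 0 for c in counters):
            found.append((m.group("name"), m.group("note")))
    return found


# Device table copied from the quoted zpool status output.
SAMPLE = """\
        NAME        STATE     READ WRITE CKSUM
        vol         ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0  256K repaired
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
"""

print(silent_repairs(SAMPLE))
```

In practice you would feed it the captured output of "zpool status"
rather than a pasted sample.  It does not tell you why the repair
happened, only that one was logged without a matching error count.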

If you're worried about your disk (ada0), please provide output from
"smartctl -a /dev/ada0" and I'll be more than happy to review the output
and provide you with any insights.  I do believe you when you say it
looks fine, but every model of disk is different in some regard.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |