From: Jeremy Chadwick
To: Andriy Gapon
Cc: freebsd-fs@freebsd.org
Date: Mon, 7 Jun 2010 02:08:50 -0700
Subject: Re: zfs i/o error, no driver error

On Mon, Jun 07, 2010 at 11:55:24AM +0300, Andriy Gapon wrote:
> on 07/06/2010 11:34 Jeremy Chadwick said the following:
> > On Mon, Jun 07, 2010 at 11:15:54AM +0300, Andriy Gapon wrote:
> >> During a recent zpool scrub one read error was detected and "128K
> >> repaired".
> >>
> >> In the system log I see the following message:
> >> ZFS: vdev I/O failure, zpool=tank
> >> path=/dev/gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff offset=284456910848
> >> size=131072 error=5
> >>
> >> On the other hand, there are no other errors, nothing from geom,
> >> ahci, etc.  Why would that happen?  What kind of error could this be?
> >
> > I believe this indicates silent data corruption[1], which ZFS can
> > auto-correct if the pool is a mirror or raidz (otherwise it can detect
> > the problem but not fix it).
>
> This pool is a mirror.
>
> > This can happen for a lot of reasons, but tracking down the source is
> > often difficult.  Usually it indicates the disk itself has some kind
> > of problem (cache going bad, some sector remaps which didn't happen
> > or failed, etc.).
>
> Please note that this is not a CKSUM error, but a READ error.

Okay, then it indicates that reading some data off the disk failed.  ZFS
auto-corrected it by reading the data from the other member in the pool
(ada0p4).  That's confirmed here:

> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are
>         unaffected.
>
>         NAME                                          STATE  READ WRITE CKSUM
>         tank                                          ONLINE     0     0     0
>           mirror                                      ONLINE     0     0     0
>             ada0p4                                    ONLINE     0     0     0
>             gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff  ONLINE   1     0     0  128K repaired

> > - Full "smartctl -a /dev/XXX" for all disk members of zpool "tank"
>
> The output for both disks is "perfect".
> I monitor them regularly; smartd is also running, and there are no
> complaints from it.
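Before getting to the SMART side of things: since the pool already
repaired that block, the READ counter on the gptid member will stay at
1 until it's reset.  A minimal sketch of how to verify and then clear
it, assuming the pool name "tank" from the status output above:

    # Show per-vdev error counters plus any files with permanent damage
    zpool status -v tank

    # Once satisfied the repair took, zero the counters so any new
    # error stands out immediately
    zpool clear tank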
Most people I know do not know how to interpret SMART statistics, and
that's not their fault -- which is why I requested them.  :-)

In this case, I'd like to see "smartctl -a" output for the disk that's
associated with the above GPT ID.  There may be some attributes or data
in the SMART error log which could indicate what's going on.  smartd
does not know how to interpret data; it just logs what it sees.

> > Furthermore, what made you decide to scrub the pool on a whim?
>
> Why on a whim?  It was a regularly scheduled scrub (bi-weekly).

I'm still trying to figure out why people do this.  ZFS will
automatically detect and correct errors of this sort when it encounters
them during normal operation.  It's good that you caught an error ahead
of time, but ZFS would have dealt with this on its own.

It's important to remember that scrubs are *highly* intensive on both
the system itself as well as on all pool members.  Disk I/O activity is
very heavy during a scrub; it's not considered "normal use".

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
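P.S. If it helps with the smartctl request above: a rough sketch of how
to map that GPT ID to a device node, using glabel from the base system.
The "ada1" below is only a placeholder for whatever provider glabel
actually reports on your machine:

    # Find which disk backs the gptid label from the error message
    glabel status | grep 536c6f78

    # Then pull the full SMART data from that device (placeholder name)
    smartctl -a /dev/ada1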