From: Jeremy Chadwick
To: Andriy Gapon
Cc: freebsd-fs@freebsd.org
Date: Mon, 7 Jun 2010 02:08:50 -0700
Subject: Re: zfs i/o error, no driver error

On Mon, Jun 07, 2010 at 11:55:24AM +0300, Andriy Gapon wrote:
> on 07/06/2010 11:34 Jeremy Chadwick said the following:
> > On Mon, Jun 07, 2010 at 11:15:54AM +0300, Andriy Gapon wrote:
> >> During a recent zpool scrub one read error was detected and "128K
> >> repaired".
> >>
> >> In the system log I see the following message:
> >> ZFS: vdev I/O failure, zpool=tank
> >> path=/dev/gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff offset=284456910848
> >> size=131072 error=5
> >>
> >> On the other hand, there are no other errors, nothing from geom,
> >> ahci, etc.  Why would that happen?  What kind of error could this be?
> >
> > I believe this indicates silent data corruption[1], which ZFS can
> > auto-correct if the pool is a mirror or raidz (otherwise it can detect
> > the problem but not fix it).
>
> This pool is a mirror.
>
> > This can happen for a lot of reasons, but tracking down the source is
> > often difficult.  Usually it indicates the disk itself has some kind
> > of problem (cache going bad, some sector remaps which didn't happen
> > or failed, etc.).
>
> Please note that this is not a CKSUM error, but a READ error.

Okay, then it indicates that reading some data off the disk failed.  ZFS
auto-corrected it by reading the data from the other member in the pool
(ada0p4).  That's confirmed here:

> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are
>         unaffected.
>
>         NAME                                          STATE  READ WRITE CKSUM
>         tank                                          ONLINE     0     0     0
>           mirror                                      ONLINE     0     0     0
>             ada0p4                                    ONLINE     0     0     0
>             gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff  ONLINE   1     0     0  128K repaired

> > - Full "smartctl -a /dev/XXX" for all disk members of zpool "tank"
>
> The output for both disks is "perfect".
> I monitor them regularly; smartd is also running, and there are no
> complaints from it.
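Before getting to the SMART side of things: since the pool already
repaired that block, the READ counter on the gptid member will stay at
1 until it's reset.  A minimal sketch of how to verify and then clear
it, assuming the pool name "tank" from the status output above:

    # Show per-vdev error counters plus any files with permanent damage
    zpool status -v tank

    # Once satisfied the repair took, zero the counters so any new
    # error stands out immediately
    zpool clear tank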
Most people I know do not know how to interpret SMART statistics, and
that's not their fault -- which is why I requested them.  :-)

In this case, I'd like to see "smartctl -a" output for the disk that's
associated with the above GPT ID.  There may be some attributes or data
in the SMART error log which could indicate what's going on.  smartd
does not know how to interpret data; it just logs what it sees.

> > Furthermore, what made you decide to scrub the pool on a whim?
>
> Why on a whim?  It was a regularly scheduled scrub (bi-weekly).

I'm still trying to figure out why people do this.  ZFS will
automatically detect and correct errors of this sort when it encounters
them during normal operation.  It's good that you caught an error ahead
of time, but ZFS would have dealt with this on its own.

It's important to remember that scrubs are *highly* intensive on both
the system itself as well as on all pool members.  Disk I/O activity is
very heavy during a scrub; it's not considered "normal use".

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
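P.S. If it helps with the smartctl request above: a rough sketch of how
to map that GPT ID to a device node, using glabel from the base system.
The "ada1" below is only a placeholder for whatever provider glabel
actually reports on your machine:

    # Find which disk backs the gptid label from the error message
    glabel status | grep 536c6f78

    # Then pull the full SMART data from that device (placeholder name)
    smartctl -a /dev/ada1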