Date: Mon, 07 Jun 2010 11:55:24 +0300
From: Andriy Gapon <avg@icyb.net.ua>
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs i/o error, no driver error
Message-ID: <4C0CB3FC.8070001@icyb.net.ua>
In-Reply-To: <20100607083428.GA48419@icarus.home.lan>

on 07/06/2010 11:34 Jeremy Chadwick said the following:
> On Mon, Jun 07, 2010 at 11:15:54AM +0300, Andriy Gapon wrote:
>> During a recent zpool scrub one read error was detected and "128K
>> repaired".
>>
>> In the system log I see the following message:
>> ZFS: vdev I/O failure, zpool=tank
>> path=/dev/gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff offset=284456910848
>> size=131072 error=5
>>
>> On the other hand, there are no other errors, nothing from geom, ahci,
>> etc.  Why would that happen?  What kind of error could this be?
>
> I believe this indicates silent data corruption[1], which ZFS can
> auto-correct if the pool is a mirror or raidz (otherwise it can detect
> the problem but not fix it).

This pool is a mirror.

> This can happen for a lot of reasons, but tracking down the source is
> often difficult.  Usually it indicates the disk itself has some kind
> of problem (cache going bad, some sector remaps which didn't happen or
> failed, etc.).

Please note that this is not a CKSUM error, but a READ error (error=5
in the log message is EIO).

> What I'd need to determine the cause:
>
> - Full "zpool status tank" output before the scrub

This was "all clear".

> - Full "zpool status tank" output after the scrub

$ zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are
        unaffected.
action: Determine if the device needs to be replaced, and clear the
        errors using 'zpool clear' or replace the device with 'zpool
        replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 5h0m with 0 errors on Sat Jun  5 05:05:43 2010
config:

        NAME                                          STATE     READ WRITE CKSUM
        tank                                          ONLINE       0     0     0
          mirror                                      ONLINE       0     0     0
            ada0p4                                    ONLINE       0     0     0
            gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff  ONLINE     1     0     0  128K repaired

> - Full "smartctl -a /dev/XXX" for all disk members of zpool "tank"

The smartctl output for both disks is "perfect".  I monitor them
regularly, and smartd is running with no complaints from it.
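To be concrete about "perfect", the checks amount to something like the
following.  This is only a sketch: ada0 carries one mirror member, and
I am assuming here that the gptid member sits on the other disk, e.g.
ada1; the attribute names are the usual ATA SMART ones.

$ smartctl -A /dev/ada0 | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
$ smartctl -A /dev/ada1 | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'

That is, no reallocated, pending, or offline-uncorrectable sectors on
either disk.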
> Furthermore, what made you decide to scrub the pool on a whim?

Why on a whim?  It was a regularly scheduled scrub (bi-weekly); a
sketch of the crontab entry is below.

-- 
Andriy Gapon
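P.S.  For the record, the scheduling is plain cron(8).  The entry in
root's crontab is along these lines (a sketch: running on the 1st and
15th approximates "bi-weekly" rather than being exactly every 14 days):

# scrub the pool "tank" around midnight on the 1st and 15th
0 0 1,15 * * /sbin/zpool scrub tank

Given that SMART looks clean on both disks, the follow-up suggested by
the "action" text above is presumably just "zpool clear tank", and then
watching whether the READ count climbs again.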