From owner-freebsd-stable@FreeBSD.ORG Sun Oct 3 12:08:22 2010
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 0DEA61065670
    for ; Sun, 3 Oct 2010 12:08:22 +0000 (UTC)
    (envelope-from dan@langille.org)
Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42])
    by mx1.freebsd.org (Postfix) with ESMTP id D09698FC18
    for ; Sun, 3 Oct 2010 12:08:21 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
    by nyi.unixathome.org (Postfix) with ESMTP id 12C18508D8;
    Sun, 3 Oct 2010 13:08:21 +0100 (BST)
X-Virus-Scanned: amavisd-new at unixathome.org
Received: from nyi.unixathome.org ([127.0.0.1])
    by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id dP9JWju2fwql; Sun, 3 Oct 2010 13:08:20 +0100 (BST)
Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7])
    (Authenticated sender: hidden)
    by nyi.unixathome.org (Postfix) with ESMTPSA id BF079508AD ;
    Sun, 3 Oct 2010 13:08:20 +0100 (BST)
Message-ID: <4CA87233.2050308@langille.org>
Date: Sun, 03 Oct 2010 08:08:19 -0400
From: Dan Langille
Organization: The FreeBSD Diary
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.9) Gecko/20100915 Thunderbird/3.1.4
MIME-Version: 1.0
To: Jeremy Chadwick
References: <4CA73702.5080203@langille.org> <20101002141921.GC70283@icarus.home.lan>
    <4CA7AD95.9040703@langille.org> <20101002223626.GB78136@icarus.home.lan>
    <4CA7BEE4.9050201@langille.org> <20101002235024.GA80643@icarus.home.lan>
    <4CA7E4AE.4060607@langille.org>
In-Reply-To: <4CA7E4AE.4060607@langille.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable
Subject: Re: out of HDD space - zfs degraded
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 03 Oct 2010 12:08:22 -0000

On 10/2/2010 10:04 PM, Dan Langille wrote:

> After a 'shutdown -p now', it was about 20 minutes before I went and
> powered it up (I was on minecraft). The box came back with the missing HDD:
>
> $ zpool status storage
>   pool: storage
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
>
>         NAME                 STATE     READ WRITE CKSUM
>         storage              ONLINE       0     0     0
>           raidz2             ONLINE       0     0     0
>             gpt/disk01-live  ONLINE       0     0     0
>             gpt/disk02-live  ONLINE       0     0     0
>             gpt/disk03-live  ONLINE       0     0     0
>             gpt/disk04-live  ONLINE       0     0     0
>             gpt/disk05-live  ONLINE       0     0     0
>             gpt/disk06-live  ONLINE       0     0    12
>             gpt/disk07-live  ONLINE       0     0     0

Overnight, the following appeared in /var/log/messages:

Oct 2 21:56:46 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103157760 size=1024
Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103159808 size=1024
Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103164416 size=512
Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103162880 size=512
Oct 2 23:00:58 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1875352305152 size=1024
Oct 3 02:44:55 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1914424351744 size=512
Oct 3 03:01:01 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1875175041536 size=512
Oct 3 03:01:02 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1886724290048 size=1024
Oct 3 04:05:44 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1953680806912 size=512
Oct 3 04:05:44 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1953680807424 size=512
Oct 3 04:05:44 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1953680807936 size=512
Oct 3 04:05:44 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1953680808448 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172631552 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172729856 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172730368 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172730880 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172731392 size=512

Given the outage from yesterday when ada0 was offline for several hours,
I'm guessing that checksum mismatches on that drive are expected. Yes,
/dev/gpt/disk06-live == ada0.

The current zpool status is:

$ zpool status
  pool: storage
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h1m with 0 errors on Sun Oct 3 00:01:17 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            gpt/disk06-live  ONLINE       0     0    25  778M resilvered
            gpt/disk07-live  ONLINE       0     0     0

errors: No known data errors

-- 
Dan Langille - http://langille.org/
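
To check that the checksum mismatches in /var/log/messages really are confined to one device, the log entries can be tallied per pool and path. What follows is only a minimal sketch in Python, assuming the log line format quoted above. The names `tally_mismatches`, `LINE_RE`, and `sample` are hypothetical and not part of any ZFS tool. The sample input is three of the entries from the log plus one unrelated line.

```python
import re
from collections import Counter

# Assumed format, taken from the log entries quoted in this message:
#   ... ZFS: checksum mismatch, zpool=<pool> path=<dev> offset=<n> size=<n>
LINE_RE = re.compile(
    r"ZFS: checksum mismatch, zpool=(?P<pool>\S+) "
    r"path=(?P<path>\S+) offset=(?P<offset>\d+) size=(?P<size>\d+)"
)


def tally_mismatches(lines):
    """Return {(pool, path): (entry_count, total_bytes)} for matching lines."""
    counts = Counter()
    sizes = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a checksum-mismatch entry
        key = (m.group("pool"), m.group("path"))
        counts[key] += 1
        sizes[key] += int(m.group("size"))
    return {key: (counts[key], sizes[key]) for key in counts}


sample = [
    "Oct 2 21:56:46 kraken root: ZFS: checksum mismatch, zpool=storage "
    "path=/dev/gpt/disk06-live offset=123103157760 size=1024",
    "Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage "
    "path=/dev/gpt/disk06-live offset=123103159808 size=1024",
    "Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage "
    "path=/dev/gpt/disk06-live offset=123103164416 size=512",
    "Oct 2 22:00:00 kraken root: some unrelated message",
]

for (pool, path), (count, total) in tally_mismatches(sample).items():
    print(f"{pool} {path}: {count} entries, {total} bytes")
```

On a real system the same function could be fed the open log file, for example `tally_mismatches(open("/var/log/messages"))`. The number of logged entries need not equal the CKSUM column in 'zpool status', so treat the tally as a quick way to see which devices are affected rather than as a replacement for the pool's own counters.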