From owner-freebsd-fs@FreeBSD.ORG Tue Jan 3 15:17:18 2012
Message-ID: <4F031BF7.8000900@gmail.com>
Date: Tue, 03 Jan 2012 16:17:11 +0100
From: Johan Hendriks
To: Matt Burke
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS v28 on -STABLE not using hot spare
References: <4F031654.1080200@icritical.com>
In-Reply-To: <4F031654.1080200@icritical.com>
List-Id: Filesystems

Matt Burke wrote:
> Over the holidays one of the disks on a server has failed, but despite
> configuring a hot spare, ZFS hasn't used it for some reason. Can anyone
> shed some light on what I might have mis-configured to break the hot-spare
> functionality?
>
>
> [root@x ~]# uname -a
> FreeBSD x 8.2-STABLE FreeBSD 8.2-STABLE #4: Mon Dec 5 12:43:58 GMT 2011
>     root@x:/usr/obj/usr/src/sys/x amd64
>
>
> [root@x ~]# more /usr/src/sys/amd64/conf/x
> include         GENERIC
> ident           x
>
> options         GEOM_STRIPE
> options         ROUTETABLES=4
>
>
> [root@x ~]# zpool status -v
>   pool: data
>  state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>         repaired.
>  scan: none requested
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         data         DEGRADED     0     0     0
>           mirror-0   ONLINE       0     0     0
>             mfid0    ONLINE       0     0     0
>             mfid14   ONLINE       0     0     0
>           mirror-1   ONLINE       0     0     0
>             mfid1    ONLINE       0     0     0
>             mfid15   ONLINE       0     0     0
>           mirror-2   DEGRADED     0     0     0
>             mfid2    ONLINE       0     0     0
>             mfid16   FAULTED      0   931     0  too many errors
>           mirror-3   ONLINE       0     0     0
>             mfid3    ONLINE       0     0     0
>             mfid17   ONLINE       0     0     0
>           mirror-4   ONLINE       0     0     0
>             mfid4    ONLINE       0     0     0
>             mfid18   ONLINE       0     0     0
>           mirror-5   ONLINE       0     0     0
>             mfid5    ONLINE       0     0     0
>             mfid19   ONLINE       0     0     0
>           mirror-6   ONLINE       0     0     0
>             mfid6    ONLINE       0     0     0
>             mfid20   ONLINE       0     0     0
>           mirror-7   ONLINE       0     0     0
>             mfid7    ONLINE       0     0     0
>             mfid21   ONLINE       0     0     0
>           mirror-8   ONLINE       0     0     0
>             mfid8    ONLINE       0     0     0
>             mfid22   ONLINE       0     0     0
>           mirror-9   ONLINE       0     0     0
>             mfid9    ONLINE       0     0     0
>             mfid23   ONLINE       0     0     0
>           mirror-10  ONLINE       0     0     0
>             mfid10   ONLINE       0     0     0
>             mfid24   ONLINE       0     0     0
>         logs
>           mirror-11  ONLINE       0     0     0
>             mfid13   ONLINE       0     0     0
>             mfid26   ONLINE       0     0     0
>         cache
>           mfid12     ONLINE       0     0     0
>           mfid25     ONLINE       0     0     0
>         spares
>           mfid11     AVAIL
>
> errors: No known data errors
>
> The logs show loads of mfi1 and mfid16 errors for a few minutes, and then
> (presumably when ZFS dropped the disk) nothing relevant after that. ZFS
> hasn't logged anything, not even that it's failed a disk.
>
> I've manually done a 'zpool replace data mfid16 mfid11' which has brought
> the spare in without problems, but I'm eager to learn what I did (or didn't
> do?) to cause the spare to not be used automatically.
>
> Thanks in advance,
>
>
ZFS on FreeBSD does not have 'HOT' spares. They are cold, and human
intervention is needed to replace a disk in a pool.
There are some topics about it on the net.
I would opt for a warning, because a lot of users get a false sense of
security when using spares. zpool should not accept the spare without
warning the user that it is a cold spare and not a hot one.

It looks like there is some work planned for a ZFS daemon that should
overcome this problem on FreeBSD:
http://svnweb.freebsd.org/base?view=revision&revision=222836
On Solaris there is also a daemon running that does the actual replace.

It should not be too hard to make a script that runs every minute (or at
whatever interval you want) and checks whether a pool is degraded, then
whether autoreplace is set for the pool, then whether a spare is available,
and if so does the actual replace.
Unfortunately I cannot code :(
Maybe someone has a script lying around?

regards
Johan Hendriks
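
The check-and-replace steps described above could be sketched roughly like
this in plain sh. It is only an untested starting point, not a replacement
for a real daemon. The function names (faulted_devices, available_spares,
autoreplace_on, check_pools), the "run" argument and the logger tag are all
made up for illustration, and the parsing assumes that 'zpool status' prints
its config section in the same layout as the output quoted in this thread.
The autoreplace property is used here only as the per-pool opt-in switch
suggested above.

```sh
#!/bin/sh
# Sketch of a cold-spare watcher (assumption: 'zpool status' layout as quoted
# in this thread; all function names below are hypothetical).

# Read 'zpool status' text on stdin and print leaf devices that are FAULTED
# or UNAVAIL and are not already sitting under a spare/replacing vdev.
faulted_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE"          { in_cfg = 1; next }
        !in_cfg                                { next }
        NF == 0 || ($1 == "spares" && NF == 1) { in_cfg = 0; next }
        {
            match($0, /^[ \t]*/)
            indent = RLENGTH
            # Children of a spare/replacing vdev are indented deeper: skip them.
            if (covered && indent > covered_indent) next
            covered = 0
            if ($1 ~ /^(spare|replacing)(-[0-9]+)?$/) {
                covered = 1; covered_indent = indent; next
            }
            if ($1 ~ /^(mirror|raidz[1-3]?)(-[0-9]+)?$/) next
            if ($2 == "FAULTED" || $2 == "UNAVAIL") print $1
        }
    '
}

# Read 'zpool status' text on stdin and print spares whose state is AVAIL.
available_spares() {
    awk '
        $1 == "spares" && NF == 1 { in_sp = 1; next }
        in_sp && NF == 0          { in_sp = 0 }
        in_sp && $2 == "AVAIL"    { print $1 }
    '
}

# Succeed if the autoreplace property of pool $1 is "on".
autoreplace_on() {
    [ "$(zpool get autoreplace "$1" | awk '$2 == "autoreplace" { print $3 }')" = "on" ]
}

# One pass over all pools: for each DEGRADED pool with autoreplace=on, swap
# the first faulted device for the first available spare.
check_pools() {
    zpool list -H -o name,health | while read -r pool health; do
        [ "$health" = "DEGRADED" ] || continue
        autoreplace_on "$pool" || continue
        status=$(zpool status "$pool")
        bad=$(printf '%s\n' "$status" | faulted_devices | head -n 1)
        spare=$(printf '%s\n' "$status" | available_spares | head -n 1)
        [ -n "$bad" ] && [ -n "$spare" ] || continue
        logger -t zfs-spare "pool $pool: replacing $bad with spare $spare"
        zpool replace "$pool" "$bad" "$spare"
    done
}

# Only act when called with the (hypothetical) argument "run".
if [ "${1:-}" = "run" ]; then
    check_pools
fi
```

Saved under some name of your choosing and called with "run" from root's
crontab once a minute, this would end up issuing the same
'zpool replace data mfid16 mfid11' that was done by hand above. It handles
one device per pool per pass and does nothing for pools that are FAULTED
rather than DEGRADED.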