From: Chris Forgeron
To: freebsd-stable
Date: Wed, 12 Jan 2011 20:32:42 -0400
Subject: RE: ZFS - hot spares : automatic or not?

Interesting, I was just testing Solaris 11 Express's ability to handle a pulled drive today. It handles it quite well. However, my Areca 1880 driver (arcmsr0) crashes when you reinsert the drive, but that's another topic, and an issue for Areca tech support.

Back to the point: Solaris runs a separate process called the Fault Management Daemon (fmd) that appears to handle this logic. This means the logic isn't really inside the ZFS code, and FreeBSD would need something similar, hopefully less kludgy than a user script.

I wonder if anyone has been eyeing the fma code in the cddl with a thought for porting it. It looks to be a really neat bit of code. I'm still quite new to it, having only been working with Solaris for the last few months.

Here's two links to a bit of info on the Solaris daemon:

http://www.princeton.edu/~unix/Solaris/troubleshoot/fm.html
http://hub.opensolaris.org/bin/view/Community+Group+fm/

Here's my log of the event in Solaris 11 Express:

Jan 12 21:28:47 solaris fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jan 12 21:28:47 solaris EVENT-TIME: Wed Jan 12 21:28:47 UTC 2011
Jan 12 21:28:47 solaris PLATFORM: PowerEdge-T710, CSN: 39SLQN1, HOSTNAME: solaris
Jan 12 21:28:47 solaris SOURCE: zfs-diagnosis, REV: 1.0
Jan 12 21:28:47 solaris EVENT-ID: ccfa7a23-838b-ebc8-decf-c2607afb390d
Jan 12 21:28:47 solaris DESC: The number of I/O errors associated with a ZFS device exceeded
Jan 12 21:28:47 solaris acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jan 12 21:28:47 solaris AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Jan 12 21:28:47 solaris will be made to activate a hot spare if available.
Jan 12 21:28:47 solaris IMPACT: Fault tolerance of the pool may be compromised.
Jan 12 21:28:47 solaris REC-ACTION: Run 'zpool status -x' and replace the bad device.

-----Original Message-----
From: owner-freebsd-stable@freebsd.org [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of John Hawkes-Reed
Sent: Tuesday, January 11, 2011 12:11 PM
To: Dan Langille
Cc: freebsd-stable
Subject: Re: ZFS - hot spares : automatic or not?

On 11/01/2011 03:38, Dan Langille wrote:
> On 1/4/2011 11:52 AM, John Hawkes-Reed wrote:
>> On 04/01/2011 03:08, Dan Langille wrote:
>>> Hello folks,
>>>
>>> I'm trying to discover if ZFS under FreeBSD will automatically pull in a
>>> hot spare if one is required.
>>>
>>> This raised the issue back in March 2010, and refers to a PR opened in
>>> May 2009
>>>
>>> * http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007943.html
>>> * http://www.freebsd.org/cgi/query-pr.cgi?pr=134491
>>>
>>> In turn, the PR refers to this March 2010 post referring to using devd
>>> to accomplish this task.
>>>
>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-March/055686.html
>>>
>>> Does the above represent the current state?
>>>
>>> I ask because I just ordered two more HDD to use as spares. Whether they
>>> sit on the shelf or in the box is open to discussion.
>>
>> As far as our testing could discover, it's not automatic.
>>
>> I wrote some Ugly Perl that's called by devd when it spots a drive-fail
>> event, which seemed to DTRT when simulating a failure by pulling a drive.
>
> Without such a script, what is the value in creating hot spares?

We went through that loop in the office. We're used to the way the Netapps work here, where often one's first notice of a failed disk is a visit from the courier with a replacement. (I'm only half joking)

In the end, writing enough perl to swap in the spare disk made much more sense than paging the relevant admin on disk-fail and expecting them to be able to type straight at 4AM.

Our thinking is that having a hot spare allows us to do the physical disk-swap in office hours, rather than (for instance) running in a degraded state over a long weekend.

If it's of interest, I'll see if I can share the code.

--
JH-R
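
For anyone wondering what the "swap in the spare" step looks like, here is a minimal sketch in Python. It is not the Perl mentioned above, which was never posted in this thread. It assumes the failed device and the spare are already known, and it relies only on the standard "zpool replace <pool> <old-device> <new-device>" form. The function names, the pool name "tank" and the devices "da3" and "da5" are made up for illustration. How devd (or anything else) detects the failure and calls this is not covered.

```python
import subprocess


def build_replace_command(pool, failed_dev, spare_dev):
    """Return the argv that asks ZFS to replace failed_dev with spare_dev.

    All three names are supplied by the caller; nothing is auto-detected.
    """
    for name in (pool, failed_dev, spare_dev):
        # Refuse empty names and anything that could be read as an option.
        if not name or name.startswith("-"):
            raise ValueError(f"suspicious argument: {name!r}")
    return ["zpool", "replace", pool, failed_dev, spare_dev]


def activate_spare(pool, failed_dev, spare_dev, dry_run=True):
    """Build the replace command and, unless dry_run is set, execute it.

    The default is dry_run=True so that nothing touches a real pool
    unless the caller explicitly asks for it.
    """
    cmd = build_replace_command(pool, failed_dev, spare_dev)
    if not dry_run:
        # check=True raises CalledProcessError if zpool exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical pool and device names, dry run only.
    print(" ".join(activate_spare("tank", "da3", "da5")))
```

Keeping the command construction separate from the execution means the logic can be checked on a machine with no ZFS pool at all, which seems worthwhile for something expected to run unattended at 4AM.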