From: Freddie Cash <fjwcash@gmail.com>
To: Pawel Jakub Dawidek
Cc: freebsd-fs@freebsd.org
Date: Thu, 19 May 2011 16:22:57 -0700
Subject: Re: HAST + ZFS self healing? Hot spares?

On Thu, May 19, 2011 at 4:09 PM, Pawel Jakub Dawidek wrote:
> On Fri, May 20, 2011 at 01:03:43AM +0200, Per von Zweigbergk wrote:
>> Very well, that is how failures are handled. But how do we *recover*
>> from a disk failure? Without taking the entire server down, that is.
>
> HAST opens the local disk only when changing role to primary, or when
> changing role to secondary and accepting a connection from the primary.
> If your disk fails, switch that HAST device to init, replace your disk,
> call 'hastctl create <resource>', and switch back to primary or
> secondary.
>
>> I already know how to deal with my HBA to hot-add and hot-remove
>> devices. And how to deal with hardware failures on the *secondary*
>> node seems fairly straightforward; after all, it doesn't really matter
>> if the mirroring becomes degraded for a few seconds while I futz
>> around with restarting hastd and such. The primary sees the secondary
>> disappear for a few seconds, and when it comes back, it will just
>> truck all of the dirty data over. Big deal.
>
> You don't need to restart hastd or stop the secondary. Just use hastctl
> to change the role to init for the failing resource.

This process works exceedingly well. Just went through it a week or so ago.
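For context, each disk here is a HAST resource defined in /etc/hast.conf on
both nodes. An entry looks something like this (the hostnames "hasta"/"hastb",
the device /dev/da5, and the resource name "disk5" are made up for the
example, not my actual config):

  resource disk5 {
          # "hasta"/"hastb" and /dev/da5 are invented example names
          on hasta {
                  local /dev/da5
                  remote hastb
          }
          on hastb {
                  local /dev/da5
                  remote hasta
          }
  }

The hastctl commands below operate on that resource name ("disk5" here),
while zpool only ever sees the /dev/hast/disk5 device that hastd provides.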
You just need to think in layers, the way GEOM works:

Non-HAST setup
--------------
The non-HAST process for replacing a disk in a ZFS pool is:

- zpool offline poolname diskname
- remove the dead disk
- insert the new disk
- partition, label, etc. as needed
- zpool replace poolname olddisk newdisk
- wait for the resilver to complete

HAST setup
----------
With HAST, only a couple of small changes are needed (a concrete
run-through with made-up names is in the P.S. below):

- zpool offline poolname diskname         <-- takes the /dev/hast device offline in the pool
- hastctl role init diskname              <-- removes the /dev/hast node
- remove the dead disk
- insert the new disk
- partition, label, etc. as needed
- hastctl create diskname                 <-- creates the HAST resource metadata on the new disk
- hastctl role primary diskname           <-- creates the new /dev/hast node
- zpool replace poolname olddisk newdisk  <-- adds the /dev/hast node back to the pool
- wait for the resilver to complete

The downside to this setup is that the data on the disk in the secondary
node is lost, as the resilver on the primary node recreates all of the
data on the secondary node. But at least then you know the data is good
on both disks in the HAST resource.

-- 
Freddie Cash
fjwcash@gmail.com
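P.S. For completeness, here's roughly what that sequence looks like typed
out on the primary node. Treat it as a sketch: the pool name ("tank"), the
resource name ("disk5"), and the physical disk ("da5") are made-up example
names, and the partitioning step depends entirely on your layout:

  # pool "tank", HAST resource "disk5", new disk "da5" -- all invented names
  zpool offline tank hast/disk5    # take the failed /dev/hast/disk5 offline in the pool
  hastctl role init disk5          # tears down the /dev/hast/disk5 node
  # ...physically swap the dead disk for the new one...
  # ...partition/label the new disk here, if your layout needs it...
  hastctl create disk5             # writes fresh HAST metadata onto the new disk
  hastctl role primary disk5       # brings /dev/hast/disk5 back
  zpool replace tank hast/disk5    # device name is unchanged, so one argument is enough
  zpool status tank                # watch the resilver

The one-argument 'zpool replace' should be enough here because the
/dev/hast/disk5 name doesn't change across the swap.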