From owner-freebsd-stable@FreeBSD.ORG Fri Jul  1 03:33:53 2011
Date: Thu, 30 Jun 2011 20:02:19 -0700
From: Timothy Smith <tts@personalmis.com>
To: freebsd-stable@freebsd.org
Subject: HAST + ZFS: no action on drive failure

First posting here, hopefully I'm doing it right =) I also posted this to the FreeBSD forum, but I know some HAST folks monitor this list regularly and not so much there, so...

Basically, I'm testing failure scenarios with HAST/ZFS. I have two nodes and have scripted up a bunch of checks and failover actions between them. Looking good so far, though more complex than I expected. It would be cool to post it somewhere to get some pointers/critiques, but that's another thing.

Anyway, now I'm just seeing what happens when a drive fails on the primary node. Oddly/sadly, NOTHING! hastd just keeps on ticking and doesn't change the state of the failed drive, so the zpool has no clue the drive is offline. The /dev/hast/ provider remains. hastd does log some errors to the system log like this, but nothing more:

messages.0:Jun 30 18:39:59 nas1 hastd[11066]: [ada6] (primary) Unable to flush activemap to disk: Device not configured.
messages.0:Jun 30 18:39:59 nas1 hastd[11066]: [ada6] (primary) Local request failed (Device not configured): WRITE(4736512, 512).

So I guess the question is: do I have to script a cron job to check for these kinds of errors and then change the HAST resource to 'init' or something to handle this? Or is there some kind of hastd config setting that I need to set? What's the SOP for this? (A rough sketch of the kind of cron check I have in mind is at the end of this mail.)

Related to that: when the zpool in FreeBSD does finally notice that the drive is missing because I have manually changed the HAST resource to INIT (so the /dev/hast/ provider is gone), my raidz2 pool's hot spare doesn't engage, even with autoreplace=on. The zpool status output for the degraded pool seems to indicate that I should manually replace the failed drive (also sketched at the end). If that's the case, it's not really a "hot spare". Does this mean the "FMA Agent" referred to in the ZFS manual is not implemented in FreeBSD?

thanks!
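
P.S. For reference, the cron check I'm picturing is something like the sketch below. It's untested; the resource list and log path are just placeholders for my setup (only ada6 actually comes from the log above), and the grep is naive, so it would keep matching old entries:

    #!/bin/sh
    # Naive sketch: look for hastd local I/O errors in the log and drop the
    # affected resource to init, so its /dev/hast/ provider goes away and
    # ZFS finally sees the disk as missing.
    # RESOURCES and LOG are assumptions for my own test boxes.
    RESOURCES="ada4 ada5 ada6 ada7"
    LOG=/var/log/messages

    for res in $RESOURCES; do
        if grep -q "\[${res}\] (primary) Local request failed" "$LOG"; then
            hastctl role init "$res"
            logger -t hast-check "dropped ${res} to init after local I/O errors"
        fi
    done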
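
And for the spare, what zpool status seems to want me to do by hand is basically this (the pool name "tank" and the spare provider name are just examples, not my real config):

    # Swap the spare in for the failed hast provider, then watch the resilver.
    zpool replace tank hast/ada6 hast/spare0
    zpool status tank

This manual step is exactly what I was hoping autoreplace=on (or the FMA agent on Solaris) would do for me automatically.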