From owner-freebsd-fs@freebsd.org  Thu Oct 29 02:20:14 2015
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id AD4C0A20D92
 for <freebsd-fs@mailman.ysv.freebsd.org>; Thu, 29 Oct 2015 02:20:14 +0000 (UTC)
 (envelope-from mwlucas@mail.michaelwlucas.com)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 9340D1DAD
 for <freebsd-fs@freebsd.org>; Thu, 29 Oct 2015 02:20:14 +0000 (UTC)
 (envelope-from mwlucas@mail.michaelwlucas.com)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 90537A20D90; Thu, 29 Oct 2015 02:20:14 +0000 (UTC)
Delivered-To: fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 774A4A20D8E
 for <fs@mailman.ysv.freebsd.org>; Thu, 29 Oct 2015 02:20:14 +0000 (UTC)
 (envelope-from mwlucas@mail.michaelwlucas.com)
Received: from mail.michaelwlucas.com (mail.michaelwlucas.com
 [104.236.197.233])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 05B391DAA
 for <fs@freebsd.org>; Thu, 29 Oct 2015 02:20:13 +0000 (UTC)
 (envelope-from mwlucas@mail.michaelwlucas.com)
Received: from mail.michaelwlucas.com (localhost [127.0.0.1])
 by mail.michaelwlucas.com (8.14.9/8.14.7) with ESMTP id t9T1vMik095128
 for <fs@freebsd.org>; Wed, 28 Oct 2015 21:57:22 -0400 (EDT)
 (envelope-from mwlucas@mail.michaelwlucas.com)
Received: (from mwlucas@localhost)
 by mail.michaelwlucas.com (8.14.9/8.14.7/Submit) id t9T1vMqU095127
 for fs@freebsd.org; Wed, 28 Oct 2015 21:57:22 -0400 (EDT)
 (envelope-from mwlucas)
Date: Wed, 28 Oct 2015 21:57:21 -0400
From: "Michael W. Lucas" <mwlucas@michaelwlucas.com>
To: fs@freebsd.org
Subject: iSCSI/ZFS strangeness
Message-ID: <20151029015721.GA95057@mail.michaelwlucas.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=0.0 required=5.0 tests=UNPARSEABLE_RELAY,
 URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on
 mail.michaelwlucas.com
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (mail.michaelwlucas.com [127.0.0.1]); Wed, 28 Oct 2015 21:57:27 -0400 (EDT)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Oct 2015 02:20:14 -0000

Hi,

I'm experimenting with iSCSI HA with FreeBSD 10.2 amd64. I know people
do this sort of thing, but I can't figure out what I'm missing. (Most
of the tutorials cover HAST instead.) I suspect the real problem is
"Lucas doesn't know the right search terms."

The goal is to make an iSCSI-based ZFS pool that's available to two
separate hosts, and remains available even if one of the iSCSI servers
fails. Instead, the pool hangs when either of the iSCSI servers goes
down.

My test environment has two iSCSI servers, iscsi1 and iscsi2. They
each export three drives as a single target.
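
For anyone reproducing this, each server's export could look something like the ctl.conf(5) this helper prints. It is a minimal sketch, not the actual config used here. The portal-group name `pg0`, the lack of authentication, the device paths and the `make_ctl_conf` helper are all assumptions. Only the target name comes from the iscsictl output below.

```sh
#!/bin/sh
# Hypothetical helper: print a ctl.conf(5) that exports each device given
# on the command line as one LUN of a single target. The portal group name
# (pg0), the lack of authentication and the device paths are assumptions.
make_ctl_conf() {
	target=$1
	shift
	cat <<EOF
portal-group pg0 {
	discovery-auth-group no-authentication
	listen 0.0.0.0
}

target $target {
	auth-group no-authentication
	portal-group pg0
EOF
	lun=0
	for dev in "$@"; do
		# One LUN per backing device, numbered from 0.
		printf '\tlun %d {\n\t\tpath %s\n\t}\n' "$lun" "$dev"
		lun=$((lun + 1))
	done
	echo "}"
}
```

Something like `make_ctl_conf iqn.2013-11.io.mwl:target0 /dev/da1 /dev/da2 /dev/da3 > /etc/ctl.conf` followed by `service ctld restart` would install it on each server.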

There are two iSCSI initiators, zfs1 and zfs2. Both of them have active
connections to the iSCSI targets.
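
Sessions like these are created on each initiator with iscsictl(8), with iscsid(8) running. This is a minimal sketch that assumes no CHAP and uses the portal and target names from the iscsictl output below. The `login_all` function and the `ISCSICTL` dry-run override are hypothetical.

```sh
#!/bin/sh
# Sketch: log an initiator in to the same target on every portal given.
# ISCSICTL is a hypothetical override so the commands can be printed
# instead of run (ISCSICTL="echo iscsictl").
ISCSICTL=${ISCSICTL:-iscsictl}
TARGET=iqn.2013-11.io.mwl:target0

login_all() {
	for portal in "$@"; do
		# -A adds a session; -p names the portal, -t the target.
		$ISCSICTL -A -p "$portal" -t "$TARGET"
	done
}
```

Usage: `login_all iscsi1.blackhelicopters.org iscsi2.blackhelicopters.org` on both zfs1 and zfs2.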

On another host I've created a ZFS pool of striped mirrors. Each
mirror has one drive from each iSCSI server.
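
The pool layout can be expressed as a zpool(8) vdev list that pairs disk N from iscsi1 with disk N from iscsi2. The labels follow the gpt/ names in the zpool status output below. The sketch assumes each disk was labeled with something like `gpart add -t freebsd-zfs -l iscsi1-0 da2`. The `mirror_vdevs` helper is hypothetical.

```sh
#!/bin/sh
# Sketch: print a vdev list of striped two-way mirrors, one disk from
# each iSCSI server per mirror. The helper name is hypothetical.
mirror_vdevs() {
	count=$1
	i=0
	spec=""
	while [ "$i" -lt "$count" ]; do
		spec="$spec mirror gpt/iscsi1-$i gpt/iscsi2-$i"
		i=$((i + 1))
	done
	# Drop the leading space.
	echo "${spec# }"
}
```

Used as `zpool create iscsi $(mirror_vdevs 3)`, with the command substitution left unquoted so each word becomes its own argument.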

The initiators can both access the iSCSI-based pool--not
simultaneously, of course. But CARP, devd, and some shell scripting
should get me a highly available pool that can withstand the demise of
any one iSCSI server and any one initiator.
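
The shell-scripting part of that plan might look roughly like this CARP transition handler. It is meant to be called with the new CARP state as its argument, for example from a devd(8) rule matching carp(4) notifications. This is a sketch, not a tested failover design. `POOL`, the `ZPOOL` dry-run override and `carp_handler` are assumptions. A forced import on a host whose peer is merely unreachable, not dead, would leave the pool imported twice and corrupt it, so real use needs fencing.

```sh
#!/bin/sh
# Sketch of a CARP transition handler for the two initiators.
# POOL and ZPOOL (set ZPOOL="echo zpool" for a dry run) are assumptions.
POOL=${POOL:-iscsi}
ZPOOL=${ZPOOL:-zpool}

carp_handler() {
	case "$1" in
	MASTER)
		# Force the import: a dead peer never got to export the pool.
		$ZPOOL import -f "$POOL"
		;;
	BACKUP|INIT)
		# Release the pool so the new master can import it.
		$ZPOOL export "$POOL"
		;;
	*)
		echo "carp_handler: unknown state '$1'" >&2
		return 1
		;;
	esac
}
```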

The hope is that the pool would continue to work even if an iSCSI host
shuts down. When the downed iSCSI host returns, the initiators should
log back in and the pool auto-resilver.
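
If ZFS does not pick the returned disks up on its own, they can be brought back explicitly with `zpool online`, which starts the resilver. A minimal sketch follows. The gpt/ label prefix convention, `POOL`, the `ZPOOL` dry-run override and `rejoin_server` are assumptions.

```sh
#!/bin/sh
# Sketch: after a downed iSCSI server returns and the initiator has
# logged back in, online that server's disks so ZFS resilvers them.
POOL=${POOL:-iscsi}
ZPOOL=${ZPOOL:-zpool}

rejoin_server() {
	server=$1   # label prefix, e.g. iscsi2
	count=$2    # number of disks exported by that server
	i=0
	devs=""
	while [ "$i" -lt "$count" ]; do
		devs="$devs gpt/$server-$i"
		i=$((i + 1))
	done
	# zpool online takes several devices at once; $devs is left
	# unquoted on purpose so it splits into one argument per disk.
	$ZPOOL online "$POOL" $devs
}
```

Usage: `rejoin_server iscsi2 3` once iscsi2's session shows Connected again.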

Some ten minutes ago, I killed iscsi2. The pool is live on zfs1. And
the drives really have disappeared.

# iscsictl
Target name                          Target portal    State
iqn.2013-11.io.mwl:target0           iscsi2.blackhelicopters.org Operation timed out
iqn.2013-11.io.mwl:target0           iscsi1.blackhelicopters.org Connected: da2 da3 da4
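
A devd or cron script could spot the dead session by parsing this output. A sketch, assuming the layout above (one header line, then target, portal and state columns). The `down_portals` name is hypothetical.

```sh
#!/bin/sh
# Sketch: read iscsictl(8) output on stdin and print the portal of every
# session whose state column does not start with "Connected:".
down_portals() {
	awk 'NR > 1 && NF >= 3 && $3 != "Connected:" { print $2 }'
}
```

Usage: `iscsictl | down_portals`, which on the output above should print iscsi2.blackhelicopters.org.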

I would expect to see the pool appear degraded. But instead, I have:

# zpool status iscsi
  pool: iscsi
 state: ONLINE
  scan: resilvered 1.16G in 0h3m with 0 errors on Wed Oct 28 14:13:08 2015
config:

        NAME              STATE     READ WRITE CKSUM
        iscsi             ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            gpt/iscsi1-0  ONLINE       0     0     0
            gpt/iscsi2-0  ONLINE       0     0     0
          mirror-1        ONLINE       0     0     0
            gpt/iscsi1-1  ONLINE       0     0     0
            gpt/iscsi2-1  ONLINE       0     0     0
          mirror-2        ONLINE       0     0     0
            gpt/iscsi1-2  ONLINE       0     0     0
            gpt/iscsi2-2  ONLINE       0     0     0

errors: No known data errors
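
The mismatch can be checked mechanically by comparing the vdevs ZFS lists against the device nodes that actually exist. A sketch, assuming gpt/ labeled vdevs as above. `missing_vdevs` and the `DEVDIR` override (default /dev, there for testing) are hypothetical.

```sh
#!/bin/sh
# Sketch: read zpool status output on stdin and report every gpt/ vdev
# whose device node is gone, even if ZFS still lists it as ONLINE.
DEVDIR=${DEVDIR:-/dev}

missing_vdevs() {
	# The first column of each config line is the vdev name.
	awk '$1 ~ /^gpt\// { print $1 }' |
	while read -r vdev; do
		[ -e "$DEVDIR/$vdev" ] || echo "$vdev"
	done
}
```

Run as `zpool status iscsi | missing_vdevs`. With iscsi2 down it should list the three gpt/iscsi2-* labels, even though zpool status still calls them ONLINE.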

To try to make ZFS notice that the pool is degraded, I write to it
(tar -xvpf ports.tar.gz). Each time, the extract gets to a certain
point and hangs. I can't ^C or ^Z out of it.

This latest time, the extract reaches:

x ports/www/firefox-esr/files/patch-media-mtransport-third_party-nICEr-src-util-mbslen.c

I can still SSH into the machine, but if I try to look in any
directory under /iscsi/ports/*, my terminal hangs.

So I restart the downed iSCSI server. The initiators log back in to
the target, and the hung tar extract picks up where it left off.

So, I haven't achieved HA. The pool stays up, but it's not exactly
usable.

Any hints on what I'm missing?

Thanks,
==ml

-- 
Michael W. Lucas  -  mwlucas@michaelwlucas.com, Twitter @mwlauthor 
http://www.MichaelWLucas.com/, http://blather.MichaelWLucas.com/