Date: Thu, 6 Jun 2013 15:39:11 -0700
From: Jeremy Chadwick
To: mxb
Cc: "freebsd-fs@freebsd.org"
Subject: Re: zpool export/import on failover - The pool metadata is corrupted
Message-ID: <20130606223911.GA45807@icarus.home.lan>
In-Reply-To: <016B635E-4EDC-4CDF-AC58-82AC39CBFF56@alumni.chalmers.se>

On Fri, Jun 07, 2013 at 12:12:39AM +0200, mxb wrote:
>
> Then MASTER goes down, CARP on the second node goes MASTER (devd.conf, and script for lifting):
>
> root@nfs2:/root # cat /etc/devd.conf
>
>
> notify 30 {
>     match "system"      "IFNET";
>     match "subsystem"   "carp0";
>     match "type"        "LINK_UP";
>     action "/etc/zfs_switch.sh active";
> };
>
> notify 30 {
>     match "system"      "IFNET";
>     match "subsystem"   "carp0";
>     match "type"        "LINK_DOWN";
>     action "/etc/zfs_switch.sh backup";
> };
>
> root@nfs2:/root # cat /etc/zfs_switch.sh
> #!/bin/sh
>
> DATE=`date +%Y%m%d`
> HOSTNAME=`hostname`
>
> ZFS_POOL="jbod"
>
>
> case $1 in
>     active)
>         echo "Switching to ACTIVE and importing ZFS" | mail -s ''$DATE': '$HOSTNAME' switching to ACTIVE' root
>         sleep 10
>         /sbin/zpool import -f jbod
>         /etc/rc.d/mountd restart
>         /etc/rc.d/nfsd restart
>         ;;
>     backup)
>         echo "Switching to BACKUP and exporting ZFS" | mail -s ''$DATE': '$HOSTNAME' switching to BACKUP' root
>         /sbin/zpool export jbod
>         /etc/rc.d/mountd restart
>         /etc/rc.d/nfsd restart
>         ;;
>     *)
>         exit 0
>         ;;
> esac
>
> This works, most of the time, but sometimes I'm forced to re-create pool. Those machines suppose to go into prod.
> Loosing pool(and data inside it) stops me from deploy this setup.

This script looks highly error-prone.  Hasty hasty...  :-)

This script assumes that the "zpool" commands (import and export) always
work/succeed; there is no exit code ($?) checking being used.

Since this is run from within devd(8): where does stdout/stderr go to
when running a program/script under devd(8)?  Does it effectively go
to the bit bucket (/dev/null)?  If so, you'd never know if the import or
export actually succeeded or not (the export sounds more likely to be
the problem point).

I imagine there would be some situations where the export would fail
(some files on filesystems under pool "jbod" still in use), yet CARP is
already blindly assuming everything will be fantastic.  Surprise.
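
To make the exit-code point concrete, here is a rough sketch of what
checking $? and logging from inside the script could look like.  The
LOG and ZPOOL variables and the run_step/switch_role function names are
my own placeholders, not anything from your setup, and I have left out
the mail notification and the sleep for brevity.  Treat it as an
illustration under those assumptions, not a drop-in replacement.

```shell
#!/bin/sh
# Sketch only: LOG, ZPOOL, run_step and switch_role are hypothetical names.

POOL="jbod"
LOG="${LOG:-/var/log/failover.log}"   # assumed log location
ZPOOL="${ZPOOL:-/sbin/zpool}"         # overridable so the logic can be dry-run

# Run a command, append its stdout/stderr to $LOG, return its exit status.
run_step() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') running: $*" >> "$LOG"
    "$@" >> "$LOG" 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED (exit $rc): $*" >> "$LOG"
    fi
    return "$rc"
}

# Import or export the pool; touch the NFS daemons only if that succeeded.
switch_role() {
    case "$1" in
    active)
        # -f kept from the original script
        run_step "$ZPOOL" import -f "$POOL" || return 1
        ;;
    backup)
        run_step "$ZPOOL" export "$POOL" || return 1
        ;;
    *)
        return 0
        ;;
    esac
    run_step /etc/rc.d/mountd restart || return 1
    run_step /etc/rc.d/nfsd restart || return 1
}

# Dispatch only when called with an argument, as devd would do.
if [ "$#" -gt 0 ]; then
    switch_role "$1"
    exit $?
fi
```

Whether bailing out at the first failure is the right policy for a
failover is a judgment call; the point is only that a failed export or
import leaves a trace in the log instead of silently vanishing.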

I also do not know if devd.conf(5) "action" commands spawn a sub-shell
(/bin/sh) or not.  If they don't, you won't be able to use things like
'action "/etc/zfs_switch.sh active >> /var/log/failover.log";'.  You
would then need to implement the equivalent of logging within your
zfs_switch.sh script.

You may want to consider the -f flag to zpool import/export
(particularly export).  However there are risks involved -- userland
applications which have an fd/fh open on a file which is stored on a
filesystem that has now completely disappeared can sometimes crash
(segfault) or behave very oddly (100% CPU usage, etc.) depending on how
they're designed.

Basically what I'm trying to say is that using devd(8) as a form of HA
(high availability) and load balancing is not always possible.
Real/true HA (especially with SANs) is often done very differently (now
you know why it's often proprietary.  :-) )

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |