From: Mikolaj Golub
To: Denny Schierz
Cc: freebsd-stable
Date: Sat, 30 Apr 2011 00:58:48 +0300
Subject: Re: way for failover zpool (no HAST needed): hastmon

Oops, just noticed this mail :-) Denny sent me another message privately,
and I hope I answered his questions, but I will answer this message too in
case someone is interested.

On Thu, 28 Apr 2011 15:22:22 +0200 Denny Schierz wrote:

DS> hi,

DS> ok, here we go: I've installed hastmon on both FreeBSD nodes and on one
DS> Linux Debian host as watchdog:

DS> Simple setup:
DS>
DS> # cat /etc.local/hastmon.conf
DS> resource sanip {
DS>         exec /usr/local/_rbg/bin/san-ip
DS>         friends iscsihead-m iscsihead-s nos
DS>         on iscsihead-m {
DS>                 remote tcp4://iscsihead-s
DS>                 priority 0
DS>         }
DS>         on iscsihead-s {
DS>                 remote tcp4://iscsihead-m
DS>                 priority 1
DS>         }
DS>         on linux {
DS>                 remote tcp4://iscsihead-m tcp4://iscsihead-s
DS>         }
DS> }

DS> It only half works.

DS> The simple script adds/removes an alias on em0, and for status it does
DS> a ping -c 1 to the global IP. After telling every host what its role
DS> is, I get "state unknown" on the primary, "state run" on the secondary,
DS> and watchdog for the Linux host.

It is difficult to tell what happened without additional information. It
might be that your '/usr/local/_rbg/bin/san-ip status' was returning an
unknown status. In that case, running

  /usr/local/_rbg/bin/san-ip status; echo $?

manually might be helpful. And the logs too :-).

DS> Then I rebooted the primary; the secondary took over and executed the
DS> script.
DS> After the primary was reachable again, it didn't get the secondary
DS> role, but init/unknown.

DS> The same happens in the opposite direction:

DS> from Linux:

DS> hastmonctl status
DS> sanip:
DS>   role: watchdog
DS>   exec: /usr/local/_rbg/bin/san-ip
DS>   remote:
DS>     tcp4://iscsihead-m (primary/run)
DS>     tcp4://iscsihead-s (init/unknown)
DS>   state: run
DS>   attempts: 0 from 5
DS>   complaints: 0 for last 60 sec (threshold 3)
DS>   heartbeat: 10 sec

DS> from iscsihead-s:

DS> hastmonctl status
DS> sanip:
DS>   role: init
DS>   exec: /usr/local/_rbg/bin/san-ip
DS>   remote:
DS>     tcp4://iscsihead-m
DS>   state: unknown
DS>   attempts: 0 from 5
DS>   complaints: 0 for last 60 sec (threshold 3)
DS>   heartbeat: 10 sec

DS> and last from iscsihead-m:

DS> hastmonctl status
DS> sanip:
DS>   role: primary
DS>   exec: /usr/local/_rbg/bin/san-ip
DS>   remote:
DS>     tcp4://iscsihead-s (disconnected)
DS>   state: run
DS>   attempts: 0 from 5
DS>   complaints: 0 for last 60 sec (threshold 3)
DS>   heartbeat: 10 sec

DS> If I take a look into the logfile on iscsihead-m:

DS> [sanip] (primary) Remote node acts as init for the resource and not as
DS> secondary.
DS> [sanip] (primary) Handshake header from tcp4://iscsihead-s has no
DS> 'token' field.

DS> Have I missed something?

DS> cu denny

This is expected behavior. After starting, hastmon is in the init role.
You need to set the role you want, either manually or via a startup
script. This is because you might want different configurations
depending on your requirements:

1) After start, the role is set manually by the administrator (useful,
   e.g., if you prefer to investigate a crashed host before returning it
   to the cluster).

2) After start, the node is switched to secondary automatically (by an rc
   script). If all cluster nodes are configured to be secondary on
   startup and all are started simultaneously, the watchdog will figure
   out that there is no primary and will send complaints to all secondary
   nodes. The nodes will then try to switch to primary simultaneously,
   and the node with the highest priority will win.

3) The node with the highest priority is always set to primary on
   startup, and all others to secondary. With this configuration, if the
   primary fails, the secondary switches to primary; then, when the
   initial primary comes back, it becomes primary again automatically.

-- 
Mikolaj Golub
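For anyone reproducing the setup, here is a minimal sketch of an exec script along the lines of the san-ip one Denny describes: start adds an alias on em0, stop removes it, and status does a single ping to the floating address. It is hypothetical, not Denny's actual script: the interface, address and netmask are placeholders, the PING override exists only so the status logic can be tested, and the exit-code convention for status (0 meaning running, non-zero meaning not running) is an assumption; check the hastmon manual page for the codes hastmon really expects. Running it as `./san-ip status; echo $?`, as suggested in the reply, shows exactly what hastmon gets back.

```sh
#!/bin/sh
# Hypothetical hastmon exec script managing a floating IP, modeled on the
# san-ip script described in this thread (not the real one).
# IFACE, VIP and NETMASK are placeholders; PING can be overridden for tests.
IFACE=${IFACE:-em0}
VIP=${VIP:-192.0.2.10}
NETMASK=${NETMASK:-255.255.255.255}
PING=${PING:-ping}

san_ip() {
    case "$1" in
    start)
        # Add the floating address as an alias (FreeBSD ifconfig syntax).
        ifconfig "$IFACE" inet "$VIP" netmask "$NETMASK" alias
        ;;
    stop)
        # Remove the alias again.
        ifconfig "$IFACE" inet "$VIP" -alias
        ;;
    status)
        # Assumed convention: 0 = running, 1 = not running.
        if "$PING" -c 1 "$VIP" >/dev/null 2>&1; then
            return 0
        fi
        return 1
        ;;
    *)
        echo "usage: san-ip start|stop|status" >&2
        return 2
        ;;
    esac
}

# Dispatch only when an argument is given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    san_ip "$1"
    exit $?
fi
```

Note that a ping check succeeds as long as any host answers on the address, so on the secondary it can report the resource as running while the primary holds the alias; checking the local interface (e.g. with `ifconfig em0`) is a stricter alternative.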
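The three startup policies listed in the reply can be sketched as a small hook run after hastmon has started (for example from /etc/rc.local). This is an illustration under assumptions: the POLICY, PREFERRED_PRIMARY and HASTMONCTL variables and the `apply` argument are invented here, the resource and host names come from Denny's config, and it assumes hastmonctl accepts `role <role> <resource>` the same way hastctl does; check hastmonctl's manual page before relying on it. PREFERRED_PRIMARY should name whichever node has the highest priority; iscsihead-m is only a placeholder.

```sh
#!/bin/sh
# Hypothetical startup hook that moves a hastmon resource out of the init
# role, following one of the three policies described in the reply.
# POLICY, PREFERRED_PRIMARY, RESOURCE and HASTMONCTL are illustrative.
POLICY=${POLICY:-secondary}          # manual | secondary | preferred
PREFERRED_PRIMARY=${PREFERRED_PRIMARY:-iscsihead-m}
RESOURCE=${RESOURCE:-sanip}
HASTMONCTL=${HASTMONCTL:-hastmonctl}

# Print the role node "$1" should take at boot (nothing for "manual").
startup_role() {
    case "$POLICY" in
    manual)
        # 1) Stay in init until the administrator sets the role.
        ;;
    secondary)
        # 2) Every node starts as secondary; on the watchdog's complaints
        #    the nodes try to become primary and the highest priority wins.
        echo secondary
        ;;
    preferred)
        # 3) The preferred node always starts as primary, the rest as
        #    secondary.
        if [ "$1" = "$PREFERRED_PRIMARY" ]; then
            echo primary
        else
            echo secondary
        fi
        ;;
    esac
}

if [ "${1:-}" = "apply" ]; then
    role=$(startup_role "$(hostname -s)")
    if [ -n "$role" ]; then
        # Assumed syntax, mirroring hastctl: hastmonctl role <role> <resource>
        "$HASTMONCTL" role "$role" "$RESOURCE"
    fi
fi
```

For example, with POLICY=preferred and assuming the short host names match the `on` sections of hastmon.conf, running the hook with `apply` on iscsihead-m would issue `hastmonctl role primary sanip`, and on iscsihead-s `hastmonctl role secondary sanip`. The Linux watchdog host would be set to the watchdog role separately.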