Date:      Sat, 30 Apr 2011 00:58:48 +0300
From:      Mikolaj Golub <trociny@freebsd.org>
To:        Denny Schierz <linuxmail@4lin.net>
Cc:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   Re: way for failover zpool (no HAST needed): hastmon
Message-ID:  <86mxj8lnxj.fsf@kopusha.home.net>
In-Reply-To: <1303996942.4232.160.camel@pcdenny> (Denny Schierz's message of "Thu, 28 Apr 2011 15:22:22 +0200")
References:  <1301397421.11113.250.camel@pcdenny> <86ipv1ll4f.fsf@kopusha.home.net> <1303905911.4232.86.camel@pcdenny> <861v0nrdkc.fsf@in138.ua3> <1303996942.4232.160.camel@pcdenny>

Oops, just noticed this mail :-) Denny sent me another message privately, and I
hope I answered his questions there, but I will reply to this message too in
case someone else is interested.

On Thu, 28 Apr 2011 15:22:22 +0200 Denny Schierz wrote:

 DS> hi,

 DS> ok, here we go: I've installed hastmon and both FreeBSD nodes and one on
 DS> Linux Debian as watchdog:

 DS> Simple setup:
 DS>  
 DS> # cat /etc.local/hastmon.conf 

 DS> resource sanip {
 DS>         exec /usr/local/_rbg/bin/san-ip
 DS>         friends iscsihead-m iscsihead-s nos

 DS>         on iscsihead-m {
 DS>                 remote tcp4://iscsihead-s
 DS>                 priority 0
 DS>         }
 DS>         on iscsihead-s {
 DS>                 remote tcp4://iscsihead-m
 DS>                 priority 1
 DS>         }
 DS>         on linux {
 DS>                 remote tcp4://iscsihead-m tcp4://iscsihead-s
 DS>         }
 DS> } 

 DS> It works only half. 

 DS> The simple script adds/remove an alias for the em0 and for status it
 DS> does a ping -c 1 to the global ip. After tell every host, what is role
 DS> is, I get on the primary "state unknown", in the secondary "state run"
 DS> and watchdog for the Linux host.

It is difficult to tell what happened without additional information. It might
be that your '/usr/local/_rbg/bin/san-ip status' was returning an unknown
status.

In this case, running the following manually might be helpful:

/usr/local/_rbg/bin/san-ip status; echo $?

The logs would help too :-).
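
If the status needs to be checked repeatedly while comparing against the logs,
the check can be wrapped in a small function. This is only a sketch:
report_status is a hypothetical helper name, not part of hastmon, and it makes
no assumption about what any particular exit code means to hastmon.

```shell
#!/bin/sh
# Hypothetical helper (not part of hastmon): run the given command and
# print its exit code on a separate line, leaving the command's own
# output untouched.
report_status() {
    "$@"
    rc=$?
    echo "exit code: $rc"
}
```

It would be called as 'report_status /usr/local/_rbg/bin/san-ip status'.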

 DS> Than I rebooted the primary, the secondary take over and executed the
 DS> script. After the primary was reachable again, he doesn't get the
 DS> secondary role, but init/unknown.

 DS> The same happens, in the opposite:

 DS> from Linux:

 DS> hastmonctl status
 DS> sanip:
 DS>   role: watchdog
 DS>   exec: /usr/local/_rbg/bin/san-ip
 DS>   remote:
 DS>     tcp4://iscsihead-m (primary/run)
 DS>     tcp4://iscsihead-s (init/unknown)
 DS>   state: run
 DS>   attempts: 0 from 5
 DS>   complaints: 0 for last 60 sec (threshold 3)
 DS>   heartbeat: 10 sec

 DS> from iscsihead-s:

 DS> hastmonctl status
 DS> sanip:
 DS>   role: init
 DS>   exec: /usr/local/_rbg/bin/san-ip
 DS>   remote:
 DS>     tcp4://iscsihead-m
 DS>   state: unknown
 DS>   attempts: 0 from 5
 DS>   complaints: 0 for last 60 sec (threshold 3)
 DS>   heartbeat: 10 sec

 DS> and last from iscsihead-m


 DS> hastmonctl status
 DS> sanip:
 DS>   role: primary
 DS>   exec: /usr/local/_rbg/bin/san-ip
 DS>   remote:
 DS>     tcp4://iscsihead-s (disconnected)
 DS>   state: run
 DS>   attempts: 0 from 5
 DS>   complaints: 0 for last 60 sec (threshold 3)
 DS>   heartbeat: 10 sec

 DS> If I take a look into the logfile from the iscsihead-m:

 DS> [sanip] (primary) Remote node acts as init for the resource and not as
 DS> secondary.

 DS> [sanip] (primary) Handshake header from tcp4://iscsihead-s has no
 DS> 'token' field.

 DS> Do I have missed something?

 DS> cu denny

This is expected behavior. After startup, hastmon is in the init role. You need
to set the role you want manually or via a startup script.

This is because you might want different configurations depending on your
requirements:

1) After startup the role is set manually by the administrator (useful e.g. if
you prefer to investigate a crashed host before returning it to the cluster).

2) After startup the node is switched to secondary automatically (by an rc
script).

If all cluster nodes are configured to be secondary on startup and all are
started simultaneously, the watchdog will figure out that there is no primary
and will send complaints to all secondary nodes. The nodes will then try to
switch to primary simultaneously, and the node with the highest priority will
win.

3) The node configured with the highest priority is always set to primary on
startup. All others are set to secondary.

With this configuration if the primary fails, secondary switches to primary,
then when the initial primary comes back it becomes primary again
automatically.
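
The three startup options above could be sketched in a startup script roughly
as follows. The function name startup_role and the policy names (manual,
secondary, preferred) are made up for this illustration and are not hastmon
settings; the function only picks the role to request and does not call
hastmon itself.

```shell
#!/bin/sh
# Hypothetical helper: map a startup policy to the role a node should
# request. Policy names are invented for this sketch.
#   manual    - option 1: stay in init until the administrator decides
#   secondary - option 2: every node starts as secondary
#   preferred - option 3: one chosen node starts as primary, others secondary
startup_role() {
    policy=$1 this_node=$2 preferred_node=$3
    case "$policy" in
        manual)    echo init ;;
        secondary) echo secondary ;;
        preferred)
            if [ "$this_node" = "$preferred_node" ]; then
                echo primary
            else
                echo secondary
            fi ;;
        *)
            echo "unknown policy: $policy" >&2
            return 1 ;;
    esac
}
```

For example, 'startup_role preferred iscsihead-s iscsihead-m' prints
"secondary". The result would then be passed to whatever hastmonctl command
you already use to set the role for the sanip resource. With the manual
policy nothing needs to be run at all, since init is the role hastmon starts
in.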

-- 
Mikolaj Golub


