Date:      Mon, 21 Feb 2011 23:49:37 +0200
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        Christian Vogt <christian.vogt@haw-hamburg.de>
Cc:        freebsd-fs@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject:   Re: hastd Failover with ucarp
Message-ID:  <86ei713vny.fsf@kopusha.home.net>
In-Reply-To: <2C4EE30F-7731-4B84-ADC6-75C0266863F0@haw-hamburg.de> (Christian Vogt's message of "Mon, 21 Feb 2011 16:55:35 +0100")
References:  <2C4EE30F-7731-4B84-ADC6-75C0266863F0@haw-hamburg.de>

--=-=-=


On Mon, 21 Feb 2011 16:55:35 +0100 Christian Vogt wrote:

 CV> Hello! 

 CV> Thanks for the great work, I like this straightforward FreeBSD a lot
 CV> from what I experienced until now. I used the HAST How-To from
 CV> http://wiki.freebsd.org/HAST and it works perfectly if I use "pkill -USR2
 CV> -f 'ucarp -B'" to initiate the failover. The secondary node becomes
 CV> primary and the carp-interface is switched over to it.

 CV> But if I do a hard shutdown of the primary node it doesn't work, the
 CV> secondary node doesn't become primary. The ucarp-up script on the secondary
 CV> node is executed, but it fails because of the still running secondary
 CV> worker process (Secondary process for resource test is still running
 CV> after 30 seconds). Is the secondary process expected to end
 CV> automatically, when the primary process fails?

I think it should exit, but currently it does not. r207371 added timeouts for
primary incoming and outgoing and for secondary outgoing connections, but not
for secondary incoming. Now that the keepalive mechanism is implemented, I
think we can add a timeout for secondary incoming too, e.g. as in the attached
patch?

With the patch, the secondary will exit after 20 seconds (RETRY_SLEEP * 2) if
it does not receive any packets from the primary.
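
To illustrate the idea outside of hastd, here is a minimal sketch. It assumes
the timeout is implemented as a socket receive timeout (SO_RCVTIMEO); whether
proto_timeout() does exactly this is an assumption, and set_recv_timeout() and
wait_for_peer() are hypothetical helpers, not hastd functions.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Same value as in the patch: seconds between keepalive packets. */
#define	RETRY_SLEEP	10

/*
 * Hypothetical helper (not part of hastd): set a receive timeout, in
 * seconds, on a socket. A timeout of 0 means "block forever".
 */
static int
set_recv_timeout(int fd, int seconds)
{
	struct timeval tv;

	assert(seconds >= 0);
	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	return (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)));
}

/*
 * Hypothetical helper: wait for one byte from the peer.
 * Returns 1 if data arrived, 0 if the peer closed the connection, and
 * -1 with errno set to EAGAIN/EWOULDBLOCK if the timeout expired.
 */
static int
wait_for_peer(int fd)
{
	char c;
	ssize_t n;

	do {
		n = recv(fd, &c, sizeof(c), 0);
	} while (n < 0 && errno == EINTR);
	return (n > 0 ? 1 : (int)n);
}
```

With set_recv_timeout(fd, RETRY_SLEEP * 2), a primary that was powered off
without closing the connection would show up in the secondary as a failed
receive after 20 seconds, i.e. after two missed keepalive intervals, instead
of blocking forever as with a timeout of 0.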

Or maybe it is better to replace RETRY_SLEEP with the timeout configuration
parameter, both for the keepalive/reconnection interval in the primary and for
the secondary incoming timeout?

-- 
Mikolaj Golub


--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment;
	filename=hastd.secondary_incoming_timeout.patch

Index: sbin/hastd/secondary.c
===================================================================
--- sbin/hastd/secondary.c	(revision 218930)
+++ sbin/hastd/secondary.c	(working copy)
@@ -416,7 +416,7 @@ hastd_secondary(struct hast_resource *res, struct
 	PJDLOG_VERIFY(sigprocmask(SIG_SETMASK, &mask, NULL) == 0);
 
 	/* Error in setting timeout is not critical, but why should it fail? */
-	if (proto_timeout(res->hr_remotein, 0) < 0)
+	if (proto_timeout(res->hr_remotein, RETRY_SLEEP * 2) < 0)
 		pjdlog_errno(LOG_WARNING, "Unable to set connection timeout");
 	if (proto_timeout(res->hr_remoteout, res->hr_timeout) < 0)
 		pjdlog_errno(LOG_WARNING, "Unable to set connection timeout");
Index: sbin/hastd/hast.h
===================================================================
--- sbin/hastd/hast.h	(revision 218930)
+++ sbin/hastd/hast.h	(working copy)
@@ -97,6 +97,9 @@
 #define	HAST_ADDRSIZE	1024
 #define	HAST_TOKEN_SIZE	16
 
+/* Number of seconds to sleep between reconnect retries or keepalive packets. */
+#define	RETRY_SLEEP	10
+
 struct hastd_config {
 	/* Address to communicate with hastctl(8). */
 	char	 hc_controladdr[HAST_ADDRSIZE];
Index: sbin/hastd/primary.c
===================================================================
--- sbin/hastd/primary.c	(revision 218930)
+++ sbin/hastd/primary.c	(working copy)
@@ -150,10 +150,6 @@ static pthread_mutex_t metadata_lock;
  * and remote components.
  */
 #define	HAST_NCOMPONENTS	2
-/*
- * Number of seconds to sleep between reconnect retries or keepalive packets.
- */
-#define	RETRY_SLEEP		10
 
 #define	ISCONNECTED(res, no)	\
 	((res)->hr_remotein != NULL && (res)->hr_remoteout != NULL)

--=-=-=--


