Date: Mon, 21 Feb 2011 23:49:37 +0200
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: Christian Vogt <christian.vogt@haw-hamburg.de>
Cc: freebsd-fs@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: hastd Failover with ucarp
Message-ID: <86ei713vny.fsf@kopusha.home.net>
In-Reply-To: <2C4EE30F-7731-4B84-ADC6-75C0266863F0@haw-hamburg.de> (Christian Vogt's message of "Mon, 21 Feb 2011 16:55:35 +0100")
References: <2C4EE30F-7731-4B84-ADC6-75C0266863F0@haw-hamburg.de>
--=-=-=

On Mon, 21 Feb 2011 16:55:35 +0100 Christian Vogt wrote:

 CV> Hello!

 CV> Thanks for the great work, I like this straight-forward FreeBSD a lot
 CV> from what I experienced until now. I used the HAST How-To from
 CV> http://wiki.freebsd.org/HAST and it works perfectly if I use "pkill -USR2
 CV> -f 'ucarp -B'" to initiate the failover. The secondary node becomes
 CV> primary and the carp-interface is switched over to it.

 CV> But if I do a hard shutdown of the primary node it doesn't work, the
 CV> secondary node doesn't get primary. The ucarp-up script on the secondary
 CV> node is executed, but it fails because of the still running secondary
 CV> worker process (Secondary process for resource test is still running
 CV> after 30 seconds). Is the secondary process expected to end
 CV> automatically, when the primary process fails?

I think it should exit, but currently it does not.

In r207371, timeouts were added for primary incoming and outgoing and for
secondary outgoing, but not for secondary incoming. Now that the keepalive
mechanism is implemented, I think we can add a timeout for secondary incoming
too. E.g. like in the attached patch?

With the patch the secondary will exit in 20 seconds (RETRY_SLEEP * 2) if it
does not receive any packets from the primary.

Or maybe it is better to replace RETRY_SLEEP with a timeout configuration
parameter, used both for the keepalive/reconnection interval in the primary
and for the secondary incoming timeout?

-- 
Mikolaj Golub

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=hastd.secondary_incoming_timeout.patch

Index: sbin/hastd/secondary.c
===================================================================
--- sbin/hastd/secondary.c	(revision 218930)
+++ sbin/hastd/secondary.c	(working copy)
@@ -416,7 +416,7 @@ hastd_secondary(struct hast_resource *res, struct
 	PJDLOG_VERIFY(sigprocmask(SIG_SETMASK, &mask, NULL) == 0);
 
 	/* Error in setting timeout is not critical, but why should it fail? */
-	if (proto_timeout(res->hr_remotein, 0) < 0)
+	if (proto_timeout(res->hr_remotein, RETRY_SLEEP * 2) < 0)
 		pjdlog_errno(LOG_WARNING, "Unable to set connection timeout");
 	if (proto_timeout(res->hr_remoteout, res->hr_timeout) < 0)
 		pjdlog_errno(LOG_WARNING, "Unable to set connection timeout");
Index: sbin/hastd/hast.h
===================================================================
--- sbin/hastd/hast.h	(revision 218930)
+++ sbin/hastd/hast.h	(working copy)
@@ -97,6 +97,9 @@
 #define	HAST_ADDRSIZE	1024
 #define	HAST_TOKEN_SIZE	16
 
+/* Number of seconds to sleep between reconnect retries or keepalive packets. */
+#define	RETRY_SLEEP	10
+
 struct hastd_config {
 	/* Address to communicate with hastctl(8). */
 	char	 hc_controladdr[HAST_ADDRSIZE];
Index: sbin/hastd/primary.c
===================================================================
--- sbin/hastd/primary.c	(revision 218930)
+++ sbin/hastd/primary.c	(working copy)
@@ -150,10 +150,6 @@ static pthread_mutex_t metadata_lock;
  * and remote components.
  */
 #define	HAST_NCOMPONENTS	2
-/*
- * Number of seconds to sleep between reconnect retries or keepalive packets.
- */
-#define	RETRY_SLEEP		10
 
 #define	ISCONNECTED(res, no)	\
 	((res)->hr_remotein != NULL && (res)->hr_remoteout != NULL)

--=-=-=--
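
To illustrate why a non-zero incoming timeout lets the secondary notice a dead primary, here is a minimal standalone sketch. It assumes the timeout ends up as an ordinary socket receive timeout (SO_RCVTIMEO); the thread does not show how proto_timeout() is implemented, so that mapping is an assumption. The helpers set_recv_timeout(), wait_for_peer() and silent_peer_times_out() are hypothetical names invented for this example and are not part of hastd.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Same value the patch moves into hast.h. */
#define	RETRY_SLEEP	10

/*
 * Hypothetical helper (not hastd code): arm a receive timeout on a
 * socket. A value of 0 means "block forever", which corresponds to the
 * behaviour before the patch.
 */
static int
set_recv_timeout(int fd, int seconds)
{
	struct timeval tv;

	assert(seconds >= 0);
	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	return (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)));
}

/*
 * Hypothetical helper: wait for one byte from the peer.
 * Returns 1 if data arrived, 0 if the peer closed the connection, and
 * -1 on error; when the timeout expires, errno is EAGAIN/EWOULDBLOCK.
 */
static int
wait_for_peer(int fd)
{
	char byte;
	ssize_t n;

	n = recv(fd, &byte, sizeof(byte), 0);
	if (n > 0)
		return (1);
	return (n == 0 ? 0 : -1);
}

/*
 * Simulate a primary that went away without closing the connection:
 * the other end of the socket pair stays open but never sends.
 * Returns 1 if the receiver gave up because of the timeout, 0 if it
 * returned for another reason, -1 on setup failure.
 */
int
silent_peer_times_out(int seconds)
{
	int sv[2], rc, timed_out;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (set_recv_timeout(sv[0], seconds) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	rc = wait_for_peer(sv[0]);
	timed_out = (rc == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
	close(sv[0]);
	close(sv[1]);
	return (timed_out);
}
```

Calling silent_peer_times_out(RETRY_SLEEP * 2) would block for about 20 seconds before reporting the timeout, which matches the delay described in the message; a smaller value such as 1 is more convenient for a quick experiment. With a timeout of 0 the same call would never return, which is the situation Christian ran into after the hard shutdown.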