From: Mikolaj Golub
To: Christian Vogt
Cc: freebsd-fs@FreeBSD.org, Pawel Jakub Dawidek
Subject: Re: hastd Failover with ucarp
Date: Mon, 21 Feb 2011 23:49:37 +0200
Message-ID: <86ei713vny.fsf@kopusha.home.net>
In-Reply-To: <2C4EE30F-7731-4B84-ADC6-75C0266863F0@haw-hamburg.de>
References: <2C4EE30F-7731-4B84-ADC6-75C0266863F0@haw-hamburg.de>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=

On Mon, 21 Feb 2011 16:55:35 +0100 Christian Vogt wrote:

 CV> Hello!

 CV> Thanks for the great work, I like this straight-forward FreeBSD a lot
 CV> from what I experienced untill now. I used the HAST How-To from
 CV> http://wiki.freebsd.org/HAST and it works perfectly if I use "pkill -USR2
 CV> -f 'ucarp -B'" to initiate the failover. The secondary node becomes
 CV> primary and the carp-interface is switched over to it.

 CV> But if I do a hard shutdown of the primary node it doesn't work, the
 CV> secondary node doesn't get primary. The ucarp-up script on the secondary
 CV> node is executed, but it fails because of the still running secondary
 CV> worker process (Secondary process for resource test is still running
 CV> after 30 seconds). Is the secondary process expected to end
 CV> automatically, when the primary process fails?

I think it should exit, but currently it does not.

r207371 added timeouts for primary incoming and outgoing and for
secondary outgoing, but not for secondary incoming. Now that the
keepalive mechanism is implemented, I think we can add a timeout for
secondary incoming too, e.g. as in the attached patch.

With the patch, the secondary will exit after 20 seconds
(RETRY_SLEEP * 2) if it does not receive any packets from the primary.

Or maybe it is better to replace RETRY_SLEEP with a timeout
configuration parameter, used both for the keepalive/reconnection
interval in the primary and for the secondary incoming timeout?
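
To illustrate the idea outside of hastd, here is a minimal sketch. It
assumes that the incoming timeout behaves like a plain socket receive
timeout (SO_RCVTIMEO); nothing in this message confirms how
proto_timeout() is implemented. The helpers set_recv_timeout() and
wait_for_peer() are hypothetical names, not hastd functions.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Same value the patch moves into hast.h. */
#define RETRY_SLEEP 10
/* Hypothetical name for the value the patch passes for hr_remotein. */
#define SECONDARY_INCOMING_TIMEOUT (RETRY_SLEEP * 2)

/*
 * Hypothetical helper (not hastd code): limit how long recv() may block
 * on fd. A timeout of 0 seconds means "block forever".
 */
static int
set_recv_timeout(int fd, int seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	return (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)));
}

/*
 * Hypothetical helper: wait for one byte from the peer.
 * Returns 1 if data arrived, 0 if the peer closed the connection and
 * -1 on error. On timeout errno is EAGAIN or EWOULDBLOCK.
 */
static int
wait_for_peer(int fd)
{
	char byte;
	ssize_t n;

	do {
		n = recv(fd, &byte, sizeof(byte), 0);
	} while (n < 0 && errno == EINTR);

	if (n > 0)
		return (1);
	if (n == 0)
		return (0);
	return (-1);
}
```

With set_recv_timeout(fd, SECONDARY_INCOMING_TIMEOUT), a receiver
whose peer was powered off gets an error from wait_for_peer() after 20
seconds instead of blocking forever, which is the point at which the
worker could exit. The keepalive interval has to stay shorter than the
timeout, otherwise an idle but healthy connection would also time out.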

-- 
Mikolaj Golub

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=hastd.secondary_incoming_timeout.patch

Index: sbin/hastd/secondary.c
===================================================================
--- sbin/hastd/secondary.c	(revision 218930)
+++ sbin/hastd/secondary.c	(working copy)
@@ -416,7 +416,7 @@ hastd_secondary(struct hast_resource *res, struct
 	PJDLOG_VERIFY(sigprocmask(SIG_SETMASK, &mask, NULL) == 0);
 
 	/* Error in setting timeout is not critical, but why should it fail? */
-	if (proto_timeout(res->hr_remotein, 0) < 0)
+	if (proto_timeout(res->hr_remotein, RETRY_SLEEP * 2) < 0)
 		pjdlog_errno(LOG_WARNING, "Unable to set connection timeout");
 	if (proto_timeout(res->hr_remoteout, res->hr_timeout) < 0)
 		pjdlog_errno(LOG_WARNING, "Unable to set connection timeout");
Index: sbin/hastd/hast.h
===================================================================
--- sbin/hastd/hast.h	(revision 218930)
+++ sbin/hastd/hast.h	(working copy)
@@ -97,6 +97,9 @@
 #define HAST_ADDRSIZE 1024
 #define HAST_TOKEN_SIZE 16
 
+/* Number of seconds to sleep between reconnect retries or keepalive packets. */
+#define RETRY_SLEEP 10
+
 struct hastd_config {
 	/* Address to communicate with hastctl(8). */
 	char hc_controladdr[HAST_ADDRSIZE];
Index: sbin/hastd/primary.c
===================================================================
--- sbin/hastd/primary.c	(revision 218930)
+++ sbin/hastd/primary.c	(working copy)
@@ -150,10 +150,6 @@ static pthread_mutex_t metadata_lock;
  * and remote components.
  */
 #define HAST_NCOMPONENTS 2
-/*
- * Number of seconds to sleep between reconnect retries or keepalive packets.
- */
-#define RETRY_SLEEP 10
 
 #define ISCONNECTED(res, no) \
 	((res)->hr_remotein != NULL && (res)->hr_remoteout != NULL)

--=-=-=--