From owner-svn-src-all@FreeBSD.ORG Mon Mar 21 15:29:21 2011 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E642106566C; Mon, 21 Mar 2011 15:29:21 +0000 (UTC) (envelope-from pjd@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 3536F8FC18; Mon, 21 Mar 2011 15:29:21 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id p2LFTLZM010141; Mon, 21 Mar 2011 15:29:21 GMT (envelope-from pjd@svn.freebsd.org) Received: (from pjd@localhost) by svn.freebsd.org (8.14.3/8.14.3/Submit) id p2LFTL04010139; Mon, 21 Mar 2011 15:29:21 GMT (envelope-from pjd@svn.freebsd.org) Message-Id: <201103211529.p2LFTL04010139@svn.freebsd.org> From: Pawel Jakub Dawidek Date: Mon, 21 Mar 2011 15:29:21 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r219837 - head/sbin/hastd X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Mar 2011 15:29:21 -0000 Author: pjd Date: Mon Mar 21 15:29:20 2011 New Revision: 219837 URL: http://svn.freebsd.org/changeset/base/219837 Log: Before handling any events on descriptors check signals so we can update our info about worker processes if any of them was terminated in the meantime. This fixes the problem with 'hastctl status' running from a hook called on split-brain: 1. Secondary calls a hooks and terminates. 2. Hook asks for resource status via 'hastctl status'. 3. The main hastd handles the status request by sending it to the secondary worker who is already dead, but because signals weren't checked yet he doesn't know that and we get EPIPE. MFC after: 1 week Modified: head/sbin/hastd/hastd.c Modified: head/sbin/hastd/hastd.c ============================================================================== --- head/sbin/hastd/hastd.c Mon Mar 21 15:23:10 2011 (r219836) +++ head/sbin/hastd/hastd.c Mon Mar 21 15:29:20 2011 (r219837) @@ -884,19 +884,12 @@ out: } static void -main_loop(void) +check_signals(void) { - struct hast_resource *res; - struct timeval seltimeout; struct timespec sigtimeout; - int fd, maxfd, ret, signo; - time_t lastcheck, now; sigset_t mask; - fd_set rfds; + int signo; - lastcheck = time(NULL); - seltimeout.tv_sec = REPORT_INTERVAL; - seltimeout.tv_usec = 0; sigtimeout.tv_sec = 0; sigtimeout.tv_nsec = 0; @@ -906,29 +899,45 @@ main_loop(void) PJDLOG_VERIFY(sigaddset(&mask, SIGTERM) == 0); PJDLOG_VERIFY(sigaddset(&mask, SIGCHLD) == 0); + while ((signo = sigtimedwait(&mask, NULL, &sigtimeout)) != -1) { + switch (signo) { + case SIGINT: + case SIGTERM: + sigexit_received = true; + terminate_workers(); + proto_close(cfg->hc_controlconn); + exit(EX_OK); + break; + case SIGCHLD: + child_exit(); + break; + case SIGHUP: + hastd_reload(); + break; + default: + PJDLOG_ABORT("Unexpected signal (%d).", signo); + } + } +} + +static void +main_loop(void) +{ + struct hast_resource *res; + struct timeval seltimeout; + int fd, maxfd, ret; + time_t lastcheck, now; + fd_set rfds; + + lastcheck = time(NULL); + seltimeout.tv_sec = REPORT_INTERVAL; + seltimeout.tv_usec = 0; + pjdlog_info("Started successfully, running protocol version %d.", HAST_PROTO_VERSION); for (;;) { - while ((signo = sigtimedwait(&mask, NULL, &sigtimeout)) != -1) { - switch (signo) { - case SIGINT: - case SIGTERM: - sigexit_received = true; - terminate_workers(); - proto_close(cfg->hc_controlconn); - exit(EX_OK); - break; - case SIGCHLD: - child_exit(); - break; - case SIGHUP: - hastd_reload(); - break; - default: - PJDLOG_ABORT("Unexpected signal (%d).", signo); - } - } + check_signals(); /* Setup descriptors for select(2). */ FD_ZERO(&rfds); @@ -976,6 +985,12 @@ main_loop(void) pjdlog_exit(EX_OSERR, "select() failed"); } + /* + * Check for signals before we do anything to update our + * info about terminated workers in the meantime. + */ + check_signals(); + if (FD_ISSET(proto_descriptor(cfg->hc_controlconn), &rfds)) control_handle(cfg); if (FD_ISSET(proto_descriptor(cfg->hc_listenconn), &rfds))