Date:      Sun, 25 Apr 2010 14:17:15 +0300
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        Mikolaj Golub <to.my.trociny@gmail.com>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject:   Re: HAST: primary might get stuck when there are connectivity problems with secondary
Message-ID:  <86tyqzeq84.fsf@kopusha.onet>
In-Reply-To: <868w8dgk4e.fsf@kopusha.onet> (Mikolaj Golub's message of "Sat, 24 Apr 2010 14:33:53 +0300")
References:  <86r5m9dvqf.fsf@zhuzha.ua1> <20100423062950.GD1670@garage.freebsd.pl> <86k4rye33e.fsf@zhuzha.ua1> <20100424073031.GD3067@garage.freebsd.pl> <868w8dgk4e.fsf@kopusha.onet>

--=-=-=

On Sat, 24 Apr 2010 14:33:53 +0300 Mikolaj Golub wrote:

> From the code I don't see how hast_proto_recv_hdr() may timeout if the
> connection is alive, have I missed something?

I experimented with code that sets the SO_RCVTIMEO socket option (see the
attached patch), and it fixes this issue. After the timeout, the worker on the
secondary is restarted with the error:

Apr 25 13:06:45 hastb hastd: [storage] (secondary) Unable to receive request header: Resource temporarily unavailable.
Apr 25 13:06:45 hastb hastd: [storage] (secondary) Worker process (pid=1243) exited ungracefully: status=19200.
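
For reading the second log line: if status=19200 is the raw wait(2) status
word (an assumption, not checked against the hastd source), it can be decoded
with the standard macros. The helper names below are hypothetical and serve
only as an illustration:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <sysexits.h>

/*
 * Hypothetical helper: return the exit code carried in a raw wait(2)
 * status word, or -1 if the process did not exit normally (for example,
 * it was killed by a signal).
 */
static int
exit_code_of(int status)
{

	if (!WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}

/* Hypothetical helper: non-zero if the process exited with EX_TEMPFAIL. */
static int
is_tempfail(int status)
{

	return (exit_code_of(status) == EX_TEMPFAIL);
}
```

Under that assumption, exit_code_of(19200) gives 75 (19200 = 75 * 256), which
is EX_TEMPFAIL from <sysexits.h>, i.e. the worker exited by itself with a
temporary-failure code rather than being killed by a signal.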

On the other hand, when the FS is idle (there is no I/O at all) the worker is
restarted too, and the primary does not reconnect to the secondary until some
I/O appears. So it might not look very nice :-)

Also note that I had to modify proto_common_recv() to make the timeout work.
After a timeout recv() sets errno to EWOULDBLOCK, which has the same value as
EAGAIN on FreeBSD. The current proto_common_recv() restarts recv() if EAGAIN
is returned, so the timeout would never be reported to the caller. The patch
retries only on EINTR.
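
The interaction between SO_RCVTIMEO and the retry loop can be reproduced
outside hastd. The following is a minimal sketch, not the hastd code: the
helper names (set_recv_timeout, recv_all, demo_idle_timeout) are hypothetical,
an AF_UNIX socketpair stands in for the TCP connection, and the timeout is
1 second instead of 300 to keep the demonstration short.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: arm a receive timeout of `secs` seconds on `fd`. */
static int
set_recv_timeout(int fd, int secs)
{
	struct timeval tv;

	tv.tv_sec = secs;
	tv.tv_usec = 0;
	return (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)));
}

/*
 * Hypothetical helper modelled on the patched loop: retry only when
 * recv() was interrupted by a signal. Returns 0 if recv() delivered
 * data, ENOTCONN if the peer closed the connection, or the errno from
 * recv() otherwise. Short reads are not handled in this sketch.
 */
static int
recv_all(int fd, unsigned char *data, size_t size)
{
	ssize_t done;

	do {
		done = recv(fd, data, size, MSG_WAITALL);
	} while (done == -1 && errno == EINTR);
	if (done == 0)
		return (ENOTCONN);
	else if (done < 0)
		return (errno);
	return (0);
}

/*
 * Receive on an idle socket pair with a 1-second timeout and return
 * what recv_all() reports, or -1 if the setup itself failed.
 */
static int
demo_idle_timeout(void)
{
	unsigned char buf[16];
	int sv[2], error;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (set_recv_timeout(sv[0], 1) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	/* Nothing is ever written to sv[1], so this can only time out. */
	error = recv_all(sv[0], buf, sizeof(buf));
	close(sv[0]);
	close(sv[1]);
	return (error);
}
```

Since nothing is written to the peer, demo_idle_timeout() should come back
after about one second with EAGAIN (equal to EWOULDBLOCK on FreeBSD). If the
loop condition were errno == EAGAIN, as in the unpatched code, recv() would be
restarted after every timeout and the function would never return.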

-- 
Mikolaj Golub


--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=hastd.proto_tcp4.c.SO_RCVTIMEO.patch

Index: sbin/hastd/proto_common.c
===================================================================
--- sbin/hastd/proto_common.c	(revision 207185)
+++ sbin/hastd/proto_common.c	(working copy)
@@ -76,7 +76,7 @@ proto_common_recv(int fd, unsigned char *data, siz
 
 	do {
 		done = recv(fd, data, size, MSG_WAITALL);
-	} while (done == -1 && errno == EAGAIN);
+	} while (done == -1 && errno == EINTR);
 	if (done == 0)
 		return (ENOTCONN);
 	else if (done < 0)
Index: sbin/hastd/proto_tcp4.c
===================================================================
--- sbin/hastd/proto_tcp4.c	(revision 207185)
+++ sbin/hastd/proto_tcp4.c	(working copy)
@@ -31,6 +31,7 @@
 __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>	/* MAXHOSTNAMELEN */
+#include <sys/time.h>
 
 #include <netinet/in.h>
 #include <netinet/tcp.h>
@@ -203,7 +204,7 @@ tcp4_common_setup(const char *addr, void **ctxp, i
 	    sizeof(val)) == -1) {
 		pjdlog_warning("Unable to set receive buffer size on %s", addr);
 	}
-
+	
 	tctx->tc_side = side;
 	tctx->tc_magic = TCP4_CTX_MAGIC;
 	*ctxp = tctx;
@@ -214,8 +215,23 @@ tcp4_common_setup(const char *addr, void **ctxp, i
 static int
 tcp4_client(const char *addr, void **ctxp)
 {
+	struct tcp4_ctx *tctx;
+	struct timeval tv;
+	int ret;
 
-	return (tcp4_common_setup(addr, ctxp, TCP4_SIDE_CLIENT));
+	if ((ret = tcp4_common_setup(addr, ctxp, TCP4_SIDE_CLIENT)) != 0)
+		return (ret);
+
+	tctx = *ctxp;
+
+	tv.tv_sec = 300;
+	tv.tv_usec = 0;
+	if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_RCVTIMEO, &tv,
+	    sizeof(tv)) == -1) {
+		pjdlog_warning("Unable to set receive timeout on %s", addr);
+	}
+
+	return (0);
 }
 
 static int
@@ -273,6 +289,7 @@ tcp4_accept(void *ctx, void **newctxp)
 {
 	struct tcp4_ctx *tctx = ctx;
 	struct tcp4_ctx *newtctx;
+	struct timeval tv;
 	socklen_t fromlen;
 	int ret;
 
@@ -294,6 +311,13 @@ tcp4_accept(void *ctx, void **newctxp)
 		return (ret);
 	}
 
+	tv.tv_sec = 300;
+	tv.tv_usec = 0;
+	if (setsockopt(newtctx->tc_fd, SOL_SOCKET, SO_RCVTIMEO, &tv,
+	    sizeof(tv)) == -1) {
+		pjdlog_debug(2, "Unable to set receive timeout");
+	}
+
 	newtctx->tc_side = TCP4_SIDE_SERVER_WORK;
 	newtctx->tc_magic = TCP4_CTX_MAGIC;
 	*newctxp = newtctx;

--=-=-=--