Date:      Thu, 29 Apr 2010 11:03:33 +0300
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: HAST: primary might get stuck when there are connectivity problems with secondary
Message-ID:  <86mxwmk7my.fsf@zhuzha.ua1>
In-Reply-To: <20100428214636.GD1677@garage.freebsd.pl> (Pawel Jakub Dawidek's message of "Wed, 28 Apr 2010 23:46:36 +0200")
References:  <86r5m9dvqf.fsf@zhuzha.ua1> <20100423062950.GD1670@garage.freebsd.pl> <86k4rye33e.fsf@zhuzha.ua1> <20100424073031.GD3067@garage.freebsd.pl> <868w8dgk4e.fsf@kopusha.onet> <86tyqzeq84.fsf@kopusha.onet> <20100428214636.GD1677@garage.freebsd.pl>

On Wed, 28 Apr 2010 23:46:36 +0200 Pawel Jakub Dawidek wrote:

 PJD> Could you see if the following patch fixes the problem for you:

 PJD>         http://people.freebsd.org/~pjd/patches/hastd_timeout.patch

 PJD> The patch sets timeout on both incoming and outgoing sockets on primary
 PJD> and on outgoing socket on secondary. Incoming socket on secondary is
 PJD> left with no timeout to avoid problem you described above.

The patch works for me.

After disabling the network connection between the primary and the secondary,
FS operations on the primary do not get stuck, and the following messages are
logged:

Apr 29 10:37:41 hasta hastd: [storage] (primary) Unable to receive reply header: Resource temporarily unavailable.
Apr 29 10:37:57 hasta hastd: [tank] (primary) Unable to receive reply header: Resource temporarily unavailable.
Apr 29 10:37:57 hasta hastd: [tank] (primary) Unable to send request (Resource temporarily unavailable): WRITE(972292096, 14336).
Apr 29 10:38:56 hasta hastd: [storage] (primary) Unable to connect to 172.20.66.202: Operation timed out.
Apr 29 10:39:12 hasta hastd: [tank] (primary) Unable to connect to 172.20.66.202: Operation timed out.
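
The "Resource temporarily unavailable" text in these messages is the
strerror() string for EAGAIN, which is what a blocking recv() or send()
fails with when a socket timeout expires. For illustration only, here is a
minimal sketch of that behaviour. It assumes the timeouts are set with the
SO_RCVTIMEO and SO_SNDTIMEO socket options; the patch itself is not
reproduced here, and set_socket_timeout() and recv_from_silent_peer() are
hypothetical names, not hastd functions.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from hastd): apply the same timeout to
 * both receiving and sending on the given socket.
 */
static int
set_socket_timeout(int fd, int seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	return (0);
}

/*
 * Hypothetical demonstration: wait for data from a peer that never sends
 * anything. Returns the errno set by recv() once the timeout expires,
 * 0 if recv() did not fail, or -1 if the sockets could not be set up.
 */
static int
recv_from_silent_peer(int seconds)
{
	char buf[16];
	ssize_t n;
	int error, sv[2];

	/* A local socket pair stands in for the primary/secondary link. */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (set_socket_timeout(sv[0], seconds) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	/* Nothing is ever written to sv[1], so this blocks until timeout. */
	n = recv(sv[0], buf, sizeof(buf), 0);
	error = (n == -1) ? errno : 0;	/* save errno before close() */
	close(sv[0]);
	close(sv[1]);
	return (error);
}
```

With a timeout set this way, a caller blocked on an unreachable peer gets
an error back after the timeout instead of waiting forever, which would be
consistent with FS operations on the primary no longer getting stuck.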

After restoring the network connection, the primary reconnects to the secondary
and the status changes back from "degraded" to "complete".

Thank you.

-- 
Mikolaj Golub


