Date:      Wed, 28 Apr 2010 23:46:36 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Mikolaj Golub <to.my.trociny@gmail.com>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: HAST: primary might get stuck when there are connectivity problems with secondary
Message-ID:  <20100428214636.GD1677@garage.freebsd.pl>
In-Reply-To: <86tyqzeq84.fsf@kopusha.onet>
References:  <86r5m9dvqf.fsf@zhuzha.ua1> <20100423062950.GD1670@garage.freebsd.pl> <86k4rye33e.fsf@zhuzha.ua1> <20100424073031.GD3067@garage.freebsd.pl> <868w8dgk4e.fsf@kopusha.onet> <86tyqzeq84.fsf@kopusha.onet>

On Sun, Apr 25, 2010 at 02:17:15PM +0300, Mikolaj Golub wrote:
> On Sat, 24 Apr 2010 14:33:53 +0300 Mikolaj Golub wrote:
>
> > From the code I don't see how hast_proto_recv_hdr() may timeout if the
> > connection is alive, have I missed something?
>
> I did some experiments adding the code that sets SO_RCVTIMEO socket option
> (see the attached patch). It fixes this issue. After timeout the worker on the
> secondary is restarted with the error:
>
> Apr 25 13:06:45 hastb hastd: [storage] (secondary) Unable to receive request header: Resource temporarily unavailable.
> Apr 25 13:06:45 hastb hastd: [storage] (secondary) Worker process (pid=1243) exited ungracefully: status=19200.
>
> On the other hand when the FS is idle (there is no I/O at all) we have the
> worker restart too and the primary is not being connected to the secondary
> until some I/O appears. So it might look not very nicely :-)
>
> Also note, I had to modify proto_common_recv() to have timeout working. After
> timeout recv() sets errno to EWOULDBLOCK, which has the same number as EAGAIN
> in FreeBSD. The current proto_common_recv() restarts recv() if EAGAIN is
> returned.

Could you see if the following patch fixes the problem for you:

	http://people.freebsd.org/~pjd/patches/hastd_timeout.patch

The patch sets a timeout on both the incoming and outgoing sockets on the
primary, and on the outgoing socket on the secondary. The incoming socket on
the secondary is left with no timeout to avoid the problem you described
above (the worker being restarted whenever there is no I/O).
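
For reference, a rough sketch of how a timeout could be applied to a
connected socket with the standard SO_RCVTIMEO and SO_SNDTIMEO options.
set_socket_timeout() and its `seconds` parameter are names made up for this
example; the code in the patch may well look different.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: apply the same timeout (in seconds) to receives
 * and sends on a connected socket. Returns 0 on success, or -1 with
 * errno set by setsockopt(2).
 */
int
set_socket_timeout(int fd, int seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	return (0);
}
```

Under this sketch the primary would call the helper on both of its sockets,
while the secondary would call it only on its outgoing socket and leave the
incoming one blocking indefinitely.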

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!



