Date: Thu, 29 Apr 2010 10:12:00 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Mikolaj Golub <to.my.trociny@gmail.com>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: HAST: primary might get stuck when there are connectivity problems with secondary
Message-ID: <20100429081200.GB1697@garage.freebsd.pl>
In-Reply-To: <86mxwmk7my.fsf@zhuzha.ua1>
References: <86r5m9dvqf.fsf@zhuzha.ua1> <20100423062950.GD1670@garage.freebsd.pl> <86k4rye33e.fsf@zhuzha.ua1> <20100424073031.GD3067@garage.freebsd.pl> <868w8dgk4e.fsf@kopusha.onet> <86tyqzeq84.fsf@kopusha.onet> <20100428214636.GD1677@garage.freebsd.pl> <86mxwmk7my.fsf@zhuzha.ua1>

On Thu, Apr 29, 2010 at 11:03:33AM +0300, Mikolaj Golub wrote:
>
> On Wed, 28 Apr 2010 23:46:36 +0200 Pawel Jakub Dawidek wrote:
>
>  PJD> Could you see if the following patch fixes the problem for you:
>
>  PJD>    http://people.freebsd.org/~pjd/patches/hastd_timeout.patch
>
>  PJD> The patch sets timeout on both incoming and outgoing sockets on primary
>  PJD> and on outgoing socket on secondary. Incoming socket on secondary is
>  PJD> left with no timeout to avoid problem you described above.
>
> The patch works for me.
>
> After disabling the network connection between the primary and the secondary
> FS operations on the primary do not get stuck and the following messages are
> observed:
>
> Apr 29 10:37:41 hasta hastd: [storage] (primary) Unable to receive reply header: Resource temporarily unavailable.
> Apr 29 10:37:57 hasta hastd: [tank] (primary) Unable to receive reply header: Resource temporarily unavailable.
> Apr 29 10:37:57 hasta hastd: [tank] (primary) Unable to send request (Resource temporarily unavailable): WRITE(972292096, 14336).
> Apr 29 10:38:56 hasta hastd: [storage] (primary) Unable to connect to 172.20.66.202: Operation timed out.
> Apr 29 10:39:12 hasta hastd: [tank] (primary) Unable to connect to 172.20.66.202: Operation timed out.
>
> After restoring the network connection the primary reconnects to the secondary
> and the status changes back from "degraded" to "complete".

Good. And I assume you don't observe problems on secondary? Eg. recv(2)
on secondary doesn't timeout?

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
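
[Editor's illustration, not part of the original message and not taken from hastd_timeout.patch.] The behaviour discussed above, a socket with a timeout failing with "Resource temporarily unavailable" instead of blocking forever, can be reproduced with the standard SO_RCVTIMEO and SO_SNDTIMEO socket options. The sketch below is only an assumption about the general technique: the helper names set_io_timeout() and recv_errno_on_silent_peer() are hypothetical, and a local socketpair(2) stands in for the primary/secondary TCP connection.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): apply a timeout of `seconds`
 * to receives and/or sends on `fd`.  Returns 0 on success, -1 on error.
 */
static int
set_io_timeout(int fd, int seconds, int set_recv, int set_send)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	if (set_recv &&
	    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	if (set_send &&
	    setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	return (0);
}

/*
 * Hypothetical demonstration: create a connected socket pair, set a
 * timeout on one end, and call recv(2) while the peer stays silent.
 * Returns the errno reported by recv(2), 0 if recv(2) did not fail,
 * or -1 if the sockets could not be set up.
 */
static int
recv_errno_on_silent_peer(int seconds)
{
	int sv[2], err;
	char buf[16];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (set_io_timeout(sv[0], seconds, 1, 1) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	/* Nothing is ever written to sv[1], so this blocks until the timeout. */
	n = recv(sv[0], buf, sizeof(buf), 0);
	err = (n == -1) ? errno : 0;
	close(sv[0]);
	close(sv[1]);
	return (err);
}
```

With a timeout set, the blocked recv(2) returns -1 with errno set to EAGAIN (EWOULDBLOCK), whose strerror(3) text is the "Resource temporarily unavailable" seen in the log lines quoted above. A socket left without a timeout, as described for the incoming socket on the secondary, would simply keep waiting.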