Date:      Sat, 09 Apr 2011 16:02:33 +0300
From:      Mikolaj Golub <trociny@freebsd.org>
To:        freebsd-net@freebsd.org
Cc:        Kostik Belousov <kib@FreeBSD.org>, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject:   recv() with MSG_WAITALL might get stuck when receiving more than rcvbuf
Message-ID:  <868vvjmvbq.fsf@kopusha.home.net>

--=-=-=

Hi,

While testing HAST synchronization with both the primary and secondary HAST
instances running on the same host, I ran into an issue where synchronization
could be very slow:

Apr  9 14:04:04 kopusha hastd[3812]: [test] (primary) Synchronization complete. 512MB synchronized in 16m38s (525KB/sec).

hastd synchronizes data in MAXPHYS (131072 byte) blocks. The sending side
splits each block into smaller chunks of MAX_SEND_SIZE (32768 bytes), while
the receiving side reads the whole block with a single recv() call using the
MSG_WAITALL option.
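
To illustrate, the sending pattern looks roughly like this (only a sketch
with a hypothetical send_block() helper, not the actual hastd code):

#include <sys/param.h>
#include <sys/socket.h>

#include <err.h>

#define	MAX_SEND_SIZE	32768

/*
 * Send one MAXPHYS-sized block in MAX_SEND_SIZE chunks; the receiving
 * side then collects the whole block with a single
 * recv(fd, buf, MAXPHYS, MSG_WAITALL) call.
 */
static void
send_block(int fd, const char *data)
{
	size_t left = MAXPHYS;
	ssize_t done;

	while (left > 0) {
		done = send(fd, data,
		    left < MAX_SEND_SIZE ? left : MAX_SEND_SIZE,
		    MSG_NOSIGNAL);
		if (done < 0)
			err(1, "send");
		data += done;
		left -= done;
	}
}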

Sometimes recv() gets stuck: in tcpdump I can see that the sending side has
sent all the chunks and all of them were acked, but the receiving thread is
still waiting in recv(). netstat reports a non-empty Recv-Q for the receiving
side (usually with a byte count equal to the size of the last sent chunk). It
looks as if the receiving userspace was never told by the kernel that all the
data had arrived.

I can reproduce the issue with the attached test_MSG_WAITALL.c.
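
In essence the test boils down to a single receive request several times
larger than the socket buffers while the peer sends the data in smaller
chunks. The sizes below are the ones the attached test uses; the
recv_request() helper itself is just a sketch:

#include <sys/types.h>
#include <sys/socket.h>

#include <err.h>
#include <errno.h>

#define	SEND_SIZE	32768			/* size of each chunk sent by the peer */
#define	RECV_SIZE	(SEND_SIZE * 10)	/* size of one recv() request */
#define	SOBUF_SIZE	(SEND_SIZE * 2)		/* SO_SNDBUF and SO_RCVBUF */

/*
 * Receive one RECV_SIZE request in a single call.  With MSG_WAITALL the
 * call should not return until the whole amount has arrived, even though
 * the socket buffers (SOBUF_SIZE) are much smaller than RECV_SIZE.
 */
static ssize_t
recv_request(int fd, char *buf)
{
	ssize_t done;

	do {
		done = recv(fd, buf, RECV_SIZE, MSG_WAITALL);
	} while (done == -1 && errno == EINTR);
	if (done < 0)
		err(1, "recv");
	return (done);
}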

I think the issue is in soreceive_generic(). 

If MSG_WAITALL is set but the request is larger than the receive buffer, the
function has to do the receive in sections. So after receiving some data it
notifies the protocol (calls pr_usrreqs->pru_rcvd) about the drained data,
dropping the so_rcv lock while doing so. On return it blocks in sbwait(),
waiting for the rest of the data. I think there is a race here: while the
lock was dropped for the pr_usrreqs->pru_rcvd call, the rest of the data
could have arrived, and since nobody was sleeping on the socket buffer at
that moment, no wakeup was delivered for it. The subsequent sbwait() then
sleeps even though the data is already in the buffer, and with the sender
having nothing more to send, nothing will ever wake it up. Thus the buffer
should be checked again before calling sbwait().

See the attached uipc_socket.c.soreceive.patch. The patch fixes the issue for
me.

Apr  9 14:16:40 kopusha hastd[2926]: [test] (primary) Synchronization complete. 512MB synchronized in 4s (128MB/sec).

I observed the problem on STABLE but believe the same applies to CURRENT.

BTW, I also tried the optimized version of soreceive(), soreceive_stream().
It does not have this problem. But with it I was seeing TCP connections get
stuck in soreceive_stream() when starting firefox (with many tabs) or pidgin
(with many contacts). The processes could only be killed with -9. I have not
investigated this much though.

-- 
Mikolaj Golub


--=-=-=
Content-Type: application/octet-stream
Content-Disposition: attachment; filename=test_MSG_WAITALL.c
Content-Transfer-Encoding: base64

I2luY2x1ZGUgPHN5cy9wYXJhbS5oPgojaW5jbHVkZSA8c3lzL3NvY2tldC5oPgojaW5jbHVkZSA8
c3lzL3RpbWUuaD4KCiNpbmNsdWRlIDxuZXRpbmV0L2luLmg+CgojaW5jbHVkZSA8ZXJyLmg+CiNp
bmNsdWRlIDxlcnJuby5oPgojaW5jbHVkZSA8c2lnbmFsLmg+CiNpbmNsdWRlIDxzdGRpby5oPgoj
aW5jbHVkZSA8c3RkbGliLmg+CiNpbmNsdWRlIDxzdHJpbmcuaD4KI2luY2x1ZGUgPHVuaXN0ZC5o
PgoKI2RlZmluZSBOCQkxMDAKCiNkZWZpbmUJU0VORF9TSVpFCTMyNzY4CiNkZWZpbmUgUkVDVl9T
SVpFCVNFTkRfU0laRSAqIDEwCiNkZWZpbmUgU09CVUZfU0laRQlTRU5EX1NJWkUgKiAyCiNkZWZp
bmUgUE9SVAkJMjM0ODIKCmludAptYWluKGludCBhcmdjLCBjaGFyICoqYXJndikKewoJc3RydWN0
IHNvY2thZGRyX2luIHNpbjsKCWludCBsaXN0ZW5mZCwgY29ubmZkLCBwaWQsIHNpemUsIGRvbmUs
IHNlbmRzaXplLCBpLCB2YWw7CgljaGFyIGJ1ZltSRUNWX1NJWkVdOwoJY2hhciAqZGF0YTsKCXN0
cnVjdCB0aW1ldmFsIHRzdGFydCwgdGVuZCwgdGRpZmY7CQoJCglpZiAoKGxpc3RlbmZkID0gc29j
a2V0KEFGX0lORVQsIFNPQ0tfU1RSRUFNLCAwKSkgPCAwKQoJCWVycigxLCAic29ja2V0IGVycm9y
Iik7CgoJdmFsID0gU09CVUZfU0laRTsKCWlmIChzZXRzb2Nrb3B0KGxpc3RlbmZkLCBTT0xfU09D
S0VULCBTT19TTkRCVUYsICZ2YWwsIAkgCgkJc2l6ZW9mKHZhbCkpIDwgMCkKCQllcnIoMSwgInNl
dHNvY2tvcHQgZXJyb3IiKTsKCXZhbCA9IFNPQlVGX1NJWkU7IAkgCglpZiAoc2V0c29ja29wdChs
aXN0ZW5mZCwgU09MX1NPQ0tFVCwgU09fUkNWQlVGLCAmdmFsLAoJCXNpemVvZih2YWwpKSA8IDAp
CgkJZXJyKDEsICJzZXRzb2Nrb3B0IGVycm9yIik7CgoJbWVtc2V0KCZzaW4sIDAsIHNpemVvZihz
aW4pKTsKCXNpbi5zaW5fZmFtaWx5ID0gQUZfSU5FVDsKCXNpbi5zaW5fcG9ydCA9IGh0b25zKFBP
UlQpOwoKCWlmIChiaW5kKGxpc3RlbmZkLCAoc3RydWN0IHNvY2thZGRyICopICZzaW4sCgkJIHNp
emVvZihzaW4pKSA8IDApCgkJZXJyKDEsICJiaW5kIGVycm9yIik7CgoJaWYgKGxpc3RlbihsaXN0
ZW5mZCwgMTAyNCkgPCAwKQoJCWVycigxLCAibGlzdGVuIGVycm9yIik7CgoJcGlkID0gZm9yaygp
OwoKCWlmIChwaWQgPT0gLTEpCgkJZXJyKDEsICJmb3JrIGVycm9yIik7CgoJaWYgKHBpZCAhPSAw
KSB7CgoJCWNsb3NlKGxpc3RlbmZkKTsKCQlzbGVlcCgxKTsKCQoJCWlmICgoY29ubmZkID0gc29j
a2V0KEFGX0lORVQsIFNPQ0tfU1RSRUFNLCAwKSkgPCAwKSB7CgkJCSh2b2lkKWtpbGwocGlkLCBT
SUdURVJNKTsKCQkJZXJyKDEsICJwYXJlbnQ6IHNvY2tldCBlcnJvciIpOwoJCX0KCQlpZiAoY29u
bmVjdChjb25uZmQsIChzdHJ1Y3Qgc29ja2FkZHIgKikmc2luLAoJCQkgICAgc2l6ZW9mKHNpbikp
IDwgMCkgewoJCQkodm9pZClraWxsKHBpZCwgU0lHVEVSTSk7CgkJCWVycigxLCAicGFyZW50OiBj
b25uZWN0IGVycm9yIik7CgkJfQoKCQlmb3IgKGkgPSAwOyA7IGkrKykgewoJCQlkYXRhID0gYnVm
OwoJCQlzaXplID0gc2l6ZW9mKGJ1Zik7CgoJCQlkbyB7CgkJCQlzZW5kc2l6ZSA9IHNpemUgPCBT
RU5EX1NJWkUgPyBzaXplIDogU0VORF9TSVpFOwoJCQkJZG9uZSA9IHNlbmQoY29ubmZkLCBkYXRh
LCBzZW5kc2l6ZSwgTVNHX05PU0lHTkFMKTsKCQkJCWlmIChkb25lID09IDApIHsKCQkJCQlicmVh
azsKCQkJCX0gZWxzZSBpZiAoZG9uZSA8IDApIHsKCQkJCQkodm9pZClraWxsKHBpZCwgU0lHVEVS
TSk7CgkJCQkJZXJyKDEsICJwYXJlbnQ6IHNlbmQgZXJyb3IiKTsKCQkJCX0KCQkJCWRhdGEgKz0g
ZG9uZTsKCQkJCXNpemUgLT0gZG9uZTsKCQkJfSB3aGlsZSAoc2l6ZSA+IDApOwoKCQkJaWYgKHJl
Y3YoY29ubmZkLCBidWYsIDEsIDApIDw9IDApCgkJCQllcnIoMSwgInBhcmVudDogY2FuJ3QgcmVj
diIpOwoJCQlpZiAoaSAlIE4gPT0gMCkgewoJCQkJaWYgKGkgIT0gMCkgewoJCQkJCWdldHRpbWVv
ZmRheSgmdGVuZCwgTlVMTCk7CgkJCQkJdGltZXJzdWIoJnRlbmQsICZ0c3RhcnQsICZ0ZGlmZik7
CgkJCQkJcHJpbnRmKCJwYXJlbnQgWyVkXTogc2VudCAlZCB0aW1lcyAlZCBieXRlcyAoJWQgdG90
YWwpIGluICVkIHNlYyAlZCB1c2VjXG4iLAoJCQkJCSAgICBpLCBOLCBzaXplb2YoYnVmKSwgTiAq
IHNpemVvZihidWYpLCB0ZGlmZi50dl9zZWMsIHRkaWZmLnR2X3VzZWMpOwoJCQkJfQoJCQkJZ2V0
dGltZW9mZGF5KCZ0c3RhcnQsIE5VTEwpOwoJCQl9CgkJfQoJCWlmIChjbG9zZShjb25uZmQpIDwg
MCkgewoJCQkodm9pZClraWxsKHBpZCwgU0lHVEVSTSk7CgkJCWVycigxLCAicGFyZW50OiBjbG9z
ZSBlcnJvciIpOwoJCX0KCgl9IGVsc2UgewoJCWlmICgoY29ubmZkID0gYWNjZXB0KGxpc3RlbmZk
LCAoc3RydWN0IHNvY2thZGRyICopTlVMTCwKCQkJCSAgICAgTlVMTCkpIDwgMCkKCQkJZXJyKDEs
ICJjaGlsZDogYWNjZXB0IGVycm9yIik7CgkJCgkJZm9yKGkgPTA7IDtpKyspIHsKCgkJCWRvIHsK
CQkJCWRvbmUgPSByZWN2KGNvbm5mZCwgYnVmLCBzaXplb2YoYnVmKSwgTVNHX1dBSVRBTEwpOwoJ
CQl9IHdoaWxlIChkb25lID09IC0xICYmIGVycm5vID09IEVJTlRSKTsKCgkJCWlmIChkb25lID09
IDApCgkJCQllcnJ4KDEsICJjaGlsZDogY29ubmVjdGlvbiBjbG9zZWQiKTsKCQkJZWxzZSBpZiAo
ZG9uZSA8IDApCgkJCQllcnIoMSwgImNoaWxkOiByZWN2IGVycm9yIik7CgoJCQlpZiAoc2VuZChj
b25uZmQsIGJ1ZiwgMSwgTVNHX05PU0lHTkFMKSA8PSAwKQoJCQkJZXJyKDEsICJjaGlsZDogY2Fu
J3Qgc2VuZCIpOwoJCQlpZiAoaSAlIE4gPT0gMCAmJiBpICE9IDApCgkJCQlwcmludGYoImNoaWxk
IFslaV06IHJlY2VpdmVkICVkIHRpbWVzICVkIGJ5dGVzXG4iLAoJCQkJICAgIGksIE4sIHNpemVv
ZihidWYpKTsKCQl9CgkJaWYgKGNsb3NlKGNvbm5mZCkgPCAwKQoJCQllcnIoMSwgImNoaWxkOiBj
bG9zZSBlcnJvciIpOwoJfQoJZXhpdCgwKTsKfQo=
--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=uipc_socket.c.soreceive.patch

Index: sys/kern/uipc_socket.c
===================================================================
--- sys/kern/uipc_socket.c	(revision 220472)
+++ sys/kern/uipc_socket.c	(working copy)
@@ -1836,28 +1836,34 @@ dontblock:
 			/*
 			 * Notify the protocol that some data has been
 			 * drained before blocking.
 			 */
 			if (pr->pr_flags & PR_WANTRCVD) {
 				SOCKBUF_UNLOCK(&so->so_rcv);
 				VNET_SO_ASSERT(so);
 				(*pr->pr_usrreqs->pru_rcvd)(so, flags);
 				SOCKBUF_LOCK(&so->so_rcv);
 			}
 			SBLASTRECORDCHK(&so->so_rcv);
 			SBLASTMBUFCHK(&so->so_rcv);
-			error = sbwait(&so->so_rcv);
-			if (error) {
-				SOCKBUF_UNLOCK(&so->so_rcv);
-				goto release;
+			/*
+			 * We could have received some data while we were
+			 * notifying the protocol. Skip blocking in this case.
+			 */
+			if (so->so_rcv.sb_mb == NULL) {
+				error = sbwait(&so->so_rcv);
+				if (error) {
+					SOCKBUF_UNLOCK(&so->so_rcv);
+					goto release;
+				}
 			}
 			m = so->so_rcv.sb_mb;
 			if (m != NULL)
 				nextrecord = m->m_nextpkt;
 		}
 	}
 
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 	if (m != NULL && pr->pr_flags & PR_ATOMIC) {
 		flags |= MSG_TRUNC;
 		if ((flags & MSG_PEEK) == 0)
 			(void) sbdroprecord_locked(&so->so_rcv);

--=-=-=--


