From owner-freebsd-net@FreeBSD.ORG Sat Apr  9 13:31:28 2011
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: freebsd-net@freebsd.org
Cc: Kostik Belousov, Pawel Jakub Dawidek
Date: Sat, 09 Apr 2011 16:02:33 +0300
Message-ID: <868vvjmvbq.fsf@kopusha.home.net>
List-Id: Networking and TCP/IP with FreeBSD
Subject: recv() with MSG_WAITALL might get stuck when receiving more than rcvbuf

Hi,

While testing HAST synchronization with both the primary and secondary HAST instances running on the same host, I ran into an issue where synchronization can be very slow:

Apr  9 14:04:04 kopusha hastd[3812]: [test] (primary) Synchronization complete. 512MB synchronized in 16m38s (525KB/sec).

hastd synchronizes data in MAXPHYS (131072 byte) blocks. The sender splits each block into smaller chunks of MAX_SEND_SIZE (32768 bytes), while the receiver reads the whole block with a single recv() call using the MSG_WAITALL flag. Sometimes recv() gets stuck: in tcpdump I can see that the sending side sent all the chunks and they were all acked, but the receiving thread is still waiting in recv(). netstat reports a non-empty Recv-Q for the receiving side (usually with a byte count equal to the size of the last chunk sent). It looks as if the kernel never told the receiving userspace that all the data had arrived. I can reproduce the issue with the attached test_MSG_WAITALL.c.

I think the issue is in soreceive_generic(). If MSG_WAITALL is set but the request is larger than the receive buffer, the receive has to be done in sections. So after receiving some data, soreceive_generic() notifies the protocol about it (calls pr_usrreqs->pru_rcvd), releasing the so_rcv lock. On return it blocks in sbwait(), waiting for the rest of the data. I think there is a race here: while it was in pr_usrreqs->pru_rcvd, not holding the lock, the rest of the data could have arrived, so it should check for this before calling sbwait(). See the attached uipc_socket.c.soreceive.patch. The patch fixes the issue for me:

Apr  9 14:16:40 kopusha hastd[2926]: [test] (primary) Synchronization complete. 512MB synchronized in 4s (128MB/sec).

I observed the problem on STABLE, but I believe the same applies to CURRENT.
BTW, I also tried the optimized version of soreceive(), soreceive_stream(), which does not have this problem. With it, however, I observed TCP connections getting stuck in soreceive_stream() when starting firefox (with many tabs) or pidgin (with many contacts); the processes could only be killed with -9. I have not investigated this further, though.

-- 
Mikolaj Golub

--=-=-=
Content-Type: text/x-csrc
Content-Disposition: attachment; filename=test_MSG_WAITALL.c

#include <sys/param.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <netinet/in.h>

#include <err.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define N		100

#define	SEND_SIZE	32768
#define RECV_SIZE	(SEND_SIZE * 10)
#define SOBUF_SIZE	(SEND_SIZE * 2)
#define PORT		23482

int
main(int argc, char **argv)
{
	struct sockaddr_in sin;
	int listenfd, connfd, pid, size, done, sendsize, i, val;
	char buf[RECV_SIZE];
	char *data;
	struct timeval tstart, tend, tdiff;

	if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		err(1, "socket error");

	val = SOBUF_SIZE;
	if (setsockopt(listenfd, SOL_SOCKET, SO_SNDBUF, &val,
		sizeof(val)) < 0)
		err(1, "setsockopt error");
	val = SOBUF_SIZE;
	if (setsockopt(listenfd, SOL_SOCKET, SO_RCVBUF, &val,
		sizeof(val)) < 0)
		err(1, "setsockopt error");

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(PORT);

	if (bind(listenfd, (struct sockaddr *) &sin,
		 sizeof(sin)) < 0)
		err(1, "bind error");

	if (listen(listenfd, 1024) < 0)
		err(1, "listen error");

	pid = fork();

	if (pid == -1)
		err(1, "fork error");

	if (pid != 0) {

		close(listenfd);
		sleep(1);

		if ((connfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
			(void)kill(pid, SIGTERM);
			err(1, "parent: socket error");
		}
		if (connect(connfd, (struct sockaddr *)&sin,
			    sizeof(sin)) < 0) {
			(void)kill(pid, SIGTERM);
			err(1, "parent: connect error");
		}

		for (i = 0; ; i++) {
			data = buf;
			size = sizeof(buf);

			do {
				sendsize = size < SEND_SIZE ? size : SEND_SIZE;
				done = send(connfd, data, sendsize, MSG_NOSIGNAL);
				if (done == 0) {
					break;
				} else if (done < 0) {
					(void)kill(pid, SIGTERM);
					err(1, "parent: send error");
				}
				data += done;
				size -= done;
			} while (size > 0);

			if (recv(connfd, buf, 1, 0) <= 0)
				err(1, "parent: can't recv");
			if (i % N == 0) {
				if (i != 0) {
					gettimeofday(&tend, NULL);
					timersub(&tend, &tstart, &tdiff);
					printf("parent [%d]: sent %d times %d bytes (%d total) in %ld sec %ld usec\n",
					    i, N, (int)sizeof(buf), N * (int)sizeof(buf),
					    (long)tdiff.tv_sec, (long)tdiff.tv_usec);
				}
				gettimeofday(&tstart, NULL);
			}
		}
		if (close(connfd) < 0) {
			(void)kill(pid, SIGTERM);
			err(1, "parent: close error");
		}

	} else {
		if ((connfd = accept(listenfd, (struct sockaddr *)NULL,
				     NULL)) < 0)
			err(1, "child: accept error");

		for (i = 0; ; i++) {

			do {
				done = recv(connfd, buf, sizeof(buf), MSG_WAITALL);
			} while (done == -1 && errno == EINTR);

			if (done == 0)
				errx(1, "child: connection closed");
			else if (done < 0)
				err(1, "child: recv error");

			if (send(connfd, buf, 1, MSG_NOSIGNAL) <= 0)
				err(1, "child: can't send");
			if (i % N == 0 && i != 0)
				printf("child [%i]: received %d times %d bytes\n",
				    i, N, (int)sizeof(buf));
		}
		if (close(connfd) < 0)
			err(1, "child: close error");
	}
	exit(0);
}

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=uipc_socket.c.soreceive.patch

Index: sys/kern/uipc_socket.c
===================================================================
--- sys/kern/uipc_socket.c	(revision 220472)
+++ sys/kern/uipc_socket.c	(working copy)
@@ -1836,28 +1836,34 @@ dontblock:
 			/*
 			 * Notify the protocol that some data has been
 			 * drained before blocking.
 			 */
 			if (pr->pr_flags & PR_WANTRCVD) {
 				SOCKBUF_UNLOCK(&so->so_rcv);
 				VNET_SO_ASSERT(so);
 				(*pr->pr_usrreqs->pru_rcvd)(so, flags);
 				SOCKBUF_LOCK(&so->so_rcv);
 			}
 			SBLASTRECORDCHK(&so->so_rcv);
 			SBLASTMBUFCHK(&so->so_rcv);
-			error = sbwait(&so->so_rcv);
-			if (error) {
-				SOCKBUF_UNLOCK(&so->so_rcv);
-				goto release;
+			/*
+			 * We could have received some data while we were
+			 * notifying the protocol.  Skip blocking in this
+			 * case.
+			 */
+			if (so->so_rcv.sb_mb == NULL) {
+				error = sbwait(&so->so_rcv);
+				if (error) {
+					SOCKBUF_UNLOCK(&so->so_rcv);
+					goto release;
+				}
 			}
 			m = so->so_rcv.sb_mb;
 			if (m != NULL)
 				nextrecord = m->m_nextpkt;
 		}
 	}
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 	if (m != NULL && pr->pr_flags & PR_ATOMIC) {
 		flags |= MSG_TRUNC;
 		if ((flags & MSG_PEEK) == 0)
 			(void) sbdroprecord_locked(&so->so_rcv);

--=-=-=--