From owner-freebsd-net@FreeBSD.ORG Mon Nov 24 02:00:50 2008
Date: Mon, 24 Nov 2008 02:36:22 +0100 (CET)
From: Andreas Carbin
To: freebsd-net@freebsd.org
Cc: andreas
Message-ID: <19347937.95871227490582742.JavaMail.root@mail03>
In-Reply-To: <8072814.95851227489833192.JavaMail.root@mail03>
Subject: FreeBSD 7.0 / Recv-Q full ? / win 0 ?

Hello all,

I have the following issue with my (quite newly installed) FreeBSD 7.0 machines:
(I use "FreeBSD 7.0-RELEASE-p5 #0: Wed Oct 1 07:51:58 UTC 2008" on Dell PowerEdge 2970.)

When I copy large files with SCP from one host to another, the destination host's receive queue seems to fill up after a random number of seconds (10 - 300) with about 89,000 bytes, and the destination host sends Window Size = 0 to the sender. This means no data is transferred and the connection has "locked up" in some way (true?).

This almost always happens when I copy a file from one host to another where there is a WAN connection between them. I have checked the firewall rules - these are open to almost any traffic. (I have also seen it happen between two locally connected machines.)

When copying with SCP starts, it runs perfectly at about 10 megabyte/s (100 Mbit/s WAN network). A 3 GB file may succeed in <5% of cases. The error occurs after about 10 to 300 seconds - then all payload data traffic stops. The TCP connection is still open.

My guess was that maybe we get errors when copying this fast, "close to the theoretical limit", so I used "scp -l" with limits of 50 and 5 Mbit/s. This reduces the speed perfectly, but gives me the same errors as at full speed.

I have also tried (with no good results):

* net.inet.tcp.rfc1323 (on and off)
* net.inet.tcp.tso (on and off)
* RXCSUM and TXCSUM on and off
* changing from on-board bce0 / Broadcom NetXtreme II BCM5708 1000Base-T to em0 / Intel(R) PRO/1000 Network Connection Version - 6.7.3
* setting net.inet.tcp.recvbuf_max: 16777216
* setting net.inet.tcp.sendbuf_max: 16777216

One really strange thing is that I can make the copy continue (!) with full data transfer if I truss the ssh process on the destination machine.
So if I truss with output to /dev/null in the background, the whole copy completes (!!!!).

This is a tcpdump on the destination host of SCP's TCP connection when no data is transferred:

15:56:17.798079 IP sender_host.51296 > destination_host.ssh: . 8:9(1) ack 1 win 33304
15:56:17.897407 IP destination_host.ssh > sender_host.51296: . ack 9 win 0
15:56:22.797808 IP sender_host.51296 > destination_host.ssh: . 9:10(1) ack 1 win 33304
15:56:22.897457 IP destination_host.ssh > sender_host.51296: . ack 10 win 0
15:56:27.797913 IP sender_host.51296 > destination_host.ssh: . 10:11(1) ack 1 win 33304
15:56:27.897508 IP destination_host.ssh > sender_host.51296: . ack 11 win 0
15:56:32.798016 IP sender_host.51296 > destination_host.ssh: . 11:12(1) ack 1 win 33304
15:56:32.897559 IP destination_host.ssh > sender_host.51296: . ack 12 win 0
15:56:37.798119 IP sender_host.51296 > destination_host.ssh: . 12:13(1) ack 1 win 33304
15:56:37.897610 IP destination_host.ssh > sender_host.51296: . ack 13 win 0

Does anyone have an idea what this might be? The error occurs when the receiving host is a FreeBSD 7.0 host (the sender can be 7.0 or 6.2 according to my tests).

Thank you,
//Andreas

-------------------------------------------------------
Andreas Carbin
RUN Communications AB
http://www.run.se
E-mail: andreas.carbin@run.se
-------------------------------------------------------
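
P.S. To make the pattern in the tcpdump output easier to spot in a longer capture, here is a minimal sketch in Python. It counts the "win 0" ACKs and the 1-byte segments from the sender (which look like TCP zero-window probes) and reports the spacing between those probes. The names LINE_RE, SAMPLE, parse and summarize are hypothetical, and the regular expression assumes only the older tcpdump text format quoted in this message; it was not part of the tests described here.

```python
import re
from datetime import datetime

# Matches the tcpdump text format quoted in this message, e.g.
# "15:56:17.897407 IP a.ssh > b.51296: . ack 9 win 0"
# The "start:end(len)" part is optional because pure ACKs do not carry it.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<flags>\S+) (?:(?P<start>\d+):(?P<end>\d+)\((?P<len>\d+)\) )?"
    r"ack (?P<ack>\d+) win (?P<win>\d+)"
)

# First four lines of the trace from this message.
SAMPLE = [
    "15:56:17.798079 IP sender_host.51296 > destination_host.ssh: . 8:9(1) ack 1 win 33304",
    "15:56:17.897407 IP destination_host.ssh > sender_host.51296: . ack 9 win 0",
    "15:56:22.797808 IP sender_host.51296 > destination_host.ssh: . 9:10(1) ack 1 win 33304",
    "15:56:22.897457 IP destination_host.ssh > sender_host.51296: . ack 10 win 0",
]


def parse(line):
    """Return a dict for one tcpdump line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%H:%M:%S.%f"),
        "src": m["src"],
        "dst": m["dst"],
        "payload": int(m["len"]) if m["len"] else 0,
        "win": int(m["win"]),
    }


def summarize(lines):
    """Count zero-window ACKs and 1-byte probes, and list probe gaps in seconds."""
    packets = [p for p in map(parse, lines) if p]
    zero_window = [p for p in packets if p["win"] == 0]
    probes = [p for p in packets if p["payload"] == 1]
    # Gap between consecutive 1-byte segments, rounded to whole seconds.
    gaps = [
        round((b["time"] - a["time"]).total_seconds())
        for a, b in zip(probes, probes[1:])
    ]
    return len(zero_window), len(probes), gaps


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Feeding it the saved text output of a capture (one packet per line) gives a quick indication of whether the receiver ever reopens its window or keeps answering every probe with win 0.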