From: Chris Buechler
Date: Sun, 17 Aug 2008 15:15:18 -0400
To: freebsd-net@freebsd.org
Subject: repeatable scp stalls from 7.0 to 7.0
Message-ID: <48A878C6.9000001@chrisbuechler.com>

I've been seeing fairly frequent and repeatable scp stalls between two FreeBSD 7.0 servers (7.0-RELEASE-p2 to be exact) on a 100 Mb LAN. They're two HP servers, an Opteron 275 and a dual Xeon 3.4 (I don't recall the models, but I can get them if it's relevant), using the onboard bge(4) cards. The client side (builder7) SCPs a file to the server side (hosting7) about 20 times a day. The stall happens about 2-4 times a week, and has happened ever since we put these two boxes online in their current functions. Initially they ran the original 7.0 release, prior to the TCP fix in June. The behavior has been the same both before and after that fix.
Aside from this, there are no apparent network issues with either of the boxes. Since we had nothing to go on other than scp sessions going to "stalled" (no relevant logs), I set up tcpdump on each end, filtering on the TCP port 22 traffic between these hosts and grabbing 100 bytes of each frame to avoid chewing up too much disk space. When it happened again, I split the end of each capture out into its own file with editcap, 4.2-4.3 MB each.

http://chrisbuechler.com/temp/lastcut-hosting7.pcap - server end, capture taken on the host but the destination IP is a jail
http://chrisbuechler.com/temp/lastcut-builder7.pcap - client end, connection is initiated from the host, no jails involved

The TCP window on the ACKs from server to client starts decreasing [1], to the point where it's down to a window of 0. From that point on, everything the server (172.29.29.181) sends back to the client (172.29.29.170) has a window of 0. Restarting the scp makes it work again. It doesn't happen every time, only somewhere around 2-3% of the time. I don't see any cause for the shrinking window in those captures, but maybe I'm missing something.

[1] lastcut-hosting7.pcap frame #21298; lastcut-builder7.pcap frame #25088

These are both very stock boxes: GENERIC kernels, no significant changes in sysctl or anything else. I'm not sure where to go from here; any assistance in resolving this would be appreciated.

cheers,
Chris
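
For anyone who would rather script the check than page through the captures by hand, here is a minimal sketch of how the zero-window segments described above could be located. It is not how the captures were analyzed. It assumes the files are classic pcap (not pcapng) with Ethernet framing and IPv4, and the function name `zero_window_frames` is made up for illustration. It reads only the raw 16-bit window field, so it ignores window scaling, which does not matter when looking for a window of exactly 0.

```python
import struct


def zero_window_frames(pcap_bytes, src_ip):
    """Return 1-based frame numbers of TCP segments sent by src_ip
    that advertise a receive window of 0 (RST segments are skipped)."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":      # file written little-endian
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":    # file written big-endian
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    want = bytes(int(part) for part in src_ip.split("."))
    hits = []
    offset = 24                            # skip the pcap global header
    frame_no = 0
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, captured length, original length
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", pcap_bytes, offset)
        offset += 16
        pkt = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        frame_no += 1

        # Assumption: Ethernet II framing carrying IPv4 (EtherType 0x0800)
        if len(pkt) < 14 + 20 or pkt[12:14] != b"\x08\x00":
            continue
        ip = pkt[14:]
        ihl = (ip[0] & 0x0F) * 4           # IPv4 header length in bytes
        if ip[9] != 6 or ip[12:16] != want:  # protocol 6 is TCP
            continue
        tcp = ip[ihl:]
        if len(tcp) < 16:                  # need bytes up to the window field
            continue
        flags = tcp[13]
        window = struct.unpack_from("!H", tcp, 14)[0]
        if window == 0 and not flags & 0x04:  # 0x04 is the RST flag
            hits.append(frame_no)
    return hits
```

Something like `zero_window_frames(open("lastcut-hosting7.pcap", "rb").read(), "172.29.29.181")` would then list the frames where the server advertises a zero window. The 100-byte snap length used for these captures is enough to cover the Ethernet, IP and TCP headers the sketch reads.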