From owner-freebsd-net@FreeBSD.ORG Sat Jan 4 18:38:57 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6F58F58F for ; Sat, 4 Jan 2014 18:38:57 +0000 (UTC) Received: from mail-gg0-x22b.google.com (mail-gg0-x22b.google.com [IPv6:2607:f8b0:4002:c02::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 21FC716C2 for ; Sat, 4 Jan 2014 18:38:57 +0000 (UTC) Received: by mail-gg0-f171.google.com with SMTP id j1so3252759ggn.16 for ; Sat, 04 Jan 2014 10:38:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wemm.org; s=google; h=message-id:date:from:organization:user-agent:mime-version:to:cc :subject:content-type; bh=JSAgrF8ut7FMcZd7+bElnPVhJyl9ZTw49Z7HazO15Xs=; b=t8DcWP1tukq3H9s/eKwdmH9l6AJl2pnrRjxGR2B1Q4GeoZqoq9gUdL84mM/C6JRzgI /1RHlTCVqDQh1LWz1+GnRJIgU61fD5Jciw4ge4JBmd2S5+DfXjCPnngoyL4f5LlsPR+1 n1gkvd8itgKDtgfLC7tn8EMzzHIT4vKNbMOxg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:organization:user-agent :mime-version:to:cc:subject:content-type; bh=JSAgrF8ut7FMcZd7+bElnPVhJyl9ZTw49Z7HazO15Xs=; b=FJGSJ2M5WvOQeeIjOjRDh7RS2YIZLlSJP1vrR8RFdS5cS/8E1j1kLCZ5V49iEPByqU oBgSq2I5EduHUkceSIBdcDYm0iUJesClGFS2QECSeym5jPRI4r7CAH2i/0eDh/mKEeJF Ts8mCdU/Wi89QYZ/abGJfNQcznEoqx4uCJdtN9TjCeuUxKOcuVw4KFtqEximlI76qYBO LCKV806kA1IuQa+vYhNOHWt5f7CddUcHPSzzkjn9kXyJNTQmY/B4MzxtB2hzofDDK4ZY 5n4VV2W37O7zODDBybBO+rPI2qhDOodpw+BhkG+R2ZnN6W4017dKzEi2tP0YJgCYhTpG Cvtg== X-Gm-Message-State: ALoCoQmh3gpqn8aLvkZ3inP4d+NUg6qQu7Wb2zZ8u4z8VoaEqOj+9fSnHBzwaaNNaUaOHCqkrozT X-Received: by 10.236.84.226 with SMTP id s62mr2957048yhe.112.1388860736300; Sat, 04 Jan 2014 10:38:56 -0800 (PST) Received: from hackintosh.wemm.org ([2601:9:e80:770:7885:d41f:f3a8:eada]) by mx.google.com with ESMTPSA id k76sm9807111yho.18.2014.01.04.10.38.53 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sat, 04 Jan 2014 10:38:54 -0800 (PST) Message-ID: <52C85537.7080307@wemm.org> Date: Sat, 04 Jan 2014 10:38:47 -0800 From: Peter Wemm Organization: World Domination in progress. User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: freebsd-net@freebsd.org Subject: Long-haul problems - connections stuck in slow start Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="36bQ1sTSD7NQcDKVhvmAqAkjfWOnAOJmH" Cc: Gavin Atkinson , andre@freebsd.org, Peter Wemm X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Jan 2014 18:38:57 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --36bQ1sTSD7NQcDKVhvmAqAkjfWOnAOJmH Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable We're seeing some unfortunate misbehavior with tcp over an intercontinent= al link. eg: fetching a 30GB http file from various package mirrors by a remote: us-west(ISC) -> london(BME) bd93e71c-cae4-44fd-943c-d1a88dbf6c6d.tar 0% of 29 GB 961 kBps 09h03m^= C us-east(NYI) -> london(BME) bd93e71c-cae4-44fd-943c-d1a88dbf6c6d.tar 0% of 29 GB 1070 kBps 08h08m^= C us-west(YSV) -> london(BME) bd93e71c-cae4-44fd-943c-d1a88dbf6c6d.tar 0% of 29 GB 14 kBps 590h22m= ^C Spot the one we're concerned about... Ping times for the three (in order): round-trip min/avg/max/std-dev =3D 144.330/144.532/144.797/0.157 ms round-trip min/avg/max/std-dev =3D 79.650/79.965/80.488/0.287 ms round-trip min/avg/max/std-dev =3D 148.588/153.292/155.688/2.903 ms The problem pair is worth showing some detail on: 16 bytes from ..:206a::1001:10, icmp_seq=3D4 hlim=3D55 time=3D148.588 ms 16 bytes from ..:206a::1001:10, icmp_seq=3D5 hlim=3D55 time=3D155.140 ms 16 bytes from ..:206a::1001:10, icmp_seq=3D6 hlim=3D55 time=3D149.443 ms 16 bytes from ..:206a::1001:10, icmp_seq=3D7 hlim=3D55 time=3D155.688 ms 16 bytes from ..:206a::1001:10, icmp_seq=3D8 hlim=3D55 time=3D148.630 ms 16 bytes from ..:206a::1001:10, icmp_seq=3D9 hlim=3D55 time=3D155.486 ms It appears that there are two packet paths between the endpoints that hav= e either ~148ms or ~155ms. I've done some longer samples and they're fairl= y consistent clusters. All four machines talk to each other. Here's where it gets interesting. On the sender at us-west(YSV), I see t= his: net.inet.tcp.hostcache.list: IP address SSTRESH RTT RTTVAR CWND HITS us-west(ISC) 59521 5ms 1ms 16845 15055031 eu-west(BME) 7343 150ms 2ms 13501 3433775 us-east(NYI) 530489 100ms 37ms 16681 43043786 The ssthresh is very low for the problematic ysv<->bme pair. When I do a tcpdump, I see the sender fire off 7343 bytes of data, then s= top and wait for acks. It's completely ignoring the receiver's window state.= It appears stuck in slowstart mode. Some other data: Proto Recv-Q Send-Q Local Address Foreign Address (state) tcp6 0 1047852 2001:19:2.443 2001:41c8:.24490 ESTABLISHED (netstat -x, sorry about the wrap) Proto Recv-Q Send-Q Local Address Foreign Address R-MBUF S-MBUF R-CLUS S-CLUS R-HIWA S-HIWA R-LOWA S-LOWA R-BCNT S-BCNT R-BMAX S-B= MAX rexmt persist keep 2msl delack rcvtime tcp6 0 1048152 2001:1900:2254:2.443 2001:41c8:112:83.24490 0= 374 0 373 65688 1049580 1 2048 0 1420800 525504 8396640 0.43 0.00 7199.93 0.00 0.00 0.06 The "interesting" parts of -x: rexmt persist keep 2msl delack rcvtime 0.43 0.00 7199.93 0.00 0.00 0.06 -T Proto Rexmit OOORcv 0-win Local Address Foreign Address tcp6 54161 0 0 2001:192.443 2001:41:83.24490 note retransmits(!) Some tcpcb fields that caught my eye for the connection: snd_wnd =3D 1048576, snd_cwnd =3D 5712, t_srtt =3D 6391, t_rttvar =3D 903, t_rxtshift =3D 0, t_rttmin =3D 30, t_rttbest =3D 4903, t_rttupdated =3D 220095, max_sndwnd =3D 1048576, snd_cwnd_prev =3D 4284, snd_ssthresh_prev =3D 2856, snd_recover_prev =3D 1397053524, t_sndzerowin =3D 0, t_badrxtwin =3D 584273259, snd_limited =3D 0 '\0', t_rttlow =3D 150, I've stored some dumps of the tcpcb at http://people.freebsd.org/~peter/tcpcb.txt Note that some in the tcpcb.txt file also have snd_limited =3D 2 '\002', Over the last few days I've tried things like turning off sack, tso, the various rfc knobs etc. I believe they're all back to normal now. There's small ~15 second tcpdump sample of the sender side and the receiv= er side at: http://people.freebsd.org/~peter/send.cap.gz and http://people.freebsd.org/~peter/recv.cap.gz Both ends were ntp synced. The dumps have no sensitive data. For amusement, I just tried this, with roughly 1 second in between: peter@bme:~ % scp pkg-ysv:k.gz /tmp k.gz 100% 25MB 5.0MB/s 00:05 peter@bme:~ % scp pkg-ysv:k.gz /tmp k.gz 0% 960KB 20.3KB/s 41:29 ETA^C There was no pre-existing hostcache state between those two endpoints for= the first run. At the end, this was created in the hostcache: IP address SSTRESH RTT RTTVAR BANDWIDTH CWND 213.138.. 5952 165ms 21ms 0 8688 All connections went slow after that. Note that the ssh test was over ip= v4 - the rest above is on ipv6. However, we're seeing the same weird stuff with http over ipv4 as well between the same two endpoints. It was pointed out to me that this has come up before, eg: misc/173859 I know we've seen this at work as well. A few days earlier we were pushing ~45MB/sec (bytes, not bits) between th= ese endpoints. Out of the blue it crashed to ~10KB/sec. Why can't it get out= of slow-start? Is it even stuck in slow-start like I think? Is the 148-155= ms bimodal rtt the problem? Any insight would be greatly appreciated. (please don't drop me from cc:= ) --=20 Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6F= JV UTF-8: for when a ' just won\342\200\231t do. --36bQ1sTSD7NQcDKVhvmAqAkjfWOnAOJmH Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlLIVTsACgkQFRKuUnJ3cX9djgCfa8OuEF6AwFnlsyA1YT849YdN rsUAnA6X1BMs3Rsdgh5yUTkVubFquIuM =k83d -----END PGP SIGNATURE----- --36bQ1sTSD7NQcDKVhvmAqAkjfWOnAOJmH--