From: John Baldwin
To: freebsd-net@freebsd.org
Cc: Sepherosa Ziehau, Bjoern Zeeb
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Date: Wed, 23 Jan 2013 11:15:06 -0500
Message-Id: <201301231115.06393.jhb@freebsd.org>
References: <201301221511.02496.jhb@freebsd.org>

On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote:
> On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin wrote:
> > As I mentioned in an earlier thread, I recently had to debug an issue we were
> > seeing across a link with a high bandwidth-delay product (both high bandwidth
> > and high RTT).  Our specific use case was to use a TCP connection to reliably
> > forward a latency-sensitive datagram stream across a WAN connection.
> > We would often see spikes in the latency of individual datagrams.  I
> > eventually tracked this down to the connection entering slow start when it
> > would transmit data after being idle.  The data stream was quite bursty and
> > would often attempt to transmit a burst of data after being idle for far
> > longer than a retransmit timeout.
> >
> > In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
> > the slow start window size up via a sysctl.  On 8.x this no longer worked.
> > The solution I came up with was to add a new socket option to disable idle
> > handling completely.  That is, when an idle connection restarts with this new
> > option enabled, it keeps its current congestion window and doesn't enter slow
> > start.
> >
> > There are only a few cases where such an option is useful, but if anyone else
> > thinks this might be useful I'd be happy to add the option to FreeBSD.
>
> I think what you need is the RFC2861, however, you probably should
> ignore the "application-limited period" part of RFC2861.

Hummm.  It appears btw, that Linux uses RFC 2861, but has a global knob to
disable it due to applications having problems.  When it is disabled, it
doesn't decay the congestion window at all during idle handling.  That is,
it appears to act the same as if TCP_IGNOREIDLE were enabled.

From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html:

       tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
              If enabled, provide RFC 2861 behavior and time out the congestion
              window after an idle period.  An idle period is defined as the
              current RTO (retransmission timeout).  If disabled, the
              congestion window will not be timed out after an idle period.
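
For illustration only, the three idle-handling behaviors discussed here can
be sketched in C roughly as follows.  This is a simplified sketch, not the
FreeBSD or Linux code: the enum, the function name cwnd_after_idle() and its
parameters are hypothetical, and the RFC 2861 branch only models halving the
window once per RTO of idle time (with the restart window as an assumed
floor).  It leaves out the ssthresh adjustment and the application-limited
case entirely.

```c
#include <stdint.h>

/* Hypothetical policy names for this sketch; not kernel identifiers. */
enum idle_policy {
	IDLE_RESTART,	/* RFC 2581 "SHOULD": fall back to the restart window */
	IDLE_DECAY,	/* RFC 2861 style: halve cwnd once per RTO spent idle */
	IDLE_IGNORE	/* keep cwnd untouched, as TCP_IGNOREIDLE would */
};

/*
 * Return the congestion window (in bytes) to use for the first send
 * after the connection has been idle for idle_ms milliseconds.
 */
static uint32_t
cwnd_after_idle(enum idle_policy policy, uint32_t cwnd, uint32_t init_win,
    uint32_t idle_ms, uint32_t rto_ms)
{
	/* Restart window: the smaller of the initial window and cwnd. */
	uint32_t restart_win = cwnd < init_win ? cwnd : init_win;
	uint32_t n;

	/* Not idle for a full RTO, or idle handling disabled: no change. */
	if (rto_ms == 0 || idle_ms < rto_ms || policy == IDLE_IGNORE)
		return (cwnd);

	if (policy == IDLE_RESTART)
		return (restart_win);

	/* IDLE_DECAY: one halving per elapsed RTO, floored at restart_win. */
	for (n = idle_ms / rto_ms; n > 0 && cwnd > restart_win; n--)
		cwnd /= 2;
	return (cwnd > restart_win ? cwnd : restart_win);
}
```

With a 64000-byte window, a 4380-byte initial window, and 1000 ms of idle
time against a 200 ms RTO, the restart policy drops back to 4380 bytes while
the ignore policy stays at 64000, which is the difference that matters for a
bursty, latency-sensitive stream.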

Also, in this thread on tcpm it appears no one on that list realizes that
there are any implementations which follow the "SHOULD" in RFC 2581 for idle
handling (which is what we do currently):

http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html

So if we were to implement RFC 2861, the new socket option would be equivalent
to setting Linux's 'tcp_slow_start_after_idle' to false, but on a per-socket
basis rather than globally.

-- 
John Baldwin