From: Alfred Perlstein
Date: Tue, 22 Jan 2013 12:35:40 -0800
To: John Baldwin
Cc: net@freebsd.org
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Message-ID: <50FEF81C.1070002@mu.org>
In-Reply-To: <201301221511.02496.jhb@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

On 1/22/13 12:11 PM, John Baldwin wrote:
> As I mentioned in an earlier thread, I recently had to debug an issue we were
> seeing across a link with a high bandwidth-delay product (both high bandwidth
> and high RTT). Our specific use case was to use a TCP connection to reliably
> forward a latency-sensitive datagram stream across a WAN connection. We would
> often see spikes in the latency of individual datagrams. I eventually tracked
> this down to the connection entering slow start when it would transmit data
> after being idle.
> The data stream was quite bursty and would often attempt to
> transmit a burst of data after being idle for far longer than a retransmit
> timeout.
>
> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
> the slow start window size up via a sysctl. On 8.x this no longer worked.
> The solution I came up with was to add a new socket option to disable idle
> handling completely. That is, when an idle connection restarts with this new
> option enabled, it keeps its current congestion window and doesn't enter slow
> start.
>
> There are only a few cases where such an option is useful, but if anyone else
> thinks this might be useful I'd be happy to add the option to FreeBSD.

This looks good, but it almost sounds like a bug for TCP to be doing this
anyhow.

Why would one want this behavior?

Wouldn't it make sense to keep the window large until there was a problem,
rather than unconditionally chop it down? It's almost as if TCP is afraid
that you might wind up swapping out a 10gig interface for a modem. I'm just
not getting it (probably a simple oversight on my part).

What do you think about also adding a sysctl to turn this on or off globally
by default?

-Alfred

>
> Index: share/man/man4/tcp.4
> ===================================================================
> --- share/man/man4/tcp.4 (revision 245742)
> +++ share/man/man4/tcp.4 (working copy)
> @@ -205,6 +205,18 @@
>  in the
>  .Sx MIB Variables
>  section further down.
> +.It Dv TCP_IGNOREIDLE
> +If a TCP connection is idle for more than one retransmit timeout,
> +it enters slow start when new data is available to transmit.
> +This avoids flooding the network with a full window of traffic at line rate.
> +It also allows the connection to adjust to changes to network conditions
> +that occurred while the connection was idle. A connection that sends
> +bursts of data separated by large idle periods can be permanently stuck in
> +slow start as a result.
> +The boolean option
> +.Dv TCP_IGNOREIDLE
> +disables the idle connection handling allowing connections to maintain the
> +existing congestion window when restarting after an idle period.
>  .It Dv TCP_NODELAY
>  Under most circumstances,
>  .Tn TCP
> Index: sys/netinet/tcp_var.h
> ===================================================================
> --- sys/netinet/tcp_var.h (revision 245742)
> +++ sys/netinet/tcp_var.h (working copy)
> @@ -230,6 +230,7 @@
>  #define TF_NEEDFIN      0x000800 /* send FIN (implicit state) */
>  #define TF_NOPUSH       0x001000 /* don't push */
>  #define TF_PREVVALID    0x002000 /* saved values for bad rxmit valid */
> +#define TF_IGNOREIDLE   0x004000 /* connection is never idle */
>  #define TF_MORETOCOME   0x010000 /* More data to be appended to sock */
>  #define TF_LQ_OVERFLOW  0x020000 /* listen queue overflow */
>  #define TF_LASTIDLE     0x040000 /* connection was previously idle */
> Index: sys/netinet/tcp_output.c
> ===================================================================
> --- sys/netinet/tcp_output.c (revision 245742)
> +++ sys/netinet/tcp_output.c (working copy)
> @@ -206,7 +206,8 @@
>          * to send, then transmit; otherwise, investigate further.
>          */
>         idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
> -       if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
> +       if (!(tp->t_flags & TF_IGNOREIDLE) &&
> +           idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
>                 cc_after_idle(tp);
>         tp->t_flags &= ~TF_LASTIDLE;
>         if (idle) {
> Index: sys/netinet/tcp.h
> ===================================================================
> --- sys/netinet/tcp.h (revision 245823)
> +++ sys/netinet/tcp.h (working copy)
> @@ -156,6 +156,7 @@
>  #define TCP_NODELAY     1 /* don't delay send to coalesce packets */
>  #if __BSD_VISIBLE
>  #define TCP_MAXSEG      2 /* set maximum segment size */
> +#define TCP_IGNOREIDLE  3 /* disable idle connection handling */
>  #define TCP_NOPUSH      4 /* don't push last block of write */
>  #define TCP_NOOPT       8 /* don't use TCP options */
>  #define TCP_MD5SIG      16 /* use MD5 digests (RFC2385) */
> Index: sys/netinet/tcp_usrreq.c
> ===================================================================
> --- sys/netinet/tcp_usrreq.c (revision 245742)
> +++ sys/netinet/tcp_usrreq.c (working copy)
> @@ -1354,6 +1354,7 @@
>
>                 case TCP_NODELAY:
>                 case TCP_NOOPT:
> +               case TCP_IGNOREIDLE:
>                         INP_WUNLOCK(inp);
>                         error = sooptcopyin(sopt, &optval, sizeof optval,
>                             sizeof optval);
> @@ -1368,6 +1369,9 @@
>                         case TCP_NOOPT:
>                                 opt = TF_NOOPT;
>                                 break;
> +                       case TCP_IGNOREIDLE:
> +                               opt = TF_IGNOREIDLE;
> +                               break;
>                         default:
>                                 opt = 0; /* dead code to fool gcc */
>                                 break;
> @@ -1578,6 +1582,11 @@
>                         INP_WUNLOCK(inp);
>                         error = sooptcopyout(sopt, buf, TCP_CA_NAME_MAX);
>                         break;
> +               case TCP_IGNOREIDLE:
> +                       optval = tp->t_flags & TF_IGNOREIDLE;
> +                       INP_WUNLOCK(inp);
> +                       error = sooptcopyout(sopt, &optval, sizeof optval);
> +                       break;
>                 default:
>                         INP_WUNLOCK(inp);
>                         error = ENOPROTOOPT;
>
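
For anyone skimming the patch, the tcp_output.c hunk seems to boil down to one predicate: cc_after_idle() is skipped whenever TF_IGNOREIDLE is set, and otherwise runs under the same condition as before. Below is a minimal user-space sketch of that predicate only. The flag values and field names are copied from the quoted hunks, but struct conn_model and calls_cc_after_idle() are hypothetical stand-ins invented for illustration; this is not kernel code and it does not model the congestion window itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the quoted tcp_var.h hunk. */
#define TF_IGNOREIDLE 0x004000 /* connection is never idle */
#define TF_LASTIDLE   0x040000 /* connection was previously idle */

/*
 * Hypothetical, cut-down stand-in for struct tcpcb: only the fields
 * that the patched condition in tcp_output.c reads.
 */
struct conn_model {
    uint32_t t_flags;   /* TF_* flags */
    uint32_t snd_max;   /* highest sequence number sent */
    uint32_t snd_una;   /* oldest unacknowledged sequence number */
    int      t_rcvtime; /* tick count when a segment was last received */
    int      t_rxtcur;  /* current retransmit timeout, in ticks */
};

/*
 * Mirrors the patched condition: returns true when the quoted code
 * would call cc_after_idle(), i.e. when the connection has nothing
 * in flight (or was previously idle), at least one retransmit
 * timeout has passed since the last receive, and TF_IGNOREIDLE is
 * not set.
 */
bool
calls_cc_after_idle(const struct conn_model *tp, int ticks)
{
    bool idle = (tp->t_flags & TF_LASTIDLE) ||
        (tp->snd_max == tp->snd_una);

    return !(tp->t_flags & TF_IGNOREIDLE) &&
        idle && ticks - tp->t_rcvtime >= tp->t_rxtcur;
}
```

With t_rcvtime = 1000 and t_rxtcur = 200, an idle connection checked at ticks = 1500 would take the cc_after_idle() path; setting TF_IGNOREIDLE on the same connection makes the predicate false, so the existing congestion window is kept.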