Date:      Tue, 22 Jan 2013 15:11:02 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        net@freebsd.org
Subject:   [PATCH] Add a new TCP_IGNOREIDLE socket option
Message-ID:  <201301221511.02496.jhb@freebsd.org>

As I mentioned in an earlier thread, I recently had to debug an issue we were 
seeing across a link with a high bandwidth-delay product (both high bandwidth 
and high RTT).  Our specific use case was to use a TCP connection to reliably 
forward a latency-sensitive datagram stream across a WAN connection.  We would 
often see spikes in the latency of individual datagrams.  I eventually tracked 
this down to the connection entering slow start when it would transmit data 
after being idle.  The data stream was quite bursty and would often attempt to 
transmit a burst of data after being idle for far longer than a retransmit 
timeout.
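RFC 5681 section 4.1 says that a connection idle for longer than one RTO restarts with cwnd reduced to the restart window, RW = min(IW, cwnd). With RFC 3390 in effect, IW = min(4*MSS, max(2*MSS, 4380 bytes)). The sketch below works through that arithmetic to show the cost. The helper names are hypothetical. Slow start is idealized so that cwnd doubles every RTT, with no losses, delayed ACKs or ssthresh. The kernel's congestion control module may differ in detail.

```c
#include <stdint.h>

/* RFC 3390 initial window: min(4*MSS, max(2*MSS, 4380 bytes)). */
static uint32_t rfc3390_initial_window(uint32_t mss)
{
    uint32_t two = 2 * mss, four = 4 * mss;
    uint32_t floor_win = two > 4380 ? two : 4380;
    return four < floor_win ? four : floor_win;
}

/* RFC 5681 section 4.1: after more than one RTO of idleness, cwnd is
 * reduced to the restart window RW = min(IW, cwnd). */
static uint32_t cwnd_after_idle(uint32_t cwnd, uint32_t mss)
{
    uint32_t iw = rfc3390_initial_window(mss);
    return cwnd < iw ? cwnd : iw;
}

/* Idealized slow start: cwnd doubles once per RTT.  Counts the RTTs
 * needed to grow from 'from' bytes back to at least 'to' bytes. */
static unsigned rtts_to_recover(uint32_t from, uint32_t to)
{
    unsigned rtts = 0;
    uint64_t w = from;

    if (from == 0)
        return 0; /* degenerate input; avoid looping forever */
    while (w < to) {
        w *= 2;
        rtts++;
    }
    return rtts;
}
```

As an illustrative example, take a 1448-byte MSS and a 1,000,000-byte window. The window collapses to 4380 bytes, and the idealized slow start needs 8 RTTs to grow back past its old size. On a high-RTT path, that is consistent with the latency spikes described above.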

In 7.x we had worked around this in the past by disabling RFC 3390 and jacking 
the slow start window size up via a sysctl.  On 8.x this no longer worked.  
The solution I came up with was to add a new socket option to disable idle 
handling completely.  That is, when an idle connection restarts with this new 
option enabled, it keeps its current congestion window and doesn't enter slow 
start.
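In the patch's tcp_output.c change, this amounts to one extra test before cc_after_idle() is called. The sketch below models that decision in userspace. It uses a hypothetical, trimmed-down stand-in for struct tcpcb (tcpcb_model) that carries only the fields the test reads. The flag values are copied from the patch's tcp_var.h context. Tick-counter wraparound is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the patched tcp_var.h. */
#define TF_IGNOREIDLE 0x004000
#define TF_LASTIDLE   0x040000

/* Hypothetical stand-in for struct tcpcb with only the fields used by
 * the idle test in tcp_output(). */
struct tcpcb_model {
    uint32_t t_flags;
    uint32_t snd_una;   /* oldest unacknowledged sequence number */
    uint32_t snd_max;   /* highest sequence number sent */
    int      t_rcvtime; /* tick of last received segment */
    int      t_rxtcur;  /* current retransmit timeout, in ticks */
};

/* Returns true when the patched test would call cc_after_idle(), which
 * reduces cwnd to the restart window before sending. */
static bool would_restart_after_idle(const struct tcpcb_model *tp, int ticks)
{
    bool idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);

    return !(tp->t_flags & TF_IGNOREIDLE) &&
        idle && ticks - tp->t_rcvtime >= tp->t_rxtcur;
}
```

Only the cc_after_idle() call is skipped. The idle logic that follows it in tcp_output(), which clears TF_LASTIDLE and runs the `if (idle)` block, is unchanged.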

There are only a few cases where such an option is useful, but if anyone else 
thinks this might be useful I'd be happy to add the option to FreeBSD.

Index: share/man/man4/tcp.4
===================================================================
--- share/man/man4/tcp.4	(revision 245742)
+++ share/man/man4/tcp.4	(working copy)
@@ -205,6 +205,18 @@
 in the
 .Sx MIB Variables
 section further down.
+.It Dv TCP_IGNOREIDLE
+If a TCP connection is idle for more than one retransmit timeout,
+it enters slow start when new data is available to transmit.
+This avoids flooding the network with a full window of traffic at line rate.
+It also allows the connection to adjust to changes in network conditions
+that occurred while the connection was idle.  A connection that sends
+bursts of data separated by large idle periods can be permanently stuck in
+slow start as a result.
+The boolean option
+.Dv TCP_IGNOREIDLE
+disables the idle connection handling, allowing connections to maintain the
+existing congestion window when restarting after an idle period.
 .It Dv TCP_NODELAY
 Under most circumstances,
 .Tn TCP
Index: sys/netinet/tcp_var.h
===================================================================
--- sys/netinet/tcp_var.h	(revision 245742)
+++ sys/netinet/tcp_var.h	(working copy)
@@ -230,6 +230,7 @@
 #define	TF_NEEDFIN	0x000800	/* send FIN (implicit state) */
 #define	TF_NOPUSH	0x001000	/* don't push */
 #define	TF_PREVVALID	0x002000	/* saved values for bad rxmit valid */
+#define	TF_IGNOREIDLE	0x004000	/* connection is never idle */
 #define	TF_MORETOCOME	0x010000	/* More data to be appended to sock */
 #define	TF_LQ_OVERFLOW	0x020000	/* listen queue overflow */
 #define	TF_LASTIDLE	0x040000	/* connection was previously idle */
Index: sys/netinet/tcp_output.c
===================================================================
--- sys/netinet/tcp_output.c	(revision 245742)
+++ sys/netinet/tcp_output.c	(working copy)
@@ -206,7 +206,8 @@
 	 * to send, then transmit; otherwise, investigate further.
 	 */
 	idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
-	if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
+	if (!(tp->t_flags & TF_IGNOREIDLE) &&
+	    idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
 		cc_after_idle(tp);
 	tp->t_flags &= ~TF_LASTIDLE;
 	if (idle) {
Index: sys/netinet/tcp.h
===================================================================
--- sys/netinet/tcp.h	(revision 245823)
+++ sys/netinet/tcp.h	(working copy)
@@ -156,6 +156,7 @@
 #define	TCP_NODELAY	1	/* don't delay send to coalesce packets */
 #if __BSD_VISIBLE
 #define	TCP_MAXSEG	2	/* set maximum segment size */
+#define	TCP_IGNOREIDLE	3	/* disable idle connection handling */
 #define TCP_NOPUSH	4	/* don't push last block of write */
 #define TCP_NOOPT	8	/* don't use TCP options */
 #define TCP_MD5SIG	16	/* use MD5 digests (RFC2385) */
Index: sys/netinet/tcp_usrreq.c
===================================================================
--- sys/netinet/tcp_usrreq.c	(revision 245742)
+++ sys/netinet/tcp_usrreq.c	(working copy)
@@ -1354,6 +1354,7 @@
 
 		case TCP_NODELAY:
 		case TCP_NOOPT:
+		case TCP_IGNOREIDLE:
 			INP_WUNLOCK(inp);
 			error = sooptcopyin(sopt, &optval, sizeof optval,
 			    sizeof optval);
@@ -1368,6 +1369,9 @@
 			case TCP_NOOPT:
 				opt = TF_NOOPT;
 				break;
+			case TCP_IGNOREIDLE:
+				opt = TF_IGNOREIDLE;
+				break;
 			default:
 				opt = 0; /* dead code to fool gcc */
 				break;
@@ -1578,6 +1582,11 @@
 			INP_WUNLOCK(inp);
 			error = sooptcopyout(sopt, buf, TCP_CA_NAME_MAX);
 			break;
+		case TCP_IGNOREIDLE:
+			optval = tp->t_flags & TF_IGNOREIDLE;
+			INP_WUNLOCK(inp);
+			error = sooptcopyout(sopt, &optval, sizeof optval);
+			break;
 		default:
 			INP_WUNLOCK(inp);
 			error = ENOPROTOOPT;
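With the patch applied, an application would enable the option on each socket with setsockopt(2). The sketch below assumes the value 3 from the patch's tcp.h. It defines that value only on FreeBSD, because other systems may use the same number for something else (Linux uses 3 for TCP_CORK). The helper names set_ignore_idle() and get_ignore_idle() are hypothetical. On a kernel without the patch, expect the calls to fail, most likely with ENOPROTOOPT.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Assumed value from the patch's tcp.h; only defined on FreeBSD. */
#if defined(__FreeBSD__) && !defined(TCP_IGNOREIDLE)
#define TCP_IGNOREIDLE 3
#endif

/* Enable (on != 0) or disable idle handling on a TCP socket.  Returns 0
 * on success, or -1 with errno set (ENOPROTOOPT when unsupported). */
static int set_ignore_idle(int fd, int on)
{
#ifdef TCP_IGNOREIDLE
    return setsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &on, sizeof(on));
#else
    (void)fd;
    (void)on;
    errno = ENOPROTOOPT;
    return -1;
#endif
}

/* Returns 1 if the option is set, 0 if clear, -1 on error. */
static int get_ignore_idle(int fd)
{
#ifdef TCP_IGNOREIDLE
    int optval = 0;
    socklen_t len = sizeof(optval);

    if (getsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &optval, &len) == -1)
        return -1;
    return optval != 0; /* kernel returns the raw flag bit, not 1 */
#else
    (void)fd;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

The patched getsockopt handler copies out `tp->t_flags & TF_IGNOREIDLE` (0x4000 when set) rather than 1. That is why the getter normalizes the result, and why callers using the raw option should test only for a nonzero value.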

-- 
John Baldwin