Date:      Tue, 22 Jan 2013 12:35:40 -0800
From:      Alfred Perlstein <bright@mu.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        net@freebsd.org
Subject:   Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Message-ID:  <50FEF81C.1070002@mu.org>
In-Reply-To: <201301221511.02496.jhb@freebsd.org>
References:  <201301221511.02496.jhb@freebsd.org>

On 1/22/13 12:11 PM, John Baldwin wrote:
> As I mentioned in an earlier thread, I recently had to debug an issue we were
> seeing across a link with a high bandwidth-delay product (both high bandwidth
> and high RTT).  Our specific use case was to use a TCP connection to reliably
> forward a latency-sensitive datagram stream across a WAN connection.  We would
> often see spikes in the latency of individual datagrams.  I eventually tracked
> this down to the connection entering slow start when it would transmit data
> after being idle.  The data stream was quite bursty and would often attempt to
> transmit a burst of data after being idle for far longer than a retransmit
> timeout.
>
> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
> the slow start window size up via a sysctl.  On 8.x this no longer worked.
> The solution I came up with was to add a new socket option to disable idle
> handling completely.  That is, when an idle connection restarts with this new
> option enabled, it keeps its current congestion window and doesn't enter slow
> start.
>
> There are only a few cases where such an option is useful, but if anyone else
> thinks this might be useful I'd be happy to add the option to FreeBSD.
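
For reference, here is a minimal userland sketch of the idle-restart rule described above. It assumes the restart window from RFC 5681 section 4.1 (RW = min(IW, cwnd)) and the RFC 3390 initial window. The function names, the units, and the two-segment fallback when RFC 3390 is disabled are assumptions for illustration. This is not the FreeBSD cc_after_idle() code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userland model of restarting an idle TCP connection.
 * Not the FreeBSD kernel code; names and units are illustrative.
 */

/* RFC 3390 initial window in bytes: min(4*MSS, max(2*MSS, 4380)). */
static uint32_t initial_window(uint32_t mss, bool rfc3390)
{
    if (!rfc3390)
        return 2 * mss;  /* assumption: two-segment window when RFC 3390 is off */

    uint32_t floor_bytes = (2 * mss > 4380) ? 2 * mss : 4380;
    uint32_t cap_bytes = 4 * mss;
    return (cap_bytes < floor_bytes) ? cap_bytes : floor_bytes;
}

/*
 * Congestion window to use when new data becomes ready to send.
 *   idle:        no unacknowledged data outstanding
 *   idle_time:   time since the last segment was received
 *   rxtcur:      current retransmit timeout, same units as idle_time
 *   ignore_idle: models the proposed TCP_IGNOREIDLE option
 */
static uint32_t cwnd_on_restart(uint32_t cwnd, uint32_t mss, bool rfc3390,
                                bool idle, int idle_time, int rxtcur,
                                bool ignore_idle)
{
    /* Idle handling disabled, or not idle for a full RTO: keep the window. */
    if (ignore_idle || !idle || idle_time < rxtcur)
        return cwnd;

    /* RFC 5681 section 4.1: restart window RW = min(IW, cwnd). */
    uint32_t rw = initial_window(mss, rfc3390);
    return (rw < cwnd) ? rw : cwnd;
}
```

With a 1460-byte MSS, a connection whose window had opened to 1,000,000 bytes restarts at 4380 bytes after being idle for longer than one retransmit timeout. With the ignore flag set, it keeps the full 1,000,000 bytes.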

This looks good, but it almost sounds like a bug for TCP to be doing 
this anyhow.

Why would one want this behavior?

Wouldn't it make sense to keep the window large until there was a
problem, rather than unconditionally chopping it down?  It almost sounds
as if TCP is afraid you might swap out a 10gig interface for a modem
while the connection is idle.  I'm just not getting it (probably a
simple oversight on my part).

What do you think about also adding a sysctl that sets the global
default (on or off)?

-Alfred

>
> Index: share/man/man4/tcp.4
> ===================================================================
> --- share/man/man4/tcp.4	(revision 245742)
> +++ share/man/man4/tcp.4	(working copy)
> @@ -205,6 +205,18 @@
>   in the
>   .Sx MIB Variables
>   section further down.
> +.It Dv TCP_IGNOREIDLE
> +If a TCP connection is idle for more than one retransmit timeout,
> +it enters slow start when new data is available to transmit.
> +This avoids flooding the network with a full window of traffic at line rate.
> +It also allows the connection to adjust to changes to network conditions
> +that occurred while the connection was idle.  A connection that sends
> +bursts of data separated by large idle periods can be permanently stuck in
> +slow start as a result.
> +The boolean option
> +.Dv TCP_IGNOREIDLE
> +disables the idle connection handling, allowing connections to maintain the
> +existing congestion window when restarting after an idle period.
>   .It Dv TCP_NODELAY
>   Under most circumstances,
>   .Tn TCP
> Index: sys/netinet/tcp_var.h
> ===================================================================
> --- sys/netinet/tcp_var.h	(revision 245742)
> +++ sys/netinet/tcp_var.h	(working copy)
> @@ -230,6 +230,7 @@
>   #define	TF_NEEDFIN	0x000800	/* send FIN (implicit state) */
>   #define	TF_NOPUSH	0x001000	/* don't push */
>   #define	TF_PREVVALID	0x002000	/* saved values for bad rxmit valid */
> +#define	TF_IGNOREIDLE	0x004000	/* connection is never idle */
>   #define	TF_MORETOCOME	0x010000	/* More data to be appended to sock */
>   #define	TF_LQ_OVERFLOW	0x020000	/* listen queue overflow */
>   #define	TF_LASTIDLE	0x040000	/* connection was previously idle */
> Index: sys/netinet/tcp_output.c
> ===================================================================
> --- sys/netinet/tcp_output.c	(revision 245742)
> +++ sys/netinet/tcp_output.c	(working copy)
> @@ -206,7 +206,8 @@
>   	 * to send, then transmit; otherwise, investigate further.
>   	 */
>   	idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
> -	if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
> +	if (!(tp->t_flags & TF_IGNOREIDLE) &&
> +	    idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
>   		cc_after_idle(tp);
>   	tp->t_flags &= ~TF_LASTIDLE;
>   	if (idle) {
> Index: sys/netinet/tcp.h
> ===================================================================
> --- sys/netinet/tcp.h	(revision 245823)
> +++ sys/netinet/tcp.h	(working copy)
> @@ -156,6 +156,7 @@
>   #define	TCP_NODELAY	1	/* don't delay send to coalesce packets */
>   #if __BSD_VISIBLE
>   #define	TCP_MAXSEG	2	/* set maximum segment size */
> +#define	TCP_IGNOREIDLE	3	/* disable idle connection handling */
>   #define TCP_NOPUSH	4	/* don't push last block of write */
>   #define TCP_NOOPT	8	/* don't use TCP options */
>   #define TCP_MD5SIG	16	/* use MD5 digests (RFC2385) */
> Index: sys/netinet/tcp_usrreq.c
> ===================================================================
> --- sys/netinet/tcp_usrreq.c	(revision 245742)
> +++ sys/netinet/tcp_usrreq.c	(working copy)
> @@ -1354,6 +1354,7 @@
>   
>   		case TCP_NODELAY:
>   		case TCP_NOOPT:
> +		case TCP_IGNOREIDLE:
>   			INP_WUNLOCK(inp);
>   			error = sooptcopyin(sopt, &optval, sizeof optval,
>   			    sizeof optval);
> @@ -1368,6 +1369,9 @@
>   			case TCP_NOOPT:
>   				opt = TF_NOOPT;
>   				break;
> +			case TCP_IGNOREIDLE:
> +				opt = TF_IGNOREIDLE;
> +				break;
>   			default:
>   				opt = 0; /* dead code to fool gcc */
>   				break;
> @@ -1578,6 +1582,11 @@
>   			INP_WUNLOCK(inp);
>   			error = sooptcopyout(sopt, buf, TCP_CA_NAME_MAX);
>   			break;
> +		case TCP_IGNOREIDLE:
> +			optval = tp->t_flags & TF_IGNOREIDLE;
> +			INP_WUNLOCK(inp);
> +			error = sooptcopyout(sopt, &optval, sizeof optval);
> +			break;
>   		default:
>   			INP_WUNLOCK(inp);
>   			error = ENOPROTOOPT;
>
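
The following sketch shows how an application might toggle the proposed option from userland. It assumes a kernel with this patch applied and an installed <netinet/tcp.h> that defines TCP_IGNOREIDLE. The helper names set_ignore_idle() and get_ignore_idle() are hypothetical. The option number is deliberately not hard-coded, because on Linux the value 3 at IPPROTO_TCP is TCP_CORK. The getsockopt path in the patch copies out the raw TF_IGNOREIDLE bit rather than 1, so the getter normalizes any nonzero value to 1.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Enable (on != 0) or disable TCP_IGNOREIDLE on a TCP socket.
 * If the installed headers lack the option, fail with ENOPROTOOPT
 * instead of guessing its number.
 */
static int set_ignore_idle(int fd, int on)
{
#ifdef TCP_IGNOREIDLE
    int optval = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &optval, sizeof optval);
#else
    (void)fd;
    (void)on;
    errno = ENOPROTOOPT;
    return -1;
#endif
}

/*
 * Returns 1 if the option is set, 0 if not, or -1 with errno set.
 * The kernel reports the raw flag bit, so any nonzero value means "on".
 */
static int get_ignore_idle(int fd)
{
#ifdef TCP_IGNOREIDLE
    int optval = 0;
    socklen_t len = sizeof optval;
    if (getsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &optval, &len) == -1)
        return -1;
    return optval != 0;
#else
    (void)fd;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

In the use case above, the forwarding application would call set_ignore_idle(fd, 1) on its connected socket before it starts sending bursts.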
