Date:      Wed, 23 Jan 2013 14:33:27 +0800
From:      Sepherosa Ziehau <sepherosa@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        "freebsd-net@freebsd.org" <net@freebsd.org>
Subject:   Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Message-ID:  <CAMOc5czyB=c0fQ%2BHnYdZf0Ym7wPQsXzR-b81yWg%2BLwziZeCQOA@mail.gmail.com>
In-Reply-To: <201301221511.02496.jhb@freebsd.org>
References:  <201301221511.02496.jhb@freebsd.org>

On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin <jhb@freebsd.org> wrote:
> As I mentioned in an earlier thread, I recently had to debug an issue we were
> seeing across a link with a high bandwidth-delay product (both high bandwidth
> and high RTT).  Our specific use case was to use a TCP connection to reliably
> forward a latency-sensitive datagram stream across a WAN connection.  We would
> often see spikes in the latency of individual datagrams.  I eventually tracked
> this down to the connection entering slow start when it would transmit data
> after being idle.  The data stream was quite bursty and would often attempt to
> transmit a burst of data after being idle for far longer than a retransmit
> timeout.
>
> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
> the slow start window size up via a sysctl.  On 8.x this no longer worked.
> The solution I came up with was to add a new socket option to disable idle
> handling completely.  That is, when an idle connection restarts with this new
> option enabled, it keeps its current congestion window and doesn't enter slow
> start.
>
> There are only a few cases where such an option is useful, but if anyone else
> thinks this might be useful I'd be happy to add the option to FreeBSD.

I think what you need is RFC 2861; however, you should probably
ignore the "application-limited period" part of it.
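
To make the suggestion concrete, here is a minimal sketch of the idle-period
part of RFC 2861 only: cwnd is halved once per full RTO spent idle, and
ssthresh remembers 3/4 of the old cwnd so the connection can slow-start back
quickly. The struct, the function name and the min_cwnd floor are
hypothetical names for illustration, not FreeBSD code, and the receiver
window clamp from the RFC is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified congestion state; not FreeBSD's struct tcpcb. */
struct cwv_state {
	uint32_t cwnd;		/* congestion window, in bytes */
	uint32_t ssthresh;	/* slow start threshold, in bytes */
};

/*
 * Decay cwnd after an idle period (sketch of the RFC 2861 idle case).
 * cwnd is halved once for every full RTO the sender was idle, but a
 * window larger than min_cwnd is never pushed below min_cwnd.
 */
static void
cwv_after_idle(struct cwv_state *cs, uint32_t idle_ticks, uint32_t rto_ticks,
    uint32_t min_cwnd)
{
	uint32_t i, n, saved;

	assert(rto_ticks > 0);
	if (idle_ticks < rto_ticks)
		return;		/* not idle long enough; leave state alone */

	/* Remember 3/4 of the previous cwnd (written to avoid overflow). */
	saved = cs->cwnd - cs->cwnd / 4;
	if (cs->ssthresh < saved)
		cs->ssthresh = saved;

	n = idle_ticks / rto_ticks;
	for (i = 0; i < n && cs->cwnd > min_cwnd; i++) {
		cs->cwnd /= 2;
		if (cs->cwnd < min_cwnd)
			cs->cwnd = min_cwnd;
	}
}
```

With this approach a short idle gap costs only part of the window, while a
long one still decays toward the floor instead of keeping the full window.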

Best Regards,
sephe

>
> Index: share/man/man4/tcp.4
> ===================================================================
> --- share/man/man4/tcp.4        (revision 245742)
> +++ share/man/man4/tcp.4        (working copy)
> @@ -205,6 +205,18 @@
>  in the
>  .Sx MIB Variables
>  section further down.
> +.It Dv TCP_IGNOREIDLE
> +If a TCP connection is idle for more than one retransmit timeout,
> +it enters slow start when new data is available to transmit.
> +This avoids flooding the network with a full window of traffic at line rate.
> +It also allows the connection to adjust to changes to network conditions
> +that occurred while the connection was idle.  A connection that sends
> +bursts of data separated by large idle periods can be permanently stuck in
> +slow start as a result.
> +The boolean option
> +.Dv TCP_IGNOREIDLE
> +disables the idle connection handling allowing connections to maintain the
> +existing congestion window when restarting after an idle period.
>  .It Dv TCP_NODELAY
>  Under most circumstances,
>  .Tn TCP
> Index: sys/netinet/tcp_var.h
> ===================================================================
> --- sys/netinet/tcp_var.h       (revision 245742)
> +++ sys/netinet/tcp_var.h       (working copy)
> @@ -230,6 +230,7 @@
>  #define        TF_NEEDFIN      0x000800        /* send FIN (implicit state) */
>  #define        TF_NOPUSH       0x001000        /* don't push */
>  #define        TF_PREVVALID    0x002000        /* saved values for bad rxmit valid */
> +#define        TF_IGNOREIDLE   0x004000        /* connection is never idle */
>  #define        TF_MORETOCOME   0x010000        /* More data to be appended to sock */
>  #define        TF_LQ_OVERFLOW  0x020000        /* listen queue overflow */
>  #define        TF_LASTIDLE     0x040000        /* connection was previously idle */
> Index: sys/netinet/tcp_output.c
> ===================================================================
> --- sys/netinet/tcp_output.c    (revision 245742)
> +++ sys/netinet/tcp_output.c    (working copy)
> @@ -206,7 +206,8 @@
>          * to send, then transmit; otherwise, investigate further.
>          */
>         idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
> -       if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
> +       if (!(tp->t_flags & TF_IGNOREIDLE) &&
> +           idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
>                 cc_after_idle(tp);
>         tp->t_flags &= ~TF_LASTIDLE;
>         if (idle) {
> Index: sys/netinet/tcp.h
> ===================================================================
> --- sys/netinet/tcp.h   (revision 245823)
> +++ sys/netinet/tcp.h   (working copy)
> @@ -156,6 +156,7 @@
>  #define        TCP_NODELAY     1       /* don't delay send to coalesce packets */
>  #if __BSD_VISIBLE
>  #define        TCP_MAXSEG      2       /* set maximum segment size */
> +#define        TCP_IGNOREIDLE  3       /* disable idle connection handling */
>  #define TCP_NOPUSH     4       /* don't push last block of write */
>  #define TCP_NOOPT      8       /* don't use TCP options */
>  #define TCP_MD5SIG     16      /* use MD5 digests (RFC2385) */
> Index: sys/netinet/tcp_usrreq.c
> ===================================================================
> --- sys/netinet/tcp_usrreq.c    (revision 245742)
> +++ sys/netinet/tcp_usrreq.c    (working copy)
> @@ -1354,6 +1354,7 @@
>
>                 case TCP_NODELAY:
>                 case TCP_NOOPT:
> +               case TCP_IGNOREIDLE:
>                         INP_WUNLOCK(inp);
>                         error = sooptcopyin(sopt, &optval, sizeof optval,
>                             sizeof optval);
> @@ -1368,6 +1369,9 @@
>                         case TCP_NOOPT:
>                                 opt = TF_NOOPT;
>                                 break;
> +                       case TCP_IGNOREIDLE:
> +                               opt = TF_IGNOREIDLE;
> +                               break;
>                         default:
>                                 opt = 0; /* dead code to fool gcc */
>                                 break;
> @@ -1578,6 +1582,11 @@
>                         INP_WUNLOCK(inp);
>                         error = sooptcopyout(sopt, buf, TCP_CA_NAME_MAX);
>                         break;
> +               case TCP_IGNOREIDLE:
> +                       optval = tp->t_flags & TF_IGNOREIDLE;
> +                       INP_WUNLOCK(inp);
> +                       error = sooptcopyout(sopt, &optval, sizeof optval);
> +                       break;
>                 default:
>                         INP_WUNLOCK(inp);
>                         error = ENOPROTOOPT;
>
> --
> John Baldwin
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
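
For reference, the condition the tcp_output.c hunk changes can be modelled in
isolation as below. This is only a sketch: struct idle_view and
restarts_after_idle() are hypothetical stand-ins for the few tcpcb fields the
check reads, the flag values are copied from the quoted tcp_var.h hunk, and
the tick arithmetic is simplified to plain ints.

```c
#include <stdbool.h>
#include <stdint.h>

#define	TF_IGNOREIDLE	0x004000	/* value proposed in the patch */
#define	TF_LASTIDLE	0x040000	/* value shown in the quoted tcp_var.h */

/* Hypothetical stand-in for the tcpcb fields used by the idle check. */
struct idle_view {
	uint32_t t_flags;
	uint32_t snd_max;	/* highest sequence number sent */
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	int t_rcvtime;		/* tick count when data was last received */
	int t_rxtcur;		/* current retransmit timeout, in ticks */
};

/*
 * Returns true when the patched code would call cc_after_idle():
 * the option is off, nothing is in flight (or the connection was
 * previously idle), and at least one RTO has passed since t_rcvtime.
 */
static bool
restarts_after_idle(const struct idle_view *tp, int ticks)
{
	bool idle = (tp->t_flags & TF_LASTIDLE) ||
	    (tp->snd_max == tp->snd_una);

	return (!(tp->t_flags & TF_IGNOREIDLE) &&
	    idle && ticks - tp->t_rcvtime >= tp->t_rxtcur);
}
```

With TF_IGNOREIDLE set the function is false regardless of how long the
connection sat idle, which is the all-or-nothing behaviour of the option.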



--
Tomorrow Will Never Die


