From owner-freebsd-net@FreeBSD.ORG  Thu Apr 11 19:45:57 2013
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 94B04908
 for <freebsd-net@freebsd.org>; Thu, 11 Apr 2013 19:45:57 +0000 (UTC)
 (envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
 by mx1.freebsd.org (Postfix) with ESMTP id F01D915F9
 for <freebsd-net@freebsd.org>; Thu, 11 Apr 2013 19:45:56 +0000 (UTC)
Received: (qmail 71924 invoked from network); 11 Apr 2013 20:52:59 -0000
Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2])
 (envelope-sender <andre@freebsd.org>)
 by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
 for <matt@matthewjmiller.net>; 11 Apr 2013 20:52:59 -0000
Message-ID: <516712EE.8010708@freebsd.org>
Date: Thu, 11 Apr 2013 21:45:50 +0200
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:17.0) Gecko/20130307 Thunderbird/17.0.4
MIME-Version: 1.0
To: Matt Miller <matt@matthewjmiller.net>
Subject: Re: RFC 3042 Implementation
References: <CAFc6gu9ynWjeF0J+v_vYeHQNmArJdteH3AaTrHwa1bRVHYtQgg@mail.gmail.com>
In-Reply-To: <CAFc6gu9ynWjeF0J+v_vYeHQNmArJdteH3AaTrHwa1bRVHYtQgg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: FreeBSD Net <freebsd-net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Apr 2013 19:45:57 -0000

On 11.04.2013 20:59, Matt Miller wrote:
> In some of our tests, we noticed some duplicate pure ACKs (not window
> updates), most of which the duplicates were coming from this tcp_output()
> call in tcp_do_segment() (line 2534):
>
> 2508                                 } else if (V_tcp_do_rfc3042) {
> 2509                                         cc_ack_received(tp, th,
> CC_DUPACK);
> 2510                                         u_long oldcwnd = tp->snd_cwnd;
> 2511                                         tcp_seq oldsndmax =
> tp->snd_max;
> 2512                                         u_int sent;
> 2513
> 2514                                         KASSERT(tp->t_dupacks == 1 ||
> 2515                                             tp->t_dupacks == 2,
> 2516                                             ("%s: dupacks not 1 or 2",
> 2517                                             __func__));
> 2518                                         if (tp->t_dupacks == 1)
> 2519                                                 tp->snd_limited = 0;
> 2520                                         tp->snd_cwnd =
> 2521                                             (tp->snd_nxt -
> tp->snd_una) +
> 2522                                             (tp->t_dupacks -
> tp->snd_limited) *
> 2523                                             tp->t_maxseg;
> 2524                                         if ((thflags & TH_FIN) &&
> 2525
> (TCPS_HAVERCVDFIN(tp->t_state) == 0)) {
> 2526                                                 /*
> 2527                                                  * If its a fin we
> need to process
> 2528                                                  * it to avoid a race
> where both
> 2529                                                  * sides enter
> FIN-WAIT and send FIN|ACK
> 2530                                                  * at the same time.
> 2531                                                  */
> 2532                                                 break;
> 2533                                         }
> 2534                                         (void) tcp_output(tp);
>
> I added some instrumentation here to count how many time the following is
> zero prior to calling tcp_output():
>
> so->so_snd.sb_cc - ((tp->snd_nxt - tp->snd_una)
>
> And, in our tests, it was like 97% of the time.
>
> It looks like the intent of the RFC is to only send one or two unsent data
> segments here, not pure ACKs.  And this subsequent standard seems to
> clarify that new data should be available for transmission:
>
> http://tools.ietf.org/html/rfc5681
>
> <quote>
>         On the first and second duplicate ACKs received at a sender, a
>         TCP SHOULD send a segment of previously unsent data per [RFC3042]
>         provided that the receiver's advertised window allows, the total
>         FlightSize would remain less than or equal to cwnd plus 2*SMSS,
>         and that new data is available for transmission.
> </quote>
>
> So, this is a detailed way of asking: do we need a check here to make sure
> there is new data to send prior to calling tcp_output()?

Yes.  Based on your input the attached patch should fix it (untested).

-- 
Andre

Index: tcp_input.c
===================================================================
--- tcp_input.c (revision 249375)
+++ tcp_input.c (working copy)
@@ -2564,6 +2564,7 @@
                                         u_long oldcwnd = tp->snd_cwnd;
                                         tcp_seq oldsndmax = tp->snd_max;
                                         u_int sent;
+                                       int avail;

                                         KASSERT(tp->t_dupacks == 1 ||
                                             tp->t_dupacks == 2,
@@ -2585,7 +2586,17 @@
                                                  */
                                                 break;
                                         }
-                                       (void) tcp_output(tp);
+                                       /*
+                                        * Only call tcp_output when there
+                                        * is new data available to be sent.
+                                        * Otherwise we send pure ACKs.
+                                        */
+                                       SOCKBUF_LOCK(&so->so_snd);
+                                       avail = so->so_snd.sb_cc -
+                                           (tp->snd_nxt - tp->snd_una);
+                                       SOCKBUF_UNLOCK(&so->so_snd);
+                                       if (avail > 0)
+                                               (void) tcp_output(tp);
                                         sent = tp->snd_max - oldsndmax;
                                         if (sent > tp->t_maxseg) {
                                                 KASSERT((tp->t_dupacks == 2 &&