From: Steven Hartland <steven@multiplay.co.uk>
Subject: Re: svn commit: r316676 - in head/sys/netinet: . tcp_stacks
To: Julian Elischer, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Date: Mon, 10 Apr 2017 16:17:59 +0100
List-Id: SVN commit messages for the src tree for head/-current

I don't tend to MFC to 10.x now, but do agree that, given the impact, for
this one it should be done.

The fix is a little different, due to code restructuring in 11 / head, but
I do have a 10.x version already.

    Regards
    Steve

On 10/04/2017 15:51, Julian Elischer wrote:
> If possible MFC to 10 too would be nice..
> thanks
>
> On 10/4/17 4:19 pm, Steven Hartland wrote:
>> Author: smh
>> Date: Mon Apr 10 08:19:35 2017
>> New Revision: 316676
>> URL: https://svnweb.freebsd.org/changeset/base/316676
>>
>> Log:
>>   Use estimated RTT for receive buffer auto resizing instead of timestamps
>>
>>   Switched from using timestamps to RTT estimates when performing TCP receive
>>   buffer auto resizing, as not all hosts support / enable TCP timestamps.
>>
>>   Disabled reset of receive buffer auto scaling when not in bulk receive mode,
>>   which gives an extra 20% performance increase.
>>
>>   Also extracted auto resizing to a common method shared between standard and
>>   fastpath modules.
>>
>>   With this AWS S3 downloads at ~17ms latency on a 1Gbps connection jump from
>>   ~3MB/s to ~100MB/s using the default settings.
>>
>>   Reviewed by:    lstewart, gnn
>>   MFC after:      2 weeks
>>   Relnotes:       Yes
>>   Sponsored by:   Multiplay
>>   Differential Revision:  https://reviews.freebsd.org/D9668
>>
>> Modified:
>>   head/sys/netinet/in_kdtrace.c
>>   head/sys/netinet/in_kdtrace.h
>>   head/sys/netinet/tcp_input.c
>>   head/sys/netinet/tcp_output.c
>>   head/sys/netinet/tcp_stacks/fastpath.c
>>   head/sys/netinet/tcp_var.h
>>
>> Modified: head/sys/netinet/in_kdtrace.c
>> ==============================================================================
>> --- head/sys/netinet/in_kdtrace.c  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/in_kdtrace.c  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -132,6 +132,14 @@ SDT_PROBE_DEFINE6_XLATE(tcp, , , state__
>>      "void *", "void *",
>>      "int", "tcplsinfo_t *");
>>
>> +SDT_PROBE_DEFINE6_XLATE(tcp, , , receive__autoresize,
>> +    "void *", "void *",
>> +    "struct tcpcb *", "csinfo_t *",
>> +    "struct mbuf *", "ipinfo_t *",
>> +    "struct tcpcb *", "tcpsinfo_t *" ,
>> +    "struct tcphdr *", "tcpinfoh_t *",
>> +    "int", "int");
>> +
>>  SDT_PROBE_DEFINE5_XLATE(udp, , , receive,
>>      "void *", "pktinfo_t *",
>>      "struct inpcb *", "csinfo_t *",
>>
>> Modified: head/sys/netinet/in_kdtrace.h
>> ==============================================================================
>> --- head/sys/netinet/in_kdtrace.h  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/in_kdtrace.h  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -65,6 +65,7 @@ SDT_PROBE_DECLARE(tcp, , , debug__input)
>>  SDT_PROBE_DECLARE(tcp, , , debug__output);
>>  SDT_PROBE_DECLARE(tcp, , , debug__user);
>>  SDT_PROBE_DECLARE(tcp, , , debug__drop);
>> +SDT_PROBE_DECLARE(tcp, , , receive__autoresize);
>>  SDT_PROBE_DECLARE(udp, , , receive);
>>  SDT_PROBE_DECLARE(udp, , , send);
>>
>> Modified: head/sys/netinet/tcp_input.c
>> ==============================================================================
>> --- head/sys/netinet/tcp_input.c  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/tcp_input.c  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -1486,6 +1486,68 @@ drop:
>>  	return (IPPROTO_DONE);
>>  }
>>
>> +/*
>> + * Automatic sizing of receive socket buffer.  Often the send
>> + * buffer size is not optimally adjusted to the actual network
>> + * conditions at hand (delay bandwidth product).  Setting the
>> + * buffer size too small limits throughput on links with high
>> + * bandwidth and high delay (eg. trans-continental/oceanic links).
>> + *
>> + * On the receive side the socket buffer memory is only rarely
>> + * used to any significant extent.  This allows us to be much
>> + * more aggressive in scaling the receive socket buffer.  For
>> + * the case that the buffer space is actually used to a large
>> + * extent and we run out of kernel memory we can simply drop
>> + * the new segments; TCP on the sender will just retransmit it
>> + * later.  Setting the buffer size too big may only consume too
>> + * much kernel memory if the application doesn't read() from
>> + * the socket or packet loss or reordering makes use of the
>> + * reassembly queue.
>> + *
>> + * The criteria to step up the receive buffer one notch are:
>> + *  1. Application has not set receive buffer size with
>> + *     SO_RCVBUF.  Setting SO_RCVBUF clears SB_AUTOSIZE.
>> + *  2. the number of bytes received during the time it takes
>> + *     one timestamp to be reflected back to us (the RTT);
>> + *  3. received bytes per RTT is within seven eighth of the
>> + *     current socket buffer size;
>> + *  4. receive buffer size has not hit maximal automatic size;
>> + *
>> + * This algorithm does one step per RTT at most and only if
>> + * we receive a bulk stream w/o packet losses or reorderings.
>> + * Shrinking the buffer during idle times is not necessary as
>> + * it doesn't consume any memory when idle.
>> + *
>> + * TODO: Only step up if the application is actually serving
>> + * the buffer to better manage the socket buffer resources.
>> + */
>> +int
>> +tcp_autorcvbuf(struct mbuf *m, struct tcphdr *th, struct socket *so,
>> +    struct tcpcb *tp, int tlen)
>> +{
>> +	int newsize = 0;
>> +
>> +	if (V_tcp_do_autorcvbuf && (so->so_rcv.sb_flags & SB_AUTOSIZE) &&
>> +	    tp->t_srtt != 0 && tp->rfbuf_ts != 0 &&
>> +	    TCP_TS_TO_TICKS(tcp_ts_getticks() - tp->rfbuf_ts) >
>> +	    (tp->t_srtt >> TCP_RTT_SHIFT)) {
>> +		if (tp->rfbuf_cnt > (so->so_rcv.sb_hiwat / 8 * 7) &&
>> +		    so->so_rcv.sb_hiwat < V_tcp_autorcvbuf_max) {
>> +			newsize = min(so->so_rcv.sb_hiwat +
>> +			    V_tcp_autorcvbuf_inc, V_tcp_autorcvbuf_max);
>> +		}
>> +		TCP_PROBE6(receive__autoresize, NULL, tp, m, tp, th, newsize);
>> +
>> +		/* Start over with next RTT. */
>> +		tp->rfbuf_ts = 0;
>> +		tp->rfbuf_cnt = 0;
>> +	} else {
>> +		tp->rfbuf_cnt += tlen;	/* add up */
>> +	}
>> +
>> +	return (newsize);
>> +}
>> +
>>  void
>>  tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
>>      struct tcpcb *tp, int drop_hdrlen, int tlen, uint8_t iptos,
>> @@ -1849,62 +1911,7 @@ tcp_do_segment(struct mbuf *m, struct tc
>>  #endif
>>  		TCP_PROBE3(debug__input, tp, th, m);
>> -		/*
>> -		 * Automatic sizing of receive socket buffer.  Often the send
>> -		 * buffer size is not optimally adjusted to the actual network
>> -		 * conditions at hand (delay bandwidth product).  Setting the
>> -		 * buffer size too small limits throughput on links with high
>> -		 * bandwidth and high delay (eg. trans-continental/oceanic links).
>> -		 *
>> -		 * On the receive side the socket buffer memory is only rarely
>> -		 * used to any significant extent.  This allows us to be much
>> -		 * more aggressive in scaling the receive socket buffer.  For
>> -		 * the case that the buffer space is actually used to a large
>> -		 * extent and we run out of kernel memory we can simply drop
>> -		 * the new segments; TCP on the sender will just retransmit it
>> -		 * later.  Setting the buffer size too big may only consume too
>> -		 * much kernel memory if the application doesn't read() from
>> -		 * the socket or packet loss or reordering makes use of the
>> -		 * reassembly queue.
>> -		 *
>> -		 * The criteria to step up the receive buffer one notch are:
>> -		 *  1. Application has not set receive buffer size with
>> -		 *     SO_RCVBUF.  Setting SO_RCVBUF clears SB_AUTOSIZE.
>> -		 *  2. the number of bytes received during the time it takes
>> -		 *     one timestamp to be reflected back to us (the RTT);
>> -		 *  3. received bytes per RTT is within seven eighth of the
>> -		 *     current socket buffer size;
>> -		 *  4. receive buffer size has not hit maximal automatic size;
>> -		 *
>> -		 * This algorithm does one step per RTT at most and only if
>> -		 * we receive a bulk stream w/o packet losses or reorderings.
>> -		 * Shrinking the buffer during idle times is not necessary as
>> -		 * it doesn't consume any memory when idle.
>> -		 *
>> -		 * TODO: Only step up if the application is actually serving
>> -		 * the buffer to better manage the socket buffer resources.
>> -		 */
>> -		if (V_tcp_do_autorcvbuf &&
>> -		    (to.to_flags & TOF_TS) &&
>> -		    to.to_tsecr &&
>> -		    (so->so_rcv.sb_flags & SB_AUTOSIZE)) {
>> -			if (TSTMP_GT(to.to_tsecr, tp->rfbuf_ts) &&
>> -			    to.to_tsecr - tp->rfbuf_ts < hz) {
>> -				if (tp->rfbuf_cnt >
>> -				    (so->so_rcv.sb_hiwat / 8 * 7) &&
>> -				    so->so_rcv.sb_hiwat <
>> -				    V_tcp_autorcvbuf_max) {
>> -					newsize =
>> -					    min(so->so_rcv.sb_hiwat +
>> -					    V_tcp_autorcvbuf_inc,
>> -					    V_tcp_autorcvbuf_max);
>> -				}
>> -				/* Start over with next RTT. */
>> -				tp->rfbuf_ts = 0;
>> -				tp->rfbuf_cnt = 0;
>> -			} else
>> -				tp->rfbuf_cnt += tlen;	/* add up */
>> -		}
>> +		newsize = tcp_autorcvbuf(m, th, so, tp, tlen);
>>
>>  		/* Add data to socket buffer. */
>>  		SOCKBUF_LOCK(&so->so_rcv);
>> @@ -1945,10 +1952,6 @@ tcp_do_segment(struct mbuf *m, struct tc
>>  		win = 0;
>>  	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
>>
>> -	/* Reset receive buffer auto scaling when not in bulk receive mode. */
>> -	tp->rfbuf_ts = 0;
>> -	tp->rfbuf_cnt = 0;
>> -
>>  	switch (tp->t_state) {
>>  	/*
>>
>> Modified: head/sys/netinet/tcp_output.c
>> ==============================================================================
>> --- head/sys/netinet/tcp_output.c  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/tcp_output.c  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -831,11 +831,13 @@ send:
>>  			to.to_tsval = tcp_ts_getticks() + tp->ts_offset;
>>  			to.to_tsecr = tp->ts_recent;
>>  			to.to_flags |= TOF_TS;
>> -			/* Set receive buffer autosizing timestamp. */
>> -			if (tp->rfbuf_ts == 0 &&
>> -			    (so->so_rcv.sb_flags & SB_AUTOSIZE))
>> -				tp->rfbuf_ts = tcp_ts_getticks();
>>  		}
>> +
>> +		/* Set receive buffer autosizing timestamp. */
>> +		if (tp->rfbuf_ts == 0 &&
>> +		    (so->so_rcv.sb_flags & SB_AUTOSIZE))
>> +			tp->rfbuf_ts = tcp_ts_getticks();
>> +
>>  		/* Selective ACK's. */
>>  		if (tp->t_flags & TF_SACK_PERMIT) {
>>  			if (flags & TH_SYN)
>>
>> Modified: head/sys/netinet/tcp_stacks/fastpath.c
>> ==============================================================================
>> --- head/sys/netinet/tcp_stacks/fastpath.c  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/tcp_stacks/fastpath.c  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -399,62 +399,8 @@ tcp_do_fastnewdata(struct mbuf *m, struc
>>  			    (void *)tcp_saveipgen, &tcp_savetcp, 0);
>>  #endif
>>  		TCP_PROBE3(debug__input, tp, th, m);
>> -		/*
>> -		 * Automatic sizing of receive socket buffer.  Often the send
>> -		 * buffer size is not optimally adjusted to the actual network
>> -		 * conditions at hand (delay bandwidth product).  Setting the
>> -		 * buffer size too small limits throughput on links with high
>> -		 * bandwidth and high delay (eg. trans-continental/oceanic links).
>> -		 *
>> -		 * On the receive side the socket buffer memory is only rarely
>> -		 * used to any significant extent.  This allows us to be much
>> -		 * more aggressive in scaling the receive socket buffer.  For
>> -		 * the case that the buffer space is actually used to a large
>> -		 * extent and we run out of kernel memory we can simply drop
>> -		 * the new segments; TCP on the sender will just retransmit it
>> -		 * later.  Setting the buffer size too big may only consume too
>> -		 * much kernel memory if the application doesn't read() from
>> -		 * the socket or packet loss or reordering makes use of the
>> -		 * reassembly queue.
>> -		 *
>> -		 * The criteria to step up the receive buffer one notch are:
>> -		 *  1. Application has not set receive buffer size with
>> -		 *     SO_RCVBUF.  Setting SO_RCVBUF clears SB_AUTOSIZE.
>> -		 *  2. the number of bytes received during the time it takes
>> -		 *     one timestamp to be reflected back to us (the RTT);
>> -		 *  3. received bytes per RTT is within seven eighth of the
>> -		 *     current socket buffer size;
>> -		 *  4. receive buffer size has not hit maximal automatic size;
>> -		 *
>> -		 * This algorithm does one step per RTT at most and only if
>> -		 * we receive a bulk stream w/o packet losses or reorderings.
>> -		 * Shrinking the buffer during idle times is not necessary as
>> -		 * it doesn't consume any memory when idle.
>> -		 *
>> -		 * TODO: Only step up if the application is actually serving
>> -		 * the buffer to better manage the socket buffer resources.
>> -		 */
>> -		if (V_tcp_do_autorcvbuf &&
>> -		    (to->to_flags & TOF_TS) &&
>> -		    to->to_tsecr &&
>> -		    (so->so_rcv.sb_flags & SB_AUTOSIZE)) {
>> -			if (TSTMP_GT(to->to_tsecr, tp->rfbuf_ts) &&
>> -			    to->to_tsecr - tp->rfbuf_ts < hz) {
>> -				if (tp->rfbuf_cnt >
>> -				    (so->so_rcv.sb_hiwat / 8 * 7) &&
>> -				    so->so_rcv.sb_hiwat <
>> -				    V_tcp_autorcvbuf_max) {
>> -					newsize =
>> -					    min(so->so_rcv.sb_hiwat +
>> -					    V_tcp_autorcvbuf_inc,
>> -					    V_tcp_autorcvbuf_max);
>> -				}
>> -				/* Start over with next RTT. */
>> -				tp->rfbuf_ts = 0;
>> -				tp->rfbuf_cnt = 0;
>> -			} else
>> -				tp->rfbuf_cnt += tlen;	/* add up */
>> -		}
>> +
>> +		newsize = tcp_autorcvbuf(m, th, so, tp, tlen);
>>
>>  		/* Add data to socket buffer. */
>>  		SOCKBUF_LOCK(&so->so_rcv);
>> @@ -532,10 +478,6 @@ tcp_do_slowpath(struct mbuf *m, struct t
>>  		win = 0;
>>  	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
>>
>> -	/* Reset receive buffer auto scaling when not in bulk receive mode. */
>> -	tp->rfbuf_ts = 0;
>> -	tp->rfbuf_cnt = 0;
>> -
>>  	switch (tp->t_state) {
>>  	/*
>>
>> Modified: head/sys/netinet/tcp_var.h
>> ==============================================================================
>> --- head/sys/netinet/tcp_var.h  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/tcp_var.h  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -778,6 +778,8 @@ void	 hhook_run_tcp_est_in(struct tcpcb *
>>  #endif
>>  int	 tcp_input(struct mbuf **, int *, int);
>> +int	 tcp_autorcvbuf(struct mbuf *, struct tcphdr *, struct socket *,
>> +	    struct tcpcb *, int);
>>  void	 tcp_do_segment(struct mbuf *, struct tcphdr *,
>>  	    struct socket *, struct tcpcb *, int, int, uint8_t,
>>  	    int);