From: Steven Hartland <steven@multiplay.co.uk>
Subject: Re: svn commit: r316676 - in head/sys/netinet: . tcp_stacks
To: Julian Elischer, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Date: Mon, 10 Apr 2017 16:17:59 +0100
List-Id: SVN commit messages for the src tree for head/-current

I don't tend to MFC to 10.x now, but do agree that, given the impact, for
this one it should be done.

The fix is a little different, due to code restructuring in 11 / head, but
I do have a 10.x version already.

    Regards
    Steve

On 10/04/2017 15:51, Julian Elischer wrote:
> If possible MFC to 10 too would be nice..
> thanks
>
> On 10/4/17 4:19 pm, Steven Hartland wrote:
>> Author: smh
>> Date: Mon Apr 10 08:19:35 2017
>> New Revision: 316676
>> URL: https://svnweb.freebsd.org/changeset/base/316676
>>
>> Log:
>>   Use estimated RTT for receive buffer auto resizing instead of timestamps
>>
>>   Switched from using timestamps to RTT estimates when performing TCP receive
>>   buffer auto resizing, as not all hosts support / enable TCP timestamps.
>>
>>   Disabled reset of receive buffer auto scaling when not in bulk receive mode,
>>   which gives an extra 20% performance increase.
>>
>>   Also extracted auto resizing to a common method shared between standard and
>>   fastpath modules.
>>
>>   With this AWS S3 downloads at ~17ms latency on a 1Gbps connection jump from
>>   ~3MB/s to ~100MB/s using the default settings.
>>
>>   Reviewed by:    lstewart, gnn
>>   MFC after:      2 weeks
>>   Relnotes:       Yes
>>   Sponsored by:   Multiplay
>>   Differential Revision:  https://reviews.freebsd.org/D9668
>>
>> Modified:
>>   head/sys/netinet/in_kdtrace.c
>>   head/sys/netinet/in_kdtrace.h
>>   head/sys/netinet/tcp_input.c
>>   head/sys/netinet/tcp_output.c
>>   head/sys/netinet/tcp_stacks/fastpath.c
>>   head/sys/netinet/tcp_var.h
>>
>> Modified: head/sys/netinet/in_kdtrace.c
>> ==============================================================================
>> --- head/sys/netinet/in_kdtrace.c  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/in_kdtrace.c  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -132,6 +132,14 @@ SDT_PROBE_DEFINE6_XLATE(tcp, , , state__
>>      "void *", "void *",
>>      "int", "tcplsinfo_t *");
>>
>> +SDT_PROBE_DEFINE6_XLATE(tcp, , , receive__autoresize,
>> +    "void *", "void *",
>> +    "struct tcpcb *", "csinfo_t *",
>> +    "struct mbuf *", "ipinfo_t *",
>> +    "struct tcpcb *", "tcpsinfo_t *" ,
>> +    "struct tcphdr *", "tcpinfoh_t *",
>> +    "int", "int");
>> +
>>  SDT_PROBE_DEFINE5_XLATE(udp, , , receive,
>>      "void *", "pktinfo_t *",
>>      "struct inpcb *", "csinfo_t *",
>>
>> Modified: head/sys/netinet/in_kdtrace.h
>> ==============================================================================
>> --- head/sys/netinet/in_kdtrace.h  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/in_kdtrace.h  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -65,6 +65,7 @@ SDT_PROBE_DECLARE(tcp, , , debug__input)
>>  SDT_PROBE_DECLARE(tcp, , , debug__output);
>>  SDT_PROBE_DECLARE(tcp, , , debug__user);
>>  SDT_PROBE_DECLARE(tcp, , , debug__drop);
>> +SDT_PROBE_DECLARE(tcp, , , receive__autoresize);
>>  SDT_PROBE_DECLARE(udp, , , receive);
>>  SDT_PROBE_DECLARE(udp, , , send);
>>
>> Modified: head/sys/netinet/tcp_input.c
>> ==============================================================================
>> --- head/sys/netinet/tcp_input.c  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/tcp_input.c  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -1486,6 +1486,68 @@ drop:
>>  	return (IPPROTO_DONE);
>>  }
>>
>> +/*
>> + * Automatic sizing of receive socket buffer.  Often the send
>> + * buffer size is not optimally adjusted to the actual network
>> + * conditions at hand (delay bandwidth product).  Setting the
>> + * buffer size too small limits throughput on links with high
>> + * bandwidth and high delay (eg. trans-continental/oceanic links).
>> + *
>> + * On the receive side the socket buffer memory is only rarely
>> + * used to any significant extent.  This allows us to be much
>> + * more aggressive in scaling the receive socket buffer.  For
>> + * the case that the buffer space is actually used to a large
>> + * extent and we run out of kernel memory we can simply drop
>> + * the new segments; TCP on the sender will just retransmit it
>> + * later.  Setting the buffer size too big may only consume too
>> + * much kernel memory if the application doesn't read() from
>> + * the socket or packet loss or reordering makes use of the
>> + * reassembly queue.
>> + *
>> + * The criteria to step up the receive buffer one notch are:
>> + *  1. Application has not set receive buffer size with
>> + *     SO_RCVBUF.  Setting SO_RCVBUF clears SB_AUTOSIZE.
>> + *  2. the number of bytes received during the time it takes
>> + *     one timestamp to be reflected back to us (the RTT);
>> + *  3. received bytes per RTT is within seven eighth of the
>> + *     current socket buffer size;
>> + *  4. receive buffer size has not hit maximal automatic size;
>> + *
>> + * This algorithm does one step per RTT at most and only if
>> + * we receive a bulk stream w/o packet losses or reorderings.
>> + * Shrinking the buffer during idle times is not necessary as
>> + * it doesn't consume any memory when idle.
>> + *
>> + * TODO: Only step up if the application is actually serving
>> + * the buffer to better manage the socket buffer resources.
>> + */
>> +int
>> +tcp_autorcvbuf(struct mbuf *m, struct tcphdr *th, struct socket *so,
>> +    struct tcpcb *tp, int tlen)
>> +{
>> +	int newsize = 0;
>> +
>> +	if (V_tcp_do_autorcvbuf && (so->so_rcv.sb_flags & SB_AUTOSIZE) &&
>> +	    tp->t_srtt != 0 && tp->rfbuf_ts != 0 &&
>> +	    TCP_TS_TO_TICKS(tcp_ts_getticks() - tp->rfbuf_ts) >
>> +	    (tp->t_srtt >> TCP_RTT_SHIFT)) {
>> +		if (tp->rfbuf_cnt > (so->so_rcv.sb_hiwat / 8 * 7) &&
>> +		    so->so_rcv.sb_hiwat < V_tcp_autorcvbuf_max) {
>> +			newsize = min(so->so_rcv.sb_hiwat +
>> +			    V_tcp_autorcvbuf_inc, V_tcp_autorcvbuf_max);
>> +		}
>> +		TCP_PROBE6(receive__autoresize, NULL, tp, m, tp, th, newsize);
>> +
>> +		/* Start over with next RTT. */
>> +		tp->rfbuf_ts = 0;
>> +		tp->rfbuf_cnt = 0;
>> +	} else {
>> +		tp->rfbuf_cnt += tlen;	/* add up */
>> +	}
>> +
>> +	return (newsize);
>> +}
>> +
>>  void
>>  tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
>>      struct tcpcb *tp, int drop_hdrlen, int tlen, uint8_t iptos,
>> @@ -1849,62 +1911,7 @@ tcp_do_segment(struct mbuf *m, struct tc
>>  #endif
>>  		TCP_PROBE3(debug__input, tp, th, m);
>> -		/*
>> -		 * Automatic sizing of receive socket buffer.  Often the send
>> -		 * buffer size is not optimally adjusted to the actual network
>> -		 * conditions at hand (delay bandwidth product).  Setting the
>> -		 * buffer size too small limits throughput on links with high
>> -		 * bandwidth and high delay (eg. trans-continental/oceanic links).
>> -		 *
>> -		 * On the receive side the socket buffer memory is only rarely
>> -		 * used to any significant extent.  This allows us to be much
>> -		 * more aggressive in scaling the receive socket buffer.  For
>> -		 * the case that the buffer space is actually used to a large
>> -		 * extent and we run out of kernel memory we can simply drop
>> -		 * the new segments; TCP on the sender will just retransmit it
>> -		 * later.  Setting the buffer size too big may only consume too
>> -		 * much kernel memory if the application doesn't read() from
>> -		 * the socket or packet loss or reordering makes use of the
>> -		 * reassembly queue.
>> -		 *
>> -		 * The criteria to step up the receive buffer one notch are:
>> -		 *  1. Application has not set receive buffer size with
>> -		 *     SO_RCVBUF.  Setting SO_RCVBUF clears SB_AUTOSIZE.
>> -		 *  2. the number of bytes received during the time it takes
>> -		 *     one timestamp to be reflected back to us (the RTT);
>> -		 *  3. received bytes per RTT is within seven eighth of the
>> -		 *     current socket buffer size;
>> -		 *  4. receive buffer size has not hit maximal automatic size;
>> -		 *
>> -		 * This algorithm does one step per RTT at most and only if
>> -		 * we receive a bulk stream w/o packet losses or reorderings.
>> -		 * Shrinking the buffer during idle times is not necessary as
>> -		 * it doesn't consume any memory when idle.
>> -		 *
>> -		 * TODO: Only step up if the application is actually serving
>> -		 * the buffer to better manage the socket buffer resources.
>> -		 */
>> -		if (V_tcp_do_autorcvbuf &&
>> -		    (to.to_flags & TOF_TS) &&
>> -		    to.to_tsecr &&
>> -		    (so->so_rcv.sb_flags & SB_AUTOSIZE)) {
>> -			if (TSTMP_GT(to.to_tsecr, tp->rfbuf_ts) &&
>> -			    to.to_tsecr - tp->rfbuf_ts < hz) {
>> -				if (tp->rfbuf_cnt >
>> -				    (so->so_rcv.sb_hiwat / 8 * 7) &&
>> -				    so->so_rcv.sb_hiwat <
>> -				    V_tcp_autorcvbuf_max) {
>> -					newsize =
>> -					    min(so->so_rcv.sb_hiwat +
>> -					    V_tcp_autorcvbuf_inc,
>> -					    V_tcp_autorcvbuf_max);
>> -				}
>> -				/* Start over with next RTT. */
>> -				tp->rfbuf_ts = 0;
>> -				tp->rfbuf_cnt = 0;
>> -			} else
>> -				tp->rfbuf_cnt += tlen;	/* add up */
>> -		}
>> +		newsize = tcp_autorcvbuf(m, th, so, tp, tlen);
>>
>>  		/* Add data to socket buffer. */
>>  		SOCKBUF_LOCK(&so->so_rcv);
>> @@ -1945,10 +1952,6 @@ tcp_do_segment(struct mbuf *m, struct tc
>>  		win = 0;
>>  	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
>>
>> -	/* Reset receive buffer auto scaling when not in bulk receive mode. */
>> -	tp->rfbuf_ts = 0;
>> -	tp->rfbuf_cnt = 0;
>> -
>>  	switch (tp->t_state) {
>>  	/*
>>
>> Modified: head/sys/netinet/tcp_output.c
>> ==============================================================================
>> --- head/sys/netinet/tcp_output.c  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/tcp_output.c  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -831,11 +831,13 @@ send:
>>  			to.to_tsval = tcp_ts_getticks() + tp->ts_offset;
>>  			to.to_tsecr = tp->ts_recent;
>>  			to.to_flags |= TOF_TS;
>> -			/* Set receive buffer autosizing timestamp. */
>> -			if (tp->rfbuf_ts == 0 &&
>> -			    (so->so_rcv.sb_flags & SB_AUTOSIZE))
>> -				tp->rfbuf_ts = tcp_ts_getticks();
>>  		}
>> +
>> +		/* Set receive buffer autosizing timestamp. */
>> +		if (tp->rfbuf_ts == 0 &&
>> +		    (so->so_rcv.sb_flags & SB_AUTOSIZE))
>> +			tp->rfbuf_ts = tcp_ts_getticks();
>> +
>>  		/* Selective ACK's. */
>>  		if (tp->t_flags & TF_SACK_PERMIT) {
>>  			if (flags & TH_SYN)
>>
>> Modified: head/sys/netinet/tcp_stacks/fastpath.c
>> ==============================================================================
>> --- head/sys/netinet/tcp_stacks/fastpath.c  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/tcp_stacks/fastpath.c  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -399,62 +399,8 @@ tcp_do_fastnewdata(struct mbuf *m, struc
>>  			    (void *)tcp_saveipgen, &tcp_savetcp, 0);
>>  #endif
>>  		TCP_PROBE3(debug__input, tp, th, m);
>> -		/*
>> -		 * Automatic sizing of receive socket buffer.  Often the send
>> -		 * buffer size is not optimally adjusted to the actual network
>> -		 * conditions at hand (delay bandwidth product).  Setting the
>> -		 * buffer size too small limits throughput on links with high
>> -		 * bandwidth and high delay (eg. trans-continental/oceanic links).
>> -		 *
>> -		 * On the receive side the socket buffer memory is only rarely
>> -		 * used to any significant extent.  This allows us to be much
>> -		 * more aggressive in scaling the receive socket buffer.  For
>> -		 * the case that the buffer space is actually used to a large
>> -		 * extent and we run out of kernel memory we can simply drop
>> -		 * the new segments; TCP on the sender will just retransmit it
>> -		 * later.  Setting the buffer size too big may only consume too
>> -		 * much kernel memory if the application doesn't read() from
>> -		 * the socket or packet loss or reordering makes use of the
>> -		 * reassembly queue.
>> -		 *
>> -		 * The criteria to step up the receive buffer one notch are:
>> -		 *  1. Application has not set receive buffer size with
>> -		 *     SO_RCVBUF.  Setting SO_RCVBUF clears SB_AUTOSIZE.
>> -		 *  2. the number of bytes received during the time it takes
>> -		 *     one timestamp to be reflected back to us (the RTT);
>> -		 *  3. received bytes per RTT is within seven eighth of the
>> -		 *     current socket buffer size;
>> -		 *  4. receive buffer size has not hit maximal automatic size;
>> -		 *
>> -		 * This algorithm does one step per RTT at most and only if
>> -		 * we receive a bulk stream w/o packet losses or reorderings.
>> -		 * Shrinking the buffer during idle times is not necessary as
>> -		 * it doesn't consume any memory when idle.
>> -		 *
>> -		 * TODO: Only step up if the application is actually serving
>> -		 * the buffer to better manage the socket buffer resources.
>> -		 */
>> -		if (V_tcp_do_autorcvbuf &&
>> -		    (to->to_flags & TOF_TS) &&
>> -		    to->to_tsecr &&
>> -		    (so->so_rcv.sb_flags & SB_AUTOSIZE)) {
>> -			if (TSTMP_GT(to->to_tsecr, tp->rfbuf_ts) &&
>> -			    to->to_tsecr - tp->rfbuf_ts < hz) {
>> -				if (tp->rfbuf_cnt >
>> -				    (so->so_rcv.sb_hiwat / 8 * 7) &&
>> -				    so->so_rcv.sb_hiwat <
>> -				    V_tcp_autorcvbuf_max) {
>> -					newsize =
>> -					    min(so->so_rcv.sb_hiwat +
>> -					    V_tcp_autorcvbuf_inc,
>> -					    V_tcp_autorcvbuf_max);
>> -				}
>> -				/* Start over with next RTT. */
>> -				tp->rfbuf_ts = 0;
>> -				tp->rfbuf_cnt = 0;
>> -			} else
>> -				tp->rfbuf_cnt += tlen;	/* add up */
>> -		}
>> +
>> +		newsize = tcp_autorcvbuf(m, th, so, tp, tlen);
>>
>>  		/* Add data to socket buffer. */
>>  		SOCKBUF_LOCK(&so->so_rcv);
>> @@ -532,10 +478,6 @@ tcp_do_slowpath(struct mbuf *m, struct t
>>  		win = 0;
>>  	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
>>
>> -	/* Reset receive buffer auto scaling when not in bulk receive mode. */
>> -	tp->rfbuf_ts = 0;
>> -	tp->rfbuf_cnt = 0;
>> -
>>  	switch (tp->t_state) {
>>  	/*
>>
>> Modified: head/sys/netinet/tcp_var.h
>> ==============================================================================
>> --- head/sys/netinet/tcp_var.h  Mon Apr 10 06:19:09 2017  (r316675)
>> +++ head/sys/netinet/tcp_var.h  Mon Apr 10 08:19:35 2017  (r316676)
>> @@ -778,6 +778,8 @@ void	 hhook_run_tcp_est_in(struct tcpcb *
>>  #endif
>>  int	 tcp_input(struct mbuf **, int *, int);
>> +int	 tcp_autorcvbuf(struct mbuf *, struct tcphdr *, struct socket *,
>> +	    struct tcpcb *, int);
>>  void	 tcp_do_segment(struct mbuf *, struct tcphdr *,
>>  	    struct socket *, struct tcpcb *, int, int, uint8_t,
>>  	    int);