From owner-freebsd-net@FreeBSD.ORG Mon Sep 13 17:40:50 2010 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 27EFB106566C; Mon, 13 Sep 2010 17:40:50 +0000 (UTC) (envelope-from mdounin@mdounin.ru) Received: from mdounin.cust.ramtel.ru (mdounin.cust.ramtel.ru [81.19.69.81]) by mx1.freebsd.org (Postfix) with ESMTP id 9B2D98FC13; Mon, 13 Sep 2010 17:40:49 +0000 (UTC) Received: from mdounin.ru (mdounin.cust.ramtel.ru [81.19.69.81]) by mdounin.cust.ramtel.ru (Postfix) with ESMTP id EB49E1701C; Mon, 13 Sep 2010 21:24:53 +0400 (MSD) Date: Mon, 13 Sep 2010 21:24:53 +0400 From: Maxim Dounin To: Andre Oppermann Message-ID: <20100913172453.GG99657@mdounin.ru> References: <20100512124702.GJ2679@rambler-co.ru> <20100713140051.GV99657@mdounin.ru> <4C5BCE48.5070504@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4C5BCE48.5070504@freebsd.org> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-net@freebsd.org, Igor Sysoev , Lawrence Stewart Subject: Re: net.inet.tcp.slowstart_flightsize in 8-STABLE X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Sep 2010 17:40:50 -0000 Hello! On Fri, Aug 06, 2010 at 10:56:40AM +0200, Andre Oppermann wrote: > On 13.07.2010 16:01, Maxim Dounin wrote: > >On Wed, May 12, 2010 at 04:47:02PM +0400, Igor Sysoev wrote: > > > >>It seems that net.inet.tcp.slowstart_flightsize does not work in 8-STABLE. > >>For a long time I used slowstart_flightsize=2 on FreeBSD 4, 6, and 7 hosts. > >>However, FreeBSD-8 always starts with the single packet. > >>I saw this on different versions of 8-STABLE since 8 Oct 2009 till > >>04 Apr 2010. > > > >Finally I had some time to look into it (sorry for long delay). > > > >1. Slow start isn't used on recent FreeBSD versions for initial snd_cwnd > >calculations as long as you have rfc3390 support switched on (default since > >Jan 06 23:29:46 2004, at least in 7.*). It effectively sets initial > >snd_cwnd to 3*MSS on common networks and shouldn't cause any problems. > >Slowstart_flightsize only affects connection restarts. > > > >2. Due to bug in syncache code (patch below) all accepted connections has > >their snd_cwnd reset to 1*MSS (since r171639, 7.0+ AFAIR). > > > >3. Support for rfc3465 introduced in r187289 uncovered (2) as > >ACK to SYN/ACK no longer causes snd_cwnd increase by MSS (actually, this > >increase shouldn't happen as it's explicitly forbidden by rfc 3390, but > >it's another issue). Snd_cwnd remains really small (1*MSS + 1) and this > >causes really bad interaction with delayed acks on other side. > > > >As a workaround to delayed acks interaction problems you may disable > >rfc3465 by setting net.inet.tcp.rfc3465 to 0. Correct fix would be to apply > >the patch below. > > > >To Andre Oppermann: could you please take a look at the patch and > >commit it if found appropriate? > > I've committed your fix with svn r210666. In a few days I will MFC it back > to the stable branches. Thanks for reporting the bug and a patch for it. Andre, could you please take a look at one more patch as well? Igor reported that it still sees 100ms delays with rfc3465 turned on, and it turns out to be similar issue (setting cwnd to 1*MSS) for hosts found in hostcache. The problem with setting cwnd from hostcache was already reported here: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92690 http://lists.freebsd.org/pipermail/freebsd-net/2007-July/014780.html As using larger cwnd from hostcache may cause problems on congested links (see second thread) I changed code to only use cached cwnd as an upper bound for cwnd (instead of fixing current code). This is also in-line with what we do on connection restarts. We may later consider re-adding usage of larger cwnd from hostcache. But I believe it should be done carefully and probably behind sysctl, off by default. # HG changeset patch # User Maxim Dounin # Date 1284352618 -14400 # Node ID bbb9fea7978b26b95e96d463238a3acd8bfb5575 # Parent 6aec795c568cf6b9d2fabf8b8b9e25ad75b053d0 Use cwnd from hostcache only as upper bound. Setting initial congestion window from hostcache wasn't working for accepted connection since introduction due to tp->snd_wnd being 0. As a result it was instead limiting cwnd on such connections to 1*MSS. With net.inet.tcp.rfc3465 enabled this results bad interaction with delayed acks and 100ms delays for hosts found in hostcache. Additionally, it's considered unsafe to use initial congestion window larger than thouse specified in RFC3390 as this may cause problems on congested links. RFC5681 says equation from RFC3390 MUST be used as upper bound. Links: http://lists.freebsd.org/pipermail/freebsd-net/2007-July/014780.html http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92690 diff --git a/netinet/tcp_input.c b/netinet/tcp_input.c --- a/netinet/tcp_input.c +++ b/netinet/tcp_input.c @@ -3332,29 +3332,13 @@ tcp_mss(struct tcpcb *tp, int offer) tp->snd_bandwidth = metrics.rmx_bandwidth; /* - * Set the slow-start flight size depending on whether this - * is a local network or not. + * Set initial congestion window per RFC3390. Alternatively, set + * flight size depending on whether this is a local network or not. * - * Extend this so we cache the cwnd too and retrieve it here. - * Make cwnd even bigger than RFC3390 suggests but only if we - * have previous experience with the remote host. Be careful - * not make cwnd bigger than remote receive window or our own - * send socket buffer. Maybe put some additional upper bound - * on the retrieved cwnd. Should do incremental updates to - * hostcache when cwnd collapses so next connection doesn't - * overloads the path again. - * - * RFC3390 says only do this if SYN or SYN/ACK didn't got lost. - * We currently check only in syncache_socket for that. + * RFC3390 says we MUST limit initial window to one segment if SYN + * or SYN/ACK is lost. We currently check only in syncache_socket() + * for that. */ -#define TCP_METRICS_CWND -#ifdef TCP_METRICS_CWND - if (metrics.rmx_cwnd) - tp->snd_cwnd = max(mss, - min(metrics.rmx_cwnd / 2, - min(tp->snd_wnd, so->so_snd.sb_hiwat))); - else -#endif if (V_tcp_do_rfc3390) tp->snd_cwnd = min(4 * mss, max(2 * mss, 4380)); #ifdef INET6 @@ -3367,6 +3351,10 @@ tcp_mss(struct tcpcb *tp, int offer) else tp->snd_cwnd = mss * V_ss_fltsz; + /* If we have cached cwnd - use it as an upper bound. */ + if (metrics.rmx_cwnd) + tp->snd_cwnd = max(mss, min(tp->snd_cwnd, metrics.rmx_cwnd)); + /* Check the interface for TSO capabilities. */ if (mtuflags & CSUM_TSO) tp->t_flags |= TF_TSO; Maxim Dounin