From owner-freebsd-net@FreeBSD.ORG Mon Jan 14 21:04:35 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 439D1BEF; Mon, 14 Jan 2013 21:04:35 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 0AE62A47; Mon, 14 Jan 2013 21:04:35 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 24773B946; Mon, 14 Jan 2013 16:04:34 -0500 (EST) From: John Baldwin To: net@freebsd.org Subject: Some questions about the new TCP congestion control code Date: Mon, 14 Jan 2013 16:04:29 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) MIME-Version: 1.0 Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201301141604.29864.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 14 Jan 2013 16:04:34 -0500 (EST) Cc: Lawrence Stewart X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jan 2013 21:04:35 -0000 I was looking at TCP congestion control at work recently and noticed a few "odd" things in the current code. First, there is this chunk of code in cc_ack_received() in tcp_input.c: static void inline cc_ack_received(struct tcpcb *tp, struct tcphdr *th, uint16_t type) { INP_WLOCK_ASSERT(tp->t_inpcb); tp->ccv->bytes_this_ack = BYTES_THIS_ACK(tp, th); if (tp->snd_cwnd == min(tp->snd_cwnd, tp->snd_wnd)) tp->ccv->flags |= CCF_CWND_LIMITED; else tp->ccv->flags &= ~CCF_CWND_LIMITED; Due to hysterical raisins, snd_cwnd and snd_wnd are u_long values, not integers, so the call to min() results in truncation on 64-bit hosts. It should probably be ulmin() instead. However, this line seems to be a really obfuscated way to just write: if (tp->snd_cwnd <= tp->snd_wnd) If that is correct, I would vote for changing this to use the much simpler logic. Secondly, in the particular case I was investigating at work (restart of an idle connnection), the newreno congestion control code in 8.x and later uses a different algorithm than in 7. Specifically, in 7 TCP would reuse the same logic used for an initial cwnd (honoring ss_fltsz). In 8 this no longer happens (instead, 2 is hardcoded). A guess at a possible fix might look something like this: Index: cc_newreno.c =================================================================== --- cc_newreno.c (revision 243660) +++ cc_newreno.c (working copy) @@ -169,8 +169,21 @@ newreno_after_idle(struct cc_var *ccv) if (V_tcp_do_rfc3390) rw = min(4 * CCV(ccv, t_maxseg), max(2 * CCV(ccv, t_maxseg), 4380)); +#if 1 else rw = CCV(ccv, t_maxseg) * 2; +#else + /* XXX: This is missing a lot of stuff that used to be in 7. */ +#ifdef INET6 + else if ((isipv6 ? in6_localaddr(&CCV(ccv, t_inpcb->in6p_faddr)) : + in_localaddr(CCV(ccv, t_inpcb->inp_faddr)))) +#else + else if (in_localaddr(CCV(ccv, t_inpcb->inp_faddr))) +#endif + rw = V_ss_fltsz_local * CCV(ccv, t_maxseg); + else + rw = V_ss_fltsz * CCV(ccv, t_maxseg); +#endif CCV(ccv, snd_cwnd) = min(rw, CCV(ccv, snd_cwnd)); } (But using the #else clause instead of the current #if 1 code). Was this change in 8 intentional? If not, I would suggest that a real fix would be to export a function that includes the logic to compute the initial cwnd and share that between cc_conn_init() in tcp_input.c and cc_newreno.c. (Yes, I realize that ss_fltsz and friends are gone in 10, but if this was a bug then the fix I suggested above of using a common function could be applied to 8 to restore that functionality if desired. The bigger point is to make sure what we are doing is correct as I'm not sure.) -- John Baldwin