From owner-svn-src-stable@FreeBSD.ORG Sat Nov 6 15:49:59 2010 Return-Path: Delivered-To: svn-src-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA510106566B; Sat, 6 Nov 2010 15:49:59 +0000 (UTC) (envelope-from lstewart@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 97BED8FC13; Sat, 6 Nov 2010 15:49:59 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id oA6Fnxaf036075; Sat, 6 Nov 2010 15:49:59 GMT (envelope-from lstewart@svn.freebsd.org) Received: (from lstewart@localhost) by svn.freebsd.org (8.14.3/8.14.3/Submit) id oA6FnxCs036073; Sat, 6 Nov 2010 15:49:59 GMT (envelope-from lstewart@svn.freebsd.org) Message-Id: <201011061549.oA6FnxCs036073@svn.freebsd.org> From: Lawrence Stewart Date: Sat, 6 Nov 2010 15:49:59 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-7@freebsd.org X-SVN-Group: stable-7 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r214890 - stable/7/sys/netinet X-BeenThere: svn-src-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SVN commit messages for all the -stable branches of the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Nov 2010 15:49:59 -0000 Author: lstewart Date: Sat Nov 6 15:49:59 2010 New Revision: 214890 URL: http://svn.freebsd.org/changeset/base/214890 Log: MFC r213913: Retire the system-wide, per-reassembly queue segment limit. The mechanism is far too coarse grained to be useful and the default value significantly degrades TCP performance on moderate to high bandwidth-delay product paths with non-zero loss (e.g. 5+Mbps connections across the public Internet often suffer). Replace the outgoing mechanism with an individual per-queue limit based on the number of MSS segments that fit into the socket's receive buffer. This should strike a good balance between performance and the potential for resource exhaustion when FreeBSD is acting as a TCP receiver. With socket buffer autotuning (which is enabled by default), the reassembly queue tracks the socket buffer and benefits too. As the XXX comment suggests, my testing uncovered some unexpected behaviour which requires further investigation. By using so->so_rcv.sb_hiwat instead of sbspace(&so->so_rcv), we allow more segments to be held across both the socket receive buffer and reassembly queue than we probably should. The tradeoff is better performance in at least one common scenario, versus a devious sender's ability to consume more resources on a FreeBSD receiver. The base code from r213913 was modified as part of this MFC in order to work correctly on FreeBSD 7. Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo Modified: stable/7/sys/netinet/tcp_reass.c Directory Properties: stable/7/sys/ (props changed) stable/7/sys/cddl/contrib/opensolaris/ (props changed) stable/7/sys/contrib/dev/acpica/ (props changed) stable/7/sys/contrib/pf/ (props changed) Modified: stable/7/sys/netinet/tcp_reass.c ============================================================================== --- stable/7/sys/netinet/tcp_reass.c Sat Nov 6 15:40:34 2010 (r214889) +++ stable/7/sys/netinet/tcp_reass.c Sat Nov 6 15:49:59 2010 (r214890) @@ -89,11 +89,6 @@ SYSCTL_PROC(_net_inet_tcp_reass, OID_AUT &tcp_reass_qsize, 0, &tcp_reass_sysctl_qsize, "I", "Global number of TCP Segments currently in Reassembly Queue"); -static int tcp_reass_maxqlen = 48; -SYSCTL_INT(_net_inet_tcp_reass, OID_AUTO, maxqlen, CTLFLAG_RW, - &tcp_reass_maxqlen, 0, - "Maximum number of TCP Segments per individual Reassembly Queue"); - static int tcp_reass_overflows = 0; SYSCTL_INT(_net_inet_tcp_reass, OID_AUTO, overflows, CTLFLAG_RD, &tcp_reass_overflows, 0, @@ -183,13 +178,23 @@ tcp_reass(struct tcpcb *tp, struct tcphd goto present; /* - * Limit the number of segments in the reassembly queue to prevent - * holding on to too many segments (and thus running out of mbufs). - * Make sure to let the missing segment through which caused this - * queue. + * Limit the number of segments that can be queued to reduce the + * potential for mbuf exhaustion. For best performance, we want to be + * able to queue a full window's worth of segments. The size of the + * socket receive buffer determines our advertised window and grows + * automatically when socket buffer autotuning is enabled. Use it as the + * basis for our queue limit. + * Always let the missing segment through which caused this queue. + * NB: Access to the socket buffer is left intentionally unlocked as we + * can tolerate stale information here. + * + * XXXLAS: Using sbspace(so->so_rcv) instead of so->so_rcv.sb_hiwat + * should work but causes packets to be dropped when they shouldn't. + * Investigate why and re-evaluate the below limit after the behaviour + * is understood. */ if (th->th_seq != tp->rcv_nxt && - tp->t_segqlen >= tcp_reass_maxqlen) { + tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1) { tcp_reass_overflows++; tcpstat.tcps_rcvmemdrop++; m_freem(m);