Date:      Sat, 16 Oct 2010 07:12:40 +0000 (UTC)
From:      Lawrence Stewart <lstewart@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r213913 - head/sys/netinet
Message-ID:  <201010160712.o9G7Ce0S058025@svn.freebsd.org>

Author: lstewart
Date: Sat Oct 16 07:12:39 2010
New Revision: 213913
URL: http://svn.freebsd.org/changeset/base/213913

Log:
  Retire the system-wide, per-reassembly queue segment limit. The mechanism is far
  too coarse-grained to be useful, and the default value significantly degrades TCP
  performance on moderate to high bandwidth-delay product paths with non-zero loss
  (e.g. 5+Mbps connections across the public Internet often suffer).
  
  Replace the outgoing mechanism with an individual per-queue limit based on the
  number of MSS segments that fit into the socket's receive buffer. This should
  strike a good balance between performance and the potential for resource
  exhaustion when FreeBSD is acting as a TCP receiver. With socket buffer
  autotuning (which is enabled by default), the reassembly queue tracks the
  socket buffer and benefits too.
  
  As the XXX comment suggests, my testing uncovered some unexpected behaviour
  which requires further investigation. By using so->so_rcv.sb_hiwat
  instead of sbspace(&so->so_rcv), we allow more segments to be held across both
  the socket receive buffer and reassembly queue than we probably should. The
  tradeoff is better performance in at least one common scenario, versus a devious
  sender's ability to consume more resources on a FreeBSD receiver.
  
  Sponsored by:	FreeBSD Foundation
  Reviewed by:	andre, gnn, rpaulo
  MFC after:	2 weeks

Modified:
  head/sys/netinet/tcp_reass.c

Modified: head/sys/netinet/tcp_reass.c
==============================================================================
--- head/sys/netinet/tcp_reass.c	Sat Oct 16 05:37:45 2010	(r213912)
+++ head/sys/netinet/tcp_reass.c	Sat Oct 16 07:12:39 2010	(r213913)
@@ -92,12 +92,6 @@ SYSCTL_VNET_PROC(_net_inet_tcp_reass, OI
     &VNET_NAME(tcp_reass_qsize), 0, &tcp_reass_sysctl_qsize, "I",
     "Global number of TCP Segments currently in Reassembly Queue");
 
-static VNET_DEFINE(int, tcp_reass_maxqlen) = 48;
-#define	V_tcp_reass_maxqlen		VNET(tcp_reass_maxqlen)
-SYSCTL_VNET_INT(_net_inet_tcp_reass, OID_AUTO, maxqlen, CTLFLAG_RW,
-    &VNET_NAME(tcp_reass_maxqlen), 0,
-    "Maximum number of TCP Segments per individual Reassembly Queue");
-
 static VNET_DEFINE(int, tcp_reass_overflows) = 0;
 #define	V_tcp_reass_overflows		VNET(tcp_reass_overflows)
 SYSCTL_VNET_INT(_net_inet_tcp_reass, OID_AUTO, overflows, CTLFLAG_RD,
@@ -197,13 +191,23 @@ tcp_reass(struct tcpcb *tp, struct tcphd
 		goto present;
 
 	/*
-	 * Limit the number of segments in the reassembly queue to prevent
-	 * holding on to too many segments (and thus running out of mbufs).
-	 * Make sure to let the missing segment through which caused this
-	 * queue.
+	 * Limit the number of segments that can be queued to reduce the
+	 * potential for mbuf exhaustion. For best performance, we want to be
+	 * able to queue a full window's worth of segments. The size of the
+	 * socket receive buffer determines our advertised window and grows
+	 * automatically when socket buffer autotuning is enabled. Use it as the
+	 * basis for our queue limit.
+	 * Always let the missing segment through which caused this queue.
+	 * NB: Access to the socket buffer is left intentionally unlocked as we
+	 * can tolerate stale information here.
+	 *
+	 * XXXLAS: Using sbspace(so->so_rcv) instead of so->so_rcv.sb_hiwat
+	 * should work but causes packets to be dropped when they shouldn't.
+	 * Investigate why and re-evaluate the below limit after the behaviour
+	 * is understood.
 	 */
 	if (th->th_seq != tp->rcv_nxt &&
-	    tp->t_segqlen >= V_tcp_reass_maxqlen) {
+	    tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1) {
 		V_tcp_reass_overflows++;
 		TCPSTAT_INC(tcps_rcvmemdrop);
 		m_freem(m);