Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 20 Mar 2009 01:44:07 GMT
From:      Renaud Lienhart <renaud@vmware.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/132832: tcp_output() might generate invalid TSO frames when len > TCP_MAXWIN - hdrlen - optlen
Message-ID:  <200903200144.n2K1i7QY017113@www.freebsd.org>
Resent-Message-ID: <200903200150.n2K1o1wO021450@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         132832
>Category:       kern
>Synopsis:       tcp_output() might generate invalid TSO frames when len > TCP_MAXWIN - hdrlen - optlen
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 20 01:50:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Renaud Lienhart
>Release:        FreeBSD 7.1
>Organization:
VMware, Inc.
>Environment:
>Description:
The tcp_output() routine has an issue when the send window exceeds TCP_MAXWIN and the underlying interface supports TSO.

There is a check to ensure the data being pushed doesn't exceed the maximum TSO packet size. If this is the case, "len" is trimmed and "sendalot" is set:

--- 8< ---
if (tso) {
	if (len > TCP_MAXWIN - hdrlen - optlen) {
		len = TCP_MAXWIN - hdrlen - optlen;
		len = len - (len % (tp->t_maxopd - optlen));
		sendalot = 1;
--- >8 ---

Everything's hunky-dory, until the next "sendalot" iteration;

If the remaining window does not require TSO (i.e. len <= tp->t_maxseg), this following piece of code fails to properly reset "tso":

--- 8< ---
if (len > tp->t_maxseg) {
	if (<tso_conds>)
		tso = 1;
	} else {
		len = tp->t_maxseg;
		sendalot = 1;
		tso = 0;
	}
}
--- >8 ---

"tso" remains set to 1 and the resulting packet is tagged with CSUM_TSO although it doesn't require TSO. This causes problems with a large number of nics which refuse to send the resulting frame and, in some case, wedge.
>How-To-Repeat:
Using netperf (or any TCP workload) with a large socket buffer size exposes the issue in a matter of seconds.
>Fix:
The solution is to always reset "tso" at the beginning of the "sendalot" loop in order to ensure it is not stale.

In my patch I also added a KASSERT() which directly catches any problematic frame before it reaches any other layer.

Patch attached with submission follows:

Index: netinet/tcp_output.c
===================================================================
--- netinet/tcp_output.c	(revision 190117)
+++ netinet/tcp_output.c	(working copy)
@@ -140,7 +140,7 @@
 	int idle, sendalot;
 	int sack_rxmit, sack_bytes_rxmt;
 	struct sackhole *p;
-	int tso = 0;
+	int tso;
 	struct tcpopt to;
 #if 0
 	int maxburst = TCP_MAXBURST;
@@ -198,6 +198,7 @@
 	    SEQ_LT(tp->snd_nxt, tp->snd_max))
 		tcp_sack_adjust(tp);
 	sendalot = 0;
+	tso = 0;
 	off = tp->snd_nxt - tp->snd_una;
 	sendwin = min(tp->snd_wnd, tp->snd_cwnd);
 	sendwin = min(sendwin, tp->snd_bwnd);
@@ -477,7 +478,6 @@
 		} else {
 			len = tp->t_maxseg;
 			sendalot = 1;
-			tso = 0;
 		}
 	}
 	if (sack_rxmit) {
@@ -996,6 +996,8 @@
 	 * XXX: Fixme: This is currently not the case for IPv6.
 	 */
 	if (tso) {
+		KASSERT(len > tp->t_maxopd - optlen,
+		    ("%s: len <= tso_segsz", __func__));
 		m->m_pkthdr.csum_flags = CSUM_TSO;
 		m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
 	}


>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200903200144.n2K1i7QY017113>