From owner-freebsd-net@FreeBSD.ORG  Mon Jun  4 00:22:41 2012
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6814F106566C
	for <freebsd-net@freebsd.org>; Mon,  4 Jun 2012 00:22:41 +0000 (UTC)
	(envelope-from lstewart@freebsd.org)
Received: from lauren.room52.net (lauren.room52.net [210.50.193.198])
	by mx1.freebsd.org (Postfix) with ESMTP id EF39C8FC15
	for <freebsd-net@freebsd.org>; Mon,  4 Jun 2012 00:22:40 +0000 (UTC)
Received: from lstewart.caia.swin.edu.au (lstewart.caia.swin.edu.au
	[136.186.229.95])
	by lauren.room52.net (Postfix) with ESMTPSA id 147E77E8CB;
	Mon,  4 Jun 2012 10:22:33 +1000 (EST)
Message-ID: <4FCBFFC8.8000402@freebsd.org>
Date: Mon, 04 Jun 2012 10:22:32 +1000
From: Lawrence Stewart <lstewart@freebsd.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:10.0.2) Gecko/20120311 Thunderbird/10.0.2
MIME-Version: 1.0
To: Kevin Oberman <kob6558@gmail.com>
References: <CAN6yY1sLxFJ18ANO7nQqLetnJiT-K6pHC-X3yT1dWuWGa0VLUg@mail.gmail.com>
	<4FBF88CE.20209@cs.duke.edu>
	<CAN6yY1v+vf=SW7WDGHxCkJtOdj8K3f450jNxFWK_Jc+-pFg0nA@mail.gmail.com>
	<4FC82D6C.4050309@freebsd.org>
	<CAN6yY1v08qk2VhXFg0Qiz-pMM6md2c_E_kEvA-oqbxuvSN1JDg@mail.gmail.com>
In-Reply-To: <CAN6yY1v08qk2VhXFg0Qiz-pMM6md2c_E_kEvA-oqbxuvSN1JDg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-net@freebsd.org, Andrew Gallatin <gallatin@cs.duke.edu>,
	Andrew Gallatin <gallatin@myri.com>
Subject: Re: Major performance hit with ToS setting

On 06/03/12 15:18, Kevin Oberman wrote:
> On Fri, Jun 1, 2012 at 2:48 AM, Lawrence Stewart<lstewart@freebsd.org>  wrote:
>> On 05/31/12 13:33, Kevin Oberman wrote:
>> [snip]
>>>
>>> I used SIFTR at the suggestion of Lawrence Stewart, who headed the
>>> project to bring pluggable congestion algorithms to FreeBSD, and found
>>> really odd congestion behavior. First, I do see a triple ACK, but the
>>> congestion window suddenly drops from 73K to 8K. If I understand
>>> CUBIC, it should halve the congestion window, which is not what is
>>> happening. It then increases slowly (in slow start) to 82K. While the
>>> slow-start bytes are INCREASING, the congestion window again goes to 8K
>>> while the SS size moves from 36K up to 52K. It just continues to bounce
>>> wildly between 8K (always the low point) and between 64K and 82K. The
>>> swings start at 83K and, over the first few seconds, the peaks drop to
>>> about 64K.
>>
>>
>> Oh, and a comment about this behaviour. Dropping back to 8k (1MSS) is only
>> nasty if the TF_{CONG|FAST}RECOVERY flags are *not* set i.e. if you see cwnd
>> grow, drop to 8k with those flags set, and then when the flags are unset,
>> cwnd starts at the value of ssthresh, then that is perfectly normal recovery
>> behaviour. What *is* nasty is if an RTO fires, which will reset cwnd to 8k,
>> ssthresh to 2*MSS and make the connection effectively start from scratch
>> again.
>>
>> There is evidence of RTOs in your siftr output, which is bad news, e.g.
>> here's one example of 2 side-by-side log lines from your trace:
>>
>> # Direction,time,ssthresh,cwnd,flags
>> i,1338319593.574706,27044,27044,1630544864
>> o,1338319593.831482,16384,8192,1092625377
>>
>> Note the 300ms gap, and how cwnd resets to 1MSS and flags go from 1630544864
>> (TF_WASCRECOVERY|TF_CONGRECOVERY|TF_WASFRECOVERY|TF_FASTRECOVERY) to
>> 1092625377 (TF_WASCRECOVERY|TF_WASFRECOVERY).
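
For anyone wanting to check the flag arithmetic on the two log lines 
quoted above, here is a minimal Python sketch. It assumes the bit values 
of the four recovery flags as I recall them from sys/netinet/tcp_var.h, 
so verify them against your own source tree. The five-field line layout 
is the trimmed view shown above, not the full siftr record, and the 
helper names are made up for illustration.

```python
# Recovery-related bits of the TCP control block's t_flags field.
# ASSUMPTION: values recalled from sys/netinet/tcp_var.h; check your tree.
RECOVERY_FLAGS = {
    "TF_FASTRECOVERY": 0x00100000,
    "TF_WASFRECOVERY": 0x00200000,
    "TF_CONGRECOVERY": 0x20000000,
    "TF_WASCRECOVERY": 0x40000000,
}


def decode_recovery_flags(t_flags):
    """Return the names of the recovery flags set in t_flags."""
    return {name for name, bit in RECOVERY_FLAGS.items() if t_flags & bit}


def parse_line(line):
    """Parse the trimmed 'direction,time,ssthresh,cwnd,flags' layout."""
    direction, time, ssthresh, cwnd, flags = line.strip().split(",")
    return {
        "direction": direction,
        "time": float(time),
        "ssthresh": int(ssthresh),
        "cwnd": int(cwnd),
        "flags": int(flags),
    }


prev = parse_line("i,1338319593.574706,27044,27044,1630544864")
cur = parse_line("o,1338319593.831482,16384,8192,1092625377")

# Time between the two log lines, in seconds.
gap = cur["time"] - prev["time"]

print("before:", sorted(decode_recovery_flags(prev["flags"])))
print("after: ", sorted(decode_recovery_flags(cur["flags"])))
print("gap (s):", round(gap, 3), "cwnd:", cur["cwnd"])
```

Both recovery flags being cleared in the same step that cwnd falls to a 
single MSS, after a long silence, is the pattern to look for elsewhere in 
the trace.
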
>
> What can I say but that you are right. When I looked at the interface
> stats I found that the link overflow drops were through the roof! This
> confuses me a bit since the traffic is outbound and I would assume
> from the description on the Myricom web page that these are input
> drops. A problem with that card?  On systems that are
> working "normally", I still see a sharp drop with the ToS bits set,
> but nothing nearly as drastic. Now it is a drop from 4.5G to 728M on a
> cross-country (US) circuit.
>
> I am now looking for issues on the route that might explain the
> performance, but the question of why the drop-off only shows up in
> FreeBSD 8 means something odd is still going on. It is even possible
> that the problem is with 7 and the losses are due to the policy for
> ToS 32 on the path. ToS 32 is less than best effort in our network.
> Maybe the marking was getting lost on 7. Not likely, but possible.

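Is the receiver running FreeBSD 7? If so, have you tuned the TCP 
reassembly queue on that machine? If not, that could explain the RTOs 
you're seeing: once the queue limit is hit, out-of-order segments get 
dropped and the sender may have to wait for a retransmit timeout to 
recover them. Please send through the output of "sysctl 
net.inet.tcp.reass" and "netstat -sp tcp" obtained from the receiver 
immediately before and after running a short ToS=32 test.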
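
If it helps with comparing the before and after counters, here is a 
rough Python sketch that reports only the lines whose leading number 
changed. It assumes both captures come from the same machine, so they 
have the same lines in the same order, and that each counter line starts 
with an integer followed by a description. The function name is made up 
for illustration.

```python
import re

# A counter line: optional indent, an integer, then a description.
_COUNTER = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def changed_counters(before_text, after_text):
    """Return [(description, delta), ...] for counters that changed.

    Lines are paired by position, so both captures must list the same
    counters in the same order (ASSUMPTION: same host, same command).
    """
    changes = []
    pairs = zip(before_text.splitlines(), after_text.splitlines())
    for before_line, after_line in pairs:
        b = _COUNTER.match(before_line)
        a = _COUNTER.match(after_line)
        if not (b and a):
            continue  # headers such as "tcp:" carry no counter
        delta = int(a.group(1)) - int(b.group(1))
        if delta != 0:
            changes.append((a.group(2), delta))
    return changes
```

Save each capture to a file (say before.txt and after.txt, names 
hypothetical), read both into strings, and pass them to 
changed_counters(). Only the leading number on each line is compared, so 
secondary figures in parentheses are ignored.
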

Cheers,
Lawrence