From owner-freebsd-net@FreeBSD.ORG  Fri Jan 24 05:06:50 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
MIME-Version: 1.0
Sender: jdavidlists@gmail.com
In-Reply-To: <390483613.15499210.1390530437153.JavaMail.root@uoguelph.ca>
References: <CABXB=RToav++V38pOorVPWpgZSuYmL-x7e8oxd3ayJCmAtLn-g@mail.gmail.com>
 <390483613.15499210.1390530437153.JavaMail.root@uoguelph.ca>
Date: Fri, 24 Jan 2014 00:06:49 -0500
Message-ID: <CABXB=RSebaWTD1LjQz__ZZ3EJwTpOMpxq0Q=bt4280dx+0auCw@mail.gmail.com>
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
From: J David <j.david.lists@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-net@freebsd.org
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>

On Thu, Jan 23, 2014 at 9:27 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Well, my TCP is pretty rusty, but...
> Since your stats didn't show any jumbo frames, each IP
> datagram needs to fit in the MTU of 1500 bytes. NFS hands an mbuf
> list of just over 64K (or 32K) to TCP in a single sosend(), then TCP
> will generate about 45 (or about 23 for 32K) TCP segments and put
> each in an IP datagram, then hand it to the network device driver
> for transmission.

This is *not* what happens with TSO/LRO.

With TSO, TCP generates IP datagrams of up to 64 KB, which are passed
directly to the driver, which passes them directly to the hardware.
Splitting them into MTU-sized segments is left to the NIC.
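To put numbers on the segment counts quoted above: without TSO, the host
stack itself must build one TCP segment per MSS-sized chunk of data. A
rough sketch, assuming a 1500-byte MTU and IPv4/TCP headers with no
options (so MSS = 1460; TCP timestamps would shrink it to 1448):

```python
import math

MTU = 1500
IP_HDR = 20    # IPv4 header without options (assumption)
TCP_HDR = 20   # TCP header without options such as timestamps (assumption)
MSS = MTU - IP_HDR - TCP_HDR   # 1460 bytes of payload per segment


def segments_without_tso(payload_bytes: int, mss: int = MSS) -> int:
    """Number of MTU-sized TCP segments the host stack must build itself."""
    return math.ceil(payload_bytes / mss)


# With TSO the stack instead hands the driver one large datagram per send
# (up to roughly 64 KB) and the segmentation happens below the driver.
for size in (64 * 1024, 32 * 1024):
    print(f"{size // 1024}K payload: {segments_without_tso(size)} segments built by the host without TSO")
```

This reproduces the "about 45" and "about 23" figures; each of those
segments costs header construction and a trip through the driver that TSO
avoids.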

Furthermore, in this particular case (two virtual machines on the same
host and bridge, with both TSO and LRO enabled end-to-end), the packet
is *never* segmented. The host takes the 64 KB packet off of one
guest's output ring and puts it onto the other guest's input ring,
intact.

This is, as you might expect, a *massive* performance win.

With TSO & LRO:

$ time iperf -c 172.20.20.162  -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.20.20.162, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[  5] local 172.20.20.169 port 60889 connected with 172.20.20.162 port 5001
[  4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port 44101
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  17.0 GBytes  14.6 Gbits/sec
[  4]  0.0-10.0 sec  17.4 GBytes  14.9 Gbits/sec

real 0m10.061s
user 0m0.229s
sys 0m7.711s

Without TSO & LRO:

$ time iperf -c 172.20.20.162  -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.20.20.162, TCP port 5001
TCP window size: 1.26 MByte (default)
------------------------------------------------------------
[  5] local 172.20.20.169 port 22088 connected with 172.20.20.162 port 5001
[  4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port 48615
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec   637 MBytes   534 Mbits/sec
[  4]  0.0-10.0 sec   767 MBytes   642 Mbits/sec

real 0m10.057s
user 0m0.231s
sys 0m3.935s

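Look at the difference. In this bidirectional test, TSO/LRO delivers
over 25x the aggregate throughput (14.6 + 14.9 = 29.5 Gbits/sec versus
534 + 642 = 1176 Mbits/sec) while using less than 2x the system CPU
time (7.711s versus 3.935s). This shows how essential TSO/LRO is if
you plan to move data at real-world speeds and still have enough CPU
left to operate on that data.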
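The ratios above can be checked directly from the iperf and `time`
figures. A small sketch, treating the two bidirectional streams as one
aggregate and comparing only the `sys` times (both runs lasted about 10
seconds, so the totals are directly comparable):

```python
# Aggregate bandwidth from the two bidirectional iperf runs, in Mbit/s.
with_offload = 14_600 + 14_900   # TSO/LRO enabled: 14.6 + 14.9 Gbit/s
without_offload = 534 + 642      # TSO/LRO disabled

# System CPU seconds reported by `time` for each ~10 s run.
sys_with, sys_without = 7.711, 3.935

speedup = with_offload / without_offload   # throughput ratio
cpu_ratio = sys_with / sys_without         # extra system CPU spent with offload
# Data moved per system-CPU second, compared across the two runs.
efficiency_gain = speedup / cpu_ratio

print(f"throughput ratio: {speedup:.1f}x")
print(f"sys CPU ratio:    {cpu_ratio:.2f}x")
print(f"data moved per CPU-second: {efficiency_gain:.1f}x better")
```

By this rough measure, each second of system CPU time moved about 12.8x
as much data with TSO/LRO enabled as without it.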


> I recall you saying you tried turning off TSO with no
> effect. You might also try turning off checksum offload. I doubt it will
> be where things are broken, but might be worth a try.

That was not me, that was someone else.  If there is a problem with
NFS and TSO, the solution is *not* to disable TSO.  That is, at best,
a workaround that produces much more CPU load and much less
throughput.  The solution is to find the problem and fix it.

More data to follow.

Thanks!