From owner-freebsd-net@FreeBSD.ORG Sun Jan 26 01:40:45 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 53913559 for ; Sun, 26 Jan 2014 01:40:45 +0000 (UTC) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [119.145.14.65]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 5B0D11A2F for ; Sun, 26 Jan 2014 01:40:43 +0000 (UTC) Received: from 172.24.2.119 (EHLO szxeml207-edg.china.huawei.com) ([172.24.2.119]) by szxrg02-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id BPD48434; Sun, 26 Jan 2014 09:40:23 +0800 (CST) Received: from SZXEML410-HUB.china.huawei.com (10.82.67.137) by szxeml207-edg.china.huawei.com (172.24.2.56) with Microsoft SMTP Server (TLS) id 14.3.158.1; Sun, 26 Jan 2014 09:40:15 +0800 Received: from [127.0.0.1] (10.177.18.75) by szxeml410-hub.china.huawei.com (10.82.67.137) with Microsoft SMTP Server id 14.3.158.1; Sun, 26 Jan 2014 09:40:01 +0800 Message-ID: <52E46770.2000000@huawei.com> Date: Sun, 26 Jan 2014 09:40:00 +0800 From: Wang Weidong User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.0.1 MIME-Version: 1.0 To: Vincenzo Maffione Subject: Re: netmap: I got some troubles with netmap References: <52D74E15.1040909@huawei.com> <92C7725B-B30A-4A19-925A-A93A2489A525@iet.unipi.it> <52D8A5E1.9020408@huawei.com> <52DD1914.7090506@iet.unipi.it> <52E1E272.8060009@huawei.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.177.18.75] X-CFilter-Loop: Reflected Cc: =?ISO-8859-1?Q?facolt=E0?= , Giuseppe Lettieri , Luigi Rizzo , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jan 2014 01:40:45 -0000 On 2014/1/24 22:56, Vincenzo Maffione wrote: > > > > 2014/1/24 Wang Weidong > > [...] > > Hello, > [...] > > You are using the old/deprecated QEMU command line syntax (-net), and therefore honestly It's not clear to me what kind of network configuration you are running. > Here, I use the default configuration which provided by the QEMU. > Please use our scripts "launch-qemu.sh", "prep-taps.sh", according to what described in the README.images file (attached). > Alternatively, use the syntax like in the following examples > > (#1) qemu-system-x86_64 archdisk.qcow -enable-kvm -device virtio-net-pci,netdev=mynet -netdev tap,ifname=tap01,id=mynet,script=no,downscript=no -smp 2 > (#2) qemu-system-x86_64 archdisk.qcow -enable-kvm -device e1000,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 > I will use them, thanks. > so that it's clear to us what network frontend (e.g. emulated NIC) and network backend (e.g. netmap, tap, vde, ecc..) you are using. > In example #1 we are using virtio-net as frontend and tap as backend, while in example #2 we are using e1000 as frontend and netmap as backend. > Also consider giving more than one core (e.g. -smp 2) to each guest, to mitigate receiver livelock problems. > > > > 2. I use the vale below: > qemu-system-x86_64 -m 2048 -boot c -net nic -net netmap,vale0:0 -hda /home/wwd/tinycores/20131019-tinycore-netmap.hdd -enable-kvm -vnc :0 > > Same for here, it's not clear what you are using. I guess each guest has an e1000 device and is connected to a different port of the same vale switch (e.g. vale0:0 and vale0:1)? > > Test with 2 vms from the same host > vale0 without device. > I use the pkt-gen, the speed is 938 Kpps > > > You should get ~4Mpps with e1000 frontend + netmap backend on a reasonably good machine. 
Make sure you have ./configure'd QEMU with --enable-e1000-paravirt. > > > I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 195M/195M, then add -- -m 8, I only got 1.07M/1.07M. > When using a smaller msg size, will the speed be smaller? > > > If you use e1000 with netperf (without pkt-gen) your performance is doomed to be horrible. Use e1000-paravirt (as a frontend) instead if you are interested in netperf experiments. > Also consider that the point of using the "-- -m 8" option is to experiment with high packet rates, so what you should measure here is not the throughput in Mbps, but the packet rate: netperf reports the number of packets sent and received, so you can obtain the packet rate by dividing by the running time. > The throughput in Mbps is uninteresting; if you want high bulk throughput you just don't use "-- -m 8", but leave the defaults. > Using virtio-net in this case will help because of the TSO offloadings. > > cheers > Vincenzo > Hi Vincenzo, Nice, I will retest them. Thanks, Wang > > > with vale-ctl -a vale0:eth2, > use pkt-gen, the speed is 928 Kpps > I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 209M/208M, then add -- -m 8, I only got 1.06M/1.06M. > > with vale-ctl -h vale0:eth2, > use pkt-gen, the speed is 928 Kpps > I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 192M/192M, then add -- -m 8, I only got 1.06M/1.06M. > > Test with 2 vms from two hosts, > I can only test it by vale-ctl -h vale0:eth2 and setting eth2 into promisc > use pkt-gen with the default params, the speed is about 750 Kpps > use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 160M/160M > Is this right? > > > 3. I can't use the l2 utils. > When I run "sudo l2open -t eth0 l2recv[l2send]", I got "l2open ioctl(TUNSETIFF...): Invalid argument" > and with "l2open -r eth0 l2recv", after waiting a moment (only several seconds), I got the result: > TEST-RESULT: 0.901 kpps 1pkts > select/read=100.00 err=0 > > Also, I can't find the l2 utils on the net.
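Vincenzo's point above — for small messages, derive the packet rate from netperf's message counts rather than reading the Mbit/s figure — works out like this (all numbers below are hypothetical placeholders, not results from this thread):

```python
# Derive packet rate from netperf UDP_STREAM results: netperf reports
# messages sent/received and the elapsed time, and dividing the two
# gives the packet rate, which is the interesting metric for tiny
# (-- -m 8) messages.  All values below are made-up placeholders.
msg_size_bytes = 8          # netperf option: -- -m 8
elapsed_s = 10.0            # test duration in seconds
msgs_received = 12_000_000  # hypothetical "Messages Okay" count

pps = msgs_received / elapsed_s
throughput_mbps = pps * msg_size_bytes * 8 / 1e6

print(f"{pps/1e6:.1f} Mpps, {throughput_mbps:.2f} Mbit/s")
# prints: 1.2 Mpps, 76.80 Mbit/s -- a healthy packet rate can look
# like a tiny Mbit/s figure, which is why Mbps is misleading here.
```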
Is it implemented by your team? > > All of them is tested on vms. > > Cheers. > Wang > > > > > > Cheers, > > Giuseppe > > > > Il 17/01/2014 04:39, Wang Weidong ha scritto: > >> On 2014/1/16 18:24, facoltà wrote: > [...] > >> > >> > > > > > > > > > > -- > Vincenzo Maffione From owner-freebsd-net@FreeBSD.ORG Sun Jan 26 01:56:12 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 472DF794; Sun, 26 Jan 2014 01:56:12 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id ED2E41ADF; Sun, 26 Jan 2014 01:56:11 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: X-IronPort-AV: E=Sophos;i="4.95,721,1384318800"; d="scan'208";a="91061178" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 25 Jan 2014 20:55:47 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B306CB4051; Sat, 25 Jan 2014 20:55:47 -0500 (EST) Date: Sat, 25 Jan 2014 20:55:47 -0500 (EST) From: Rick Macklem To: J David Message-ID: <278396201.16318356.1390701347722.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jan 2014 01:56:12 -0000 J David wrote: > On Fri, Jan 24, 2014 at 7:10 PM, Rick Macklem > wrote: > > I would like to hear if you find Linux doing read before write when > > you use "-r 2k", since I think that is writing less than a page. > > It doesn't. As I reported in the original test, I used an 8k > rsize/wsize and a 4k write size on the Linux test and no > read-before-write was observed. And just now I did as you asked, a > 2k > test with Linux mounting with 32k rsize/wsize. No extra reads, > excellent performance. FreeBSD, with the same mount options, does > reads even on the appends in this case and can't. > Well, when I get home in April, I'll try the fairly recent Linux client I have at home and see what it does. Not sure what trick they could use to avoid the read before write for partial pages. (I suppose I can look at their sources, but that could be pretty scary;-) If I understand the 15year old commit message, the main problem with not doing the read before write for a partial buffer is that mmap()'d file access will look at entire pages and potentially gets garbage if the entire page isn't valid. At this time, there is a single B_CACHE flag to indicate the buffer cache entry has been filled in. I think it would be possible to add a bitmap that marks which pages are actually allocated to the buffer cache entry, but I suspect the coding would be non-trivial. 
This would help for the case of page size writes on page boundaries, but would require the pages to be read in before write when the writes are not of page size on page boundaries. Well, one application I do have some experience with is software builds and the "ld" stage tends to write lots of chunks of odd sizes at any byte offset. (When I did testing of some code that extended the single dirty byte range to a list of dirty byte ranges, I discovered that "ld" often generates 100+ of these odd sized non-contiguous writes before resulting in a completely written block. I recently added a mount option called "noncontigwr" that would allow the single dirty byte range to cover these non-contiguous writes.) Bottom line, if the pages were read in individually, the "ld" case would result in several (up to 16 for 4K in a 64K buffer) small reads against the server, which isn't nearly as efficient as one larger 64K read. As mentioned above, I don't know how Linux would avoid the read before write for partial blocks/pages being written. rick

>                                              random  random
>              KB  reclen   write  rewrite  read  reread  read  write
> Linux   1048576       2  281082   358672  125687  121964
> FreeBSD 1048576       2   59042    22624   10304    1933
>
> For comparison, here's the same test with 32k reclen (again, both Linux and FreeBSD using 32k rsize/wsize):
>
>                                              random  random
>              KB  reclen   write  rewrite  read  reread  read  write
> Linux   1048576      32  319387   373021  411106  364393
> FreeBSD 1048576      32   74892    73703   34889   66350
>
> Unfortunately it sounds like this state of affairs isn't really going to improve, at least in the near future. If there was one area where I never thought Linux would surpass us, it was NFS. :(
>
> Thanks!
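Rick's per-page bitmap idea above can be sketched with a toy model. This only illustrates the bookkeeping, not FreeBSD's buf(9) structures; the class and method names are invented:

```python
PAGE = 4096          # page size
NPAGES = 16          # pages in a 64K buffer cache block

class Buf:
    """Toy buffer-cache entry with a per-page valid bitmap instead of
    the single all-or-nothing B_CACHE flag.  write() returns the pages
    that would have to be read from the server first; the caller is
    assumed to fetch those before writing, after which each touched
    page is fully populated and can be marked valid."""
    def __init__(self):
        self.valid = 0   # bit i set => page i holds good data

    def write(self, off, length):
        first = off // PAGE
        last = (off + length - 1) // PAGE
        need_read = []
        # Only partially covered edge pages that are not already
        # valid force a read-before-write.
        if off % PAGE and not self.valid >> first & 1:
            need_read.append(first)
        if (off + length) % PAGE and not self.valid >> last & 1 \
                and last not in need_read:
            need_read.append(last)
        for p in range(first, last + 1):
            self.valid |= 1 << p
        return need_read

b = Buf()
print(b.write(0, PAGE))    # page-aligned, page-sized write: no read -> []
print(b.write(5000, 100))  # odd-sized "ld"-style write -> [1]
print(b.write(5100, 8))    # page 1 already valid: no read -> []
```

With the single B_CACHE flag, any write into a not-fully-valid buffer triggers a read of the whole buffer; in this sketch only the first touch of a partially written page does.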
> From owner-freebsd-net@FreeBSD.ORG Sun Jan 26 02:25:33 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4A6C6DD8; Sun, 26 Jan 2014 02:25:33 +0000 (UTC) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DCAB41DB2; Sun, 26 Jan 2014 02:25:32 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s0Q2PU2n045130; Sat, 25 Jan 2014 21:25:30 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.7/8.14.4/Submit) id s0Q2PUp1045129; Sat, 25 Jan 2014 21:25:30 -0500 (EST) (envelope-from wollman) Date: Sat, 25 Jan 2014 21:25:30 -0500 (EST) Message-Id: <201401260225.s0Q2PUp1045129@hergotha.csail.mit.edu> From: wollman@freebsd.org To: rmacklem@uoguelph.ca Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
X-Newsgroups: mit.lcs.mail.freebsd-net In-Reply-To: <278396201.16318356.1390701347722.JavaMail.root@uoguelph.ca> References: Organization: none X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (hergotha.csail.mit.edu [127.0.0.1]); Sat, 25 Jan 2014 21:25:30 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: freebsd-fs@freebsd.org, freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jan 2014 02:25:33 -0000 In article <278396201.16318356.1390701347722.JavaMail.root@uoguelph.ca>, Rick Macklem writes: >Well, when I get home in April, I'll try the fairly recent Linux client >I have at home and see what it does. Not sure what trick they could use >to avoid the read before write for partial pages. (I suppose I can >look at their sources, but that could be pretty scary;-) For what it's worth, our performance for large-block 100%-read workloads is also not what it could (or ought to) be. Between two 20G-attached servers, I can get about 12 Gbit/s with three parallel TCP connections. (Multiple connections are required to trick the lagg hash into balancing the load across both 10G links, because the hash function used for load-balancing uses the source and destination ports.) On the same pair of servers, "dd if=/mnt/test bs=1024k" runs at about 3 Gbit/s, whereas reading from the local filesystem goes anywhere from 1.5 to 3 G*byte*/s (i.e., eight times faster) with much higher CPU utilization. Luckily, most of our users are only connected at 1G so they don't notice. I'm going to lose my test server soon (it has to go into production shortly), so I'm not really able to work on this. 
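The lagg balancing constraint Garrett describes — one TCP connection always lands on one link — can be illustrated with a toy port-based hash. lagg(4)'s real hash function is different; the xor below is purely illustrative:

```python
NLINKS = 2   # two 10G links in the lagg

def pick_link(sport, dport):
    # Toy stand-in for a hash over the port pair: the same flow
    # always yields the same link, so a single connection can never
    # use more than one link's worth of bandwidth.
    return (sport ^ dport) % NLINKS

# Every packet of one flow takes the same link...
assert len({pick_link(50001, 2049) for _ in range(1000)}) == 1

# ...so several connections (different source ports) are needed
# before both links carry traffic.
print({pick_link(sp, 2049) for sp in (50001, 50002, 50003)})
```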
I'll have another test server soon (old hardware being replaced by the new server) and hope to be able to try out the new code that's going to be in 10.1, with the expectation of upgrading to 10.x over summer break. -GAWollman From owner-freebsd-net@FreeBSD.ORG Sun Jan 26 02:36:33 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 64AC9257; Sun, 26 Jan 2014 02:36:33 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 03BF11E4D; Sun, 26 Jan 2014 02:36:32 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ag8UACJ05FKDaFve/2dsb2JhbABagXICAYFPVoJ9tVKDL0+BH3SCJQEBAQMBAQEBICsgCwUWGAICDRkCKQEJJg4HBAEcBIdcCA2rYJwvF4EpjRMBAQ0ONAcWglmBSQSJSIwMhAWQbINLHjF9Bxci X-IronPort-AV: E=Sophos;i="4.95,721,1384318800"; d="scan'208";a="90488526" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 25 Jan 2014 21:36:25 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 026E5B4042; Sat, 25 Jan 2014 21:36:26 -0500 (EST) Date: Sat, 25 Jan 2014 21:36:26 -0500 (EST) From: Rick Macklem To: wollman@freebsd.org Message-ID: <188195924.16327973.1390703786000.JavaMail.root@uoguelph.ca> In-Reply-To: <201401260225.s0Q2PUp1045129@hergotha.csail.mit.edu> Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-fs@freebsd.org, freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jan 2014 02:36:33 -0000 Garrett Wollman wrote: > In article > <278396201.16318356.1390701347722.JavaMail.root@uoguelph.ca>, Rick > Macklem writes: > > >Well, when I get home in April, I'll try the fairly recent Linux > >client > >I have at home and see what it does. Not sure what trick they could > >use > >to avoid the read before write for partial pages. (I suppose I can > >look at their sources, but that could be pretty scary;-) > > For what it's worth, our performance for large-block 100%-read > workloads is also not what it could (or ought to) be. Between two > 20G-attached servers, I can get about 12 Gbit/s with three parallel > TCP connections. (Multiple connections are required to trick the > lagg > hash into balancing the load across both 10G links, because the hash > function used for load-balancing uses the source and destination > ports.) On the same pair of servers, "dd if=/mnt/test bs=1024k" runs > at about 3 Gbit/s, whereas reading from the local filesystem goes > anywhere from 1.5 to 3 G*byte*/s (i.e., eight times faster) with much > higher CPU utilization. Luckily, most of our users are only > connected > at 1G so they don't notice. > Have you tried increasing readahead by any chance? I think the default is 1, which means the client will make 2 read requests and then wait for those replies before doing any more reads. Since you have fast links, maybe the 2 * 64K reads isn't enough to keep the pipe filled? (This depends on latency, which you didn't mention.) 
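Rick's "keep the pipe filled" reasoning can be made concrete with a bandwidth-delay product estimate. The RTT below is an invented figure for illustration, since Garrett didn't report latency:

```python
import math

link_bps = 10e9        # one 10G link of the lagg
rtt_s = 0.5e-3         # hypothetical request-to-reply latency
rsize = 64 * 1024      # default NFS read size in bytes

# Outstanding read data must cover the bandwidth-delay product,
# or the link sits idle between replies.
bdp_bytes = link_bps / 8 * rtt_s
outstanding = math.ceil(bdp_bytes / rsize)
print(outstanding)     # 64K reads that must be in flight at once

# readahead=1 keeps only 2 reads outstanding -- far short of this,
# which is consistent with readahead=4 nearly doubling the speed.
```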
Might be worth trying, rick ps: If/when you have a test server, you could also try compiling a kernel with MAXBSIZE set to 128Kbytes instead of 64Kbytes. You'll need to boot this kernel on both the server and client (assuming a FreeBSD client) before the default rsize will increase to 128Kbytes. I'm no ZFS guy, but I understand 128Kbytes is the blocksize it likes. > I'm going to lose my test server soon (it has to go into production > shortly), so I'm not really able to work on this. I'll have another > test server soon (old hardware being replaced by the new server) and > hope to be able to try out the new code that's going to be in 10.1, > with the expectation of upgrading to 10.x over summer break. > > -GAWollman > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Sun Jan 26 03:13:17 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8AD51B4D; Sun, 26 Jan 2014 03:13:17 +0000 (UTC) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 452041188; Sun, 26 Jan 2014 03:13:17 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s0Q3DFEw045684; Sat, 25 Jan 2014 22:13:15 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.7/8.14.4/Submit) id s0Q3DFYt045681; Sat, 25 Jan 2014 22:13:15 -0500 (EST) (envelope-from 
wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21220.32074.958702.595502@hergotha.csail.mit.edu> Date: Sat, 25 Jan 2014 22:13:14 -0500 From: Garrett Wollman To: Rick Macklem Subject: Re: Terrible NFS performance under 9.2-RELEASE? In-Reply-To: <188195924.16327973.1390703786000.JavaMail.root@uoguelph.ca> References: <201401260225.s0Q2PUp1045129@hergotha.csail.mit.edu> <188195924.16327973.1390703786000.JavaMail.root@uoguelph.ca> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (hergotha.csail.mit.edu [127.0.0.1]); Sat, 25 Jan 2014 22:13:15 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: freebsd-fs@freebsd.org, freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jan 2014 03:13:17 -0000 < said: > Have you tried increasing readahead by any chance? I think the default > is 1, which means the client will make 2 read requests and then wait for > those replies before doing any more reads. Since you have fast links, > maybe the 2 * 64K reads isn't enough to keep the pipe filled? (This > depends on latency, which you didn't mention.) -o readahead=4 nearly doubles the speed, to a bit over 5 Gbit/s. Oddly, when I unmount the filesystem, the test client sometimes freezes for 15-30 seconds. Since I'm not on the console I can't tell what it's doing when this happens. 
-GAWollman From owner-freebsd-net@FreeBSD.ORG Sun Jan 26 03:19:24 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 576EFCC4; Sun, 26 Jan 2014 03:19:24 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id EA2CF11C8; Sun, 26 Jan 2014 03:19:23 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEAHF95FKDaFve/2dsb2JhbABag0RWgn25AU+BH3SCJQEBAQMBAQEBICsgCwUWGAICDRkCKQEJJgYIBwQBHASHXAgNq1+cLheBKY0TAQEbNAeCb4FJBIlIjAyEBZBsg0seMYEEOQ X-IronPort-AV: E=Sophos;i="4.95,721,1384318800"; d="scan'208";a="90491411" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 25 Jan 2014 22:19:23 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 15FD7B4054; Sat, 25 Jan 2014 22:19:23 -0500 (EST) Date: Sat, 25 Jan 2014 22:19:23 -0500 (EST) From: Rick Macklem To: Garrett Wollman Message-ID: <688905116.16333139.1390706363082.JavaMail.root@uoguelph.ca> In-Reply-To: <21220.32074.958702.595502@hergotha.csail.mit.edu> Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-fs@freebsd.org, freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jan 2014 03:19:24 -0000 Garrett Wollman wrote: > < said: > > > Have you tried increasing readahead by any chance? I think the > > default > > is 1, which means the client will make 2 read requests and then > > wait for > > those replies before doing any more reads. Since you have fast > > links, > > maybe the 2 * 64K reads isn't enough to keep the pipe filled? (This > > depends on latency, which you didn't mention.) > > -o readahead=4 nearly doubles the speed, to a bit over 5 Gbit/s. > And "-o readahead=8" is slower or faster? (I think you can go up to at least 16, but I can't remember the upper bound. It's in one of the .h files.;-) > Oddly, when I unmount the filesystem, the test client sometimes > freezes for 15-30 seconds. Since I'm not on the console I can't tell > what it's doing when this happens. > Hmm, no idea. Maybe it takes a while to throw away all the buffer cache blocks? I run such small systems by today's standards that I wouldn't see a delay that "might" occur with a large buffer cache.
At least a little progress, rick > -GAWollman > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Sun Jan 26 07:43:41 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DF0CD3DE for ; Sun, 26 Jan 2014 07:43:41 +0000 (UTC) Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [119.145.14.64]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 949C0118B for ; Sun, 26 Jan 2014 07:43:40 +0000 (UTC) Received: from 172.24.2.119 (EHLO szxeml209-edg.china.huawei.com) ([172.24.2.119]) by szxrg01-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id BQP79666; Sun, 26 Jan 2014 15:41:51 +0800 (CST) Received: from SZXEML420-HUB.china.huawei.com (10.82.67.159) by szxeml209-edg.china.huawei.com (172.24.2.184) with Microsoft SMTP Server (TLS) id 14.3.158.1; Sun, 26 Jan 2014 15:41:43 +0800 Received: from [127.0.0.1] (10.177.18.75) by szxeml420-hub.china.huawei.com (10.82.67.159) with Microsoft SMTP Server id 14.3.158.1; Sun, 26 Jan 2014 15:41:45 +0800 Message-ID: <52E4BC38.7040407@huawei.com> Date: Sun, 26 Jan 2014 15:41:44 +0800 From: Wang Weidong User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.0.1 MIME-Version: 1.0 To: Vincenzo Maffione Subject: Re: netmap: I got some troubles with netmap References: <52D74E15.1040909@huawei.com> <92C7725B-B30A-4A19-925A-A93A2489A525@iet.unipi.it> <52D8A5E1.9020408@huawei.com> <52DD1914.7090506@iet.unipi.it> <52E1E272.8060009@huawei.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" 
Content-Transfer-Encoding: 8bit X-Originating-IP: [10.177.18.75] X-CFilter-Loop: Reflected Cc: =?ISO-8859-1?Q?facolt=E0?= , Giuseppe Lettieri , Luigi Rizzo , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jan 2014 07:43:41 -0000 On 2014/1/24 22:56, Vincenzo Maffione wrote: > > > > 2014/1/24 Wang Weidong > > > On 2014/1/20 20:39, Giuseppe Lettieri wrote: > > Hi Wang, > > > > OK, you are using the netmap support in the upstream qemu git. That does not yet include all our modifications, some of which are very important for high throughput with VALE. In particular, the upstream qemu does not include the batching improvements in the frontend/backend interface, and it does not include the "map ring" optimization of the e1000 frontend. Please find attached a gzipped patch that contains all of our qemu code. The patch is against the latest upstream master (commit 1cf892ca). > > > > Please ./configure the patched qemu with the following option, in addition to any other option you may need: > > > > --enable-e1000-paravirt --enable-netmap \ > > --extra-cflags=-I/path/to/netmap/sys/directory > > > > Note that --enable-e1000-paravirt is needed to enable the "map ring" optimization in the e1000 frontend, even if you are not going to use the e1000-paravirt device. > > > > Now you should be able to rerun your tests. I am also attaching a README file that describes some more tests you may want to run. > > > > Hello, > > > Yes, I patch the qemu-netmap-bc767e701.patch to the qemu, download the 20131019-tinycore-netmap.hdd. > And I do some test that: > > 1. I use the bridge below: > qemu-system-x86_64 -m 2048 -boot c -net nic -net bridge,br=br1 -hda /home/wwd/tinycores/20131019-tinycore-netmap.hdd -enable-kvm -vnc :0 > test between two vms. > br1 without device. 
> Use pktgen, I got the 237.95 kpps. > Use the netserver/netperf I got the speed 1037M bits/sec with TCP_STREAM. The max speed is up to 1621M. > Use the netserver/netperf I got the speed 3296/s with TCP_RR > Use the netserver/netperf I got the speed 234M/86M bits/sec with UDP_STREAM > > When I add a device from host to the br1, the speed is 159.86 kpps. > Use the netserver/netperf I got the speed 720M bits/sec with TCP_STREAM. The max speed is up to 1000M. > Use the netserver/netperf I got the speed 3556/s with TCP_RR > Use the netserver/netperf I got the speed 181M/181M bits/sec with UDP_STREAM > > What do you think of these data? > > > You are using the old/deprecated QEMU command line syntax (-net), and therefore honestly It's not clear to me what kind of network configuration you are running. > > Please use our scripts "launch-qemu.sh", "prep-taps.sh", according to what described in the README.images file (attached). > Alternatively, use the syntax like in the following examples > > (#1) qemu-system-x86_64 archdisk.qcow -enable-kvm -device virtio-net-pci,netdev=mynet -netdev tap,ifname=tap01,id=mynet,script=no,downscript=no -smp 2 > (#2) qemu-system-x86_64 archdisk.qcow -enable-kvm -device e1000,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 > Here I use the 20131019-tinycore-netmap.hdd (download from the http://info.iet.unipi.it/~luigi/netmap/) instead archdisk.qcow with #2. And I can't do the "cpufreq-set -g performance # on linux" as the README.image. Although, I use the pkt-gen test the vms, I got the tx speed is ~3Mpps while the other vm's rx speed is only 1.44Mpps. Is it right? I can't get ~4Mpps is the reason that I can't set the "CPU power saving". > so that it's clear to us what network frontend (e.g. emulated NIC) and network backend (e.g. netmap, tap, vde, ecc..) you are using. 
> In example #1 we are using virtio-net as frontend and tap as backend, while in example #2 we are using e1000 as frontend and netmap as backend. > Also consider giving more than one core (e.g. -smp 2) to each guest, to mitigate receiver livelock problems. > > > > 2. I use the vale below: > qemu-system-x86_64 -m 2048 -boot c -net nic -net netmap,vale0:0 -hda /home/wwd/tinycores/20131019-tinycore-netmap.hdd -enable-kvm -vnc :0 > > Same for here, it's not clear what you are using. I guess each guest has an e1000 device and is connected to a different port of the same vale switch (e.g. vale0:0 and vale0:1)? > > Test with 2 vms from the same host > vale0 without device. > I use the pkt-gen, the speed is 938 Kpps > > > You should get ~4Mpps with e1000 frontend + netmap backend on a reasonably good machine. Make sure you have ./configure'd QEMU with --enable-e1000-paravirt. > > > I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 195M/195M, then add -- -m 8, I only got 1.07M/1.07M. > When use the smaller msg size, the speed will smaller? > > > If you use e1000 with netperf (without pkt-gen) your performance is doomed to be horrible. Use e1000-paravirt (as a frontend) instead if you are interested in netperf experiment. > Also consider that the point in using the "-- -m8" options is experimenting high packet rates, so what you should measure here is not the througput in Mbps, but the packet rate: netperf reports the number of packets sent and received, so you can obtain the packet rate by dividing by the running time. > The throughput in Mbps is uninteresting, if you want high bulk throughput you just don't use "-- -m 8", but leave the defaults. > Using virtio-net in this case will help because of the TSO offloadings. > Here, I am a little interested in netperf. 
So I did that: qemu-system-x86_64 20131019-tinycore-netmap.hdd -enable-kvm -device *e1000-paravirt*,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -m 2048 -vnc :0 qemu-system-x86_64 20131019-tinycore-netmap.hdd -enable-kvm -device *e1000-paravirt*,mitigation=off,mac=00:AA:BB:CC:DD:02,netdev=mynet -netdev netmap,ifname=vale0:02,id=mynet -smp 2 -m 2048 -vnc :1 I think these commands are wrong, because after I set IPv4 addresses on the devices, I find the two VMs can't communicate with each other. Thanks, Wang > cheers > Vincenzo > > > > with vale-ctl -a vale0:eth2, > use pkt-gen, the speed is 928 Kpps > I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 209M/208M, then add -- -m 8, I only got 1.06M/1.06M. > > with vale-ctl -h vale0:eth2, > use pkt-gen, the speed is 928 Kpps > I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 192M/192M, then add -- -m 8, I only got 1.06M/1.06M. > > Test with 2 vms form two host, > I only can test it by vale-ctl -h vale0:eth2 and set eth2 into promisc > use pkt-gen with the default params, the speed is about 750 Kpps > use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 160M/160M > Is this right? > > > 3. I can't use the l2 utils. > When I do the "sudo l2open -t eth0 l2recv[l2send], I got that "l2open ioctl(TUNSETIFF...): Invalid argument" > and "use l2open -r eth0 l2recv", wait a moment (only several seconds), I got the result: > TEST-RESULT: 0.901 kpps 1pkts > select/read=100.00 err=0 > > And I can't find the l2 utils from the net? Is it implemented by your team? > > All of them is tested on vms. > > Cheers. > Wang > > > > > > Cheers, > > Giuseppe > > > > Il 17/01/2014 04:39, Wang Weidong ha scritto: > >> On 2014/1/16 18:24, facoltà wrote: > [...]
> >> > >> > > > > > > > > > > -- > Vincenzo Maffione From owner-freebsd-net@FreeBSD.ORG Sun Jan 26 23:35:47 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx2.freebsd.org (mx2.freebsd.org [8.8.178.116]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 54D9F9F2; Sun, 26 Jan 2014 23:35:47 +0000 (UTC) Received: from butcher-nb.yandex.net (hub.freebsd.org [IPv6:2001:1900:2254:206c::16:88]) by mx2.freebsd.org (Postfix) with ESMTP id 1FE04228D; Sun, 26 Jan 2014 23:35:45 +0000 (UTC) Message-ID: <52E59B93.90304@FreeBSD.org> Date: Mon, 27 Jan 2014 03:34:43 +0400 From: "Andrey V. Elsukov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: "Alexander V. Chernikov" , "net@freebsd.org" Subject: Re: "slow path" in network code || IPv6 panic on inteface removal References: <52E21721.5010309@yandex-team.ru> In-Reply-To: <52E21721.5010309@yandex-team.ru> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, hackers@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jan 2014 23:35:47 -0000 Hello, Alexander, probably it would be better, it you split your patch into two. 
The one, that implements this: > What exactly is proposed: > - Another one netisr queue for handling different types of packets > - metainfo is stored in mbuf_tag attached to packet > - ifnet departure handler taking care of packets queued from/to killed > ifnet > - API to register/unregister/dispath given type of traffic And second, that shows usage example: > #5 T2 calls nd6_ifptomac() which reads interface MAC from ifp->if_addr > > #6 User inspects core generated by previous call > > Using new API, we can avoid #6 by making the following code changes: > * LLE timer does not drop/reacquire LLE lock > * we require nd6_ns_output callers to lock LLE if it is provided > * nd6_ns_output() uses "slow" path instead of sending mbuf to > ip6_output() immediately if LLE is not NULL. -- WBR, Andrey V. Elsukov From owner-freebsd-net@FreeBSD.ORG Mon Jan 27 02:16:56 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3D174B64 for ; Mon, 27 Jan 2014 02:16:56 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id B6DAE12B0 for ; Mon, 27 Jan 2014 02:16:55 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: X-IronPort-AV: E=Sophos;i="4.95,726,1384318800"; d="scan'208";a="91214176" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 26 Jan 2014 21:16:54 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 51E6FB3F48; Sun, 26 Jan 2014 21:16:54 -0500 (EST) Date: Sun, 26 Jan 2014 21:16:54 -0500 (EST) From: Rick Macklem To: Adam McDougall Message-ID: <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca> In-Reply-To: 
<52DC1241.7010004@egr.msu.edu> Subject: Re: Terrible NFS performance under 9.2-RELEASE? MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_16590856_824730477.1390789014322" X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 02:16:56 -0000 ------=_Part_16590856_824730477.1390789014322 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Adam McDougall wrote: > Also try rsize=32768,wsize=32768 in your mount options, made a huge > difference for me. I've noticed slow file transfers on NFS in 9 and > finally did some searching a couple months ago, someone suggested it > and > they were on to something. > I have a "hunch" that might explain why 64K NFS reads/writes perform poorly for some network environments. A 64K NFS read reply/write request consists of a list of 34 mbufs when passed to TCP via sosend() and a total data length of around 65680bytes. Looking at a couple of drivers (virtio and ixgbe), they seem to expect no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. I think (I don't have anything that does TSO to confirm this) that NFS will pass a list that is longer (34 plus a TCP/IP header). At a glance, it appears that the drivers call m_defrag() or m_collapse() when the mbuf list won't fit in their scatter table (32 or 33 elements) and if this fails, just silently drop the data without sending it. If I'm right, there would considerable overhead from m_defrag()/m_collapse() and near disaster if they fail to fix the problem and the data is silently dropped instead of xmited. Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE clusters, so the mbuf count drops from 34 to 18. 
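Rick's mbuf counts can be sanity-checked with a quick arithmetic sketch. The cluster sizes are the standard FreeBSD values; the two-mbuf allowance for the RPC/NFS headers is my assumption to make the totals line up with his figures, not something stated in the mail:

```python
import math

MCLBYTES = 2048        # standard mbuf cluster size
MJUMPAGESIZE = 4096    # page-sized jumbo cluster used by the patch
NFS_IO = 64 * 1024     # default 64K rsize/wsize
HDR_MBUFS = 2          # rough allowance for RPC/NFS header mbufs (assumption)

def chain_len(cluster_size):
    # number of mbufs in the chain handed to sosend() for one 64K I/O
    return math.ceil(NFS_IO / cluster_size) + HDR_MBUFS

print(chain_len(MCLBYTES))      # -> 34, over the 32-33 segment TSO limit
print(chain_len(MJUMPAGESIZE))  # -> 18, comfortably under it
```

With 2K clusters the chain (plus a TCP/IP header mbuf) exceeds the 32-33 entry scatter tables, triggering the m_defrag()/m_collapse() path; 4K clusters halve the data mbuf count.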
If anyone has a TSO scatter/gather enabled net interface and can test this patch on it with NFS I/O (default of 64K rsize/wsize) when TSO is enabled and see what effect it has, that would be appreciated. Btw, thanks go to Garrett Wollman for suggesting the change to MJUMPAGESIZE clusters. rick ps: If the attachment doesn't make it through and you want the patch, just email me and I'll send you a copy. > On 01/19/2014 09:32, Alfred Perlstein wrote: > > 9.x has pretty poor mbuf tuning by default. > > > > I hit nearly the same problem and raising the mbufs worked for me. > > > > I'd suggest raising that and retrying. > > > > -Alfred > > > > On 1/19/14 12:47 AM, J David wrote: > >> While setting up a test for other purposes, I noticed some really > >> horrible NFS performance issues. > >> > >> To explore this, I set up a test environment with two FreeBSD > >> 9.2-RELEASE-p3 virtual machines running under KVM. The NFS server > >> is > >> configured to serve a 2 gig mfs on /mnt. > >> > >> The performance of the virtual network is outstanding: > >> > >> Server: > >> > >> $ iperf -c 172.20.20.169 > >> > >> ------------------------------------------------------------ > >> > >> Client connecting to 172.20.20.169, TCP port 5001 > >> > >> TCP window size: 1.00 MByte (default) > >> > >> ------------------------------------------------------------ > >> > >> [ 3] local 172.20.20.162 port 59717 connected with 172.20.20.169 > >> port > >> 5001 > >> > >> [ ID] Interval Transfer Bandwidth > >> > >> [ 3] 0.0-10.0 sec 16.1 GBytes 13.8 Gbits/sec > >> > >> $ iperf -s > >> > >> ------------------------------------------------------------ > >> > >> Server listening on TCP port 5001 > >> > >> TCP window size: 1.00 MByte (default) > >> > >> ------------------------------------------------------------ > >> > >> [ 4] local 172.20.20.162 port 5001 connected with 172.20.20.169 > >> port > >> 45655 > >> > >> [ ID] Interval Transfer Bandwidth > >> > >> [ 4] 0.0-10.0 sec 15.8 GBytes 13.6 Gbits/sec 
> >> > >> > >> Client: > >> > >> > >> $ iperf -s > >> > >> ------------------------------------------------------------ > >> > >> Server listening on TCP port 5001 > >> > >> TCP window size: 1.00 MByte (default) > >> > >> ------------------------------------------------------------ > >> > >> [ 4] local 172.20.20.169 port 5001 connected with 172.20.20.162 > >> port > >> 59717 > >> > >> [ ID] Interval Transfer Bandwidth > >> > >> [ 4] 0.0-10.0 sec 16.1 GBytes 13.8 Gbits/sec > >> > >> ^C$ iperf -c 172.20.20.162 > >> > >> ------------------------------------------------------------ > >> > >> Client connecting to 172.20.20.162, TCP port 5001 > >> > >> TCP window size: 1.00 MByte (default) > >> > >> ------------------------------------------------------------ > >> > >> [ 3] local 172.20.20.169 port 45655 connected with 172.20.20.162 > >> port > >> 5001 > >> > >> [ ID] Interval Transfer Bandwidth > >> > >> [ 3] 0.0-10.0 sec 15.8 GBytes 13.6 Gbits/sec > >> > >> > >> The performance of the mfs filesystem on the server is also good. > >> > >> Server: > >> > >> $ sudo mdconfig -a -t swap -s 2g > >> > >> md0 > >> > >> $ sudo newfs -U -b 4k -f 4k /dev/md0 > >> > >> /dev/md0: 2048.0MB (4194304 sectors) block size 4096, fragment > >> size 4096 > >> > >> using 43 cylinder groups of 48.12MB, 12320 blks, 6160 inodes. 
> >> > >> with soft updates > >> > >> super-block backups (for fsck_ffs -b #) at: > >> > >> 144, 98704, 197264, 295824, 394384, 492944, 591504, 690064, > >> 788624, > >> 887184, > >> > >> 985744, 1084304, 1182864, 1281424, 1379984, 1478544, 1577104, > >> 1675664, > >> > >> 1774224, 1872784, 1971344, 2069904, 2168464, 2267024, 2365584, > >> 2464144, > >> > >> 2562704, 2661264, 2759824, 2858384, 2956944, 3055504, 3154064, > >> 3252624, > >> > >> 3351184, 3449744, 3548304, 3646864, 3745424, 3843984, 3942544, > >> 4041104, > >> > >> 4139664 > >> > >> $ sudo mount /dev/md0 /mnt > >> > >> $ cd /mnt > >> > >> $ sudo iozone -e -I -s 512m -r 4k -i 0 -i 1 -i 2 > >> > >> Iozone: Performance Test of File I/O > >> > >> Version $Revision: 3.420 $ > >> > >> [...] > >> > >> random > >> random > >> > >> KB reclen write rewrite read reread > >> read > >> write > >> > >> 524288 4 560145 1114593 933699 831902 > >> 56347 > >> 158904 > >> > >> > >> iozone test complete. > >> > >> > >> But introduce NFS into the mix and everything falls apart. > >> > >> Client: > >> > >> $ sudo mount -o tcp,nfsv3 f12.phxi:/mnt /mnt > >> > >> $ cd /mnt > >> > >> $ sudo iozone -e -I -s 512m -r 4k -i 0 -i 1 -i 2 > >> > >> Iozone: Performance Test of File I/O > >> > >> Version $Revision: 3.420 $ > >> > >> [...] > >> > >> random > >> random > >> > >> KB reclen write rewrite read reread > >> read > >> write > >> > >> 524288 4 67246 2923 103295 1272407 > >> 172475 > >> 196 > >> > >> > >> And the above took 48 minutes to run, compared to 14 seconds for > >> the > >> local version. So it's 200x slower over NFS. The random write > >> test > >> is over 800x slower. Of course NFS is slower, that's expected, > >> but it > >> definitely wasn't this exaggerated in previous releases. > >> > >> To emphasize that iozone reflects real workloads here, I tried > >> doing > >> an svn co of the 9-STABLE source tree over NFS but after two hours > >> it > >> was still in llvm so I gave up. 
> >> > >> While all this not-much-of-anything NFS traffic is going on, both > >> systems are essentially idle. The process on the client sits in > >> "newnfs" wait state with nearly no CPU. The server is completely > >> idle > >> except for the occasional 0.10% in an nfsd thread, which otherwise > >> spend their lives in rpcsvc wait state. > >> > >> Server iostat: > >> > >> $ iostat -x -w 10 md0 > >> > >> extended device statistics > >> > >> device r/s w/s kr/s kw/s qlen svc_t %b > >> > >> [...] > >> > >> md0 0.0 36.0 0.0 0.0 0 1.2 0 > >> md0 0.0 38.8 0.0 0.0 0 1.5 0 > >> md0 0.0 73.6 0.0 0.0 0 1.0 0 > >> md0 0.0 53.3 0.0 0.0 0 2.5 0 > >> md0 0.0 33.7 0.0 0.0 0 1.1 0 > >> md0 0.0 45.5 0.0 0.0 0 1.8 0 > >> > >> Server nfsstat: > >> > >> $ nfsstat -s -w 10 > >> > >> GtAttr Lookup Rdlink Read Write Rename Access Rddir > >> > >> [...] > >> > >> 0 0 0 471 816 0 0 0 > >> > >> 0 0 0 480 751 0 0 0 > >> > >> 0 0 0 481 36 0 0 0 > >> > >> 0 0 0 469 550 0 0 0 > >> > >> 0 0 0 485 814 0 0 0 > >> > >> 0 0 0 467 503 0 0 0 > >> > >> 0 0 0 473 345 0 0 0 > >> > >> > >> Client nfsstat: > >> > >> $ nfsstat -c -w 10 > >> > >> GtAttr Lookup Rdlink Read Write Rename Access Rddir > >> > >> [...] > >> > >> 0 0 0 0 518 0 0 0 > >> > >> 0 0 0 0 498 0 0 0 > >> > >> 0 0 0 0 503 0 0 0 > >> > >> 0 0 0 0 474 0 0 0 > >> > >> 0 0 0 0 525 0 0 0 > >> > >> 0 0 0 0 497 0 0 0 > >> > >> > >> Server vmstat: > >> > >> $ vmstat -w 10 > >> > >> procs memory page disks > >> faults cpu > >> > >> r b w avm fre flt re pi po fr sr vt0 vt1 in > >> sy > >> cs us sy id > >> > >> [...] 
> >> > >> 0 4 0 634M 6043M 37 0 0 0 1 0 0 0 1561 > >> 46 > >> 3431 0 2 98 > >> > >> 0 4 0 640M 6042M 62 0 0 0 28 0 0 0 1598 > >> 94 > >> 3552 0 2 98 > >> > >> 0 4 0 648M 6042M 38 0 0 0 0 0 0 0 1609 > >> 47 > >> 3485 0 1 99 > >> > >> 0 4 0 648M 6042M 37 0 0 0 0 0 0 0 1615 > >> 46 > >> 3667 0 2 98 > >> > >> 0 4 0 648M 6042M 37 0 0 0 0 0 0 0 1606 > >> 45 > >> 3678 0 2 98 > >> > >> 0 4 0 648M 6042M 37 0 0 0 0 0 1 0 1561 > >> 45 > >> 3377 0 2 98 > >> > >> > >> Client vmstat: > >> > >> $ vmstat -w 10 > >> > >> procs memory page disks > >> faults cpu > >> > >> r b w avm fre flt re pi po fr sr md0 da0 in > >> sy > >> cs us sy id > >> > >> [...] > >> > >> 0 0 0 639M 593M 33 0 0 0 1237 0 0 0 281 > >> 5575 > >> 1043 0 3 97 > >> > >> 0 0 0 639M 591M 0 0 0 0 712 0 0 0 235 > >> 122 > >> 889 0 2 98 > >> > >> 0 0 0 639M 583M 0 0 0 0 571 0 0 1 227 > >> 120 > >> 851 0 2 98 > >> > >> 0 0 0 639M 592M 198 0 0 0 1212 0 0 0 251 > >> 2497 > >> 950 0 3 97 > >> > >> 0 0 0 639M 586M 0 0 0 0 614 0 0 0 250 > >> 121 > >> 924 0 2 98 > >> > >> 0 0 0 639M 586M 0 0 0 0 765 0 0 0 250 > >> 120 > >> 918 0 3 97 > >> > >> > >> Top on the KVM host says it is 93-95% idle and that each VM sits > >> around 7-10% CPU. So basically nobody is doing anything. There's > >> no > >> visible bottleneck, and I've no idea where to go from here to > >> figure > >> out what's going on. > >> > >> Does anyone have any suggestions for debugging this? > >> > >> Thanks! 
> >> _______________________________________________ > >> freebsd-net@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-net > >> To unsubscribe, send any mail to > >> "freebsd-net-unsubscribe@freebsd.org" > >> > > > > _______________________________________________ > > freebsd-net@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-net > > To unsubscribe, send any mail to > > "freebsd-net-unsubscribe@freebsd.org" > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > ------=_Part_16590856_824730477.1390789014322 Content-Type: text/x-patch; name=4kmcl.patch Content-Disposition: attachment; filename=4kmcl.patch Content-Transfer-Encoding: base64 LS0tIGZzL25mcy9uZnNwb3J0Lmguc2F2MgkyMDE0LTAxLTI2IDE4OjQzOjQ3LjAwMDAwMDAwMCAt MDUwMAorKysgZnMvbmZzL25mc3BvcnQuaAkyMDE0LTAxLTI2IDE5OjA0OjI3LjAwMDAwMDAwMCAt MDUwMApAQCAtMTUzLDE0ICsxNTMsMjcgQEAKIAkJCU1HRVRIRFIoKG0pLCBNX1dBSVRPSywgTVRf REFUQSk7IAlcCiAJCX0gCQkJCQkJXAogCX0gd2hpbGUgKDApCi0jZGVmaW5lCU5GU01DTEdFVCht LCB3KQlkbyB7IAkJCQkJXAotCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOyAJCQlcCi0J CXdoaWxlICgobSkgPT0gTlVMTCApIHsgCQkJCVwKLQkJCSh2b2lkKSBuZnNfY2F0bmFwKFBaRVJP LCAwLCAibmZzbWdldCIpOwlcCi0JCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOyAJCVwK LQkJfSAJCQkJCQlcCi0JCU1DTEdFVCgobSksICh3KSk7CQkJCVwKKyNpZiBNSlVNUEFHRVNJWkUg PiBNQ0xCWVRFUworI2RlZmluZQlORlNNQ0xHRVQobSwgdykJZG8gewkgCQkJCQlcCisJCShtKSA9 IG1fZ2V0amNsKE1fV0FJVE9LLCBNVF9EQVRBLCAwLCBNSlVNUEFHRVNJWkUpOwlcCisJCXdoaWxl ICgobSkgPT0gTlVMTCkgewkgCQkJCVwKKwkJCSh2b2lkKW5mc19jYXRuYXAoUFpFUk8sIDAsICJu ZnNtZ2V0Iik7CQlcCisJCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOwkgCQlcCisJCQlp ZiAoKG0pICE9IE5VTEwpCQkJCVwKKwkJCQlNQ0xHRVQoKG0pLCAodykpOwkJCVwKKwkJfQkgCQkJ CQkJXAogCX0gd2hpbGUgKDApCisjZWxzZQorI2RlZmluZQlORlNNQ0xHRVQobSwgdykJZG8gewkg 
CQkJCQlcCisJCShtKSA9IG1fZ2V0amNsKE1fV0FJVE9LLCBNVF9EQVRBLCAwLCBNQ0xCWVRFUyk7 CQlcCisJCXdoaWxlICgobSkgPT0gTlVMTCkgewkgCQkJCVwKKwkJCSh2b2lkKW5mc19jYXRuYXAo UFpFUk8sIDAsICJuZnNtZ2V0Iik7CQlcCisJCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEp OwkgCQlcCisJCQlpZiAoKG0pICE9IE5VTEwpCQkJCVwKKwkJCQlNQ0xHRVQoKG0pLCAodykpOwkJ CVwKKwkJfQkgCQkJCQkJXAorCX0gd2hpbGUgKDApCisjZW5kaWYKICNkZWZpbmUJTkZTTUNMR0VU SERSKG0sIHcpIGRvIHsgCQkJCVwKIAkJTUdFVEhEUigobSksIE1fV0FJVE9LLCBNVF9EQVRBKTsJ CVwKIAkJd2hpbGUgKChtKSA9PSBOVUxMICkgeyAJCQkJXAotLS0gZnMvbmZzc2VydmVyL25mc19u ZnNkcG9ydC5jLnNhdjIJMjAxNC0wMS0yNiAxODo1NDoyOS4wMDAwMDAwMDAgLTA1MDAKKysrIGZz L25mc3NlcnZlci9uZnNfbmZzZHBvcnQuYwkyMDE0LTAxLTI2IDE4OjU2OjA4LjAwMDAwMDAwMCAt MDUwMApAQCAtNTY2LDggKzU2Niw3IEBAIG5mc3Zub19yZWFkbGluayhzdHJ1Y3Qgdm5vZGUgKnZw LCBzdHJ1Y3QKIAlsZW4gPSAwOwogCWkgPSAwOwogCXdoaWxlIChsZW4gPCBORlNfTUFYUEFUSExF TikgewotCQlORlNNR0VUKG1wKTsKLQkJTUNMR0VUKG1wLCBNX1dBSVRPSyk7CisJCU5GU01DTEdF VChtcCwgTV9XQUlUT0spOwogCQltcC0+bV9sZW4gPSBORlNNU0laKG1wKTsKIAkJaWYgKGxlbiA9 PSAwKSB7CiAJCQltcDMgPSBtcDIgPSBtcDsKQEAgLTYzNiw4ICs2MzUsNyBAQCBuZnN2bm9fcmVh ZChzdHJ1Y3Qgdm5vZGUgKnZwLCBvZmZfdCBvZmYsCiAJICovCiAJaSA9IDA7CiAJd2hpbGUgKGxl ZnQgPiAwKSB7Ci0JCU5GU01HRVQobSk7Ci0JCU1DTEdFVChtLCBNX1dBSVRPSyk7CisJCU5GU01D TEdFVChtLCBNX1dBSVRPSyk7CiAJCW0tPm1fbGVuID0gMDsKIAkJc2l6ID0gbWluKE1fVFJBSUxJ TkdTUEFDRShtKSwgbGVmdCk7CiAJCWxlZnQgLT0gc2l6Owo= ------=_Part_16590856_824730477.1390789014322-- From owner-freebsd-net@FreeBSD.ORG Mon Jan 27 03:23:50 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 05458AF5 for ; Mon, 27 Jan 2014 03:23:50 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D27B4187B for ; Mon, 27 
Jan 2014 03:23:49 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s0R3NcFh047099 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 26 Jan 2014 19:23:39 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s0R3Nct6047098; Sun, 26 Jan 2014 19:23:38 -0800 (PST) (envelope-from jmg) Date: Sun, 26 Jan 2014 19:23:38 -0800 From: John-Mark Gurney To: Rick Macklem Subject: Re: Terrible NFS performance under 9.2-RELEASE? Message-ID: <20140127032338.GP13704@funkthat.com> Mail-Followup-To: Rick Macklem , Adam McDougall , freebsd-net@freebsd.org References: <52DC1241.7010004@egr.msu.edu> <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Sun, 26 Jan 2014 19:23:39 -0800 (PST) Cc: freebsd-net@freebsd.org, Adam McDougall X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 03:23:50 -0000 Rick Macklem wrote this message on Sun, Jan 26, 2014 at 21:16 -0500: > Btw, thanks go to Garrett Wollman for suggesting the change to MJUMPAGESIZE > clusters. 
> > rick > ps: If the attachment doesn't make it through and you want the patch, just > email me and I'll send you a copy. The patch looks good, but we probably shouldn't change _readlink.. The chances of a link being >2k are pretty slim, and the chances of the link being >32k are even smaller... In fact, we might want to switch _readlink to MGET (could be conditional upon cnt) so that if it fits in an mbuf we don't allocate a cluster for it... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-net@FreeBSD.ORG Mon Jan 27 05:50:57 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A949CD66 for ; Mon, 27 Jan 2014 05:50:57 +0000 (UTC) Received: from mail-pb0-x229.google.com (mail-pb0-x229.google.com [IPv6:2607:f8b0:400e:c01::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7B5211298 for ; Mon, 27 Jan 2014 05:50:57 +0000 (UTC) Received: by mail-pb0-f41.google.com with SMTP id up15so5468494pbc.14 for ; Sun, 26 Jan 2014 21:50:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:date:to:cc:subject:message-id:reply-to:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=Bw/K5LPfCjrQMxv+KYJgyeb0epoQ04Nt6bAuB1SjFps=; b=MyYnhd1dPuBQvMdoU0cdOv2MJ1qWTOlbQ/j8eqjueHtdo6MPKLsX2hFYrzYq2e+4wD j6JGLdLd1MklYq8wbPUj79E5aYtGuCp2zOhE/vkOY67yATVT5EtdTMISLQchaExM0CFh ozcB+LQKCyxIDPsDAsEyX6QJY921hLzzA/8HNP6qARBea4IBgtQoCEFjMk+LSjWHX0A3 8dSs/x0yok+e62MUzRYoMCO2AuUP+HnofatjTqC7YQkwZzbBrt0A/dRD+6cyQ68hIiz9 klOvj1Es1OPu2rkxYL9Gk55PX1qCRO2TzHMNKXchgKccqqoOW7Dog949ednxcVnLy0dx 8nDQ== X-Received: by 10.68.198.97 with SMTP id 
jb1mr28355539pbc.104.1390801856275; Sun, 26 Jan 2014 21:50:56 -0800 (PST) Received: from pyunyh@gmail.com (lpe4.p59-icn.cdngp.net. [114.111.62.249]) by mx.google.com with ESMTPSA id e6sm28142111pbg.4.2014.01.26.21.50.52 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Sun, 26 Jan 2014 21:50:55 -0800 (PST) Received: by pyunyh@gmail.com (sSMTP sendmail emulation); Mon, 27 Jan 2014 14:50:47 +0900 From: Yonghyeon PYUN Date: Mon, 27 Jan 2014 14:50:47 +0900 To: Rick Macklem Subject: Re: Terrible NFS performance under 9.2-RELEASE? Message-ID: <20140127055047.GA1368@michelle.cdnetworks.com> References: <52DC1241.7010004@egr.msu.edu> <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.4.2.3i Cc: freebsd-net@freebsd.org, Adam McDougall X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list Reply-To: pyunyh@gmail.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 05:50:57 -0000 On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote: > Adam McDougall wrote: > > Also try rsize=32768,wsize=32768 in your mount options, made a huge > > difference for me. I've noticed slow file transfers on NFS in 9 and > > finally did some searching a couple months ago, someone suggested it > > and > > they were on to something. > > > I have a "hunch" that might explain why 64K NFS reads/writes perform > poorly for some network environments. > A 64K NFS read reply/write request consists of a list of 34 mbufs when > passed to TCP via sosend() and a total data length of around 65680bytes. > Looking at a couple of drivers (virtio and ixgbe), they seem to expect > no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. 
I think > (I don't have anything that does TSO to confirm this) that NFS will pass > a list that is longer (34 plus a TCP/IP header). > At a glance, it appears that the drivers call m_defrag() or m_collapse() > when the mbuf list won't fit in their scatter table (32 or 33 elements) > and if this fails, just silently drop the data without sending it. > If I'm right, there would considerable overhead from m_defrag()/m_collapse() > and near disaster if they fail to fix the problem and the data is silently > dropped instead of xmited. > I think the actual number of DMA segments allocated for the mbuf chain is determined by bus_dma(9). bus_dma(9) will coalesce current segment with previous segment if possible. I'm not sure whether you're referring to ixgbe(4) or ix(4) but I see the total length of all segment size of ix(4) is 65535 so it has no room for ethernet/VLAN header of the mbuf chain. The driver should be fixed to transmit a 64KB datagram. I think the use of m_defrag(9) in TSO is suboptimal. All TSO capable controllers are able to handle multiple TX buffers so it should have used m_collapse(9) rather than copying entire chain with m_defrag(9). > Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE clusters, > so the mbuf count drops from 34 to 18. > Could we make it conditional on size? > If anyone has a TSO scatter/gather enabled net interface and can test this > patch on it with NFS I/O (default of 64K rsize/wsize) when TSO is enabled > and see what effect it has, that would be appreciated. > > Btw, thanks go to Garrett Wollman for suggesting the change to MJUMPAGESIZE > clusters. > > rick > ps: If the attachment doesn't make it through and you want the patch, just > email me and I'll send you a copy. 
> From owner-freebsd-net@FreeBSD.ORG Mon Jan 27 10:21:36 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0705A872 for ; Mon, 27 Jan 2014 10:21:36 +0000 (UTC) Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [119.145.14.66]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9EEA216C2 for ; Mon, 27 Jan 2014 10:21:34 +0000 (UTC) Received: from 172.24.2.119 (EHLO szxeml209-edg.china.huawei.com) ([172.24.2.119]) by szxrg03-dlp.huawei.com (MOS 4.4.3-GA FastPath queued) with ESMTP id AJX01520; Mon, 27 Jan 2014 18:15:00 +0800 (CST) Received: from SZXEML405-HUB.china.huawei.com (10.82.67.60) by szxeml209-edg.china.huawei.com (172.24.2.184) with Microsoft SMTP Server (TLS) id 14.3.158.1; Mon, 27 Jan 2014 18:14:56 +0800 Received: from [127.0.0.1] (10.177.18.75) by szxeml405-hub.china.huawei.com (10.82.67.60) with Microsoft SMTP Server id 14.3.158.1; Mon, 27 Jan 2014 18:14:51 +0800 Message-ID: <52E6319A.8070601@huawei.com> Date: Mon, 27 Jan 2014 18:14:50 +0800 From: Wang Weidong User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.0.1 MIME-Version: 1.0 To: Vincenzo Maffione Subject: Re: netmap: I got some troubles with netmap References: <52D74E15.1040909@huawei.com> <92C7725B-B30A-4A19-925A-A93A2489A525@iet.unipi.it> <52D8A5E1.9020408@huawei.com> <52DD1914.7090506@iet.unipi.it> <52E1E272.8060009@huawei.com> <52E4BC38.7040407@huawei.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.177.18.75] X-CFilter-Loop: Reflected Cc: =?ISO-8859-1?Q?facolt=E0?= , Giuseppe Lettieri , Luigi Rizzo , net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and 
TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 10:21:36 -0000 On 2014/1/27 17:37, Vincenzo Maffione wrote: > > > > 2014/1/26 Wang Weidong > > > On 2014/1/24 22:56, Vincenzo Maffione wrote: > > > > > > > > 2014/1/24 Wang Weidong >> > > > > On 2014/1/20 20:39, Giuseppe Lettieri wrote: > > > Hi Wang, > > > > > > OK, you are using the netmap support in the upstream qemu git. That does not yet include all our modifications, some of which are very important for high throughput with VALE. In particular, the upstream qemu does not include the batching improvements in the frontend/backend interface, and it does not include the "map ring" optimization of the e1000 frontend. Please find attached a gzipped patch that contains all of our qemu code. The patch is against the latest upstream master (commit 1cf892ca). > > > > > > Please ./configure the patched qemu with the following option, in addition to any other option you may need: > > > > > > --enable-e1000-paravirt --enable-netmap \ > > > --extra-cflags=-I/path/to/netmap/sys/directory > > > > > > Note that --enable-e1000-paravirt is needed to enable the "map ring" optimization in the e1000 frontend, even if you are not going to use the e1000-paravirt device. > > > > > > Now you should be able to rerun your tests. I am also attaching a README file that describes some more tests you may want to run. > > > > > > > Hello, > > > > > > Yes, I patch the qemu-netmap-bc767e701.patch to the qemu, download the 20131019-tinycore-netmap.hdd. > > And I do some test that: > > > > 1. I use the bridge below: > > qemu-system-x86_64 -m 2048 -boot c -net nic -net bridge,br=br1 -hda /home/wwd/tinycores/20131019-tinycore-netmap.hdd -enable-kvm -vnc :0 > > test between two vms. > > br1 without device. > > Use pktgen, I got the 237.95 kpps. > > Use the netserver/netperf I got the speed 1037M bits/sec with TCP_STREAM. The max speed is up to 1621M. 
> > Use the netserver/netperf I got the speed 3296/s with TCP_RR > > Use the netserver/netperf I got the speed 234M/86M bits/sec with UDP_STREAM > > > > When I add a device from host to the br1, the speed is 159.86 kpps. > > Use the netserver/netperf I got the speed 720M bits/sec with TCP_STREAM. The max speed is up to 1000M. > > Use the netserver/netperf I got the speed 3556/s with TCP_RR > > Use the netserver/netperf I got the speed 181M/181M bits/sec with UDP_STREAM > > > > What do you think of these data? > > > > > > You are using the old/deprecated QEMU command line syntax (-net), and therefore honestly It's not clear to me what kind of network configuration you are running. > > > > Please use our scripts "launch-qemu.sh", "prep-taps.sh", according to what described in the README.images file (attached). > > Alternatively, use the syntax like in the following examples > > > > (#1) qemu-system-x86_64 archdisk.qcow -enable-kvm -device virtio-net-pci,netdev=mynet -netdev tap,ifname=tap01,id=mynet,script=no,downscript=no -smp 2 > > (#2) qemu-system-x86_64 archdisk.qcow -enable-kvm -device e1000,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 > > > Here I use the 20131019-tinycore-netmap.hdd (download from the http://info.iet.unipi.it/~luigi/netmap/) instead archdisk.qcow with #2. > And I can't do the "cpufreq-set -g performance # on linux" as the README.image. > > > This is our fault, thanks for reporting (probably the tinycore kernel doesn't include the cpufreq governors). However, if you use a different linux O.S. as host machine you should be able to cpufreq-set -gperformance on the host machine, while keep using tinycore into the vms. > > Although, I use the pkt-gen test the vms, I got the tx speed is ~3Mpps while the other vm's rx speed is only 1.44Mpps. > > Is it right? I can't get ~4Mpps is the reason that I can't set the "CPU power saving". 
> > > It can be OK, depending on your machine and maybe on the CPU power saving. On my machine > - Processor: Intel i7-3770K CPU @ 3.50GHz (8 cores) > - Memory @ 1333 MHz > - Host O.S.: Archlinux with Linux 3.12 > I get: > > - 3.9 Mpps TX, 3.5 Mpps RX when guests are given 1 vCPU each > - 4.5 Mpps TX, 3.2 Mpps RX when guests are given 2 vCPUs each. > > > > > > so that it's clear to us what network frontend (e.g. emulated NIC) and network backend (e.g. netmap, tap, vde, etc.) you are using. > > In example #1 we are using virtio-net as frontend and tap as backend, while in example #2 we are using e1000 as frontend and netmap as backend. > > Also consider giving more than one core (e.g. -smp 2) to each guest, to mitigate receiver livelock problems. > > > > > > > > 2. I use the vale switch as below: > > qemu-system-x86_64 -m 2048 -boot c -net nic -net netmap,vale0:0 -hda /home/wwd/tinycores/20131019-tinycore-netmap.hdd -enable-kvm -vnc :0 > > > > Same here, it's not clear what you are using. I guess each guest has an e1000 device and is connected to a different port of the same vale switch (e.g. vale0:0 and vale0:1)? > > > > Test with 2 VMs from the same host, > > vale0 without a device attached. > > Using pkt-gen, the speed is 938 Kpps. > > > > > > You should get ~4 Mpps with e1000 frontend + netmap backend on a reasonably good machine. Make sure you have ./configure'd QEMU with --enable-e1000-paravirt. > > > > > > I use netperf -H 10.0.0.2 -t UDP_STREAM and I got a speed of 195M/195M; then, adding -- -m 8, I only got 1.07M/1.07M. > > When using a smaller message size, will the speed be smaller? > > > > > > If you use e1000 with netperf (without pkt-gen) your performance is doomed to be horrible. Use e1000-paravirt (as a frontend) instead if you are interested in netperf experiments.
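The Kpps/Mpps figures in this exchange come from netmap's pkt-gen tool run inside the guests. As a reminder of the two roles, here is a small sketch; the guest interface name (eth0) is an assumption, and newer netmap trees spell the interface as "netmap:eth0":

```shell
#!/bin/sh
# Sketch: print the sender/receiver pkt-gen invocations used for the
# packet-rate tests in this thread. The interface name is an assumption.
pktgen_cmds() {
    ifname="$1"
    echo "receiver guest: pkt-gen -i $ifname -f rx"
    echo "sender guest:   pkt-gen -i $ifname -f tx"
}

pktgen_cmds eth0
```

pkt-gen prints a per-second rate while running and an average at the end, which is where numbers like "938 Kpps" come from.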
> > Also consider that the point of using the "-- -m 8" option is experimenting with high packet rates, so what you should measure here is not the throughput in Mbps, but the packet rate: netperf reports the number of packets sent and received, so you can obtain the packet rate by dividing by the running time. > > The throughput in Mbps is uninteresting; if you want high bulk throughput you just don't use "-- -m 8", but leave the defaults. > > Using virtio-net in this case will help because of the TSO offloading. > > > Here, I am a little interested in netperf. So I did this: > qemu-system-x86_64 20131019-tinycore-netmap.hdd -enable-kvm -device *e1000-paravirt*,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -m 2048 -vnc :0 > qemu-system-x86_64 20131019-tinycore-netmap.hdd -enable-kvm -device *e1000-paravirt*,mitigation=off,mac=00:AA:BB:CC:DD:02,netdev=mynet -netdev netmap,ifname=vale0:02,id=mynet -smp 2 -m 2048 -vnc :1 > I think this command is wrong, because after I set IPv4 addresses on the devices, I find the two VMs can't communicate with each other. > > > This is the command that I use on my host machine (it can be generated using the script launch-qemu.sh): > > qemu-system-x86_64 20131019-tinycore-netmap.hdd -enable-kvm -device e1000-paravirt,mitigation=off,ioeventfd=on,v1000=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -m 3G -vga std > > and it works for me. It may be that your problem is due to the fact that you are using a netmap version that mismatches the QEMU version. > Please use the netmap version I attached. > > Also try "virtio-net-pci" as frontend, like in the following: > > qemu-system-x86_64 20131019-tinycore-netmap.hdd -enable-kvm -device virtio-net-pci,mrg_rxbuf=on,ioeventfd=on,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -m 3G -vga std > > Consider that: > - e1000-paravirt is optimized for high packet rate (e.g.
UDP_STREAM -- -m8) > - virtio-net-pci is optimized for TCP_STREAM and TCP_RR. > and you should be able to deduce this from the numbers you measure. > > > Cheers, > Vincenzo > > Nice, I will test again. Regards, Wang > > > Thanks, > Wang > > > cheers > > Vincenzo > > > > > > > > With vale-ctl -a vale0:eth2, > > using pkt-gen, the speed is 928 Kpps. > > I use netperf -H 10.0.0.2 -t UDP_STREAM and I got a speed of 209M/208M; then, adding -- -m 8, I only got 1.06M/1.06M. > > > > With vale-ctl -h vale0:eth2, > > using pkt-gen, the speed is 928 Kpps. > > I use netperf -H 10.0.0.2 -t UDP_STREAM and I got a speed of 192M/192M; then, adding -- -m 8, I only got 1.06M/1.06M. > > > > Test with 2 VMs from two hosts: > > I can only test it with vale-ctl -h vale0:eth2 and setting eth2 to promiscuous mode. > > Using pkt-gen with the default parameters, the speed is about 750 Kpps. > > Using netperf -H 10.0.0.2 -t UDP_STREAM, I got a speed of 160M/160M. > > Is this right? > > > > > > 3. I can't use the l2 utils. > > When I run "sudo l2open -t eth0 l2recv" (or l2send), I get "l2open ioctl(TUNSETIFF...): Invalid argument", > > and with "l2open -r eth0 l2recv", after waiting a moment (only several seconds), I got the result: > > TEST-RESULT: 0.901 kpps 1pkts > > select/read=100.00 err=0 > > > > And I can't find the l2 utils on the net. Are they implemented by your team? > > > > All of them were tested on VMs. > > > > Cheers. > > Wang > > > > > > > > > > Cheers, > > > Giuseppe > > > > > > Il 17/01/2014 04:39, Wang Weidong ha scritto: > > >> On 2014/1/16 18:24, facoltà wrote: > > [...]
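Following the advice above to read a "-- -m 8" UDP_STREAM run as a packet rate rather than as Mbit/s, here is a minimal sketch of the conversion (both input values are placeholders, not figures from this thread):

```shell
#!/bin/sh
# Sketch: turn the message count netperf reports for a UDP_STREAM test
# into a packet rate. Placeholder values, not measured data.
msgs_ok=12000000   # "Messages Okay" count from the netperf output
duration=10        # test length in seconds (netperf's default)

kpps=$(( msgs_ok / duration / 1000 ))
echo "${kpps} kpps"
```

With 8-byte messages the Mbit/s figure stays tiny even at a high packet rate, which is why the raw throughput numbers with -m 8 look so poor.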
> > >> > > >> > > > > > > > > > > > > > > > > > > -- > > Vincenzo Maffione > > > > > > -- > Vincenzo Maffione From owner-freebsd-net@FreeBSD.ORG Mon Jan 27 11:06:51 2014 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2761BC86 for ; Mon, 27 Jan 2014 11:06:51 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1171F1A8C for ; Mon, 27 Jan 2014 11:06:51 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0RB6oqh013078 for ; Mon, 27 Jan 2014 11:06:50 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0RB6oEf013075 for freebsd-net@FreeBSD.org; Mon, 27 Jan 2014 11:06:50 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 27 Jan 2014 11:06:50 GMT Message-Id: <201401271106.s0RB6oEf013075@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-net@FreeBSD.org Subject: Current problem reports assigned to freebsd-net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 11:06:51 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. 
These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/185496 net [re] RTL8169 doesn't receive unicast ethernet packets o kern/185427 net [igb] freebsd 8.4, 9.1 and 9.2 panic Double-Fault with o kern/185023 net [tun] Closing tun interface deconfigures IP address o kern/185022 net [tun] ls /dev/tun creates tun interface o kern/184311 net [bge] [panic] kernel panic with bge(4) on SunFire X210 o kern/184084 net [ral] kernel crash by ral (RT3090) o bin/183687 net [patch] route(8): route add -net 172.20 add wrong host o kern/183659 net [tcp] ]TCP stack lock contention with short-lived conn o conf/183407 net [rc.d] [patch] Routing restart returns non-zero exitco o kern/183391 net [oce] 10gigabit networking problems with Emulex OCE 11 o kern/183390 net [ixgbe] 10gigabit networking problems o kern/182917 net [igb] strange out traffic with igb interfaces o kern/182847 net [netinet6] [patch] Remove dead code o kern/182665 net [wlan] Kernel panic when creating second wlandev. 
o kern/182382 net [tcp] sysctl to set TCP CC method on BIG ENDIAN system o kern/182297 net [cm] ArcNet driver fails to detect the link address - o kern/182212 net [patch] [ng_mppc] ng_mppc(4) blocks on network errors o kern/181970 net [re] LAN Realtek® 8111G is not supported by re driver o kern/181931 net [vlan] [lagg] vlan over lagg over mlxen crashes the ke o kern/181823 net [ip6] [patch] make ipv6 mroute return same errror code o kern/181741 net [kernel] [patch] Packet loss when 'control' messages a o kern/181703 net [re] [patch] Fix Realtek 8111G Ethernet controller not o kern/181657 net [bpf] [patch] BPF_COP/BPF_COPX instruction reservation o kern/181257 net [bge] bge link status change o kern/181236 net [igb] igb driver unstable work o kern/181135 net [netmap] [patch] sys/dev/netmap patch for Linux compat o kern/181131 net [netmap] [patch] sys/dev/netmap memory allocation impr o kern/181006 net [run] [patch] mbuf leak in run(4) driver o kern/180893 net [if_ethersubr] [patch] Packets received with own LLADD o kern/180844 net [panic] [re] Intermittent panic (re driver?) 
o kern/180775 net [bxe] if_bxe driver broken with Broadcom BCM57711 card o kern/180722 net [bluetooth] bluetooth takes 30-50 attempts to pair to s kern/180468 net [request] LOCAL_PEERCRED support for PF_INET o kern/180065 net [netinet6] [patch] Multicast loopback to own host brok o kern/179926 net [lacp] [patch] active aggregator selection bug o kern/179824 net [ixgbe] System (9.1-p4) hangs on heavy ixgbe network t o kern/179733 net [lagg] [patch] interface loses capabilities when proto o kern/179429 net [tap] STP enabled tap bridge o kern/179299 net [igb] Intel X540-T2 - unstable driver a kern/179264 net [vimage] [pf] Core dump with Packet filter and VIMAGE o kern/178947 net [arp] arp rejecting not working o kern/178782 net [ixgbe] 82599EB SFP does not work with passthrough und o kern/178612 net [run] kernel panic due the problems with run driver o kern/178472 net [ip6] [patch] make return code consistent with IPv4 co o kern/178079 net [tcp] Switching TCP CC algorithm panics on sparc64 wit s kern/178071 net FreeBSD unable to recongize Kontron (Industrial Comput o kern/177905 net [xl] [panic] ifmedia_set when pluging CardBus LAN card o kern/177618 net [bridge] Problem with bridge firewall with trunk ports o kern/177402 net [igb] [pf] problem with ethernet driver igb + pf / alt o kern/177400 net [jme] JMC25x 1000baseT establishment issues o kern/177366 net [ieee80211] negative malloc(9) statistics for 80211nod f kern/177362 net [netinet] [patch] Wrong control used to return TOS o kern/177194 net [netgraph] Unnamed netgraph nodes for vlan interfaces o kern/177184 net [bge] [patch] enable wake on lan o kern/177139 net [igb] igb drops ethernet ports 2 and 3 o kern/176884 net [re] re0 flapping up/down o kern/176671 net [epair] MAC address for epair device not unique o kern/176484 net [ipsec] [enc] [patch] panic: IPsec + enc(4); device na o kern/176446 net [netinet] [patch] Concurrency in ixgbe driving out-of- o kern/176420 net [kernel] [patch] incorrect errno for 
LOCAL_PEERCRED o kern/176419 net [kernel] [patch] socketpair support for LOCAL_PEERCRED o kern/176401 net [netgraph] page fault in netgraph o kern/176167 net [ipsec][lagg] using lagg and ipsec causes immediate pa o kern/176027 net [em] [patch] flow control systcl consistency for em dr o kern/176026 net [tcp] [patch] TCP wrappers caused quite a lot of warni o kern/175864 net [re] Intel MB D510MO, onboard ethernet not working aft o kern/175852 net [amd64] [patch] in_cksum_hdr() behaves differently on o kern/175734 net no ethernet detected on system with EG20T PCH chipset o kern/175267 net [pf] [tap] pf + tap keep state problem o kern/175236 net [epair] [gif] epair and gif Devices On Bridge o kern/175182 net [panic] kernel panic on RADIX_MPATH when deleting rout o kern/175153 net [tcp] will there miss a FIN when do TSO? o kern/174959 net [net] [patch] rnh_walktree_from visits spurious nodes o kern/174958 net [net] [patch] rnh_walktree_from makes unreasonable ass o kern/174897 net [route] Interface routes are broken o kern/174851 net [bxe] [patch] UDP checksum offload is wrong in bxe dri o kern/174850 net [bxe] [patch] bxe driver does not receive multicasts o kern/174849 net [bxe] [patch] bxe driver can hang kernel when reset o kern/174822 net [tcp] Page fault in tcp_discardcb under high traffic o kern/174602 net [gif] [ipsec] traceroute issue on gif tunnel with ipse o kern/174535 net [tcp] TCP fast retransmit feature works strange o kern/173871 net [gif] process of 'ifconfig gif0 create hangs' when if_ o kern/173475 net [tun] tun(4) stays opened by PID after process is term o kern/173201 net [ixgbe] [patch] Missing / broken ixgbe sysctl's and tu o kern/173137 net [em] em(4) unable to run at gigabit with 9.1-RC2 o kern/173002 net [patch] data type size problem in if_spppsubr.c o kern/172895 net [ixgb] [ixgbe] do not properly determine link-state o kern/172683 net [ip6] Duplicate IPv6 Link Local Addresses o kern/172675 net [netinet] [patch] sysctl_tcp_hc_list 
(net.inet.tcp.hos p kern/172113 net [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4 o kern/171840 net [ip6] IPv6 packets transmitting only on queue 0 o kern/171739 net [bce] [panic] bce related kernel panic o kern/171711 net [dummynet] [panic] Kernel panic in dummynet o kern/171532 net [ndis] ndis(4) driver includes 'pccard'-specific code, o kern/171531 net [ndis] undocumented dependency for ndis(4) o kern/171524 net [ipmi] ipmi driver crashes kernel by reboot or shutdow s kern/171508 net [epair] [request] Add the ability to name epair device o kern/171228 net [re] [patch] if_re - eeprom write issues o kern/170701 net [ppp] killl ppp or reboot with active ppp connection c o kern/170267 net [ixgbe] IXGBE_LE32_TO_CPUS is probably an unintentiona o kern/170081 net [fxp] pf/nat/jails not working if checksum offloading o kern/169898 net ifconfig(8) fails to set MTU on multiple interfaces. o kern/169676 net [bge] [hang] system hangs, fully or partially after re o kern/169620 net [ng] [pf] ng_l2tp incoming packet bypass pf firewall o kern/169459 net [ppp] umodem/ppp/3g stopped working after update from o kern/169438 net [ipsec] ipv4-in-ipv6 tunnel mode IPsec does not work p kern/168294 net [ixgbe] [patch] ixgbe driver compiled in kernel has no o kern/168246 net [em] Multiple em(4) not working with qemu o kern/168245 net [arp] [regression] Permanent ARP entry not deleted on o kern/168244 net [arp] [regression] Unable to manually remove permanent o kern/168183 net [bce] bce driver hang system o kern/167603 net [ip] IP fragment reassembly's broken: file transfer ov o kern/167500 net [em] [panic] Kernel panics in em driver o kern/167325 net [netinet] [patch] sosend sometimes return EINVAL with o kern/167202 net [igmp]: Sending multiple IGMP packets crashes kernel o kern/166462 net [gre] gre(4) when using a tunnel source address from c o kern/166285 net [arp] FreeBSD v8.1 REL p8 arp: unknown hardware addres o kern/166255 net [net] [patch] It should be possible to 
disable "promis p kern/165903 net mbuf leak o kern/165622 net [ndis][panic][patch] Unregistered use of FPU in kernel s kern/165562 net [request] add support for Intel i350 in FreeBSD 7.4 o kern/165526 net [bxe] UDP packets checksum calculation whithin if_bxe o kern/165488 net [ppp] [panic] Fatal trap 12 jails and ppp , kernel wit o kern/165305 net [ip6] [request] Feature parity between IP_TOS and IPV6 o kern/165296 net [vlan] [patch] Fix EVL_APPLY_VLID, update EVL_APPLY_PR o kern/165181 net [igb] igb freezes after about 2 weeks of uptime o kern/165174 net [patch] [tap] allow tap(4) to keep its address on clos o kern/165152 net [ip6] Does not work through the issue of ipv6 addresse o kern/164495 net [igb] connect double head igb to switch cause system t o kern/164490 net [pfil] Incorrect IP checksum on pfil pass from ip_outp o kern/164475 net [gre] gre misses RUNNING flag after a reboot o kern/164265 net [netinet] [patch] tcp_lro_rx computes wrong checksum i o kern/163903 net [igb] "igb0:tx(0)","bpf interface lock" v2.2.5 9-STABL o kern/163481 net freebsd do not add itself to ping route packet o kern/162927 net [tun] Modem-PPP error ppp[1538]: tun0: Phase: Clearing o kern/162558 net [dummynet] [panic] seldom dummynet panics o kern/162153 net [em] intel em driver 7.2.4 don't compile o kern/162110 net [igb] [panic] RELENG_9 panics on boot in IGB driver - o kern/162028 net [ixgbe] [patch] misplaced #endif in ixgbe.c o kern/161277 net [em] [patch] BMC cannot receive IPMI traffic after loa o kern/160873 net [igb] igb(4) from HEAD fails to build on 7-STABLE o kern/160750 net Intel PRO/1000 connection breaks under load until rebo o kern/160693 net [gif] [em] Multicast packet are not passed from GIF0 t o kern/160293 net [ieee80211] ppanic] kernel panic during network setup o kern/160206 net [gif] gifX stops working after a while (IPv6 tunnel) o kern/159817 net [udp] write UDPv4: No buffer space available (code=55) o kern/159629 net [ipsec] [panic] kernel panic with IPsec in 
transport m o kern/159621 net [tcp] [panic] panic: soabort: so_count o kern/159603 net [netinet] [patch] in_ifscrubprefix() - network route c o kern/159601 net [netinet] [patch] in_scrubprefix() - loopback route re o kern/159294 net [em] em watchdog timeouts o kern/159203 net [wpi] Intel 3945ABG Wireless LAN not support IBSS o kern/158930 net [bpf] BPF element leak in ifp->bpf_if->bif_dlist o kern/158726 net [ip6] [patch] ICMPv6 Router Announcement flooding limi o kern/158694 net [ix] [lagg] ix0 is not working within lagg(4) o kern/158665 net [ip6] [panic] kernel pagefault in in6_setscope() o kern/158635 net [em] TSO breaks BPF packet captures with em driver f kern/157802 net [dummynet] [panic] kernel panic in dummynet o kern/157785 net amd64 + jail + ipfw + natd = very slow outbound traffi o kern/157418 net [em] em driver lockup during boot on Supermicro X9SCM- o kern/157410 net [ip6] IPv6 Router Advertisements Cause Excessive CPU U o kern/157287 net [re] [panic] INVARIANTS panic (Memory modified after f o kern/157200 net [network.subr] [patch] stf(4) can not communicate betw o kern/157182 net [lagg] lagg interface not working together with epair o kern/156877 net [dummynet] [panic] dummynet move_pkt() null ptr derefe o kern/156667 net [em] em0 fails to init on CURRENT after March 17 o kern/156408 net [vlan] Routing failure when using VLANs vs. 
Physical e o kern/156328 net [icmp]: host can ping other subnet but no have IP from o kern/156317 net [ip6] Wrong order of IPv6 NS DAD/MLD Report o kern/156279 net [if_bridge][divert][ipfw] unable to correctly re-injec o kern/156226 net [lagg]: failover does not announce the failover to swi o kern/156030 net [ip6] [panic] Crash in nd6_dad_start() due to null ptr o kern/155680 net [multicast] problems with multicast s kern/155642 net [new driver] [request] Add driver for Realtek RTL8191S o kern/155597 net [panic] Kernel panics with "sbdrop" message o kern/155420 net [vlan] adding vlan break existent vlan o kern/155177 net [route] [panic] Panic when inject routes in kernel o kern/155010 net [msk] ntfs-3g via iscsi using msk driver cause kernel o kern/154943 net [gif] ifconfig gifX create on existing gifX clears IP s kern/154851 net [new driver] [request]: Port brcm80211 driver from Lin o kern/154850 net [netgraph] [patch] ng_ether fails to name nodes when t o kern/154679 net [em] Fatal trap 12: "em1 taskq" only at startup (8.1-R o kern/154600 net [tcp] [panic] Random kernel panics on tcp_output o kern/154557 net [tcp] Freeze tcp-session of the clients, if in the gat o kern/154443 net [if_bridge] Kernel module bridgestp.ko missing after u o kern/154286 net [netgraph] [panic] 8.2-PRERELEASE panic in netgraph o kern/154255 net [nfs] NFS not responding o kern/154214 net [stf] [panic] Panic when creating stf interface o kern/154185 net race condition in mb_dupcl p kern/154169 net [multicast] [ip6] Node Information Query multicast add o kern/154134 net [ip6] stuck kernel state in LISTEN on ipv6 daemon whic o kern/154091 net [netgraph] [panic] netgraph, unaligned mbuf? 
o conf/154062 net [vlan] [patch] change to way of auto-generatation of v o kern/153937 net [ral] ralink panics the system (amd64 freeBSDD 8.X) wh o kern/153936 net [ixgbe] [patch] MPRC workaround incorrectly applied to o kern/153816 net [ixgbe] ixgbe doesn't work properly with the Intel 10g o kern/153772 net [ixgbe] [patch] sysctls reference wrong XON/XOFF varia o kern/153497 net [netgraph] netgraph panic due to race conditions o kern/153454 net [patch] [wlan] [urtw] Support ad-hoc and hostap modes o kern/153308 net [em] em interface use 100% cpu o kern/153244 net [em] em(4) fails to send UDP to port 0xffff o kern/152893 net [netgraph] [panic] 8.2-PRERELEASE panic in netgraph o kern/152853 net [em] tftpd (and likely other udp traffic) fails over e o kern/152828 net [em] poor performance on 8.1, 8.2-PRE o kern/152569 net [net]: Multiple ppp connections and routing table prob o kern/152235 net [arp] Permanent local ARP entries are not properly upd o kern/152141 net [vlan] [patch] encapsulate vlan in ng_ether before out o kern/152036 net [libc] getifaddrs(3) returns truncated sockaddrs for n o kern/151690 net [ep] network connectivity won't work until dhclient is o kern/151681 net [nfs] NFS mount via IPv6 leads to hang on client with o kern/151593 net [igb] [panic] Kernel panic when bringing up igb networ o kern/150920 net [ixgbe][igb] Panic when packets are dropped with heade o kern/150557 net [igb] igb0: Watchdog timeout -- resetting o kern/150251 net [patch] [ixgbe] Late cable insertion broken o kern/150249 net [ixgbe] Media type detection broken o bin/150224 net ppp(8) does not reassign static IP after kill -KILL co f kern/149969 net [wlan] [ral] ralink rt2661 fails to maintain connectio o kern/149643 net [rum] device not sending proper beacon frames in ap mo o kern/149609 net [panic] reboot after adding second default route o kern/149117 net [inet] [patch] in_pcbbind: redundant test o kern/149086 net [multicast] Generic multicast join failure in 8.1 o kern/148018 
net [flowtable] flowtable crashes on ia64 o kern/147912 net [boot] FreeBSD 8 Beta won't boot on Thinkpad i1300 11 o kern/147894 net [ipsec] IPv6-in-IPv4 does not work inside an ESP-only o kern/147155 net [ip6] setfb not work with ipv6 o kern/146845 net [libc] close(2) returns error 54 (connection reset by f kern/146792 net [flowtable] flowcleaner 100% cpu's core load o kern/146719 net [pf] [panic] PF or dumynet kernel panic o kern/146534 net [icmp6] wrong source address in echo reply o kern/146427 net [mwl] Additional virtual access points don't work on m f kern/146394 net [vlan] IP source address for outgoing connections o bin/146377 net [ppp] [tun] Interface doesn't clear addresses when PPP o kern/146358 net [vlan] wrong destination MAC address o kern/146165 net [wlan] [panic] Setting bssid in adhoc mode causes pani o kern/146037 net [panic] mpd + CoA = kernel panic o kern/145825 net [panic] panic: soabort: so_count o kern/145728 net [lagg] Stops working lagg between two servers. p kern/145600 net TCP/ECN behaves different to CE/CWR than ns2 reference f kern/144917 net [flowtable] [panic] flowtable crashes system [regressi o kern/144882 net MacBookPro =>4.1 does not connect to BSD in hostap wit o kern/144874 net [if_bridge] [patch] if_bridge frees mbuf after pfil ho o conf/144700 net [rc.d] async dhclient breaks stuff for too many people o kern/144616 net [nat] [panic] ip_nat panic FreeBSD 7.2 f kern/144315 net [ipfw] [panic] freebsd 8-stable reboot after add ipfw o kern/144231 net bind/connect/sendto too strict about sockaddr length o kern/143846 net [gif] bringing gif3 tunnel down causes gif0 tunnel to s kern/143673 net [stf] [request] there should be a way to support multi o kern/143622 net [pfil] [patch] unlock pfil lock while calling firewall o kern/143593 net [ipsec] When using IPSec, tcpdump doesn't show outgoin o kern/143591 net [ral] RT2561C-based DLink card (DWL-510) fails to work o kern/143208 net [ipsec] [gif] IPSec over gif interface not working o 
kern/143034 net [panic] system reboots itself in tcp code [regression] o kern/142877 net [hang] network-related repeatable 8.0-STABLE hard hang o kern/142774 net Problem with outgoing connections on interface with mu o kern/142772 net [libc] lla_lookup: new lle malloc failed f kern/142518 net [em] [lagg] Problem on 8.0-STABLE with em and lagg o kern/142018 net [iwi] [patch] Possibly wrong interpretation of beacon- o kern/141861 net [wi] data garbled with WEP and wi(4) with Prism 2.5 f kern/141741 net Etherlink III NIC won't work after upgrade to FBSD 8, o kern/140742 net rum(4) Two asus-WL167G adapters cannot talk to each ot o kern/140682 net [netgraph] [panic] random panic in netgraph f kern/140634 net [vlan] destroying if_lagg interface with if_vlan membe o kern/140619 net [ifnet] [patch] refine obsolete if_var.h comments desc o kern/140346 net [wlan] High bandwidth use causes loss of wlan connecti o kern/140142 net [ip6] [panic] FreeBSD 7.2-amd64 panic w/IPv6 o kern/140066 net [bwi] install report for 8.0 RC 2 (multiple problems) o kern/139387 net [ipsec] Wrong lenth of PF_KEY messages in promiscuous o bin/139346 net [patch] arp(8) add option to remove static entries lis o kern/139268 net [if_bridge] [patch] allow if_bridge to forward just VL p kern/139204 net [arp] DHCP server replies rejected, ARP entry lost bef o kern/139117 net [lagg] + wlan boot timing (EBUSY) o kern/138850 net [dummynet] dummynet doesn't work correctly on a bridge o kern/138782 net [panic] sbflush_internal: cc 0 || mb 0xffffff004127b00 o kern/138688 net [rum] possibly broken on 8 Beta 4 amd64: able to wpa a o kern/138678 net [lo] FreeBSD does not assign linklocal address to loop o kern/138407 net [gre] gre(4) interface does not come up after reboot o kern/138332 net [tun] [lor] ifconfig tun0 destroy causes LOR if_adata/ o kern/138266 net [panic] kernel panic when udp benchmark test used as r f kern/138029 net [bpf] [panic] periodically kernel panic and reboot o kern/137881 net [netgraph] 
[panic] ng_pppoe fatal trap 12 p bin/137841 net [patch] wpa_supplicant(8) cannot verify SHA256 signed p kern/137776 net [rum] panic in rum(4) driver on 8.0-BETA2 o bin/137641 net ifconfig(8): various problems with "vlan_device.vlan_i o kern/137392 net [ip] [panic] crash in ip_nat.c line 2577 o kern/137372 net [ral] FreeBSD doesn't support wireless interface from o kern/137089 net [lagg] lagg falsely triggers IPv6 duplicate address de o kern/136911 net [netgraph] [panic] system panic on kldload ng_bpf.ko t o kern/136618 net [pf][stf] panic on cloning interface without unit numb o kern/135502 net [periodic] Warning message raised by rtfree function i o kern/134583 net [hang] Machine with jail freezes after random amount o o kern/134531 net [route] [panic] kernel crash related to routes/zebra o kern/134157 net [dummynet] dummynet loads cpu for 100% and make a syst o kern/133969 net [dummynet] [panic] Fatal trap 12: page fault while in o kern/133968 net [dummynet] [panic] dummynet kernel panic o kern/133736 net [udp] ip_id not protected ... 
o kern/133595 net [panic] Kernel Panic at pcpu.h:195 o kern/133572 net [ppp] [hang] incoming PPTP connection hangs the system o kern/133490 net [bpf] [panic] 'kmem_map too small' panic on Dell r900 o kern/133235 net [netinet] [patch] Process SIOCDLIFADDR command incorre f kern/133213 net arp and sshd errors on 7.1-PRERELEASE o kern/133060 net [ipsec] [pfsync] [panic] Kernel panic with ipsec + pfs o kern/132889 net [ndis] [panic] NDIS kernel crash on load BCM4321 AGN d o conf/132851 net [patch] rc.conf(5): allow to setfib(1) for service run o kern/132734 net [ifmib] [panic] panic in net/if_mib.c o kern/132705 net [libwrap] [patch] libwrap - infinite loop if hosts.all o kern/132672 net [ndis] [panic] ndis with rt2860.sys causes kernel pani o kern/132354 net [nat] Getting some packages to ipnat(8) causes crash o kern/131781 net [ndis] ndis keeps dropping the link o kern/131776 net [wi] driver fails to init o kern/131753 net [altq] [panic] kernel panic in hfsc_dequeue o bin/131365 net route(8): route add changes interpretation of network f kern/130820 net [ndis] wpa_supplicant(8) returns 'no space on device' o kern/130628 net [nfs] NFS / rpc.lockd deadlock on 7.1-R o kern/130525 net [ndis] [panic] 64 bit ar5008 ndisgen-erated driver cau o kern/130311 net [wlan_xauth] [panic] hostapd restart causing kernel pa o kern/130109 net [ipfw] Can not set fib for packets originated from loc f kern/130059 net [panic] Leaking 50k mbufs/hour f kern/129719 net [nfs] [panic] Panic during shutdown, tcp_ctloutput: in o kern/129517 net [ipsec] [panic] double fault / stack overflow f kern/129508 net [carp] [panic] Kernel panic with EtherIP (may be relat o kern/129219 net [ppp] Kernel panic when using kernel mode ppp o kern/129197 net [panic] 7.0 IP stack related panic o kern/129036 net [ipfw] 'ipfw fwd' does not change outgoing interface n o bin/128954 net ifconfig(8) deletes valid routes o bin/128602 net [an] wpa_supplicant(8) crashes with an(4) o kern/128448 net [nfs] 6.4-RC1 Boot Fails 
if NFS Hostname cannot be res o bin/128295 net [patch] ifconfig(8) does not print TOE4 or TOE6 capabi o bin/128001 net wpa_supplicant(8), wlan(4), and wi(4) issues o kern/127826 net [iwi] iwi0 driver has reduced performance and connecti o kern/127815 net [gif] [patch] if_gif does not set vlan attributes from o kern/127724 net [rtalloc] rtfree: 0xc5a8f870 has 1 refs f bin/127719 net [arp] arp: Segmentation fault (core dumped) f kern/127528 net [icmp]: icmp socket receives icmp replies not owned by p kern/127360 net [socket] TOE socket options missing from sosetopt() o bin/127192 net routed(8) removes the secondary alias IP of interface f kern/127145 net [wi]: prism (wi) driver crash at bigger traffic o kern/126895 net [patch] [ral] Add antenna selection (marked as TBD) o kern/126874 net [vlan]: Zebra problem if ifconfig vlanX destroy o kern/126695 net rtfree messages and network disruption upon use of if_ o kern/126339 net [ipw] ipw driver drops the connection o kern/126075 net [inet] [patch] internet control accesses beyond end of o bin/125922 net [patch] Deadlock in arp(8) o kern/125920 net [arp] Kernel Routing Table loses Ethernet Link status o kern/125845 net [netinet] [patch] tcp_lro_rx() should make use of hard o kern/125258 net [socket] socket's SO_REUSEADDR option does not work o kern/125239 net [gre] kernel crash when using gre o kern/124341 net [ral] promiscuous mode for wireless device ral0 looses o kern/124225 net [ndis] [patch] ndis network driver sometimes loses net o kern/124160 net [libc] connect(2) function loops indefinitely o kern/124021 net [ip6] [panic] page fault in nd6_output() o kern/123968 net [rum] [panic] rum driver causes kernel panic with WPA. 
o kern/123892 net [tap] [patch] No buffer space available
o kern/123890 net [ppp] [panic] crash & reboot on work with PPP low-spee
o kern/123858 net [stf] [patch] stf not usable behind a NAT
o kern/123758 net [panic] panic while restarting net/freenet6
o bin/123633 net ifconfig(8) doesn't set inet and ether address in one
o kern/123559 net [iwi] iwi periodically disassociates/associates [regre
o bin/123465 net [ip6] route(8): route add -inet6 -interfac
o kern/123463 net [ipsec] [panic] repeatable crash related to ipsec-tool
o conf/123330 net [nsswitch.conf] Enabling samba wins in nsswitch.conf c
o kern/123160 net [ip] Panic and reboot at sysctl kern.polling.enable=0
o kern/122989 net [swi] [panic] 6.3 kernel panic in swi1: net
o kern/122954 net [lagg] IPv6 EUI64 incorrectly chosen for lagg devices
f kern/122780 net [lagg] tcpdump on lagg interface during high pps wedge
o kern/122685 net It is not visible passing packets in tcpdump(1)
o kern/122319 net [wi] imposible to enable ad-hoc demo mode with Orinoco
o kern/122290 net [netgraph] [panic] Netgraph related "kmem_map too smal
o kern/122252 net [ipmi] [bge] IPMI problem with BCM5704 (does not work
o kern/122033 net [ral] [lor] Lock order reversal in ral0 at bootup ieee
o bin/121895 net [patch] rtsol(8)/rtsold(8) doesn't handle managed netw
s kern/121774 net [swi] [panic] 6.3 kernel panic in swi1: net
o kern/121555 net [panic] Fatal trap 12: current process = 12 (swi1: net
o kern/121534 net [ipl] [nat] FreeBSD Release 6.3 Kernel Trap 12:
o kern/121443 net [gif] [lor] icmp6_input/nd6_lookup
o kern/121437 net [vlan] Routing to layer-2 address does not work on VLA
o bin/121359 net [patch] [security] ppp(8): fix local stack overflow in
o kern/121257 net [tcp] TSO + natd -> slow outgoing tcp traffic
o kern/121181 net [panic] Fatal trap 3: breakpoint instruction fault whi
o kern/120966 net [rum] kernel panic with if_rum and WPA encryption
o kern/120566 net [request]: ifconfig(8) make order of arguments more fr
o kern/120304 net [netgraph] [patch] netgraph source assumes 32-bit time
o kern/120266 net [udp] [panic] gnugk causes kernel panic when closing U
o bin/120060 net routed(8) deletes link-level routes in the presence of
o kern/119945 net [rum] [panic] rum device in hostap mode, cause kernel
o kern/119791 net [nfs] UDP NFS mount of aliased IP addresses from a Sol
o kern/119617 net [nfs] nfs error on wpa network when reseting/shutdown
f kern/119516 net [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi
o kern/119432 net [arp] route add -host -iface causes arp e
o kern/119225 net [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr
o kern/118727 net [netgraph] [patch] [request] add new ng_pf module
o kern/117423 net [vlan] Duplicate IP on different interfaces
o bin/117339 net [patch] route(8): loading routing management commands
o bin/116643 net [patch] [request] fstat(1): add INET/INET6 socket deta
o kern/116185 net [iwi] if_iwi driver leads system to reboot
o kern/115239 net [ipnat] panic with 'kmem_map too small' using ipnat
o kern/115019 net [netgraph] ng_ether upper hook packet flow stops on ad
o kern/115002 net [wi] if_wi timeout. failed allocation (busy bit). ifco
o kern/114915 net [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o kern/113432 net [ucom] WARNING: attempt to net_add_domain(netgraph) af
o kern/112722 net [ipsec] [udp] IP v4 udp fragmented packet reject
o kern/112686 net [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o bin/112557 net [patch] ppp(8) lock file should not use symlink name
o kern/112528 net [nfs] NFS over TCP under load hangs with "impossible p
o kern/111537 net [inet6] [patch] ip6_input() treats mbuf cluster wrong
o kern/111457 net [ral] ral(4) freeze
o kern/110284 net [if_ethersubr] Invalid Assumption in SIOCSIFADDR in et
o kern/110249 net [kernel] [regression] [patch] setsockopt() error regre
o kern/109470 net [wi] Orinoco Classic Gold PC Card Can't Channel Hop
o bin/108895 net pppd(8): PPPoE dead connections on 6.2 [regression]
f kern/108197 net [panic] [gif] [ip6] if_delmulti reference counting pan
o kern/107944 net [wi] [patch] Forget to unlock mutex-locks
o conf/107035 net [patch] bridge(8): bridge interface given in rc.conf n
o kern/106444 net [netgraph] [panic] Kernel Panic on Binding to an ip to
o kern/106316 net [dummynet] dummynet with multipass ipfw drops packets
o kern/105945 net Address can disappear from network interface
s kern/105943 net Network stack may modify read-only mbuf chain copies
o bin/105925 net problems with ifconfig(8) and vlan(4) [regression]
o kern/104851 net [inet6] [patch] On link routes not configured when usi
o kern/104751 net [netgraph] kernel panic, when getting info about my tr
o kern/104738 net [inet] [patch] Reentrant problem with inet_ntoa in the
o kern/103191 net Unpredictable reboot
o kern/103135 net [ipsec] ipsec with ipfw divert (not NAT) encodes a pac
o kern/102540 net [netgraph] [patch] supporting vlan(4) by ng_fec(4)
o conf/102502 net [netgraph] [patch] ifconfig name does't rename netgrap
o kern/102035 net [plip] plip networking disables parallel port printing
o kern/100709 net [libc] getaddrinfo(3) should return TTL info
o kern/100519 net [netisr] suggestion to fix suboptimal network polling
o kern/98597 net [inet6] Bug in FreeBSD 6.1 IPv6 link-local DAD procedu
o bin/98218 net wpa_supplicant(8) blacklist not working
o kern/97306 net [netgraph] NG_L2TP locks after connection with failed
o conf/97014 net [gif] gifconfig_gif? in rc.conf does not recognize IPv
f kern/96268 net [socket] TCP socket performance drops by 3000% if pack
o kern/95519 net [ral] ral0 could not map mbuf
o kern/95288 net [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr
o kern/95277 net [netinet] [patch] IP Encapsulation mask_match() return
o kern/95267 net packet drops periodically appear
f kern/93378 net [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo
o kern/93019 net [ppp] ppp and tunX problems: no traffic after restarti
o kern/92880 net [libc] [patch] almost rewritten inet_network(3) functi
s kern/92279 net [dc] Core faults everytime I reboot, possible NIC issu
o kern/91859 net [ndis] if_ndis does not work with Asus WL-138
o kern/91364 net [ral] [wep] WF-511 RT2500 Card PCI and WEP
o kern/91311 net [aue] aue interface hanging
o kern/87421 net [netgraph] [panic]: ng_ether + ng_eiface + if_bridge
o kern/86871 net [tcp] [patch] allocation logic for PCBs in TIME_WAIT s
o kern/86427 net [lor] Deadlock with FASTIPSEC and nat
o kern/85780 net 'panic: bogus refcnt 0' in routing/ipv6
o bin/85445 net ifconfig(8): deprecated keyword to ifconfig inoperativ
o bin/82975 net route change does not parse classfull network as given
o kern/82881 net [netgraph] [panic] ng_fec(4) causes kernel panic after
o kern/82468 net Using 64MB tcp send/recv buffers, trafficflow stops, i
o bin/82185 net [patch] ndp(8) can delete the incorrect entry
o kern/81095 net IPsec connection stops working if associated network i
o kern/78968 net FreeBSD freezes on mbufs exhaustion (network interface
o kern/78090 net [ipf] ipf filtering on bridged packets doesn't work if
o kern/77341 net [ip6] problems with IPV6 implementation
o kern/75873 net Usability problem with non-RFC-compliant IP spoof prot
s kern/75407 net [an] an(4): no carrier after short time
a kern/71474 net [route] route lookup does not skip interfaces marked d
o kern/71469 net default route to internet magically disappears with mu
o kern/68889 net [panic] m_copym, length > size of mbuf chain
o kern/66225 net [netgraph] [patch] extend ng_eiface(4) control message
o kern/65616 net IPSEC can't detunnel GRE packets after real ESP encryp
s kern/60293 net [patch] FreeBSD arp poison patch
a kern/56233 net IPsec tunnel (ESP) over IPv6: MTU computation is wrong
s bin/41647 net ifconfig(8) doesn't accept lladdr along with inet addr
o kern/39937 net ipstealth issue
a kern/38554 net [patch] changing interface ipaddress doesn't seem to w
o kern/31940 net ip queue length too short for >500kpps
o kern/31647 net [libc] socket calls can return undocumented EINVAL
o kern/30186 net [libc] getaddrinfo(3) does not handle incorrect servna
f kern/24959 net [patch] proper TCP_NOPUSH/TCP_CORK compatibility
o conf/23063 net [arp] [patch] for static ARP tables in rc.network
o kern/21998 net [socket] [patch] ident only for outgoing connections
o kern/5877 net [socket] sb_cc counts control data as well as data dat

475 problems total.
From owner-freebsd-net@FreeBSD.ORG Mon Jan 27 20:23:09 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F40EDDC8; Mon, 27 Jan 2014 20:23:08 +0000 (UTC) Received: from mail-qc0-x22c.google.com (mail-qc0-x22c.google.com [IPv6:2607:f8b0:400d:c01::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A4C801DB2; Mon, 27 Jan 2014 20:23:08 +0000 (UTC) Received: by mail-qc0-f172.google.com with SMTP id c9so8901270qcz.31 for ; Mon, 27 Jan 2014 12:23:07 -0800 (PST) MIME-Version: 1.0 X-Received: by 10.224.122.208 with SMTP id m16mr46425650qar.55.1390854187930; Mon, 27 Jan 2014 12:23:07 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.52.8 with HTTP; Mon, 27 Jan 2014 12:23:07 -0800 (PST) Date: Mon, 27 Jan 2014 12:23:07 -0800 X-Google-Sender-Auth: VrWtfFIAKoQaqamrf79udvoViok Message-ID: Subject: flowtable - FL_HASH_ALL From: Adrian Chadd To: "freebsd-arch@freebsd.org" , FreeBSD Net Content-Type: text/plain; charset=ISO-8859-1
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 20:23:09 -0000 Hi, What's FL_HASH_ALL supposed to do? Is the flowtable code going to do any kind of 4-tuple hashing if it isn't set? -a From owner-freebsd-net@FreeBSD.ORG Mon Jan 27 21:09:41 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0C24DDEC; Mon, 27 Jan 2014 21:09:41 +0000 (UTC) Received: from acme.spoerlein.net (acme.spoerlein.net [IPv6:2a01:4f8:131:23c2::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7AE331146; Mon, 27 Jan 2014 21:09:40 +0000 (UTC) Received: from localhost (acme.spoerlein.net [IPv6:2a01:4f8:131:23c2::1]) by acme.spoerlein.net (8.14.7/8.14.7) with ESMTP id s0RL9cF5027666 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 27 Jan 2014 22:09:38 +0100 (CET) (envelope-from uqs@FreeBSD.org) Date: Mon, 27 Jan 2014 22:09:37 +0100 From: Ulrich =?utf-8?B?U3DDtnJsZWlu?= To: Luigi Rizzo Subject: Re: unused in_cksum_update() ? 
Message-ID: <20140127210937.GB93124@acme.spoerlein.net> Mail-Followup-To: Luigi Rizzo , Gleb Smirnoff , wollman@freebsd.org, current@freebsd.org, net@freebsd.org References: <20140109192114.GA49934@onelab2.iet.unipi.it> <20140110103140.GD73147@FreeBSD.org> <20140110182448.GA62317@onelab2.iet.unipi.it> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140110182448.GA62317@onelab2.iet.unipi.it> User-Agent: Mutt/1.5.22 (2013-10-16) Cc: wollman@freebsd.org, current@freebsd.org, net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 21:09:41 -0000 On Fri, 2014-01-10 at 19:24:48 +0100, Luigi Rizzo wrote: > On Fri, Jan 10, 2014 at 02:31:40PM +0400, Gleb Smirnoff wrote: > > On Thu, Jan 09, 2014 at 08:21:14PM +0100, Luigi Rizzo wrote: > > L> a lot of arch-specific headers (sys/${ARCH}/include/in_cksum.h) > > L> have a lengthy definition for > > L> > > L> in_cksum_update(struct ip *ip) > > L> > > L> which seems completely unused in our source tree. > > L> Time to remove it perhaps ? > > L> > > L> grep cannot find any use at least since stable/8 > > > > I'd prefer not to hurry with its removal. Might be that pf will use it. > > Since it lives in a header file, it doesn't add a single bit to kernel > > size. > > we should care more about obfuscation and correctness, and this is > a killer in both respects.
> Depending on $arch the function is either not available or wrong: > > In particular, the basic code follows the description in > http://tools.ietf.org/html/rfc1141 with ntohs/htons to deal > with endianness (note that the '256' should not be converted): > > tmp = ntohs(sum)+256; > tmp = tmp + (tmp >> 16); > sum = htons(tmp); // also truncates high bits > > It is correctly implemented (but in a totally generic way, so no > point to have it in the arch-specific files) for amd64, i386, > ia64, mips, powerpc; it is not implemented for arm, and it is wrong > for sparc64 (where the 256 is incorrectly replaced by a 1). > > In terms of usage: the svn repo suggests that it was added in r15884 > in 1996 (stable/2.2 is the first branch where it appears): > > http://svnweb.freebsd.org/base/head/sys/i386/include/in_cksum.h?r1=15884&r2=15883&pathrev=15884 > > As far as I can tell it was never used anywhere, and was copied from > place to place when we started to support different architectures. > > Shall we wait until it becomes 18 ? :) > > I am adding Garrett to the list as he may have more details. Git's "pickaxe" is a very good tool for this sort of code archeology. There's only a handful of commits that touched anything related to "in_cksum_update". I'm not going to dump the output of git log -S"in_cksum_update" here, just the revisions that add/remove that string.
r15884 r36849 r66458 r86144 r99040 r158458 r163022 r178172 r180010 hth Uli From owner-freebsd-net@FreeBSD.ORG Mon Jan 27 23:27:29 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4F99E9CE for ; Mon, 27 Jan 2014 23:27:29 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 125001D18 for ; Mon, 27 Jan 2014 23:27:28 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEABnr5lKDaFve/2dsb2JhbABag0RWgn25EE+BMXSCJQEBAQMBAQEBICsgCwUWGAICDRkCIwYBCSYOAgUEARwEh1ADCQgNqXWXJg2FVheBKYtOgTQQAgEbNAeCb4FJBIlIjAxngx6LK4VBg0seMYE9 X-IronPort-AV: E=Sophos;i="4.95,732,1384318800"; d="scan'208";a="90909892" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 27 Jan 2014 18:27:21 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 2F446B40EF; Mon, 27 Jan 2014 18:27:19 -0500 (EST) Date: Mon, 27 Jan 2014 18:27:19 -0500 (EST) From: Rick Macklem To: pyunyh@gmail.com Message-ID: <1168237133.17228249.1390865239175.JavaMail.root@uoguelph.ca> In-Reply-To: <20140127055047.GA1368@michelle.cdnetworks.com> Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Daniel Braniss , freebsd-net@freebsd.org, Adam McDougall X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 23:27:29 -0000 pyunyh@gmail.com wrote: > On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote: > > Adam McDougall wrote: > > > Also try rsize=32768,wsize=32768 in your mount options, made a > > > huge > > > difference for me. I've noticed slow file transfers on NFS in 9 > > > and > > > finally did some searching a couple months ago, someone suggested > > > it > > > and > > > they were on to something. > > > > > I have a "hunch" that might explain why 64K NFS reads/writes > > perform > > poorly for some network environments. > > A 64K NFS read reply/write request consists of a list of 34 mbufs > > when > > passed to TCP via sosend() and a total data length of around > > 65680 bytes. > > Looking at a couple of drivers (virtio and ixgbe), they seem to > > expect > > no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. I > > think > > (I don't have anything that does TSO to confirm this) that NFS will > > pass > > a list that is longer (34 plus a TCP/IP header). > > At a glance, it appears that the drivers call m_defrag() or > > m_collapse() > > when the mbuf list won't fit in their scatter table (32 or 33 > > elements) > > and if this fails, just silently drop the data without sending it. > > If I'm right, there would be considerable overhead from > > m_defrag()/m_collapse() > > and near disaster if they fail to fix the problem and the data is > > silently > > dropped instead of xmited.
> > > > I think the actual number of DMA segments allocated for the mbuf > chain is determined by bus_dma(9). bus_dma(9) will coalesce > the current segment with the previous segment if possible. > Ok, I'll have to take a look, but I thought that an array sized by "num_segs" is passed in as an argument. (And num_segs is set to either IXGBE_82598_SCATTER (100) or IXGBE_82599_SCATTER (32).) It looked to me that the ixgbe driver called itself ix, so it isn't obvious to me which we are talking about. (I know that Daniel Braniss had an ix0 and ix1, which were fixed for NFS by disabling TSO.) I'll admit I mostly looked at virtio's network driver, since that was the one being used by J David. Problems w.r.t. TSO enabled for NFS using 64K rsize/wsize have been cropping up for quite a while, and I am just trying to find out why. (I have no hardware/software that exhibits the problem, so I can only look at the sources and ask others to try testing stuff.) > I'm not sure whether you're referring to ixgbe(4) or ix(4) but I > see the total length of all segments for ix(4) is 65535 so > it has no room for the ethernet/VLAN header of the mbuf chain. The > driver should be fixed to transmit a 64KB datagram. Well, if_hw_tsomax is set to 65535 by the generic code (the driver doesn't set it) and the code in tcp_output() seems to subtract the size of a tcp/ip header from that before passing data to the driver, so I think the mbuf chain passed to the driver will fit in one ip datagram. (I'd assume all sorts of stuff would break for TSO enabled drivers if that wasn't the case?) > I think the use of m_defrag(9) in TSO is suboptimal. All TSO > capable controllers are able to handle multiple TX buffers so it > should have used m_collapse(9) rather than copying the entire chain > with m_defrag(9). > I haven't looked at these closely yet (plan on doing so to-day), but even m_collapse() looked like it copied data between mbufs and that is certainly suboptimal, imho.
I don't see why a driver can't split the mbuf list, if there are too many entries for the scatter/gather and do it in two iterations (much like tcp_output() does already, since the data length exceeds 65535 - tcp/ip header size). However, at this point, I just want to find out if the long chain of mbufs is why TSO is problematic for these drivers, since I'll admit I'm getting tired of telling people to disable TSO (and I suspect some don't believe me and never try it). > > Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE > > clusters, > > so the mbuf count drops from 34 to 18. > > > > Could we make it conditional on size? > Not sure what you mean? If you mean "the size of the read/write", that would be possible for NFSv3, but less so for NFSv4. (The read/write is just one Op. in the compound for NFSv4 and there is no way to predict how much more data is going to be generated by subsequent Ops.) If by "size" you mean amount of memory in the machine then, yes, it certainly could be conditional on that. (I plan to try and look at the allocator to-day as well, but if others know of disadvantages with using MJUMPAGESIZE instead of MCLBYTES, please speak up.) Garrett Wollman already alluded to the MCLBYTES case being pre-allocated, but I'll admit I have no idea what the implications of that are at this time. > > If anyone has a TSO scatter/gather enabled net interface and can > > test this > > patch on it with NFS I/O (default of 64K rsize/wsize) when TSO is > > enabled > > and see what effect it has, that would be appreciated. > > > > Btw, thanks go to Garrett Wollman for suggesting the change to > > MJUMPAGESIZE > > clusters. > > > > rick > > ps: If the attachment doesn't make it through and you want the > > patch, just > > email me and I'll send you a copy. 
> > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Mon Jan 27 23:47:15 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id ADE3C47E for ; Mon, 27 Jan 2014 23:47:15 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 706201E4E for ; Mon, 27 Jan 2014 23:47:14 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEALHv5lKDaFve/2dsb2JhbABXA4NEVoJ9uRFPgTF0giUBAQEDAQEBASArIAsFFhgCAg0ZAikBCSYGCAcEARwBA4dcCA2peJ0IF4EpjQIKBgIBGyQQBxGCHkCBSQSJSIwMhAWQbINLHjF7Qg X-IronPort-AV: E=Sophos;i="4.95,732,1384318800"; d="scan'208";a="90913571" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 27 Jan 2014 18:47:10 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 7775DB40E7; Mon, 27 Jan 2014 18:47:10 -0500 (EST) Date: Mon, 27 Jan 2014 18:47:10 -0500 (EST) From: Rick Macklem To: John-Mark Gurney Message-ID: <222089865.17245782.1390866430479.JavaMail.root@uoguelph.ca> In-Reply-To: <20140127032338.GP13704@funkthat.com> Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, Adam McDougall X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 23:47:15 -0000 John-Mark Gurney wrote: > Rick Macklem wrote this message on Sun, Jan 26, 2014 at 21:16 -0500: > > Btw, thanks go to Garrett Wollman for suggesting the change to > > MJUMPAGESIZE > > clusters. > > > > rick > > ps: If the attachment doesn't make it through and you want the > > patch, just > > email me and I'll send you a copy. > > The patch looks good, but we probably shouldn't change _readlink.. > The chances of a link being >2k are pretty slim, and the chances of > the link being >32k are even smaller... > Yea, I already thought of that, actually. However, see below w.r.t. NFSv4. However, at this point I mostly want to find out if it is the long mbuf chain that causes problems for TSO enabled network interfaces. > In fact, we might want to switch _readlink to MGET (could be > conditional > upon cnt) so that if it fits in an mbuf we don't allocate a cluster > for > it... > For NFSv4, what was an RPC for NFSv3 becomes one of several Ops. in a compound RPC. As such, there is no way to know how much additional RPC message there will be. So, although the readlink reply won't use much of the 4K allocation, replies for subsequent Ops. in the compound certainly could. (Is it more efficient to allocate 4K now and use part of it for subsequent message reply stuff or allocate additional mbuf clusters later for subsequent stuff, as required?
On a small memory constrained machine, I suspect the latter is correct, but for the kind of hardware that has TSO scatter/gather enabled network interfaces, I'm not so sure. At this point, I wouldn't even say that using 4K clusters is going to be a win and my hunch is that any win wouldn't apply to small memory constrained machines.) My test server has 256Mbytes of ram and it certainly doesn't show any improvement (big surprise;-), but it also doesn't show any degradation for the limited testing I've done. Again, my main interest at this point is whether reducing the number of mbufs in the chain fixes the TSO issues. I think the question of whether or not 4K clusters are a performance improvement in general is an interesting one that comes later. rick > -- > John-Mark Gurney Voice: +1 415 225 5579 > > "All that I will do, has been done, All that I have, has not." > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 00:28:31 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4CCA03B0 for ; Tue, 28 Jan 2014 00:28:31 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2815E1156 for ; Tue, 28 Jan 2014 00:28:30 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s0S0SRwC063807 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 27 Jan 2014 16:28:27 -0800 (PST) (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s0S0SQ4E063806; Mon, 27 Jan 2014 16:28:26 -0800 (PST) (envelope-from jmg) Date: Mon, 27 Jan 2014 16:28:26 -0800 From: John-Mark Gurney To: Rick Macklem Subject: Re: Terrible NFS performance under 9.2-RELEASE? Message-ID: <20140128002826.GU13704@funkthat.com> Mail-Followup-To: Rick Macklem , freebsd-net@freebsd.org, Adam McDougall References: <20140127032338.GP13704@funkthat.com> <222089865.17245782.1390866430479.JavaMail.root@uoguelph.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <222089865.17245782.1390866430479.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Mon, 27 Jan 2014 16:28:27 -0800 (PST) Cc: freebsd-net@freebsd.org, Adam McDougall X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 00:28:31 -0000 Rick Macklem wrote this message on Mon, Jan 27, 2014 at 18:47 -0500: > John-Mark Gurney wrote: > > Rick Macklem wrote this message on Sun, Jan 26, 2014 at 21:16 -0500: > > > Btw, thanks go to Garrett Wollman for suggesting the change to > > > MJUMPAGESIZE > > > clusters. > > > > > > rick > > > ps: If the attachment doesn't make it through and you want the > > > patch, just > > > email me and I'll send you a copy. > > > > The patch looks good, but we probably shouldn't change _readlink.. 
> > The chances of a link being >2k are pretty slim, and the chances of > > the link being >32k are even smaller... > > > Yea, I already thought of that, actually. However, see below w.r.t. > NFSv4. > > However, at this point I > mostly want to find out if it the long mbuf chain that causes problems > for TSO enabled network interfaces. I agree, though a long mbuf chain is more of a driver issue than an NFS issue... > > In fact, we might want to switch _readlink to MGET (could be > > conditional > > upon cnt) so that if it fits in an mbuf we don't allocate a cluster > > for > > it... > > > For NFSv4, what was an RPC for NFSv3 becomes one of several Ops. in > a compound RPC. As such, there is no way to know how much additional > RPC message there will be. So, although the readlink reply won't use > much of the 4K allocation, replies for subsequent Ops. in the compound > certainly could. (Is it more efficient to allocate 4K now and use > part of it for subsequent message reply stuff or allocate additional > mbuf clusters later for subsequent stuff, as required? On a small > memory constrained machine, I suspect the latter is correct, but for > the kind of hardware that has TSO scatter/gather enabled network > interfaces, I'm not so sure. At this point, I wouldn't even say > that using 4K clusters is going to be a win and my hunch is that > any win wouldn't apply to small memory constrained machines.) Though the code that was patched wasn't using any partial buffers, it was always allocating a new buffer... If the code in _read/_readlinks starts using a previous mbuf chain, then obviously things are different and I'd agree, always allocating a 2k/4k cluster makes sense... > My test server has 256Mbytes of ram and it certainly doesn't show > any improvement (big surprise;-), but it also doesn't show any > degradation for the limited testing I've done. 
I'm not too surprised, unless you're on a heavy server pushing >200MB/sec, the allocation cost is probably cheap enough that it doesn't show up... going to 4k means immediately half as many mbufs are needed/allocated, and as they are page sized, don't have the problems of physical memory fragmentation, nor do they have to do an IPI/tlb shoot down in the case of multipage allocations... (I'm dealing w/ this for geli.) > Again, my main interest at this point is whether reducing the > number of mbufs in the chain fixes the TSO issues. I think > the question of whether or not 4K clusters are performance > improvement in general, is an interesting one that comes later. Another thing I noticed is that we are getting an mbuf and then allocating a cluster... Is there a reason we aren't using something like m_getm or m_getcl? We have a special uma zone that has mbuf and mbuf cluster already paired meaning we save some lock operations for each segment allocated... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." 
From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 00:58:27 2014
Date: Tue, 28 Jan 2014 09:58:18 +0900
From: Yonghyeon PYUN
Reply-To: pyunyh@gmail.com
To: Rick Macklem
Cc: Daniel Braniss, freebsd-net@freebsd.org, Adam McDougall
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID: <20140128005818.GB2722@michelle.cdnetworks.com>
In-Reply-To: <1168237133.17228249.1390865239175.JavaMail.root@uoguelph.ca>

On Mon, Jan 27, 2014 at 06:27:19PM -0500, Rick Macklem wrote:
> pyunyh@gmail.com wrote:
> > On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote:
> > > Adam McDougall wrote:
> > > > Also try rsize=32768,wsize=32768 in your mount options, made a
> > > > huge difference for me. I've noticed slow file transfers on NFS
> > > > in 9 and finally did some searching a couple months ago, someone
> > > > suggested it and they were on to something.
> > >
> > > I have a "hunch" that might explain why 64K NFS reads/writes
> > > perform poorly for some network environments.
> > > A 64K NFS read reply/write request consists of a list of 34 mbufs
> > > when passed to TCP via sosend() and a total data length of around
> > > 65680 bytes.
> > > Looking at a couple of drivers (virtio and ixgbe), they seem to
> > > expect no more than 32-33 mbufs in a list for a 65535 byte TSO
> > > xmit. I think (I don't have anything that does TSO to confirm
> > > this) that NFS will pass a list that is longer (34 plus a TCP/IP
> > > header).
> > > At a glance, it appears that the drivers call m_defrag() or > > > m_collapse() > > > when the mbuf list won't fit in their scatter table (32 or 33 > > > elements) > > > and if this fails, just silently drop the data without sending it. > > > If I'm right, there would considerable overhead from > > > m_defrag()/m_collapse() > > > and near disaster if they fail to fix the problem and the data is > > > silently > > > dropped instead of xmited. > > > > > > > I think the actual number of DMA segments allocated for the mbuf > > chain is determined by bus_dma(9). bus_dma(9) will coalesce > > current segment with previous segment if possible. > > > Ok, I'll have to take a look, but I thought that an array of sized > by "num_segs" is passed in as an argument. (And num_segs is set to > either IXGBE_82598_SCATTER (100) or IXGBE_82599_SCATTER (32).) > It looked to me that the ixgbe driver called itself ix, so it isn't > obvious to me which we are talking about. (I know that Daniel Braniss > had an ix0 and ix1, which were fixed for NFS by disabling TSO.) > It's ix(4). ixbge(4) is a different driver. > I'll admit I mostly looked at virtio's network driver, since that > was the one being used by J David. > > Problems w.r.t. TSO enabled for NFS using 64K rsize/wsize have been > cropping up for quite a while, and I am just trying to find out why. > (I have no hardware/software that exhibits the problem, so I can > only look at the sources and ask others to try testing stuff.) > > > I'm not sure whether you're referring to ixgbe(4) or ix(4) but I > > see the total length of all segment size of ix(4) is 65535 so > > it has no room for ethernet/VLAN header of the mbuf chain. The > > driver should be fixed to transmit a 64KB datagram. 
> Well, if_hw_tsomax is set to 65535 by the generic code (the driver
> doesn't set it) and the code in tcp_output() seems to subtract the
> size of an tcp/ip header from that before passing data to the driver,
> so I think the mbuf chain passed to the driver will fit in one
> ip datagram. (I'd assume all sorts of stuff would break for TSO
> enabled drivers if that wasn't the case?)

I believe the generic code is doing the right thing. I'm under the
impression that non-working TSO indicates a bug in the driver. Some
drivers didn't account for the additional ethernet/VLAN header, so the
total size of the DMA segments exceeded 65535. I've attached a diff
for ix(4). It wasn't tested at all as I don't have hardware to test.

> > I think the use of m_defrag(9) in TSO is suboptimal. All TSO
> > capable controllers are able to handle multiple TX buffers so it
> > should have used m_collapse(9) rather than copying entire chain
> > with m_defrag(9).
> >
> I haven't looked at these closely yet (plan on doing so to-day), but
> even m_collapse() looked like it copied data between mbufs and that
> is certainly suboptimal, imho. I don't see why a driver can't split
> the mbuf list, if there are too many entries for the scatter/gather
> and do it in two iterations (much like tcp_output() does already,
> since the data length exceeds 65535 - tcp/ip header size).

It can split the mbuf list if the controller supports an increased
number of TX buffers. But because the controller consumes the same
number of DMA descriptors for the mbuf list, drivers tend to impose a
limit on the number of TX buffers to save resources.

> However, at this point, I just want to find out if the long chain
> of mbufs is why TSO is problematic for these drivers, since I'll
> admit I'm getting tired of telling people to disable TSO (and I
> suspect some don't believe me and never try it).

TSO capable controllers tend to have various limitations (the first TX
buffer should have the complete ethernet/IP/TCP header, ip_len of the
IP header should be reset to 0, the TCP pseudo checksum should be
recomputed, etc.) and cheap controllers need more assistance from the
driver to let their firmware know the various IP/TCP header offset
locations in the mbuf. Because this requires IP/TCP header parsing,
it's error prone and very complex.

> > > Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE
> > > clusters, so the mbuf count drops from 34 to 18.
> > >
> > Could we make it conditional on size?
>
> Not sure what you mean? If you mean "the size of the read/write",
> that would be possible for NFSv3, but less so for NFSv4. (The
> read/write is just one Op. in the compound for NFSv4 and there is no
> way to predict how much more data is going to be generated by
> subsequent Ops.)

Sorry, I should have been clearer. You already answered my question.
Thanks.

> If by "size" you mean amount of memory in the machine then, yes, it
> certainly could be conditional on that. (I plan to try and look at
> the allocator to-day as well, but if others know of disadvantages
> with using MJUMPAGESIZE instead of MCLBYTES, please speak up.)
>
> Garrett Wollman already alluded to the MCLBYTES case being
> pre-allocated, but I'll admit I have no idea what the implications of
> that are at this time.
>
> > > If anyone has a TSO scatter/gather enabled net interface and can
> > > test this patch on it with NFS I/O (default of 64K rsize/wsize)
> > > when TSO is enabled and see what effect it has, that would be
> > > appreciated.
> > >
> > > Btw, thanks go to Garrett Wollman for suggesting the change to
> > > MJUMPAGESIZE clusters.
> > >
> > > rick
> > > ps: If the attachment doesn't make it through and you want the
> > > patch, just email me and I'll send you a copy.
> > >

[Attachment: ix.TSO.diff]

Index: sys/dev/ixgbe/ixv.h
===================================================================
--- sys/dev/ixgbe/ixv.h	(revision 260903)
+++ sys/dev/ixgbe/ixv.h	(working copy)
@@ -172,7 +172,7 @@
 #define IXV_SCATTER 32
 #define IXV_RX_HDR 128
 #define MSIX_BAR 3
-#define IXV_TSO_SIZE 65535
+#define IXV_TSO_SIZE (65535 + sizeof(struct ether_vlan_header))
 #define IXV_BR_SIZE 4096
 #define IXV_LINK_ITR 2000
 #define TX_BUFFER_SIZE ((u32) 1514)

From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 01:15:31 2014
Date: Mon, 27 Jan 2014 17:15:29 -0800
From: Jack Vogel
To: Pyun YongHyeon
Cc: Daniel Braniss, FreeBSD Net, Adam McDougall, Rick Macklem
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
In-Reply-To: <20140128005818.GB2722@michelle.cdnetworks.com>

That header file is for the VF driver :) which I don't believe is being
used in this case. The driver is capable of handling 256K but it's
limited by the stack to 64K (look in ixgbe.h), so it's not a few bytes
off due to the vlan header. The scatter size is not an arbitrary one,
it's due to hardware limitations in Niantic (82599).

Turning off TSO in the 10G environment is not practical, you will have
trouble getting good performance.

Jack

On Mon, Jan 27, 2014 at 4:58 PM, Yonghyeon PYUN wrote:

> On Mon, Jan 27, 2014 at 06:27:19PM -0500, Rick Macklem wrote:
> > pyunyh@gmail.com wrote:
> > > On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote:
> > > > Adam McDougall wrote:
> > > > > Also try rsize=32768,wsize=32768 in your mount options, made a
> > > > > huge difference for me. I've noticed slow file transfers on
> > > > > NFS in 9 and finally did some searching a couple months ago,
> > > > > someone suggested it and they were on to something.
> > > > > > > > > I have a "hunch" that might explain why 64K NFS reads/writes > > > > perform > > > > poorly for some network environments. > > > > A 64K NFS read reply/write request consists of a list of 34 mbufs > > > > when > > > > passed to TCP via sosend() and a total data length of around > > > > 65680bytes. > > > > Looking at a couple of drivers (virtio and ixgbe), they seem to > > > > expect > > > > no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. I > > > > think > > > > (I don't have anything that does TSO to confirm this) that NFS will > > > > pass > > > > a list that is longer (34 plus a TCP/IP header). > > > > At a glance, it appears that the drivers call m_defrag() or > > > > m_collapse() > > > > when the mbuf list won't fit in their scatter table (32 or 33 > > > > elements) > > > > and if this fails, just silently drop the data without sending it. > > > > If I'm right, there would considerable overhead from > > > > m_defrag()/m_collapse() > > > > and near disaster if they fail to fix the problem and the data is > > > > silently > > > > dropped instead of xmited. > > > > > > > > > > I think the actual number of DMA segments allocated for the mbuf > > > chain is determined by bus_dma(9). bus_dma(9) will coalesce > > > current segment with previous segment if possible. > > > > > Ok, I'll have to take a look, but I thought that an array of sized > > by "num_segs" is passed in as an argument. (And num_segs is set to > > either IXGBE_82598_SCATTER (100) or IXGBE_82599_SCATTER (32).) > > It looked to me that the ixgbe driver called itself ix, so it isn't > > obvious to me which we are talking about. (I know that Daniel Braniss > > had an ix0 and ix1, which were fixed for NFS by disabling TSO.) > > > > It's ix(4). ixbge(4) is a different driver. > > > I'll admit I mostly looked at virtio's network driver, since that > > was the one being used by J David. > > > > Problems w.r.t. 
TSO enabled for NFS using 64K rsize/wsize have been > > cropping up for quite a while, and I am just trying to find out why. > > (I have no hardware/software that exhibits the problem, so I can > > only look at the sources and ask others to try testing stuff.) > > > > > I'm not sure whether you're referring to ixgbe(4) or ix(4) but I > > > see the total length of all segment size of ix(4) is 65535 so > > > it has no room for ethernet/VLAN header of the mbuf chain. The > > > driver should be fixed to transmit a 64KB datagram. > > Well, if_hw_tsomax is set to 65535 by the generic code (the driver > > doesn't set it) and the code in tcp_output() seems to subtract the > > size of an tcp/ip header from that before passing data to the driver, > > so I think the mbuf chain passed to the driver will fit in one > > ip datagram. (I'd assume all sorts of stuff would break for TSO > > enabled drivers if that wasn't the case?) > > I believe the generic code is doing right. I'm under the > impression the non-working TSO indicates a bug in driver. Some > drivers didn't account for additional ethernet/VLAN header so the > total size of DMA segments exceeded 65535. I've attached a diff > for ix(4). It wasn't tested at all as I don't have hardware to > test. > > > > > > I think the use of m_defrag(9) in TSO is suboptimal. All TSO > > > capable controllers are able to handle multiple TX buffers so it > > > should have used m_collapse(9) rather than copying entire chain > > > with m_defrag(9). > > > > > I haven't looked at these closely yet (plan on doing so to-day), but > > even m_collapse() looked like it copied data between mbufs and that > > is certainly suboptimal, imho. I don't see why a driver can't split > > the mbuf list, if there are too many entries for the scatter/gather > > and do it in two iterations (much like tcp_output() does already, > > since the data length exceeds 65535 - tcp/ip header size). 
> > > > It can split the mbuf list if controllers supports increased number > of TX buffers. Because controller shall consume the same number of > DMA descriptors for the mbuf list, drivers tend to impose a limit > on the number of TX buffers to save resources. > > > However, at this point, I just want to find out if the long chain > > of mbufs is why TSO is problematic for these drivers, since I'll > > admit I'm getting tired of telling people to disable TSO (and I > > suspect some don't believe me and never try it). > > > > TSO capable controllers tend to have various limitations(the first > TX buffer should have complete ethernet/IP/TCP header, ip_len of IP > header should be reset to 0, TCP pseudo checksum should be > recomputed etc) and cheap controllers need more assistance from > driver to let its firmware know various IP/TCP header offset > location in the mbuf. Because this requires a IP/TCP header > parsing, it's error prone and very complex. > > > > > Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE > > > > clusters, > > > > so the mbuf count drops from 34 to 18. > > > > > > > > > > Could we make it conditional on size? > > > > > Not sure what you mean? If you mean "the size of the read/write", > > that would be possible for NFSv3, but less so for NFSv4. (The read/write > > is just one Op. in the compound for NFSv4 and there is no way to > > predict how much more data is going to be generated by subsequent Ops.) > > > > Sorry, I should have been more clearer. You already answered my > question. Thanks. > > > If by "size" you mean amount of memory in the machine then, yes, it > > certainly could be conditional on that. (I plan to try and look at > > the allocator to-day as well, but if others know of disadvantages with > > using MJUMPAGESIZE instead of MCLBYTES, please speak up.) 
> > Garrett Wollman already alluded to the MCLBYTES case being
> > pre-allocated, but I'll admit I have no idea what the implications
> > of that are at this time.
> >
> > > > If anyone has a TSO scatter/gather enabled net interface and can
> > > > test this patch on it with NFS I/O (default of 64K rsize/wsize)
> > > > when TSO is enabled and see what effect it has, that would be
> > > > appreciated.
> > > >
> > > > Btw, thanks go to Garrett Wollman for suggesting the change to
> > > > MJUMPAGESIZE clusters.
> > > >
> > > > rick
> > > > ps: If the attachment doesn't make it through and you want the
> > > > patch, just email me and I'll send you a copy.

From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 01:33:01 2014
Date: Mon, 27 Jan 2014 20:32:59 -0500 (EST)
From: Rick Macklem
To: John-Mark Gurney
Cc: freebsd-net@freebsd.org, Adam McDougall
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID: <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca>
In-Reply-To: <20140128002826.GU13704@funkthat.com>

John-Mark Gurney wrote:
> Rick Macklem wrote this message on Mon, Jan 27, 2014 at 18:47 -0500:
> > John-Mark Gurney wrote:
> > > Rick Macklem wrote this message on Sun, Jan 26, 2014 at 21:16
> > > -0500:
> > > > Btw, thanks go to Garrett Wollman for suggesting the change to
> > > > MJUMPAGESIZE clusters.
> > > >
> > > > rick
> > > > ps: If the attachment doesn't make it through and you want the
> > > > patch, just email me and I'll send you a copy.
> > >
> > > The patch looks good, but we probably shouldn't change _readlink..
> > > The chances of a link being >2k are pretty slim, and the chances
> > > of the link being >32k are even smaller...
> >
> > Yea, I already thought of that, actually. However, see below w.r.t.
> > NFSv4.
> >
> > However, at this point I mostly want to find out if it the long
> > mbuf chain that causes problems for TSO enabled network interfaces.
>
> I agree, though a long mbuf chain is more of a driver issue than an
> NFS issue...
>
Yes, if my hunch is correct, it is. If my hunch gets verified, I will
be posting w.r.t. how best to deal with the problem.
I suspect a patch like this one might serve as a useful work-around while the drivers gets fixed, if the hunch is correct. > > > In fact, we might want to switch _readlink to MGET (could be > > > conditional > > > upon cnt) so that if it fits in an mbuf we don't allocate a > > > cluster > > > for > > > it... > > > > > For NFSv4, what was an RPC for NFSv3 becomes one of several Ops. in > > a compound RPC. As such, there is no way to know how much > > additional > > RPC message there will be. So, although the readlink reply won't > > use > > much of the 4K allocation, replies for subsequent Ops. in the > > compound > > certainly could. (Is it more efficient to allocate 4K now and use > > part of it for subsequent message reply stuff or allocate > > additional > > mbuf clusters later for subsequent stuff, as required? On a small > > memory constrained machine, I suspect the latter is correct, but > > for > > the kind of hardware that has TSO scatter/gather enabled network > > interfaces, I'm not so sure. At this point, I wouldn't even say > > that using 4K clusters is going to be a win and my hunch is that > > any win wouldn't apply to small memory constrained machines.) > > Though the code that was patched wasn't using any partial buffers, > it was always allocating a new buffer... If the code in > _read/_readlinks starts using a previous mbuf chain, then obviously > things are different and I'd agree, always allocating a 2k/4k > cluster makes sense... > Yes, but nd_mb and nd_bpos are set, which means subsequent replies can use the remainder of the cluster. Why does it always allocate a new cluster? Well, because the code is OLD. It was written for OpenBSD2.6 and, at that time, I tried to make it portable across the BSDen. I'm not so concerned w.r.t. its portability now, since no one else is porting it and I don't plan to, but I still think it would be nice if it were portable to other BSDen. 
Back when I wrote it, I believe that MCLBYTES was 1K and an entire cluster was needed. (To be honest, I found out that FreeBSD's NCLBYTES is 2K about 2 days ago, when I started looking at this stuff.) Could it now look to see if enough bytes (a little over 1K) were available in the current cluster and use that. Yes, but it would reduce the portability of the code and I don't think it would make a measurable difference performance wise. > > My test server has 256Mbytes of ram and it certainly doesn't show > > any improvement (big surprise;-), but it also doesn't show any > > degradation for the limited testing I've done. > > I'm not too surprised, unless you're on a heavy server pushing > >200MB/sec, the allocation cost is probably cheap enough that it > doesn't show up... going to 4k means immediately half as many mbufs > are needed/allocated, and as they are page sized, don't have the > problems of physical memory fragmentation, nor do they have to do an > IPI/tlb shoot down in the case of multipage allocations... (I'm > dealing w/ this for geli.) > Yes, Garrett Wollman proposed this and I suspect there might be a performance gain for larger systems. He has a more involved patch. To be honest, if Garrett is convinced that his patch is of benefit performance wise, I will do a separate posting w.r.t. it and whether or not it is appropriate to be committed to head, etc. > > Again, my main interest at this point is whether reducing the > > number of mbufs in the chain fixes the TSO issues. I think > > the question of whether or not 4K clusters are performance > > improvement in general, is an interesting one that comes later. > > Another thing I noticed is that we are getting an mbuf and then > allocating a cluster... Is there a reason we aren't using something > like m_getm or m_getcl? We have a special uma zone that has > mbuf and mbuf cluster already paired meaning we save some lock > operations for each segment allocated... > See above w.r.t. OLD portable code. 
There was a time when MGETCL() wasn't guaranteed to succeed even when
M_WAITOK is specified. This is also why there is that weird loop in
the NFSMCLGET() macro. (I think there was a time in FreeBSD's past
when allocation was never guaranteed and the rest of the code doesn't
tolerate a NULL mbuf ptr. Something like M_TRYWAIT in old versions of
FreeBSD?)

Btw, Garrett Wollman's patch uses m_getm2() to get the mbuf list.

rick

> --
> John-Mark Gurney    Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."

From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 01:46:23 2014
Date: Mon, 27 Jan 2014 20:46:21 -0500 (EST)
From: Rick Macklem
To: pyunyh@gmail.com
Cc: Daniel Braniss, freebsd-net@freebsd.org, Adam McDougall
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID: <944293786.17288188.1390873581393.JavaMail.root@uoguelph.ca>
In-Reply-To: <20140128005818.GB2722@michelle.cdnetworks.com>

pyunyh@gmail.com wrote:
> On Mon, Jan 27, 2014 at 06:27:19PM -0500, Rick Macklem wrote:
> > pyunyh@gmail.com wrote:
> > > On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote:
> > > > Adam McDougall wrote:
> > > > > Also try rsize=32768,wsize=32768 in your mount options, made a
> > > > > huge difference for me. I've noticed slow file transfers on
> > > > > NFS in 9 and finally did some searching a couple months ago,
> > > > > someone suggested it and they were on to something.
> > > >
> > > > I have a "hunch" that might explain why 64K NFS reads/writes
> > > > perform poorly for some network environments.
> > > > A 64K NFS read reply/write request consists of a list of 34
> > > > mbufs when passed to TCP via sosend() and a total data length of
> > > > around 65680 bytes.
> > > > Looking at a couple of drivers (virtio and ixgbe), they seem to
> > > > expect no more than 32-33 mbufs in a list for a 65535 byte TSO
> > > > xmit. I think (I don't have anything that does TSO to confirm
> > > > this) that NFS will pass a list that is longer (34 plus a TCP/IP
> > > > header).
> > > > At a glance, it appears that the drivers call m_defrag() or > > > > m_collapse() > > > > when the mbuf list won't fit in their scatter table (32 or 33 > > > > elements) > > > > and if this fails, just silently drop the data without sending > > > > it. > > > > If I'm right, there would considerable overhead from > > > > m_defrag()/m_collapse() > > > > and near disaster if they fail to fix the problem and the data > > > > is > > > > silently > > > > dropped instead of xmited. > > > > > > > > > > I think the actual number of DMA segments allocated for the mbuf > > > chain is determined by bus_dma(9). bus_dma(9) will coalesce > > > current segment with previous segment if possible. > > > Btw, I looked at ixgbe.c and it uses bus_dmamap_load_mbuf_sg(), which seems to used the fixed size scatter/gather list provided as an argument. > > Ok, I'll have to take a look, but I thought that an array of sized > > by "num_segs" is passed in as an argument. (And num_segs is set to > > either IXGBE_82598_SCATTER (100) or IXGBE_82599_SCATTER (32).) > > It looked to me that the ixgbe driver called itself ix, so it isn't > > obvious to me which we are talking about. (I know that Daniel > > Braniss > > had an ix0 and ix1, which were fixed for NFS by disabling TSO.) > > > > It's ix(4). ixbge(4) is a different driver. > Ok, well I was looking at ixgbe.c and that one seems like it might have the problem, for the 82599 case. > > I'll admit I mostly looked at virtio's network driver, since that > > was the one being used by J David. > > > > Problems w.r.t. TSO enabled for NFS using 64K rsize/wsize have been > > cropping up for quite a while, and I am just trying to find out > > why. > > (I have no hardware/software that exhibits the problem, so I can > > only look at the sources and ask others to try testing stuff.) 
> > > > > I'm not sure whether you're referring to ixgbe(4) or ix(4) but I > > > see the total length of all segment sizes for ix(4) is 65535 so > > > it has no room for the ethernet/VLAN header of the mbuf chain. The > > > driver should be fixed to transmit a 64KB datagram. > > Well, if_hw_tsomax is set to 65535 by the generic code (the driver > > doesn't set it) and the code in tcp_output() seems to subtract the > > size of a tcp/ip header from that before passing data to the > > driver, > > so I think the mbuf chain passed to the driver will fit in one > > ip datagram. (I'd assume all sorts of stuff would break for TSO > > enabled drivers if that wasn't the case?) > > I believe the generic code is doing it right. I'm under the > impression the non-working TSO indicates a bug in the driver. Some > drivers didn't account for the additional ethernet/VLAN header so the > total size of DMA segments exceeded 65535. I've attached a diff > for ix(4). It wasn't tested at all as I don't have hardware to > test. > I agree that if my hunch is correct, the drivers aren't correct. But since the problem seems to have shown up a lot and it is always reported as an NFS issue, I really want to get to the bottom of it. And, if changing to 4K clusters is a useful work-around for any breakage in the drivers, then that might be useful. If the problem isn't the number of mbufs in the mbuf chain, then changing to 4K clusters won't have any effect, since the total data length in the chain remains the same. That will tell us that the problem is something else. > > > > > I think the use of m_defrag(9) in TSO is suboptimal. All TSO > > > capable controllers are able to handle multiple TX buffers so it > > > should have used m_collapse(9) rather than copying the entire chain > > > with m_defrag(9). > > > > > I haven't looked at these closely yet (plan on doing so to-day), > > but > > even m_collapse() looked like it copied data between mbufs and that > > is certainly suboptimal, imho.
I don't see why a driver can't split > > the mbuf list, if there are too many entries for the scatter/gather > > and do it in two iterations (much like tcp_output() does already, > > since the data length exceeds 65535 - tcp/ip header size). > > > > It can split the mbuf list if the controller supports an increased number > of TX buffers. Because the controller will consume the same number of > DMA descriptors for the mbuf list, drivers tend to impose a limit > on the number of TX buffers to save resources. > > > However, at this point, I just want to find out if the long chain > > of mbufs is why TSO is problematic for these drivers, since I'll > > admit I'm getting tired of telling people to disable TSO (and I > > suspect some don't believe me and never try it). > > > > TSO capable controllers tend to have various limitations (the first > TX buffer should have the complete ethernet/IP/TCP header, ip_len of the IP > header should be reset to 0, the TCP pseudo checksum should be > recomputed, etc.) and cheap controllers need more assistance from the > driver to let their firmware know the various IP/TCP header offset > locations in the mbuf. Because this requires IP/TCP header > parsing, it's error prone and very complex. > > > > > Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE > > > > clusters, > > > > so the mbuf count drops from 34 to 18. > > > > > > > > > > Could we make it conditional on size? > > > > > Not sure what you mean? If you mean "the size of the read/write", > > that would be possible for NFSv3, but less so for NFSv4. (The > > read/write > > is just one Op. in the compound for NFSv4 and there is no way to > > predict how much more data is going to be generated by subsequent > > Ops.) > > > > Sorry, I should have been clearer. You already answered my > question. Thanks. > > > If by "size" you mean amount of memory in the machine then, yes, it > > certainly could be conditional on that.
(I plan to try and look at > > the allocator to-day as well, but if others know of disadvantages > > with > > using MJUMPAGESIZE instead of MCLBYTES, please speak up.) > > > > Garrett Wollman already alluded to the MCLBYTES case being > > pre-allocated, > > but I'll admit I have no idea what the implications of that are at > > this > > time. > > > > > > If anyone has a TSO scatter/gather enabled net interface and > > > > can > > > > test this > > > > patch on it with NFS I/O (default of 64K rsize/wsize) when TSO > > > > is > > > > enabled > > > > and see what effect it has, that would be appreciated. > > > > > > > > Btw, thanks go to Garrett Wollman for suggesting the change to > > > > MJUMPAGESIZE > > > > clusters. > > > > > > > > rick > > > > ps: If the attachment doesn't make it through and you want the > > > > patch, just > > > > email me and I'll send you a copy. > > > > > From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 01:51:14 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8F09C74C for ; Tue, 28 Jan 2014 01:51:14 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 2485217BE for ; Tue, 28 Jan 2014 01:51:13 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: X-IronPort-AV: E=Sophos;i="4.95,732,1384318800"; d="scan'208";a="91495648" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 27 Jan 2014 20:51:12 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3AC49B4032; Mon, 27 Jan 2014 20:51:12 -0500 (EST) Date: Mon, 27 Jan 2014 20:51:12 -0500 (EST) From: Rick Macklem To: Jack Vogel 
Message-ID: <482557096.17290094.1390873872231.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Daniel Braniss , FreeBSD Net , Adam McDougall , Pyun YongHyeon X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 01:51:14 -0000 Jack Vogel wrote: > That header file is for the VF driver :) which I don't believe is > being > used in this case. > The driver is capable of handling 256K but its limited by the stack > to 64K > (look in > ixgbe.h), so its not a few bytes off due to the vlan header. > > The scatter size is not an arbitrary one, its due to hardware > limitations > in Niantic > (82599). Turning off TSO in the 10G environment is not practical, > you will > have > trouble getting good performance. > > Jack > Well, if you look at this thread, Daniel got much better performance by turning off TSO. However, I agree that this is not an ideal solution. http://docs.FreeBSD.org/cgi/mid.cgi?2C287272-7B57-4AAD-B22F-6A65D9F8677B rick > > > On Mon, Jan 27, 2014 at 4:58 PM, Yonghyeon PYUN > wrote: > > > On Mon, Jan 27, 2014 at 06:27:19PM -0500, Rick Macklem wrote: > > > pyunyh@gmail.com wrote: > > > > On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote: > > > > > Adam McDougall wrote: > > > > > > Also try rsize=32768,wsize=32768 in your mount options, > > > > > > made a > > > > > > huge > > > > > > difference for me. 
I've noticed slow file transfers on NFS > > > > > > in 9 > > > > > > and > > > > > > finally did some searching a couple months ago, someone > > > > > > suggested > > > > > > it > > > > > > and > > > > > > they were on to something. > > > > > > > > > > > I have a "hunch" that might explain why 64K NFS reads/writes > > > > > perform > > > > > poorly for some network environments. > > > > > A 64K NFS read reply/write request consists of a list of 34 > > > > > mbufs > > > > > when > > > > > passed to TCP via sosend() and a total data length of around > > > > > 65680 bytes. > > > > > Looking at a couple of drivers (virtio and ixgbe), they seem > > > > > to > > > > > expect > > > > > no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. > > > > > I > > > > > think > > > > > (I don't have anything that does TSO to confirm this) that > > > > > NFS will > > > > > pass > > > > > a list that is longer (34 plus a TCP/IP header). > > > > > At a glance, it appears that the drivers call m_defrag() or > > > > > m_collapse() > > > > > when the mbuf list won't fit in their scatter table (32 or 33 > > > > > elements) > > > > > and if this fails, just silently drop the data without > > > > > sending it. > > > > > If I'm right, there would be considerable overhead from > > > > > m_defrag()/m_collapse() > > > > > and near disaster if they fail to fix the problem and the > > > > > data is > > > > > silently > > > > > dropped instead of xmited. > > > > > > > > > > > > > I think the actual number of DMA segments allocated for the > > > > mbuf > > > > chain is determined by bus_dma(9). bus_dma(9) will coalesce > > > > current segment with previous segment if possible. > > > > > > > Ok, I'll have to take a look, but I thought that an array > > > sized > > > by "num_segs" is passed in as an argument. (And num_segs is set > > > to > > > either IXGBE_82598_SCATTER (100) or IXGBE_82599_SCATTER (32).)
> > > It looked to me that the ixgbe driver called itself ix, so it > > > isn't > > > obvious to me which we are talking about. (I know that Daniel > > > Braniss > > > had an ix0 and ix1, which were fixed for NFS by disabling TSO.) > > > > > > It's ix(4). ixgb(4) is a different driver. > > > > > I'll admit I mostly looked at virtio's network driver, since that > > > was the one being used by J David. > > > > > > Problems w.r.t. TSO enabled for NFS using 64K rsize/wsize have > > > been > > > cropping up for quite a while, and I am just trying to find out > > > why. > > > (I have no hardware/software that exhibits the problem, so I can > > > only look at the sources and ask others to try testing stuff.) > > > > > > > I'm not sure whether you're referring to ixgbe(4) or ix(4) but > > > > I > > > > see the total length of all segment sizes for ix(4) is 65535 so > > > > it has no room for the ethernet/VLAN header of the mbuf chain. The > > > > driver should be fixed to transmit a 64KB datagram. > > > Well, if_hw_tsomax is set to 65535 by the generic code (the > > > driver > > > doesn't set it) and the code in tcp_output() seems to subtract > > > the > > > size of a tcp/ip header from that before passing data to the > > > driver, > > > so I think the mbuf chain passed to the driver will fit in one > > > ip datagram. (I'd assume all sorts of stuff would break for TSO > > > enabled drivers if that wasn't the case?) > > > > I believe the generic code is doing it right. I'm under the > > impression the non-working TSO indicates a bug in the driver. Some > > drivers didn't account for the additional ethernet/VLAN header so the > > total size of DMA segments exceeded 65535. I've attached a diff > > for ix(4). It wasn't tested at all as I don't have hardware to > > test. > > > > > > > > > I think the use of m_defrag(9) in TSO is suboptimal.
All TSO > > > > capable controllers are able to handle multiple TX buffers so > > > > it > > > > should have used m_collapse(9) rather than copying the entire chain > > > > with m_defrag(9). > > > > > > > I haven't looked at these closely yet (plan on doing so to-day), > > > but > > > even m_collapse() looked like it copied data between mbufs and > > > that > > > is certainly suboptimal, imho. I don't see why a driver can't > > > split > > > the mbuf list, if there are too many entries for the > > > scatter/gather > > > and do it in two iterations (much like tcp_output() does already, > > > since the data length exceeds 65535 - tcp/ip header size). > > > > > > > It can split the mbuf list if the controller supports an increased number > > of TX buffers. Because the controller will consume the same number of > > DMA descriptors for the mbuf list, drivers tend to impose a limit > > on the number of TX buffers to save resources. > > > > > However, at this point, I just want to find out if the long chain > > > of mbufs is why TSO is problematic for these drivers, since I'll > > > admit I'm getting tired of telling people to disable TSO (and I > > > suspect some don't believe me and never try it). > > > > > > > TSO capable controllers tend to have various limitations (the first > > TX buffer should have the complete ethernet/IP/TCP header, ip_len of the IP > > header should be reset to 0, the TCP pseudo checksum should be > > recomputed, etc.) and cheap controllers need more assistance from the > > driver to let their firmware know the various IP/TCP header offset > > locations in the mbuf. Because this requires IP/TCP header > > parsing, it's error prone and very complex. > > > > > > > Anyhow, I have attached a patch that makes NFS use > > > > > MJUMPAGESIZE > > > > > clusters, > > > > > so the mbuf count drops from 34 to 18. > > > > > > > > > > Could we make it conditional on size? > > > > > > > Not sure what you mean?
If you mean "the size of the read/write", > > > that would be possible for NFSv3, but less so for NFSv4. (The > > > read/write > > > is just one Op. in the compound for NFSv4 and there is no way to > > > predict how much more data is going to be generated by subsequent > > > Ops.) > > > > > > > Sorry, I should have been clearer. You already answered my > > question. Thanks. > > > > > If by "size" you mean amount of memory in the machine then, yes, > > > it > > > certainly could be conditional on that. (I plan to try and look > > > at > > > the allocator to-day as well, but if others know of disadvantages > > > with > > > using MJUMPAGESIZE instead of MCLBYTES, please speak up.) > > > > > > Garrett Wollman already alluded to the MCLBYTES case being > > > pre-allocated, > > > but I'll admit I have no idea what the implications of that are > > > at this > > > time. > > > > > > > > If anyone has a TSO scatter/gather enabled net interface and > > > > > can > > > > > test this > > > > > patch on it with NFS I/O (default of 64K rsize/wsize) when > > > > > TSO is > > > > > enabled > > > > > and see what effect it has, that would be appreciated. > > > > > > > > > > Btw, thanks go to Garrett Wollman for suggesting the change > > > > > to > > > > > MJUMPAGESIZE > > > > > clusters. > > > > > > > > > > rick > > > > > ps: If the attachment doesn't make it through and you want > > > > > the > > > > > patch, just > > > > > email me and I'll send you a copy.
> > > > > > > > > _______________________________________________ > > freebsd-net@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-net > > To unsubscribe, send any mail to > > "freebsd-net-unsubscribe@freebsd.org" > > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 02:14:55 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2A16EA9F for ; Tue, 28 Jan 2014 02:14:55 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id F32271953 for ; Tue, 28 Jan 2014 02:14:54 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s0S2Ep3C065392 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 27 Jan 2014 18:14:51 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s0S2EobG065391; Mon, 27 Jan 2014 18:14:50 -0800 (PST) (envelope-from jmg) Date: Mon, 27 Jan 2014 18:14:50 -0800 From: John-Mark Gurney To: Rick Macklem Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
Message-ID: <20140128021450.GY13704@funkthat.com> Mail-Followup-To: Rick Macklem , freebsd-net@freebsd.org, Adam McDougall References: <20140128002826.GU13704@funkthat.com> <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Mon, 27 Jan 2014 18:14:51 -0800 (PST) Cc: freebsd-net@freebsd.org, Adam McDougall X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 02:14:55 -0000 Rick Macklem wrote this message on Mon, Jan 27, 2014 at 20:32 -0500: > John-Mark Gurney wrote: > > Rick Macklem wrote this message on Mon, Jan 27, 2014 at 18:47 -0500: > > > John-Mark Gurney wrote: > > > > Rick Macklem wrote this message on Sun, Jan 26, 2014 at 21:16 > > > > -0500: > > > > > Btw, thanks go to Garrett Wollman for suggesting the change to > > > > > MJUMPAGESIZE > > > > > clusters. > > > > > > > > > > rick > > > > > ps: If the attachment doesn't make it through and you want the > > > > > patch, just > > > > > email me and I'll send you a copy. > > > > > > > > The patch looks good, but we probably shouldn't change > > > > _readlink.. > > > > The chances of a link being >2k are pretty slim, and the chances > > > > of > > > > the link being >32k are even smaller... 
> > > > > > > Yea, I already thought of that, actually. However, see below w.r.t. > > > NFSv4. > > > > > > However, at this point I > > > mostly want to find out if it the long mbuf chain that causes > > > problems > > > for TSO enabled network interfaces. > > > > I agree, though a long mbuf chain is more of a driver issue than an > > NFS issue... > > > Yes, if my hunch is correct, it is. If my hunch gets verified, I will > be posting w.r.t. how best to deal with the problem. I suspect a patch > like this one might serve as a useful work-around while the drivers > gets fixed, if the hunch is correct. It would be nice to have a way to force such a segment to go out to the drivers to make debugging/testing drivers easier... I'm not sure the best way to handle that though... > > > > In fact, we might want to switch _readlink to MGET (could be > > > > conditional > > > > upon cnt) so that if it fits in an mbuf we don't allocate a > > > > cluster > > > > for > > > > it... > > > > > > > For NFSv4, what was an RPC for NFSv3 becomes one of several Ops. in > > > a compound RPC. As such, there is no way to know how much > > > additional > > > RPC message there will be. So, although the readlink reply won't > > > use > > > much of the 4K allocation, replies for subsequent Ops. in the > > > compound > > > certainly could. (Is it more efficient to allocate 4K now and use > > > part of it for subsequent message reply stuff or allocate > > > additional > > > mbuf clusters later for subsequent stuff, as required? On a small > > > memory constrained machine, I suspect the latter is correct, but > > > for > > > the kind of hardware that has TSO scatter/gather enabled network > > > interfaces, I'm not so sure. At this point, I wouldn't even say > > > that using 4K clusters is going to be a win and my hunch is that > > > any win wouldn't apply to small memory constrained machines.) 
> > Though the code that was patched wasn't using any partial buffers, > > it was always allocating a new buffer... If the code in > > _read/_readlinks starts using a previous mbuf chain, then obviously > > things are different and I'd agree, always allocating a 2k/4k > > cluster makes sense... > > > Yes, but nd_mb and nd_bpos are set, which means subsequent replies can > use the remainder of the cluster. Couldn't we scan the list of replies, find out how much data we need, m_getm the space for it all (which will use 4k clusters as necessary)? > Why does it always allocate a new cluster? Well, because the code is > OLD. It was written for OpenBSD 2.6 and, at that time, I tried to make > it portable across the BSDen. I'm not so concerned w.r.t. its portability > now, since no one else is porting it and I don't plan to, but I still > think it would be nice if it were portable to other BSDen. > Back when I wrote it, I believe that MCLBYTES was 1K and an entire > cluster was needed. (To be honest, I found out that FreeBSD's MCLBYTES > is 2K about 2 days ago, when I started looking at this stuff.) > > Could it now look to see if enough bytes (a little over 1K) were available > in the current cluster and use that. Yes, but it would reduce the portability > of the code and I don't think it would make a measurable difference performance > wise. Are you sure it would reduce the portability? I can't think of a way it would... Some code will always need to be written for portability.. > > > My test server has 256Mbytes of ram and it certainly doesn't show > > > any improvement (big surprise;-), but it also doesn't show any > > > degradation for the limited testing I've done. > > I'm not too surprised, unless you're on a heavy server pushing > > >200MB/sec, the allocation cost is probably cheap enough that it > > doesn't show up...
going to 4k means immediately half as many mbufs > > are needed/allocated, and as they are page sized, don't have the > > problems of physical memory fragmentation, nor do they have to do an > > IPI/tlb shoot down in the case of multipage allocations... (I'm > > dealing w/ this for geli.) > > > Yes, Garrett Wollman proposed this and I suspect there might be a > performance gain for larger systems. He has a more involved patch. > To be honest, if Garrett is convinced that his patch is of benefit > performance wise, I will do a separate posting w.r.t. it and whether > or not it is appropriate to be committed to head, etc. > > > > Again, my main interest at this point is whether reducing the > > > number of mbufs in the chain fixes the TSO issues. I think > > > the question of whether or not 4K clusters are a performance > > > improvement in general, is an interesting one that comes later. > > > > Another thing I noticed is that we are getting an mbuf and then > > allocating a cluster... Is there a reason we aren't using something > > like m_getm or m_getcl? We have a special uma zone that has > > mbuf and mbuf cluster already paired meaning we save some lock > > operations for each segment allocated... > > > See above w.r.t. OLD portable code. There was a time when MCLGET() > wasn't guaranteed to succeed even when M_WAITOK is specified. > This is also why there is that weird loop in the NFSMCLGET() macro. Correct, but as you wrapped them in NFS* macros, it doesn't mean you can't merge the MCLGET w/ NFSMCLGET into a new function that merges the two... It's just another (not too difficult) wrapper that the porter has to write... Though apparently portability has been given up since you use MCLGET directly in nfsserver/nfs_nfsdport.c instead of NFSMCLGET... Sounds like nfsport.h needs some updating.... > (I think there was a time in FreeBSD's past when allocation was never > guaranteed and the rest of the code doesn't tolerate a NULL mbuf ptr.
> Something like M_TRYWAIT in old versions of FreeBSD?) Correct, there was a time that M_WAITOK could still return, but it was many years ago and many releases ago... > Btw, Garrett Wollman's patch uses m_getm2() to get the mbuf list. Interestingly, m_getm2 will use 4k clusters as necessary, and in the _readlink case, do the correct thing... Hmmm... m_getm2 isn't documented... It was added by andre almost 7 years ago... It does appear to be a public interface as ofed, sctp iscsi and ng(_tty) all use it, though only sctp appears to use it any differently than m_getm.. The rest could simply use m_getm instead of m_getm2... Considering it was committed the day before SCTP was committed, I'm not too surprised... P.S. if someone wants to submit a patch to mbuf.9 to update the docs that would be helpful... I'll review and commit... and m_append is also undocumented... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 04:27:33 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 70081213 for ; Tue, 28 Jan 2014 04:27:33 +0000 (UTC) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id EE7C914CC for ; Tue, 28 Jan 2014 04:27:32 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s0S4RUAj077762; Mon, 27 Jan 2014 23:27:30 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.7/8.14.4/Submit) id s0S4RTVn077761; Mon, 27 Jan 2014 23:27:29 -0500 
(EST) (envelope-from wollman) Date: Mon, 27 Jan 2014 23:27:29 -0500 (EST) Message-Id: <201401280427.s0S4RTVn077761@hergotha.csail.mit.edu> From: wollman@freebsd.org To: rmacklem@uoguelph.ca Subject: Re: Terrible NFS performance under 9.2-RELEASE? In-Reply-To: <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca> References: <20140128002826.GU13704@funkthat.com> Organization: none X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (hergotha.csail.mit.edu [127.0.0.1]); Mon, 27 Jan 2014 23:27:30 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 04:27:33 -0000 In article <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca>, Rick Macklem writes: >Btw, Garrett Wollman's patch uses m_getm2() to get the mbuf list. I do two things in my version that should provide an improvement. The first is, as you say, using m_getm2() to allocate a list of mbufs. The second is to use a fixed-size iovec array and a special-purpose UMA zone to allocate the iovec and a preinitialized uio as a single allocation. I haven't tested this approach at all (not even compilation testing), so I don't know whether it will work or not, and I don't know if it actually provides the sort of performance improvement I expect. The real big improvement, which I have not tried to implement, would be to use physical pages (via sfbufs) by sharing the inner loop of sendfile(2). Since I use ZFS as my backing filesystem, I'm not sure this would have any benefit for me, but it should be a measurable improvement for UFS-backed NFS servers. My patch follows. 
Note that I haven't even compile-tested it yet, and there is likely to be some fuzz if you apply it to stock kernel sources. -GAWollman

--- nfs_nfsdport.c.orig	2014-01-26 23:38:58.296234939 -0500
+++ nfs_nfsdport.c	2014-01-26 23:46:17.901236792 -0500
@@ -50,6 +50,14 @@
 FEATURE(nfsd, "NFSv4 server");
 
+#define NFS_NIOVEC (NFS_SRVMAXDATA / MCLBYTES + 2)
+struct nfsd_iovec {
+	struct uio nfsiov_uio;
+	struct iovec nfsiov_iov[NFS_NIOVEC];
+};
+static struct uma_zone *nfsd_iovec_zone;
+static void nfsd_iovec_construct(struct uio **, struct mbuf **, struct mbuf **,
+    int);
 extern u_int32_t newnfs_true, newnfs_false, newnfs_xdrneg1;
 extern int nfsrv_useacl;
 extern int newnfs_numnfsd;
@@ -626,7 +634,7 @@
 	struct iovec *iv2;
 	int error = 0, len, left, siz, tlen, ioflag = 0;
 	struct mbuf *m2 = NULL, *m3;
-	struct uio io, *uiop = &io;
+	struct uio *uiop;
 	struct nfsheur *nh;
 
 	len = left = NFSM_RNDUP(cnt);
@@ -634,49 +642,11 @@
 	/*
 	 * Generate the mbuf list with the uio_iov ref. to it.
 	 */
-	i = 0;
-	while (left > 0) {
-		NFSMGET(m);
-		MCLGET(m, M_WAIT);
-		m->m_len = 0;
-		siz = min(M_TRAILINGSPACE(m), left);
-		left -= siz;
-		i++;
-		if (m3)
-			m2->m_next = m;
-		else
-			m3 = m;
-		m2 = m;
-	}
-	MALLOC(iv, struct iovec *, i * sizeof (struct iovec),
-	    M_TEMP, M_WAITOK);
-	uiop->uio_iov = iv2 = iv;
-	m = m3;
-	left = len;
-	i = 0;
-	while (left > 0) {
-		if (m == NULL)
-			panic("nfsvno_read iov");
-		siz = min(M_TRAILINGSPACE(m), left);
-		if (siz > 0) {
-			iv->iov_base = mtod(m, caddr_t) + m->m_len;
-			iv->iov_len = siz;
-			m->m_len += siz;
-			left -= siz;
-			iv++;
-			i++;
-		}
-		m = m->m_next;
-	}
-	uiop->uio_iovcnt = i;
+	nfsd_iovec_construct(&uiop, &m3, &m2, len);
 	uiop->uio_offset = off;
-	uiop->uio_resid = len;
-	uiop->uio_rw = UIO_READ;
-	uiop->uio_segflg = UIO_SYSSPACE;
 	nh = nfsrv_sequential_heuristic(uiop, vp);
 	ioflag |= nh->nh_seqcount << IO_SEQSHIFT;
 	error = VOP_READ(vp, uiop, IO_NODELOCKED | ioflag, cred);
-	FREE((caddr_t)iv2, M_TEMP);
 	if (error) {
 		m_freem(m3);
 		*mpp = NULL;
@@ -695,6 +665,7 @@
 	*mpendp = m2;
 out:
+	uma_zfree(nfsd_iovec_zone, uiop);	/* now safe to free */
 	NFSEXITCODE(error);
 	return (error);
 }
@@ -3284,6 +3255,74 @@
 	}
 }
 
+/*
+ * UMA initializer for nfsd_iovec objects.
+ */
+static int
+nfsd_iovec_init(void *mem, int size, int flags)
+{
+	struct nfsd_iovec *nfsiov = mem;
+	struct uio *uio = &nfsiov->nfsiov_uio;
+
+	KASSERT(size == sizeof(struct nfsd_iovec),
+	    ("nfsd_iovec_init: bad size"));
+	uio->uio_iov = nfsiov->nfsiov_iov;
+	uio->uio_iovcnt = 0;
+	/* don't care about state of uio_offset */
+	uio->uio_resid = 0;
+	uio->uio_segflg = UIO_SYSSPACE;
+	uio->uio_rw = UIO_READ;
+	uio->uio_td = NULL;
+	return (0);
+}
+
+/*
+ * The destructor doesn't need to do anything different from the
+ * initializer.
+ */
+static void
+nfsd_iovec_dtor(void *mem, int size, void *arg)
+{
+	(void)nfsd_iovec_init(mem, size, 0);
+}
+
+static void
+nfsd_iovec_construct(struct uio **uiop, struct mbuf **mp, struct mbuf **tailp,
+    int left)
+{
+	struct nfsd_iovec *nfsiov;
+	struct iovec *iov;
+	struct mbuf *m, *m2;
+	struct uio *uio;
+	int siz;
+
+	/* uma_zalloc is guaranteed to succeed or deadlock with M_WAITOK */
+	nfsiov = uma_zalloc(nfsd_iovec_zone, M_WAITOK);
+	*uiop = uio = &nfsiov->nfsiov_uio;
+	for (;;) {
+		m = m_getm2(NULL, left, M_WAITOK, MT_DATA, 0);
+		if (m != NULL)	/* should always be taken with M_WAITOK */
+			break;
+		nfs_catnap(PZERO, 0, "nfsiovec");
+	}
+	*mp = m;
+	uio->uio_resid = left;
+	iov = uio->uio_iov;
+
+	while (m != NULL && left > 0) {
+		if (++uio->uio_iovcnt > NFS_NIOVEC)
+			panic("nfsd_iovec_construct: mbuf chain exceeded size");
+		iov->iov_base = mtod(m, char *);
+		m->m_len = iov->iov_len = siz = min(M_TRAILINGSPACE(m), left);
+		left -= siz;
+		iov++;
+		if ((m2 = m->m_next) == NULL && tailp != NULL)	/* last one? */
+			*tailp = m;
+		m = m2;
+	}
+}
+
 extern int (*nfsd_call_nfsd)(struct thread *, struct nfssvc_args *);
 
 /*
@@ -3319,6 +3358,10 @@
 	vn_deleg_ops.vndeleg_recall = nfsd_recalldelegation;
 	vn_deleg_ops.vndeleg_disable = nfsd_disabledelegation;
 #endif
+	nfsd_iovec_zone = uma_zcreate("nfsd iovec",
+	    sizeof(struct nfsd_iovec), NULL /* ctor */,
+	    nfsd_iovec_dtor, nfsd_iovec_init, NULL /* fini */,
+	    sizeof(void *) - 1 /* alignment mask */, 0 /* flags */);
 	nfsd_call_servertimer = nfsrv_servertimer;
 	nfsd_call_nfsd = nfssvc_nfsd;
 	loaded = 1;
@@ -3347,6 +3390,9 @@
 	if (nfsrvd_pool != NULL)
 		svcpool_destroy(nfsrvd_pool);
 
+	/* Release memory in the iovec zone */
+	uma_zdestroy(nfsd_iovec_zone);
+
 	/* and get rid of the locks */
 	for (i = 0; i < NFSRVCACHE_HASHSIZE; i++)
 		mtx_destroy(&nfsrc_tcpmtx[i]);

From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 06:28:44 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C96C9B14; Tue, 28 Jan 2014 06:28:44 +0000 (UTC) Received: from mail-ie0-x232.google.com (mail-ie0-x232.google.com [IPv6:2607:f8b0:4001:c03::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 8DC941C9A; Tue, 28 Jan 2014 06:28:44 +0000 (UTC) Received: by mail-ie0-f178.google.com with SMTP id x13so7121234ief.37 for ; Mon, 27 Jan 2014 22:28:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=MIYIa4w/O3tQmMg88kcxETTAMgoN0ggwnVRCLgpfLfk=; b=QkYKol6EZhtDNhAWdiAL+dfYPeurYjjSDdyIza32q1N6oobbrzBH2jpHm0vZEsVbo+ siYsVuPa45xQv+HzopCFgKthF5//1PvAASBzc21Oy9ZM7ZxbyBjnCuo81K0hfhTA5c6W G7Pe06UcA88V0Xh1Ci0G6eRj+2HCkO2p9/OIX7stGwvAIql5mI8qeiL9W2Z5fjRKG8TN
Ik00GfTKdib6dtt6Zc1DO4JSpDUZW83PueTIw/oYbbvb7Jncl3z+jx5jgDbeYFOTrM3t GoUyXqhcJ4OmnuOZcbWes7hw+zjjDX6sqtIpCJUzgSpWMgW8HSCG1lvIUMgzH76BDoVG e+Ew== MIME-Version: 1.0 X-Received: by 10.50.13.9 with SMTP id d9mr21433238igc.25.1390890522907; Mon, 27 Jan 2014 22:28:42 -0800 (PST) Sender: jdavidlists@gmail.com Received: by 10.42.170.8 with HTTP; Mon, 27 Jan 2014 22:28:42 -0800 (PST) In-Reply-To: <201401280427.s0S4RTVn077761@hergotha.csail.mit.edu> References: <20140128002826.GU13704@funkthat.com> <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca> <201401280427.s0S4RTVn077761@hergotha.csail.mit.edu> Date: Tue, 28 Jan 2014 01:28:42 -0500 X-Google-Sender-Auth: mRtwRKtE_tp1KE4p2pUa9Xn1F4I Message-ID: Subject: Re: Terrible NFS performance under 9.2-RELEASE? From: J David To: wollman@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org, Rick Macklem X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 06:28:44 -0000 A few questions as I try to parse the various patches floating around for this. What's the difference between src/sys/nfsserver and src/sys/fs/nfsserver? It looks like maybe the former is for NFSv3 and the latter is for NFSv4? If so, these patches appear to be for the NFSv4 server. Since we are using the NFSv3 server exclusively, does that mean we would need to do something similar somewhere in the neighborhood of line 930 of src/sys/nfsserver/nfs_serv.c? Also, these patches appear server-side. To make sure things flow smoothly, will additional client-side changes be necessary? There is some MGET/MCLGET in src/sys/nfsclient/nfs_subs.c. (The equivalent in src/sys/fs/nfsclient/nfs_clcomsubs.c appear to be using the NFSMGET/NFSMCLGET macros, so presumably those are handled?) 
In any case, the switch from 2k to 4k mbufs and m_getm2 seems well worthwhile regardless of whether it addresses this specific issue. It should reduce a lot of overhead in many common cases. If my understanding isn't too far off, I can take a whack at testing the result, but only on NFSv3. Thanks! From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 06:55:14 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E22C8DA3 for ; Tue, 28 Jan 2014 06:55:14 +0000 (UTC) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9A0641E40 for ; Tue, 28 Jan 2014 06:55:14 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s0S6tCsE079255; Tue, 28 Jan 2014 01:55:12 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.7/8.14.4/Submit) id s0S6tBWj079252; Tue, 28 Jan 2014 01:55:11 -0500 (EST) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21223.21583.878646.673835@hergotha.csail.mit.edu> Date: Tue, 28 Jan 2014 01:55:11 -0500 From: Garrett Wollman To: J David Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
In-Reply-To: References: <20140128002826.GU13704@funkthat.com> <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca> <201401280427.s0S4RTVn077761@hergotha.csail.mit.edu> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (hergotha.csail.mit.edu [127.0.0.1]); Tue, 28 Jan 2014 01:55:12 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 06:55:14 -0000 < said: > What's the difference between src/sys/nfsserver and > src/sys/fs/nfsserver? It looks like maybe the former is for NFSv3 and > the latter is for NFSv4? /sys/nfs* is the "old" (read: obsolete) NFS client and server. /sys/fs/nfs* is the "new" (default) NFS client and server. Both implementations do both NFSv2 and NFSv3; only the "new" implementation does NFSv4. Even if you are only using NFSv3, you want to be using the "new" implementation, and the "old" one should go away before the stable/11 branch happens. We're running a mix of 9.1 (with some earlier versions of Rick's DRC patches and FHA for NFSv3) and 9.2 (with the DRC patches) currently, and I'm looking through a bunch of changes to pull forward from stable/9. 
-GAWollman From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 06:59:18 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A8F99E93; Tue, 28 Jan 2014 06:59:18 +0000 (UTC) Received: from mail-ie0-x22b.google.com (mail-ie0-x22b.google.com [IPv6:2607:f8b0:4001:c03::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 6BA2D1E5B; Tue, 28 Jan 2014 06:59:18 +0000 (UTC) Received: by mail-ie0-f171.google.com with SMTP id as1so7191458iec.2 for ; Mon, 27 Jan 2014 22:59:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=/8sKFAiOTxgqgN0SxKq5sKBZZiVNiFDOP5opMVNTEO8=; b=rt+CP83wxfjJyiXH+yxoFI21VFG3DuZZ+HT6If4A0AsF3nzjwwtozwALJYNG4DaRoB ExpENXBcr1hnLisx2Mt+2P/xuuwYE3nH6oJNA9jNqKNPRYNDQVtowuxfPxPPJSMJwZ4e g55D71k9S/u7TS0ewUXKZm/dfCJ7uiNDqSlBkFjgcIEFG4wjc4amJCSxp6+q8h/MdY4z CF34PhZrtaHO6qSoBvJQe5rfFT7c8j0rKeYZDy7nT6pFEySNqo5dqbitykrL4lbrzZ1n hRW6p49um8YsoC4sRcJg0PlNxPnt8QeZwi5G/vxTnsE2Vjh6sGVBgLatdKJIXk62rPVL RFBw== MIME-Version: 1.0 X-Received: by 10.42.121.147 with SMTP id j19mr25148037icr.13.1390892357869; Mon, 27 Jan 2014 22:59:17 -0800 (PST) Sender: jdavidlists@gmail.com Received: by 10.42.170.8 with HTTP; Mon, 27 Jan 2014 22:59:17 -0800 (PST) In-Reply-To: References: <20140128002826.GU13704@funkthat.com> <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca> <201401280427.s0S4RTVn077761@hergotha.csail.mit.edu> Date: Tue, 28 Jan 2014 01:59:17 -0500 X-Google-Sender-Auth: A8EVTsraayRwaKYEwWl0Vd1hdM4 Message-ID: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
From: J David To: wollman@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org, Rick Macklem X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 06:59:18 -0000

Another way to test this is to instrument the virtio driver, which turned out to be very straightforward:

Index: if_vtnet.c
===================================================================
--- if_vtnet.c	(revision 260701)
+++ if_vtnet.c	(working copy)
@@ -1886,6 +1887,7 @@
 	return (virtqueue_enqueue(vq, txhdr, &sg, sg.sg_nseg, 0));

 fail:
+	sc->vtnet_stats.tx_excess_mbuf_drop++;
 	m_freem(*m_head);
 	*m_head = NULL;

@@ -2645,6 +2647,9 @@
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_task_rescheduled",
 	    CTLFLAG_RD, &stats->tx_task_rescheduled,
 	    "Times the transmit interrupt task rescheduled itself");
+	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_excess_mbuf_drop",
+	    CTLFLAG_RD, &stats->tx_excess_mbuf_drop,
+	    "Times packets were dropped due to excess mbufs");
 }

 static int

Index: if_vtnetvar.h
===================================================================
--- if_vtnetvar.h	(revision 260701)
+++ if_vtnetvar.h	(working copy)
@@ -48,6 +48,7 @@
 	unsigned long tx_csum_bad_ethtype;
 	unsigned long tx_tso_bad_ethtype;
 	unsigned long tx_task_rescheduled;
+	unsigned long tx_excess_mbuf_drop;
 };

 struct vtnet_softc {

This patch didn't seem harmful from a performance standpoint since if things are working, the counter increment never gets hit.

With this change, I re-ran some 64k tests. I found that the number of drops was very small, but not zero.
On the client, doing the write-append test (which has no reads), it seems like it slowly builds up 8 with what appears to be some sort of back off (each one takes longer to appear than the last): $ sysctl dev.vtnet.1.tx_excess_mbuf_drop dev.vtnet.1.tx_excess_mbuf_drop: 8 But after 8, it appears congestion control is clamped down so hard that no more happen. Once read activity starts, the server builds up more: dev.vtnet.1.tx_excess_mbuf_drop: 53 So while there aren't a lot of these, they definitely do exist and there's just no way they're good for performance. Thanks! From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 11:46:31 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BE4105B3 for ; Tue, 28 Jan 2014 11:46:31 +0000 (UTC) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A140813DC for ; Tue, 28 Jan 2014 11:46:31 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1W877e-0004Xu-4D for freebsd-net@freebsd.org; Tue, 28 Jan 2014 03:46:30 -0800 Date: Tue, 28 Jan 2014 03:46:30 -0800 (PST) From: Beeblebrox To: freebsd-net@freebsd.org Message-ID: <1390909590119-5880672.post@n5.nabble.com> Subject: Jails on fib problem MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 11:46:31 -0000 Hi. I'm trying to setup a pool of jails, with the gateway to the jails as a fib device. 
All jails reside on cloned interface IP xxx.xxx.x.1/28 as gateway (fib 1). Jail IP's start from xxx.xxx.x.2/32. The fib seems to be limited to one jail only. That is, the first jail to grab the fib seems to keep control of it and traffic from other jails does not get routed to the public gateway. Do I need to be using one-fib-per-jail? Does each /32 jail require its own fib device? Thanks. ----- FreeBSD-11-current_amd64_root-on-zfs_RadeonKMS -- View this message in context: http://freebsd.1045724.n5.nabble.com/Jails-on-fib-problem-tp5880672.html Sent from the freebsd-net mailing list archive at Nabble.com. From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 13:07:48 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B359D243 for ; Tue, 28 Jan 2014 13:07:48 +0000 (UTC) Received: from mail2.dataoppdrag.no (mail2.dataoppdrag.no [IPv6:2a02:f58:7:2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 6E6F01ADA for ; Tue, 28 Jan 2014 13:07:48 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by mail2.dataoppdrag.no (Postfix) with ESMTP id A88004330A for ; Tue, 28 Jan 2014 14:07:39 +0100 (CET) Received: from mail2.dataoppdrag.no ([127.0.0.1]) by localhost (mail2.dataoppdrag.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id pS4AXgRqKAhE for ; Tue, 28 Jan 2014 14:07:39 +0100 (CET) Received: from [172.20.10.252] (42-80-141-95.net.dataoppdrag.no [95.141.80.42]) by mail2.dataoppdrag.no (Postfix) with ESMTP id 82B8F43307 for ; Tue, 28 Jan 2014 14:07:39 +0100 (CET) Message-ID: <52E7AB9B.5050707@dataoppdrag.no> Date: Tue, 28 Jan 2014 14:07:39 +0100 From: Ole Myhre User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: freebsd-net@freebsd.org Subject: carp and rtadvd Content-Type: text/plain; 
charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 13:07:48 -0000

Hi, I have a simple setup with two 10.0-RELEASE firewalls running carp, a virtual IPv6 address, and rtadvd:

(applied to both firewalls)

# kldload carp
# ifconfig em2 inet6 2001:db8::1/64 vhid 1 up
# sysctl net.inet6.ip6.forwarding=1
# echo 'rtadvd_enable="YES"' >> /etc/rc.conf
# echo 'rtadvd_interfaces="em2"' >> /etc/rc.conf
# service rtadvd start

This works fine: one firewall is MASTER, the other BACKUP, and the clients behind em2 get a prefix in the 2001:db8::/64 subnet. However, both firewalls are sending router advertisements (even though only one is MASTER), each with the LL-address of its physical em2 interface as the gateway. This causes clients that support multiple default gateways to select both firewalls as their default gateway and send traffic to both the MASTER and the BACKUP firewall.

Is there a way to make only the MASTER send router advertisements, or (preferably) to have only the MASTER send router advertisements carrying a virtual LL-address?
Thanks, Ole Myhre From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 13:16:02 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 74C6E39F for ; Tue, 28 Jan 2014 13:16:02 +0000 (UTC) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 419161C87 for ; Tue, 28 Jan 2014 13:16:02 +0000 (UTC) Received: from Julian-MBP3.local (50-196-156-133-static.hfc.comcastbusiness.net [50.196.156.133]) (authenticated bits=0) by vps1.elischer.org (8.14.7/8.14.7) with ESMTP id s0SD0ETh016578 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 28 Jan 2014 05:00:26 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <52E7A9D8.30604@freebsd.org> Date: Tue, 28 Jan 2014 21:00:08 +0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Beeblebrox , freebsd-net@freebsd.org Subject: Re: Jails on fib problem References: <1390909590119-5880672.post@n5.nabble.com> In-Reply-To: <1390909590119-5880672.post@n5.nabble.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 13:16:02 -0000 On 1/28/14, 7:46 PM, Beeblebrox wrote: > Hi. > I'm trying to setup a pool of jails, with the gateway to the jails as a fib > device. what's a fib device? Do you mean each jail has a different default fib? you are not using vimage jails? 
> All jails reside on cloned interface IP xxx.xxx.x.1/28 as gateway
> (fib 1).

so they all have the same address?? can you even do that? or you mean that they all have the same default route?

> Jail IP's start from xxx.xxx.x.2/32. The fib seems to be limited to
> one jail only. That is, the first jail to grab the fib seems to keep control
> of it and traffic from other jails does not get routed to the public
> gateway.

multiple jails can use the same fib, but I think you are confused about what is going on.

> Do I need to be using one-fib-per-jail? Does each /32 jail require its own
> fib device?

fibs don't have devices. I'm having a hard time working out what you are trying to do.

> Thanks.
>
> -----
> FreeBSD-11-current_amd64_root-on-zfs_RadeonKMS
> --
> View this message in context: http://freebsd.1045724.n5.nabble.com/Jails-on-fib-problem-tp5880672.html
> Sent from the freebsd-net mailing list archive at Nabble.com.
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 13:18:59 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C751453F for ; Tue, 28 Jan 2014 13:18:59 +0000 (UTC) Received: from mail-pd0-x232.google.com (mail-pd0-x232.google.com [IPv6:2607:f8b0:400e:c02::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9D3221CA8 for ; Tue, 28 Jan 2014 13:18:59 +0000 (UTC) Received: by mail-pd0-f178.google.com with SMTP id y13so345466pdi.9 for ; Tue, 28 Jan 2014 05:18:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=NppNPFbtAZaMNcuSlf8zVCmYHc78jXk09R1HsX3FnEM=; b=n0tQ+VlKn94saS80o1hM9IHXKOzg1jTEDbV9FhiBpJzhGTu8n7n4nBsiedCjRXZZqN ks2qW6aSZkTlxgkg0J53U+uh/YLOcBt2bt3rq08BhFa/dYn9w4dWbUOX6ZRmHVijct6k 2KDM4OZ5pBlAqA7HiHKClSHy6/bh38143yxiHumA2DAHwf0yNAkDAumFJrwWNUv+KK5D NHpIdP7Muk6jtQX8K9+fM804NGwYBA2cUP4ojr3U2PqK2znIrFR2wBsQgW+p8zEPcmpp UjdtoWdzYH4Ue0n6vGpFiqzmJOOJWDYpSTSj0tRp5duof2yBgdRLbUkUZYoF3GWRElvl 3gSQ== MIME-Version: 1.0 X-Received: by 10.66.221.199 with SMTP id qg7mr1530641pac.88.1390915139241; Tue, 28 Jan 2014 05:18:59 -0800 (PST) Sender: ermal.luci@gmail.com Received: by 10.70.46.42 with HTTP; Tue, 28 Jan 2014 05:18:59 -0800 (PST) In-Reply-To: <52E7AB9B.5050707@dataoppdrag.no> References: <52E7AB9B.5050707@dataoppdrag.no> Date: Tue, 28 Jan 2014 14:18:59 +0100 X-Google-Sender-Auth: ZJ4gFAQdDsvV9AUAKF9BCJTvfqE Message-ID: Subject: Re: carp and rtadvd From: =?ISO-8859-1?Q?Ermal_Lu=E7i?= To: Ole Myhre Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: freebsd-net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 13:18:59 -0000 On Tue, Jan 28, 2014 at 2:07 PM, Ole Myhre wrote: > Hi, > > I have a simple setup with two 10.0-RELEASE firewalls running carp, a > virtual IPv6 address and running rtadvd: > > (applied to both firewalls) > > # kldload carp > # ifconfig em2 inet6 2001:db8::1/64 vhid 1 up > # sysctl net.inet6.ip6.forwarding=1 > # echo 'rtadvd_enable="YES"' >> /etc/rc.conf > # echo 'rtadvd_interfaces="em2"' >> /etc/rc.conf > # service rtadvd start > > This works fine, one firewall is MASTER, the other BACKUP and the > clients behind em2 gets a prefix in the 2001:db8::/64 subnet. 
However > both firewalls are sending router advertisements (only one being MASTER) > with the LL-address of the physical em2 interface as the gateway. This > causes clients that supports multiple default gateways to select both > firewalls as their default gateway, and sending traffic to both the > MASTER and BACKUP firewall. > > Is there a way to make only the MASTER send router advertisements or > (preferably only the MASTER) sending router advertisements with a > virtual LL-address? > > You have to use the rtadvd patched from pfSense. Look at our tools repo to get the code. > Thanks, > Ole Myhre > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > -- Ermal From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 14:55:59 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A5DA17F9; Tue, 28 Jan 2014 14:55:59 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 42B59151A; Tue, 28 Jan 2014 14:55:58 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEAFjE51KDaFve/2dsb2JhbABag0RWgn25DE+BJXSCJQEBAQMBAQEBIAQnIAsFFhgCAg0ZAikBCSYGCAcEARwEh1wIDal5n3MXgSmNBQEBGwEzB4JvgUkEiUmMDIQFkG2DSx4xgQQ5 X-IronPort-AV: E=Sophos;i="4.95,736,1384318800"; d="scan'208";a="91045198" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 28 Jan 2014 09:55:57 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 70951B4089; Tue, 28 Jan 2014 09:55:57 -0500 (EST) Date: Tue, 28 Jan 2014 
09:55:57 -0500 (EST) From: Rick Macklem To: J David Message-ID: <1098090585.17554698.1390920957454.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, wollman@freebsd.org, Bryan Venteicher X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 14:55:59 -0000 J David wrote: > Another way to test this is to instrument the virtio driver, which > turned out to be very straightforward: > > Index: if_vtnet.c > > =================================================================== > > --- if_vtnet.c (revision 260701) > > +++ if_vtnet.c (working copy) > > @@ -1886,6 +1887,7 @@ > > return (virtqueue_enqueue(vq, txhdr, &sg, sg.sg_nseg, 0)); > > > > fail: > > + sc->vtnet_stats.tx_excess_mbuf_drop++; > > m_freem(*m_head); > > *m_head = NULL; > > > > @@ -2645,6 +2647,9 @@ > > SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_task_rescheduled", > > CTLFLAG_RD, &stats->tx_task_rescheduled, > > "Times the transmit interrupt task rescheduled itself"); > > + SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_excess_mbuf_drop", > > + CTLFLAG_RD, &stats->tx_excess_mbuf_drop, > > + "Times packets were dropped due to excess mbufs"); > > } > > > > static int > > Index: if_vtnetvar.h > > =================================================================== > > --- if_vtnetvar.h (revision 260701) > > +++ if_vtnetvar.h (working copy) > > @@ -48,6 +48,7 @@ > > unsigned long tx_csum_bad_ethtype; > > unsigned long tx_tso_bad_ethtype; > > unsigned long tx_task_rescheduled; > > + unsigned long tx_excess_mbuf_drop; > > }; > > > > struct vtnet_softc { > > > This 
patch didn't seem harmful from a performance standpoint since if > things are working, the counter increment never gets hit. > > With this change, I re-ran some 64k tests. I found that the number > of > drops was very small, but not zero. > > On the client, doing the write-append test (which has no reads), it > seems like it slowly builds up 8 with what appears to be some sort of > back off (each one takes longer to appear than the last): > > > $ sysctl dev.vtnet.1.tx_excess_mbuf_drop > > dev.vtnet.1.tx_excess_mbuf_drop: 8 > > > But after 8, it appears congestion control is clamped down so hard > that no more happen. > > Once read activity starts, the server builds up more: > > dev.vtnet.1.tx_excess_mbuf_drop: 53 > > > So while there aren't a lot of these, they definitely do exist and > there's just no way they're good for performance. > It would be nice to also count the number of times m_collapse() gets called, since that will generate a lot of overhead that I think will show up on your test, since you don't have any disk activity. And I'd state that having any of these is near-disastrous for performance, since it means a timeout/retransmit of a TCP segment. For a lan environment, I would consider 1 timeout/retransmit in a million packets as a lot. rick ps: I've cc'd Bryan, since he's the guy handling virtio, I think. > Thanks! 
> _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 15:10:22 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9104BD63; Tue, 28 Jan 2014 15:10:22 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 4075F1666; Tue, 28 Jan 2014 15:10:21 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAN/H51KDaFve/2dsb2JhbABahBqCfblbgSV0giUBAQEEI1YbGAICDRkCWQaIGKoJn3MXgSmNIjQHgm+BSQSJSaB+g0segW4 X-IronPort-AV: E=Sophos;i="4.95,736,1384318800"; d="scan'208";a="91051562" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 28 Jan 2014 10:10:20 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id EC933B4022; Tue, 28 Jan 2014 10:10:20 -0500 (EST) Date: Tue, 28 Jan 2014 10:10:20 -0500 (EST) From: Rick Macklem To: J David Message-ID: <1614542711.17567039.1390921820957.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, wollman@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 15:10:22 -0000 J David wrote: > A few questions as I try to parse the various patches floating around > for this. > > What's the difference between src/sys/nfsserver and > src/sys/fs/nfsserver? It looks like maybe the former is for NFSv3 > and > the latter is for NFSv4? > > If so, these patches appear to be for the NFSv4 server. Since we are > using the NFSv3 server exclusively, does that mean we would need to > do > something similar somewhere in the neighborhood of line 930 of > src/sys/nfsserver/nfs_serv.c? > > Also, these patches appear server-side. To make sure things flow > smoothly, will additional client-side changes be necessary? There is > some MGET/MCLGET in src/sys/nfsclient/nfs_subs.c. (The equivalent in > src/sys/fs/nfsclient/nfs_clcomsubs.c appear to be using the > NFSMGET/NFSMCLGET macros, so presumably those are handled?) > > In any case, the switch from 2k to 4k mbufs and m_getm2 seems well > worthwhile regardless of whether it addresses this specific issue. > It > should reduce a lot of overhead in many common cases. > > If my understanding isn't too far off, I can take a whack at testing > the result, but only on NFSv3. > > Thanks! > I think Garrett clarified which sources are which. The attached simple patch makes both the new/default client and new/default server use MJUMPAGESIZE clusters. (It is the one I already mentioned, called 4kmcl.patch.) Garrett's patch using m_getm2() would only affect the server side read, but not client write or server side readdir. 
(It can probably be combined with my simple one, but I haven't tested that.) 4kmcl.patch is not ready for head (as John Mark-Gurney pointed out, it does 4K clusters for readlink and it also does 4K clusters for all the small RPC messages), but it works ok for testing to see if it gets rid of the drops and calls to m_collapse(). Since you are using 9.2-release, you have the DRC changes. At some point, you can try setting these in the server (they reduce CPU overheads by allowing the DRC to grow, holding onto more mbufs). Btw, head (and I think stable/9,10) have been significantly changed by Alexander Motin's recent commits, although these sysctls still exist. vfs.nfsd.tcphighwater=100000 vfs.nfsd.tcpcachetimeout=600 rick From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 15:35:43 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A9D11D58 for ; Tue, 28 Jan 2014 15:35:43 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 5B7E91869 for ; Tue, 28 Jan 2014 15:35:42 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEANzN51KDaFve/2dsb2JhbABXA4NEVoJ9uQxPgSV0giUBAQEDAQEBASArHgIIAwUWGAICDRkCKQEJJgYIBwQBHAEDh1wIDaoHn3QXgSmMdAoGAgEbJBAHEYIeQIFJBIlJjAyEBZBtg0seMXtC X-IronPort-AV: E=Sophos;i="4.95,736,1384318800"; d="scan'208";a="91060302" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 28 Jan 2014 10:35:41 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 50745B4065; Tue, 28 Jan 2014 10:35:41 -0500 (EST) Date: Tue, 28 Jan 2014 10:35:41 -0500 (EST) From: Rick Macklem To: John-Mark Gurney 
Message-ID: <372707859.17587309.1390923341323.JavaMail.root@uoguelph.ca> In-Reply-To: <20140128021450.GY13704@funkthat.com> Subject: Re: Terrible NFS performance under 9.2-RELEASE? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, Adam McDougall X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 15:35:43 -0000 John-Mark Gurney wrote: > Rick Macklem wrote this message on Mon, Jan 27, 2014 at 20:32 -0500: > > John-Mark Gurney wrote: > > > Rick Macklem wrote this message on Mon, Jan 27, 2014 at 18:47 > > > -0500: > > > > John-Mark Gurney wrote: > > > > > Rick Macklem wrote this message on Sun, Jan 26, 2014 at 21:16 > > > > > -0500: > > > > > > Btw, thanks go to Garrett Wollman for suggesting the change > > > > > > to > > > > > > MJUMPAGESIZE > > > > > > clusters. > > > > > > > > > > > > rick > > > > > > ps: If the attachment doesn't make it through and you want > > > > > > the > > > > > > patch, just > > > > > > email me and I'll send you a copy. > > > > > > > > > > The patch looks good, but we probably shouldn't change > > > > > _readlink.. > > > > > The chances of a link being >2k are pretty slim, and the > > > > > chances > > > > > of > > > > > the link being >32k are even smaller... > > > > > > > > > Yea, I already thought of that, actually. However, see below > > > > w.r.t. > > > > NFSv4. > > > > > > > > However, at this point I > > > > mostly want to find out if it the long mbuf chain that causes > > > > problems > > > > for TSO enabled network interfaces. > > > > > > I agree, though a long mbuf chain is more of a driver issue than > > > an > > > NFS issue... 
> > > > > Yes, if my hunch is correct, it is. If my hunch gets verified, I > > will > > be posting w.r.t. how best to deal with the problem. I suspect a > > patch > > like this one might serve as a useful work-around while the drivers > > get fixed, if the hunch is correct. > > It would be nice to have a way to force such a segment to go out to > the drivers to make debugging/testing drivers easier... I'm not sure > the best way to handle that though... > > > > > > In fact, we might want to switch _readlink to MGET (could be > > > > > conditional > > > > > upon cnt) so that if it fits in an mbuf we don't allocate a > > > > > cluster > > > > > for > > > > > it... > > > > > > > > > For NFSv4, what was an RPC for NFSv3 becomes one of several > > > > Ops. in > > > > a compound RPC. As such, there is no way to know how much > > > > additional > > > > RPC message there will be. So, although the readlink reply > > > > won't > > > > use > > > > much of the 4K allocation, replies for subsequent Ops. in the > > > > compound > > > > certainly could. (Is it more efficient to allocate 4K now and > > > > use > > > > part of it for subsequent message reply stuff or allocate > > > > additional > > > > mbuf clusters later for subsequent stuff, as required? On a > > > > small > > > > memory constrained machine, I suspect the latter is correct, > > > > but > > > > for > > > > the kind of hardware that has TSO scatter/gather enabled > > > > network > > > > interfaces, I'm not so sure. At this point, I wouldn't even say > > > > that using 4K clusters is going to be a win and my hunch is > > > > that > > > > any win wouldn't apply to small memory constrained machines.) > > > > > > Though the code that was patched wasn't using any partial > > > buffers, > > > it was always allocating a new buffer...
If the code in > > > _read/_readlinks starts using a previous mbuf chain, then > > > obviously > > > things are different and I'd agree, always allocating a 2k/4k > > > cluster makes sense... > > > > > Yes, but nd_mb and nd_bpos are set, which means subsequent replies > > can > > use the remainder of the cluster. > > Couldn't we scan the list of replies, find out how much data we need, > m_getm the space for it all (which will use 4k clusters as > necessary)? > The NFSv4 server parses the compound as it processes it. It must keep things like current-filehandle and saved-filehandle between RPCs and things like the attributes are a lot of work to parse, so I don't think two passes through a request is warranted. Also, there is no way of knowing how big a reply is until you execute the reply, although you can "guess" at it. I never intended to imply that the patch I emailed is ready for head. It does 4K clusters for all RPCs, even ones known to be small (as in client side Getattr/Lookup requests). Since messages are sent quickly and then mbufs released, except for the DRC in the server, I think large allocations for server replies that may be cached are the case to try and avoid. Fortunately the large replies will be for read and readdir and these don't need to be cached by the DRC. As such, a patch that uses 4K clusters in the server for read, readdir and 4K clusters for write requests in the client, should be appropriate, I think? (And, yes, I think you are correct that readlink is better off with a MCLBYTES cluster.) The coding is straightforward, but the patch will be fairly large, since readdir in the server uses NFSM_BUILD(), that in turn uses NFSMCLGET(). These will need an extra "do a big cluster" argument. For initial testing, it was just simpler to make them all big. rick > > Why does it always allocate a new cluster? Well, because the code > > is > > OLD.
It was written for OpenBSD 2.6 and, at that time, I tried to > > make > > it portable across the BSDen. I'm not so concerned w.r.t. its > > portability > > now, since no one else is porting it and I don't plan to, but I > > still > > think it would be nice if it were portable to other BSDen. > > Back when I wrote it, I believe that MCLBYTES was 1K and an entire > > cluster was needed. (To be honest, I found out that FreeBSD's > > MCLBYTES > > is 2K about 2 days ago, when I started looking at this stuff.) > > > > Could it now look to see if enough bytes (a little over 1K) were > > available > > in the current cluster and use that. Yes, but it would reduce the > > portability > > of the code and I don't think it would make a measurable difference > > performance > > wise. > > Are you sure it would reduce the portability? I can't think of a way > it would... Some code will always need to be written for > portability.. > Well, I had it ported to OpenBSD, FreeBSD6 and Mac OS X 10.3 by using the NFSMCLGET() macro. If it uses things like m_getm2() and separate uma zones, I don't know how much extra work would be needed for other BSDen, since I have no idea which BSDen have these things? > > > > My test server has 256Mbytes of ram and it certainly doesn't > > > > show > > > > any improvement (big surprise;-), but it also doesn't show any > > > > degradation for the limited testing I've done. > > > > > > I'm not too surprised, unless you're on a heavy server pushing > > > >200MB/sec, the allocation cost is probably cheap enough that it > > > doesn't show up... going to 4k means immediately half as many > > > mbufs > > > are needed/allocated, and as they are page sized, don't have the > > > problems of physical memory fragmentation, nor do they have to do > > > an > > > IPI/tlb shoot down in the case of multipage allocations... (I'm > > > dealing w/ this for geli.)
> > > > > Yes, Garrett Wollman proposed this and I suspect there might be a > > performance gain for larger systems. He has a more involved patch. > > To be honest, if Garrett is convinced that his patch is of benefit > > performance wise, I will do a separate posting w.r.t. it and > > whether > > or not it is appropriate to be committed to head, etc. > > > > > > Again, my main interest at this point is whether reducing the > > > > number of mbufs in the chain fixes the TSO issues. I think > > > > the question of whether or not 4K clusters are performance > > > > improvement in general, is an interesting one that comes later. > > > > > > Another thing I noticed is that we are getting an mbuf and then > > > allocating a cluster... Is there a reason we aren't using > > > something > > > like m_getm or m_getcl? We have a special uma zone that has > > > mbuf and mbuf cluster already paired meaning we save some lock > > > operations for each segment allocated... > > > > > See above w.r.t. OLD portable code. There was a time when MGETCL() > > wasn't guaranteed to succeed even when M_WAITOK is specified. > > This is also why there is that weird loop in the NFSMCLGET() macro. > > Correct, but as you wrapped them in NFS* macros, it doesn't mean you > can't merge the MGETCL w/ NFSMCLGET into a new function that merges > the two... It's just another (not too difficult) wrapper that the > porter has to write... > > Though apparently portability has been given up since you use MCLGET > directly in nfsserver/nfs_nfsdport.c instead of NFSMCLGET... > > Sounds like nfsport.h needs some updating.... > The files with "port" in the names are re-written for each port. They were generated by cribbing code from the extant client/server. (Without looking, I'd guess you find MGET(), MCLGET() in the old FreeBSD server, or maybe it was inherited from OpenBSD 2.6.) Everything can be re-written, but why do so if the old code still works. 
I'm one guy who does this as a spare time unpaid hobby and I'm working on 4.1 server code these days. > > (I think there was a time in FreeBSD's past when allocation was > > never > > guaranteed and the rest of the code doesn't tolerate a NULL mbuf > > ptr. > > Something like M_TRYWAIT in old versions of FreeBSD?) > > Correct, there was a time that M_WAITOK could still return, but it > was > many years ago and many releases ago... > > > Btw, Garrett Wollman's patch uses m_getm2() to get the mbuf list. > > Interestingly, m_getm2 will use 4k clusters as necessary, and in > the _readlink case, do the correct thing... > > Hmmm... m_getm2 isn't documented... It was added by andre almost 7 > years ago... It does appear to be a public interface as ofed, sctp > iscsi and ng(_tty) all use it, though only sctp appears to use it any > differently than m_getm.. The rest could simply use m_getm instead > of m_getm2... Considering it was committed the day before SCTP was > committed, I'm not too surprised... > > P.S. if someone wants to submit a patch to mbuf.9 to update the docs > that would be helpful... I'll review and commit... and m_append is > also undocumented... > -- > John-Mark Gurney Voice: +1 415 225 5579 > > "All that I will do, has been done, All that I have, has not." 
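[Archive editor's note: the mbuf-count arithmetic running through this exchange can be checked with a short calculation. This is a sketch of the accounting only, not FreeBSD code; it assumes MCLBYTES = 2048, MJUMPAGESIZE = 4096, one leading header mbuf, and the ~65680-byte 64K NFS request size quoted elsewhere in the thread.]

```python
import math

# Approximate size of a 64K NFS read reply/write request as quoted in
# the thread: 64K of file data plus RPC/NFS headers (~65680 bytes).
TOTAL_BYTES = 65680

def chain_length(cluster_size):
    """Cluster mbufs needed to hold TOTAL_BYTES, plus one leading
    header mbuf. A sketch of the accounting, not the real allocator."""
    return 1 + math.ceil(TOTAL_BYTES / cluster_size)

mcl = chain_length(2048)       # MCLBYTES (2K) clusters
mjumpage = chain_length(4096)  # MJUMPAGESIZE (page-size, 4K) clusters

print(mcl, mjumpage)  # 34 and 18, matching the counts in the thread
# 34 exceeds the 32-33 element scatter/gather limit mentioned for some
# TSO drivers, forcing m_defrag()/m_collapse(); 18 fits comfortably.
```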
> _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 15:37:39 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E553CE08 for ; Tue, 28 Jan 2014 15:37:38 +0000 (UTC) Received: from kabab.cs.huji.ac.il (kabab.cs.huji.ac.il [132.65.116.12]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 687CB1882 for ; Tue, 28 Jan 2014 15:37:38 +0000 (UTC) Received: from th-04.cs.huji.ac.il ([132.65.80.125]) by kabab.cs.huji.ac.il with esmtp id 1W8AjD-0004Ad-30; Tue, 28 Jan 2014 17:37:31 +0200 Mime-Version: 1.0 (Mac OS X Mail 7.1 \(1827\)) Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
From: Daniel Braniss In-Reply-To: <482557096.17290094.1390873872231.JavaMail.root@uoguelph.ca> Date: Tue, 28 Jan 2014 17:37:20 +0200 Message-Id: <59178C23-A863-40AF-922E-C0A16D12ECE9@cs.huji.ac.il> References: <482557096.17290094.1390873872231.JavaMail.root@uoguelph.ca> To: Rick Macklem X-Mailer: Apple Mail (2.1827) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: Pyun YongHyeon , FreeBSD Net , Adam McDougall , Jack Vogel X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 15:37:39 -0000 On Jan 28, 2014, at 3:51 AM, Rick Macklem wrote: > Jack Vogel wrote: >> That header file is for the VF driver :) which I don't believe is >> being >> used in this case. >> The driver is capable of handling 256K but its limited by the stack >> to 64K >> (look in >> ixgbe.h), so its not a few bytes off due to the vlan header. >> >> The scatter size is not an arbitrary one, its due to hardware >> limitations >> in Niantic >> (82599). Turning off TSO in the 10G environment is not practical, >> you will >> have >> trouble getting good performance. >> >> Jack >> > Well, if you look at this thread, Daniel got much better performance > by turning off TSO. However, I agree that this is not an ideal solution. > http://docs.FreeBSD.org/cgi/mid.cgi?2C287272-7B57-4AAD-B22F-6A65D9F8677B > > rick > >> >> >> On Mon, Jan 27, 2014 at 4:58 PM, Yonghyeon PYUN >> wrote: >> >>> On Mon, Jan 27, 2014 at 06:27:19PM -0500, Rick Macklem wrote: >>>> pyunyh@gmail.com wrote: >>>>> On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote: >>>>>> Adam McDougall wrote: >>>>>>> Also try rsize=32768,wsize=32768 in your mount options, >>>>>>> made a >>>>>>> huge >>>>>>> difference for me.
I've noticed slow file transfers on NFS >>>>>>> in 9 >>>>>>> and >>>>>>> finally did some searching a couple months ago, someone >>>>>>> suggested >>>>>>> it >>>>>>> and >>>>>>> they were on to something. >>>>>>> >>>>>> I have a "hunch" that might explain why 64K NFS reads/writes >>>>>> perform >>>>>> poorly for some network environments. >>>>>> A 64K NFS read reply/write request consists of a list of 34 >>>>>> mbufs >>>>>> when >>>>>> passed to TCP via sosend() and a total data length of around >>>>>> 65680 bytes. >>>>>> Looking at a couple of drivers (virtio and ixgbe), they seem >>>>>> to >>>>>> expect >>>>>> no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. >>>>>> I >>>>>> think >>>>>> (I don't have anything that does TSO to confirm this) that >>>>>> NFS will >>>>>> pass >>>>>> a list that is longer (34 plus a TCP/IP header). >>>>>> At a glance, it appears that the drivers call m_defrag() or >>>>>> m_collapse() >>>>>> when the mbuf list won't fit in their scatter table (32 or 33 >>>>>> elements) >>>>>> and if this fails, just silently drop the data without >>>>>> sending it. >>>>>> If I'm right, there would be considerable overhead from >>>>>> m_defrag()/m_collapse() >>>>>> and near disaster if they fail to fix the problem and the >>>>>> data is >>>>>> silently >>>>>> dropped instead of xmited. >>>>>> >>>>> >>>>> I think the actual number of DMA segments allocated for the >>>>> mbuf >>>>> chain is determined by bus_dma(9). bus_dma(9) will coalesce >>>>> current segment with previous segment if possible. >>>>> >>>> Ok, I'll have to take a look, but I thought that an array sized >>>> by "num_segs" is passed in as an argument. (And num_segs is set >>>> to >>>> either IXGBE_82598_SCATTER (100) or IXGBE_82599_SCATTER (32).) >>>> It looked to me that the ixgbe driver called itself ix, so it >>>> isn't >>>> obvious to me which we are talking about.
(I know that Daniel >>>> Braniss >>>> had an ix0 and ix1, which were fixed for NFS by disabling TSO.) >>>> >>> >>> It's ix(4). ixgb(4) is a different driver. >>> this brings a sore problem, in 9.2-stable there is no man page for ix. also, the man page for ixgbe does not mention the 82599EB; the only way I know it's the ixgbe is because i did: pciconf -lv then grep -r 82599EB sys/dev and found the driver source. I will try rick's patch over the weekend. danny >>>> I'll admit I mostly looked at virtio's network driver, since that >>>> was the one being used by J David. >>>> >>>> Problems w.r.t. TSO enabled for NFS using 64K rsize/wsize have >>>> been >>>> cropping up for quite a while, and I am just trying to find out >>>> why. >>>> (I have no hardware/software that exhibits the problem, so I can >>>> only look at the sources and ask others to try testing stuff.) >>>> >>>>> I'm not sure whether you're referring to ixgbe(4) or ix(4) but >>>>> I >>>>> see the total length of all segment size of ix(4) is 65535 so >>>>> it has no room for ethernet/VLAN header of the mbuf chain. The >>>>> driver should be fixed to transmit a 64KB datagram. >>>> Well, if_hw_tsomax is set to 65535 by the generic code (the >>>> driver >>>> doesn't set it) and the code in tcp_output() seems to subtract >>>> the >>>> size of a tcp/ip header from that before passing data to the >>>> driver, >>>> so I think the mbuf chain passed to the driver will fit in one >>>> ip datagram. (I'd assume all sorts of stuff would break for TSO >>>> enabled drivers if that wasn't the case?) >>> >>> I believe the generic code is doing right. I'm under the >>> impression the non-working TSO indicates a bug in driver. Some >>> drivers didn't account for additional ethernet/VLAN header so the >>> total size of DMA segments exceeded 65535. I've attached a diff >>> for ix(4). It wasn't tested at all as I don't have hardware to >>> test.
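[Archive editor's note: Pyun's point about the unaccounted ethernet/VLAN header can be made concrete with some hedged arithmetic. Header sizes below are the standard untagged/tagged IPv4/TCP values; 65535 is IP_MAXPACKET, the default if_hw_tsomax set by the generic stack as discussed above.]

```python
IF_HW_TSOMAX = 65535   # default TSO limit from the generic stack (IP_MAXPACKET)
TCPIP_HDR = 40         # IPv4 (20) + TCP (20) headers, no options
ETHER_HDR = 14         # ethernet header
VLAN_TAG = 4           # optional 802.1Q tag

# tcp_output() subtracts the TCP/IP header before passing data down, so
# the largest chain handed to the driver is one full IP datagram:
tso_payload = IF_HW_TSOMAX - TCPIP_HDR   # 65495 bytes of TCP payload

# On the wire, however, the NIC must also DMA the link-layer header, so
# the total the driver's DMA segments must cover is:
wire_total = IF_HW_TSOMAX + ETHER_HDR + VLAN_TAG   # 65553 bytes

# A driver whose DMA segments may total only 65535 bytes therefore
# overflows unless it accounts for the ethernet/VLAN header itself:
print(wire_total - IF_HW_TSOMAX)  # 18 bytes over the limit
```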
>>> >>>> >>>>> I think the use of m_defrag(9) in TSO is suboptimal. All TSO >>>>> capable controllers are able to handle multiple TX buffers so >>>>> it >>>>> should have used m_collapse(9) rather than copying entire chain >>>>> with m_defrag(9). >>>>> >>>> I haven't looked at these closely yet (plan on doing so to-day), >>>> but >>>> even m_collapse() looked like it copied data between mbufs and >>>> that >>>> is certainly suboptimal, imho. I don't see why a driver can't >>>> split >>>> the mbuf list, if there are too many entries for the >>>> scatter/gather >>>> and do it in two iterations (much like tcp_output() does already, >>>> since the data length exceeds 65535 - tcp/ip header size). >>>> >>> >>> It can split the mbuf list if controllers support increased number >>> of TX buffers. Because controller shall consume the same number of >>> DMA descriptors for the mbuf list, drivers tend to impose a limit >>> on the number of TX buffers to save resources. >>> >>>> However, at this point, I just want to find out if the long chain >>>> of mbufs is why TSO is problematic for these drivers, since I'll >>>> admit I'm getting tired of telling people to disable TSO (and I >>>> suspect some don't believe me and never try it). >>>> >>> >>> TSO capable controllers tend to have various limitations (the first >>> TX buffer should have complete ethernet/IP/TCP header, ip_len of IP >>> header should be reset to 0, TCP pseudo checksum should be >>> recomputed etc) and cheap controllers need more assistance from >>> driver to let its firmware know various IP/TCP header offset >>> location in the mbuf. Because this requires IP/TCP header >>> parsing, it's error prone and very complex. >>> >>>>>> Anyhow, I have attached a patch that makes NFS use >>>>>> MJUMPAGESIZE >>>>>> clusters, >>>>>> so the mbuf count drops from 34 to 18. >>>>>> >>>>> >>>>> Could we make it conditional on size? >>>>> >>>> Not sure what you mean?
If you mean "the size of the read/write", >>>> that would be possible for NFSv3, but less so for NFSv4. (The >>>> read/write >>>> is just one Op. in the compound for NFSv4 and there is no way to >>>> predict how much more data is going to be generated by subsequent >>>> Ops.) >>>> >>> >>> Sorry, I should have been clearer. You already answered my >>> question. Thanks. >>> >>>> If by "size" you mean amount of memory in the machine then, yes, >>>> it >>>> certainly could be conditional on that. (I plan to try and look >>>> at >>>> the allocator to-day as well, but if others know of disadvantages >>>> with >>>> using MJUMPAGESIZE instead of MCLBYTES, please speak up.) >>>> >>>> Garrett Wollman already alluded to the MCLBYTES case being >>>> pre-allocated, >>>> but I'll admit I have no idea what the implications of that are >>>> at this >>>> time. >>>> >>>>>> If anyone has a TSO scatter/gather enabled net interface and >>>>>> can >>>>>> test this >>>>>> patch on it with NFS I/O (default of 64K rsize/wsize) when >>>>>> TSO is >>>>>> enabled >>>>>> and see what effect it has, that would be appreciated. >>>>>> >>>>>> Btw, thanks go to Garrett Wollman for suggesting the change >>>>>> to >>>>>> MJUMPAGESIZE >>>>>> clusters. >>>>>> >>>>>> rick >>>>>> ps: If the attachment doesn't make it through and you want >>>>>> the >>>>>> patch, just >>>>>> email me and I'll send you a copy.
>>>>>>=20 >>>=20 >>> _______________________________________________ >>> freebsd-net@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-net >>> To unsubscribe, send any mail to >>> "freebsd-net-unsubscribe@freebsd.org" >>>=20 >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to >> "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 17:17:46 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C71FDDDB for ; Tue, 28 Jan 2014 17:17:46 +0000 (UTC) Received: from mail-vc0-x234.google.com (mail-vc0-x234.google.com [IPv6:2607:f8b0:400c:c03::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7C1A412E9 for ; Tue, 28 Jan 2014 17:17:46 +0000 (UTC) Received: by mail-vc0-f180.google.com with SMTP id ks9so438281vcb.11 for ; Tue, 28 Jan 2014 09:17:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=berentweb.com; s=google; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=SnwwIytR+mJVZbBac9wiD0A0jNZD0tbQPgMJHOblfh0=; b=emZx+8wo3aeOGoIxNTjKqGsr2NdPCatFdyx6VPLmq1NPO6r5blX2A8mR6b7ZpVaENk 4ohNYdu7Qjl0mgdxDp83Z7qMSd+Rox4nOXL9m1uPfubxlIl5UBfz5XMszpJlAcKVNMZt 1ekxhTRpoLfrCfZGib2enBEDvAJIRvRvBEkek= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:reply-to:sender:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=SnwwIytR+mJVZbBac9wiD0A0jNZD0tbQPgMJHOblfh0=; b=h4PNqRXJbZIxlR4HY5fR2OWpRe+sspZfBc4q6e9RFX743OGYdQIYHnRmyx6iCigKtp 
7E/OwgZdRqtBsWRiY/Q83+FEu1o/BZ380yfCelgDqeRYZD0/jHOnbnIWdl/tqeTNajG1 TMghJmfoxTPMt7UCd8mMWgdFZ8+hOTfOOWDbj3y7K1mcmhb4kLXKXI2YaFvoM2gLayPr g/7l+JaZRl2yXfjb2r4EJBKbC0bnlHsx/Xb631AhNkS68OgyVKFmE7SaBWknNDf047e3 H21LV2XIzdP1O00eEdZ0MtONQU7jQqL6CesQpzeZ5B6RNmZn+uLHPKXPOaMh8BF42SA2 oIKg== X-Gm-Message-State: ALoCoQlhuPj+UdBYzwEmkQdiCWpgJU4ANsqFGNq5Q26SFjw5kZU48VRx3jwvQu3LGW+ExHzuy5oI MIME-Version: 1.0 X-Received: by 10.52.116.71 with SMTP id ju7mr954206vdb.31.1390929465500; Tue, 28 Jan 2014 09:17:45 -0800 (PST) Sender: rsb@berentweb.com Received: by 10.220.146.145 with HTTP; Tue, 28 Jan 2014 09:17:45 -0800 (PST) X-Originating-IP: [83.66.215.241] In-Reply-To: <52E7A9D8.30604@freebsd.org> References: <1390909590119-5880672.post@n5.nabble.com> <52E7A9D8.30604@freebsd.org> Date: Tue, 28 Jan 2014 19:17:45 +0200 X-Google-Sender-Auth: -IV9VPWfFpVpJ8UvXMb-CBHEQ0U Message-ID: Subject: Re: Jails on fib problem From: Beeblebrox To: Julian Elischer Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list Reply-To: zaphod@berentweb.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 17:17:46 -0000 > what's a fib device? Do you mean each jail has a different default fib? > you are not using vimage jails? Hi Julian. * No vimage * All jails use the same fib. /etc/rc.conf: cloned_interfaces="lo2" ifconfig_lo2="inet 127.0.1.1/28" static_routes="jail default" route_jail="default 127.0.1.1 -fib 1" route_default="default 192.168.1.1" > so they all have the same address?? can you even do that? or you mean that > they all have the same default route? 
I mean same default route, jail IP's start from 127.0.1.2/32 and go to 127.0.1.6/32 jail.conf assigns fib with "exec.fib = 1;" jails on the 127.0.1.1/28 subnet range should be able to route traffic through the 127.0.0.1 gateway regardless of the fact that the jails themselves reside on a /32 subnet. However, it's not working smoothly > fibs don't have devices. Yes, I know - a misnomer.

setfib 1 netstat -rn
Destination        Gateway      Flags    Netif  Expire
default            127.0.1.1    UGS      lo2
127.0.0.1          link#3       UH       lo0
127.0.1.1          link#4       UH       lo2
127.0.1.2          link#4       UH       lo2
127.0.1.3          link#4       UH       lo2
127.0.1.4          link#4       UH       lo2
192.168.1.0/24     link#1       U        re0  (Ext_If)
192.168.2.0/26     link#2       U        re1  (Lan_If)

To complicate things further, I also have a vboxnet0 for VBox guests. 127.0.1.2 is a dns jail for example. The Internal LAN clients, vboxnet0 guests and lo0 need to resolve names from that jail. From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 19:57:00 2014 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id ED002CA9; Tue, 28 Jan 2014 19:57:00 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id BD7F31353; Tue, 28 Jan 2014 19:57:00 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0SJv0oP089165; Tue, 28 Jan 2014 19:57:00 GMT (envelope-from jmg@freefall.freebsd.org) Received: (from jmg@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0SJv0WX089164; Tue, 28 Jan 2014 19:57:00 GMT (envelope-from jmg) Date: Tue, 28 Jan 2014 19:57:00 GMT Message-Id: <201401281957.s0SJv0WX089164@freefall.freebsd.org> To: jmg@FreeBSD.org,
freebsd-net@FreeBSD.org, jvf@FreeBSD.org From: jmg@FreeBSD.org Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 19:57:01 -0000 Synopsis: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST Responsible-Changed-From-To: freebsd-net->jvf Responsible-Changed-By: jmg Responsible-Changed-When: Tue Jan 28 19:56:21 UTC 2014 Responsible-Changed-Why: assign this to Jack so he gets bugged about it weekly.. :) http://www.freebsd.org/cgi/query-pr.cgi?pr=176446 From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 20:29:12 2014 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D179D7EC; Tue, 28 Jan 2014 20:29:12 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A3D311645; Tue, 28 Jan 2014 20:29:12 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0SKTCNp097102; Tue, 28 Jan 2014 20:29:12 GMT (envelope-from jmg@freefall.freebsd.org) Received: (from jmg@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0SKTBfS097101; Tue, 28 Jan 2014 20:29:11 GMT (envelope-from jmg) Date: Tue, 28 Jan 2014 20:29:11 GMT Message-Id: <201401282029.s0SKTBfS097101@freefall.freebsd.org> To: sysop@prisjakt.nu, jmg@FreeBSD.org, freebsd-net@FreeBSD.org, 
freebsd-j@FreeBSD.org From: jmg@FreeBSD.org Subject: Re: kern/179299: [igb] Intel X540-T2 - unstable driver X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 20:29:12 -0000 Synopsis: [igb] Intel X540-T2 - unstable driver State-Changed-From-To: open->closed State-Changed-By: jmg State-Changed-When: Tue Jan 28 20:27:25 UTC 2014 State-Changed-Why: looks like you don't have the hardware anymore.. if you can reproduce, we can open this up again.. Responsible-Changed-From-To: freebsd-net->freebsd-j Responsible-Changed-By: jmg Responsible-Changed-When: Tue Jan 28 20:27:25 UTC 2014 Responsible-Changed-Why: looks like you don't have the hardware anymore.. if you can reproduce, we can open this up again.. http://www.freebsd.org/cgi/query-pr.cgi?pr=179299 From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 20:30:01 2014 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 65C59887 for ; Tue, 28 Jan 2014 20:30:01 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 438291654 for ; Tue, 28 Jan 2014 20:30:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0SKU17c097274 for ; Tue, 28 Jan 2014 20:30:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0SKU15n097273; Tue, 28 Jan 2014 20:30:01 GMT (envelope-from gnats) Date: 
Tue, 28 Jan 2014 20:30:01 GMT Message-Id: <201401282030.s0SKU15n097273@freefall.freebsd.org> To: freebsd-net@FreeBSD.org Cc: From: dfilter@FreeBSD.ORG (dfilter service) Subject: Re: kern/183659: commit references a PR X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list Reply-To: dfilter service List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 20:30:01 -0000 The following reply was made to PR kern/183659; it has been noted by GNATS. From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/183659: commit references a PR Date: Tue, 28 Jan 2014 20:28:45 +0000 (UTC)

Author: gnn
Date: Tue Jan 28 20:28:32 2014
New Revision: 261242
URL: http://svnweb.freebsd.org/changeset/base/261242

Log:
  Decrease lock contention within the TCP accept case by removing
  the INP_INFO lock from tcp_usr_accept. As the PR/patch states
  this was following the advice already in the code. See the PR
  below for a full discussion of this change and its measured effects.

  PR: 183659
  Submitted by: Julian Charbon
  Reviewed by: jhb

Modified:
  head/sys/netinet/tcp_syncache.c
  head/sys/netinet/tcp_usrreq.c

Modified: head/sys/netinet/tcp_syncache.c
==============================================================================
--- head/sys/netinet/tcp_syncache.c	Tue Jan 28 19:12:31 2014	(r261241)
+++ head/sys/netinet/tcp_syncache.c	Tue Jan 28 20:28:32 2014	(r261242)
@@ -682,7 +682,7 @@ syncache_socket(struct syncache *sc, str
 	 * connection when the SYN arrived.  If we can't create
 	 * the connection, abort it.
 	 */
-	so = sonewconn(lso, SS_ISCONNECTED);
+	so = sonewconn(lso, 0);
 	if (so == NULL) {
 		/*
 		 * Drop the connection; we will either send a RST or
@@ -922,6 +922,8 @@ syncache_socket(struct syncache *sc, str
 
 	INP_WUNLOCK(inp);
 
+	soisconnected(so);
+
 	TCPSTAT_INC(tcps_accepts);
 	return (so);

Modified: head/sys/netinet/tcp_usrreq.c
==============================================================================
--- head/sys/netinet/tcp_usrreq.c	Tue Jan 28 19:12:31 2014	(r261241)
+++ head/sys/netinet/tcp_usrreq.c	Tue Jan 28 20:28:32 2014	(r261242)
@@ -610,13 +610,6 @@ out:
 /*
  * Accept a connection.  Essentially all the work is done at higher levels;
  * just return the address of the peer, storing through addr.
- *
- * The rationale for acquiring the tcbinfo lock here is somewhat complicated,
- * and is described in detail in the commit log entry for r175612.  Acquiring
- * it delays an accept(2) racing with sonewconn(), which inserts the socket
- * before the inpcb address/port fields are initialized.  A better fix would
- * prevent the socket from being placed in the listen queue until all fields
- * are fully initialized.
 */
static int
tcp_usr_accept(struct socket *so, struct sockaddr **nam)
@@ -633,7 +626,6 @@ tcp_usr_accept(struct socket *so, struct
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("tcp_usr_accept: inp == NULL"));
-	INP_INFO_RLOCK(&V_tcbinfo);
 	INP_WLOCK(inp);
 	if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
 		error = ECONNABORTED;
@@ -653,7 +645,6 @@ tcp_usr_accept(struct socket *so, struct
 out:
 	TCPDEBUG2(PRU_ACCEPT);
 	INP_WUNLOCK(inp);
-	INP_INFO_RUNLOCK(&V_tcbinfo);
 	if (error == 0)
 		*nam = in_sockaddr(port, &addr);
 	return error;

_______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 20:30:04 2014 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4A6EB89E; Tue, 28 Jan 2014 20:30:04 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1B86D1657; Tue, 28 Jan 2014 20:30:04 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0SKU3KD097294; Tue, 28 Jan 2014 20:30:03 GMT (envelope-from jmg@freefall.freebsd.org) Received: (from jmg@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0SKU3Ji097293; Tue, 28 Jan 2014 20:30:03 GMT (envelope-from jmg) Date: Tue, 28 Jan 2014 20:30:03 GMT Message-Id: <201401282030.s0SKU3Ji097293@freefall.freebsd.org> To: jmg@FreeBSD.org, freebsd-j@FreeBSD.org, freebsd-net@FreeBSD.org From: jmg@FreeBSD.org Subject: Re: kern/179299: [igb] Intel
X540-T2 - unstable driver X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 20:30:04 -0000 Synopsis: [igb] Intel X540-T2 - unstable driver Responsible-Changed-From-To: freebsd-j->freebsd-net Responsible-Changed-By: jmg Responsible-Changed-When: Tue Jan 28 20:29:36 UTC 2014 Responsible-Changed-Why: fix responsible typo.. http://www.freebsd.org/cgi/query-pr.cgi?pr=179299 From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 20:33:03 2014 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 143EBB01; Tue, 28 Jan 2014 20:33:03 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DA09A16DB; Tue, 28 Jan 2014 20:33:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0SKX2hC099110; Tue, 28 Jan 2014 20:33:02 GMT (envelope-from gnn@freefall.freebsd.org) Received: (from gnn@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0SKX2W8099109; Tue, 28 Jan 2014 20:33:02 GMT (envelope-from gnn) Date: Tue, 28 Jan 2014 20:33:02 GMT Message-Id: <201401282033.s0SKX2W8099109@freefall.freebsd.org> To: jcharbon@verisign.com, gnn@FreeBSD.org, freebsd-net@FreeBSD.org From: gnn@FreeBSD.org Subject: Re: kern/183659: [tcp] ]TCP stack lock contention with short-lived connections X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 20:33:03 -0000 Synopsis: [tcp] ]TCP stack lock contention with short-lived connections State-Changed-From-To: open->patched State-Changed-By: gnn State-Changed-When: Tue Jan 28 20:32:31 UTC 2014 State-Changed-Why: Patched with commit 261242 http://www.freebsd.org/cgi/query-pr.cgi?pr=183659 From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 20:37:58 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 36574D0D for ; Tue, 28 Jan 2014 20:37:58 +0000 (UTC) Received: from na3sys010aog111.obsmtp.com (na3sys010aog111.obsmtp.com [74.125.245.90]) by mx1.freebsd.org (Postfix) with SMTP id A94931733 for ; Tue, 28 Jan 2014 20:37:57 +0000 (UTC) Received: from mail-ee0-f54.google.com ([74.125.83.54]) (using TLSv1) by na3sys010aob111.postini.com ([74.125.244.12]) with SMTP ID DSNKUugVJOaUD1rOkrp/4ioDVB5SfuULZzjs@postini.com; Tue, 28 Jan 2014 12:37:57 PST Received: by mail-ee0-f54.google.com with SMTP id e53so466820eek.41 for ; Tue, 28 Jan 2014 12:37:55 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:content-type:mime-version:subject:from :in-reply-to:date:cc:content-transfer-encoding:message-id:references :to; bh=jhNQVNyXS4+2PwXqSBVa5dKLEVylrIgl8Ul9GTsnEfY=; b=PAAdPKNT5U+emTnINI5UqCZ5EdVRmuAZoILpmyD0KZ2B07uMOIQQMXptB5kkudxYsi WDtApREoPqrBL/EGVAc3jpWFZanSDr/UF2yCrazJdpsn6Ip5gVdb+4loIBLjaQ08hG6Z WWg5bfKfUMKgdRBWlqJgfE8zl6JDB6Ux5l6oTtHLVVW9NxOhfhJNNGcyKE6Bl/Hb2Rig Zw7G1/x9LfzqRZc43L5i5eRLNYN+RBy1qgr1Kxj5wV7nkw5mrIvujnn/RExWiM9NiFJi VKMyQZ7Z7HXWvyWlGrLvJktx6GjVsYPuQSSp5B0LkBbaM1cmRFoTkjROKRF8D99B4NiD 0VYw== X-Received: by 10.14.211.71 with SMTP id v47mr3897742eeo.37.1390941101826; Tue, 28 
Jan 2014 12:31:41 -0800 (PST) X-Gm-Message-State: ALoCoQkVYtdDUdwoGgO/JEspVz5QxI9sLK8CiJfe8aXq2bclF7aDXCXNZZ6SMV71j6Yf47307kCy/GNBV/CUoTg53aKWk1IAc4sC8wNzqFbD+Cg7t59GnFvGlqBpeN1JcpTOOjQHIuzoaYXJOzgL8ThlTsSEVW/ns3iz6zWZW/ky88n/EahejkU= X-Received: by 10.14.211.71 with SMTP id v47mr3897732eeo.37.1390941101739; Tue, 28 Jan 2014 12:31:41 -0800 (PST) Received: from grey.home.unixconn.com (h-74-23.a183.priv.bahnhof.se. [46.59.74.23]) by mx.google.com with ESMTPSA id b41sm59859144eef.16.2014.01.28.12.31.39 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 28 Jan 2014 12:31:40 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 7.1 \(1827\)) Subject: Re: kern/179299: [igb] Intel X540-T2 - unstable driver From: Maxim Bourmistrov In-Reply-To: <201401282029.s0SKTBfS097101@freefall.freebsd.org> Date: Tue, 28 Jan 2014 21:31:38 +0100 Content-Transfer-Encoding: 7bit Message-Id: <8AB7DCF4-BE5C-4F11-BB66-CED78A58F51A@prisjakt.nu> References: <201401282029.s0SKTBfS097101@freefall.freebsd.org> To: jmg@FreeBSD.org X-Mailer: Apple Mail (2.1827) Cc: freebsd-net@FreeBSD.org, freebsd-j@FreeBSD.org, sysop@prisjakt.nu X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 20:37:58 -0000 Agree. On 28 jan 2014, at 21:29, jmg@FreeBSD.org wrote: > Synopsis: [igb] Intel X540-T2 - unstable driver > > State-Changed-From-To: open->closed > State-Changed-By: jmg > State-Changed-When: Tue Jan 28 20:27:25 UTC 2014 > State-Changed-Why: > looks like you don't have the hardware anymore.. if you can reproduce, > we can open this up again.. > > > Responsible-Changed-From-To: freebsd-net->freebsd-j > Responsible-Changed-By: jmg > Responsible-Changed-When: Tue Jan 28 20:27:25 UTC 2014 > Responsible-Changed-Why: > looks like you don't have the hardware anymore.. 
if you can reproduce, > we can open this up again.. > > http://www.freebsd.org/cgi/query-pr.cgi?pr=179299 From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 21:00:01 2014 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C3F6D357 for ; Tue, 28 Jan 2014 21:00:01 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id AFD2E18CD for ; Tue, 28 Jan 2014 21:00:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0SL01iC004063 for ; Tue, 28 Jan 2014 21:00:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0SL01HY004062; Tue, 28 Jan 2014 21:00:01 GMT (envelope-from gnats) Date: Tue, 28 Jan 2014 21:00:01 GMT Message-Id: <201401282100.s0SL01HY004062@freefall.freebsd.org> To: freebsd-net@FreeBSD.org Cc: From: Vlad Movchan Subject: Re: kern/165622: [ndis][panic][patch] Unregistered use of FPU in kernel on amd64 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list Reply-To: Vlad Movchan List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 21:00:01 -0000 The following reply was made to PR kern/165622; it has been noted by GNATS. 
From: Vlad Movchan To: bug-followup@FreeBSD.org, Vlad Movchan Cc: Subject: Re: kern/165622: [ndis][panic][patch] Unregistered use of FPU in kernel on amd64 Date: Tue, 28 Jan 2014 22:56:22 +0200 --001a11343fbc90848404f10e0ef4 Content-Type: multipart/alternative; boundary=001a11343fbc90847f04f10e0ef2 --001a11343fbc90847f04f10e0ef2 Content-Type: text/plain; charset=ISO-8859-1 Here is a corrected patch. Previous version could not be compiled on i386.
--001a11343fbc90847f04f10e0ef2-- --001a11343fbc90848404f10e0ef4 Content-Type: text/plain; charset=US-ASCII; name="fpu_patch3.txt" Content-Disposition: attachment; filename="fpu_patch3.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hqzly1yi0 SW5kZXg6IHN5cy9jb21wYXQvbmRpcy9rZXJuX3dpbmRydi5jCj09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIHN5cy9j b21wYXQvbmRpcy9rZXJuX3dpbmRydi5jCShyZXZpc2lvbiAyNjEyMzkpCisrKyBzeXMvY29tcGF0 L25kaXMva2Vybl93aW5kcnYuYwkod29ya2luZyBjb3B5KQpAQCAtNTYsNiArNTYsMTAgQEAKICNp bmNsdWRlIDxtYWNoaW5lL3NlZ21lbnRzLmg+CiAjZW5kaWYKIAorI2lmZGVmIF9fYW1kNjRfXwor I2luY2x1ZGUgPG1hY2hpbmUvZnB1Lmg+CisjZW5kaWYKKwogI2luY2x1ZGUgPGRldi91c2IvdXNi Lmg+CiAKICNpbmNsdWRlIDxjb21wYXQvbmRpcy9wZV92YXIuaD4KQEAgLTY2LDYgKzcwLDE2IEBA CiAjaW5jbHVkZSA8Y29tcGF0L25kaXMvaGFsX3Zhci5oPgogI2luY2x1ZGUgPGNvbXBhdC9uZGlz L3VzYmRfdmFyLmg+CiAKKyNpZmRlZiBfX2FtZDY0X18KK3N0cnVjdCBmcHVfY2NfZW50IHsKKwlj aGFyCQl1c2VkOworCXN0cnVjdCBmcHVfa2Vybl9jdHggKmN0eDsKKwlTTElTVF9FTlRSWShmcHVf Y2NfZW50KSBsaW5rOworfTsKK3N0YXRpYyBTTElTVF9IRUFEKGZwdV9jdHhfY2FjaGUsIGZwdV9j Y19lbnQpIGZwdV9jY19oZWFkOworc3RhdGljIHN0cnVjdCBtdHggZnB1X2NhY2hlX210eDsKKyNl bmRpZgorCiBzdGF0aWMgc3RydWN0IG10eCBkcnZkYl9tdHg7CiBzdGF0aWMgU1RBSUxRX0hFQUQo ZHJ2ZGIsIGRydmRiX2VudCkgZHJ2ZGJfaGVhZDsKIApAQCAtOTYsNiArMTEwLDExIEBACiAJbXR4 X2luaXQoJmRydmRiX210eCwgIldpbmRvd3MgZHJpdmVyIERCIGxvY2siLAogCSAgICAiV2luZG93 cyBpbnRlcm5hbCBsb2NrIiwgTVRYX0RFRik7CiAKKyNpZmRlZiBfX2FtZDY0X18KKwlTTElTVF9J TklUKCZmcHVfY2NfaGVhZCk7CisJbXR4X2luaXQoJmZwdV9jYWNoZV9tdHgsICJmcHUgY29udGV4 dCBjYWNoZSBsb2NrIiwgTlVMTCwgTVRYX0RFRik7CisjZW5kaWYKKwogCS8qCiAJICogUENJIGFu ZCBwY2NhcmQgZGV2aWNlcyBkb24ndCBuZWVkIHRvIHVzZSBJUlBzIHRvCiAJICogaW50ZXJhY3Qg d2l0aCB0aGVpciBidXMgZHJpdmVycyAodXN1YWxseSksIHNvIG91cgpAQCAtMTMwLDYgKzE0OSw5 IEBACiB3aW5kcnZfbGliZmluaSh2b2lkKQogewogCXN0cnVjdCBkcnZkYl9lbnQJKmQ7CisjaWZk ZWYgX19hbWQ2NF9fCisJc3RydWN0IGZwdV9jY19lbnQgKmVudDsKKyNlbmRpZgogCiAJbXR4X2xv 
Y2soJmRydmRiX210eCk7IAogCXdoaWxlKFNUQUlMUV9GSVJTVCgmZHJ2ZGJfaGVhZCkgIT0gTlVM TCkgewpAQCAtMTQ4LDYgKzE3MCwxNSBAQAogCXNtcF9yZW5kZXp2b3VzKE5VTEwsIHg4Nl9vbGRs ZHQsIE5VTEwsIE5VTEwpOwogCUV4RnJlZVBvb2wobXlfdGlkcyk7CiAjZW5kaWYKKyNpZmRlZiBf X2FtZDY0X18KKwl3aGlsZSAoKGVudCA9IFNMSVNUX0ZJUlNUKCZmcHVfY2NfaGVhZCkpICE9IE5V TEwpIHsKKwkJU0xJU1RfUkVNT1ZFX0hFQUQoJmZwdV9jY19oZWFkLCBsaW5rKTsKKwkJZnB1X2tl cm5fZnJlZV9jdHgoZW50LT5jdHgpOworCQlmcmVlKGVudCwgTV9ERVZCVUYpOworCX0KKworCW10 eF9kZXN0cm95KCZmcHVfY2FjaGVfbXR4KTsKKyNlbmRpZgogCXJldHVybiAoMCk7CiB9CiAKQEAg LTYxMyw2ICs2NDQsMTQyIEBACiAKIAlyZXR1cm4gKDApOwogfQorCitzdGF0aWMgc3RydWN0IGZw dV9jY19lbnQgKgorcmVxdWVzdF9mcHVfY2NfZW50KHZvaWQpCit7CisJc3RydWN0IGZwdV9jY19l bnQgKmVudDsKKworCW10eF9sb2NrKCZmcHVfY2FjaGVfbXR4KTsKKwlTTElTVF9GT1JFQUNIKGVu dCwgJmZwdV9jY19oZWFkLCBsaW5rKSB7CisJCWlmKGVudC0+dXNlZCA9PSAwKSB7CisJCQllbnQt PnVzZWQgPSAxOworCQkJbXR4X3VubG9jaygmZnB1X2NhY2hlX210eCk7CisJCQlyZXR1cm4gKGVu dCk7CisJCX0KKwl9CisJbXR4X3VubG9jaygmZnB1X2NhY2hlX210eCk7CisKKwlpZiAoKGVudCA9 IG1hbGxvYyhzaXplb2Yoc3RydWN0IGZwdV9jY19lbnQpLCBNX0RFVkJVRiwgTV9OT1dBSVQgfAor CSAgICBNX1pFUk8pKSAhPSBOVUxMKSB7CisJCWVudC0+Y3R4ID0gZnB1X2tlcm5fYWxsb2NfY3R4 KEZQVV9LRVJOX05PUk1BTCB8CisJCSAgICBGUFVfS0VSTl9OT1dBSVQpOworCQlpZiAoZW50LT5j dHggIT0gTlVMTCkgeworCQkJZW50LT51c2VkID0gMTsKKwkJCW10eF9sb2NrKCZmcHVfY2FjaGVf bXR4KTsKKwkJCVNMSVNUX0lOU0VSVF9IRUFEKCZmcHVfY2NfaGVhZCwgZW50LCBsaW5rKTsKKwkJ CW10eF91bmxvY2soJmZwdV9jYWNoZV9tdHgpOworCQl9IGVsc2UKKwkJCWZyZWUoZW50LCBNX0RF VkJVRik7CisJfQorCisJcmV0dXJuIChlbnQpOworfQorCitzdGF0aWMgdm9pZAorcmVsZWFzZV9m cHVfY2NfZW50KHN0cnVjdCBmcHVfY2NfZW50ICplbnQpCit7CisKKwllbnQtPnVzZWQgPSAwOwor fQorCit1aW50NjRfdAorX3g4Nl82NF9jYWxsMSh2b2lkICpmbiwgdWludDY0X3QgYSkKK3sKKwlz dHJ1Y3QgZnB1X2NjX2VudCAqZW50OworCXVpbnQ2NF90IHJldDsKKworCWlmICgoZW50ID0gcmVx dWVzdF9mcHVfY2NfZW50KCkpID09IE5VTEwpCisJCXJldHVybiAoRU5PTUVNKTsKKwlmcHVfa2Vy bl9lbnRlcihjdXJ0aHJlYWQsIGVudC0+Y3R4LCBGUFVfS0VSTl9OT1JNQUwpOworCXJldCA9IHg4 
Nl82NF9jYWxsMShmbiwgYSk7CisJZnB1X2tlcm5fbGVhdmUoY3VydGhyZWFkLCBlbnQtPmN0eCk7 CisJcmVsZWFzZV9mcHVfY2NfZW50KGVudCk7CisKKwlyZXR1cm4gKHJldCk7Cit9CisKK3VpbnQ2 NF90CitfeDg2XzY0X2NhbGwyKHZvaWQgKmZuLCB1aW50NjRfdCBhLCB1aW50NjRfdCBiKQorewor CXN0cnVjdCBmcHVfY2NfZW50ICplbnQ7CisJdWludDY0X3QgcmV0OworCisJaWYgKChlbnQgPSBy ZXF1ZXN0X2ZwdV9jY19lbnQoKSkgPT0gTlVMTCkKKwkJcmV0dXJuIChFTk9NRU0pOworCWZwdV9r ZXJuX2VudGVyKGN1cnRocmVhZCwgZW50LT5jdHgsIEZQVV9LRVJOX05PUk1BTCk7CisJcmV0ID0g eDg2XzY0X2NhbGwyKGZuLCBhLCBiKTsKKwlmcHVfa2Vybl9sZWF2ZShjdXJ0aHJlYWQsIGVudC0+ Y3R4KTsKKwlyZWxlYXNlX2ZwdV9jY19lbnQoZW50KTsKKworCXJldHVybiAocmV0KTsKK30KKwor dWludDY0X3QKK194ODZfNjRfY2FsbDModm9pZCAqZm4sIHVpbnQ2NF90IGEsIHVpbnQ2NF90IGIs IHVpbnQ2NF90IGMpCit7CisJc3RydWN0IGZwdV9jY19lbnQgKmVudDsKKwl1aW50NjRfdCByZXQ7 CisKKwlpZiAoKGVudCA9IHJlcXVlc3RfZnB1X2NjX2VudCgpKSA9PSBOVUxMKQorCQlyZXR1cm4g KEVOT01FTSk7CisJZnB1X2tlcm5fZW50ZXIoY3VydGhyZWFkLCBlbnQtPmN0eCwgRlBVX0tFUk5f Tk9STUFMKTsKKwlyZXQgPSB4ODZfNjRfY2FsbDMoZm4sIGEsIGIsIGMpOworCWZwdV9rZXJuX2xl YXZlKGN1cnRocmVhZCwgZW50LT5jdHgpOworCXJlbGVhc2VfZnB1X2NjX2VudChlbnQpOworCisJ cmV0dXJuIChyZXQpOworfQorCit1aW50NjRfdAorX3g4Nl82NF9jYWxsNCh2b2lkICpmbiwgdWlu dDY0X3QgYSwgdWludDY0X3QgYiwgdWludDY0X3QgYywgdWludDY0X3QgZCkKK3sKKwlzdHJ1Y3Qg ZnB1X2NjX2VudCAqZW50OworCXVpbnQ2NF90IHJldDsKKworCWlmICgoZW50ID0gcmVxdWVzdF9m cHVfY2NfZW50KCkpID09IE5VTEwpCisJCXJldHVybiAoRU5PTUVNKTsKKwlmcHVfa2Vybl9lbnRl cihjdXJ0aHJlYWQsIGVudC0+Y3R4LCBGUFVfS0VSTl9OT1JNQUwpOworCXJldCA9IHg4Nl82NF9j YWxsNChmbiwgYSwgYiwgYywgZCk7CisJZnB1X2tlcm5fbGVhdmUoY3VydGhyZWFkLCBlbnQtPmN0 eCk7CisJcmVsZWFzZV9mcHVfY2NfZW50KGVudCk7CisKKwlyZXR1cm4gKHJldCk7Cit9CisKK3Vp bnQ2NF90CitfeDg2XzY0X2NhbGw1KHZvaWQgKmZuLCB1aW50NjRfdCBhLCB1aW50NjRfdCBiLCB1 aW50NjRfdCBjLCB1aW50NjRfdCBkLAorICAgIHVpbnQ2NF90IGUpCit7CisJc3RydWN0IGZwdV9j Y19lbnQgKmVudDsKKwl1aW50NjRfdCByZXQ7CisKKwlpZiAoKGVudCA9IHJlcXVlc3RfZnB1X2Nj X2VudCgpKSA9PSBOVUxMKQorCQlyZXR1cm4gKEVOT01FTSk7CisJZnB1X2tlcm5fZW50ZXIoY3Vy 
dGhyZWFkLCBlbnQtPmN0eCwgRlBVX0tFUk5fTk9STUFMKTsKKwlyZXQgPSB4ODZfNjRfY2FsbDUo Zm4sIGEsIGIsIGMsIGQsIGUpOworCWZwdV9rZXJuX2xlYXZlKGN1cnRocmVhZCwgZW50LT5jdHgp OworCXJlbGVhc2VfZnB1X2NjX2VudChlbnQpOworCisJcmV0dXJuIChyZXQpOworfQorCit1aW50 NjRfdAorX3g4Nl82NF9jYWxsNih2b2lkICpmbiwgdWludDY0X3QgYSwgdWludDY0X3QgYiwgdWlu dDY0X3QgYywgdWludDY0X3QgZCwKKyAgICB1aW50NjRfdCBlLCB1aW50NjRfdCBmKQoreworCXN0 cnVjdCBmcHVfY2NfZW50ICplbnQ7CisJdWludDY0X3QgcmV0OworCisJaWYgKChlbnQgPSByZXF1 ZXN0X2ZwdV9jY19lbnQoKSkgPT0gTlVMTCkKKwkJcmV0dXJuIChFTk9NRU0pOworCWZwdV9rZXJu X2VudGVyKGN1cnRocmVhZCwgZW50LT5jdHgsIEZQVV9LRVJOX05PUk1BTCk7CisJcmV0ID0geDg2 XzY0X2NhbGw2KGZuLCBhLCBiLCBjLCBkLCBlLCBmKTsKKwlmcHVfa2Vybl9sZWF2ZShjdXJ0aHJl YWQsIGVudC0+Y3R4KTsKKwlyZWxlYXNlX2ZwdV9jY19lbnQoZW50KTsKKworCXJldHVybiAocmV0 KTsKK30KICNlbmRpZiAvKiBfX2FtZDY0X18gKi8KIAogCkluZGV4OiBzeXMvY29tcGF0L25kaXMv cGVfdmFyLmgKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PQotLS0gc3lzL2NvbXBhdC9uZGlzL3BlX3Zhci5oCShyZXZpc2lv biAyNjEyMzkpCisrKyBzeXMvY29tcGF0L25kaXMvcGVfdmFyLmgJKHdvcmtpbmcgY29weSkKQEAg LTQ2MCwyMiArNDYwLDMwIEBACiBleHRlcm4gdWludDY0X3QgeDg2XzY0X2NhbGw2KHZvaWQgKiwg dWludDY0X3QsIHVpbnQ2NF90LCB1aW50NjRfdCwgdWludDY0X3QsCiAJdWludDY0X3QsIHVpbnQ2 NF90KTsKIAordWludDY0X3QgX3g4Nl82NF9jYWxsMSh2b2lkICosIHVpbnQ2NF90KTsKK3VpbnQ2 NF90IF94ODZfNjRfY2FsbDIodm9pZCAqLCB1aW50NjRfdCwgdWludDY0X3QpOwordWludDY0X3Qg X3g4Nl82NF9jYWxsMyh2b2lkICosIHVpbnQ2NF90LCB1aW50NjRfdCwgdWludDY0X3QpOwordWlu dDY0X3QgX3g4Nl82NF9jYWxsNCh2b2lkICosIHVpbnQ2NF90LCB1aW50NjRfdCwgdWludDY0X3Qs IHVpbnQ2NF90KTsKK3VpbnQ2NF90IF94ODZfNjRfY2FsbDUodm9pZCAqLCB1aW50NjRfdCwgdWlu dDY0X3QsIHVpbnQ2NF90LCB1aW50NjRfdCwKKyAgICB1aW50NjRfdCk7Cit1aW50NjRfdCBfeDg2 XzY0X2NhbGw2KHZvaWQgKiwgdWludDY0X3QsIHVpbnQ2NF90LCB1aW50NjRfdCwgdWludDY0X3Qs CisgICAgdWludDY0X3QsIHVpbnQ2NF90KTsKIAogI2RlZmluZQlNU0NBTEwxKGZuLCBhKQkJCQkJ CVwKLQl4ODZfNjRfY2FsbDEoKGZuKSwgKHVpbnQ2NF90KShhKSkKKwlfeDg2XzY0X2NhbGwxKChm 
biksICh1aW50NjRfdCkoYSkpCiAjZGVmaW5lCU1TQ0FMTDIoZm4sIGEsIGIpCQkJCQlcCi0JeDg2 XzY0X2NhbGwyKChmbiksICh1aW50NjRfdCkoYSksICh1aW50NjRfdCkoYikpCisJX3g4Nl82NF9j YWxsMigoZm4pLCAodWludDY0X3QpKGEpLCAodWludDY0X3QpKGIpKQogI2RlZmluZQlNU0NBTEwz KGZuLCBhLCBiLCBjKQkJCQkJXAotCXg4Nl82NF9jYWxsMygoZm4pLCAodWludDY0X3QpKGEpLCAo dWludDY0X3QpKGIpLAkJXAorCV94ODZfNjRfY2FsbDMoKGZuKSwgKHVpbnQ2NF90KShhKSwgKHVp bnQ2NF90KShiKSwJCVwKIAkodWludDY0X3QpKGMpKQogI2RlZmluZQlNU0NBTEw0KGZuLCBhLCBi LCBjLCBkKQkJCQkJXAotCXg4Nl82NF9jYWxsNCgoZm4pLCAodWludDY0X3QpKGEpLCAodWludDY0 X3QpKGIpLAkJXAorCV94ODZfNjRfY2FsbDQoKGZuKSwgKHVpbnQ2NF90KShhKSwgKHVpbnQ2NF90 KShiKSwJCVwKIAkodWludDY0X3QpKGMpLCAodWludDY0X3QpKGQpKQogI2RlZmluZQlNU0NBTEw1 KGZuLCBhLCBiLCBjLCBkLCBlKQkJCQlcCi0JeDg2XzY0X2NhbGw1KChmbiksICh1aW50NjRfdCko YSksICh1aW50NjRfdCkoYiksCQlcCisJX3g4Nl82NF9jYWxsNSgoZm4pLCAodWludDY0X3QpKGEp LCAodWludDY0X3QpKGIpLAkJXAogCSh1aW50NjRfdCkoYyksICh1aW50NjRfdCkoZCksICh1aW50 NjRfdCkoZSkpCiAjZGVmaW5lCU1TQ0FMTDYoZm4sIGEsIGIsIGMsIGQsIGUsIGYpCQkJCVwKLQl4 ODZfNjRfY2FsbDYoKGZuKSwgKHVpbnQ2NF90KShhKSwgKHVpbnQ2NF90KShiKSwJCVwKKwlfeDg2 XzY0X2NhbGw2KChmbiksICh1aW50NjRfdCkoYSksICh1aW50NjRfdCkoYiksCQlcCiAJKHVpbnQ2 NF90KShjKSwgKHVpbnQ2NF90KShkKSwgKHVpbnQ2NF90KShlKSwgKHVpbnQ2NF90KShmKSkKIAog I2VuZGlmIC8qIF9fYW1kNjRfXyAqLwo= --001a11343fbc90848404f10e0ef4-- From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 21:42:42 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9ADA8694; Tue, 28 Jan 2014 21:42:42 +0000 (UTC) Received: from mail2.dataoppdrag.no (mail2.dataoppdrag.no [IPv6:2a02:f58:7:2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 518851D66; Tue, 28 Jan 2014 21:42:42 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by mail2.dataoppdrag.no (Postfix) with ESMTP id 875F543038; Tue, 28 Jan 2014 22:42:40 +0100 (CET) Received: from 
mail2.dataoppdrag.no ([127.0.0.1]) by localhost (mail2.dataoppdrag.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MuClo+YO5gwm; Tue, 28 Jan 2014 22:42:40 +0100 (CET) Received: from [172.20.10.252] (42-80-141-95.net.dataoppdrag.no [95.141.80.42]) by mail2.dataoppdrag.no (Postfix) with ESMTP id 688FB43037; Tue, 28 Jan 2014 22:42:40 +0100 (CET) Message-ID: <52E82450.60107@dataoppdrag.no> Date: Tue, 28 Jan 2014 22:42:40 +0100 From: Ole Myhre User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: =?ISO-8859-1?Q?Ermal_Lu=E7i?= Subject: Re: carp and rtadvd References: <52E7AB9B.5050707@dataoppdrag.no> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 21:42:42 -0000 On 28.01.2014 14:18, Ermal Luçi wrote: > You have to use the rtadvd patched from pfSense. > Look at our tools repo to get the code. Doesn't pfSense use radvd and not rtadvd? I've tried the patched radvd from the tools repo, however it does not seem to work. Maybe it's created for < 10.0? 
Thanks, Ole From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 22:50:53 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 83090DEE; Tue, 28 Jan 2014 22:50:53 +0000 (UTC) Received: from mail-vb0-x236.google.com (mail-vb0-x236.google.com [IPv6:2607:f8b0:400c:c02::236]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2B8691221; Tue, 28 Jan 2014 22:50:53 +0000 (UTC) Received: by mail-vb0-f54.google.com with SMTP id w20so678395vbb.13 for ; Tue, 28 Jan 2014 14:50:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=iaaxAPvs6SStV4r2phgDCLVn8FFTUUwXEItzFgHSAtU=; b=Ty10IHEY0dqgribTVJ3Q+mkRm1InnNWA19etliL868IhwrD8yUXqlJFUdDxBtn5Cyr nC+g1sgFeaCTkfSF2sEn798nQCvg2iN3rfmqcRPBwkV6ayqfNGbQ/d6bfUON3LvTiyXP X5M8HqLxw2wphlch3A/Nen+TBWkVi87K4SZa5qK0t+IGkbHKh0aAKerH/0/slvFKNF9+ TyhUOxCOt00gSIqKqLbqLQnVBTI5wAFe9VsLLYmDZTKSWrDerSL0GGGS1veKvIFDNpJ2 d5AZnfgDw/o12xmvIE3BkQV//r2bXKJYYCDViQ4BzMyirJnQlZn8w5/YReQm288tpfra UEJQ== MIME-Version: 1.0 X-Received: by 10.221.26.10 with SMTP id rk10mr3315716vcb.0.1390949452156; Tue, 28 Jan 2014 14:50:52 -0800 (PST) Sender: ndenev@gmail.com Received: by 10.220.78.84 with HTTP; Tue, 28 Jan 2014 14:50:52 -0800 (PST) In-Reply-To: References: <1390909590119-5880672.post@n5.nabble.com> <52E7A9D8.30604@freebsd.org> Date: Tue, 28 Jan 2014 22:50:52 +0000 X-Google-Sender-Auth: w6PDynSsQKBZiQIQzs1fKBQzj8o Message-ID: Subject: Re: Jails on fib problem From: Nikolay Denev To: zaphod@berentweb.com Content-Type: text/plain; charset=ISO-8859-1 Cc: "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 
2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 22:50:53 -0000

On Tue, Jan 28, 2014 at 5:17 PM, Beeblebrox wrote:
>> what's a fib device? Do you mean each jail has a different default fib?
>> you are not using vimage jails?
>
> Hi Julian.
> * No vimage
> * All jails use the same fib. /etc/rc.conf:
> cloned_interfaces="lo2"
> ifconfig_lo2="inet 127.0.1.1/28"
> static_routes="jail default"
> route_jail="default 127.0.1.1 -fib 1"
> route_default="default 192.168.1.1"
>
>> so they all have the same address?? can you even do that? or you mean that
>> they all have the same default route?
> I mean same default route, jail IP's start from 127.0.1.2/32 and go to
> 127.0.1.6/32
> jail.conf assigns fib with "exec.fib = 1;"
> jails on the 127.0.1.1/28 subnet range should be able to route traffic
> through the 127.0.0.1 gateway regardless of the fact that the jails
> themselves reside on a /32 subnet. However, it's not working smoothly
>
>> fibs don't have devices.
> Yes, I know - a misnomer.
>
> setfib 1 netstat -rn
> Destination        Gateway      Flags    Netif Expire
> default            127.0.1.1    UGS      lo2
> 127.0.0.1          link#3       UH       lo0
> 127.0.1.1          link#4       UH       lo2
> 127.0.1.2          link#4       UH       lo2
> 127.0.1.3          link#4       UH       lo2
> 127.0.1.4          link#4       UH       lo2
> 192.168.1.0/24     link#1       U        re0 (Ext_If)
> 192.168.2.0/26     link#2       U        re1 (Lan_If)
>
> To complicate things further, I also have a vboxnet0 for VBox guests.
> 127.0.1.2 is a dns jail for example. The Internal LAN clients,
> vboxnet0 guests and lo0 need to resolve names from that jail.
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

You can't use 127/8 addresses and expect them to be routed/forwarded. See rfc1122.
--Nikolay From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 23:29:00 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DDF75FB3 for ; Tue, 28 Jan 2014 23:29:00 +0000 (UTC) Received: from mail-ig0-x232.google.com (mail-ig0-x232.google.com [IPv6:2607:f8b0:4001:c05::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id AC74A158D for ; Tue, 28 Jan 2014 23:29:00 +0000 (UTC) Received: by mail-ig0-f178.google.com with SMTP id uq10so3086975igb.5 for ; Tue, 28 Jan 2014 15:29:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:content-type; bh=zClW/XfweItmgB6x4uQgGMoKtWGxMYr9oVqjZycfAAw=; b=tDJtkBk16eYXhOvovvf+jHrFYa/P4ICK16u5ENaY7dexEogwzmpJOn7rKW79pdK+xs Rkc1U9G5sQvEk5kqzHThK2Nvc1c0Q4MGQVJhjvchYEuhMKAtbR/R8k0l+tuZ/nHJ/tyg 5Le19Hg1lFmMIlt587eY6tSaEnJfaE7TplA6FpKFCn00EJq/FSUb4yn7YnVWIYWKlwPS 2c9gid5NBGli9pnQsjBlizG1nGjGv6YdRnYWlcZeqeTJkWY7JsRjbnW9nNeIdF4WJHUZ JerpkkShDSvLP/D1Kw/4wzj45oQReNO3L+rs4c/1TPdexfQJlhd+tM4WMJcHseuYyVjU WhEg== MIME-Version: 1.0 X-Received: by 10.51.17.101 with SMTP id gd5mr5486745igd.25.1390951739913; Tue, 28 Jan 2014 15:28:59 -0800 (PST) Sender: jdavidlists@gmail.com Received: by 10.42.170.8 with HTTP; Tue, 28 Jan 2014 15:28:59 -0800 (PST) In-Reply-To: <20140128021450.GY13704@funkthat.com> References: <20140128002826.GU13704@funkthat.com> <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca> <20140128021450.GY13704@funkthat.com> Date: Tue, 28 Jan 2014 18:28:59 -0500 X-Google-Sender-Auth: kfx4cadvc10boT96qtjos3WGgc0 Message-ID: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
From: J David To: freebsd-net@freebsd.org, jmg@funkthat.com Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 23:29:00 -0000

On Mon, Jan 27, 2014 at 9:14 PM, John-Mark Gurney wrote:
> P.S. if someone wants to submit a patch to mbuf.9 to update the docs
> that would be helpful... I'll review and commit... and m_append is
> also undocumented...

Would something like this be a start in that direction?

http://pastebin.com/UVir1BET

This is all very new to me, so I apologize if that's completely wrong.

m_append does appear to be documented; it's between m_adj and m_prepend. (At least it is on 9.2, which is the latest tree I have access to.)

It does also look like m_getm is just a macro in mbuf.h that calls m_getm2 with flags set to M_PKTHDR, not a function as described in the man page. It was not immediately obvious whether that is intentional, something that should be fixed, or whether it is meant to be treated as a function from an API standpoint.

Thanks!
From owner-freebsd-net@FreeBSD.ORG Tue Jan 28 23:41:38 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9A8F6782 for ; Tue, 28 Jan 2014 23:41:38 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 5D28916B2 for ; Tue, 28 Jan 2014 23:41:37 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s0SNfbqk082788 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 28 Jan 2014 15:41:37 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s0SNfaqM082787; Tue, 28 Jan 2014 15:41:36 -0800 (PST) (envelope-from jmg) Date: Tue, 28 Jan 2014 15:41:36 -0800 From: John-Mark Gurney To: J David Subject: Re: Terrible NFS performance under 9.2-RELEASE? Message-ID: <20140128234136.GJ13704@funkthat.com> Mail-Followup-To: J David , freebsd-net@freebsd.org References: <20140128002826.GU13704@funkthat.com> <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca> <20140128021450.GY13704@funkthat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? 
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Tue, 28 Jan 2014 15:41:37 -0800 (PST) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jan 2014 23:41:38 -0000

J David wrote this message on Tue, Jan 28, 2014 at 18:28 -0500:
> On Mon, Jan 27, 2014 at 9:14 PM, John-Mark Gurney wrote:
> > P.S. if someone wants to submit a patch to mbuf.9 to update the docs
> > that would be helpful... I'll review and commit... and m_append is
> > also undocumented...
>
> Would something like this be a start in that direction?
>
> http://pastebin.com/UVir1BET

It might be better to move most of m_getm's docs under m_getm2, and document that m_getm is just m_getm2 w/ the M_PKTHDR flag set. Could you also document that only M_PKTHDR and M_EOR are valid flags for m_getm2?

> This is all very new to me, so I apologize if that's completely wrong.

Nope, a good first start...

> m_append does appear to be documented; it's between m_adj and
> m_prepend. (At least it is on 9.2, which is the latest tree I have
> access to.)

You are correct.. the problem is that the MLINK isn't set up in the Makefile, so:

	$ man m_append
	No manual entry for m_append

I've fixed that, r261254...

> It does also look like m_getm is just a macro in mbuf.h that calls
> m_getm2 with flags set to M_PKTHDR, not a function as described in the
> man page. It was not immediately obvious if that was intentional or
> something that should be fixed or if it's intentionally meant to be
> treated as a function from an API standpoint.

It's common to use a macro when the change isn't complicated, i.e. just adding a flag...

Thanks.

-- 
John-Mark Gurney    Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."
From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 00:06:50 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B602CF4A for ; Wed, 29 Jan 2014 00:06:50 +0000 (UTC) Received: from mail-ie0-x22a.google.com (mail-ie0-x22a.google.com [IPv6:2607:f8b0:4001:c03::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 83AF5185C for ; Wed, 29 Jan 2014 00:06:50 +0000 (UTC) Received: by mail-ie0-f170.google.com with SMTP id u16so1377678iet.1 for ; Tue, 28 Jan 2014 16:06:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=xcf2TVbtRhc6egPITwsSspZRXi7QWrV3eOZVZG3Qaak=; b=F5B+6qVoXhrwQ9tUyEFzzM6m2huoMHUTEJ46X3WW15FZF3PVb5Md3ZDnf3jY/dauhf FCjjVtLts3Jd8OawDaVe/OB30hT07dcwl7NwAZzGHxL54IjsxMnXgTDqrtecDVUBkTgq O75SBdBiaLMD4DIvBaLL2jgrEuqbNlZkMhU6SDvkZnmimmGJUnyqMGsqKU21ouzZZZSq 4CMrvbJtZDLytZsQEUDCN4ji9/EJExfN3F8ZNB0h6ZUTaXqfILx1sbYzJ3jxfaxKpuak sQNvsQtowA2UFd0gc+BFeKIbsYddYfzIjdw0x6mA3S07eWOYqwdY+YRIm/wwCWvnWMQG ATlg== MIME-Version: 1.0 X-Received: by 10.50.154.102 with SMTP id vn6mr25502784igb.1.1390954009442; Tue, 28 Jan 2014 16:06:49 -0800 (PST) Sender: jdavidlists@gmail.com Received: by 10.42.170.8 with HTTP; Tue, 28 Jan 2014 16:06:49 -0800 (PST) In-Reply-To: <372707859.17587309.1390923341323.JavaMail.root@uoguelph.ca> References: <20140128021450.GY13704@funkthat.com> <372707859.17587309.1390923341323.JavaMail.root@uoguelph.ca> Date: Tue, 28 Jan 2014 19:06:49 -0500 X-Google-Sender-Auth: qNyyKJhBRRg54dHCKEQmwcr46rw Message-ID: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
From: J David
To: Rick Macklem
Cc: freebsd-net@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Jan 28, 2014 at 10:35 AM, Rick Macklem wrote:
> Since messages are sent quickly and then mbufs released, except for
> the DRC in the server, I think avoiding large allocations for server
> replies that may be cached is the case to try and avoid. Fortunately
> the large replies will be for read and readdir and these don't need
> to be cached by the DRC. As such, a patch that uses 4K clusters in
> the server for read, readdir and 4K clusters for write requests in
> the client, should be appropriate, I think?

m_getm2 appears to consistently produce "right-sized" results. The relevant code is:

	while (len > 0) {
		if (len > MCLBYTES)
			mb = m_getjcl(how, type, (flags & M_PKTHDR),
			    MJUMPAGESIZE);
		else if (len >= MINCLSIZE)
			mb = m_getcl(how, type, (flags & M_PKTHDR));
		else if (flags & M_PKTHDR)
			mb = m_gethdr(how, type);
		else
			mb = m_get(how, type);
		/* ... */
	}

So it allocates the shortest possible chain and uses the best-fit cluster for the last (or only) block in the chain.

It's probably the use of this function in m_uiotombuf, or somewhere very similar, that prevents tools like iperf from encountering this same issue.

Getting this same logic into the NFS code seems like it would be a good thing, in terms of reducing code duplication, increasing performance, and leveraging a well-tested code path.

It may raise portability concerns, but it does seem likely that other OS's to which the NFS code could potentially be ported have similar mechanisms these days. Possibly it would be worthwhile to examine whether the NFS code could choose a slightly different point of abstraction.
Or, if that's undesirable, maybe asking the hypothetical person doing such a port to cross that bridge when they come to it is not unreasonable, since that would be the person most likely to be intimately familiar with the relevant details of both OS's.

Also, looking at GAWollman's patch, an mbuf+cluster allocator that kicks back a prewired iovec seems really handy. Is that something that would be useful elsewhere in the kernel, or is NFS just kind of a special case because it's just moving data around, not across weird boundaries like device drivers and anything user mode-facing does?

Thanks!

From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 00:32:22 2014
Delivered-To: freebsd-net@freebsd.org
Date: Tue, 28 Jan 2014 19:32:13 -0500 (EST)
From: Rick Macklem
To: J David
Message-ID: <312973812.17975525.1390955533440.JavaMail.root@uoguelph.ca>
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_17975523_770461322.1390955533437"
X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790)
Cc: freebsd-net@freebsd.org, Garrett Wollman

------=_Part_17975523_770461322.1390955533437
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

J David wrote:
> On Tue, Jan 28, 2014 at 10:35 AM, Rick Macklem wrote:
> > Since messages are sent quickly and then mbufs released, except for
> > the DRC in the server, I think avoiding large allocations for server
> > replies that may be cached is the case to try and avoid. Fortunately
> > the large replies will be for read and readdir and these don't need
> > to be cached by the DRC. As such, a patch that uses 4K clusters in
> > the server for read, readdir and 4K clusters for write requests in
> > the client, should be appropriate, I think?
>
> m_getm2 appears to consistently produce "right-sized" results. The
> relevant code is:
>
>	while (len > 0) {
>		if (len > MCLBYTES)
>			mb = m_getjcl(how, type, (flags & M_PKTHDR),
>			    MJUMPAGESIZE);
>		else if (len >= MINCLSIZE)
>			mb = m_getcl(how, type, (flags & M_PKTHDR));
>		else if (flags & M_PKTHDR)
>			mb = m_gethdr(how, type);
>		else
>			mb = m_get(how, type);
>		/* ... */
>	}
>
> So it allocates the shortest possible chain and uses the best-fit
> cluster for the last (or only) block in the chain.
>
> It's probably the use of this function in m_uiotombuf or somewhere
> very similar that prevents tools like iperf from encountering this
> same issue.
> > Getting this same logic into the NFS code seems like it would be a > good thing, in terms of reducing code duplication, increasing > performance, and leveraging a well-tested code path. > For the server generating read replies, I suspect this is the case and that is what Garrett Wollman's patch does. However, readdir builds up the reply in small chunks via NFSM_BUILD() and this will require an extra argument that says "allocate a big cluster". Since it builds the reply in small chunks, it cannot use m_getm2(). I haven't looked at the client side write yet, so I don't know if m_getm2() is feasible for it or not. Hopefully Garrett and/or you will be able to do some testing of it and report back w.r.t. performance gains, etc. Once we have that, we can decide if this is an appropriate commit to head. Since I suspect it will take some time for Garrett to do this, please try my simple patch in your test environment, mostly to determine if the fail count goes to 0 (and also count calls to m_collapse() without/with the patch, since those will impact performance, too). Thanks in advance for trying the patch, rick ps: Attached again, just in case you don't already have it. > It may raise portability concerns, but it does seem likely that other > OS's to which the NFS code could potentially be ported have similar > mechanisms these days. Possibly it would be worthwhile to examine > whether the NFS code could choose a slightly different point of > abstraction. Or, if that's undesirable, maybe asking the > hypothetical > person doing such a port to cross that bridge when they come to it is > not unreasonable, since that would be the person most likely to be > intimately familiar with the relevant details of both OS's. > As I mentioned before, I am no longer concerned about portability. 
The discussion about portability was meant to explain why the code was written the way it was and, yes, I did note that "portability is nice" but did not intend to imply that that should limit modifications to the code that improve it for FreeBSD. > Also, looking at GAWollman's patch, an mbuf+cluster allocator that > kicks back a prewired iovec seems really handy. Is that something > that would be useful elsewhere in the kernel, or is NFS just kind of > a > special case because it's just moving data around, not across weird > boundaries like device drivers and anything user mode-facing does? > > Thanks! > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > ------=_Part_17975523_770461322.1390955533437 Content-Type: text/x-patch; name=4kmcl.patch Content-Disposition: attachment; filename=4kmcl.patch Content-Transfer-Encoding: base64 LS0tIGZzL25mcy9uZnNwb3J0Lmguc2F2MgkyMDE0LTAxLTI2IDE4OjQzOjQ3LjAwMDAwMDAwMCAt MDUwMAorKysgZnMvbmZzL25mc3BvcnQuaAkyMDE0LTAxLTI2IDE5OjA0OjI3LjAwMDAwMDAwMCAt MDUwMApAQCAtMTUzLDE0ICsxNTMsMjcgQEAKIAkJCU1HRVRIRFIoKG0pLCBNX1dBSVRPSywgTVRf REFUQSk7IAlcCiAJCX0gCQkJCQkJXAogCX0gd2hpbGUgKDApCi0jZGVmaW5lCU5GU01DTEdFVCht LCB3KQlkbyB7IAkJCQkJXAotCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOyAJCQlcCi0J CXdoaWxlICgobSkgPT0gTlVMTCApIHsgCQkJCVwKLQkJCSh2b2lkKSBuZnNfY2F0bmFwKFBaRVJP LCAwLCAibmZzbWdldCIpOwlcCi0JCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOyAJCVwK LQkJfSAJCQkJCQlcCi0JCU1DTEdFVCgobSksICh3KSk7CQkJCVwKKyNpZiBNSlVNUEFHRVNJWkUg PiBNQ0xCWVRFUworI2RlZmluZQlORlNNQ0xHRVQobSwgdykJZG8gewkgCQkJCQlcCisJCShtKSA9 IG1fZ2V0amNsKE1fV0FJVE9LLCBNVF9EQVRBLCAwLCBNSlVNUEFHRVNJWkUpOwlcCisJCXdoaWxl ICgobSkgPT0gTlVMTCkgewkgCQkJCVwKKwkJCSh2b2lkKW5mc19jYXRuYXAoUFpFUk8sIDAsICJu ZnNtZ2V0Iik7CQlcCisJCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOwkgCQlcCisJCQlp 
ZiAoKG0pICE9IE5VTEwpCQkJCVwKKwkJCQlNQ0xHRVQoKG0pLCAodykpOwkJCVwKKwkJfQkgCQkJ CQkJXAogCX0gd2hpbGUgKDApCisjZWxzZQorI2RlZmluZQlORlNNQ0xHRVQobSwgdykJZG8gewkg CQkJCQlcCisJCShtKSA9IG1fZ2V0amNsKE1fV0FJVE9LLCBNVF9EQVRBLCAwLCBNQ0xCWVRFUyk7 CQlcCisJCXdoaWxlICgobSkgPT0gTlVMTCkgewkgCQkJCVwKKwkJCSh2b2lkKW5mc19jYXRuYXAo UFpFUk8sIDAsICJuZnNtZ2V0Iik7CQlcCisJCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEp OwkgCQlcCisJCQlpZiAoKG0pICE9IE5VTEwpCQkJCVwKKwkJCQlNQ0xHRVQoKG0pLCAodykpOwkJ CVwKKwkJfQkgCQkJCQkJXAorCX0gd2hpbGUgKDApCisjZW5kaWYKICNkZWZpbmUJTkZTTUNMR0VU SERSKG0sIHcpIGRvIHsgCQkJCVwKIAkJTUdFVEhEUigobSksIE1fV0FJVE9LLCBNVF9EQVRBKTsJ CVwKIAkJd2hpbGUgKChtKSA9PSBOVUxMICkgeyAJCQkJXAotLS0gZnMvbmZzc2VydmVyL25mc19u ZnNkcG9ydC5jLnNhdjIJMjAxNC0wMS0yNiAxODo1NDoyOS4wMDAwMDAwMDAgLTA1MDAKKysrIGZz L25mc3NlcnZlci9uZnNfbmZzZHBvcnQuYwkyMDE0LTAxLTI2IDE4OjU2OjA4LjAwMDAwMDAwMCAt MDUwMApAQCAtNTY2LDggKzU2Niw3IEBAIG5mc3Zub19yZWFkbGluayhzdHJ1Y3Qgdm5vZGUgKnZw LCBzdHJ1Y3QKIAlsZW4gPSAwOwogCWkgPSAwOwogCXdoaWxlIChsZW4gPCBORlNfTUFYUEFUSExF TikgewotCQlORlNNR0VUKG1wKTsKLQkJTUNMR0VUKG1wLCBNX1dBSVRPSyk7CisJCU5GU01DTEdF VChtcCwgTV9XQUlUT0spOwogCQltcC0+bV9sZW4gPSBORlNNU0laKG1wKTsKIAkJaWYgKGxlbiA9 PSAwKSB7CiAJCQltcDMgPSBtcDIgPSBtcDsKQEAgLTYzNiw4ICs2MzUsNyBAQCBuZnN2bm9fcmVh ZChzdHJ1Y3Qgdm5vZGUgKnZwLCBvZmZfdCBvZmYsCiAJICovCiAJaSA9IDA7CiAJd2hpbGUgKGxl ZnQgPiAwKSB7Ci0JCU5GU01HRVQobSk7Ci0JCU1DTEdFVChtLCBNX1dBSVRPSyk7CisJCU5GU01D TEdFVChtLCBNX1dBSVRPSyk7CiAJCW0tPm1fbGVuID0gMDsKIAkJc2l6ID0gbWluKE1fVFJBSUxJ TkdTUEFDRShtKSwgbGVmdCk7CiAJCWxlZnQgLT0gc2l6Owo= ------=_Part_17975523_770461322.1390955533437-- From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 00:33:23 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 48279511 for ; Wed, 29 Jan 2014 00:33:23 +0000 (UTC) Received: from mail-ie0-x22e.google.com 
Date: Tue, 28 Jan 2014 19:33:22 -0500
Sender: jdavidlists@gmail.com
In-Reply-To: <20140128234136.GJ13704@funkthat.com>
References: <20140128002826.GU13704@funkthat.com> <1415339672.17282775.1390872779067.JavaMail.root@uoguelph.ca> <20140128021450.GY13704@funkthat.com> <20140128234136.GJ13704@funkthat.com>
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
From: J David
To: freebsd-net@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Jan 28, 2014 at 6:41 PM, John-Mark Gurney wrote:
> It might be better to move most of m_getm's docs under m_getm2, and
> document that m_getm is just m_getm2 w/ M_PKTHDR flag set.
>
> Could you also document that only M_PKTHDR and M_EOR are valid
> flags for m_getm2?

OK, try this: http://pastebin.com/39kwExsc

> You are correct.. the problem is that the MLINK isn't setup in the
> Makefile, so:
> $ man m_append
> No manual entry for m_append

Oh drat, I just saw another one of those the other day and didn't make a note of it. Oh yeah, it's uma_zalloc_arg. Thanks, shell history!

> It's common to use a macro when the change isn't complicated, i.e.
> just adding a flag...

Sure, but it's a little unsettling that the man page is explicitly separated into macros and functions and this macro is the second entry under "The functions are:". It's like seeing a library book shelved in the wrong section.

Probably m_getm was a function for a long time and got turned into a macro when m_getm2 was born, and the documentation was either not updated or intentionally left as-is. Since I didn't know which was the case, I've left it where it is for now.

Thanks!
From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 02:37:27 2014
Delivered-To: freebsd-net@freebsd.org
Date: Tue, 28 Jan 2014 21:37:26 -0500
Sender: jdavidlists@gmail.com
In-Reply-To: <312973812.17975525.1390955533440.JavaMail.root@uoguelph.ca>
References: <312973812.17975525.1390955533440.JavaMail.root@uoguelph.ca>
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
From: J David
To: Rick Macklem
Cc: freebsd-net@freebsd.org, Garrett Wollman
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Jan 28, 2014 at 7:32 PM, Rick Macklem wrote:
> Hopefully Garrett and/or you will be able to do some testing of it
> and report back w.r.t. performance gains, etc.

OK, it has seen light testing. As predicted, the vtnet drops are eliminated and CPU load is reduced. The performance is also improved:

Test     Before     After
SeqWr      1506      7461
SeqRd       566    192015
RndRd       602    218730
RndWr        44     13972

All numbers in kiB/sec.

There were initially still some problems with lousy hostcache values on the client after the test, which is what causes the iperf performance to tank after the NFS test, but after a reboot of both sides and a fresh retest, I haven't reproduced that again. If it comes back, I'll try to figure out what's going on. But this definitely looks like a move in the right direction.

Thanks!
From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 05:54:46 2014
Delivered-To: freebsd-net@freebsd.org
Date: Wed, 29 Jan 2014 07:54:45 +0200
From: Beeblebrox
To: Nikolay Denev
Cc: "freebsd-net@freebsd.org"
Reply-To: zaphod@berentweb.com
Subject: Re: Jails on fib problem

> You can't use 127/8 addresses and expect them to be routed/forwarded.
> See rfc1122.
> --Nikolay

Thank you very much Nikolay. To correct the setup, I could either:

a. Remove fib-1, placing everything on fib-0 with the jail IPs remaining at 127/32, or
b. Migrate the jails and fib-1 to a 192.168 range.

Will I have various problems if I try to reconfigure as described in (a)?
Regards

From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 12:22:58 2014
Delivered-To: freebsd-net@freebsd.org
Date: Wed, 29 Jan 2014 14:22:56 +0200
From: Beeblebrox
To: Nikolay Denev
Cc: "freebsd-net@freebsd.org"
Reply-To: zaphod@berentweb.com
Subject: Re: Jails on fib problem

Since there was no answer on the a-or-b question, I assumed option (a) was also flawed, so I went with (b) and moved the jails to a 192.168 address range. Current rc.conf:

cloned_interfaces="lo2"
ifconfig_lo2="inet 192.168.2.110/28"
static_routes="jail default"
route_jail="default 192.168.2.110 -fib 1"
route_default="default 192.168.1.1"

# setfib 1 netstat -rn
Destination        Gateway          Flags    Netif  Expire
default            192.168.2.110    UGS      lo2
127.0.0.1          link#3           UH       lo0
192.168.1.0/24     link#1           U        re0
192.168.2.99       link#4           UH       lo2    (privoxy)
192.168.2.100      link#4           UH       lo2    (http cache)
192.168.2.110      link#4           UH       lo2
192.168.56.0/28    link#6           U        vboxnet0

Traffic for any internet IP gets passed to the http cache -> privoxy jail (99), but does not get forwarded to the 192.168.1.1 gateway. If I try to access the 192.168.1.1 ADSL modem page, it does come up correctly (I presume because it is within a defined address range in the routing table). What am I missing, so that traffic from the jails knows to exit via re0 and on to the default gateway? In pf.conf I have one NAT rule:

nat on $ExtIf from !($ExtIf) -> $ExtIf

Should I be natting on lo2 as well?

Regards.
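[For reference, one possible shape of the extra rule being asked about: explicitly NAT the jail subnet out the external interface. This is a sketch only, not tested; the macro names follow the poster's config, the subnet is inferred from the /28 above, and whether it actually fixes the forwarding depends on the fib setup.]

```pf
# pf.conf sketch (illustrative): translate traffic sourced from the
# jail subnet on lo2 as it leaves the external interface.
ExtIf="re0"
JailNet="192.168.2.96/28"
nat on $ExtIf from $JailNet to any -> ($ExtIf)
```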
From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 13:12:17 2014
Delivered-To: net@freebsd.org
Date: Wed, 29 Jan 2014 14:45:24 +0200
From: Vladislav Prodan
To: questions@freebsd.org
Cc: "net@freebsd.org"
Message-Id: <1390999493.115887823.pfbg2ep5@frv35.ukr.net>
Subject: Necessary to implement static NAT 1:1

I need to implement static 1:1 NAT:

10.1.2.3 -> 100.1.2.3
10.1.2.4 -> 100.1.2.4
10.1.2.5 -> 100.1.2.5
10.1.2.6 -> 100.1.2.6
...

There are over 20k such IP addresses. How would you suggest implementing this?

--
Vladislav V. Prodan
System & Network Administrator
http://support.od.ua
+380 67 4584408, +380 99 4060508
VVP88-RIPE

From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 15:25:48 2014
Delivered-To: freebsd-net@freebsd.org
Date: Wed, 29 Jan 2014 17:25:46 +0200
From: Beeblebrox
Cc: "freebsd-net@freebsd.org"
Reply-To: zaphod@berentweb.com
Subject: Re: Jails on fib problem

I forgot about this detail. From the ifconfig man page's FIB section:

    The FIB is not inherited, e.g. vlans or other sub-interfaces will use
    the default FIB (0) irrespective of the parent interface's FIB.

What alternatives are there to get this setup working? FIB, it seems, is not the answer?
From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 17:52:09 2014
Delivered-To: freebsd-net@freebsd.org
Date: Wed, 29 Jan 2014 18:52:04 +0100
From: Eric Masson
To: Mailing List FreeBSD Network, Mailing List FreeBSD ipfw
In-Reply-To: <868uu4rshh.fsf@srvbsdfenssv.interne.associated-bears.org> (Eric Masson's message of "Sat, 25 Jan 2014 16:28:10 +0100")
References: <868uu4rshh.fsf@srvbsdfenssv.interne.associated-bears.org>
Message-ID: <861tzqwu9n.fsf@srvbsdfenssv.interne.associated-bears.org>
Subject: Re: [FreeBSD 10.0] nat before vpn, incoming packets not translated

Eric Masson writes:

Hi,

No ideas on this subject? Forwarding to freebsd-ipfw.

Regards
Éric Masson

> Hi,
>
> I've set up a lab to experiment with a NAT-before-IPsec scenario.
> Architecture : > - 3 host only interfaces have been set up on the host > - 4 FreeBSD10 guests have been set up : > - 2 clients connected to their respective gateways via dedicated host > only interfaces. > - 2 gateways connected together via dedicated host only interface > > Client 1 setup : > <-----------------------------------------------------------------> > emss@client1:~ % more /etc/rc.conf > hostname="client1" > keymap="fr.iso.acc.kbd" > ifconfig_em0="inet 192.168.11.100 netmask 255.255.255.0" > ifconfig_em0_ipv6="inet6 accept_rtadv" > defaultrouter="192.168.11.15" > sshd_enable="YES" > dumpdev="AUTO" > sendmail_enable="NO" > sendmail_submit_enable="NO" > sendmail_outbound_enable="NO" > sendmail_msp_queue_enable="NO" > <-----------------------------------------------------------------> > > Gateway 1 setup : > <-----------------------------------------------------------------> > emss@gateway1:~ % more /etc/rc.conf > hostname="gateway1" > keymap="fr.iso.acc.kbd" > ifconfig_em1="inet 192.168.11.15 netmask 255.255.255.0" > ifconfig_em1_ipv6="inet6 accept_rtadv" > ifconfig_em0="inet 10.0.0.5 netmask 255.255.255.0" > gateway_enable="YES" > ipsec_enable="YES" > ipsec_file="/etc/ipsec.conf" > firewall_enable="YES" > firewall_script="/etc/ipfw.rules" > firewall_logging="YES" > sshd_enable="YES" > # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable > dumpdev="AUTO" > sendmail_enable="NO" > sendmail_submit_enable="NO" > sendmail_outbound_enable="NO" > sendmail_msp_queue_enable="NO" > emss@gateway1:~ % more /etc/ipfw.rules > #!/bin/sh > cmd="/sbin/ipfw" > $cmd -f flush > $cmd add 00100 nat 100 all from 192.168.11.0/24 to 192.168.21.0/24 > $cmd nat 100 config log ip 172.16.0.1 reverse > emss@gateway1:~ % more /etc/ipsec.conf > flush; > spdflush; > > add 10.0.0.5 10.0.0.6 esp 0x1000 -E 3des-cbc "123456789012345678901234"; > add 10.0.0.6 10.0.0.5 esp 0x1001 -E 3des-cbc "432109876543210987654321"; > > add 10.0.0.5 10.0.0.6 ipcomp 0x2000 -C deflate; > add 
10.0.0.6 10.0.0.5 ipcomp 0x2001 -C deflate; > > spdadd 192.168.21.0/24 172.16.0.1/32 any -P in ipsec > ipcomp/tunnel/10.0.0.6-10.0.0.5/require > esp/tunnel/10.0.0.6-10.0.0.5/require; > > spdadd 172.16.0.1/32 192.168.21.0/24 any -P out ipsec > ipcomp/tunnel/10.0.0.5-10.0.0.6/require > esp/tunnel/10.0.0.5-10.0.0.6/require; > emss@gateway1:~ % more /boot/loader.conf > ipfw_load="YES" > ipfw_nat_load="YES" > > net.inet.ip.fw.default_to_accept="1" > <-----------------------------------------------------------------> > > Gateway 2 setup : > <-----------------------------------------------------------------> > emss@gateway2:~ % more /etc/rc.conf > hostname="gateway2" > keymap="fr.iso.acc.kbd" > ifconfig_em1="inet 10.0.0.6 netmask 255.255.255.0" > ifconfig_em0="inet 192.168.21.15 netmask 255.255.255.0" > ifconfig_em0_ipv6="inet6 accept_rtadv" > gateway_enable="YES" > ipsec_enable="YES" > ipsec_file="/etc/ipsec.conf" > sshd_enable="YES" > # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable > dumpdev="AUTO" > sendmail_enable="NO" > sendmail_submit_enable="NO" > sendmail_outbound_enable="NO" > sendmail_msp_queue_enable="NO" > emss@gateway2:~ % more /etc/ipsec.conf > flush; > spdflush; > > add 10.0.0.5 10.0.0.6 esp 0x1000 -E 3des-cbc "123456789012345678901234"; > add 10.0.0.6 10.0.0.5 esp 0x1001 -E 3des-cbc "432109876543210987654321"; > > add 10.0.0.5 10.0.0.6 ipcomp 0x2000 -C deflate; > add 10.0.0.6 10.0.0.5 ipcomp 0x2001 -C deflate; > > spdadd 192.168.21.0/24 172.16.0.1/32 any -P out ipsec > ipcomp/tunnel/10.0.0.6-10.0.0.5/require > esp/tunnel/10.0.0.6-10.0.0.5/require; > > spdadd 172.16.0.1/32 192.168.21.0/24 any -P in ipsec > ipcomp/tunnel/10.0.0.5-10.0.0.6/require > esp/tunnel/10.0.0.5-10.0.0.6/require; > <-----------------------------------------------------------------> > > Client 2 setup : > <-----------------------------------------------------------------> > emss@client2:~ % more /etc/rc.conf > hostname="client2" > keymap="fr.iso.acc.kbd" > 
ifconfig_em0="inet 192.168.21.100 netmask 255.255.255.0" > ifconfig_em0_ipv6="inet6 accept_rtadv" > defaultrouter="192.168.21.15" > sshd_enable="YES" > # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable > dumpdev="AUTO" > sendmail_enable="NO" > sendmail_submit_enable="NO" > sendmail_outbound_enable="NO" > sendmail_msp_queue_enable="NO" > <-----------------------------------------------------------------> > > Test setup by pinging client2 from client1 : > > On client1 : > emss@client1:~ % ping 192.168.21.100 > PING 192.168.21.100 (192.168.21.100): 56 data bytes > > On gateway1 inside interface : > > root@gateway1:~ # tcpdump -i em1 > 17:16:08.600154 IP 192.168.11.100 > 192.168.21.100: ICMP echo request, id 10499, seq 7207, length 64 > 17:16:08.600660 IP 192.168.11.100 > 192.168.21.100: ICMP echo request, id 59651, seq 213, length 64 > ... > > On gateway1 outside interface : > root@gateway1:~ # tcpdump -i em0 > 17:16:48.501317 IP 10.0.0.5 > 10.0.0.6: ESP(spi=0x00001000,seq=0x1ed4), length 128 > 17:16:48.501612 IP 10.0.0.5 > 10.0.0.6: ESP(spi=0x00001000,seq=0x1ed5), length 128 > 17:16:48.502665 IP 10.0.0.6 > 10.0.0.5: ESP(spi=0x00001001,seq=0x1e67), length 128 > 17:16:48.502938 IP 10.0.0.6 > 10.0.0.5: ESP(spi=0x00001001,seq=0x1e68), length 128 > ... > > On client2 : > root@client2:~ # tcpdump -i em0 > 17:14:17.671181 IP 172.16.0.1 > 192.168.21.100: ICMP echo request, id 59651, seq 107, length 64 > 17:14:17.671230 IP 192.168.21.100 > 172.16.0.1: ICMP echo reply, id 59651, seq 107, length 64 > ... > > So, the only remaining issue is that gateway1 doesn't nat back ipsec > decapsulated packets (if no nat in scenario, everything works fine). > > Setting net.inet.ip.fw.one_pass to 0 doesn't change anything. > > Any idea, please ? > > Regards > > Éric Masson -- Intéressant votre témoignage, quoique un peu long. Pourriez-vous en écrire davantage ! -+- LL in GNU n'a qu'un mot à dire : assez, encore ! 
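An editorial aside, not part of the thread: one thing worth checking in the setup above is whether the inbound leg of the translation is ever matched. A hedged sketch of a gateway1 ruleset with an explicit inbound nat rule follows; the rule numbers and the `in`/`out` qualifiers are illustrative additions, not Éric's actual configuration:

```shell
#!/bin/sh
# Sketch only (untested): variant of gateway1's /etc/ipfw.rules that
# makes both directions of the translation explicit, so replies to
# 172.16.0.1 decapsulated from the ESP tunnel are NATed back to
# 192.168.11.0/24.
cmd="/sbin/ipfw"
$cmd -f flush
# Configure the nat instance before any rule references it.
$cmd nat 100 config log ip 172.16.0.1 reverse
# Outbound leg: translate LAN sources before the SPD sees them.
$cmd add 00100 nat 100 all from 192.168.11.0/24 to 192.168.21.0/24 out
# Inbound leg: translate decapsulated packets addressed to the nat ip.
$cmd add 00200 nat 100 all from 192.168.21.0/24 to 172.16.0.1 in
```

Note also that whether decapsulated tunnel-mode packets traverse ipfw again at all is governed by the kernel's IPsec filtering knob (net.inet.ipsec.filtertunnel, 0 by default on recent FreeBSD), which could be the actual culprit here.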
-+- From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 18:08:46 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4B2BA4B0 for ; Wed, 29 Jan 2014 18:08:46 +0000 (UTC) Received: from exprod6og122.obsmtp.com (exprod6og122.obsmtp.com [64.18.1.238]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DBB421572 for ; Wed, 29 Jan 2014 18:08:45 +0000 (UTC) Received: from osprey.verisign.com ([216.168.239.75]) (using TLSv1) by exprod6ob122.postini.com ([64.18.5.12]) with SMTP ID DSNKUulDptsto1iLNMsm37kh2BdptmO2lk3b@postini.com; Wed, 29 Jan 2014 10:08:45 PST Received: from BRN1WNEXCHM01.vcorp.ad.vrsn.com (brn1wnexchm01.vcorp.ad.vrsn.com [10.173.152.255]) by osprey.verisign.com (8.13.6/8.13.4) with ESMTP id s0TI8ZUH017321 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=FAIL) for ; Wed, 29 Jan 2014 13:08:38 -0500 Received: from BRN1WNEXMBX01.vcorp.ad.vrsn.com ([::1]) by BRN1WNEXCHM01.vcorp.ad.vrsn.com ([::1]) with mapi id 14.02.0342.003; Wed, 29 Jan 2014 13:08:34 -0500 From: "Bentkofsky, Michael" To: "freebsd-net@freebsd.org" Subject: RE: kern/176446: [netinet] [patch] Concurrency in ixgbe Thread-Topic: kern/176446: [netinet] [patch] Concurrency in ixgbe Thread-Index: Ac8dG8u9zjMR+0+wTzaH1G8/XnwkLAAAUZjQ Date: Wed, 29 Jan 2014 18:08:33 +0000 Message-ID: <080FBD5B7A09F845842100A6DE79623346E60505@BRN1WNEXMBX01.vcorp.ad.vrsn.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.173.152.4] MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.17 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: 
Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 18:08:46 -0000 I believe this has been fixed in r240968. From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 18:54:09 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 69504390 for ; Wed, 29 Jan 2014 18:54:09 +0000 (UTC) Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2229519A4 for ; Wed, 29 Jan 2014 18:54:09 +0000 (UTC) Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1]) by khavrinen.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s0TIs7Qp047007 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA) for ; Wed, 29 Jan 2014 13:54:08 -0500 (EST) (envelope-from wollman@khavrinen.csail.mit.edu) Received: (from wollman@localhost) by khavrinen.csail.mit.edu (8.14.7/8.14.7/Submit) id s0TIs7K5047004; Wed, 29 Jan 2014 13:54:07 -0500 (EST) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21225.20047.947384.390241@khavrinen.csail.mit.edu> Date: Wed, 29 Jan 2014 13:54:07 -0500 From: Garrett Wollman To: freebsd-net@freebsd.org Subject: Big physically contiguous mbuf clusters X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (khavrinen.csail.mit.edu [127.0.0.1]); Wed, 29 Jan 2014 13:54:08 -0500 (EST) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and 
TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 18:54:09 -0000 Resolved: that mbuf clusters longer than one page ought not be supported. There is too much physical-memory fragmentation for them to be of use on a moderately active server. 9k mbufs are especially bad, since in the fragmented case they waste 3k per allocation. -GAWollman From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 19:23:00 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D119EA18 for ; Wed, 29 Jan 2014 19:23:00 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id ABFEF1D0A for ; Wed, 29 Jan 2014 19:23:00 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s0TJLcuF002771 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Jan 2014 11:21:38 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s0TJLbe6002770; Wed, 29 Jan 2014 11:21:37 -0800 (PST) (envelope-from jmg) Date: Wed, 29 Jan 2014 11:21:37 -0800 From: John-Mark Gurney To: Garrett Wollman Subject: Re: Big physically contiguous mbuf clusters Message-ID: <20140129192137.GF93141@funkthat.com> Mail-Followup-To: Garrett Wollman , freebsd-net@freebsd.org References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <21225.20047.947384.390241@khavrinen.csail.mit.edu> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Wed, 29 Jan 2014 11:21:38 -0800 (PST) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 19:23:00 -0000 Garrett Wollman wrote this message on Wed, Jan 29, 2014 at 13:54 -0500: > Resolved: that mbuf clusters longer than one page ought not be > supported. There is too much physical-memory fragmentation for them > to be of use on a moderately active server. 9k mbufs are especially > bad, since in the fragmented case they waste 3k per allocation. I agree, but I am split on removing the code, as there are still very broken controllers that may require them. In those cases, it might be helpful to have a tunable that lets you set how many jumbo frames are allocated at boot, with those pages never released back to the system... We definitely need to fix all the drivers that use MJUM9BYTES, which apparently are quite a few: http://fxr.watson.org/fxr/ident?im=excerpts;i=MJUM9BYTES -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." 
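The 3k waste figure quoted above is simple page arithmetic: a 9k cluster must span three physically contiguous 4k pages. A quick sanity check in shell (plain arithmetic, no kernel interfaces involved):

```shell
#!/bin/sh
# A 9k (MJUM9BYTES) cluster is 9216 bytes; with 4096-byte pages it
# needs ceil(9216/4096) contiguous pages, pinning the whole span.
cluster=9216
page=4096
pages=$(( (cluster + page - 1) / page ))
waste=$(( pages * page - cluster ))
echo "pages=$pages waste=$waste"   # prints: pages=3 waste=3072 (the 3k above)
```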
From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 19:26:01 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 34251BCC for ; Wed, 29 Jan 2014 19:26:01 +0000 (UTC) Received: from web01.jbserver.net (web01.jbserver.net [IPv6:2a00:d10:2000:e::3]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id ECD961D2C for ; Wed, 29 Jan 2014 19:26:00 +0000 (UTC) Received: from 75-138-17-190.fibertel.com.ar ([190.17.138.75] helo=[192.168.3.102]) by web01.jbserver.net with esmtpsa (TLSv1:DHE-RSA-CAMELLIA256-SHA:256) (Exim 4.82) (envelope-from ) id 1W8alp-0007w8-5W; Wed, 29 Jan 2014 20:25:57 +0100 Message-ID: <52E955BA.9060908@gont.com.ar> Date: Wed, 29 Jan 2014 16:25:46 -0300 From: Fernando Gont User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: FreeBSD Net Subject: Fwd: RFC 7112 on Implications of Oversized IPv6 Header Chains References: <20140129173044.D475C7FC17B@rfc-editor.org> In-Reply-To: <20140129173044.D475C7FC17B@rfc-editor.org> X-Enigmail-Version: 1.5.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 19:26:01 -0000 Folks, FYI. 
This one has important implications -- it allows stateless filtering in IPv6 (otherwise not really possible) -------- Original Message -------- Subject: RFC 7112 on Implications of Oversized IPv6 Header Chains Date: Wed, 29 Jan 2014 09:30:44 -0800 (PST) From: rfc-editor@rfc-editor.org To: ietf-announce@ietf.org, rfc-dist@rfc-editor.org CC: drafts-update-ref@iana.org, ipv6@ietf.org, rfc-editor@rfc-editor.org A new Request for Comments is now available in online RFC libraries. RFC 7112 Title: Implications of Oversized IPv6 Header Chains Author: F. Gont, V. Manral, R. Bonica Status: Standards Track Stream: IETF Date: January 2014 Mailbox: fgont@si6networks.com, vishwas@ionosnetworks.com, rbonica@juniper.net Pages: 8 Characters: 15897 Updates: RFC 2460 I-D Tag: draft-ietf-6man-oversized-header-chain-09.txt URL: http://www.rfc-editor.org/rfc/rfc7112.txt The IPv6 specification allows IPv6 Header Chains of an arbitrary size. The specification also allows options that can, in turn, extend each of the headers. In those scenarios in which the IPv6 Header Chain or options are unusually long and packets are fragmented, or scenarios in which the fragment size is very small, the First Fragment of a packet may fail to include the entire IPv6 Header Chain. This document discusses the interoperability and security problems of such traffic, and updates RFC 2460 such that the First Fragment of a packet is required to contain the entire IPv6 Header Chain. This document is a product of the IPv6 Maintenance Working Group of the IETF. This is now a Proposed Standard. STANDARDS TRACK: This document specifies an Internet standards track protocol for the Internet community,and requests discussion and suggestions for improvements. Please refer to the current edition of the Internet Official Protocol Standards (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited. This announcement is sent to the IETF-Announce and rfc-dist lists. 
To subscribe or unsubscribe, see http://www.ietf.org/mailman/listinfo/ietf-announce http://mailman.rfc-editor.org/mailman/listinfo/rfc-dist For searching the RFC series, see http://www.rfc-editor.org/search/rfc_search.php For downloading RFCs, see http://www.rfc-editor.org/rfc.html Requests for special distribution should be addressed to either the author of the RFC in question, or to rfc-editor@rfc-editor.org. Unless specifically noted otherwise on the RFC itself, all RFCs are for unlimited distribution. The RFC Editor Team Association Management Solutions, LLC -------------------------------------------------------------------- IETF IPv6 working group mailing list ipv6@ietf.org Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6 -------------------------------------------------------------------- -- Fernando Gont e-mail: fernando@gont.com.ar || fgont@si6networks.com PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1 From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 21:30:01 2014 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CFFDAE49 for ; Wed, 29 Jan 2014 21:30:01 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id BA9701753 for ; Wed, 29 Jan 2014 21:30:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0TLU1vh067521 for ; Wed, 29 Jan 2014 21:30:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0TLU1dp067519; Wed, 29 Jan 2014 21:30:01 GMT (envelope-from gnats) Date: 
Wed, 29 Jan 2014 21:30:01 GMT Message-Id: <201401292130.s0TLU1dp067519@freefall.freebsd.org> To: freebsd-net@FreeBSD.org Cc: From: =?ISO-8859-1?Q?Olivier_Cochard=2DLabb=E9?= Subject: Re: kern/177905: [xl] [panic] ifmedia_set when pluging CardBus LAN card, xl(4) driver X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list Reply-To: =?ISO-8859-1?Q?Olivier_Cochard=2DLabb=E9?= List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 21:30:01 -0000 The following reply was made to PR kern/177905; it has been noted by GNATS. From: =?ISO-8859-1?Q?Olivier_Cochard=2DLabb=E9?= To: bug-followup@freebsd.org, olivier@cochard.me Cc: Subject: Re: kern/177905: [xl] [panic] ifmedia_set when pluging CardBus LAN card, xl(4) driver Date: Wed, 29 Jan 2014 22:25:52 +0100 --001a11362cd417b57a04f12297a8 Content-Type: text/plain; charset=ISO-8859-1 Just a keepalive: Still the same problem on FreeBSD 10.0. --001a11362cd417b57a04f12297a8 Content-Type: text/html; charset=ISO-8859-1
--001a11362cd417b57a04f12297a8-- From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 22:21:22 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6B0457D4 for ; Wed, 29 Jan 2014 22:21:22 +0000 (UTC) Received: from mail-qc0-x22a.google.com (mail-qc0-x22a.google.com [IPv6:2607:f8b0:400d:c01::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2A19D1CED for ; Wed, 29 Jan 2014 22:21:22 +0000 (UTC) Received: by mail-qc0-f170.google.com with SMTP id e9so3859618qcy.1 for ; Wed, 29 Jan 2014 14:21:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=UnyI2eYYxSEcf7NFs3EDoEZfO6pNK1Ca8wpGCcfymGE=; b=pmgHc5QV8KHTmF2/SPKALh0qx25Adtrgx0FOEBCRWWrkiSTZ94nRURq7e2fgpQh9gL Hjzjya9Xeg3m3jz3CgscLPx9oCaT6A+Oyno6w/JvhNcg57zRoEJm6potOb42gbGFRWgW /hMTM7sFaqIoIsUziFvqA1UF4KZzjzlzIfETKkQe9s4DSnMadH/rJINMvo+j+9Hs8nsa ncH6OI4pLP1kK8rl3rrwpnCsQIx/EyW08r6CRsRx65E3jGerpwpyPQVS40ImOKO372wL xPZY9cXe1QwSkWQXWphPVRjM83MrpdKtFvLbOEPUr1FUk0IxVCE5BSkAEwGe3L5z/YEj 1YEw== MIME-Version: 1.0 X-Received: by 10.224.52.3 with SMTP id f3mr16517773qag.26.1391034081362; Wed, 29 Jan 2014 14:21:21 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.52.8 with HTTP; Wed, 29 Jan 2014 14:21:21 -0800 (PST) In-Reply-To: <21225.20047.947384.390241@khavrinen.csail.mit.edu> References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> Date: Wed, 29 Jan 2014 14:21:21 -0800 X-Google-Sender-Auth: h3t7oDAulNFeRilUMr0HheGsL3c Message-ID: Subject: Re: Big physically contiguous mbuf clusters From: Adrian Chadd To: Garrett Wollman Content-Type: text/plain; charset=ISO-8859-1 Cc: 
FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 22:21:22 -0000 Hi, On 29 January 2014 10:54, Garrett Wollman wrote: > Resolved: that mbuf clusters longer than one page ought not be > supported. There is too much physical-memory fragmentation for them > to be of use on a moderately active server. 9k mbufs are especially > bad, since in the fragmented case they waste 3k per allocation. I've been wondering whether it'd be feasible to teach the physical memory allocator about >page sized allocations and to create zones of slightly more physically contiguous memory. For servers with lots of memory we could then keep these around and only dip into them for temporary allocations (eg not VM pages that may be held for some unknown amount of time.) Question is - can we enforce that kind of behaviour? 
-a From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 22:26:34 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DD2E2A3E; Wed, 29 Jan 2014 22:26:34 +0000 (UTC) Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A94701D23; Wed, 29 Jan 2014 22:26:34 +0000 (UTC) Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1]) by khavrinen.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s0TMQXeC049098 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA); Wed, 29 Jan 2014 17:26:33 -0500 (EST) (envelope-from wollman@khavrinen.csail.mit.edu) Received: (from wollman@localhost) by khavrinen.csail.mit.edu (8.14.7/8.14.7/Submit) id s0TMQXxo049095; Wed, 29 Jan 2014 17:26:33 -0500 (EST) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21225.32793.237629.329516@khavrinen.csail.mit.edu> Date: Wed, 29 Jan 2014 17:26:33 -0500 From: Garrett Wollman To: Adrian Chadd Subject: Re: Big physically contiguous mbuf clusters In-Reply-To: References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (khavrinen.csail.mit.edu [127.0.0.1]); Wed, 29 Jan 2014 17:26:33 -0500 (EST) Cc: FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 
22:26:34 -0000 < said: > For servers with lots of memory we could then keep these around and > only dip into them for temporary allocations (eg not VM pages that may > be held for some unknown amount of time.) mbufs may also be held for some unknown amount of time, so I don't think that helps at all. -GAWollman From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 22:27:19 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 35F7DBE6; Wed, 29 Jan 2014 22:27:19 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E8CE81D3A; Wed, 29 Jan 2014 22:27:18 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s0TMRE5r006197 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Jan 2014 14:27:14 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s0TMREGH006196; Wed, 29 Jan 2014 14:27:14 -0800 (PST) (envelope-from jmg) Date: Wed, 29 Jan 2014 14:27:14 -0800 From: John-Mark Gurney To: Adrian Chadd Subject: Re: Big physically contiguous mbuf clusters Message-ID: <20140129222714.GK93141@funkthat.com> Mail-Followup-To: Adrian Chadd , Garrett Wollman , FreeBSD Net References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: 
http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Wed, 29 Jan 2014 14:27:14 -0800 (PST) Cc: Garrett Wollman , FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 22:27:19 -0000 Adrian Chadd wrote this message on Wed, Jan 29, 2014 at 14:21 -0800: > On 29 January 2014 10:54, Garrett Wollman wrote: > > Resolved: that mbuf clusters longer than one page ought not be > > supported. There is too much physical-memory fragmentation for them > > to be of use on a moderately active server. 9k mbufs are especially > > bad, since in the fragmented case they waste 3k per allocation. > > I've been wondering whether it'd be feasible to teach the physical > memory allocator about >page sized allocations and to create zones of > slightly more physically contiguous memory. > > For servers with lots of memory we could then keep these around and > only dip into them for temporary allocations (eg not VM pages that may > be held for some unknown amount of time.) > > Question is - can we enforce that kind of behaviour? It shouldn't be too hard to do... Since everything pretty much goes through uma we can adopt a scheme similar to what Solaris does (read Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources)... Instead of dealing w/ page size allocations, everything is larger, say 16KB, and broken down from there... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." 
From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 23:01:45 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 89D9AE0D; Wed, 29 Jan 2014 23:01:45 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 27134108D; Wed, 29 Jan 2014 23:01:44 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEAIiH6VKDaFve/2dsb2JhbABZg0RWgn65QU+BGXSCJQEBAQMBAQEBIAQnHQECCwUWGAICDRkCKQEJJgYIBwQBGQMEh1wIDaploEYXgSmNBQEBGzQHgm+BSQSJSYp3gRWEBZBtg0seMYEEOQ X-IronPort-AV: E=Sophos;i="4.95,744,1384318800"; d="scan'208";a="91674231" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 29 Jan 2014 18:01:43 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id A3442B40A3; Wed, 29 Jan 2014 18:01:43 -0500 (EST) Date: Wed, 29 Jan 2014 18:01:43 -0500 (EST) From: Rick Macklem To: J David Message-ID: <1352428787.18632865.1391036503658.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, Garrett Wollman , Bryan Venteicher X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 23:01:45 -0000 J David wrote: > On Tue, Jan 28, 2014 at 7:32 PM, Rick Macklem > wrote: > > Hopefully Garrett and/or you will be able to do some testing of it > > and report back w.r.t. performance gains, etc. > > OK, it has seen light testing. > > As predicted the vtnet drops are eliminated and CPU load is reduced. > Ok, that's good news. Bryan, is increasing VTNET_MAX_TX_SEGS in the driver feasible? However, I do suspect we'll be putting a refined version of the patch in head someday (maybe April, sooner would have to be committed by someone else). I suspect that Garrett's code for server read will work well and I'll cobble something together for server readdir and client write. > The performance is also improved: > > Test Before After > SeqWr 1506 7461 > SeqRd 566 192015 > RndRd 602 218730 > RndWr 44 13972 > > All numbers in kiB/sec. > If you get the chance, you can try a few tunables on the server. vfs.nfsd.fha.enable=0 - ken@ found that FHA was necessary for ZFS exports, to avoid out-of-order reads from confusing ZFS's sequential reading heuristic. However, FHA also means that all readaheads for a file are serialized with the reads for the file (same fh->same nfsd thread). Somehow, it seems to me that doing reads concurrently in the server (given shared vnode locks) could be a good thing. --> I wonder what the story is for UFS? So, it would be interesting to see what disabling FHA does for the sequential read test. 
I think I already mentioned the DRC cache ones: vfs.nfsd.tcphighwater=100000 vfs.nfsd.tcpcachetimeo=600 (actually I think Garrett uses 300) Good to see some progress, rick ps: Daniel reports that he will be able to test the patch this weekend, to see if it fixes his problem that required TSO to be disabled, so we'll wait and see. > There were initially still some problems with lousy hostcache values > on the client after the test, which is what causes the iperf > performance to tank after the NFS test, but after a reboot of both > sides and fresh retest, I haven't reproduced that again. If it comes > back, I'll try to figure out what's going on. > Hopefully a networking type might know what is going on, because this is way out of my area of expertise. > But this definitely looks like a move in the right direction. > > Thanks! > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 23:03:07 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CF34CEE2; Wed, 29 Jan 2014 23:03:07 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7FB0C10A5; Wed, 29 Jan 2014 23:03:07 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEAIiH6VKDaFve/2dsb2JhbABZg0RWgn65QU+BGXSCJQEBAQMBAQEBICseAgsFFhgCAg0ZAikBCSYGCAcEARwEh1wIDaploEYXgSmNBQEBGzQHgm+BSQSJSYwMhAWQbYNLHjGBBDk X-IronPort-AV: E=Sophos;i="4.95,744,1384318800"; d="scan'208";a="91674458" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by 
esa-annu.net.uoguelph.ca with ESMTP; 29 Jan 2014 18:03:06 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 6E3F0B3F00; Wed, 29 Jan 2014 18:03:06 -0500 (EST) Date: Wed, 29 Jan 2014 18:03:06 -0500 (EST) From: Rick Macklem To: J David Message-ID: <1869703796.18633714.1391036586445.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, Garrett Wollman X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 23:03:07 -0000 J David wrote: > On Tue, Jan 28, 2014 at 7:32 PM, Rick Macklem > wrote: > > Hopefully Garrett and/or you will be able to do some testing of it > > and report back w.r.t. performance gains, etc. > > OK, it has seen light testing. > > As predicted the vtnet drops are eliminated and CPU load is reduced. > Oh, and I forgot to say thanks for doing this testing, rick
> The performance is also improved:
>
> Test     Before      After
> SeqWr      1506       7461
> SeqRd       566     192015
> RndRd       602     218730
> RndWr        44      13972
>
> All numbers in kiB/sec.
>
> There were initially still some problems with lousy hostcache values > on the client after the test, which is what causes the iperf > performance to tank after the NFS test, but after a reboot of both > sides and fresh retest, I haven't reproduced that again. If it comes > back, I'll try to figure out what's going on. > > But this definitely looks like a move in the right direction. > > Thanks! 
> _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 23:08:44 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 615542BC; Wed, 29 Jan 2014 23:08:44 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 1393710F2; Wed, 29 Jan 2014 23:08:43 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: X-IronPort-AV: E=Sophos;i="4.95,744,1384318800"; d="scan'208";a="92187781" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 29 Jan 2014 18:08:31 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 8EDFCB4062; Wed, 29 Jan 2014 18:08:31 -0500 (EST) Date: Wed, 29 Jan 2014 18:08:31 -0500 (EST) From: Rick Macklem To: J David Message-ID: <2032299860.18637455.1391036911579.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, Garrett Wollman X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 23:08:44 -0000 J David wrote: > On Tue, Jan 28, 2014 at 7:32 PM, Rick Macklem > wrote: > > Hopefully Garrett and/or you will be able to do some testing of it > > and report back w.r.t. performance gains, etc. > > OK, it has seen light testing. > > As predicted the vtnet drops are eliminated and CPU load is reduced. >
> The performance is also improved:
>
> Test     Before      After
> SeqWr      1506       7461
> SeqRd       566     192015
> RndRd       602     218730
> RndWr        44      13972
>
> All numbers in kiB/sec.
>
Oops, ignore most of what I said about FHA. I now see that the default is 8 nfsd per FH, which should handle readaheads. However, it does remind me that it would be nice to try cranking up the readahead value for the client mount. "-o readahead=8" would be a good one to try (you can go as high as 16, if you'd like). Have fun with it, rick > There were initially still some problems with lousy hostcache values > on the client after the test, which is what causes the iperf > performance to tank after the NFS test, but after a reboot of both > sides and fresh retest, I haven't reproduced that again. If it comes > back, I'll try to figure out what's going on. > > But this definitely looks like a move in the right direction. > > Thanks! 
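Rick's readahead suggestion above translates to a client-side mount roughly like the following; the server name and paths are placeholders, and readahead is the standard mount_nfs option (valid values 0-16):

```shell
# Hypothetical client-side test of the readahead suggestion.
mount -t nfs -o readahead=8 nfsserver:/export /mnt
# ... run the sequential read benchmark, then retry with readahead=16 ...
umount /mnt
```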
> _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 23:11:27 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4D55F561; Wed, 29 Jan 2014 23:11:27 +0000 (UTC) Received: from mail-pb0-x22e.google.com (mail-pb0-x22e.google.com [IPv6:2607:f8b0:400e:c01::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 191CA118C; Wed, 29 Jan 2014 23:11:27 +0000 (UTC) Received: by mail-pb0-f46.google.com with SMTP id um1so2364408pbc.19 for ; Wed, 29 Jan 2014 15:11:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:mail-followup-to:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=DY5Bn7ksljXPjIRVBlEkpA6d9joXOdvyunVoYwCRcQ0=; b=GCPISSPhA2hQ50K2ncGUe2/qjFs3fCX8ObF/Ag8uNzyYiEPPnYiNrn/2DUPTu49+fY SkHSmy6ch/L9zgekJUNHGu1mkeBzyEC5WWYjoBs7DBXU3ksGuUj5l2T6I73cLB7P/LMw kWIZubTW1GBkDWROPLHqLuLl2mWu/yY4GJKvTo1GRjpl4Hn1WEUSjPtj5MlzUtYw8AOE VgRh4UWiG37CfwIcTsUoZU4oL+TLQtywwBoidRZFcHGS/3pJxy6dwPD7fKQvsdtSbgYs 0SFbIt28QTmXozQ5MaNy6cr45kql59aLcWJYbtVBCxvTe4Ygcc9OiZo3igVeYDm7ewXc EqIA== X-Received: by 10.68.130.202 with SMTP id og10mr10744421pbb.133.1391037086785; Wed, 29 Jan 2014 15:11:26 -0800 (PST) Received: from ox (c-24-6-44-228.hsd1.ca.comcast.net. 
[24.6.44.228]) by mx.google.com with ESMTPSA id sy10sm26834530pac.15.2014.01.29.15.11.25 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Wed, 29 Jan 2014 15:11:26 -0800 (PST) Date: Wed, 29 Jan 2014 15:11:21 -0800 From: Navdeep Parhar To: Adrian Chadd Subject: Re: Big physically contiguous mbuf clusters Message-ID: <20140129231121.GA18434@ox> Mail-Followup-To: Adrian Chadd , Garrett Wollman , FreeBSD Net References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Garrett Wollman , FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 23:11:27 -0000 On Wed, Jan 29, 2014 at 02:21:21PM -0800, Adrian Chadd wrote: > Hi, > > On 29 January 2014 10:54, Garrett Wollman wrote: > > Resolved: that mbuf clusters longer than one page ought not be > > supported. There is too much physical-memory fragmentation for them > > to be of use on a moderately active server. 9k mbufs are especially > > bad, since in the fragmented case they waste 3k per allocation. > > I've been wondering whether it'd be feasible to teach the physical > memory allocator about >page sized allocations and to create zones of > slightly more physically contiguous memory. I think this would be very useful. For example, a zone_jumbo32 would hit a sweet spot -- enough to fit 3 jumbo frames and some loose change for metadata. I'd like to see us improve our allocators and VM system to work better with larger contiguous allocations, rather than deprecating the larger zones. It seems backwards to push towards smaller allocation units when installed physical memory in a typical system continues to rise. 
Allocating 3 x 4K instead of 1 x 9K for a jumbo means 3x the number of vtophys translations, 3x the phys_addr/len traffic on the PCIe bus (scatter list has to be fed to the chip and now it's 3x what it has to be), 3x the number of "wrapper" mbuf allocations (one for each 4K cluster) which will then be stitched together to form a frame, etc. etc. Regards, Navdeep > > For servers with lots of memory we could then keep these around and > only dip into them for temporary allocations (eg not VM pages that may > be held for some unknown amount of time.) > > Question is - can we enforce that kind of behaviour? > > > > -a > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Wed Jan 29 23:31:14 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 92320330; Wed, 29 Jan 2014 23:31:14 +0000 (UTC) Received: from mail-ig0-x230.google.com (mail-ig0-x230.google.com [IPv6:2607:f8b0:4001:c05::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 484AE139A; Wed, 29 Jan 2014 23:31:14 +0000 (UTC) Received: by mail-ig0-f176.google.com with SMTP id j1so16305315iga.3 for ; Wed, 29 Jan 2014 15:31:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=DL42o1oFSdz/gGZv93VjQhF9DRGOaiIuIXnIgx0Asd4=; b=tZjwsn/8KCGC98Db76d3JnodEAQKeqvxD3jXWuagDzc+uAFiMPlYxd0dRSBjY9GFDe 5yiBDRjlHLMcb3H+XPbtqPI+WcIrGvZNPVPP3/uUfKO8UVhLkSkqVVFmE8GOqfvGhmxG 
6RDj+xV3D1w3WC5A24otXcXoZe/2DqqJiZLgY0Skx9y2TUKaEJ6PqQQdu7t4aJ4qxoUF Qmm4rFql4DXlZnvabjD4/s20r1nbw8W+B6yATRqsJaZmhP4tre+dD8teZeBBk5asQkwL JVwcreW+odYR6e/k7aIChrIHaUp8W3h1JxjQxCwThgGl7iWy1FC/fXjD7OgtloBo6G7P f/9A== X-Received: by 10.50.114.4 with SMTP id jc4mr31370497igb.0.1391038273623; Wed, 29 Jan 2014 15:31:13 -0800 (PST) MIME-Version: 1.0 Sender: mr.kodiak@gmail.com Received: by 10.64.96.73 with HTTP; Wed, 29 Jan 2014 15:30:43 -0800 (PST) In-Reply-To: <1352428787.18632865.1391036503658.JavaMail.root@uoguelph.ca> References: <1352428787.18632865.1391036503658.JavaMail.root@uoguelph.ca> From: Bryan Venteicher Date: Wed, 29 Jan 2014 17:30:43 -0600 X-Google-Sender-Auth: TYs9uu6M0ndJjNQBihmikRszTDg Message-ID: Subject: Re: Terrible NFS performance under 9.2-RELEASE? To: Rick Macklem Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: freebsd-net@freebsd.org, J David , Garrett Wollman , Bryan Venteicher X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jan 2014 23:31:14 -0000 On Wed, Jan 29, 2014 at 5:01 PM, Rick Macklem wrote: > J David wrote: > > On Tue, Jan 28, 2014 at 7:32 PM, Rick Macklem > > wrote: > > > Hopefully Garrett and/or you will be able to do some testing of it > > > and report back w.r.t. performance gains, etc. > > > > OK, it has seen light testing. > > > > As predicted the vtnet drops are eliminated and CPU load is reduced. > > > Ok, that's good news. Bryan, is increasing VTNET_MAX_TX_SEGS in the > driver feasible? > > I've been busy the last few days, and won't be able to get to any code until the weekend. The current MAX_TX_SEGS value is mostly arbitrary - the implicit limit is VIRTIO_MAX_INDIRECT. 
This value is used in virtqueue.c to allocate an array of 'struct vring_desc' entries, each 16 bytes; since that allocation gets rounded up to the next power of two anyway, we can make the limit bigger without any real additional memory usage. But also note I do put a MAX_TX_SEGS-sized array of 'struct sglist_segs' on the stack, so it cannot be made too big. Even what is currently there is probably already pushing what's a Good Idea to put on the stack anyways (especially since it is near the bottom of a typically pretty deep call stack). I've been meaning to move that to hanging on the 'struct vtnet_txq' instead. I think all TSO capable drivers that use m_collapse(..., 32) (and don't set if_hw_tsomax) are broken - there look to be several. I was slightly on top of my game by using 33, since it appears m_collapse() does not touch the pkthdr mbuf (I think that was my thinking 3 years ago, and it seems to be the case from a quick glance at the code). I think drivers using m_defrag(..., 32) are OK, but that function can be much, much more expensive. However, I do suspect we'll be putting a refined version of the patch > in head someday (maybe April, sooner would have to be committed by > someone else). I suspect that Garrett's code for server read will work > well and I'll cobble something together for server readdir and client > write.
> > The performance is also improved:
> >
> > Test     Before      After
> > SeqWr      1506       7461
> > SeqRd       566     192015
> > RndRd       602     218730
> > RndWr        44      13972
> >
> > All numbers in kiB/sec.
> >
> If you get the chance, you can try a few tunables on the server. > vfs.nfsd.fha.enable=0 > - ken@ found that FHA was necessary for ZFS exports, to avoid out > of order reads from confusing ZFS's sequential reading heuristic. > However, FHA also means that all readaheads for a file are serialized > with the reads for the file (same fh->same nfsd thread). Somehow, it > seems to me that doing reads concurrently in the server (given shared > vnode locks) could be a good thing. 
> --> I wonder what the story is for UFS? > So, it would be interesting to see what disabling FHA does for the > sequential read test. > > I think I already mentioned the DRC cache ones: > vfs.nfsd.tcphighwater=100000 > vfs.nfsd.tcpcachetimeo=600 (actually I think Garrett uses 300) > > Good to see some progress, rick > ps: Daniel reports that he will be able to test the patch this > weekend, to see if it fixes his problem that required TSO > to be disabled, so we'll wait and see. > > > There were initially still some problems with lousy hostcache values > > on the client after the test, which is what causes the iperf > > performance to tank after the NFS test, but after a reboot of both > > sides and fresh retest, I haven't reproduced that again. If it comes > > back, I'll try to figure out what's going on. > > > Hopefully a networking type might know what is going on, because this > is way out of my area of expertise. > > > But this definitely looks like a move in the right direction. > > > > Thanks! 
> > _______________________________________________ > > freebsd-net@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-net > > To unsubscribe, send any mail to > > "freebsd-net-unsubscribe@freebsd.org" > > > From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 00:55:20 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9637138D for ; Thu, 30 Jan 2014 00:55:20 +0000 (UTC) Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 659971EA1 for ; Thu, 30 Jan 2014 00:55:20 +0000 (UTC) Received: from compute1.internal (compute1.nyi.mail.srv.osa [10.202.2.41]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 15BBC20C66 for ; Wed, 29 Jan 2014 19:55:18 -0500 (EST) Received: from frontend1 ([10.202.2.160]) by compute1.internal (MEProxy); Wed, 29 Jan 2014 19:55:18 -0500 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=content-type:mime-version:subject:from :in-reply-to:date:cc:content-transfer-encoding:message-id :references:to; s=smtpout; bh=NX/OrH7/1OJxJdubTttkDG2E+OA=; b=bD k2ZJt1u2+gf/d/Fs7KEOxqoMMGrC0RQVbLeNr4stwiN3NCA5cX54csmGsewRdPPN L5zyjXJxGZUjJqpks/Mzs+10IJb3FWSyoyl11ce4lUqTh7DsPJcwrDUAAAvClswx BaagBSlkr9KhS4JVxf+2hLIhgg5k9HaOwTh4qIs9M= X-Sasl-enc: MIBf6EKWPZeWy2mXG1OjSkVRX+42FYOXOP1mRmFvsw8G 1391043317 Received: from [172.16.1.145] (unknown [68.117.126.78]) by mail.messagingengine.com (Postfix) with ESMTPA id 8DEEBC00E81; Wed, 29 Jan 2014 19:55:17 -0500 (EST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 7.1 \(1827\)) Subject: Re: carp and rtadvd From: Mark Felder In-Reply-To: 
<52E7AB9B.5050707@dataoppdrag.no> Date: Wed, 29 Jan 2014 18:55:16 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: <52E7AB9B.5050707@dataoppdrag.no> To: Ole Myhre X-Mailer: Apple Mail (2.1827) Cc: FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 00:55:20 -0000 On Jan 28, 2014, at 7:07, Ole Myhre wrote: > Hi, > > I have a simple setup with two 10.0-RELEASE firewalls running carp, a > virtual IPv6 address and running rtadvd: > > (applied to both firewalls)
>
> # kldload carp
> # ifconfig em2 inet6 2001:db8::1/64 vhid 1 up
> # sysctl net.inet6.ip6.forwarding=1
> # echo 'rtadvd_enable="YES"' >> /etc/rc.conf
> # echo 'rtadvd_interfaces="em2"' >> /etc/rc.conf
> # service rtadvd start
>
> This works fine: one firewall is MASTER, the other BACKUP, and the > clients behind em2 get a prefix in the 2001:db8::/64 subnet. However > both firewalls are sending router advertisements (only one being MASTER) > with the LL-address of the physical em2 interface as the gateway. This > causes clients that support multiple default gateways to select both > firewalls as their default gateway, sending traffic to both the > MASTER and BACKUP firewall. > > Is there a way to make only the MASTER send router advertisements, or > (preferably only the MASTER) send router advertisements with a > virtual LL-address? > What I would do is use devd to start/stop the rtadvd service based on whether or not you're master. 
# notify 30 {
#     match "system" "IFNET";
#     match "subsystem" "carp0";
#     match "type" "LINK_UP";
#     action "/path/to/script/or/command";
# };
#
# notify 30 {
#     match "system" "IFNET";
#     match "subsystem" "carp0";
#     match "type" "LINK_DOWN";
#     action "/path/to/script/or/command";
# };
From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 01:34:42 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C509D7FD; Thu, 30 Jan 2014 01:34:42 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 95EC91246; Thu, 30 Jan 2014 01:34:41 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s0U1YZl6008675 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Jan 2014 17:34:35 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s0U1YYV7008674; Wed, 29 Jan 2014 17:34:34 -0800 (PST) (envelope-from jmg) Date: Wed, 29 Jan 2014 17:34:34 -0800 From: John-Mark Gurney To: Adrian Chadd , Garrett Wollman , FreeBSD Net Subject: Re: Big physically contiguous mbuf clusters Message-ID: <20140130013434.GP93141@funkthat.com> Mail-Followup-To: Adrian Chadd , Garrett Wollman , FreeBSD Net References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> <20140129231121.GA18434@ox> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140129231121.GA18434@ox> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: 
http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Wed, 29 Jan 2014 17:34:35 -0800 (PST) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 01:34:42 -0000 Navdeep Parhar wrote this message on Wed, Jan 29, 2014 at 15:11 -0800: > On Wed, Jan 29, 2014 at 02:21:21PM -0800, Adrian Chadd wrote: > > Hi, > > > > On 29 January 2014 10:54, Garrett Wollman wrote: > > > Resolved: that mbuf clusters longer than one page ought not be > > > supported. There is too much physical-memory fragmentation for them > > > to be of use on a moderately active server. 9k mbufs are especially > > > bad, since in the fragmented case they waste 3k per allocation. > > > > I've been wondering whether it'd be feasible to teach the physical > > memory allocator about >page sized allocations and to create zones of > > slightly more physically contiguous memory. > > I think this would be very useful. For example, a zone_jumbo32 would > hit a sweet spot -- enough to fit 3 jumbo frames and some loose change > for metadata. I'd like to see us improve our allocators and VM system Actually, that is what currently happens... I just verified this on -current... http://fxr.watson.org/fxr/source/vm/uma_core.c#L880 is where the allocation happens, notice the uk_ppera, and kgdb says: print zone_jumbo9[0].uz_kegs.lh_first[0].kl_keg[0].uk_ppera $7 = 3 > to work better with larger contiguous allocations, rather than > deprecating the larger zones. 
It seems backwards to push towards > smaller allocation units when installed physical memory in a typical > system continues to rise. > > Allocating 3 x 4K instead of 1 x 9K for a jumbo means 3x the number of > vtophys translations, 3x the phys_addr/len traffic on the PCIe bus I don't think that this will be an issue... If we support a 9k jumbo that is not physically contiguous (easy on main memory), the table we use to fetch the first physical page will likely have the next two pages in it as well, so I doubt there will be a significant performance penalty; yes, we'll loop a few more times, but main memory access is the real speed limiter in these situations... > (scatter list has to be fed to the chip and now it's 3x what it has to > be), 3x the number of "wrapper" mbuf allocations (one for each 4K > cluster) which will then be stitched together to form a frame, etc. etc. And what is that as a percentage of overall traffic? .4% (assuming 16 bytes per 4k page)... If your PCIe bus is saturating and you need that extra .4% of bandwidth, then you have a serious issue w/ your bus layout... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." 
From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 02:05:27 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3DF5EDB1; Thu, 30 Jan 2014 02:05:27 +0000 (UTC) Received: from mail-pb0-x230.google.com (mail-pb0-x230.google.com [IPv6:2607:f8b0:400e:c01::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 09F92149D; Thu, 30 Jan 2014 02:05:27 +0000 (UTC) Received: by mail-pb0-f48.google.com with SMTP id rr13so2504520pbb.21 for ; Wed, 29 Jan 2014 18:05:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mail-followup-to:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=BYVlWTcCBZVAZbvxJEeExuf3OB4kxF3bOLO3wrj0Qx8=; b=C7rLbW3saSuaYMczwhkmx2ff4KJh/4VWLQhfmKDZurfSSwZ+GoB/iDJbwaFz0w0QYI AqfymKhS2JfGGYF4YLXT12XrjW+VI4Afxna60MQv7QuuXHxSmihnEe1bwezVkI3OxT6u 8N5BY8kqFO/V4BE78nYOhmQ2VMahspB2DxwhhaXQ/40HWR/QCvse39qaz93qvQxqDZ9M cWs0k9O6iH6dZ+m13zTDPRy6+8gFaQwGr36Nry1qjCAlzav3qPa0G5b9ulMUMb/OYlTL /eRxclfsVVX/1VGCRNIFuMjMUYOqPE1RISCDcYw3MPjuMWb0x7NSU75NztfpyKCcT3Wa TIhQ== X-Received: by 10.66.163.164 with SMTP id yj4mr11461589pab.91.1391047526709; Wed, 29 Jan 2014 18:05:26 -0800 (PST) Received: from ox (c-24-6-44-228.hsd1.ca.comcast.net. 
[24.6.44.228]) by mx.google.com with ESMTPSA id ns7sm11640705pbc.32.2014.01.29.18.05.25 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Wed, 29 Jan 2014 18:05:25 -0800 (PST) Date: Wed, 29 Jan 2014 18:05:23 -0800 From: Navdeep Parhar To: Adrian Chadd , Garrett Wollman , FreeBSD Net Subject: Re: Big physically contiguous mbuf clusters Message-ID: <20140130020523.GB18434@ox> Mail-Followup-To: Adrian Chadd , Garrett Wollman , FreeBSD Net References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> <20140129231121.GA18434@ox> <20140130013434.GP93141@funkthat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140130013434.GP93141@funkthat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 02:05:27 -0000 On Wed, Jan 29, 2014 at 05:34:34PM -0800, John-Mark Gurney wrote: > Navdeep Parhar wrote this message on Wed, Jan 29, 2014 at 15:11 -0800: > > > (scatter list has to be fed to the chip and now it's 3x what it has to > > be), 3x the number of "wrapper" mbuf allocations (one for each 4K > > cluster) which will then be stitched together to form a frame, etc. etc. > > And what is that in percentage of overall traffic? .4% (assuming 16 bytes > per 4k page)... If your PCIe bus is saturating and you need that extra > .4% traffic, then you have a serious issue w/ your bus layout... The 16B and 4KB are in different directions, the former is from host to chip and the latter from chip to host memory. So the 16B eats into the transmit bandwidth. FWIW, I do deal with cards where PCIe is the limiting factor (a 4x10G card with a pcie gen2 x8 block, a 2x40G card with pcie gen3 x8 block) and the effects of 4K vs. 9K rx on the transmit bandwidth are measurable. 
These days chips can even place multiple frames into a single buffer (if they'd fit) and that's another reason I tend to advocate for larger contiguous buffer sizes. Regards, Navdeep From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 03:08:23 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0CDF0E1E for ; Thu, 30 Jan 2014 03:08:23 +0000 (UTC) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B083F1964 for ; Thu, 30 Jan 2014 03:08:22 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s0U38KQk009863; Wed, 29 Jan 2014 22:08:20 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.7/8.14.4/Submit) id s0U38JSM009860; Wed, 29 Jan 2014 22:08:19 -0500 (EST) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21225.49699.916951.881502@hergotha.csail.mit.edu> Date: Wed, 29 Jan 2014 22:08:19 -0500 From: Garrett Wollman To: Rick Macklem Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
In-Reply-To: <1352428787.18632865.1391036503658.JavaMail.root@uoguelph.ca> References: <1352428787.18632865.1391036503658.JavaMail.root@uoguelph.ca> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (hergotha.csail.mit.edu [127.0.0.1]); Wed, 29 Jan 2014 22:08:20 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 03:08:23 -0000 < said: > However, I do suspect we'll be putting a refined version of the patch > in head someday (maybe April, sooner would have to be committed by > someone else). I suspect that Garrett's code for server read will work > well and I'll cobble something to-gether for server readdir and client write. Once I can get this mps(4) issue ironed out, I should be in a position to get some real data on this. 
-GAWollman From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 03:22:30 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 11C454E5 for ; Thu, 30 Jan 2014 03:22:30 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id CB1321A9F for ; Thu, 30 Jan 2014 03:22:29 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAJfE6VKDaFve/2dsb2JhbABZhBuDAboagRl0giUBAQEEI1YbGAICDRkCWQYTiAWqeqBxF4EpjSI0B4JvgUkEiUmgfoNLHoFu X-IronPort-AV: E=Sophos;i="4.95,746,1384318800"; d="scan'208";a="91740667" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 29 Jan 2014 22:22:22 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id D571DB40D2; Wed, 29 Jan 2014 22:22:22 -0500 (EST) Date: Wed, 29 Jan 2014 22:22:22 -0500 (EST) From: Rick Macklem To: Garrett Wollman Message-ID: <1315174039.18735121.1391052142869.JavaMail.root@uoguelph.ca> In-Reply-To: <21225.49699.916951.881502@hergotha.csail.mit.edu> Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 03:22:30 -0000 Garrett Wollman wrote: > < said: > > > However, I do suspect we'll be putting a refined version of the > > patch > > in head someday (maybe April, sooner would have to be committed by > > someone else). I suspect that Garrett's code for server read will > > work > > well and I'll cobble something to-gether for server readdir and > > client write. > > Once I can get this mps(4) issue ironed out, I should be in a > position > to get some real data on this. > If you can check the network device driver you use and if it looks like it has a scatter size of less than 36 (often a constant with "TXSEG" or "TX_SEG" in the name) and calls either m_defrag() or m_collapse(), adding a counter to see if those functions are being called, would be nice. If the m_collapse()/m_defrag() function is being called without the patch and not with the patch, the performance difference may be avoiding that call and not a more generic benefit. I just did a quick find/grep and it looks like a lot of drivers have *TXSEGS* set to around 32 and then call one of two functions for more than that. Since without a patch, 64K NFS reads/writes hand sosend() an mbuf list of 34 entries, it seems like this could be happening a lot. (I didn't look to see which ones set if_hw_tsomax to significantly less than 64K.) Thanks for working on this, rick. ps: you might want to combine your patch with mine, so readdir and client side writes use 4K clusters. 
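[The 34-entry figure above falls straight out of the cluster size. A back-of-the-envelope sketch, in Python purely for the arithmetic (the real code is C in the krpc/NFS path); the two non-cluster mbufs assumed for the RPC header are my assumption, chosen so the totals match the 34 and 18 quoted in this thread:]

```python
# Rough mbuf-chain length for an NFS I/O: payload in fixed-size clusters
# plus a couple of small mbufs for the RPC header.  The hdr_mbufs=2
# default is an assumption made to match the numbers in this thread.
def mbuf_chain_len(io_size, cluster_size, hdr_mbufs=2):
    data_mbufs = -(-io_size // cluster_size)  # ceiling division
    return hdr_mbufs + data_mbufs

print(mbuf_chain_len(64 * 1024, 2048))  # 2K clusters -> 34
print(mbuf_chain_len(64 * 1024, 4096))  # 4K clusters -> 18
```

[A driver with a 32-element scatter/gather list overflows on the 34-entry chain but comfortably fits the 18-entry one, which is why the 4K-cluster patch sidesteps the m_defrag()/m_collapse() path.]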
> -GAWollman > > From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 03:22:58 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6508D572 for ; Thu, 30 Jan 2014 03:22:58 +0000 (UTC) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 061511AA4 for ; Thu, 30 Jan 2014 03:22:57 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s0U3MukO010030; Wed, 29 Jan 2014 22:22:56 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.7/8.14.4/Submit) id s0U3Mt3s010029; Wed, 29 Jan 2014 22:22:55 -0500 (EST) (envelope-from wollman) Date: Wed, 29 Jan 2014 22:22:55 -0500 (EST) From: Garrett Wollman Message-Id: <201401300322.s0U3Mt3s010029@hergotha.csail.mit.edu> To: nparhar@gmail.com Subject: Re: Big physically contiguous mbuf clusters X-Newsgroups: mit.lcs.mail.freebsd-net In-Reply-To: <20140129231138$3db6@grapevine.csail.mit.edu> References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> Organization: none X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (hergotha.csail.mit.edu [127.0.0.1]); Wed, 29 Jan 2014 22:22:56 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,LOTS_OF_MONEY autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 03:22:58 -0000 In article <20140129231138$3db6@grapevine.csail.mit.edu>, nparhar@gmail.com writes: >I think this would be very useful. For example, a zone_jumbo32 would >hit a sweet spot -- enough to fit 3 jumbo frames and some loose change >for metadata. I'd like to see us improve our allocators and VM system >to work better with larger contiguous allocations, rather than >deprecating the larger zones. It seems backwards to push towards >smaller allocation units when installed physical memory in a typical >system continues to rise. In order to resist fragmentation, you need to be willing to dedicate some partition of physical memory to larger allocations. That's fine for a special-purpose device like a switch, but is not so good for a general-purpose operating system. But if you were willing to reserve, say, 1/64th of physical memory at boot time, make it all direct-mapped using superpages, and allocate it in fixed-power-of-two-sized chunks, you would probably get a performance win. But the chunks *have* to be fixed-size, otherwise you are nearly guaranteed to get your arena checkerboarded. I'd consider giving 2 GB on a 128-GB machine for that. For NFS performance, you'd probably want to be able to take a whole chunk, read the desired data into it in a single VOP, then pass the whole thing to the socket layer wrapped in an mbuf. 
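[The numbers in Wollman's reservation proposal work out as follows; a hypothetical Python sketch (function names are mine, not any real allocator API):]

```python
# Sketch of the proposed boot-time reservation: 1/64th of physical
# memory set aside, handed out in fixed-power-of-two-sized chunks so
# the arena cannot get checkerboarded.
def reserve_bytes(phys_bytes, fraction=64):
    return phys_bytes // fraction

def chunk_size(request):
    # Round a request up to a power of two; if an arena only ever
    # hands out one fixed size, freed chunks always fit new requests.
    size = 1
    while size < request:
        size *= 2
    return size

GB = 1 << 30
print(reserve_bytes(128 * GB) // GB)  # 2 (GB), the example in the mail
print(chunk_size(9 * 1024))           # a 9k jumbo rounds up to 16384
```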
-GAWollman From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 03:31:10 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2C7096B1; Thu, 30 Jan 2014 03:31:10 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 956461B3B; Thu, 30 Jan 2014 03:31:09 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEAPXG6VKDaFve/2dsb2JhbABZg0RXgwG5S0+BGnSCJQEBAQMBAQEBIAQnHQECCwUWGAICDRkCKQEJJgYIBwQBGQMEh1wIDapuoHAXgSmMfwYBAQEaNAeCb4FJBIlJineBFYQFkG2DSx4xfAgXIg X-IronPort-AV: E=Sophos;i="4.95,746,1384318800"; d="scan'208";a="91741811" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 29 Jan 2014 22:31:08 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 2F49BB4184; Wed, 29 Jan 2014 22:31:08 -0500 (EST) Date: Wed, 29 Jan 2014 22:31:08 -0500 (EST) From: Rick Macklem To: Bryan Venteicher Message-ID: <1879662319.18746958.1391052668182.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, J David , Garrett Wollman X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 03:31:10 -0000 Bryan Venteicher wrote: > On Wed, Jan 29, 2014 at 5:01 PM, Rick Macklem > wrote: > > > J David wrote: > > > On Tue, Jan 28, 2014 at 7:32 PM, Rick Macklem > > > > > > wrote: > > > > Hopefully Garrett and/or you will be able to do some testing of > > > > it > > > > and report back w.r.t. performance gains, etc. > > > > > > OK, it has seen light testing. > > > > > > As predicted the vtnet drops are eliminated and CPU load is > > > reduced. > > > > > Ok, that's good news. Bryan, is increasing VTNET_MAX_TX_SEGS in the > > driver feasible? > > > > > > I've been busy the last few days, and won't be able to get to any > code > until the weekend. > > The current MAX_TX_SEGS value is mostly arbitrary - the implicit > limit is > VIRTIO_MAX_INDIRECT. This value is used in virtqueue.c to allocate an > array > of 'struct vring_desc' which is 16 bytes so we have some next power > of 2 > rounding going on, so we can make it bigger without using any real > additional memory usage. > > But also note I do put an MAX_TX_SEGS sized array of 'struct > sglist_segs' > on the stack so it cannot be made too big. Even what is currently > there is > probably already pushing what's a Good Idea to put on the stack > anyways > (especially since it is near the bottom of a typically pretty deep > call > stack). I've been meaning to move that to hanging on the 'struct > vtnet_txq' > instead. > Well, NFS hands TCP a list of 34 mbufs. 
If TCP only adds one, then increasing it from 34 to 35 would be all it takes. However, see below. > I think all TSO capable drivers that use m_collapse(..., 32) (and > don't set > if_hw_tsomax) are broken - there looks to be several. I was slightly > on top > of my game by using 33 since it appears m_collapse() does not touch > the > pkthdr mbuf (I think that was my thinking 3 years ago, and seems to > be the > case by a quick glance at the code). I think drivers using > m_defrag(..., > 32) are OK, but that function can be much, much more expensive. > Well, even m_defrag(..M_NOWAIT..) can fail and then it means a TCP layer timeout/retransmit. If the allocator is constipated, this could be pretty much a trainwreck, I think. I also agree that m_defrag() adds a lot of overhead, but calling m_collapse() a lot will be quite a bit of overhead, as well. (Also, I don't think that m_collapse() is more likely to fail, since it only copies data to the previous mbuf when the entire mbuf that follows will fit and it's allowed. I'd assume that a ref count copied mbuf cluster doesn't allow this copy or things would be badly broken.) Bottom line, I think calling either m_collapse() or m_defrag() should be considered a "last resort". Maybe the driver could reduce the size of if_hw_tsomax whenever it finds it needs to call one of these functions, to try and avoid a re-occurrence? rick > > However, I do suspect we'll be putting a refined version of the patch > > in head someday (maybe April, sooner would have to be committed by > > someone else). I suspect that Garrett's code for server read will > > work > > well and I'll cobble something together for server readdir and > > client > > write. > >
> > > The performance is also improved:
> > >
> > > Test    Before     After
> > > SeqWr     1506      7461
> > > SeqRd      566    192015
> > > RndRd      602    218730
> > > RndWr       44     13972
> > >
> > > All numbers in kiB/sec.
> > >
If you get the chance, you can try a few tunables on the server.
> > vfs.nfsd.fha.enable=0 > > - ken@ found that FHA was necessary for ZFS exports, to avoid out > > of order reads from confusing ZFS's sequential reading heuristic. > > However, FHA also means that all readaheads for a file are > > serialized > > with the reads for the file (same fh->same nfsd thread). Somehow, > > it > > seems to me that doing reads concurrently in the server (given > > shared > > vnode locks) could be a good thing. > > --> I wonder what the story is for UFS? > > So, it would be interesting to see what disabling FHA does for the > > sequential read test. > > > > I think I already mentioned the DRC cache ones: > > vfs.nfsd.tcphighwater=100000 > > vfs.nfsd.tcpcachetimeo=600 (actually I think Garrett uses 300) > > > > Good to see some progress, rick > > ps: Daniel reports that he will be able to test the patch this > > weekend, to see if it fixes his problem that required TSO > > to be disabled, so we'll wait and see. > > > > > There were initially still some problems with lousy hostcache > > > values > > > on the client after the test, which is what causes the iperf > > > performance to tank after the NFS test, but after a reboot of > > > both > > > sides and fresh retest, I haven't reproduced that again. If it > > > comes > > > back, I'll try to figure out what's going on. > > > > > Hopefully a networking type might know what is going on, because > > this > > is way out of my area of expertise. > > > > > But this definitely looks like a move in the right direction. > > > > > > Thanks! 
> > > _______________________________________________ > > > freebsd-net@freebsd.org mailing list > > > http://lists.freebsd.org/mailman/listinfo/freebsd-net > > > To unsubscribe, send any mail to > > > "freebsd-net-unsubscribe@freebsd.org" > > > > > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 03:37:26 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 316ED84F for ; Thu, 30 Jan 2014 03:37:26 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id E73981B5E for ; Thu, 30 Jan 2014 03:37:25 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aj8MACXI6VKDaFve/2dsb2JhbABZDoFkBoFMV4MBuUtPgRp0giUBAQEEAQEBICsgCxsYAgINGQIjBgEJJgYIBwQBHAEDh1ADEQ2qbpdHDYkcF4Epi0GBPgYBAQEaNAeCb4FJBIlJineBFWeDHosshUGCbl0eMXwIFyI X-IronPort-AV: E=Sophos;i="4.95,746,1384318800"; d="scan'208";a="91742974" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 29 Jan 2014 22:37:24 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id D2A4479283; Wed, 29 Jan 2014 22:37:24 -0500 (EST) Date: Wed, 29 Jan 2014 22:37:24 -0500 (EST) From: Rick Macklem To: Garrett Wollman Message-ID: <323566728.18752313.1391053044849.JavaMail.root@uoguelph.ca> In-Reply-To: <201401300322.s0U3Mt3s010029@hergotha.csail.mit.edu> Subject: Re: Big physically contiguous mbuf clusters MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 
Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, nparhar@gmail.com X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 03:37:26 -0000 Garrett Wollman wrote: > In article <20140129231138$3db6@grapevine.csail.mit.edu>, > nparhar@gmail.com writes: > > >I think this would be very useful. For example, a zone_jumbo32 > >would > >hit a sweet spot -- enough to fit 3 jumbo frames and some loose > >change > >for metadata. I'd like to see us improve our allocators and VM > >system > >to work better with larger contiguous allocations, rather than > >deprecating the larger zones. It seems backwards to push towards > >smaller allocation units when installed physical memory in a typical > >system continues to rise. > > In order to resist fragmentation, you need to be willing to dedicate > some partition of physical memory to larger allocations. That's fine > for a special-purpose device like a switch, but is not so good for a > general-purpose operating system. But if you were willing to > reserve, > say, 1/64th of physical memory at boot time, make it all > direct-mapped > using superpages, and allocate it in fixed-power-of-two-sized chunks, > you would probably get a performance win. But the chunks *have* to > be > fixed-size, otherwise you are nearly guaranteed to get your arena > checkerboarded. I'd consider giving 2 GB on a 128-GB machine for > that. > > For NFS performance, you'd probably want to be able to take a whole > chunk, read the desired data into it in a single VOP, then pass the > whole thing to the socket layer wrapped in an mbuf. > Yep, 1 64K (or 128K soon) mbuf would be nice for read, readdir, write. 
(Assuming tcp_output knows how to split it up for net interfaces that can't handle TSO segments that large.) I'm not sure why, but most use 65535 (max IP datagram size) as if_hw_tsomax. (This guarantees the 64K NFS send gets split up. Doesn't TSO split it up into MTU-sized segments? If so, I don't see why if_hw_tsomax would be a limit?) I'm not knowledgeable w.r.t. TSO, so feel free to ignore or correct this. rick > -GAWollman > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 03:50:01 2014 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B34299B7 for ; Thu, 30 Jan 2014 03:50:01 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 86EAB1D19 for ; Thu, 30 Jan 2014 03:50:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0U3o1TR066908 for ; Thu, 30 Jan 2014 03:50:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0U3o1ii066905; Thu, 30 Jan 2014 03:50:01 GMT (envelope-from gnats) Date: Thu, 30 Jan 2014 03:50:01 GMT Message-Id: <201401300350.s0U3o1ii066905@freefall.freebsd.org> To: freebsd-net@FreeBSD.org Cc: From: Takefu Subject: Re: kern/121257: [tcp] TSO + natd -> slow outgoing tcp traffic X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list Reply-To: Takefu
List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 03:50:01 -0000 The following reply was made to PR kern/121257; it has been noted by GNATS. From: Takefu To: bug-followup@FreeBSD.org Cc: vnovy@vnovy.net Subject: Re: kern/121257: [tcp] TSO + natd -> slow outgoing tcp traffic Date: Thu, 30 Jan 2014 12:42:33 +0900 Limited improvement method 8.4-RELEASE 9.2-RELEASE 10.0-RELEASE --- /usr/src/etc/rc.d/natd 2013-07-01 15:47:09.000000000 +0900 +++ /etc/rc.d/natd 2014-01-30 12:26:43.000000000 +0900 @@ -36,6 +36,7 @@ fi fi + sysctl net.inet.tcp.tso=0 > /dev/null return 0 } From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 03:56:31 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 27A70B03 for ; Thu, 30 Jan 2014 03:56:31 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id E23921DA2 for ; Thu, 30 Jan 2014 03:56:30 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAM7M6VKDaFve/2dsb2JhbABZhBuDAboRCYEadIJPBIEHAg0ZAl+IGJthjxGgcReBKY0igyqBSQSJSaB+g0segW4 X-IronPort-AV: E=Sophos;i="4.95,746,1384318800"; d="scan'208";a="91744702" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 29 Jan 2014 22:56:29 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id DBB80B4063 for ; Wed, 29 Jan 2014 22:56:29 -0500 (EST) Date: Wed, 29 Jan 2014 22:56:29 -0500 (EST) From: Rick Macklem To: FreeBSD Net Message-ID: <24918548.18766184.1391054189890.JavaMail.root@uoguelph.ca> Subject: 64K NFS I/O generates a 34mbuf list for 
TCP which breaks TSO MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 03:56:31 -0000 For some time, I've been seeing reports of NFS related issues that get resolved by the user either disabling TSO or reducing the rsize/wsize to 32K. I now think I know why this is happening, although the evidence is just coming in. (I have no hardware/software that does TSO, so I never see these problems during testing.) A 64K NFS read reply, readdir reply or write request results in the krpc handing the TCP socket an mbuf list with 34 entries via sosend(). Now, I am really rusty w.r.t. TCP, but it looks like this will result in a TCP/IP header + 34 data mbufs being handed to the network device driver, if if_hw_tsomax has the default setting of 65535 (max IP datagram). At a glance, many drivers use a scatter/gather list of around 32 elements for transmission. If the mbuf list doesn't fit in this scatter/gather list (which looks to me like it will be the case), then the driver either calls m_defrag() or m_collapse() to try and fix the problem. This seems like a serious problem to me. 1 - If m_collapse()/m_defrag() fails, the transmit doesn't happen and things wedge until a TCP timeout retransmit gets things going again. It looks like m_defrag() is less likely to fail, but generates a lot of overhead. m_collapse() seems to be less overhead, but seems less likely to succeed. (Since m_defrag() is called with M_NOWAIT, it can fail in that extreme case. I'm not sure if it will fail otherwise?) So, how to fix this? 
1 - Change NFS to use 4K clusters for these 64K reads/writes, reducing the mbuf list from 34->18. Preliminary patches for this are being tested.
    --> However, this seems to be more of a work-around than a fix.
2 - As soon as a driver needs to call m_defrag() or m_collapse() because the length of the TSO transmit mbuf list is too long, reduce if_hw_tsomax by a significant amount to try and get tcp_output() to generate shorter mbuf lists. Not great, but at least better than calling m_defrag()/m_collapse() over and over and over again.
    --> As a starting point, instrumenting the device drivers so that counts of # of calls to m_defrag()/m_collapse() and counts of failed calls would help to confirm how serious this problem is.
3 - ??? Any ideas from folk familiar with TSO and these drivers.
rick
ps: Until this gets resolved, please tell anyone with serious NFS performance/reliability issues to try either disabling TSO or doing client mounts with "-o rsize=32768,wsize=32768". I'm not sure how many believe me when I tell them, but at least I now have a theory as to why it can help a lot.
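[Option 2 could be modelled roughly as below. This is a userspace sketch of the heuristic only, not driver code; the 25% backoff factor, the class/field names, and the 2K-cluster chain-length estimate are all invented for illustration:]

```python
# Model of a driver that shrinks its advertised TSO limit each time a
# transmit chain overflows the scatter/gather list, so tcp_output()
# starts handing it shorter chains instead of forcing m_defrag() on
# every send.  Names and numbers are illustrative, not from any driver.
MAX_TX_SEGS = 32

class TsoModel:
    def __init__(self, tsomax=65535):
        self.tsomax = tsomax
        self.defrag_calls = 0

    def chain_len(self):
        # 2K clusters plus two header mbufs, as with a 64K NFS write.
        return 2 + -(-self.tsomax // 2048)

    def transmit(self):
        if self.chain_len() > MAX_TX_SEGS:
            self.defrag_calls += 1              # would call m_defrag() here
            self.tsomax = self.tsomax * 3 // 4  # back off by 25%
            return False                        # retry with shorter chains
        return True

m = TsoModel()
while not m.transmit():
    pass
print(m.defrag_calls, m.tsomax)  # 1 49151: one defrag, then chains fit
```

[One backoff from the 65535 default already brings the chain under the 32-segment limit, so the expensive copy happens once instead of on every 64K I/O.]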
From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 08:47:08 2014 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 75D9F310 for ; Thu, 30 Jan 2014 08:47:08 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id C57F8127F for ; Thu, 30 Jan 2014 08:47:07 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA10574 for ; Thu, 30 Jan 2014 10:46:59 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1W8nH0-0007cE-TK for freebsd-net@freebsd.org; Thu, 30 Jan 2014 10:46:58 +0200 Message-ID: <52EA114C.40908@FreeBSD.org> Date: Thu, 30 Jan 2014 10:46:04 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: FreeBSD Net Subject: Re: Big physically contiguous mbuf clusters References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> <20140129222714.GK93141@funkthat.com> In-Reply-To: <20140129222714.GK93141@funkthat.com> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 08:47:08 -0000 on 30/01/2014 00:27 John-Mark Gurney said the following: > Adrian Chadd wrote this message on Wed, Jan 29, 2014 at 14:21 -0800: >> On 29 January 2014 10:54, Garrett Wollman wrote: >>> Resolved: that mbuf clusters longer than one page ought not be >>> supported. 
There is too much physical-memory fragmentation for them >>> to be of use on a moderately active server. 9k mbufs are especially >>> bad, since in the fragmented case they waste 3k per allocation. >> >> I've been wondering whether it'd be feasible to teach the physical >> memory allocator about >page sized allocations and to create zones of >> slightly more physically contiguous memory. >> >> For servers with lots of memory we could then keep these around and >> only dip into them for temporary allocations (eg not VM pages that may >> be held for some unknown amount of time.) >> >> Question is - can we enforce that kind of behaviour? > > It shouldn't be too hard to do... Since everything pretty much goes > through uma we can adopt a scheme similar to what Solaris does (read > Magazines and Vmem: Extending the Slab Allocator to Many CPUs and > Arbitrary Resources)... Instead of dealing w/ page size allocations, > everything is larger, say 16KB, and broken down from there... > FWIW, this is not how it is currently implemented in Solaris judging from OpenSolaris / illumos code. They try to find a slab size where the waste would be minimal. There is a cap on the maximum slab size, of course. This is also done for sub-page items. E.g. if an item size is 3KB, then FreeBSD uma would use 4KB slabs and waste about 1KB in each slab. On the other hand, illumos kmem cache code would pick 12KB slab size. 
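[The slab-size trade-off Andriy describes is easy to quantify. A small illustrative Python sketch (the real logic lives in uma(9) and the illumos kmem cache code; the 16-page cap and the tie-breaking rule are my assumptions) comparing per-slab waste for a 3KB item, ignoring per-slab metadata:]

```python
PAGE = 4096

def waste(item_size, slab_size):
    # Bytes left over once the slab is packed with whole items.
    return slab_size % item_size

# uma-style: one page per slab for a 3KB item.
print(waste(3 * 1024, PAGE))  # 1024 bytes wasted per 4KB slab

# illumos-kmem-style: scan multi-page slab sizes (capped here at 16
# pages) for the one with minimal fractional waste, smallest first.
best = min(range(PAGE, 16 * PAGE + 1, PAGE),
           key=lambda s: (waste(3 * 1024, s) / s, s))
print(best, waste(3 * 1024, best))  # 12288 0: a 12KB slab fits 4 items exactly
```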
-- Andriy Gapon From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 09:36:49 2014 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 23A66E0A; Thu, 30 Jan 2014 09:36:49 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id EC2D5170F; Thu, 30 Jan 2014 09:36:48 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s0U9amgi073278; Thu, 30 Jan 2014 09:36:48 GMT (envelope-from vanhu@freefall.freebsd.org) Received: (from vanhu@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id s0U9amBJ073277; Thu, 30 Jan 2014 09:36:48 GMT (envelope-from vanhu) Date: Thu, 30 Jan 2014 09:36:48 GMT Message-Id: <201401300936.s0U9amBJ073277@freefall.freebsd.org> To: vanhu@FreeBSD.org, freebsd-net@FreeBSD.org, vanhu@FreeBSD.org From: vanhu@FreeBSD.org Subject: Re: kern/169438: [ipsec] ipv4-in-ipv6 tunnel mode IPsec does not work X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 09:36:49 -0000 Synopsis: [ipsec] ipv4-in-ipv6 tunnel mode IPsec does not work Responsible-Changed-From-To: freebsd-net->vanhu Responsible-Changed-By: vanhu Responsible-Changed-When: jeu 30 jan 2014 09:34:17 UTC Responsible-Changed-Why: Hi. Your hack solves the issue for ipv4-in-ipv6, but the same issue exists for ipv6-in-ipv4, and requires some more refactoring of the code. 
We're working on such a patch for both ways, and I hope we'll have a version ready to commit within the next few weeks. http://www.freebsd.org/cgi/query-pr.cgi?pr=169438 From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 13:45:30 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B70737C9 for ; Thu, 30 Jan 2014 13:45:30 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3712B1E50 for ; Thu, 30 Jan 2014 13:45:30 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id s0UDjKQm032687 for ; Thu, 30 Jan 2014 15:45:20 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua s0UDjKQm032687 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id s0UDjKXU032686 for freebsd-net@freebsd.org; Thu, 30 Jan 2014 15:45:20 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 30 Jan 2014 15:45:19 +0200 From: Konstantin Belousov To: FreeBSD Net Subject: Re: Big physically contiguous mbuf clusters Message-ID: <20140130134519.GU24664@kib.kiev.ua> References: <21225.20047.947384.390241@khavrinen.csail.mit.edu> <20140129231121.GA18434@ox> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="oGy11dVowAZA6eXT" Content-Disposition: inline In-Reply-To: <20140129231121.GA18434@ox> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jan 2014 13:45:30 -0000 --oGy11dVowAZA6eXT Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Jan 29, 2014 at 03:11:21PM -0800, Navdeep Parhar wrote: > On Wed, Jan 29, 2014 at 02:21:21PM -0800, Adrian Chadd wrote: > > Hi, > > > > On 29 January 2014 10:54, Garrett Wollman wrote: > > > Resolved: that mbuf clusters longer than one page ought not be > > > supported. There is too much physical-memory fragmentation for them > > > to be of use on a moderately active server. 9k mbufs are especially > > > bad, since in the fragmented case they waste 3k per allocation. > > > > I've been wondering whether it'd be feasible to teach the physical > > memory allocator about >page sized allocations and to create zones of > > slightly more physically contiguous memory. > > I think this would be very useful. For example, a zone_jumbo32 would > hit a sweet spot -- enough to fit 3 jumbo frames and some loose change > for metadata. I'd like to see us improve our allocators and VM system > to work better with larger contiguous allocations, rather than > deprecating the larger zones. It seems backwards to push towards > smaller allocation units when installed physical memory in a typical > system continues to rise. > > Allocating 3 x 4K instead of 1 x 9K for a jumbo means 3x the number of > vtophys translations, 3x the phys_addr/len traffic on the PCIe bus > (scatter list has to be fed to the chip and now it's 3x what it has to > be), 3x the number of "wrapper" mbuf allocations (one for each 4K > cluster) which will then be stitched together to form a frame, etc. etc.
If the platform supports an IOMMU, then the physical contiguity of the pages could be ignored: with a proper busdma tag, the VT-d driver allocates contiguous bus address space for the device-view mapping. Of course, this is moot right now, because drivers have no idea whether an IOMMU is present, and because the IOMMU busdma is both disabled by default and has a non-trivial setup cost.

>
> Regards,
> Navdeep
>
> >
> > For servers with lots of memory we could then keep these around and
> > only dip into them for temporary allocations (eg not VM pages that may
> > be held for some unknown amount of time.)
> >
> > Question is - can we enforce that kind of behaviour?
> >
> > -a

From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 15:06:29 2014
Date: Thu, 30 Jan 2014 10:06:28 -0500 (EST)
From: Rick Macklem
To: FreeBSD Net
Subject: Re: 64K NFS I/O generates a 34mbuf list for TCP which breaks TSO

Hi, just adding one more idea on what to do about this to the list:

- Add an if_hw_tsomaxseg and modify the loop in tcp_output() so that it
  uses both if_hw_tsomax and if_hw_tsomaxseg to decide how much to hand
  to the device driver in each mbuf list.
(I haven't looked to see how easy it would be to change this loop.)

rick

From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 15:12:08 2014
Date: Thu, 30 Jan 2014 07:12:07 -0800
From: Adrian Chadd
To: Rick Macklem
Subject: Re: 64K NFS I/O generates a 34mbuf list for TCP which breaks TSO

On 30 January 2014 07:06, Rick Macklem wrote:
> Hi, just adding one more idea on what to do about this
> to the list:
> - Add an if_hw_tsomaxseg and modify the loop in tcp_output()
> so that it uses both if_hw_tsomax and if_hw_tsomaxseg to
> decide how much to hand to the device driver in each mbuf list.
> (I haven't looked to see how easy it would be to change this loop.)

I don't think that's a hack. I think adding that and setting tsomaxseg to, say, 30 for now would be a good compromise.

-a

From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 15:18:29 2014
Date: Thu, 30 Jan 2014 07:18:28 -0800
From: Adrian Chadd
To: FreeBSD Net, "freebsd-arch@freebsd.org"
Subject:

Hi,

I'd like to disable the code in flowtable.c that assigns the mbuf flowid.

I'd like to ensure that any mbuf flowid that's set is (eventually) going to be consistently Toeplitz in the future (to match what NICs are doing on the RX side), and this may cause the flowid to be set to something completely different.

I've only done some light production testing with this so far, to no visible ill effects.

What do people think?
Thanks,

-a

Index: sys/net/flowtable.c
===================================================================
--- sys/net/flowtable.c	(revision 261001)
+++ sys/net/flowtable.c	(working copy)
@@ -1102,10 +1102,12 @@
 	if (af == AF_INET6)
 		fle = flowtable_lookup_mbuf6(ft, m);
 #endif
+#if 0
 	if (fle != NULL && m != NULL && (m->m_flags & M_FLOWID) == 0) {
 		m->m_flags |= M_FLOWID;
 		m->m_pkthdr.flowid = fle->f_fhash;
 	}
+#endif
 	return (fle);
 }

From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 18:40:44 2014
Date: Thu, 30 Jan 2014 10:40:43 -0800
From: Adrian Chadd
To: FreeBSD Net, "freebsd-arch@freebsd.org"
Subject: Re: (removing mbuf flowid setup in flowtable.c)

On 30 January 2014 07:18, Adrian Chadd wrote:
> Hi,
>
> I'd like to disable the code in flowtable.c that assigns the mbuf flowid.
>
> I'd like to ensure that any mbuf flowid that's set is (eventually)
> going to be consistently toeplitz in the future (to match what NICs
> are doing on the RX side) and this may cause the flowid to be set to
> something completely different.
>
> I've only done some light production testing with this so far, to no
> visible ill effects.
>
> What do people think?

Someone pointed out privately that doing this would mean that UDP flows without flow ids would suddenly not have flowids any longer and thus wouldn't use multiple output queues.

So, I'll leave this alone for now until I can import the toeplitz hash code into -HEAD and add an option to tag outbound UDP frames with this particular flowid hash.
Thanks,

-a

From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 18:49:12 2014
Date: Thu, 30 Jan 2014 11:48:57 -0700
From: Warner Losh
To: Adrian Chadd
Subject: Re: (removing mbuf flowid setup in flowtable.c)

On Jan 30, 2014, at 11:40 AM, Adrian Chadd wrote:
> Someone pointed out privately that doing this would mean that UDP
> flows without flow ids would suddenly not have flowids any longer and
> thus wouldn't use multiple output queues.
>
> So, I'll leave this alone for now until I can import the toeplitz hash
> code into -HEAD and add an option to tag outbound udp frames with this
> particular flowid hash.

Toeplitz is a funky kind of matrix, according to Google. What does that have to do with mbufs? :)

Warner

From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 18:50:32 2014
Date: Thu, 30 Jan 2014 10:50:31 -0800
From: Adrian Chadd
To: Warner Losh
Subject: Re: (removing mbuf flowid setup in flowtable.c)

On 30 January 2014 10:48, Warner Losh wrote:
> Toeplitz is a funky kind of matrix, according to Google. What does
> that have to do with mbufs? :)

Google "toeplitz hash rss". :-)

-a

From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 20:30:18 2014
Date: Thu, 30 Jan 2014 15:30:16 -0500
From: J David
To: Rick Macklem
Cc: Bryan Venteicher, Garrett Wollman, freebsd-net@freebsd.org
Subject: Re: Terrible NFS performance under 9.2-RELEASE?

On Wed, Jan 29, 2014 at 10:31 PM, Rick Macklem wrote:
>> I've been busy the last few days, and won't be able to get to any
>> code until the weekend.

Is there likely to be more to it than just cranking the MAX_TX_SEGS value and recompiling? If so, is it something I could take on?

> Well, NFS hands TCP a list of 34 mbufs. If TCP only adds one, then
> increasing it from 34 to 35 would be all it takes. However, see below.

One thing I don't want to miss here is that an NFS block size of 65,536 is really suboptimal. The largest size of a TCP datagram is 65,535. So by the time NFS adds its overhead and the total amount of data to be sent winds up in that ~65k range, it guarantees that the operation has to be split into at least two TCP packets, one max-size and one tiny one. This doubles a lot of the network stack overhead, regardless of whether the packet ends up being segmented into tiny bits down the road or not.

If NFS could be modified to respect the actual size of a TCP packet, generating a steady stream of 63.9k (or thereabouts) writes instead of the current 64k-1k-64k-1k pattern, performance would likely see another significant boost.
This would nearly double the average throughput per packet, which would help with network latency and CPU load.

It's also not 100% clear, but it seems like in some cases the existing behavior also causes the TCP stack to park on the "leftover" bit and wait for more data, which comes in another >64k chunk, and from there on out there's no more correlation between TCP packets and NFS operations, so an operation doesn't begin on a packet boundary. That continues as long as load keeps up. That's probably not good for performance either. And it certainly confuses the heck out of tcpdump.

Probably 60k would be the next most reasonable size, since it's the largest page-size multiple that will fit into a TCP packet while still leaving room for overhead.

Since the max size of TCP packets is not an area where there's really any flexibility, what would have to happen to NFS to make that (or arbitrary values) perform at its best within that constraint?

It's apparent from even trivial testing that performance is dramatically affected if the "use a power of two for NFS rsize/wsize" recommendation isn't followed, but what is the origin of that? Is it something that could be changed?

> I don't think that m_collapse() is more likely to fail, since it
> only copies data to the previous mbuf when the entire mbuf that
> follows will fit and it's allowed. (I'd assume that a ref count
> copied mbuf cluster doesn't allow this copy or things would be
> badly broken.)

m_collapse checks M_WRITEABLE, which appears to cover the ref count case. (It's a dense macro, but it seems to require a ref count of 1 if a cluster is used.)

The cases where m_collapse can succeed are pretty slim. It pretty much requires two consecutive underutilized buffers, which probably explains why it fails so often in this code path.
Since one of its two methods outright skips the packet header mbuf (to avoid the risk of moving it), possibly the only case where it succeeds is when the last data mbuf is short enough that whatever NFS trailers are being appended can fit with it.

> Bottom line, I think calling either m_collapse() or m_defrag()
> should be considered a "last resort".

It definitely seems more designed for a case where 8 different stack layers each put their own little header/trailer fingerprint on the packet, and that's not what's happening here.

> Maybe the driver could reduce the size of if_hw_tsomax whenever
> it finds it needs to call one of these functions, to try and avoid
> a re-occurrence?

Since the issue is one of segment length rather than packet length, this seems risky. If one of those touched-by-everybody packets goes by, it may not be that large, but it would risk permanently (until reboot) dropping the throughput of that interface.

Thanks!

From owner-freebsd-net@FreeBSD.ORG Thu Jan 30 22:44:06 2014
Date: Thu, 30 Jan 2014 17:44:03 -0500 (EST)
From: Rick Macklem
To: J David
Cc: Bryan Venteicher, Garrett Wollman, freebsd-net@freebsd.org
Subject: Re: Terrible NFS performance under 9.2-RELEASE?

J David wrote:
> One thing I don't want to miss here is that an NFS block size of
> 65,536 is really suboptimal. The largest size of a TCP datagram is
> 65,535. So by the time NFS adds the overhead on and the total amount
> of data to be sent winds up in that ~65k range, it guarantees that
> the operation has to be split into at least two TCP packets, one
> max-size and one tiny one.

For your virtual network, yes. For the underlying file system on the server (which would not normally be in memory), a large block size will normally be good. (No one size fits all, which is why there are the rsize/wsize mount options.)
To be honest, the limit is MAXBSIZE, which just happens to be 64K at this time. I'd like to see MAXBSIZE increased to at least 128K, since that is the default blocksize for ZFS, I've been told. Also, for real networks, the NFS RPC message will be broken into quite a few packets to go on the wire, as far as I know. (I don't think there are real networks using a 64K jumbo packet, is there?) For my hardware, the packets will be 1500bytes each on the wire, since nothing I have does jumbo packets. Unfortunately, NFS adds a little bit to the front of the data, so an NFS RPC will always be a little bit more than a power of 2 in size for reads/writes of a power of 2. Also, most NFS RPC messages are small, so NFS traffic is always going to have a lot of small TCP segments interspersed with a few large ones (and going in both directions on the TCP connection concurrently). Now, I am not sure why 65535 (largest ip datagram) has been chosen as the default limit for TSO segments? (From my point of view, it would be nice if the limit were larger, assuming there is a limit on the number of mbufs in the list, so that calls to m_collapse()/m_defrag() are avoided. I am hoping the networking types consider my recent post and maybe the suggestion of having a if_hw_tsomaxseg limit along with if_hw_tsomax.) > If NFS could be modified to respect the actual size of a TCP packet, > generating a steady stream of 63.9k (or thereabout) writes instead of > the current 64k-1k-64k-1k, performance would likely see another > significant boost. This would nearly double the average throughput > per packet, which would help with network latency and CPU load. > > It's also not 100% clear but it seems like in some cases the existing > behavior also causes the TCP stack to park on the "leftover" bit and > wait for more data, which comes in another >64k chunk, and from there > on out there's no more correlation between TCP packets and NFS > operations, so an operation doesn't begin on a packet boundary. 
That > continues as long as load keeps up. That's probably not good for > performance either. And it certainly confuses the heck out of > tcpdump. > Well, since NFS sets the TCP_NODELAY socket option, that shouldn't occur in the TCP layer. If some network device driver is delaying, waiting for more to send, then I'd say that device driver is broken. > Probably 60k would be the next most reasonable size, since it's the > largest page size multiple that will fit into a TCP packet while > still > leaving room for overhead. > > Since the max size of TCP packets is not an area where there's really > any flexibility, what would have to happen to NFS to make that (or > arbitrary values) perform at its best within that constraint? > For real NFS environments, the performance of the file system and underlying disk subsystem is generally more important than the network. (Your benchmark has artificially taken the file system on disk out of the mix, so you will see an exaggerated effect from network performance. This is fine if you are looking for network bottlenecks, but not if you want to relate this to performance of a real NFS environment.) I already mentioned that the Linux client doing file_sync 8K writes will result in poor performance of a server's disk file system. (Some NAS vendors avoid this by using non-volatile ram in the server as stable storage, but a FreeBSD server can't expect such hardware to be available.) > It's apparent from even trivial testing that performance is > dramatically affected if the "use a power of two for NFS rsize/wsize" > recommendation isn't followed, but what is the origin of that? Is it > something that could be changed? > Because disk file systems on file servers always use block sizes that are a power of 2. > > I don't think that m_collapse() is more likely to fail, since it > > only copies data to the previous mbuf when the entire mbuf that > > follows will fit and it's allowed. 
I'd assume that a ref count > > copied mbuf cluster doesn't allow this copy or things would be > > badly broken.) > > m_collapse checks M_WRITABLE which appears to cover the ref count > case. (It's a dense macro, but it seems to require a ref count of 1 > if a cluster is used.) > > The cases where m_collapse can succeed are pretty slim. It pretty > much requires two consecutive underutilized buffers, which probably > explains why it fails so often in this code path. Since one of its > two methods outright skips the packet header mbuf (to avoid risk of > moving it), possibly the only case where it succeeds is when the last > data mbuf is short enough that whatever NFS trailers are being > appended can fit with it. > Yes, I would agree with this. (I think I somehow mistyped what I meant to say. I didn't mean to imply that m_collapse() will usually succeed for these long NFS mbuf list RPC messages.) > > Bottom line, I think calling either m_collapse() or m_defrag() > > should be considered a "last resort". > > It definitely seems more designed for a case where 8 different stack > layers each put their own little header/trailer fingerprint on the > packet, and that's not what's happening here. > > > Maybe the driver could reduce the size of if_hw_tsomax whenever > > it finds it needs to call one of these functions, to try and avoid > > a re-occurrence? > > Since the issue is one of segment length rather than packet length, > this seems risky. If one of those touched-by-everybody packets goes > by, it may not be that large, but it would risk permanently (until > reboot) dropping the throughput of that interface. > Agreed. I think adding an if_hw_tsomaxseg that TCP can use is preferable. I didn't think of that until after sending the first post. Also, I think adding it implies a driver KPI change, which means it can't be done for 9.n or 10.n. rick > Thanks!
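The m_collapse() success condition discussed in this exchange can be illustrated with a small user-space sketch. This is a toy model, not the real mbuf API: data from the following mbuf is folded into its predecessor only when it fits whole and the predecessor is writable, which is why two consecutive underutilized buffers are needed for it to succeed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy mbuf: a small fixed buffer plus a fill count and a writable flag.
 * (Hypothetical simplification; the real struct mbuf is far richer.) */
#define TOYBUFSZ 256
struct toy_mbuf {
    struct toy_mbuf *m_next;
    int m_len;        /* bytes used in m_data */
    int m_writable;   /* stands in for the M_WRITABLE check */
    char m_data[TOYBUFSZ];
};

/* Try to fold m->m_next into m, in the spirit of m_collapse(): only
 * when the *entire* next buffer fits in the free space of the current
 * one and the current one is writable.  Returns 1 on success. */
static int
toy_collapse_pair(struct toy_mbuf *m)
{
    struct toy_mbuf *n = m->m_next;

    if (n == NULL || !m->m_writable)
        return (0);
    if (n->m_len > TOYBUFSZ - m->m_len)
        return (0);             /* next mbuf does not fit whole: give up */
    memcpy(m->m_data + m->m_len, n->m_data, n->m_len);
    m->m_len += n->m_len;
    m->m_next = n->m_next;      /* unlink the now-empty mbuf */
    return (1);
}
```

Two nearly full buffers fail the "fits whole" test, which matches the observation that the call rarely helps on these long NFS chains.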
> _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 03:32:44 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 14BE032D; Fri, 31 Jan 2014 03:32:44 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 5C6CD13D3; Fri, 31 Jan 2014 03:32:40 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: X-IronPort-AV: E=Sophos;i="4.95,754,1384318800"; d="scan'208";a="92529657" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 30 Jan 2014 22:32:32 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 23612B403B; Thu, 30 Jan 2014 22:32:32 -0500 (EST) Date: Thu, 30 Jan 2014 22:32:32 -0500 (EST) From: Rick Macklem To: Adrian Chadd Message-ID: <1856284835.584005.1391139152133.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: 64K NFS I/O generates a 34mbuf list for TCP which breaks TSO MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_584003_402802933.1391139152131" X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 03:32:44 -0000 
------=_Part_584003_402802933.1391139152131 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Adrian Chadd wrote: > On 30 January 2014 07:06, Rick Macklem wrote: > > Hi, just adding one more idea on what to do about this > > to the list: > > - Add an if_hw_tsomaxseg and modify the loop in tcp_output() > > so that it uses both if_hw_tsomax and if_hw_tsomaxseg to > > decide how much to hand to the device driver in each mbuf list. > > (I haven't looked to see how easy it would be to change this > > loop.) > > I don't think that's a hack. I think adding that and setting > tsomaxseg > to say 30 for now would be a good compromise. > Well, my TCP is very rusty and I have no way to test it (I don't have anything that does TSO), but I've attached a stab at a patch to do this. Maybe it can be used as a starting point for this, if others think it makes sense. The "#ifdef notyet" in the patch would become something like: # if __FreeBSD_version >= NNNN when a change to add if_hw_tsomaxseg is done, was what I was thinking.
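The idea in the attached patch can be sketched in a few lines of user-space C. This is a hypothetical simplification, not the patch itself: walk the mbuf chain to be sent, stop at the driver's segment limit, and clamp the burst length so a follow-up send covers the rest (the toy mbuf struct and the name clamp_tso_len are illustrative).

```c
#include <assert.h>
#include <stddef.h>

/* Toy mbuf: just a length and a next pointer. */
struct toy_mbuf {
    struct toy_mbuf *m_next;
    long m_len;
};

/* Given the chain to send and the driver's segment (mbuf) limit, clamp
 * "len" so the TSO burst spans at most maxsegs mbufs.  Returns the
 * (possibly reduced) length; *sendalot is set when the send must be
 * split and tcp_output() should loop again. */
static long
clamp_tso_len(struct toy_mbuf *m, long len, int maxsegs, int *sendalot)
{
    long tlen = 0;
    int cnt = 0;

    *sendalot = 0;
    while (m != NULL && cnt < maxsegs && tlen < len) {
        tlen += m->m_len;
        cnt++;
        m = m->m_next;
    }
    if (m != NULL && tlen < len) {
        /* Hit the segment limit before covering len: send less now. */
        len = tlen;
        *sendalot = 1;
    }
    return (len);
}
```

With a 34-mbuf chain and a limit of 30 the burst gets split into two sends; a limit of 35 or more passes the whole chain through untouched, avoiding the m_defrag()/m_collapse() path in the driver.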
rick > > > -a > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > ------=_Part_584003_402802933.1391139152131 Content-Type: text/x-patch; name=tsomaxseg.patch Content-Disposition: attachment; filename=tsomaxseg.patch Content-Transfer-Encoding: base64 LS0tIGtlcm4vdWlwY19zb2NrYnVmLmMuc2F2CTIwMTQtMDEtMzAgMjA6Mjc6MTcuMDAwMDAwMDAw IC0wNTAwCisrKyBrZXJuL3VpcGNfc29ja2J1Zi5jCTIwMTQtMDEtMzAgMjI6MTI6MDguMDAwMDAw MDAwIC0wNTAwCkBAIC05NjUsNiArOTY1LDM5IEBAIHNic25kcHRyKHN0cnVjdCBzb2NrYnVmICpz YiwgdV9pbnQgb2ZmLCAKIH0KIAogLyoKKyAqIFJldHVybiB0aGUgZmlyc3QgbWJ1ZiBmb3IgdGhl IHByb3ZpZGVkIG9mZnNldC4KKyAqLworc3RydWN0IG1idWYgKgorc2JzbmRtYnVmKHN0cnVjdCBz b2NrYnVmICpzYiwgdV9pbnQgb2ZmLCBsb25nICpmaXJzdF9sZW4pCit7CisJc3RydWN0IG1idWYg Km07CisKKwlLQVNTRVJUKHNiLT5zYl9tYiAhPSBOVUxMLCAoIiVzOiBzYl9tYiBpcyBOVUxMIiwg X19mdW5jX18pKTsKKworCSpmaXJzdF9sZW4gPSAwOworCS8qCisJICogSXMgb2ZmIGJlbG93IHN0 b3JlZCBvZmZzZXQ/IEhhcHBlbnMgb24gcmV0cmFuc21pdHMuCisJICogSWYgc28sIGp1c3QgdXNl IHNiX21iLgorCSAqLworCWlmIChzYi0+c2Jfc25kcHRyID09IE5VTEwgfHwgc2ItPnNiX3NuZHB0 cm9mZiA+IG9mZikKKwkJbSA9IHNiLT5zYl9tYjsKKwllbHNlIHsKKwkJbSA9IHNiLT5zYl9zbmRw dHI7CisJCW9mZiAtPSBzYi0+c2Jfc25kcHRyb2ZmOworCX0KKwl3aGlsZSAob2ZmID4gMCAmJiBt ICE9IE5VTEwpIHsKKwkJaWYgKG9mZiA8IG0tPm1fbGVuKQorCQkJYnJlYWs7CisJCW9mZiAtPSBt LT5tX2xlbjsKKwkJbSA9IG0tPm1fbmV4dDsKKwl9CisJaWYgKG0gIT0gTlVMTCkKKwkJKmZpcnN0 X2xlbiA9IG0tPm1fbGVuIC0gb2ZmOworCisJcmV0dXJuIChtKTsKK30KKworLyoKICAqIERyb3Ag YSByZWNvcmQgb2ZmIHRoZSBmcm9udCBvZiBhIHNvY2tidWYgYW5kIG1vdmUgdGhlIG5leHQgcmVj b3JkIHRvIHRoZQogICogZnJvbnQuCiAgKi8KLS0tIHN5cy9zb2NrYnVmLmguc2F2CTIwMTQtMDEt MzAgMjA6NDI6MjguMDAwMDAwMDAwIC0wNTAwCisrKyBzeXMvc29ja2J1Zi5oCTIwMTQtMDEtMzAg MjI6MDg6NDMuMDAwMDAwMDAwIC0wNTAwCkBAIC0xNTMsNiArMTUzLDggQEAgaW50CXNicmVzZXJ2 ZV9sb2NrZWQoc3RydWN0IHNvY2tidWYgKnNiLAogCSAgICBzdHJ1Y3QgdGhyZWFkICp0ZCk7CiBz 
dHJ1Y3QgbWJ1ZiAqCiAJc2JzbmRwdHIoc3RydWN0IHNvY2tidWYgKnNiLCB1X2ludCBvZmYsIHVf aW50IGxlbiwgdV9pbnQgKm1vZmYpOworc3RydWN0IG1idWYgKgorCXNic25kbWJ1ZihzdHJ1Y3Qg c29ja2J1ZiAqc2IsIHVfaW50IG9mZiwgbG9uZyAqZmlyc3RfbGVuKTsKIHZvaWQJc2J0b3hzb2Nr YnVmKHN0cnVjdCBzb2NrYnVmICpzYiwgc3RydWN0IHhzb2NrYnVmICp4c2IpOwogaW50CXNid2Fp dChzdHJ1Y3Qgc29ja2J1ZiAqc2IpOwogaW50CXNibG9jayhzdHJ1Y3Qgc29ja2J1ZiAqc2IsIGlu dCBmbGFncyk7Ci0tLSBuZXRpbmV0L3RjcF9pbnB1dC5jLnNhdgkyMDE0LTAxLTMwIDE5OjM3OjUy LjAwMDAwMDAwMCAtMDUwMAorKysgbmV0aW5ldC90Y3BfaW5wdXQuYwkyMDE0LTAxLTMwIDE5OjM5 OjA3LjAwMDAwMDAwMCAtMDUwMApAQCAtMzYyNyw2ICszNjI3LDcgQEAgdGNwX21zcyhzdHJ1Y3Qg dGNwY2IgKnRwLCBpbnQgb2ZmZXIpCiAJaWYgKGNhcC5pZmNhcCAmIENTVU1fVFNPKSB7CiAJCXRw LT50X2ZsYWdzIHw9IFRGX1RTTzsKIAkJdHAtPnRfdHNvbWF4ID0gY2FwLnRzb21heDsKKwkJdHAt PnRfdHNvbWF4c2VncyA9IGNhcC50c29tYXhzZWdzOwogCX0KIH0KIAotLS0gbmV0aW5ldC90Y3Bf b3V0cHV0LmMuc2F2CTIwMTQtMDEtMzAgMTg6NTU6MTUuMDAwMDAwMDAwIC0wNTAwCisrKyBuZXRp bmV0L3RjcF9vdXRwdXQuYwkyMDE0LTAxLTMwIDIyOjE4OjU2LjAwMDAwMDAwMCAtMDUwMApAQCAt MTY2LDggKzE2Niw4IEBAIGludAogdGNwX291dHB1dChzdHJ1Y3QgdGNwY2IgKnRwKQogewogCXN0 cnVjdCBzb2NrZXQgKnNvID0gdHAtPnRfaW5wY2ItPmlucF9zb2NrZXQ7Ci0JbG9uZyBsZW4sIHJl Y3dpbiwgc2VuZHdpbjsKLQlpbnQgb2ZmLCBmbGFncywgZXJyb3IgPSAwOwkvKiBLZWVwIGNvbXBp bGVyIGhhcHB5ICovCisJbG9uZyBsZW4sIHJlY3dpbiwgc2VuZHdpbiwgdHNvX3RsZW47CisJaW50 IGNudCwgb2ZmLCBmbGFncywgZXJyb3IgPSAwOwkvKiBLZWVwIGNvbXBpbGVyIGhhcHB5ICovCiAJ c3RydWN0IG1idWYgKm07CiAJc3RydWN0IGlwICppcCA9IE5VTEw7CiAJc3RydWN0IGlwb3ZseSAq aXBvdiA9IE5VTEw7CkBAIC03ODAsNiArNzgwLDI0IEBAIHNlbmQ6CiAJCQl9CiAKIAkJCS8qCisJ CQkgKiBMaW1pdCB0aGUgbnVtYmVyIG9mIFRTTyB0cmFuc21pdCBzZWdtZW50cyAobWJ1ZnMKKwkJ CSAqIGluIG1idWYgbGlzdCkgdG8gdHAtPnRfdHNvbWF4c2Vncy4KKwkJCSAqLworCQkJY250ID0g MDsKKwkJCW0gPSBzYnNuZG1idWYoJnNvLT5zb19zbmQsIG9mZiwgJnRzb190bGVuKTsKKwkJCXdo aWxlIChtICE9IE5VTEwgJiYgY250IDwgdHAtPnRfdHNvbWF4c2VncyAmJgorCQkJICAgIHRzb190 bGVuIDwgbGVuKSB7CisJCQkJaWYgKGNudCA+IDApCisJCQkJCXRzb190bGVuICs9IG0tPm1fbGVu 
OworCQkJCWNudCsrOworCQkJCW0gPSBtLT5tX25leHQ7CisJCQl9CisJCQlpZiAobSAhPSBOVUxM ICYmIHRzb190bGVuIDwgbGVuKSB7CisJCQkJbGVuID0gdHNvX3RsZW47CisJCQkJc2VuZGFsb3Qg PSAxOworCQkJfQorCisJCQkvKgogCQkJICogUHJldmVudCB0aGUgbGFzdCBzZWdtZW50IGZyb20g YmVpbmcKIAkJCSAqIGZyYWN0aW9uYWwgdW5sZXNzIHRoZSBzZW5kIHNvY2tidWYgY2FuCiAJCQkg KiBiZSBlbXB0aWVkLgotLS0gbmV0aW5ldC90Y3Bfc3Vici5jLnNhdgkyMDE0LTAxLTMwIDE5OjQ0 OjM1LjAwMDAwMDAwMCAtMDUwMAorKysgbmV0aW5ldC90Y3Bfc3Vici5jCTIwMTQtMDEtMzAgMjA6 NTY6MTIuMDAwMDAwMDAwIC0wNTAwCkBAIC0xODAwLDYgKzE4MDAsMTIgQEAgdGNwX21heG10dShz dHJ1Y3QgaW5fY29ubmluZm8gKmluYywgc3RydQogCQkJICAgIGlmcC0+aWZfaHdhc3Npc3QgJiBD U1VNX1RTTykKIAkJCQljYXAtPmlmY2FwIHw9IENTVU1fVFNPOwogCQkJCWNhcC0+dHNvbWF4ID0g aWZwLT5pZl9od190c29tYXg7CisjaWZkZWYgbm90eWV0CisJCQkJY2FwLT50c29tYXhzZWdzID0g aWZwLT5pZl9od190c29tYXhzZWdzOworI2VuZGlmCisJCQkJaWYgKGNhcC0+dHNvbWF4c2VncyA9 PSAwKQorCQkJCQljYXAtPnRzb21heHNlZ3MgPQorCQkJCQkgICAgVENQVFNPX01BWF9UWF9TRUdT X0RFRkFVTFQ7CiAJCX0KIAkJUlRGUkVFKHNyby5yb19ydCk7CiAJfQotLS0gbmV0aW5ldC90Y3Bf dmFyLmguc2F2CTIwMTQtMDEtMzAgMTk6Mzk6MjIuMDAwMDAwMDAwIC0wNTAwCisrKyBuZXRpbmV0 L3RjcF92YXIuaAkyMDE0LTAxLTMwIDIwOjUyOjU3LjAwMDAwMDAwMCAtMDUwMApAQCAtMjA5LDYg KzIwOSw3IEBAIHN0cnVjdCB0Y3BjYiB7CiAJdV9pbnQJdF9rZWVwY250OwkJLyogbnVtYmVyIG9m IGtlZXBhbGl2ZXMgYmVmb3JlIGNsb3NlICovCiAKIAl1X2ludAl0X3Rzb21heDsJCS8qIHRzbyBi dXJzdCBsZW5ndGggbGltaXQgKi8KKwl1X2ludAl0X3Rzb21heHNlZ3M7CQkvKiB0c28gYnVyc3Qg c2VnbWVudCBsaW1pdCAqLwogCiAJdWludDMyX3QgdF9pc3BhcmVbOF07CQkvKiA1IFVUTywgMyBU QkQgKi8KIAl2b2lkCSp0X3BzcGFyZTJbNF07CQkvKiA0IFRCRCAqLwpAQCAtMjY4LDYgKzI2OSwx MSBAQCBzdHJ1Y3QgdGNwY2IgewogI2RlZmluZQlUQ1BPT0JfSEFWRURBVEEJMHgwMQogI2RlZmlu ZQlUQ1BPT0JfSEFEREFUQQkweDAyCiAKKy8qCisgKiBEZWZhdWx0IHZhbHVlIGZvciBUU08gbWF4 aW11bSBudW1iZXIgb2YgdHJhbnNtaXQgc2VnbWVudHMgKGNvdW50IG9mIG1idWZzKS4KKyAqLwor I2RlZmluZQlUQ1BUU09fTUFYX1RYX1NFR1NfREVGQVVMVAkzMAorCiAjaWZkZWYgVENQX1NJR05B VFVSRQogLyoKICAqIERlZmluZXMgd2hpY2ggYXJlIG5lZWRlZCBieSB0aGUgeGZvcm1fdGNwIG1v 
ZHVsZSBhbmQgdGNwX1tpbnxvdXRdcHV0CkBAIC0zMzMsNiArMzM5LDcgQEAgc3RydWN0IGhjX21l dHJpY3NfbGl0ZSB7CS8qIG11c3Qgc3RheSBpbgogc3RydWN0IHRjcF9pZmNhcCB7CiAJaW50CWlm Y2FwOwogCXVfaW50CXRzb21heDsKKwl1X2ludAl0c29tYXhzZWdzOwogfTsKIAogI2lmbmRlZiBf TkVUSU5FVF9JTl9QQ0JfSF8K ------=_Part_584003_402802933.1391139152131-- From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 03:37:24 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BE83C3C8; Fri, 31 Jan 2014 03:37:24 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 5A2D5142B; Fri, 31 Jan 2014 03:37:24 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: X-IronPort-AV: E=Sophos;i="4.95,754,1384318800"; d="scan'208";a="92530012" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 30 Jan 2014 22:37:23 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 0A2A3B4066; Thu, 30 Jan 2014 22:37:23 -0500 (EST) Date: Thu, 30 Jan 2014 22:37:23 -0500 (EST) From: Rick Macklem To: J David Message-ID: <122461163.585673.1391139443031.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Bryan Venteicher , Garrett Wollman , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 03:37:24 -0000 J David wrote: > On Wed, Jan 29, 2014 at 10:31 PM, Rick Macklem > wrote: > >> I've been busy the last few days, and won't be able to get to any > >> code > >> until the weekend. > > Is there likely to be more to it than just cranking the MAX_TX_SEGS > value and recompiling? If so, is it something I could take on? > > > Well, NFS hands TCP a list of 34 mbufs. If TCP only adds one, then > > increasing it from 34 to 35 would be all it takes. However, see > > below. > > One thing I don't want to miss here is that an NFS block size of > 65,536 is really suboptimal. The largest size of a TCP datagram is > 65535. So by the time NFS adds the overhead on and the total amount > of data to be sent winds up in that ~65k range, it guarantees that > the > operation has to be split into at least two TCP packets, one > max-size and one tiny one. This doubles a lot of the network stack > overhead, regardless of whether the packet ends up being segmented > into tiny bits down the road or not. > > If NFS could be modified to respect the actual size of a TCP packet, > generating a steady stream of 63.9k (or thereabout) writes instead of > the current 64k-1k-64k-1k, performance would likely see another > significant boost. This would nearly double the average throughput > per packet, which would help with network latency and CPU load.
> > It's also not 100% clear but it seems like in some cases the existing > behavior also causes the TCP stack to park on the "leftover" bit and > wait for more data, which comes in another >64k chunk, and from there > on out there's no more correlation between TCP packets and NFS > operations, so an operation doesn't begin on a packet boundary. That > continues as long as load keeps up. That's probably not good for > performance either. And it certainly confuses the heck out of > tcpdump. > > Probably 60k would be the next most reasonable size, since it's the > largest page size multiple that will fit into a TCP packet while > still > leaving room for overhead. > > Since the max size of TCP packets is not an area where there's really > any flexibility, what would have to happen to NFS to make that (or > arbitrary values) perform at its best within that constraint? > > It's apparent from even trivial testing that performance is > dramatically affected if the "use a power of two for NFS rsize/wsize" > recommendation isn't followed, but what is the origin of that? Is it > something that could be changed? > > > I don't think that m_collapse() is more likely to fail, since it > > only copies data to the previous mbuf when the entire mbuf that > > follows will fit and it's allowed. I'd assume that a ref count > > copied mbuf cluster doesn't allow this copy or things would be > > badly broken.) > > m_collapse checks M_WRITABLE which appears to cover the ref count > case. (It's a dense macro, but it seems to require a ref count of 1 > if a cluster is used.) > > The cases where m_collapse can succeed are pretty slim. It pretty > much requires two consecutive underutilized buffers, which probably > explains why it fails so often in this code path.
Since one of its > two methods outright skips the packet header mbuf (to avoid risk of > moving it), possibly the only case where it succeeds is when the last > data mbuf is short enough that whatever NFS trailers are being > appended can fit with it. > Btw, in the previous post I agreed "in general". For this specific case of the 64K NFS read reply/write request the first two mbufs don't have much data in them. The first is the Sun RPC header generated by the krpc and the 2nd is the first part of the NFS args that precedes the data. As such, I suspect that m_collapse() will often succeed in copying the 2nd mbuf's data into the first and reducing the mbuf count to 33. (You could find out by adding a counter for calls to m_collapse() and testing 64K without my patch.) rick > > Bottom line, I think calling either m_collapse() or m_defrag() > > should be considered a "last resort". > > It definitely seems more designed for a case where 8 different stack > layers each put their own little header/trailer fingerprint on the > packet, and that's not what's happening here. > > > Maybe the driver could reduce the size of if_hw_tsomax whenever > > it finds it needs to call one of these functions, to try and avoid > > a re-occurrence? > > Since the issue is one of segment length rather than packet length, > this seems risky. If one of those touched-by-everybody packets goes > by, it may not be that large, but it would risk permanently (until > reboot) dropping the throughput of that interface. > > Thanks!
> From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 03:53:05 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0C36A726; Fri, 31 Jan 2014 03:53:05 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D654E154A; Fri, 31 Jan 2014 03:53:04 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s0V3r3HF029166 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 30 Jan 2014 19:53:03 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s0V3r3QN029165; Thu, 30 Jan 2014 19:53:03 -0800 (PST) (envelope-from jmg) Date: Thu, 30 Jan 2014 19:53:03 -0800 From: John-Mark Gurney To: Rick Macklem Subject: Re: 64K NFS I/O generates a 34mbuf list for TCP which breaks TSO Message-ID: <20140131035303.GT93141@funkthat.com> Mail-Followup-To: Rick Macklem , Adrian Chadd , FreeBSD Net References: <1856284835.584005.1391139152133.JavaMail.root@uoguelph.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1856284835.584005.1391139152133.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? 
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Thu, 30 Jan 2014 19:53:03 -0800 (PST) Cc: FreeBSD Net , Adrian Chadd X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 03:53:05 -0000 Rick Macklem wrote this message on Thu, Jan 30, 2014 at 22:32 -0500: > Adrian Chadd wrote: > > On 30 January 2014 07:06, Rick Macklem wrote: > > > Hi, just adding one more idea on what to do about this > > > to the list: > > > - Add an if_hw_tsomaxseg and modify the loop in tcp_output() > > > so that it uses both if_hw_tsomax and if_hw_tsomaxseg to > > > decide how much to hand to the device driver in each mbuf list. > > > (I haven't looked to see how easy it would be to change this > > > loop.) > > > > I don't think that's a hack. I think adding that and setting > > tsomaxseg > > to say 30 for now would be a good compromise. > > > Well, my TCP is very rusty and I have no way to test it (I don't > have anything that does TSO), but I've attached a stab at a patch > to do this. > > Maybe it can be used as a starting point for this, if others think > it makes sense. > > The "#ifdef notyet" in the patch would become something like: > # if __FreeBSD_version >= NNNN > when a change to add if_hw_tsomaxseg is done, was what I was > thinking. Definitely need to make sure you fix the drivers that support large enough sg arrays like ixgb which supports 100...
Just a sampling of ones that use a _SCATTER define: ./e1000/if_igb.h:#define IGB_MAX_SCATTER 64 ./e1000/if_lem.h:#define EM_MAX_SCATTER 64 ./e1000/if_em.h:#define EM_MAX_SCATTER 32 ./nfe/if_nfereg.h:#define NFE_MAX_SCATTER 32 ./ixgbe/ixgbe.h:#define IXGBE_82598_SCATTER 100 ./ixgbe/ixgbe.h:#define IXGBE_82599_SCATTER 32 ./ixgb/if_ixgb.h:#define IXGB_MAX_SCATTER 100 I wonder how many of these are hardware limits, or just I don't want to allocate too much space on the stack, as 16 bytes per bus_dma_segment_t (on amd64) adds up... The other question is should the drivers w/ a limit on the segments reduce the size of the TSO packet so that we don't need to m_defrag/m_collapse which are expensive operations... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 04:36:19 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BA598C9; Fri, 31 Jan 2014 04:36:19 +0000 (UTC) Received: from mail-ig0-x232.google.com (mail-ig0-x232.google.com [IPv6:2607:f8b0:4001:c05::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 728B51779; Fri, 31 Jan 2014 04:36:19 +0000 (UTC) Received: by mail-ig0-f178.google.com with SMTP id uq10so8705475igb.5 for ; Thu, 30 Jan 2014 20:36:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=aagJ+YALSALUvr3mHWx7BykmqSFhuPfJR6jgzI5O/W0=; b=ACOTWZ3Vs8rwaqpmcxcKSYdq6E/SJAwybpmZvADtLItLslyKfGImiiX5eblRgUyu7Z DA/vxbCud1fGcl23L/vaYV08J31goWMupdHyLxDWr6/f/OnmJ/I/M2kCfXaLifo9xctL 
0nHDDDjXt/fJHjXq3xOnaHx1pLvKg/WicfhaGF9V/g7xVMB+4czuiEmPnKVwoYzVSFAX sXn2BO7ocqs5QBy5bZJ749NP3kotbZ+kYSViHsCLjCpDvw6d1M1JsQXeFA8nCiSXzfgg QsH5ui7IahSFfaGv8Q5aXnvt03fw+YGBfcO4oxvSng9pX5OGJn0niSGZGsMwPd/BjwdY jNrg== MIME-Version: 1.0 X-Received: by 10.43.51.65 with SMTP id vh1mr13779261icb.24.1391142978559; Thu, 30 Jan 2014 20:36:18 -0800 (PST) Sender: jdavidlists@gmail.com Received: by 10.42.170.8 with HTTP; Thu, 30 Jan 2014 20:36:18 -0800 (PST) In-Reply-To: <87942875.478893.1391121843834.JavaMail.root@uoguelph.ca> References: <87942875.478893.1391121843834.JavaMail.root@uoguelph.ca> Date: Thu, 30 Jan 2014 23:36:18 -0500 X-Google-Sender-Auth: ziCZJJVV4QHzrdWTvg3C_eY9gU4 Message-ID: Subject: Re: Terrible NFS performance under 9.2-RELEASE? From: J David To: Rick Macklem Content-Type: text/plain; charset=ISO-8859-1 Cc: Bryan Venteicher , Garrett Wollman , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 04:36:19 -0000 On Thu, Jan 30, 2014 at 5:44 PM, Rick Macklem wrote: > I'd like to see MAXBSIZE > increased to at least 128K, since that is the default block size for > ZFS, I've been told. Regrettably, that is incomplete. The ZFS record size is variable *up to* 128kiB by default; it's more of an upper limit than a hard and fast rule. Also, it is configurable at runtime on a per-filesystem basis. Although any file >128kiB probably does use 128kiB blocks, ZFS has ARC and L2ARC and manages its own prefetch. Probably as long as NFS treats the rsize/wsize as a fixed-sized block, the number of workloads benefited by pushing it to 128kiB may be very limited. > Also, for real networks, the NFS RPC message will be broken into > quite a few packets to go on the wire, as far as I know. (I don't > think there are real networks using a 64K jumbo packet, is there?) 
> For my hardware, the packets will be 1500 bytes each on the wire, > since nothing I have does jumbo packets. Real environments for NFS in 2014 are 10gig LANs with hardware TSO that makes the overhead of TSO negligible. As someone else on this thread has already pointed out, efficiently utilizing TSO is essentially mandatory to make good use of 10gig hardware. So as far as FreeBSD is concerned, yes, many networks effectively have a 64k MTU (for TCP only since FreeBSD does not implement GSO at this time) and it should act accordingly when dealing with them. This NFS buffer size is nearly doubling the number of TCP packets it takes to move the same amount of data. Regardless of how those packets are eventually segmented -- which can be effectively ignored in the real world of hardware TSO -- the overhead of TCP and IP is not nil, cannot be offloaded, and doubling it is not a good thing. It doubles every step down to the very bottom, including optional stuff like PF if it is hanging around in there. > Unfortunately, NFS adds a little bit to the front of the data, so > an NFS RPC will always be a little bit more than a power of 2 in > size for reads/writes of a power of 2. That's why NFS should be able to operate on page-sized multiples rather than powers of 2. Then it can operate on the filesystem using the best size for that, operate on the network using the best size for that, and mediate the two using page-sized jumbo clusters. If you know the underlying filesystem block size, by all means, read or write based on it where appropriate. > Now, I am not sure why 65535 (the largest IP datagram) has been chosen > as the default limit for TSO segments? The process of TCP segmentation, whether offloaded or not, is performed on a single TCP packet. It operates by reusing that packet's header over and over for each segment with slight modifications. Consequently the maximum size that can be offloaded is the maximum size that can be segmented: one packet.
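The doubling claim is just ceiling arithmetic, which a short check makes concrete. The 400-byte RPC overhead used below is an assumed, illustrative figure, not a number from the thread:

```c
#include <assert.h>

/* Number of maximum-size TCP sends needed to move "nbytes" of stream
 * data when at most "maxpkt" bytes fit in one TSO burst/packet. */
static long
tcp_sends_needed(long nbytes, long maxpkt)
{
    return ((nbytes + maxpkt - 1) / maxpkt);   /* ceiling division */
}
```

A 64KiB write plus a few hundred bytes of RPC header just overflows the 65535-byte limit and needs two sends (one full, one tiny), while a 60KiB write plus the same header fits in one.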
> Well, since NFS sets the TCP_NODELAY socket option, that shouldn't > occur in the TCP layer. If some network device driver is delaying, > waiting for more to send, then I'd say that device driver is broken. This is not a driver issue. TCP_NODELAY means "don't wait for more data." It doesn't mean "don't send more data that is ready to be sent." If there's more data already present on the stream by the time the TCP stack gets to it, which is possible in an SMP environment, TCP_NODELAY won't, as far as I know, prevent it from being sent in the next available packet. This isn't necessarily something that happens every time, or even consistently, but when you're sending a hundred thousand packets per second, it looks like the chain can indeed come off the bicycle. NFS is not sending packets to the TCP stack, it is sending stream data. With TCP_NODELAY it should be possible to engineer a one send = one packet correlation, but that's true if and only if that send is less than the max packet size. > For real NFS environments, the performance of the file system and > underlying disk subsystem is generally more important than the network. Maybe this is the case if NFS is serving from one spinning disk. It's definitely not the case for ZFS installs with 128GiB RAM, shelves of SAS drives, TB of SSD L2ARC, and STEC slog devices. The performance of the virtual environment we're using as a test platform is remarkably close to that. It just has the benefit of being two orders of magnitude cheaper and therefore something that can be set aside for testing stuff like this. > (Some > NAS vendors avoid this by using non-volatile ram in the server as stable > storage, but a FreeBSD server can't expect such hardware to be available.) Nonvolatile slogs are all but mandatory in any ZFS-backed-NFS fileserver deployment. Like TSO, it's not hypothetical, it is standard for production deployments. >> but what is the origin of that? Is it >> something that could be changed? 
>> > Because disk file systems on file servers always use block sizes that > are a power of 2. Maybe my question wasn't phrased well. What is the origin of the huge performance drop when a non-multiple-of-2 size is used? This is visible under small random ops where the data difference between a 60k read and a 64k read isn't ever used and the next block is almost certainly not going to be read next. So it's very weird (to me) that performance drops as much as it does. > Agreed. I think adding an if_hw_tsomaxseg that TCP can use is preferable. It may be valuable for other workloads to prevent drops on some kind of pathologically sliced-up packets, but jumbo cluster support in NFS should pretty much guarantee that it is not going to have a problem in this area with any interface in common use. Thanks! From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 06:18:34 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CF903D91 for ; Fri, 31 Jan 2014 06:18:34 +0000 (UTC) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9ED5D1E26 for ; Fri, 31 Jan 2014 06:18:34 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s0V6IVEN027168; Fri, 31 Jan 2014 01:18:31 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.7/8.14.4/Submit) id s0V6IVJv027167; Fri, 31 Jan 2014 01:18:31 -0500 (EST) (envelope-from wollman) Date: Fri, 31 Jan 2014 01:18:31 -0500 (EST) Message-Id: <201401310618.s0V6IVJv027167@hergotha.csail.mit.edu> From: wollman@freebsd.org To:
j.david.lists@gmail.com Subject: Re: Terrible NFS performance under 9.2-RELEASE? X-Newsgroups: mit.lcs.mail.freebsd-net In-Reply-To: References: <87942875.478893.1391121843834.JavaMail.root@uoguelph.ca> Organization: none X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (hergotha.csail.mit.edu [127.0.0.1]); Fri, 31 Jan 2014 01:18:31 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 06:18:35 -0000 In article , J David writes: >The process of TCP segmentation, whether offloaded or not, is >performed on a single TCP packet. It operates by reusing that >packet's header over and over for each segment with slight >modifications. Consequently the maximum size that can be offloaded is >the maximum size that can be segmented: one packet. This is almost entirely wrong in its description of the non-offload case. A segment is a PDU at the transport layer. In normal operation, TCP figures out how much it can send, constructs a header, and copies an mbuf chain referencing one segment's worth of data out of the socket's transmit buffer. tcp_output() repeats this process (possibly using the same mbuf cluster multiple times, if it's larger than the receiver's or the path's maximum segment size) until it either runs out of stuff to send, or runs out of transmit window to send into. 
THAT IS WHY TSO IS A WIN: as you describe, the packet headers are mostly identical, and (if the transmit window allows) it's much cheaper to build the header and do the DMA setup once, then let the NIC take over from there, rather than having to DMA a different (but nearly identical) header for every individual segment. >NFS is not sending packets to the TCP stack, it is sending stream >data. With TCP_NODELAY it should be possible to engineer a one send = >one packet correlation, but that's true if and only if that send is >less than the max packet size. Yes and no. NFS constructs a chain of mbufs and calls the socket's sosend() routine. This ultimately results in a call to tcp_output(), and in the normal case where there is no data awaiting transmission, that mbuf chain will be shallow-copied (bumping all the mbuf cluster reference counts) up to the limit of what the transmit window allows, and Ethernet, IP, and TCP headers will be prepended (possibly in a separate mbuf). The whole mess is then passed on to the hardware for offload, if it fits. RPC responses will only get smushed together if tcp_output() wasn't able to schedule the transmit immediately, and if the network is working properly, that will only happen if there's more than one client-side-receive-window's-worth of data to be transmitted. This shallow-copy behavior, by the way, is why the drivers need m_defrag() rather than m_collapse(): M_WRITABLE is never true for clusters coming out of tcp_output(), because the refcount will never be less than 2 (one for the socket buffer and at least one for the interface's transmit queue, depending on how many segments include some data from the cluster). But it's also part of why having a "gigantic" cluster (e.g., 128k) would be a big win for NFS. 
-GAWollman From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 17:58:16 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C69861A1; Fri, 31 Jan 2014 17:58:16 +0000 (UTC) Received: from mail-ie0-x232.google.com (mail-ie0-x232.google.com [IPv6:2607:f8b0:4001:c03::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 895181945; Fri, 31 Jan 2014 17:58:16 +0000 (UTC) Received: by mail-ie0-f178.google.com with SMTP id x13so4720460ief.9 for ; Fri, 31 Jan 2014 09:58:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=OuiCkJqdf7ZDhRD3EbTuIQDZsZXdHn1CR3hukVW+0tI=; b=ZFlmxfNWy3UAL1DvI9ja4/X7Fc1SRLbmhUVRi0GVQqcnoR/buj/CbfEGN7I2VbTTqm 6QHg8ni0bHYDadbkM5mftBxLqMEfcHg4ntNlGaaTTxv0Ap3fqaKfrGwnVAWBezgIGCSS wRmHKYvt44rQ6BYjNlxf5S/9x9EqpMwVu3FXUbF+EVVQENQ2otSjV2ARqCucaHHq9CTG 86QfqKd7JvTQOADpTmBRfXEH7MGRtV93I5JUoA1v/x5kQb/R8d+6keWOEZ9KLebb0akP Cb6RDbQC98e37hyk8Xsi31eTaEEtMjc4O1bB94wdBqjJjfYWPZlHDZwowCf07g8efrDG UdBQ== MIME-Version: 1.0 X-Received: by 10.43.82.69 with SMTP id ab5mr992946icc.95.1391191096052; Fri, 31 Jan 2014 09:58:16 -0800 (PST) Sender: jdavidlists@gmail.com Received: by 10.42.170.8 with HTTP; Fri, 31 Jan 2014 09:58:15 -0800 (PST) In-Reply-To: <201401310618.s0V6IVJv027167@hergotha.csail.mit.edu> References: <87942875.478893.1391121843834.JavaMail.root@uoguelph.ca> <201401310618.s0V6IVJv027167@hergotha.csail.mit.edu> Date: Fri, 31 Jan 2014 12:58:15 -0500 X-Google-Sender-Auth: RGZ1-iXDuUwPxh2EdH2xGmc6rCo Message-ID: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
From: J David To: Garrett Wollman Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 17:58:16 -0000 On Fri, Jan 31, 2014 at 1:18 AM, wrote: > This is almost entirely wrong in its description of the non-offload > case. Yes, you're quite right; I confused myself. GSO works a little differently, but FreeBSD doesn't use that. > The whole mess is then passed on to the hardware for > offload, if it fits. That's the point, NFS is creating a situation where it never fits. It can't shove 65k into 64k, so it ends up looping back through the whole output routine again for a tiny tail of data, and then the same for the input routine on the other side. Arguably that makes rsize/wsize 65536 negligibly different than rsize/wsize 32768 in the long run because the average data output per pass is about the same (64k + 1k vs 33k + 33k). Except, of course, in the case where almost all files are between 32k and 60k. Please don't get me wrong, I'm not suggesting there's anything more than a small CPU reduction to be obtained by changing this. Which is not nothing if the client is CPU-limited due to the other work it's doing, but it's not much. To get real speedups from NFS would require a change to the punishing read-before-write behavior, which is pretty clearly not going to happen. > RPC responses will only get smushed together if > tcp_output() wasn't able to schedule the transmit immediately, and if > the network is working properly, that will only happen if there's more > than one client-side-receive-window's-worth of data to be transmitted. 
This is something I have seen live in tcpdump, but then I have had so many problems with NFS and congestion control that the "network is working properly" condition probably isn't satisfied. Hopefully the jumbo cluster changes will resolve that once and for all. Thanks! From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 19:41:39 2014 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 74D7FA62 for ; Fri, 31 Jan 2014 19:41:39 +0000 (UTC) Received: from mx1.sbone.de (mx1.sbone.de [IPv6:2a01:4f8:130:3ffc::401:25]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2BC86125C for ; Fri, 31 Jan 2014 19:41:39 +0000 (UTC) Received: from mail.sbone.de (mail.sbone.de [IPv6:fde9:577b:c1a9:31::2013:587]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.sbone.de (Postfix) with ESMTPS id 56D9825D3897 for ; Fri, 31 Jan 2014 19:41:35 +0000 (UTC) Received: from content-filter.sbone.de (content-filter.sbone.de [IPv6:fde9:577b:c1a9:31::2013:2742]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sbone.de (Postfix) with ESMTPS id D4012C22C60 for ; Fri, 31 Jan 2014 19:41:34 +0000 (UTC) X-Virus-Scanned: amavisd-new at sbone.de Received: from mail.sbone.de ([IPv6:fde9:577b:c1a9:31::2013:587]) by content-filter.sbone.de (content-filter.sbone.de [fde9:577b:c1a9:31::2013:2742]) (amavisd-new, port 10024) with ESMTP id DPEHQt2VJF5y for ; Fri, 31 Jan 2014 19:41:33 +0000 (UTC) Received: from nv.sbone.de (nv.sbone.de [IPv6:fde9:577b:c1a9:31::2013:138]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sbone.de (Postfix) with ESMTPSA id 38EC9C22C0D 
for ; Fri, 31 Jan 2014 19:41:33 +0000 (UTC) Date: Fri, 31 Jan 2014 19:41:29 +0000 (UTC) From: "Bjoern A. Zeeb" To: FreeBSD Net Subject: 10.0-R noinet snapshots available Message-ID: X-OpenPGP-Key-Id: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 19:41:39 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, it's been a while but I have produced a new set of noinet snapshots for FreeBSD 10.0-RELEASE. Download i386 or amd64 noinet-snapshot-10.0-RELEASE-r260789 install media from any of the mirrors mentioned on https://wiki.freebsd.org/IPv6Only or learn how to build a noinet system yourself on https://www.freebsd.org/ipv6/ipv6only.html . With the new package system in place you can turn these snapshots into an IPv6-only desktop or server in minutes. Try it out! Make sure your applications, your research project, your web presence, you name it works with IPv6. 
Enjoy, Bjoern Checksums: amd64: SHA256 (FreeBSD-10.0-RELEASE-amd64-bootonly.iso) = 24400220df2a2728ab45b85d9aa48a8bc5cbd3166f39c1d9d5d1ce1f41bb05bc SHA256 (FreeBSD-10.0-RELEASE-amd64-disc1.iso) = 629f041dee6d127ca94c62d6aa41991f25d208dbe19c424484004bdc92bf5150 SHA256 (FreeBSD-10.0-RELEASE-amd64-memstick.img) = bf9b04754dc809d47cad4d4bbb8893a7a29d0c6e2988c27a080f1380547c23e6 SHA256 (ftp/MANIFEST) = 8d1eeb8d12892a69d2402d1294293962bb31b588ab52666490d98ac2b19642ba MD5 (FreeBSD-10.0-RELEASE-amd64-bootonly.iso) = 68f0be479177a698686bf956632140e2 MD5 (FreeBSD-10.0-RELEASE-amd64-disc1.iso) = cd5f5e5575919082317fd378fa4b5105 MD5 (FreeBSD-10.0-RELEASE-amd64-memstick.img) = edbc9b48ebac4e7f8166c91105ffafdb MD5 (ftp/MANIFEST) = 4ee8ba1f71c04caca70e021830bde370 i386: SHA256 (FreeBSD-10.0-RELEASE-i386-bootonly.iso) = e3c81250dd0cdabc78cdd767bc5ee0f3a81e992923aa9aae722336161d67198f SHA256 (FreeBSD-10.0-RELEASE-i386-disc1.iso) = da8bfc78464997baf00b179ce4d307e6fe24aa8c8fc5aec84a680d00b21ac080 SHA256 (FreeBSD-10.0-RELEASE-i386-memstick.img) = bea592a2a87344722cf127ed89462a3a4f7a6a970ca0d89182bd1c9218495846 SHA256 (ftp/MANIFEST) = a17d5fe9b8bb27340d125d3d0aa6ffe68861a769e004f97a159069f10802fbbc MD5 (FreeBSD-10.0-RELEASE-i386-bootonly.iso) = 5a1753c397a5f58811c41f759e68b0e3 MD5 (FreeBSD-10.0-RELEASE-i386-disc1.iso) = 26f3cfbdd5f8f2fd0b182fec624e1abc MD5 (FreeBSD-10.0-RELEASE-i386-memstick.img) = 7781d6e5f0c41225311b7752d80e0ea5 MD5 (ftp/MANIFEST) = 0d7f4fe729870b206b23a7f1c56c5773 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlLr/GkACgkQK1i4+DzPGEJBDACcCHKSonlGKkBu7wJZY7pPk3um 6m4AoKI716/125C7bIr5Y8cDBq5jZB7i =YCT+ -----END PGP SIGNATURE----- From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 23:17:02 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with 
ESMTPS id 701F2D5C; Fri, 31 Jan 2014 23:17:02 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 21E5B1364; Fri, 31 Jan 2014 23:17:01 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEAKwu7FKDaFve/2dsb2JhbABZg0RXgwG6CU+BInSCJQEBAQMBAQEBICsgCwUWGAICDRkCKQEJJgYIBwQBHASHXAgNrAChMBeBKY0BBwEBARo0B4JvgUkEiUmMDoQFkG+DSx4xewkXIg X-IronPort-AV: E=Sophos;i="4.95,760,1384318800"; d="scan'208";a="92176026" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 31 Jan 2014 18:16:45 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 79BA8B3EFE; Fri, 31 Jan 2014 18:16:45 -0500 (EST) Date: Fri, 31 Jan 2014 18:16:45 -0500 (EST) From: Rick Macklem To: J David Message-ID: <1622306213.1079665.1391210205488.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 9.2-RELEASE? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, Garrett Wollman X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 23:17:02 -0000 J David wrote: > On Fri, Jan 31, 2014 at 1:18 AM, wrote: > > This is almost entirely wrong in its description of the non-offload > > case. > > Yes, you're quite right; I confused myself. GSO works a little > differently, but FreeBSD doesn't use that. > > > The whole mess is then passed on to the hardware for > > offload, if it fits. > > That's the point, NFS is creating a situation where it never fits. 
> It > can't shove 65k into 64k, so it ends up looping back through the > whole > output routine again for a tiny tail of data, and then the same for > the input routine on the other side. Arguably that makes rsize/wsize > 65536 negligibly different than rsize/wsize 32768 in the long run > because the average data output per pass is about the same (64k + 1k > vs 33k + 33k). Except, of course, in the case where almost all files > are between 32k and 60k. > You can certainly try "-o rsize=61440,wsize=61440" (assuming a 4K page size) for the mount, if you'd like. There is a bug (that is a 1 line patch I keep forgetting to put in) where, if you choose an rsize,wsize not an exact multiple of PAGE_SIZE, mmap'd files can get garbage from the partially valid pages. However, I'm pretty sure you are safe so long as you specify exact multiples of PAGE_SIZE. The default size is the size recommended by the NFS server, capped at MAXBSIZE. (Btw, Solaris10 recommends 256K and allows 1Mbyte. FreeBSD recommends and allows MAXBSIZE.) I'll admit I'm not convinced that the reduced overheads of using 61440 outweight the fact that the server file systems use blocksizes that are always a power of 2. Without good evidence that using 61440 is better, I wouldn't want the server recommending that. (And I don't know how NFS would know that it is sending on a TSO enabled interface.) rick > Please don't get me wrong, I'm not suggesting there's anything more > than a small CPU reduction to be obtained by changing this. Which is > not nothing if the client is CPU-limited due to the other work it's > doing, but it's not much. To get real speedups from NFS would > require > a change to the punishing read-before-write behavior, which is pretty > clearly not going to happen. 
> > > RPC responses will only get smushed together if > > tcp_output() wasn't able to schedule the transmit immediately, and > > if > > the network is working properly, that will only happen if there's > > more > > than one client-side-receive-window's-worth of data to be > > transmitted. > > This is something I have seen live in tcpdump, but then I have had so > many problems with NFS and congestion control that the "network is > working properly" condition probably isn't satisfied. Hopefully the > jumbo cluster changes will resolve that once and for all. > > Thanks! > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 23:20:58 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B456DBA; Fri, 31 Jan 2014 23:20:58 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 676DD13E2; Fri, 31 Jan 2014 23:20:58 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: X-IronPort-AV: E=Sophos;i="4.95,760,1384318800"; d="scan'208";a="92702741" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 31 Jan 2014 18:20:56 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id A57A0B3F43; Fri, 31 Jan 2014 18:20:56 -0500 (EST) Date: Fri, 31 Jan 2014 18:20:56 -0500 (EST) From: Rick Macklem To: J David Message-ID: <1609454808.1083115.1391210456671.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: Terrible NFS performance under 
9.2-RELEASE? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-net@freebsd.org, Garrett Wollman X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 23:20:58 -0000 J David wrote: > On Fri, Jan 31, 2014 at 1:18 AM, wrote: > > This is almost entirely wrong in its description of the non-offload > > case. > > Yes, you're quite right; I confused myself. GSO works a little > differently, but FreeBSD doesn't use that. > > > The whole mess is then passed on to the hardware for > > offload, if it fits. > > That's the point, NFS is creating a situation where it never fits. > It > can't shove 65k into 64k, so it ends up looping back through the > whole > output routine again for a tiny tail of data, and then the same for > the input routine on the other side. Arguably that makes rsize/wsize > 65536 negligibly different than rsize/wsize 32768 in the long run > because the average data output per pass is about the same (64k + 1k > vs 33k + 33k). Except, of course, in the case where almost all files > are between 32k and 60k. > Oh, and remember to try setting readahead=8 in your mounts, too. NFS will do a read + N readaheads (where N == 1 by default) and then wait for replies to those before continuing on. If the product of rsize * readahead isn't enough data to fill the pipe (bandwidth * transit delay), then you won't be using the bandwidth your network interface provides. rick ps: And you probably want your nfsd threads to be at least 16 instead of the default of 4. > Please don't get me wrong, I'm not suggesting there's anything more > than a small CPU reduction to be obtained by changing this.
Which is > not nothing if the client is CPU-limited due to the other work it's > doing, but it's not much. To get real speedups from NFS would > require > a change to the punishing read-before-write behavior, which is pretty > clearly not going to happen. > > > RPC responses will only get smushed together if > > tcp_output() wasn't able to schedule the transmit immediately, and > > if > > the network is working properly, that will only happen if there's > > more > > than one client-side-receive-window's-worth of data to be > > transmitted. > > This is something I have seen live in tcpdump, but then I have had so > many problems with NFS and congestion control that the "network is > working properly" condition probably isn't satisfied. Hopefully the > jumbo cluster changes will resolve that once and for all. > > Thanks! > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 23:45:12 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9F37086D for ; Fri, 31 Jan 2014 23:45:12 +0000 (UTC) Received: from mail-ee0-x22a.google.com (mail-ee0-x22a.google.com [IPv6:2a00:1450:4013:c00::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3B66F15DC for ; Fri, 31 Jan 2014 23:45:12 +0000 (UTC) Received: by mail-ee0-f42.google.com with SMTP id b15so751833eek.29 for ; Fri, 31 Jan 2014 15:45:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; 
bh=ItOilkexPfeiNLizJhiOTpVezO51thfPcHeAy8iIWDI=; b=YXT0x1D56gYZM2e4kupKHP3SaRNkKBNFireiCdS+DYXjKzlfLaEs3m+19qGJSSBqjn L7bGfL0IknU8V+ljJcRkUNnPw66VBGTIwNhTZMKqeGv3dafAIB8Kgj/MB0m2wAqdPZMM 262t4qMlV1J8pyMF0lJ7AXfjf96cmgHACB+/xzLmroREH7N39grArxa5giARYNzjb4fo RRDT5vpcHoVp4QjYQ7j1UxdS9mD4GjUC34flROaScMXluO6mfQZf8NH5EkzozRbdFmE8 lrIUbFfxedYojdfEcKmO9ztDdKFyMTcGr1GLm2S33+o/Ua4xfEj3tW/ChW03Vw0W8D5g 6dyA== MIME-Version: 1.0 X-Received: by 10.14.126.9 with SMTP id a9mr5850552eei.95.1391211910555; Fri, 31 Jan 2014 15:45:10 -0800 (PST) Received: by 10.14.65.4 with HTTP; Fri, 31 Jan 2014 15:45:10 -0800 (PST) Date: Fri, 31 Jan 2014 15:45:10 -0800 Message-ID: Subject: Errors using span interface on if_bridge(4) From: hiren panchasara To: "freebsd-net@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jan 2014 23:45:12 -0000 Below is my setup: 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r260789:260806M: Thu Jan 23 21:18:08 UTC 2014 (n/w stack is untouched) ix1: flags=8943 metric 0 mtu 1500 options=8400b8 ether 38:ea:a7:8b:af:c4 inet6 fe80::3aea:a7ff:fe8b:afc4%ix1 prefixlen 64 scopeid 0x6 inet 10.73.149.91 netmask 0xffffff00 broadcast 10.73.149.255 nd6 options=29 media: Ethernet autoselect (10Gbase-Twinax ) status: active ix2: flags=8943 metric 0 mtu 1500 options=8400b8 ether 90:e2:ba:30:73:40 inet6 fe80::92e2:baff:fe30:7340%ix2 prefixlen 64 scopeid 0x7 inet 192.168.0.2 netmask 0xffffff00 broadcast 192.168.0.255 nd6 options=29 media: Ethernet autoselect (10Gbase-Twinax ) status: active ix3: flags=8943 metric 0 mtu 1500 options=8400b8 ether 90:e2:ba:30:73:41 inet6 fe80::92e2:baff:fe30:7341%ix3 prefixlen 64 scopeid 0x8 inet 192.168.0.3 netmask 0xffffff00 broadcast 192.168.0.255 nd6 options=29 media: Ethernet autoselect (autoselect ) status: active bridge0: 
flags=8843 metric 0 mtu 1500 ether 02:a1:25:9a:8f:00 nd6 options=9 id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200 root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0 member: ix1 flags=143 ifmaxaddr 0 port 6 priority 128 path cost 2000 member: ix2 flags=8 ifmaxaddr 0 port 7 priority 128 path cost 2000 ix2 and ix3 are connected back to back via a cable so that I can snoop any traffic arriving on bridge0 on to ix3. I have tcpdump going on all 3 interfaces. What I am seeing is interesting when I send data to ix1 via iperf3 (iperf3 -c 10.73.149.91) . I see packets coming to ix1, getting copied to ix2 but on ix3 I only see a few packets making it successfully, for rest I see: 23:30:01.308691 IP bad-hlen 0 23:30:01.308700 IP bad-hlen 0 23:30:01.308711 IP bad-hlen 0 Failure is intermittent. Some packets get through but I see this error for others. Looking at the packet carefully, for all those packets with errors, header length for ipv4 is being reported as 0. Only other indication I could see was: -bash-4.2$ sysctl -a | grep checksum_errs dev.ix.0.mac_stats.checksum_errs: 0 dev.ix.1.mac_stats.checksum_errs: 0 dev.ix.2.mac_stats.checksum_errs: 0 dev.ix.3.mac_stats.checksum_errs: 5686743 I also disabled tso and lro on all of them. Looking at the code: if_bridge.c has bridge_span() which does m_copypacket() to span interface. 2549 mc = m_copypacket(m, M_NOWAIT); 2550 if (mc == NULL) { 2551 sc->sc_ifp->if_oerrors++; 2552 continue; 2553 } 2554 2555 bridge_enqueue(sc, dst_if, mc); Now, I am not sure if its failing at m_copypacket() or after that in bridge_enqueue(). Not sure how do I look at if_oerrors count. Any further help in debugging would be great. 
cheers, Hiren From owner-freebsd-net@FreeBSD.ORG Sat Feb 1 01:20:01 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2975B96A; Sat, 1 Feb 2014 01:20:01 +0000 (UTC) Received: from mail-ie0-x233.google.com (mail-ie0-x233.google.com [IPv6:2607:f8b0:4001:c03::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E2A7A1CA7; Sat, 1 Feb 2014 01:20:00 +0000 (UTC) Received: by mail-ie0-f179.google.com with SMTP id ar20so4977053iec.10 for ; Fri, 31 Jan 2014 17:20:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=S5aYt//r4llUtWOUnBsX/w7f4ApMl+41Eh8RwIVf65s=; b=eHYjYbP/vEfYlRhYyANgSY1Ee9HEvpj8KE0Kfkjlriz0x0nGCX5sD4M5K574/o4BQ3 UC750yAC0JCYvMqBnj2BH0gk3C7fn97XmNVck/amSksH7Tvj6qDNfMDIsT1bJWhiF1KX gFE2WRmnOpDGeZ1X4ZRq+FYmXRmyU3PUBGXzRYK/dXx4Lb0LBLJMVYbNXfl5Ps4JIm66 JfbU08s1M0AUqbOJojMfrIgsdxH9VfnlzvI65k3QftHggL8HMpNfzlLHJ5Bv4JFV1rXc B/lXxte2RHxgLRrLqzq7uvhlC6DAItCTiqrVcKPaXYzWeIrrnjFNjLWKNzTTe91Q/MH5 TfhQ== MIME-Version: 1.0 X-Received: by 10.50.60.105 with SMTP id g9mr1447813igr.14.1391217600360; Fri, 31 Jan 2014 17:20:00 -0800 (PST) Sender: jdavidlists@gmail.com Received: by 10.42.170.8 with HTTP; Fri, 31 Jan 2014 17:20:00 -0800 (PST) In-Reply-To: <1622306213.1079665.1391210205488.JavaMail.root@uoguelph.ca> References: <1622306213.1079665.1391210205488.JavaMail.root@uoguelph.ca> Date: Fri, 31 Jan 2014 20:20:00 -0500 X-Google-Sender-Auth: fACMXLJidVL3k7PeVpegChTlAJY Message-ID: Subject: Re: Terrible NFS performance under 9.2-RELEASE? 
From: J David To: Rick Macklem Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org, Garrett Wollman X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Feb 2014 01:20:01 -0000 On Fri, Jan 31, 2014 at 6:16 PM, Rick Macklem wrote: > You can certainly try "-o rsize=61440,wsize=61440" (assuming a 4K page size) > for the mount, if you'd like. This has previously been tested with all 4k steps between 16k and 32k. All of them perform worse. With 61440, NFS fails outright on the random read test:

$ iozone -e -I -s 1g -r 4k -i 0 -i 2
	Iozone: Performance Test of File I/O
	        Version $Revision: 3.420 $
	Compiled for 64 bit mode.
	Build: freebsd
[...]
	Include fsync in write timing
	O_DIRECT feature enabled
	File size set to 1048576 KB
	Record Size 4 KB
	Command line used: iozone -e -I -s 1g -r 4k -i 0 -i 2
	Output is in Kbytes/sec
	Time Resolution = 0.000005 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                    random    random      bkwd    record    stride
              KB  reclen    write  rewrite    read    reread      read     write      read   rewrite      read    fwrite  frewrite     fread   freread
         1048576       4    24688    23891
Error reading block at 1073729536
read: Bad file descriptor

Upon using the -w option, which leaves the file intact on exit, it's possible to see that it's not even 1gig in length:

$ ls -aln iozone.tmp
-rw-r-----  1 1000  0  1073709056 Feb  1 01:18 iozone.tmp

It's 32k short, which is a pretty surprising result. Thanks!
From owner-freebsd-net@FreeBSD.ORG Sat Feb 1 01:24:00 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E2B95A34 for ; Sat, 1 Feb 2014 01:24:00 +0000 (UTC) Received: from mail-ea0-x229.google.com (mail-ea0-x229.google.com [IPv6:2a00:1450:4013:c01::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7CFE91D14 for ; Sat, 1 Feb 2014 01:24:00 +0000 (UTC) Received: by mail-ea0-f169.google.com with SMTP id h10so2697882eak.28 for ; Fri, 31 Jan 2014 17:23:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=DldUvhk2YYHjl7UFf7K2u2lvsRgxn091VS6CsQw3f0M=; b=l6jdXn+2wjwuNp2EHC+4vVMyvcnDfw1yjMFieqOxhfsD0uOvt93IeSUO46C7RsT9uc DS1udWMaXm4bQ7303SKhajinu6gGvaVhBfCD0mV+5E9XGakFYogTE1ALqKkjKe05fPbq 9pRPGqaUSPA0LRPIJ7f05cFAgITuBzk8nByALAnjr53Bhp30tZgF/8Rwi6JDKAtdfeKx dbmcOR2AoQXgQKmJ14Tk2R81PoOt19RPUSEw/gvuTLHm7ttfBfG7ndEdsbT7qeSCyeNw 6ru+cW3e5+XahqmqkG0m2MynVcWOYJkrLg8qNjuHwKcHSCbx9z4nbllWAIL24NtNIZHH 46bg== MIME-Version: 1.0 X-Received: by 10.14.6.5 with SMTP id 5mr21987956eem.51.1391217838906; Fri, 31 Jan 2014 17:23:58 -0800 (PST) Received: by 10.14.65.4 with HTTP; Fri, 31 Jan 2014 17:23:58 -0800 (PST) In-Reply-To: References: Date: Fri, 31 Jan 2014 17:23:58 -0800 Message-ID: Subject: Re: Errors using span interface on if_bridge(4) From: hiren panchasara To: "freebsd-net@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 
Feb 2014 01:24:00 -0000 On Fri, Jan 31, 2014 at 3:45 PM, hiren panchasara wrote: > Looking at the code: if_bridge.c has bridge_span() which does > m_copypacket() to span interface. > > 2549 mc = m_copypacket(m, M_NOWAIT); > 2550 if (mc == NULL) { > 2551 sc->sc_ifp->if_oerrors++; > 2552 continue; > 2553 } > 2554 > 2555 bridge_enqueue(sc, dst_if, mc); > > Now, I am not sure if its failing at m_copypacket() or after that in > bridge_enqueue(). Not sure how do I look at if_oerrors count. -bash-4.2$ netstat -I ix3 Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll ix3 1500 90:e2:ba:30:73:41 9869468123 0 439521 28167217 0 0 ix3 - fe80::92e2:ba fe80::92e2:baff:f 0 - - 2 - - ix3 - 192.168.0.0 192.168.0.3 0 - - 0 - - (sorry if this doesn't format/line-wrap correctly). Basically Oerrs is 0 here. So I _think_ its failing in/after bridge_enqueue()?? cheers, Hiren From owner-freebsd-net@FreeBSD.ORG Sat Feb 1 01:41:06 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CC1C0E2D; Sat, 1 Feb 2014 01:41:06 +0000 (UTC) Received: from mail-ig0-x229.google.com (mail-ig0-x229.google.com [IPv6:2607:f8b0:4001:c05::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 910071E27; Sat, 1 Feb 2014 01:41:06 +0000 (UTC) Received: by mail-ig0-f169.google.com with SMTP id uq10so2217954igb.0 for ; Fri, 31 Jan 2014 17:41:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=809WyS4AowujPVBOoSqcmfMWqqLhGWjfV5hkWBXl1Zg=; b=H0NuY9jQ63yowaFL2gZKDpBBw0wl2OAyHHRhJhZv1PIn75iQEIlbNHD6yjLj6L0pxt pllsj2XNL3HXdCGItxjOqR1bHvM2sRfnmv8EoJnqnEq3tihDuHsqsnebNMSN8SI1hcq5 
VbG1jwVFUX1n4dvCtIR22S7OithFqNHZQ2yHQbr8pRvUpSHiOf+RWiEoxGGnYEE9/1h7 rCAK1X05pqpFqA/WvnhOffXhfqBuuCTjB2QFkxvCicKQGU1MhP3jr2ewfAtAm1ssHL7v uE0AHSkKon2LS+FJyxFO7Ip+QFYR6bHTzOz8Afu2RP4u+/aBNfS94sIVngi5x/lQVGJI WHdg== MIME-Version: 1.0 X-Received: by 10.42.52.209 with SMTP id k17mr17146362icg.1.1391218866028; Fri, 31 Jan 2014 17:41:06 -0800 (PST) Sender: jdavidlists@gmail.com Received: by 10.42.170.8 with HTTP; Fri, 31 Jan 2014 17:41:05 -0800 (PST) In-Reply-To: <1609454808.1083115.1391210456671.JavaMail.root@uoguelph.ca> References: <1609454808.1083115.1391210456671.JavaMail.root@uoguelph.ca> Date: Fri, 31 Jan 2014 20:41:05 -0500 X-Google-Sender-Auth: B8v5sRIw6xBSVZsi1DwpfAzKP5Y Message-ID: Subject: Re: Terrible NFS performance under 9.2-RELEASE? From: J David To: Rick Macklem Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org, Garrett Wollman X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Feb 2014 01:41:06 -0000 On Fri, Jan 31, 2014 at 6:20 PM, Rick Macklem wrote: > Oh, and remember to try setting readahead=8 in your mounts, too. NFS will > do a read + N readaheads (where N == 1 by default) and then wait for > replies to those before continuing on. Predictably, this has no effect on anything but sequential reads. No tuning is going to change the fact that writing 14MiB/sec from the client to the server results in 200+ MiB/sec of wasted traffic being sent from the server back to the client. This is from the client's interface during a write-only test: Interface Traffic Peak Total vtnet1 in 202.838 MB/s 219.467 MB/s 359.898 GB out 14.127 MB/s 14.346 MB/s 96.503 GB If write performance did get to wire speed on this workload, the most it could ever do would be <128MiB/sec, because the unused backflow of 2GiB/sec would max out the interface. Thanks! 
From owner-freebsd-net@FreeBSD.ORG Sat Feb 1 02:09:27 2014
Message-ID: <52EC573B.109@sentex.net>
Date: Fri, 31 Jan 2014 21:08:59 -0500
From: Mike Tancsa
Organization: Sentex Communications
To: "freebsd-net@freebsd.org"
Subject: missing packets in igb stats?

Hi Jack,

I was testing the forwarding and firewalling speeds of the igb driver on
RELENG_10 and noticed something odd. I have two boxes connected to a
FreeBSD box in the middle:

FreeBSD-A(em1)-----------(igb1)Router-1(igb0)----------(em1)FreeBSD-B

Box A generates packets as fast as it can to FreeBSD box B's em1 NIC.
Router-1 is a FreeBSD box running RELENG_10.
Watching ifstat on Router-1 as I execute the command on FreeBSD-A:

# ./netblast 1.1.1.2 500 100 20

start:             1391219372.477992294
finish:            1391219392.496952108
send calls:        10877557
send errors:       0
approx send rate:  543877
approx error rate: 0

I see this on the Router-1 box:

         igb0                  igb1
  Kbps in  Kbps out    Kbps in  Kbps out
     0.00      0.00       0.00      0.00
  1600.61  191639.1   280888.7      0.00
  3669.84  434348.6   636134.9      0.00
  3706.56  438636.7   596650.5      0.00
  3755.10  444358.9   562814.3      0.00
  3714.89  439478.5   562056.4      0.00
  3796.79  449397.9   562042.9      0.00
  3786.02  447957.2   577561.4      0.00
  3629.18  429453.4   601285.7      0.00
  3728.48  441312.7   597785.3      0.00
  3806.67  450401.2   596247.0      0.00
  3854.79  456150.2   597865.7      0.00
  3690.11  436552.1   596695.8      0.00
  3676.08  435002.6   596462.8      0.00
  3730.35  441535.2   597132.1      0.00
  3680.43  435518.2   596960.3      0.00
  3741.41  442685.3   597750.8      0.00
  3691.93  436870.6   596236.9      0.00
  3627.31  429120.5   594116.8      0.00
  3661.97  433492.7   595812.0      0.00
  3693.86  437169.0   597826.9      0.00
  2046.18  240635.3   331656.3      0.00
     0.00      0.00       0.00      0.00
     0.00      0.00       0.00      0.00

Notice that the rate of traffic coming in on igb1 is higher than what is
going out on igb0. Box A thinks it sent traffic at some 536,616 packets
per second, or 590 Mb/s. However, the traffic going out is slower, and
what is seen at box B is less still: it sees the traffic at 286 Mb/s and
357,873 pps.

Given the lost packets, should this not show up somewhere in the igb
statistics?
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0 dev.igb.0.%driver: igb dev.igb.0.%location: slot=0 function=0 handle=\_SB_.PCI0.PEG0.PEGP dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10c9 subvendor=0x8086 subdevice=0xa03c class=0x020000 dev.igb.0.%parent: pci1 dev.igb.0.nvm: -1 dev.igb.0.enable_aim: 1 dev.igb.0.fc: 3 dev.igb.0.rx_processing_limit: 100 dev.igb.0.link_irq: 2 dev.igb.0.dropped: 0 dev.igb.0.tx_dma_fail: 0 dev.igb.0.rx_overruns: 0 dev.igb.0.watchdog_timeouts: 0 dev.igb.0.device_control: 1488978497 dev.igb.0.rx_control: 67141634 dev.igb.0.interrupt_mask: 4 dev.igb.0.extended_int_mask: 2147483655 dev.igb.0.tx_buf_alloc: 0 dev.igb.0.rx_buf_alloc: 0 dev.igb.0.fc_high_water: 58976 dev.igb.0.fc_low_water: 58960 dev.igb.0.queue0.no_desc_avail: 19682298 dev.igb.0.queue0.tx_packets: 20962740 dev.igb.0.queue0.rx_packets: 1101622 dev.igb.0.queue0.rx_bytes: 66097424 dev.igb.0.queue0.lro_queued: 0 dev.igb.0.queue0.lro_flushed: 0 dev.igb.0.queue1.no_desc_avail: 32582207 dev.igb.0.queue1.tx_packets: 50082567 dev.igb.0.queue1.rx_packets: 6598 dev.igb.0.queue1.rx_bytes: 462728 dev.igb.0.queue1.lro_queued: 0 dev.igb.0.queue1.lro_flushed: 0 dev.igb.0.mac_stats.excess_coll: 0 dev.igb.0.mac_stats.single_coll: 0 dev.igb.0.mac_stats.multiple_coll: 0 dev.igb.0.mac_stats.late_coll: 0 dev.igb.0.mac_stats.collision_count: 0 dev.igb.0.mac_stats.symbol_errors: 0 dev.igb.0.mac_stats.sequence_errors: 0 dev.igb.0.mac_stats.defer_count: 138912 dev.igb.0.mac_stats.missed_packets: 0 dev.igb.0.mac_stats.recv_no_buff: 0 dev.igb.0.mac_stats.recv_undersize: 0 dev.igb.0.mac_stats.recv_fragmented: 0 dev.igb.0.mac_stats.recv_oversize: 0 dev.igb.0.mac_stats.recv_jabber: 0 dev.igb.0.mac_stats.recv_errs: 0 dev.igb.0.mac_stats.crc_errs: 0 dev.igb.0.mac_stats.alignment_errs: 0 dev.igb.0.mac_stats.coll_ext_errs: 0 dev.igb.0.mac_stats.xon_recvd: 550808 dev.igb.0.mac_stats.xon_txd: 0 dev.igb.0.mac_stats.xoff_recvd: 550808 dev.igb.0.mac_stats.xoff_txd: 0 
dev.igb.0.mac_stats.total_pkts_recvd: 1108220 dev.igb.0.mac_stats.good_pkts_recvd: 6604 dev.igb.0.mac_stats.bcast_pkts_recvd: 0 dev.igb.0.mac_stats.mcast_pkts_recvd: 0 dev.igb.0.mac_stats.rx_frames_64: 1 dev.igb.0.mac_stats.rx_frames_65_127: 6603 dev.igb.0.mac_stats.rx_frames_128_255: 0 dev.igb.0.mac_stats.rx_frames_256_511: 0 dev.igb.0.mac_stats.rx_frames_512_1023: 0 dev.igb.0.mac_stats.rx_frames_1024_1522: 0 dev.igb.0.mac_stats.good_octets_recvd: 489608 dev.igb.0.mac_stats.good_octets_txd: 10120060648 dev.igb.0.mac_stats.total_pkts_txd: 71045307 dev.igb.0.mac_stats.good_pkts_txd: 71045307 dev.igb.0.mac_stats.bcast_pkts_txd: 2 dev.igb.0.mac_stats.mcast_pkts_txd: 0 dev.igb.0.mac_stats.tx_frames_64: 2 dev.igb.0.mac_stats.tx_frames_65_127: 5051081 dev.igb.0.mac_stats.tx_frames_128_255: 65994224 dev.igb.0.mac_stats.tx_frames_256_511: 0 dev.igb.0.mac_stats.tx_frames_512_1023: 0 dev.igb.0.mac_stats.tx_frames_1024_1522: 0 dev.igb.0.mac_stats.tso_txd: 0 dev.igb.0.mac_stats.tso_ctx_fail: 0 dev.igb.0.interrupts.asserts: 6564060 dev.igb.0.interrupts.rx_pkt_timer: 1108207 dev.igb.0.interrupts.rx_abs_timer: 0 dev.igb.0.interrupts.tx_pkt_timer: 0 dev.igb.0.interrupts.tx_abs_timer: 1108220 dev.igb.0.interrupts.tx_queue_empty: 71044772 dev.igb.0.interrupts.tx_queue_min_thresh: 0 dev.igb.0.interrupts.rx_desc_min_thresh: 0 dev.igb.0.interrupts.rx_overrun: 0 dev.igb.0.host.breaker_tx_pkt: 0 dev.igb.0.host.host_tx_pkt_discard: 0 dev.igb.0.host.rx_pkt: 13 dev.igb.0.host.breaker_rx_pkts: 0 dev.igb.0.host.breaker_rx_pkt_drop: 0 dev.igb.0.host.tx_good_pkt: 535 dev.igb.0.host.breaker_tx_pkt_drop: 0 dev.igb.0.host.rx_good_bytes: 70993032 dev.igb.0.host.tx_good_bytes: 10120060648 dev.igb.0.host.length_errors: 0 dev.igb.0.host.serdes_violation_pkt: 0 dev.igb.0.host.header_redir_missed: 0 dev.igb.0.wake: 0 dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0 dev.igb.1.%driver: igb dev.igb.1.%location: slot=0 function=1 dev.igb.1.%pnpinfo: vendor=0x8086 device=0x10c9 
subvendor=0x8086 subdevice=0xa03c class=0x020000 dev.igb.1.%parent: pci1 dev.igb.1.nvm: -1 dev.igb.1.enable_aim: 1 dev.igb.1.fc: 3 dev.igb.1.rx_processing_limit: 100 dev.igb.1.link_irq: 2 dev.igb.1.dropped: 0 dev.igb.1.tx_dma_fail: 0 dev.igb.1.rx_overruns: 0 dev.igb.1.watchdog_timeouts: 0 dev.igb.1.device_control: 1488978497 dev.igb.1.rx_control: 67141634 dev.igb.1.interrupt_mask: 4 dev.igb.1.extended_int_mask: 2147483655 dev.igb.1.tx_buf_alloc: 0 dev.igb.1.rx_buf_alloc: 0 dev.igb.1.fc_high_water: 58976 dev.igb.1.fc_low_water: 58960 dev.igb.1.queue0.no_desc_avail: 0 dev.igb.1.queue0.tx_packets: 14 dev.igb.1.queue0.rx_packets: 27770289 dev.igb.1.queue0.rx_bytes: 3632804418 dev.igb.1.queue0.lro_queued: 0 dev.igb.1.queue0.lro_flushed: 0 dev.igb.1.queue1.no_desc_avail: 0 dev.igb.1.queue1.tx_packets: 6599 dev.igb.1.queue1.rx_packets: 58098597 dev.igb.1.queue1.rx_bytes: 8250006086 dev.igb.1.queue1.lro_queued: 0 dev.igb.1.queue1.lro_flushed: 0 dev.igb.1.mac_stats.excess_coll: 0 dev.igb.1.mac_stats.single_coll: 0 dev.igb.1.mac_stats.multiple_coll: 0 dev.igb.1.mac_stats.late_coll: 0 dev.igb.1.mac_stats.collision_count: 0 dev.igb.1.mac_stats.symbol_errors: 0 dev.igb.1.mac_stats.sequence_errors: 0 dev.igb.1.mac_stats.defer_count: 0 dev.igb.1.mac_stats.missed_packets: 0 dev.igb.1.mac_stats.recv_no_buff: 0 dev.igb.1.mac_stats.recv_undersize: 0 dev.igb.1.mac_stats.recv_fragmented: 0 dev.igb.1.mac_stats.recv_oversize: 0 dev.igb.1.mac_stats.recv_jabber: 0 dev.igb.1.mac_stats.recv_errs: 0 dev.igb.1.mac_stats.crc_errs: 0 dev.igb.1.mac_stats.alignment_errs: 0 dev.igb.1.mac_stats.coll_ext_errs: 0 dev.igb.1.mac_stats.xon_recvd: 0 dev.igb.1.mac_stats.xon_txd: 0 dev.igb.1.mac_stats.xoff_recvd: 0 dev.igb.1.mac_stats.xoff_txd: 0 dev.igb.1.mac_stats.total_pkts_recvd: 85868886 dev.igb.1.mac_stats.good_pkts_recvd: 85868886 dev.igb.1.mac_stats.bcast_pkts_recvd: 31 dev.igb.1.mac_stats.mcast_pkts_recvd: 0 dev.igb.1.mac_stats.rx_frames_64: 5 dev.igb.1.mac_stats.rx_frames_65_127: 6211527 
dev.igb.1.mac_stats.rx_frames_128_255: 79657327 dev.igb.1.mac_stats.rx_frames_256_511: 27 dev.igb.1.mac_stats.rx_frames_512_1023: 0 dev.igb.1.mac_stats.rx_frames_1024_1522: 0 dev.igb.1.mac_stats.good_octets_recvd: 12226286048 dev.igb.1.mac_stats.good_octets_txd: 490260 dev.igb.1.mac_stats.total_pkts_txd: 6613 dev.igb.1.mac_stats.good_pkts_txd: 6613 dev.igb.1.mac_stats.bcast_pkts_txd: 4 dev.igb.1.mac_stats.mcast_pkts_txd: 0 dev.igb.1.mac_stats.tx_frames_64: 8 dev.igb.1.mac_stats.tx_frames_65_127: 6605 dev.igb.1.mac_stats.tx_frames_128_255: 0 dev.igb.1.mac_stats.tx_frames_256_511: 0 dev.igb.1.mac_stats.tx_frames_512_1023: 0 dev.igb.1.mac_stats.tx_frames_1024_1522: 0 dev.igb.1.mac_stats.tso_txd: 0 dev.igb.1.mac_stats.tso_ctx_fail: 0 dev.igb.1.interrupts.asserts: 8707927 dev.igb.1.interrupts.rx_pkt_timer: 85867976 dev.igb.1.interrupts.rx_abs_timer: 0 dev.igb.1.interrupts.tx_pkt_timer: 0 dev.igb.1.interrupts.tx_abs_timer: 85868886 dev.igb.1.interrupts.tx_queue_empty: 6613 dev.igb.1.interrupts.tx_queue_min_thresh: 0 dev.igb.1.interrupts.rx_desc_min_thresh: 0 dev.igb.1.interrupts.rx_overrun: 0 dev.igb.1.host.breaker_tx_pkt: 0 dev.igb.1.host.host_tx_pkt_discard: 0 dev.igb.1.host.rx_pkt: 910 dev.igb.1.host.breaker_rx_pkts: 0 dev.igb.1.host.breaker_rx_pkt_drop: 0 dev.igb.1.host.tx_good_pkt: 0 dev.igb.1.host.breaker_tx_pkt_drop: 0 dev.igb.1.host.rx_good_bytes: 12226288092 dev.igb.1.host.tx_good_bytes: 490260 dev.igb.1.host.length_errors: 0 dev.igb.1.host.serdes_violation_pkt: 0 dev.igb.1.host.header_redir_missed: 0 Motherboard is Intel Base Board Information Manufacturer: Intel Corporation Product Name: DH87RL Version: AAG74240-401 Serial Number: BQRL330000Q9 NIC is dual port igb0@pci0:1:0:0: class=0x020000 card=0xa03c8086 chip=0x10c98086 rev=0x01 hdr=0x00 vendor = 'Intel Corporation' device = '82576 Gigabit Network Connection' class = network subclass = ethernet bar [10] = type Memory, range 32, base 0xf7c20000, size 131072, enabled bar [14] = type Memory, range 32, base 
0xf7800000, size 4194304, enabled bar [18] = type I/O Port, range 32, base 0xe020, size 32, enabled bar [1c] = type Memory, range 32, base 0xf7c44000, size 16384, enabled cap 01[40] = powerspec 3 supports D0 D3 current D0 cap 05[50] = MSI supports 1 message, 64 bit, vector masks cap 11[70] = MSI-X supports 10 messages, enabled Table in map 0x1c[0x0], PBA in map 0x1c[0x2000] cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR link x4(x4) speed 2.5(2.5) ASPM disabled(L0s/L1) ecap 0001[100] = AER 1 0 fatal 0 non-fatal 2 corrected ecap 0003[140] = Serial 1 90e2baffff5eb48a ecap 000e[150] = ARI 1 ecap 0010[160] = SRIOV 1 igb1@pci0:1:0:1: class=0x020000 card=0xa03c8086 chip=0x10c98086 rev=0x01 hdr=0x00 vendor = 'Intel Corporation' device = '82576 Gigabit Network Connection' class = network subclass = ethernet bar [10] = type Memory, range 32, base 0xf7c00000, size 131072, enabled bar [14] = type Memory, range 32, base 0xf7000000, size 4194304, enabled bar [18] = type I/O Port, range 32, base 0xe000, size 32, enabled bar [1c] = type Memory, range 32, base 0xf7c40000, size 16384, enabled cap 01[40] = powerspec 3 supports D0 D3 current D0 cap 05[50] = MSI supports 1 message, 64 bit, vector masks cap 11[70] = MSI-X supports 10 messages, enabled Table in map 0x1c[0x0], PBA in map 0x1c[0x2000] cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR link x4(x4) speed 2.5(2.5) ASPM disabled(L0s/L1) ecap 0001[100] = AER 1 0 fatal 0 non-fatal 2 corrected ecap 0003[140] = Serial 1 90e2baffff5eb48a ecap 000e[150] = ARI 1 ecap 0010[160] = SRIOV 1 root@intel4gen-9:/usr/home/mdtancsa # netstat -m 6141/6489/12630 mbufs in use (current/cache/total) 6139/5871/12010/487416 mbuf clusters in use (current/cache/total/max) 6139/5861 mbuf+clusters out of packet secondary zone in use (current/cache) 0/5/5/243708 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/72209 9k jumbo clusters in use (current/cache/total/max) 0/0/0/40618 16k jumbo clusters in use 
(current/cache/total/max)
13813K/13384K/27197K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
root@intel4gen-9:/usr/home/mdtancsa #

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

From owner-freebsd-net@FreeBSD.ORG Sat Feb 1 16:05:59 2014
Date: Sat, 1 Feb 2014 11:05:52 -0500 (EST)
From: Rick Macklem
To: J David
Cc: freebsd-net@freebsd.org, Garrett Wollman
Message-ID: <1966386250.1241234.1391270752429.JavaMail.root@uoguelph.ca>
Subject: Re: Terrible NFS performance under 9.2-RELEASE?

J David wrote:
> On Fri, Jan 31, 2014 at 6:16 PM, Rick Macklem wrote:
> > You can certainly try "-o rsize=61440,wsize=61440" (assuming a 4K
> > page size) for the mount, if you'd like.
>
> This has previously been tested with all 4k steps between 16k and 32k.
> All of them perform worse than
>
> With 61440, NFS fails outright on the random read test:
>
> $ iozone -e -I -s 1g -r 4k -i 0 -i 2
>
> Iozone: Performance Test of File I/O
>         Version $Revision: 3.420 $
>         Compiled for 64 bit mode.
>         Build: freebsd
> [...]
>         Include fsync in write timing
>         O_DIRECT feature enabled
>         File size set to 1048576 KB
>         Record Size 4 KB
>         Command line used: iozone -e -I -s 1g -r 4k -i 0 -i 2
>         Output is in Kbytes/sec
>         Time Resolution = 0.000005 seconds.
>         Processor cache size set to 1024 Kbytes.
>         Processor cache line size set to 32 bytes.
>         File stride size set to 17 * record size.
>
>                                            random  random  bkwd  record  stride
>            KB  reclen  write  rewrite  read  reread  read  write  read  rewrite  read  fwrite  frewrite  fread  freread
>       1048576       4  24688    23891
>
> Error reading block at 1073729536
> read: Bad file descriptor
>
> Upon using the -w option, which leaves the file intact on exit, it's
> possible to see that it's not even 1 GiB in length:
>
> $ ls -aln iozone.tmp
> -rw-r-----  1 1000  0  1073709056 Feb  1 01:18 iozone.tmp
>
> It's 32k short, which is a pretty surprising result.

Ok, I knew that non-powers of 2 could result in problems.
I thought they only occurred when the size wasn't an exact multiple of
the page size, but it seems there are also non-power-of-2 problems.

rick

From owner-freebsd-net@FreeBSD.ORG Sat Feb 1 18:53:31 2014
Date: Sat, 1 Feb 2014 13:53:30 -0500
From: J David
To: Rick Macklem
Cc: freebsd-net@freebsd.org, Garrett Wollman
Subject: Re: Terrible NFS performance under 9.2-RELEASE?

On Sat, Feb 1, 2014 at 11:05 AM, Rick Macklem wrote:
> Ok, I knew that non-powers of 2 could result in problems. I thought
> they only occurred when the size wasn't an exact multiple of page
> size, but it seems there are non-power of 2 problems.

What can I do to help identify and resolve these problems?

Thanks!