Date:      Mon, 27 Jan 2014 14:50:47 +0900
From:      Yonghyeon PYUN <pyunyh@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        freebsd-net@freebsd.org, Adam McDougall <mcdouga9@egr.msu.edu>
Subject:   Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID:  <20140127055047.GA1368@michelle.cdnetworks.com>
In-Reply-To: <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca>
References:  <52DC1241.7010004@egr.msu.edu> <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca>

On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote:
> Adam McDougall wrote:
> > Also try rsize=32768,wsize=32768 in your mount options; it made a
> > huge difference for me.  I'd noticed slow file transfers on NFS in 9
> > and finally did some searching a couple of months ago; someone
> > suggested this and they were on to something.
> > 
> I have a "hunch" that might explain why 64K NFS reads/writes perform
> poorly in some network environments.
> A 64K NFS read reply/write request consists of a list of 34 mbufs
> when passed to TCP via sosend(), with a total data length of around
> 65680 bytes.
> Looking at a couple of drivers (virtio and ixgbe), they seem to expect
> no more than 32-33 mbufs in a list for a 65535 byte TSO xmit.  I think
> (I don't have anything that does TSO to confirm this) that NFS will
> pass a list that is longer (34 plus a TCP/IP header).
> At a glance, it appears that the drivers call m_defrag() or
> m_collapse() when the mbuf list won't fit in their scatter table (32
> or 33 elements), and if that fails, they just silently drop the data
> without sending it.
> If I'm right, there would be considerable overhead from
> m_defrag()/m_collapse(), and near disaster if they fail to fix the
> problem and the data is silently dropped instead of transmitted.
> 

I think the actual number of DMA segments allocated for the mbuf
chain is determined by bus_dma(9).  bus_dma(9) will coalesce the
current segment with the previous one if possible.
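
For illustration, here is a minimal sketch of that mapping step
(the segment limit and the softc/map names are made up, not taken
from ixgbe(4) or any other driver):

#define NSEG	32	/* whatever the hardware scatter table holds */

	bus_dma_segment_t segs[NSEG];
	int error, nsegs;

	error = bus_dmamap_load_mbuf_sg(sc->tx_tag, txbuf->map, m,
	    segs, &nsegs, BUS_DMA_NOWAIT);
	/*
	 * nsegs can come back smaller than the mbuf count of the
	 * chain because bus_dma(9) merges physically adjacent
	 * buffers; EFBIG means the chain still needs more than
	 * NSEG segments and has to be collapsed or dropped.
	 */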

I'm not sure whether you're referring to ixgbe(4) or ix(4), but I
see that the total length of all segments for ix(4) is limited to
65535 bytes, so there is no room for the ethernet/VLAN header of
the mbuf chain.  The driver should be fixed so it can transmit a
full 64KB TSO datagram plus the link-level header.
I also think the use of m_defrag(9) in the TSO path is
suboptimal.  All TSO-capable controllers can handle multiple TX
buffers, so the driver should use m_collapse(9) rather than
copying the entire chain with m_defrag(9).
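
Something like this EFBIG path is what I have in mind (again just
a sketch with made-up names, not a patch against any driver):

	error = bus_dmamap_load_mbuf_sg(sc->tx_tag, txbuf->map,
	    *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
	if (error == EFBIG) {
		/*
		 * m_collapse(9) merges mbufs only until the chain
		 * fits in NSEG segments; m_defrag(9) would copy
		 * the entire 64KB chain into new clusters.
		 */
		struct mbuf *m = m_collapse(*m_head, M_NOWAIT, NSEG);
		if (m == NULL) {
			m_freem(*m_head);
			*m_head = NULL;
			return (ENOBUFS);  /* report it, don't drop silently */
		}
		*m_head = m;
		error = bus_dmamap_load_mbuf_sg(sc->tx_tag, txbuf->map,
		    *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
	}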

> Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE clusters,
> so the mbuf count drops from 34 to 18.
> 

Could we make the switch to MJUMPAGESIZE clusters conditional on the
I/O size?
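
For example (just a sketch of what I mean, not Rick's actual
patch; "siz" here stands for the request length):

	struct mbuf *m;

	/*
	 * Use 4K (MJUMPAGESIZE) clusters only when the data does
	 * not fit in a standard 2K (MCLBYTES) cluster.  A 64K
	 * reply then takes 16 data mbufs instead of 32.
	 */
	if (siz > MCLBYTES)
		m = m_getjcl(M_WAITOK, MT_DATA, 0, MJUMPAGESIZE);
	else
		m = m_getcl(M_WAITOK, MT_DATA, 0);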

> If anyone has a TSO scatter/gather capable net interface and can test
> this patch on it with NFS I/O (default 64K rsize/wsize) with TSO
> enabled to see what effect it has, that would be appreciated.
> 
> Btw, thanks go to Garrett Wollman for suggesting the change to MJUMPAGESIZE
> clusters.
> 
> rick
> ps: If the attachment doesn't make it through and you want the patch, just
>     email me and I'll send you a copy.
> 


