Date:      Thu, 23 Jan 2014 21:27:17 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID:  <390483613.15499210.1390530437153.JavaMail.root@uoguelph.ca>
In-Reply-To: <CABXB=RToav++V38pOorVPWpgZSuYmL-x7e8oxd3ayJCmAtLn-g@mail.gmail.com>

J David wrote:
> On Wed, Jan 22, 2014 at 11:12 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
> > So, do you consider the 32K results as reasonable or terrible
> > performance?
> > (They are obviously much better than 64K, except for the reread
> > case.)
> 
> It's the 64k numbers that prompted the "terrible" thread title.  A
> 196K/sec write speed on a 10+ Gbit network is pretty disastrous.
> 
> The 32k numbers are, as you say, better.  Possibly reasonable, but
> I'm not sure if they're optimal.  It's hard to tell the latency of
> the virtual network, which would be needed to make that
> determination.  It would be best if FreeBSD out of the box blows the
> doors off of Debian out of the box, and FreeBSD tuned to the gills
> blows the doors off of Debian tuned to the gills.  Right now, Debian
> seems to be the one with the edge, and with FreeBSD's illustrious
> history as the NFS performance king for so many years, that just
> won't do. :)
> 
> > Btw, I don't think you've mentioned what network device driver gets
> > used
> > for this virtual environment. That might be useful, in case the
> > maintainer
> > of that driver is aware of some issue/patch.
> 
> KVM uses virtio.
> 
> >> 00:38:07.932732 IP (tos 0x0, ttl 64, id 38912, offset 0, flags
> >> [DF],
> >> proto TCP (6), length 53628)
> >>
> > I don't know why this would be so large. A 32K write should be
> > under
> > 33Kbytes in size, not 53Kbytes. I suspect tcpdump is confused?
> 
> Since TCP is stream oriented, is there a reason to expect 1:1
> correlation between NFS writes and TCP packets?
> 
Well, my TCP is pretty rusty, but...
Since your stats didn't show any jumbo frames, each IP
datagram needs to fit in the 1500-byte MTU. NFS hands TCP an mbuf
list of just over 64K (or 32K) in a single sosend(), and TCP
then generates about 45 TCP segments (about 23 for 32K), puts
each in an IP datagram, and hands them to the network device driver
for transmission. (Wireshark figures this out and shows you the 45
TCP/IP packets plus a summary of the NFS RPC message they make up.
tcpdump doesn't know how to do this. At least not any version
I've used.)

So, in summary, no, unless you use a very small 1Kbyte rsize/wsize.
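The arithmetic above can be sketched in a few lines of Python (a sketch only: it assumes a 1500-byte MTU with 20-byte IPv4 and 20-byte TCP headers, so an MSS of 1460 bytes, and it ignores TCP options and the small RPC record-mark/header overhead):

```python
import math

MTU = 1500                     # no jumbo frames on the wire
IP_HDR = 20                    # IPv4 header, no options
TCP_HDR = 20                   # TCP header, no options
MSS = MTU - IP_HDR - TCP_HDR   # 1460 payload bytes per segment

def segments(rpc_bytes):
    """TCP segments needed to carry one NFS RPC message of rpc_bytes."""
    return math.ceil(rpc_bytes / MSS)

print(segments(64 * 1024))   # 45 segments for a 64K write RPC
print(segments(32 * 1024))   # 23 segments for a 32K write RPC
print(segments(4 * 1024))    # 3 segments for a 4K write RPC
```

This matches the counts mentioned above: about 45 segments per 64K write, about 23 per 32K write, and 3 per 4K write.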

> > Well, it seems Debian is doing 4096 byte writes, which won't have
> > anywhere
> > near the effect on the network driver/virtual hardware that a 64K
> > (about
> > 45 IP datagrams) in one NFS RPC will.
> 
> Debian's kernel says it is doing 64k reads/writes on that mount.  So
> again, possibly an expectation of 1:1 correlation between NFS writes
> and TCP packets is not being satisfied.
> 
The tcpdump you posted was showing 4Kbyte NFS writes, not 4K TCP/IP
datagrams. As above, each 4K NFS RPC message becomes 3 TCP/IP packets.
tcpdump did succeed in figuring this out, unlike with the large writes.
(Look for the entries with "filesync" mentioned in them, in the tcpdump
 output you posted.)

> However, iozone is doing 4k reads/writes for these tests, so it's
> also
> possible that Debian is not coalescing them at all (which FreeBSD
> apparently is) and the 4k writes are hitting the virtual wire as-is.
> 
Yep, or the Linux client might be writing a page at a time.

> Also, both sides have TSO and LRO, so it would be surprising (and
> incorrect?) behavior if a 64k packet were actually fragmented into 45
> IP datagrams.  Although if something is happening to temporarily
> provoke exactly that behavior, it might explain the 1500 byte
> packets,
> so that's definitely a lead.  Maybe it would be possible for me to
> peek at the stream from various different points and establish who is
> doing the fragmenting.
> 
The stuff you posted didn't list any jumbo frames, so 1500-byte TCP/IP
datagrams must be what TCP generates and sends via the network device
driver.
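One way to confirm that on the FreeBSD side is to check the interface MTU and watch the per-second packet counts while a test runs (a sketch only; vtnet0 is an assumed name for the KVM virtio interface, and these are plain FreeBSD ifconfig/netstat invocations):

```shell
# Show the interface flags and MTU; "mtu 1500" means no jumbo frames.
ifconfig vtnet0 | head -1

# Per-second packet/byte counters for the interface, to see the
# bursts of ~45 segments that each 64K write RPC produces.
netstat -I vtnet0 -w 1
```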

> It could be that if Debian is basically disregarding the 64k setting
> and using only 4k packets, it's simply not hitting whatever
> large-packet bad behavior that is harming FreeBSD.  However it also
> performs better in the server role, with the client requesting the
> larger packets.  So that's not definitive.
> 
A little nit here: it isn't one large packet, it is a burst of 1500-byte
packets resulting from the send of a large NFS RPC message.

> > Yea, looking at this case in wireshark might make what is going on
> > apparent.
> 
> Possibly, but that would likely have to be done by someone with more
> NFS protocol familiarity than I.
> 
Well, wireshark is pretty good at pointing out stuff like retransmits,
which are mostly what you are looking for. (And you can easily scan the timestamps
column and look for large delays.) It also reports relative sequence
numbers, so you don't have to do the math in your head.
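If you'd rather do the first pass from the command line, tshark (wireshark's CLI front end) can apply the same analysis filters (a sketch; trace.pcap is an assumed capture file name):

```shell
# List only the packets wireshark flags as retransmissions.
tshark -r trace.pcap -Y tcp.analysis.retransmission

# Or list everything wireshark's TCP analysis considers suspicious
# (retransmits, duplicate ACKs, zero windows, out-of-order, etc.).
tshark -r trace.pcap -Y tcp.analysis.flags
```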

> Also, the incorrect checksums on outbound packets are normal because
> the interface supports checksum offloading.  The checksum simply
> hasn't been calculated yet when tcpdump sees it.
> 
Ok, that makes sense. I never use TSO or checksum offload, since I've
seen them broken too many times (but that doesn't mean they're broken
in this case). I recall you saying you tried turning off TSO with no
effect. You might also try turning off checksum offload. I doubt it will
be where things are broken, but it might be worth a try.
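For reference, turning those offloads off is a one-liner (a sketch; vtnet0 is an assumed interface name, and the flags are the standard FreeBSD ifconfig ones):

```shell
# Disable TSO plus transmit and receive checksum offload.
ifconfig vtnet0 -tso -txcsum -rxcsum

# To make it survive a reboot, add the same flags to the
# ifconfig_vtnet0 line in /etc/rc.conf.
```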

Again, if you take the packet trace for a FreeBSD 64K test and put it
in wireshark, you might be able to see how things are broken. (wireshark
is your friend, believe me on this one;-)

rick

> Thanks!
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
> 


