Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 12 Jul 2014 15:05:38 +0900
From:      Yonghyeon PYUN <pyunyh@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        "Russell L. Carter" <rcarter@pinyon.org>, freebsd-net@freebsd.org, Rick Macklem <rmacklem@uoguelph.ca>
Subject:   Re: NFS client READ performance on -current
Message-ID:  <20140712060538.GA3649@michelle.fasterthan.com>
In-Reply-To: <201407110954.23381.jhb@freebsd.org>
References:  <1610703198.9975909.1405031503143.JavaMail.root@uoguelph.ca> <201407110954.23381.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Jul 11, 2014 at 09:54:23AM -0400, John Baldwin wrote:
> On Thursday, July 10, 2014 6:31:43 pm Rick Macklem wrote:
> > John Baldwin wrote:
> > > On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote:
> > > > Russell L. Carter wrote:
> > > > > 
> > > > > 
> > > > > On 07/02/14 19:09, Rick Macklem wrote:
> > > > > 
> > > > > > Could you please post the dmesg stuff for the network
> > > > > > interface,
> > > > > > so I can tell what driver is being used? I'll take a look at
> > > > > > it,
> > > > > > in case it needs to be changed to use m_defrag().
> > > > > 
> > > > > em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port
> > > > > 0xd020-0xd03f
> > > > > mem
> > > > > 0xfe4a0000-0xfe4bffff,0xfe480000-0xfe49ffff irq 44 at device 0.0
> > > > > on
> > > > > pci2
> > > > > em0: Using an MSI interrupt
> > > > > em0: Ethernet address: 00:15:17:bc:29:ba
> > > > > 001.000007 [2323] netmap_attach             success for em0 tx
> > > > > 1/1024
> > > > > rx
> > > > > 1/1024 queues/slots
> > > > > 
> > > > > This is one of those dual nic cards, so there is em1 as well...
> > > > > 
> > > > Well, I took a quick look at the driver and it does use m_defrag(),
> > > > but
> > > > I think that the "retry:" label it does a goto after doing so might
> > > > be in
> > > > the wrong place.
> > > > 
> > > > The attached untested patch might fix this.
> > > > 
> > > > Is it convenient to build a kernel with this patch applied and then
> > > > try
> > > > it with TSO enabled?
> > > > 
> > > > rick
> > > > ps: It does have the transmit segment limit set to 32. I have no
> > > > idea if
> > > >     this is a hardware limitation.
> > > 
> > > I think the retry is not in the wrong place, but the overhead of all
> > > those
> > > pullups is apparently quite severe.
> > The m_defrag() call after the first failure will just barely squeeze
> > the just under 64K TSO segment into 32 mbuf clusters. Then I think any
> > m_pullup() done during the retry will allocate an mbuf
> > (at a glance it seems to always do this when the old mbuf is a cluster)
> > and prepend that to the list.
> > --> Now the list is > 32 mbufs again and the bus_dmammap_load_mbuf_sg()
> >     will fail again on the retry, this time fatally, I think?
> > 
> > I can't see any reason to re-do all the stuff using m_pullup() and Russell
> > reported that moving the "retry:" fixed his problem, from what I understood.
> 
> Ah, I had assumed (incorrectly) that the m_pullup()s would all be nops in this
> case.  It seems the NIC would really like to have all those things in a single
> segment, but it is not required, so I agree that your patch is fine.
> 

I recall em(4) controllers have various limitation in TSO. Driver
has to update IP header to make TSO work so driver has to get a
writable mbufs.  bpf(4) consumers will see IP packet length is 0
after this change.  I think tcpdump has a compile time option to
guess correct IP packet length.  The firmware of controller also
should be able to access complete IP/TCP header in a single buffer.
I don't remember more details in TSO limitation but I guess you may
be able to get more details TSO limitation from publicly available
Intel data sheet.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140712060538.GA3649>