Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 10 Jul 2014 18:31:43 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        John Baldwin <jhb@freebsd.org>
Cc:        "Russell L. Carter" <rcarter@pinyon.org>, freebsd-net@freebsd.org
Subject:   Re: NFS client READ performance on -current
Message-ID:  <1610703198.9975909.1405031503143.JavaMail.root@uoguelph.ca>
In-Reply-To: <201407101325.46156.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
John Baldwin wrote:
> On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote:
> > Russell L. Carter wrote:
> > > 
> > > 
> > > On 07/02/14 19:09, Rick Macklem wrote:
> > > 
> > > > Could you please post the dmesg stuff for the network
> > > > interface,
> > > > so I can tell what driver is being used? I'll take a look at
> > > > it,
> > > > in case it needs to be changed to use m_defrag().
> > > 
> > > em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port
> > > 0xd020-0xd03f
> > > mem
> > > 0xfe4a0000-0xfe4bffff,0xfe480000-0xfe49ffff irq 44 at device 0.0
> > > on
> > > pci2
> > > em0: Using an MSI interrupt
> > > em0: Ethernet address: 00:15:17:bc:29:ba
> > > 001.000007 [2323] netmap_attach             success for em0 tx
> > > 1/1024
> > > rx
> > > 1/1024 queues/slots
> > > 
> > > This is one of those dual nic cards, so there is em1 as well...
> > > 
> > Well, I took a quick look at the driver and it does use m_defrag(),
> > but
> > I think that the "retry:" label it does a goto after doing so might
> > be in
> > the wrong place.
> > 
> > The attached untested patch might fix this.
> > 
> > Is it convenient to build a kernel with this patch applied and then
> > try
> > it with TSO enabled?
> > 
> > rick
> > ps: It does have the transmit segment limit set to 32. I have no
> > idea if
> >     this is a hardware limitation.
> 
> I think the retry is not in the wrong place, but the overhead of all
> those
> pullups is apparently quite severe.
The m_defrag() call after the first failure will just barely squeeze
the just under 64K TSO segment into 32 mbuf clusters. Then I think any
m_pullup() done during the retry will allocate an mbuf
(at a glance it seems to always do this when the old mbuf is a cluster)
and prepend that to the list.
--> Now the list is > 32 mbufs again and the bus_dmammap_load_mbuf_sg()
    will fail again on the retry, this time fatally, I think?

I can't see any reason to re-do all the stuff using m_pullup() and Russell
reported that moving the "retry:" fixed his problem, from what I understood.

>  It would be interesting to test
> the
> following in addition to your change to see if it improves
> performance
> further:
> 
> Index: if_em.c
> ===================================================================
> --- if_em.c	(revision 268495)
> +++ if_em.c	(working copy)
> @@ -1959,7 +1959,9 @@ retry:
>  	if (error == EFBIG && remap) {
>  		struct mbuf *m;
>  
> -		m = m_defrag(*m_headp, M_NOWAIT);
> +		m = m_collapse(*m_headp, M_NOWAIT, EM_MAX_SCATTER);
> +		if (m == NULL)
> +			m = m_defrag(*m_headp, M_NOWAIT);
Since a just under 64K TSO segment barely fits in 32 mbuf clusters,
I'm at least 99% sure the m_collapse() will fail, but it can't hurt to
try it. (If it supported 33 or 34, I think m_collapse() would have a
reasonable chance of success.)

Right now the NFS and krpc code creates 2 small mbufs in front of the
read/write data clusters and I think the TCP layer adds another one.
Even if this was modified to put it all in one cluster, I don't think
m_collapse() would succeed, since it only copies the data up and deletes
an mbuf from the chain if it will all fit in the preceding one. Since
the read/write data clusters are full (except the last one), they can't
fit in the M_TRAILINGSPACE() of the preceding one unless it is empty
from my reading of m_collapse().

rick

>  		if (m == NULL) {
>  			adapter->mbuf_alloc_failed++;
>  			m_freem(*m_headp);
> 
> 
> --
> John Baldwin
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1610703198.9975909.1405031503143.JavaMail.root>