From: Rick Macklem
To: pyunyh@gmail.com
Date: Sat, 12 Jul 2014 17:14:00 -0400 (EDT)
Subject: Re: NFS client READ performance on -current
Message-ID: <2136988575.13956627.1405199640153.JavaMail.root@uoguelph.ca>
In-Reply-To: <20140712060538.GA3649@michelle.fasterthan.com>
Cc: "Russell L.
Carter", freebsd-net@freebsd.org, John Baldwin

Yonghyeon Pyun wrote:
> On Fri, Jul 11, 2014 at 09:54:23AM -0400, John Baldwin wrote:
> > On Thursday, July 10, 2014 6:31:43 pm Rick Macklem wrote:
> > > John Baldwin wrote:
> > > > On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote:
> > > > > Russell L. Carter wrote:
> > > > > >
> > > > > > On 07/02/14 19:09, Rick Macklem wrote:
> > > > > >
> > > > > > > Could you please post the dmesg stuff for the network
> > > > > > > interface, so I can tell what driver is being used? I'll
> > > > > > > take a look at it, in case it needs to be changed to use
> > > > > > > m_defrag().
> > > > > >
> > > > > > em0: port 0xd020-0xd03f mem
> > > > > > 0xfe4a0000-0xfe4bffff,0xfe480000-0xfe49ffff irq 44 at
> > > > > > device 0.0 on pci2
> > > > > > em0: Using an MSI interrupt
> > > > > > em0: Ethernet address: 00:15:17:bc:29:ba
> > > > > > 001.000007 [2323] netmap_attach success for em0 tx 1/1024
> > > > > > rx 1/1024 queues/slots
> > > > > >
> > > > > > This is one of those dual nic cards, so there is em1 as
> > > > > > well...
> > > > > >
> > > > > Well, I took a quick look at the driver and it does use
> > > > > m_defrag(), but I think that the "retry:" label it does a
> > > > > goto after doing so might be in the wrong place.
> > > > >
> > > > > The attached untested patch might fix this.
> > > > >
> > > > > Is it convenient to build a kernel with this patch applied
> > > > > and then try it with TSO enabled?
> > > > >
> > > > > rick
> > > > > ps: It does have the transmit segment limit set to 32.
> > > > > I have no idea if this is a hardware limitation.
> > > >
> > > > I think the retry is not in the wrong place, but the overhead
> > > > of all those pullups is apparently quite severe.
> > > The m_defrag() call after the first failure will just barely
> > > squeeze the just under 64K TSO segment into 32 mbuf clusters.
> > > Then I think any m_pullup() done during the retry will allocate
> > > an mbuf (at a glance it seems to always do this when the old mbuf
> > > is a cluster) and prepend that to the list.
> > > --> Now the list is > 32 mbufs again and the
> > > bus_dmamap_load_mbuf_sg() will fail again on the retry, this time
> > > fatally, I think?
> > >
> > > I can't see any reason to re-do all the stuff using m_pullup()
> > > and Russell reported that moving the "retry:" fixed his problem,
> > > from what I understood.
> >
> > Ah, I had assumed (incorrectly) that the m_pullup()s would all be
> > nops in this case. It seems the NIC would really like to have all
> > those things in a single segment, but it is not required, so I
> > agree that your patch is fine.
> >
>
> I recall em(4) controllers have various limitation in TSO. Driver
> has to update IP header to make TSO work so driver has to get a
> writable mbufs. bpf(4) consumers will see IP packet length is 0
> after this change. I think tcpdump has a compile time option to
> guess correct IP packet length. The firmware of controller also
> should be able to access complete IP/TCP header in a single buffer.
> I don't remember more details in TSO limitation but I guess you may
> be able to get more details TSO limitation from publicly available
> Intel data sheet.

I think that the patch should handle this ok. All of the m_pullup()
stuff gets done the first time. Then, if the result is more than 32
mbufs in the list, m_defrag() is called to copy the chain.
This should result in all the header stuff in the first mbuf cluster
and the map call is done again with this list of clusters. (Without the
patch, m_pullup() would allocate another prepended mbuf and make the
chain more than 32 mbufs again.)

Russell seemed to confirm that the patch fixed the problem for him, but
since I don't have em(4) hardware, it would be nice to have someone with
commit privilege and access to em(4) hardware test and commit it.

rick
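
To make the mbuf counting in the thread concrete, here is a toy model in plain userland C. It is a sketch under stated assumptions, not the em(4) driver code and not the patch itself: struct chain, toy_pullup(), toy_defrag(), toy_dma_load() and toy_xmit() are hypothetical names invented for illustration, the cluster size is assumed to be 2048 bytes, and the pullup is modelled as always prepending one buffer, which is the behaviour described above for a chain whose head is a cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SEG_LIMIT  32    /* transmit segment limit mentioned in the thread */
#define CLUSTER_SZ 2048  /* assumed mbuf cluster size */

/* Toy stand-in for an mbuf chain; only the counts matter here. */
struct chain {
    size_t nbufs; /* number of buffers in the chain */
    size_t len;   /* total bytes carried by the chain */
};

/* Assumed pullup behaviour: one new buffer is prepended. */
static void toy_pullup(struct chain *c) { c->nbufs += 1; }

/* Model of a defrag: copy the data into as few clusters as possible. */
static void toy_defrag(struct chain *c)
{
    c->nbufs = (c->len + CLUSTER_SZ - 1) / CLUSTER_SZ;
}

/* Model of the DMA map call: fails when the chain exceeds the limit. */
static bool toy_dma_load(const struct chain *c)
{
    return c->nbufs <= SEG_LIMIT;
}

/*
 * retry_before_pullup == true  models a retry that redoes the pullup.
 * retry_before_pullup == false models a retry that goes straight back
 * to the map call, as with the moved "retry:" label.
 */
bool toy_xmit(struct chain c, bool retry_before_pullup)
{
    bool defragged = false;

    assert(c.len > 0);
    toy_pullup(&c);              /* header setup on the first pass */
    for (;;) {
        if (toy_dma_load(&c))
            return true;         /* chain fits, transmit proceeds */
        if (defragged)
            return false;        /* second failure is fatal */
        toy_defrag(&c);
        defragged = true;
        if (retry_before_pullup)
            toy_pullup(&c);      /* pushes a 32-cluster chain to 33 */
    }
}
```

In this model a 65535-byte chain of 40 buffers defrags to exactly 32 clusters. With retry_before_pullup set, the extra prepended buffer makes it 33 and the second map attempt fails; with it clear, the 32-cluster chain is mapped successfully.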