From: Yonghyeon PYUN
Date: Sat, 12 Jul 2014 15:05:38 +0900
To: John Baldwin
Subject: Re: NFS client READ performance on -current
Cc: "Russell L. Carter", freebsd-net@freebsd.org, Rick Macklem

On Fri, Jul 11, 2014 at 09:54:23AM -0400, John Baldwin wrote:
> On Thursday, July 10, 2014 6:31:43 pm Rick Macklem wrote:
> > John Baldwin wrote:
> > > On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote:
> > > > Russell L. Carter wrote:
> > > > >
> > > > > On 07/02/14 19:09, Rick Macklem wrote:
> > > > >
> > > > > > Could you please post the dmesg stuff for the network interface,
> > > > > > so I can tell what driver is being used? I'll take a look at it,
> > > > > > in case it needs to be changed to use m_defrag().
> > > > >
> > > > > em0: port 0xd020-0xd03f mem 0xfe4a0000-0xfe4bffff,0xfe480000-0xfe49ffff irq 44 at device 0.0 on pci2
> > > > > em0: Using an MSI interrupt
> > > > > em0: Ethernet address: 00:15:17:bc:29:ba
> > > > > 001.000007 [2323] netmap_attach success for em0 tx 1/1024 rx 1/1024 queues/slots
> > > > >
> > > > > This is one of those dual nic cards, so there is em1 as well...
> > > > >
> > > > Well, I took a quick look at the driver and it does use m_defrag(), but
> > > > I think that the "retry:" label it does a goto after doing so might be
> > > > in the wrong place.
> > > >
> > > > The attached untested patch might fix this.
> > > >
> > > > Is it convenient to build a kernel with this patch applied and then
> > > > try it with TSO enabled?
> > > >
> > > > rick
> > > > ps: It does have the transmit segment limit set to 32. I have no idea
> > > > if this is a hardware limitation.
> > >
> > > I think the retry is not in the wrong place, but the overhead of all
> > > those pullups is apparently quite severe.
> > The m_defrag() call after the first failure will just barely squeeze
> > the just under 64K TSO segment into 32 mbuf clusters. Then I think any
> > m_pullup() done during the retry will allocate an mbuf
> > (at a glance it seems to always do this when the old mbuf is a cluster)
> > and prepend that to the list.
> > --> Now the list is > 32 mbufs again and the bus_dmamap_load_mbuf_sg()
> > will fail again on the retry, this time fatally, I think?
> >
> > I can't see any reason to re-do all the stuff using m_pullup() and Russell
> > reported that moving the "retry:" fixed his problem, from what I understood.
>
> Ah, I had assumed (incorrectly) that the m_pullup()s would all be nops in this
> case. It seems the NIC would really like to have all those things in a single
> segment, but it is not required, so I agree that your patch is fine.
>

I recall that em(4) controllers have various TSO limitations. The driver
has to update the IP header to make TSO work, so it has to get writable
mbufs. bpf(4) consumers will see an IP packet length of 0 after this
change. I think tcpdump has a compile-time option to guess the correct IP
packet length. The controller's firmware also needs to access the complete
IP/TCP header in a single buffer. I don't remember more details about the
TSO limitations, but you may be able to find them in the publicly
available Intel data sheets.