From: Harry Schmalzbauer
Organization: OmniLAN
Date: Mon, 05 Jun 2017 20:25:14 +0200
To: Vincenzo Maffione
CC: freebsd-net
Message-ID: <5935A20A.6040000@omnilan.de>
Subject: Re: ovs-netmap forgotten?
References: <5926FFDB.7040900@omnilan.de> <592F20A0.4020702@omnilan.de> <592FC60A.1030308@omnilan.de>

Regarding Vincenzo Maffione's message of 05.06.2017 16:06 (localtime):
> Hi Harry,
> I've done some investigation on this issue (just for fun), and I think I
> may have found the issue.
>
> When using vlan interfaces, netmap uses the emulated adapter, as the "vlan"
> driver is not netmap-enabled (and it cannot be).
> To intercept RX packets, netmap replaces the "if_input" function pointer
> field in the kernel "struct ifnet" (the struct representing a network
> interface).
> Note that you have an instance of "struct ifnet" for em0 (physical NIC),
> and a different instance for each VLAN cloned interface (e.g. "vlan100") on
> em0.
> If you put vlan100 in netmap mode, netmap will replace the if_input of
> vlan100, and not the if_input of em0. So far, this is expected behaviour.
>
> Unfortunately, I see in the code here
>
> https://github.com/freebsd/freebsd/blob/master/sys/net/if_vlan.c#L1244-L1245
>
> that when the VLAN driver intercepts the RX packet coming from the underlying
> interface (e.g. em0 in our example), the em0 if_input is used rather than
> the vlan100 if_input.
>
> In terms of code, we have
>     (*ifp->if_input)(ifv->ifv_ifp, m);
> rather than
>     (*ifv->ifv_ifp->if_input)(ifv->ifv_ifp, m);
> Since em0 if_input is not replaced, netmap does not intercept it and you
> don't see it in your application, e.g.
>
> # pkt-gen -i vlan100 -f rx
>
> will see nothing.
>
> Now, I think that normally ifv->ifv_ifp->if_input == ifp->if_input, so this
> may explain why the code is written like that (to avoid the additional
> pointer dereferencing).
> This is not the case for netmap, where ifv->ifv_ifp->if_input !=
> ifp->if_input when em0 xor vlan100 are in netmap mode.
>
> You may try to recompile the kernel with that change and see if you can see
> packets coming in on vlan100 with pkt-gen.
> I recommend you always do tests with pkt-gen before trying to use
> vale-ctl -a.

NICE :-)

Thank you very much for your effort and impressive reading-only
analysis. Maybe one has to be used to ifv, ifp and companion variables,
or I can't see _the_ simplicity of the code, or everybody else is a
genius...

A first quick test shows you're right, and this tiny diff solves a
decent share of my (ESXi-replacing) problems:

--- src/sys/net/if_vlan.c.orig	2017-06-05 17:39:27.770574000 +0200
+++ src/sys/net/if_vlan.c	2017-06-05 17:39:21.550278000 +0200
@@ -1234,7 +1234,7 @@
 	if_inc_counter(ifv->ifv_ifp, IFCOUNTER_IPACKETS, 1);
 
 	/* Pass it back through the parent's input routine. */
-	(*ifp->if_input)(ifv->ifv_ifp, m);
+	(*ifv->ifv_ifp->if_input)(ifv->ifv_ifp, m);
 }
 
 static int

Will do real-world tests tomorrow.

Unrelated to the vlan-netmap issue, but more on topic:
A last little (completely non-academic) test unfortunately showed that
"vtnet|virtio-net<-vale:guestif->netmapIF" can't compete with
"vmx3f|vmxnet3<-ESXivSwitch->sameHWif".
The latter causes no noticeable CPU load when NFS-copying big files via
1GbE, just like on the native host (which leaves the machine 99-100% idle
@108MB/s).
Running the same guest with the same task on bhyve causes ~20% CPU
utilization; @1GbE :-(
Also, there was no significant difference between vale(4) and
if_bridge(4) with that workload (few IP packets/s on a saturated 1GbE
PHY).
Most likely the cause is the lack of offloading features, which leads to
many more interrupts in the guest than with vmx3f's TSO capability.

I haven't done any inter-VM "real-world" tests yet, where vale(4) will
strike back...

So to achieve my goal, replacing my ESXi setups, I'd need your quick help
again to port vmxnet3 ;-) /joking

I hope ptnet can help out here, at least for FreeBSD guests. But as far
as I could see when merging netmap from HEAD to stable/11 (an updated
diff, applicable after r319182, is available here too:
ftp://ftp.omnilan.de/pub/FreeBSD/OmniLAN/misc/), bhyve(8) doesn't
support ptnet yet.
Is there any specific reason why ptnetmap-memdev
(https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/head/usr.sbin/bhyve/pci_ptnetmap_netif.c)
hasn't been committed to HEAD?

Does anybody have an idea whether there is any vmnet/vtnet companion (in
development stage) that provides offloading features and reduces wasted
interrupts?

Another question, better addressed to virtualization@, but I remember
that cross-posting is to be avoided:
I never tried to understand why vmx3f seems to work without using
interrupts at all, as opposed to vmx(4), but maybe it is possible to do
the same for vtnet(4)?

Thanks,

-harry
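
For anyone following the if_input discussion in this thread, here is a minimal user-space sketch in C of why the one-line change matters. It is only a model under stated assumptions: the struct ifnet and struct mbuf below are tiny hypothetical stand-ins, not the kernel definitions, and stack_input, netmap_input, vlan_deliver_unpatched, vlan_deliver_patched and run_scenario are names made up for illustration. The only thing taken from the thread is which if_input pointer gets called.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel structures are far larger. */
struct mbuf {
	int len;
};

struct ifnet {
	const char *if_xname;
	void (*if_input)(struct ifnet *, struct mbuf *);
	int stack_rx;	/* illustrative counter: packets seen by the normal stack */
	int netmap_rx;	/* illustrative counter: packets seen by the netmap hook */
};

/* Models the regular input routine that both interfaces start with. */
void
stack_input(struct ifnet *ifp, struct mbuf *m)
{
	(void)m;
	ifp->stack_rx++;
}

/* Models the routine netmap installs on an interface in netmap mode. */
void
netmap_input(struct ifnet *ifp, struct mbuf *m)
{
	(void)m;
	ifp->netmap_rx++;
}

/* Before the change: the parent's pointer is called with the vlan ifnet. */
void
vlan_deliver_unpatched(struct ifnet *parent, struct ifnet *vlan, struct mbuf *m)
{
	(*parent->if_input)(vlan, m);
}

/* After the change: the vlan interface's own pointer is called. */
void
vlan_deliver_patched(struct ifnet *parent, struct ifnet *vlan, struct mbuf *m)
{
	(void)parent;
	(*vlan->if_input)(vlan, m);
}

/*
 * Puts only the model vlan100 in "netmap mode", delivers one packet and
 * returns how many packets the netmap hook saw.
 */
int
run_scenario(int patched)
{
	struct ifnet em0 = { "em0", stack_input, 0, 0 };
	struct ifnet vlan100 = { "vlan100", stack_input, 0, 0 };
	struct mbuf m = { 64 };

	vlan100.if_input = netmap_input;	/* only vlan100 is hooked */
	if (patched)
		vlan_deliver_patched(&em0, &vlan100, &m);
	else
		vlan_deliver_unpatched(&em0, &vlan100, &m);
	return (vlan100.netmap_rx);
}
```

In this model run_scenario(0) returns 0, because the packet goes through em0's unmodified pointer, and run_scenario(1) returns 1. When neither interface is hooked, both pointers are the same function, which matches the observation that the two calls are normally equivalent.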