Message-ID: <52DD1914.7090506@iet.unipi.it>
Date: Mon, 20 Jan 2014 13:39:48 +0100
From: Giuseppe Lettieri
MIME-Version: 1.0
To: Wang Weidong, facoltà
Cc: Luigi Rizzo, Vincenzo Maffione, net@freebsd.org
Subject: Re: netmap: I got some troubles with netmap
References: <52D74E15.1040909@huawei.com> <92C7725B-B30A-4A19-925A-A93A2489A525@iet.unipi.it> <52D8A5E1.9020408@huawei.com>
In-Reply-To: <52D8A5E1.9020408@huawei.com>
Content-Type: multipart/mixed; boundary="------------080404040009080203030301"
List-Id: Networking and TCP/IP with FreeBSD

This is a multi-part message in MIME format.
--------------080404040009080203030301
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit

Hi Wang,

OK, you are using the netmap support in the upstream qemu git. That does
not yet include all our modifications, some of which are very important
for high throughput with VALE. In particular, the upstream qemu does not
include the batching improvements in the frontend/backend interface, and
it does not include the "map ring" optimization of the e1000 frontend.

Please find attached a gzipped patch that contains all of our qemu code.
The patch is against the latest upstream master (commit 1cf892ca).

Please ./configure the patched qemu with the following options, in
addition to any other options you may need:

    --enable-e1000-paravirt --enable-netmap \
    --extra-cflags=-I/path/to/netmap/sys/directory

Note that --enable-e1000-paravirt is needed to enable the "map ring"
optimization in the e1000 frontend, even if you are not going to use the
e1000-paravirt device.

Now you should be able to rerun your tests. I am also attaching a README
file that describes some more tests you may want to run.

Cheers,
Giuseppe

On 17/01/2014 04:39, Wang Weidong wrote:
> On 2014/1/16 18:24, facoltà wrote:
>> Hi Wang,
>>
>> I work with Luigi, please check the replies below.
>>
>>
>> On 16 Jan 2014, at 04:53, Luigi Rizzo wrote:
>>
>>>
>>> [...]
>>> Problem 3:
>>> "qemu-system-x86_64 -m 1024 -boot c -net nic -net netmap,ifname=vale0:1 -hda /home/disk/nm_d0
>>> -enable-kvm -vnc :0", Use that command to start a vm.
>>>
>>> I test on the vm.
>>> #pkt-gen -i eth0 -f tx -l 60 -n 20000000,
>>> the speed is up to 1.02 Mpps.
>>
>>>
>>> I do "vale-ctl -h vale0:eth2", then I test on the vm, the speed is up to 558.57 Kpps.
>>> While "vale-ctl -a vale0:eth2", the speed is up to 800 kpps.
>>>
>>
>> The number you obtain in the first test is quite low.
>> vale-ctl -h vale0:eth2 connects the host stack, which is very slow, so
>> ~500 Kpps is not unexpected. I don’t know about the third test at the
>> moment, I have to check.
>>
>> What version of our modified qemu are you using? Please note that there
>> might be a qemu patch in the netmap sources, but that is only a leftover
>> from our first attempts, so you should not use that.
>>
> Here, I use the qemu is from 'git clone git://git.qemu-project.org/qemu.git' origin/master and the commit is f976b09ea249
> ("PPC: Fix compilation with TCG debug"). The netmap is submit into the qemu in commit 58952137b0("net: Adding netmap
> network backend"). Is the version I used is not right? Because of the netmap-20131019 doesn't support qemu, so I find the
> newest qemu.
>
> Although, I try to use the netmap-20120813 which support qemu, and download the qemu-1.0.1 from http://wiki.qemu-project.org/download/,
> then I patch the patch-zz-netmap-1 and copy the qemu-netmap to the qemu. I test the "pkt-gen -i eth0 -f tx -l 60 -n 20000000" on the vm,
> (the pkt-gen is from netmap-20131019) And the speed is unsteadily, sometimes up to 2Mpps or 1.44, and avg is 1.74Mpps.
> But when I use "./bridge -i vale0:eth2" on the host, then test "pkt-gen -i eth0 -f tx -l 60 -n 20000000" on the vm,
> I got a NULL pointer dereference BUG that:
>
> --------------
> [ 2313.454871] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 2313.547751] IP: [] get_rps_cpu+0x44/0x390
> [ 2313.613802] PGD 1f7cbe5067 PUD 1f7d792067 PMD 0
> [ 2313.668509] Oops: 0000 [#1] SMP
> [ 2313.706703] CPU 0
> [ 2313.728373] Modules linked in: ixgbe(N) netmap_lin(N) edd(N) bridge(N) stp(N) llc(N) mperf(N) microcode(N) fuse(N) loop(N) dm_mod(N) vhost_net(N) macvtap(N) macvlan(N) tun(N) kvm_intel(N) sg(N) i2c_i801(N) ipv6(N) kvm(N) ipv6_lib(N) i2c_core(N) i7core_edac(N) mptctl(N) iTCO_wdt(N) igb(N) pcspkr(N) edac_core(N) rtc_cmos(N) serio_raw(N) iTCO_vendor_support(N) mdio(N) dca(N) button(N) ext3(N) jbd(N) mbcache(N) usbhid(N) hid(N) uhci_hcd(N) ehci_hcd(N) usbcore(N) usb_common(N) sd_mod(N) crc_t10dif(N) processor(N) thermal_sys(N) hwmon(N) scsi_dh_alua(N) scsi_dh_hp_sw(N) scsi_dh_rdac(N) scsi_dh_emc(N) scsi_dh(N) ata_generic(N) ata_piix(N) libata(N) mptsas(N) mptscsih(N) mptbase(N) scsi_transport_sas(N) scsi_mod(N) [last unloaded: ixgbe]
> [ 2314.498465] Supported: Yes
> [ 2314.530455]
> [ 2314.548001] Pid: 10708, comm: bridge Tainted: G N 3.0.58-0.6.6-default #2 Huawei Technologies Co., Ltd. Tecal XH620 /BC21THSA
> [ 2314.718261] RIP: 0010:[] [] get_rps_cpu+0x44/0x390
> [ 2314.813196] RSP: 0018:ffff881f5af75928 EFLAGS: 00010246
> [ 2314.876137] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 2314.960745] RDX: ffff881f5af75990 RSI: ffff881f5b1da480 RDI: ffff881f59098000
> [ 2315.045354] RBP: ffff881f5b1da480 R08: 0000000000000000 R09: 0000000000000004
> [ 2315.129963] R10: 0000000080042000 R11: 0000000000000001 R12: ffff881f59098000
> [ 2315.214570] R13: ffff881f7a480000 R14: ffff881f5b1da480 R15: 00000000000003ff
> [ 2315.299179] FS: 00007f948e25c700(0000) GS:ffff88203f200000(0000) knlGS:0000000000000000
> [ 2315.395135] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2315.463237] CR2: 0000000000000000 CR3: 0000001f7bb55000 CR4: 00000000000026e0
> [ 2315.547845] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2315.632454] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 2315.717064] Process bridge (pid: 10708, threadinfo ffff881f5af74000, task ffff881f5903a3c0)
> [ 2315.816120] Stack:
> [ 2315.839856] ffff881f5af7598f 0000000000000258 ffff881f81aa1280 ffffffff8137ed57
> [ 2315.927586] ffff881f5af75990 0000000000000000 ffff881f5b1da480 0000000000000296
> [ 2316.015317] ffff881f7a480000 ffff881f5b1da480 00000000000003ff ffffffff8138e998
> [ 2316.103044] Call Trace:
> [ 2316.131948] [] netif_rx+0xf8/0x190
> [ 2316.191799] [] netmap_sync_to_host+0x1de/0x2b0 [netmap_lin]
> [ 2316.277452] [] netmap_poll+0x495/0x610 [netmap_lin]
> [ 2316.354846] [] do_poll+0x115/0x2a0
> [ 2316.414696] [] do_sys_poll+0x18e/0x200
> [ 2316.478676] [] sys_poll+0x66/0x100
> [ 2316.538526] [] system_call_fastpath+0x16/0x1b
> [ 2316.609726] [<00007f948d7724bf>] 0x7f948d7724be
> [ 2316.664418] Code: 24 40 49 89 fc 4c 89 74 24 48 4c 89 7c 24 50 48 89 54 24 20 0f b7 86 ac 00 00 00 66 85 c0 0f 85 d3 00 00 00 48 8b 9f d8 02 00 00 <4c> 8b 2b 4d 85 ed 0f 84 83 01 00 00 41 83 7d 00 01 0f 84 05 01
> [ 2316.888727] RIP [] get_rps_cpu+0x44/0x390
> [ 2316.955804] RSP
> -------------------------
>
> As you point out that I shouldn't use these old version. So the BUG not occured in the netmap-20131019 and qemu-newest which integrated the netmap-backend.
>
> Btw, how can I use the bridge command for testing?
>
> Thanks,
> Wang
>
>> Cheers,
>> Giuseppe
>>
>>> I did something wrong?
>>> ------
>>>
>>> thanks,
>>>
>>> Wang
>>>
>>> --
>>> -----------------------------------------+-------------------------------
>>> Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. dell'Informazione
>>> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa
>>> TEL +39-050-2211611 . via Diotisalvi 2
>>> Mobile +39-338-6809875 . 56122 PISA (Italy)
>>> -----------------------------------------+-------------------------------
>>
>
>

--
Dr. Ing. Giuseppe Lettieri
Dipartimento di Ingegneria della Informazione
Universita' di Pisa
Largo Lucio Lazzarino 1, 56122 Pisa - Italy
Ph. : (+39) 050-2217.649 (direct) .599 (switch)
Fax : (+39) 050-2217.600
e-mail: g.lettieri@iet.unipi.it

--------------080404040009080203030301
Content-Type: text/plain; charset=UTF-8; name="README.images"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="README.images"

EXPERIMENTING WITH NETMAP, VALE AND FAST QEMU
---------------------------------------------

To ease experiments with netmap, the VALE switch and our QEMU
enhancements, we have prepared a couple of bootable images (Linux and
FreeBSD). You can find them on the netmap page

    http://info.iet.unipi.it/~luigi/netmap/

where you can also look at more recent versions of this file.

Below are step-by-step instructions on experiments you can run with
these images. The two main versions are

    picobsd.hdd   -> FreeBSD HEAD (netmap + VALE)
    tinycore.hdd  -> Linux (qemu + netmap + VALE)

Booting the image
-----------------

For all experiments you need to copy the image onto a USB stick and
boot a PC with it.
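A minimal sketch of the copy step on a Linux host, assuming the stick shows up as a whole-disk device node. The "/dev/sdX" name is a placeholder, not a real path: check the actual name first (e.g. with lsblk), because dd overwrites the target completely. The write_image helper is hypothetical and is not one of the scripts shipped with the images.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the images): copy a bootable
# image such as tinycore.hdd or picobsd.hdd onto a USB stick.
# WARNING: every byte on the target device is overwritten.
write_image() {
    image=$1     # e.g. tinycore.hdd
    target=$2    # e.g. /dev/sdX (placeholder: the whole stick, not a partition)

    if [ ! -r "$image" ]; then
        echo "write_image: cannot read $image" >&2
        return 1
    fi
    # Copy in 1 MiB blocks, then flush buffered writes before unplugging.
    dd if="$image" of="$target" bs=1M && sync
}

# Usage (as root):  write_image tinycore.hdd /dev/sdX
```

Writing to the whole device (not to a partition such as /dev/sdX1) matters because the image already contains its own partition table and boot loader.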
Alternatively, you can use the image with VirtualBox, QEMU or other
emulators, for example

    qemu-system-x86_64 -hda IMAGE_FILE -m 1G -machine accel=kvm ...

(remove 'accel=kvm' if your host does not support KVM).

The images do not install anything on the hard disk.

Both systems have preloaded drivers for a number of network cards
(including the Intel 10 Gbit ones) with netmap extensions.
The VALE switch is also available (it is part of the netmap module).
ssh, scp and a few other utilities are also included.

FreeBSD image:

  + the OS boots directly in console mode; you can switch between
    terminals with ALT-Fn.
    The password for the 'root' account is 'setup'.

  + if you are connected to a network, you can use

        dhclient em0 # or other interface name

    to obtain an IP address and external connectivity.

Linux image:

  + in addition to the netmap/VALE modules, the KVM kernel module
    is also preloaded.

  + the boot loader gives you two main options (each with a variant
    to delay boot in case you have slow devices):

    + "Boot TinyCore"
      boots in an X11 environment as user 'tc'.
      You can create a few terminals using the icon at the bottom.
      You can use "sudo -s" to get root access.
      In case no suitable video card is available/detected,
      it falls back to command-line mode.

    + "Boot Core (command line only)"
      boots in console mode with virtual terminals.
      You're automatically logged in as user 'tc'.
      To log in on the other terminals use the same username
      (no password required).

  + The system should automatically recognize the existing Ethernet
    devices, and load the appropriate netmap-capable device drivers
    when available. Interfaces are configured through DHCP when possible.

General test recommendations
----------------------------

NOTE: The tests outlined in the following sections can generate very
high packet rates, and some hardware misconfiguration problems may
prevent you from achieving maximum speed.
Common problems are:

+ slow link autonegotiation.
  Our programs typically wait 2-4 seconds for link negotiation to
  complete, but some NIC/switch combinations are much slower.
  In this case you should increase the delay (pkt-gen has the -w XX
  option for that) or possibly force the link speed and duplex mode
  on both sides.

  Check the link speed to make sure there are no negotiation problems,
  and that you see the expected speed.

      ethtool IFNAME    # on Linux
      ifconfig IFNAME   # on FreeBSD

+ Ethernet flow control.
  If the receiving port is slow (often the case in presence of
  multicast/broadcast traffic, or also unicast if you are sending to
  non-netmap receivers), it will generate Ethernet flow control frames
  that throttle down the sender.

  We recommend disabling BOTH RX and TX Ethernet flow control on BOTH
  sender and receiver. On Linux this can be done with ethtool:

      ethtool -A IFNAME tx off rx off

  whereas on FreeBSD there are device-specific sysctls (note: no spaces
  around '='):

      sysctl dev.ix.0.queue0.flow_control=0

+ CPU power saving.
  The CPU governor on Linux, or its equivalent in FreeBSD, tends to
  throttle down the clock rate, reducing performance.
  Unlike other similar systems, netmap does not have busy-wait loops,
  so the CPU load is generally low and this can trigger the clock
  slowdown. Make sure that ALL CPUs run at maximum speed by disabling
  the dynamic frequency-scaling mechanisms.

      cpufreq-set -gperformance    # on Linux
      sysctl dev.cpu.0.freq=3401   # on FreeBSD

+ wrong MAC address.
  netmap does not put the NIC in promiscuous mode, so unless the
  application does it, the NIC will only receive broadcast traffic or
  unicast traffic directed to its own MAC address.

STANDARD SOCKET TESTS
---------------------

For most socket-based experiments you can use the "netperf" tool
installed on the system (version 2.6.0). Be careful to use a matching
version for the other netperf endpoint (e.g. netserver) when running
tests between different machines.
Interesting experiments are:

    netperf -H x.y.z.w -tTCP_STREAM          # test TCP throughput
    netperf -H x.y.z.w -tTCP_RR              # test latency
    netperf -H x.y.z.w -tUDP_STREAM -- -m8   # test UDP throughput with short packets

where x.y.z.w is the host running "netserver".

RAW SOCKET AND TAP TESTS
------------------------

For experiments with raw sockets and tap devices you can use the l2
utilities (l2open, l2send, l2recv) installed on the system.
With these utilities you can send/receive custom network packets
to/from raw sockets or tap file descriptors.

The receiver can be run with one of the following commands

    l2open -r IFNAME l2recv   # receive from a raw socket attached to IFNAME
    l2open -t IFNAME l2recv   # receive from a file descriptor opened on the tap IFNAME

The receiver process will wait indefinitely for the first packet and
then keep receiving as long as packets keep coming. When the flow stops
(after a 2-second timeout) the process terminates and prints the
received packet rate and packet count.

To run the sender easily, you can use the script l2-send.sh in the home
directory. This script defines several shell variables that can be
manually changed to customize the test (see the comments in the script
itself).

As an example, you can test configurations with virtual machines
attached to host tap devices bridged together.

Tests using the Linux in-kernel pktgen
--------------------------------------

To use the Linux in-kernel packet generator, you can use the script
"linux-pktgen.sh" in the home directory.
pktgen creates a kernel thread for each hardware TX queue of a given
NIC.

By manually changing the shell variable definitions in the script you
can change the test configuration (e.g. addresses in the generated
packet). Please change the "NCPU" variable to match the number of CPUs
on your machine. The script takes one argument, which specifies the
number of NIC queues (i.e. kernel threads) to use, minus one.
For example:

    ./linux-pktgen.sh 2   # Uses 3 NIC queues

When the script terminates, it prints the per-queue rates and the total
rate achieved.

NETMAP AND VALE EXPERIMENTS
---------------------------

For most experiments with netmap you can use the "pkt-gen" command (do
not confuse it with the Linux in-kernel pktgen), which has a large
number of options to send and receive traffic (also on TAP devices).

pkt-gen normally generates UDP traffic for a specific IP address and
uses the broadcast MAC address as destination.

Netmap testing with network interfaces
--------------------------------------

Remember that you need a netmap-capable driver in order to use netmap
on a specific NIC. Currently supported drivers are e1000, e1000e, ixgbe
and igb. For updated information please visit
http://info.iet.unipi.it/~luigi/netmap/

Before running pkt-gen, make sure that the link is up.

Run pkt-gen on an interface called "IFNAME":

    pkt-gen -i IFNAME -f tx   # run a pkt-gen sender
    pkt-gen -i IFNAME -f rx   # run a pkt-gen receiver

pkt-gen without arguments will show other options, e.g.

  + -w sec     modifies the wait time for link negotiation
  + -l len     modifies the packet size
  + -d, -s     set the IP destination/source addresses and ports
  + -D, -S     set the MAC destination/source addresses

and more.

Testing the VALE switch
-----------------------

To use the VALE switch instead of physical ports you only need to
change the interface name in the pkt-gen command.
As an example, on a single machine, you can run senders and receivers
on multiple ports of a VALE switch as follows (run the commands in
separate terminals to see the output):

    pkt-gen -ivale0:01 -ftx   # run a sender on the port 01 of the switch vale0
    pkt-gen -ivale0:02 -frx   # receiver on the port 02 of same switch
    pkt-gen -ivale0:03 -ftx   # another sender on the port 03

The VALE switches and ports are created (and destroyed) on the fly.
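If you prefer a single terminal, the sender and the receiver can also be started from a small script. This is a sketch under some assumptions: pkt-gen is in the PATH, "-n" limits the number of packets the sender transmits (as in "pkt-gen ... -n 20000000"), and the receiver keeps running until it is killed. The function name and the PKTGEN, SWITCH and COUNT variables are hypothetical; they are not part of the image's scripts.

```shell
#!/bin/sh
# Hypothetical helper (not one of the image's scripts): run one receiver
# and one sender on two ports of the same VALE switch from one terminal.
vale_pair_test() {
    pktgen=${PKTGEN:-pkt-gen}    # path to pkt-gen (assumed to be in PATH)
    switch=${SWITCH:-vale0}      # VALE switch name
    count=${COUNT:-20000000}     # packets to send
    log=$(mktemp)

    # Receiver on port 02, in the background; its rate reports go to a
    # temporary log file.
    "$pktgen" -i "$switch:02" -f rx > "$log" 2>&1 &
    rx_pid=$!
    sleep 1    # give the receiver time to attach (ports are created on the fly)

    # Sender on port 01; -n makes it stop after $count packets.
    "$pktgen" -i "$switch:01" -f tx -n "$count"

    # The receiver does not stop on its own: terminate it, then print
    # what it reported.
    kill "$rx_pid" 2>/dev/null
    wait "$rx_pid" 2>/dev/null
    cat "$log"
    rm -f "$log"
}

# Usage:  COUNT=1000000 vale_pair_test
```

Running the receiver in the background keeps the sender's rate reports on the terminal, while the receiver's reports are printed at the end; adding more senders (e.g. on port 03) follows the same pattern.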
Transparent connection of physical ports to the VALE switch
-----------------------------------------------------------

It is also possible to use a network device as a port of a VALE switch.
You can do this with the following command:

    vale-ctl -h vale0:eth0   # attach interface "eth0" to the "vale0" switch

Note that "-h" attaches both the NIC and its host stack to the switch;
"vale-ctl -a vale0:eth0" attaches only the NIC.

To detach an interface from a switch:

    vale-ctl -d vale0:eth0   # detach interface "eth0" from the "vale0" switch

These operations can be issued at any moment.

Tests with our modified QEMU
----------------------------

The Linux image also contains our modified QEMU, with the VALE backend
and the "e1000-paravirt" frontend (a paravirtualized e1000 emulation).

After you have booted the image on a physical machine (so you can
exploit KVM), you can boot the same image a second time (recursively)
with QEMU. Therefore, you can run all the tests above also from within
the virtual machine environment.

To make VM testing easier, the home directory contains some useful
scripts to set up and launch VMs on the physical machine.

+ "prep-taps.sh"
  creates and sets up two permanent tap interfaces ("tap01" and
  "tap02") and a Linux in-kernel bridge. The tap interfaces are then
  bridged together on the same bridge. The bridge interface ("br0") is
  given the address 10.0.0.200/24.

  This setup can be used to make two VMs communicate through the host
  bridge, or to test the speed of a Linux switch using l2open.

+ "unprep-taps.sh"
  undoes the above setup.

+ "launch-qemu.sh"
  can be used to run QEMU virtual machines. It takes four arguments:

    + The first argument can be "qemu" or "kvm", depending on whether
      we want to use the standard QEMU binary translation or the
      hardware virtualization acceleration.

    + The third argument can be "--tap", "--netuser" or "--vale", and
      tells QEMU what network backend to use: a tap device, the QEMU
      user networking (slirp), or a VALE switch port.

    + When the third argument is "--tap" or "--vale", the fourth
      argument specifies an index (e.g. "01", "02", etc.)
      which tells QEMU what tap device or VALE port to use as backend.

You can manually modify the script to set the shell variables that
select the type of emulated device (e.g. e1000, virtio-net-pci, ...)
and related options (ioeventfd, virtio vhost, e1000 mitigation, ...).

The default setup has an "e1000" device with interrupt mitigation
disabled. You can try the paravirtualized e1000 device
("e1000-paravirt") or the "virtio-net" device to get better
performance. However, bear in mind that these paravirtualized devices
don't have netmap support (whereas the standard e1000 does have netmap
support).

Examples:

    # Run a kvm VM attached to the port 01 of a VALE switch
    ./launch-qemu.sh kvm --vale 01

    # Run a kvm VM attached to the port 02 of the same VALE switch
    ./launch-qemu.sh kvm --vale 02

    # Run a kvm VM attached to the tap called "tap01"
    ./launch-qemu.sh kvm --tap 01

    # Run a kvm VM attached to the tap called "tap02"
    ./launch-qemu.sh kvm --tap 02

Guest-to-guest tests
--------------------

If you run two VMs attached to the same switch (which can be a Linux
bridge or a VALE switch), you can run guest-to-guest experiments.

All the tests reported in the previous sections are possible (normal
sockets, raw sockets, pkt-gen, ...), independently of the backend used.

In the following examples we assume that:

  + Each VM has an Ethernet interface called "eth0".
  + The interface of the first VM is given the IP 10.0.0.1/24.
  + The interface of the second VM is given the IP 10.0.0.2/24.
  + The Linux bridge interface "br0" on the host is given the IP
    10.0.0.200/24.
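The images try DHCP, but behind a VALE port or the private tap bridge there is probably no DHCP server, so the guest addresses listed above may have to be set by hand. A minimal sketch for a Linux guest, assuming the iproute2 "ip" command is available in the guest (on a BusyBox-based system, "ifconfig eth0 10.0.0.1 netmask 255.255.255.0 up" is an alternative). The guest_addr and configure_guest helpers are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the images): give guest number N the
# address 10.0.0.N/24 used in the examples.
guest_addr() {
    n=$1
    case $n in
        ''|0*|*[!0-9]*)
            echo "guest_addr: expected 1..254 (no leading zeros), got '$n'" >&2
            return 1 ;;
    esac
    # Stay inside the /24 and avoid 10.0.0.200, which belongs to br0.
    if [ "$n" -gt 254 ] || [ "$n" -eq 200 ]; then
        echo "guest_addr: $n is out of range or used by br0" >&2
        return 1
    fi
    echo "10.0.0.$n/24"
}

configure_guest() {
    addr=$(guest_addr "$1") || return 1
    ifname=${2:-eth0}
    # Run as root inside the guest.
    ip addr add "$addr" dev "$ifname" && ip link set "$ifname" up
}

# Usage: "configure_guest 1" in the first VM, "configure_guest 2" in the second.
```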
Examples:

    [1] ### Test UDP short packets over traditional sockets ###
    # On the guest 10.0.0.2 run
    netserver
    # On the guest 10.0.0.1 run
    netperf -H10.0.0.2 -tUDP_STREAM -- -m8

    [2] ### Test UDP short packets with pkt-gen ###
    # On the guest 10.0.0.2 run
    pkt-gen -ieth0 -frx
    # On the guest 10.0.0.1 run
    pkt-gen -ieth0 -ftx

    [3] ### Test guest-to-guest latency ###
    # On the guest 10.0.0.2 run
    netserver
    # On the guest 10.0.0.1 run
    netperf -H10.0.0.2 -tTCP_RR

Note that you can use pkt-gen in a VM only if the emulated Ethernet
device is supported by netmap. The default emulated device is "e1000",
which has netmap support. If you try to run pkt-gen on an unsupported
device, pkt-gen will not work, reporting that it is unable to register
the interface.

Guest-to-host tests (follows from the previous section)
-------------------------------------------------------

If you run only one VM on your host machine, you can measure the
network performance between the VM and the host machine. In this case
the experiment setup depends on the backend you are using.

With the tap backend, you can use the bridge interface "br0" as a
communication endpoint. You can run normal/raw socket experiments, but
you cannot use pkt-gen on the "br0" interface, since the Linux bridge
interface is not supported by netmap.

Examples with the tap backend:

    [1] ### Test TCP throughput over traditional sockets ###
    # On the host run
    netserver
    # On the guest 10.0.0.1 run
    netperf -H10.0.0.200 -tTCP_STREAM

    [2] ### Test UDP short packets with pkt-gen and l2 ###
    # On the host run
    l2open -r br0 l2recv
    # On the guest 10.0.0.1 run (xx:yy:zz:ww:uu:vv is the
    # "br0" hardware address)
    pkt-gen -ieth0 -ftx -d10.0.0.200:7777 -Dxx:yy:zz:ww:uu:vv

With the VALE backend you can perform only UDP tests, since we don't
have a netmap application which implements a TCP endpoint: pkt-gen
generates UDP packets.
As a communication endpoint on the host, you can use a virtual VALE
port opened on the fly by a pkt-gen instance.
Examples with the VALE backend:

    [1] ### Test UDP short packets ###
    # On the host run
    pkt-gen -ivale0:99 -frx
    # On the guest 10.0.0.1 run
    pkt-gen -ieth0 -ftx

    [2] ### Test UDP big packets (receiver on the guest) ###
    # On the guest 10.0.0.1 run
    pkt-gen -ieth0 -frx
    # On the host run
    pkt-gen -ivale0:99 -ftx -l1460

--------------080404040009080203030301--