Date:      Sat, 20 Aug 2016 13:48:12 GMT
From:      vincenzo@FreeBSD.org
To:        svn-soc-all@FreeBSD.org
Subject:   socsvn commit: r308087 - soc2016/vincenzo
Message-ID:  <201608201348.u7KDmCql099004@socsvn.freebsd.org>

Author: vincenzo
Date: Sat Aug 20 13:48:12 2016
New Revision: 308087
URL: http://svnweb.FreeBSD.org/socsvn/?view=rev&rev=308087

Log:
  Add README for the project

Added:
  soc2016/vincenzo/README

Added: soc2016/vincenzo/README
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ soc2016/vincenzo/README	Sat Aug 20 13:48:12 2016	(r308087)
@@ -0,0 +1,225 @@
+== High-performance TCP/IP networking for bhyve VMs using netmap passthrough ==
+
+  Student:  Vincenzo Maffione (vincenzo AT freebsd DOT org)
+  Mentor:  Luigi Rizzo (luigi AT freebsd DOT org)
+
+
+=========================== Project proposal ==========================
+
+Netmap passthrough (ptnetmap) has recently been introduced on the Linux and
+FreeBSD platforms, where it allows VMs running on the QEMU-KVM and bhyve
+hypervisors to exchange over 20 Mpps through VALE switches. Unfortunately,
+the original ptnetmap implementation could not exchange packets with the
+guest TCP/IP stack: it only supported guest applications running directly
+over netmap. Moreover, ptnetmap could not support multi-ring netmap ports.
+
+I have recently developed a prototype of ptnet, a new multi-ring
+paravirtualized device for Linux and QEMU/KVM that builds on ptnetmap to allow
+VMs to exchange TCP traffic at 20 Gbps, while still offering the same ptnetmap
+performance to native netmap applications.
+
+In this project I would like to implement ptnet for FreeBSD and bhyve, which
+currently cannot carry TCP/IP traffic at such high rates. Taking the above
+prototype as a reference, the following work is required:
+
+ - Implement a ptnet driver for FreeBSD guests that is able to attach to netmap
+   to support native netmap applications (estimated new code ~700 loc).
+ - Export a network interface to the FreeBSD guest kernel that allows ptnet
+   to be used by the network stack, including virtio-net header support
+   (estimated new code ~800 loc).
+ - Extend bhyve to emulate the ptnet device model and interact with the netmap
+   instance used by the hypervisor (estimated new code ~600 loc).
+
+
+================== An overview of netmap and ptnetmap ====================
+
+Netmap is a framework for high-performance network I/O. It exposes a
+hardware-independent API which allows userspace applications to interact
+directly with NIC hardware rings, in order to receive and transmit Ethernet
+frames. Rings are always accessed in the context of system calls, and NIC
+interrupts are used to notify applications about NIC processing completion.
+The performance boost of netmap w.r.t. the traditional socket API primarily
+comes from: (i) batching, since it is possible to send/receive hundreds of
+packets with a single system call; (ii) preallocation of packet buffers and
+the mapping of those buffers into the application address space.
+
+Several netmap extensions have been developed to support virtualization.
+Netmap support for various paravirtualized drivers - e.g. virtio-net, Xen
+netfront/netback - allows netmap applications to run in the guest over fast
+paravirtualized I/O devices.
+
+The Virtual Ethernet (VALE) software switch, which supports scalable high
+performance local communication (over 20 Mpps between two switch ports), can
+then be used to connect together multiple VMs.
+
+However, in a typical scenario with two communicating netmap applications
+running in different VMs (on the same host) connected through a VALE switch,
+the journey of a packet is still quite convoluted. In fact, while netmap is
+fast on both the host (the VALE switch) and the guest (interaction between
+the application and the emulated device), each packet still needs to be
+processed by the hypervisor, which has to emulate the device model used in
+the guest (e.g. e1000, virtio-net). The emulation involves device-specific
+overhead - queue processing, format conversions, packet copies, address
+translations, etc. As a consequence, the maximum packet rate between the two
+VMs is often limited to 2-5 Mpps.
+
+To overcome these limitations, ptnetmap has been introduced as a passthrough
+technique that completely avoids hypervisor processing in the packet
+datapath, unlocking the full potential of netmap also for virtual machine
+environments.
+With ptnetmap, a netmap port on the host can be exposed to the guest in a
+protected way, so that netmap applications in the guest can directly access
+the rings and packet buffers of the host port, avoiding all the extra
+overhead involved in the emulation of network devices. System calls issued
+by guest applications on ptnetmap ports are served by kernel threads (one
+per ring) running in the host netmap module.
+
+As in VirtIO paravirtualization, synchronization between the guest netmap
+(driver) and the host netmap (kernel threads) happens through a shared
+memory area called the Communication Status Block (CSB), which is used to
+store producer-consumer state and notification suppression flags.
+
+Two notification mechanisms need to be supported by the hypervisor to allow
+guest and host netmap to wake up each other.
+On QEMU/bhyve, notifications from guest to host are implemented with
+accesses to I/O registers, which cause a trap into the hypervisor.
+Notifications in the other direction are implemented using the KVM/bhyve
+interrupt injection mechanisms. MSI-X interrupts are used since they have
+less overhead than traditional PCI interrupts.
+
+Since I/O register accesses and interrupts are very expensive in the common
+case of hardware-assisted virtualization, they are suppressed when not
+needed, i.e. whenever the host (or the guest) is actively polling the CSB
+to check for more work. From a high-level perspective, the system tries to
+switch dynamically between polling operation under high load and
+interrupt-based operation under lower loads.
+
+
+===================== The ptnet paravirtualized device ================
+
+The original ptnetmap implementation required ptnetmap-enabled virtio-net/e1000
+drivers. Only the notification functionalities of those devices were reused,
+while the datapath (e.g. e1000 rings or virtio-net Virtual Queues) was
+completely bypassed.
+
+The ptnet device has been introduced as a cleaner approach to ptnetmap that
+also adds the ability to interact with the standard TCP/IP network stack
+and supports multi-ring netmap ports. The introduction of a new device model
+does not limit the adoption of this solution, since ptnet drivers are
+distributed together with netmap, and hypervisor modifications are needed in
+any case.
+
+The ptnet device belongs to the class of paravirtualized devices, like
+virtio-net. Unlike virtio-net, however, ptnet does not define an interface
+to exchange packets (datapath): the existing netmap API is used instead.
+A CSB - cleaned up and extended to support an arbitrary number of rings -
+is still used for producer-consumer synchronization and notification
+suppression.
+
+A number of device registers are used for configuration (number of rings
+and slots, device MAC address, supported features, ...), while "kick"
+registers are used for guest-to-host notifications.
+Moreover, the ptnetmap kthread infrastructure has already been extended to
+support an arbitrary number of rings, where each ring is currently served
+by a separate kernel thread.
+
+
+=============================== Deliverables ===============================
+
+==== D1 (due by week 3) ====
+Implement a ptnet driver for FreeBSD guests, which only supports native
+netmap applications. This new driver can be tested using Linux with
+QEMU-KVM as the hypervisor, which already supports ptnetmap and emulates
+the ptnet device model. Since the datapath will be equivalent, we expect
+the same performance as the original ptnetmap (over 20 Mpps for VALE ports,
+14.88 Mpps for 10 Gbit hardware ports).
+
+==== D2 (due by mid-term) ====
+Extend the ptnet FreeBSD driver to export a regular network interface to
+the FreeBSD kernel. In terms of latency, we expect performance similar to
+that of the ptnet Linux driver.
+
+==== D3 (due by week 9) ====
+Extend the ptnet FreeBSD driver to support TCP Segmentation Offloading
+(TSO) and checksum offloading by means of the virtio-net header, similarly
+to what is done in the Linux driver. After this step we expect TCP
+performance similar to that of Linux.
+
+==== D4 (due by the end of project) ====
+Implement the emulation of the ptnet device model in bhyve, starting from a
+bhyve version supporting netmap and ptnetmap, which is already available.
+At this point we expect FreeBSD guests on bhyve to show TCP/IP throughput
+and latency similar to Linux guests on QEMU-KVM (about 20 Gbps for TCP bulk
+traffic and about 40 thousand HTTP-like transactions per second between two
+guests communicating through a VALE switch).
+
+
+================================= Milestones ================================
+
+Start date: 2016/05/23
+
+Estimated end date: 2016/08/23
+
+Timetable:
+
+ * Week 1-2: Write the ptnet FreeBSD driver supporting native netmap applications [D1]. --> COMPLETED
+ * Week 3: Tests, bug-fixing, and performance evaluation [D1]. --> COMPLETED
+ * Week 4-5: Write FreeBSD network interface support for the ptnet driver [D2]. --> COMPLETED
+ * Week 6: Tests and bug-fixing [D2]; prepare documents for mid-term evaluation. --> COMPLETED
+ * Week 7-8: Add virtio-net header support to the ptnet driver [D3]. --> COMPLETED
+ * Week 9: Tests, bug-fixing, and performance evaluation [D3]. --> COMPLETED
+ * Week 10: Write the ptnet device model emulation for bhyve [D4]. --> COMPLETED
+ * Week 11: Tests and performance evaluation over bhyve [D4].
+ * Week 12: Clean up code and prepare documentation for final evaluation.
+
+
+========================== Final submission =================================
+
+The final code of my project is available in the following SVN repository:
+
+    https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/
+
+which refers to FreeBSD head (11.0-CURRENT).
+
+Moreover, all the modifications I made to netmap (see below) have also been
+merged into the netmap git repository, and can therefore also be found at
+https://github.com/luigirizzo/netmap.
+
+My code modifications belong to two different subsystems:
+
+    (1) netmap, where I added the ptnet device driver, implemented as a
+        single source file named head/sys/dev/netmap/if_ptnet.c.
+        The file is available at the following link in the SVN repository:
+            https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/head/sys/dev/netmap/if_ptnet.c?view=markup
+
+        Moreover, some code reorganization and bug fixes in other parts of
+        netmap were necessary, including rearrangements of the ptnet driver
+        for Linux that I had already developed. A complete patch (which
+        also includes the if_ptnet.c FreeBSD driver) can be obtained with
+        the following command on the netmap github repository:
+            git diff --author="Vincenzo Maffione" 09936864fa5b67b82ef4a9907819b7018e9a38f2 master
+
+    (2) bhyve, where I reworked and fixed the netmap support and added the
+        emulation of the ptnet device. Code modifications can be obtained
+        with the following SVN diff on the SVN repository:
+
+          $ svn diff -r 302612 usr.sbin/bhyve
+
+
+A modified version of QEMU that supports ptnet (not developed in the
+context of this GSoC project) is available here:
+
+    https://github.com/vmaffione/qemu/tree/ptnet
+
+
+
+======================= Useful links ==============================
+
+ * [0] http://info.iet.unipi.it/~luigi/netmap/
+ * [1] https://wiki.freebsd.org/SummerOfCode2016/PtnetDriverAndDeviceModel#preview
+ * [2] https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/
+ * [3] https://github.com/luigirizzo/netmap
+ * [4] https://github.com/vmaffione/qemu/tree/ptnet
+


