Date:      Fri, 15 Aug 2008 08:06:48 GMT
From:      Marko Zec <zec@FreeBSD.org>
To:        Perforce Change Reviews <perforce@freebsd.org>
Subject:   PERFORCE change 147425 for review
Message-ID:  <200808150806.m7F86mA0039023@repoman.freebsd.org>

http://perforce.freebsd.org/chv.cgi?CH=147425

Change 147425 by zec@zec_tpx32 on 2008/08/15 08:06:14

	Add an intro section to the document, clarify a few issues,
	randomly s/virtual machine/virtual environment/ or vimage or
	vnet where appropriate.

Affected files ...

.. //depot/projects/vimage/porting_to_vimage.txt#6 edit

Differences ...

==== //depot/projects/vimage/porting_to_vimage.txt#6 (text+ko) ====

@@ -6,21 +6,94 @@
 ===================
 
 Vimage is a framework in the BSD kernel which allows a co-operating module
-to present multiple instances of itself so that it can participate 
-in a virtual machine scenario.
+to operate on multiple independent instances of its state so that it can
+participate in a virtual machine / virtual environment scenario.
+
+The implementation approach taken by the vimage framework is a replacement
+of selected global state variables with constructs that allow for the
+virtualized state to be stored and resolved in appropriate instances of
+module-specific container structures.  The code operating on virtualized state
+has to conform to a set of rules described further below, among other things
+in order to allow for all the changes to be conditionally compilable, i.e.
+permitting the virtualized code to fall back to operation on global state.
+
+The most visible change throughout the existing code is typically replacement
+of direct references to global variables with macros; foo_bar thus becomes
+V_foo_bar.  V_foo_bar macros will resolve back to the foo_bar global in default
+kernel builds, and alternatively to some_base_pointer->_foo_bar for "options
+VIMAGE" kernel configs.  Prepending of "V_" prefixes to variable references
+helps in visual discrimination between global and virtualized state.  The
+framework extends the sysctl infrastructure to support access to virtualized
+state through introduction of the SYSCTL_V family of macros; those also
+automatically fall back to their standard SYSCTL counterparts in default
+kernel builds.  Transparent kldsym(2) lookups are provided to virtualized
+variables explicitly marked for visibility to kldsym interface, which permits
+userland binaries such as netstat to operate unmodified on "options VIMAGE"
+kernels, though this may have wide security implications.
+
+The vimage struct is currently primarily a placeholder for pointers to
+module-specific struct instances; currently V_NET (networking), V_CPU
+(CPU scheduling), and V_PROCG (jail-style interprocess protection) major
+module classes are defined.  Each vimage module may or may not be further
+split into minor or submodules; the networking subsystem (vimage id V_NET;
+struct vnet) in particular is organized in submodules such as VNET_MOD_NET
+(mandatory shared infrastructure: routing tables, interface lists etc.);
+VNET_MOD_INET (IPv4 state including transport protocols); VNET_MOD_INET6,
+VNET_MOD_IPSEC, VNET_MOD_IPFW, VNET_MOD_NETGRAPH etc.  The speciality of
+VNET submodules is in that they not only provide storage for virtualized
+data, but also enforce ordering of initialization and cleanup.  Hence, not
+all submodules must necessarily allocate private storage for their specific
+data; they may be defined solely to support proper initialization
+ordering.
+
+Each process is associated with a vimage, and vimages currently hang off of
+ucred-s.  This relationship defines a process's administrative affinity
+to a vimage and thus indirectly to all of its modules (NET, CPU, PROCG)
+as well as to any submodules.  All network interfaces and sockets hold
+pointers back to their parent vnets; this relationship is obviously entirely
+independent from proc->ucred->vimage bindings.  Hence, when a process
+opens a socket, the socket will get bound to a vnet instance hanging off of
+proc->ucred->vimage->vnet, but once such a socket->vnet binding gets
+established, it cannot be changed for the entire socket lifetime.  Certain
+classes of network interfaces (Ethernet in particular) can be assigned
+from one vnet to another at any time.  By definition all vnets are
+independent and can communicate only if they are explicitly provided
+with communication paths; currently only netgraph can be used to establish
+inter-vnet datapaths.
+
+In network traffic processing the vnet affinity is defined either by the
+inbound interface or by the socket / pcb -> vnet binding.  However, there
+are many functions in the network stack that cannot implicitly fetch
+the vnet context from their standard arguments.  Instead of explicitly
+extending argument lists of such functions with a struct vnet *,
+a per-thread variable td_vnet was introduced, which can be fetched via
+the curvnet macro (#define curvnet curthread->td_vnet).  The curvnet
+context has to be set on entry to the network stack (socket operations,
+packet reception, or timer-driven functions) and cleared on exit.  This
+must be done via the provided CURVNET_SET() / CURVNET_RESTORE() family of
+macros, which allow for "stacking" of curvnet context setting and provide
+additional debugging info in INVARIANTS kernel configs.  In most cases
+however a developer writing virtualized code will not have to set /
+restore the curvnet context unless the code would include timer-driven
+events, given that those are inherently vnet-contextless on entry.
+
+
+Converting / virtualizing existing code
+=======================================
 
 There are several steps needed in virtualisation.
+
 1/ decide whether the module needs to be virtualised.
 
    if the module is a driver for specific hardware, it makes sense that
    there be only one instance of the driver as there is only one piece of
    physical hardware.  There are changes in the networking code to allow
-   physical (or virtual) interfaces to be moved between virtual machines.
-   This generally requires NO changes to the network drivers of the classes
+   physical (or virtual) interfaces to be moved between vnets.  This
+   generally requires NO changes to the network drivers of the classes
    covered (e.g. ethernet).
 
 2/ decide if your module is part of one of the major module groups.
-   These are V_GLOBAL V_NET V_PROCG V_CPU.
+   These are currently V_NET V_PROCG V_CPU.
 
    The reader will note that the descriptions below  use the acronym VNET
    a lot.  The vimage system has been at this time broken into a number of 
@@ -32,11 +105,6 @@
    processors to it, but keep the same filesystem and network setup, or
    alternatively to share processors but to have virtualised networking.
 
-   The current code has a "vnet" pointer in the thread. It could be argued
-   that it should actually be a vimage.
-
-   [comments from Marko here]
-
 3/ If the module is to be virtualised, decide which attributes of the 
    module should be virtualised. 
 
@@ -51,26 +119,28 @@
    achieve the behaviour required for part #2.
 
 5/ Work out for all the code paths through the module, how the path entering
-   the module can divine which virtual machine it is on.
+   the module can divine which virtual environment it is on.
 
    Some examples:
-   * Since interfaces are all assigned to one virtual machine or
-     another, an incoming packet has a pointer to the receive interface,
-     which in turn has a pointer to the virtual machine instance.
+   * Since interfaces are all assigned to one vnet or another, an incoming
+     packet has a pointer to the receive interface, which in turn has a
+     pointer back to the vnet.
    * Similarly, on any request from outside the kernel, (direct or indirect)
-     the current thread has a way to get to the current virtual machine
-     instance (easily referable as the "curvnet" macro).
+     the current thread has a way to get to the current virtual environment
+     instance via td->ucred->vimage.  For existing sockets the vnet context
+     must be used via so->so_vnet since td->ucred->vimage might change after
+     socket creation.
    * Timer initiated actions usually have a (void *) argument which points to 
      some private structure for the module. It should be possible to add 
-     a pointer to the appropriate virtual machine instance into whatever 
-     structure that points to.
-   * Sometimes an action (timer initialted or initialted by module load or 
-     unload simply has to chack all the virtual machine instances.
-     There is a macro (pair) for this which will iterate through all the 
-     virtual machine instances.
+     a pointer to the appropriate module instance into whatever structure
+     that points to.
+   * Sometimes an action (timer triggered or triggered by module load or
+     unload) simply has to check all the vimage or module instances.
+     There are macro (pairs) for this which will iterate through all the 
+     VNET or VPROCG instances.
 
    This covers most of the cases, however in some cases it may still be
-   required for the module to stash away the virtual machine instance
+   required for the module to stash away the virtual environment instance
    somewhere, and make associated changes in the code.
 
 6/ Add the code described below to the files that make up the module
@@ -80,7 +150,7 @@
 temp. note: for module FOO add a definition for VNET_MOD_FOO in sys/vimage.h.
 Those will eventually be dynamically assigned.
 
-For now these instructions refer mainly to VNET and not VCPU etc.
+For now these instructions refer mainly to VNET and not VCPU, VPROCG etc.
 
 Symbols defined in other modules that have been virtualised will have been
 moved to a module-specific virtualisation structure. It will be defined in a 
@@ -103,18 +173,19 @@
 When VIMAGE is compiled in, the macro will evaluate to an access to an
 element in a structure pointed to by a local variable.
 For this reason, it is necessary to also add, at the beginning of
-these functions another MACRO that will instanciate this local variable
+these functions another MACRO that will instantiate this local variable
 and point it at the correct place.
-As an example, prior to using the "V_ifnet" structure, we must
-add the following MACRO at the head of a code block enclosing the references.
-  INIT_VNET_NET(initial_value);
+As an example, prior to using the "V_ifnet" structure in a program block,
+we must add the following MACRO at the head of a code block enclosing the
+references to set up the module-specific base pointer variable:
+  INIT_VNET_NET(initial_value);
 
 When VIMAGE is not defined, this will evaluate to nothing but when it
 IS defined, it will evaluate to:
   struct vnet_net *vnet_net = (initial_value);
 
 The initial value is usually something like "curvnet" which in turn
-is a macro that derives the virtual machine reference from the current thread.
+is a macro that derives the vnet affinity from the current thread.
 It could also be (m->m_ifp->if_vnet) if we were receiving an mbuf.
 
 In the case where it is just one function in a module calling
@@ -125,17 +196,17 @@
 marked as "unused"). 
 
 Usually, when a packet enters the system it is carried through the processing 
-path via a single thread, and that thread will set its virtual machine 
+path via a single thread, and that thread will set its virtual environment
 reference to that indicated by the packet on picking up that new packet.
 This means that in the normal inbound processing path as well as the
 outgoing process path the current thread can be used to indicate the
-current virtual machine. In the case of timer initiated events, best practice
-would also be to set the current virtual machine reference to that indicated
-calculated by whatever way that would be done, so that any functions called
-could rely on the current thread being a good reference for the correct
-virtual machine.
+current virtual environment. In the case of timer initiated events, best
+practice would also be to set the current virtual module reference to that
+calculated by whatever way that would be done, so that any functions
+called could rely on the current thread being a good reference for the correct
+virtual module.
 
-When a new module is defined for virtualisation. The following
+When a new VNET submodule is defined for virtualisation, the following
 structure defining macro is used to define it to the framework. 
 
 
@@ -150,17 +221,18 @@
                 .vmi_struct_size        =                               \
                         sizeof(struct vnet_##m_name_lc),                \
                 .vmi_symmap             = m_symmap                      \
+
 The ID  we allocated in the temporary  first step  in "Details" is
-the first entry here. Eventually this should be automatically done
+the first entry here; eventually this should be automatically done
 by module name. The DEPENDSON field tells us the order that modules
-should be initialised in a new virtual machine. This may later need
+should be initialised in a new virtual environment. This may later need
 to be changed to a list of text module names for dynamic calculation.
-The rest of the fields are self explanatory..
+The rest of the fields are self explanatory.
 With the exception of the symmap entry.
 The symmap allows us to intercept calls by libkvm to the 
 linker when it is looking up symbols and to redirect it
 dynamically. this allows for example "netstat -r" to find the 
-routing tables for THIS virtual machine. (cute eh?)  
+routing tables for THIS virtual environment.
 (of course that won't work for core dumps). (XXX *needs thought *)
 
 As example of virtualising a dummy module named the FOO module
@@ -194,11 +266,13 @@
 #endif /* !_FOO_VFOO_H_ */
 =========================================================
 
-For each time the foo module is initiated for a new virtual machine,
+For each time the foo module is initiated for a new virtual environment,
 the foo_bar structure must be initiated, so a new foo_creator and destructor 
 functions are defined for the module. The Module will call these when a new 
-virtual machine is created or destroyed. The constructor must be called once
-for the base machine when the system is booted, even when VIMAGE is not defined. 
+virtual environment is created or destroyed. The constructor must be called
+once for the base machine when the system is booted, even when options VIMAGE
+is not defined. 
+
 ==================== in module foo.c ======
 #include "opt_vimage.h"
 [...]
@@ -229,7 +303,7 @@
   
 #ifdef VIMAGE
 /* If we have symbols we need to divert for libkvm
- * then put them in here. We may net need to do anything if
+ * then put them in here. We may not need to do anything if
  * the symbols are not used by libkvm.
  */
 static struct vnet_symmap vnet_net_symmap[] = {
@@ -239,7 +313,7 @@
 };
 /*
  * Declare our module and state that we want to be done after the 
- * loopback interface is initialised for the virtual machine.
+ * loopback interface is initialised for the virtual environment.
  */
 VNET_MOD_DECLARE(FOO, foo, vnet_foo_iattach,
     vnet_foo_idetach, LOIF, vnet_foo_symmap)
@@ -295,7 +369,7 @@
 		/* Initialize everything. */
 		/* put your code here */
 #ifdef VIMAGE
-		/* This will do the work for each vortual machine. */
+		/* This will do the work for each virtual environment. */
 		vnet_mod_register(&vnet_foo_modinfo);
 #else /* !VIMAGE */
 #ifdef FUTURE
@@ -309,7 +383,7 @@
 	case MOD_UNLOAD:
 		/* You can't unload it because an interface may be using it. */
 		/* this needs work */
-		/* Should refuse to unload if any virtual machines */
+		/* Should refuse to unload if any virtual environments */
 		/* are using this still. */
 		/* MARKO, fill in here */
 		error = EBUSY;
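
The V_ macro convention and the curvnet handling described in the document
above can be sketched in userland C.  This is only a minimal illustration
under stated assumptions, not the kernel's definitions: struct vnet,
struct vnet_foo, INIT_VNET_FOO, foo_bump() and foo_timer() are hypothetical
stand-ins modelled on the document's INIT_VNET_NET and V_foo_bar examples,
and the CURVNET_SET() / CURVNET_RESTORE() bodies shown here are simplified
guesses at the save/restore behaviour the text describes.

```c
#include <assert.h>
#include <stddef.h>

#define VIMAGE 1	/* remove to model a default, non-virtualized build */

/* Hypothetical per-vnet container holding module foo's virtualized state. */
struct vnet_foo {
	int _foo_bar;
};

/* Hypothetical vnet; the real struct vnet is laid out differently. */
struct vnet {
	struct vnet_foo foo;
};

#ifdef VIMAGE
/* Userland stand-in for curthread->td_vnet. */
struct vnet *curvnet = NULL;

/* Set up the module-specific base pointer as a local variable. */
#define INIT_VNET_FOO(v)	struct vnet_foo *vnet_foo = &(v)->foo
/* Virtualized access: resolve through the base pointer. */
#define V_foo_bar		(vnet_foo->_foo_bar)
/* Save the previous context in a local so that settings can be stacked. */
#define CURVNET_SET(arg)						\
	struct vnet *saved_vnet = curvnet;				\
	curvnet = (arg)
#define CURVNET_RESTORE() do {						\
	assert(curvnet != NULL);	/* crude INVARIANTS-style check */ \
	curvnet = saved_vnet;						\
} while (0)
#else
/* Default build: every macro falls back to plain global state. */
int foo_bar;
#define INIT_VNET_FOO(v)
#define V_foo_bar		foo_bar
#define CURVNET_SET(arg)
#define CURVNET_RESTORE()
#endif

/* Bump the counter of whichever instance is current and return it. */
int
foo_bump(void)
{
	INIT_VNET_FOO(curvnet);

	return (++V_foo_bar);
}

/* Timer-style entry point: no vnet context on entry, so set it here. */
int
foo_timer(struct vnet *vnet)
{
	int v;

	CURVNET_SET(vnet);
	(void)vnet;		/* unused in the non-VIMAGE build */
	v = foo_bump();
	CURVNET_RESTORE();
	return (v);
}
```

Calling foo_timer() on two separate struct vnet objects bumps each
instance's counter independently; with the VIMAGE define removed, the same
source operates on the single global foo_bar instead.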


