Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 15 Aug 2008 10:48:31 -0700
From:      Julian Elischer <julian@elischer.org>
To:        Marko Zec <zec@FreeBSD.org>
Cc:        Perforce Change Reviews <perforce@freebsd.org>
Subject:   Re: PERFORCE change 147425 for review
Message-ID:  <48A5C16F.2070306@elischer.org>
In-Reply-To: <200808150806.m7F86mA0039023@repoman.freebsd.org>
References:  <200808150806.m7F86mA0039023@repoman.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Marko Zec wrote:
> http://perforce.freebsd.org/chv.cgi?CH=147425
> 
> Change 147425 by zec@zec_tpx32 on 2008/08/15 08:06:14
> 
> 	Add an intro section to the document, clarify a few issues,
> 	randomly s/virtual machine/virtual environment/ or vimage or
> 	vnet where appropriate.


THANKYOU!

> 
> Affected files ...
> 
> .. //depot/projects/vimage/porting_to_vimage.txt#6 edit
> 
> Differences ...
> 
> ==== //depot/projects/vimage/porting_to_vimage.txt#6 (text+ko) ====
> 
> @@ -6,21 +6,94 @@
>  ===================
>  
>  Vimage is a framework in the BSD kernel which allows a co-operating module
> -to present multiple instances of itself so that it can participate 
> -in a virtual machine scenario.
> +to operate on multiple independent instances of its state so that it can
> +participate in a virtual machine / virtual environment scenario.
> +
> +The implementation approach taken by the vimage framwork is a replacement
> +of selected global state variables with constructs that allow for the
> +virtualized state to be stored and resolved in appropriate instances of
> +module-specific container structures.  The code operating on virtualized state
> +has to conform to a set of rules described further bellow, among other things
> +in order to allow for all the changes to be conditionally compilable, i.e.
> +permitting the virtualized code to fall back to operation on global state.
> +
> +The most visible change throughout the existing code is typically replacement
> +of direct references to global variables with macros; foo_var thus becomes
> +V_foo_var.  V_foo_bar macros will resolve back to foo_bar global in default
> +kernel builds, and alternatively to some_base_pointer->_foo_bar for "options
> +VIMAGE" kernel configs.  Prepending of "V_" prefixes to variable references
> +helps in visual discrimination between global and virtualized state.  The
> +framework extends the sysctl infrastructure to support access to virtualized
> +state through introduction of the SYSCTL_V family of macros; those also
> +automatically fall back to their standard SYSCTL counterparts in default
> +kernel builds.  Transparent kldsym(2) lookups are provided to virtualized
> +variables explicitly marked for visibility to kldsym interface, which permits
> +userland binaries such as netstat to operate unmodified on "options VIMAGE"
> +kernels, though this may have wide security implications.
> +
> +The vimage struct is currently primarily a placeholder for pointers to
> +module-specific struct instances; currently V_NET (networking), V_CPU
> +(CPU scheduling), and V_PROCG (jail-style interprocess protection) major
> +module classes are defined.  Each vimage module may or may not be further
> +split into minor or submodules; the networking subsystem (vimage id V_NET;
> +struct vnet) in particular is organized in submodules such as VNET_MOD_NET
> +(mandatory shared infrastructure: routing tables, interface lists etc.);
> +VNET_MOD_INET (IPv4 state including transport protocols); VNET_MOD_INET6,
> +VNET_MOD_IPSEC, VNET_MOD_IPFW, VNET_MOD_NETGRAPH etc.  The speciality of
> +VNET submodules is in that they not only provide storage for virtualized
> +data, but also enforce ordering of initialization and cleanup.  Hence, not
> +all submodules must necessarily allocate private storage for their specific
> +data; they may be defined solely for to support proper initialization
> +ordering.
> +
> +Each process is associated with a vimage, and vimages currently hang off of
> +ucred-s.  This relationship defines a process's administrative affinity
> +to a vimage and thus indirectly to all of its modules (NET, CPU, PROCG)
> +as well as to any submodules.  All network interfaces and sockets hold
> +pointers back to their parent vnets; this relationship is obviously entirely
> +independent from proc->ucred->vimage bindings.  Hence, when a process
> +opens a socket, the socket will get bound to a vnet instance hanging off of
> +proc->ucred->vimage->vnet, but once such a socket->vnet binding gets
> +established, it cannot be changed for the entire socket lifetime.  Certain
> +classes of network interfaces (Ethernet in particular) can be assigned
> +from one vnet to another at any time.  By definition all vnets are
> +are independent and can communicate only if they are explicitly provided
> +with communication paths; currently only netgraph can be used to establish
> +inter-vnet datapaths.
> +
> +In network traffic processing the vnet affinity is defined either by the
> +inbound interface or by the socket / pcb -> vnet binding.  However, there
> +are many functions in the network stack that cannot implicitly fetch
> +the vnet context from their standard arguments.  Instead of explicitly
> +extending argument lists of such functions with a struct vnet *,
> +a per-thread variable td_vnet was introduced, which can be fetched via
> +the curvnet macro (#define curvnet curthread->td_vnet).  The curvnet
> +context has to be set on entry to the network stack (socket operations,
> +packet reception, or timer-driven functions) and cleared on exit.  This
> +must be done via provided CURVNET_SET() / CURVNET_RESTORE() family of
> +macros, which allow for "stacking" of curvnet context setting and provide
> +additional debugging info in INVARIANTS kernel configs.  In most cases
> +however a developer writing virtualized code will not have to set /
> +restore the curvnet context unless the code would include timer-driven
> +events, given that those are inherently vnet-contextless on entry.
> +
> +
> +Converting / virtualizing existing code
> +=======================================
>  
>  There are several steps need in virtualisation.
> +
>  1/ decide whether the module needs to be virtualised.
>  
>     if the module is a driver for specific hardware, it makes sense that
>     there be only one instance of the driver as there is only one piece of
>     physical hardware.  There are changes in the networking code to allow
> -   physical (or virtual) interfaces to be moved between virtual machines.
> -   This generally requires NO changes to the network drivers of the classes
> +   physical (or virtual) interfaces to be moved between vnets.  This
> +   generally requires NO changes to the network drivers of the classes
>     covered (e.g. ethernet).
>  
>  2/ decide if your module is part of one of the major module groups.
> -   These are V_GLOBAL V_NET V_PROCG V_CPU.
> +   These are currently V_NET V_PROCG V_CPU.
>  
>     The reader will note that the descriptions below  use the acronym VNET
>     a lot.  The vimage system has been at this time broken into a number of 
> @@ -32,11 +105,6 @@
>     processors to it, but keep the saem filesystem and network setup, or
>     alternatively to share processors but to have virtualised networking.
>  
> -   The current code has a "vnet" pointer in the thread. It could be argued
> -   that it should actually be a vimage.
> -
> -   [comments from Marko here]
> -
>  3/ If the module is to be virtualised, decide which attributes of the 
>     module should be virtualised. 
>  
> @@ -51,26 +119,28 @@
>     achieve the behaviour required for part #2.
>  
>  5/ Work out for all the code paths through the module, how the path entering
> -   the module can divine which virtual machine it is on.
> +   the module can divine which virtual environment it is on.
>  
>     Some examples:
> -   * Since interfaces are all assigned to one virtual machine or
> -     another, an incoming packet has a pointer to the receive interface,
> -     which in turn has a pointer to the virtual machine instance.
> +   * Since interfaces are all assigned to one vnet or another, an incoming
> +     packet has a pointer to the receive interface, which in turn has a
> +     pointer back to the vnet.
>     * Similarly, on any request from outside the kernel, (direct or indirect)
> -     the current thread has a way to get to the current virtual machine
> -     instance (easily referable as the "curvnet" macro).
> +     the current thread has a way to get to the current virtual environment
> +     instance via td->ucred->vimage.  For existig sockets the vnet context
> +     must be used via so->so_vnet since td->ucred->vimage might change after
> +     socket creation.
>     * Timer initiated actions usually have a (void *) argument which points to 
>       some private structure for the module. It should be possible to add 
> -     a pointer to the appropriate virtual machine instance into whatever 
> -     structure that points to.
> -   * Sometimes an action (timer initialted or initialted by module load or 
> -     unload simply has to chack all the virtual machine instances.
> -     There is a macro (pair) for this which will iterate through all the 
> -     virtual machine instances.
> +     a pointer to the appropriate module instance into whatever structure
> +     that points to.
> +   * Sometimes an action (timer trigerred or trigerred by module load or 
> +     unload simply has to check all the vimage or module instances.
> +     There are macro (pairs) for this which will iterate through all the 
> +     VNET or VPROCG instances.
>  
>     This covers most of the cases, however in some cases it may still be
> -   required for the module to stash away the virtual machine instance
> +   required for the module to stash away the virtual environment instance
>     somewhere, and make associated changes in the code.
>  
>  6/ Add the code described below to the files that make up the module
> @@ -80,7 +150,7 @@
>  temp. note: for module FOO add a definition for VNET_MOD_FOO in sys/vimage.h.
>  Thos will eventually be dynamically assigned.
>  
> -For now these instructions refer mainly to VNET and not VCPU etc.
> +For now these instructions refer mainly to VNET and not VCPU, VPROCG etc.
>  
>  Symbols defined in other modules that have been virtualised will have been
>  moved to a module-specific virtualisation structure. It will be defined in a 
> @@ -103,18 +173,19 @@
>  When VIMAGE is compiled in, the macro will evaluate to an access to an
>  element in a structure pointed to by a local varible.
>  For this reason, it is necessary to also add, at the beginning of
> -these functions another MACRO that will instanciate this local variable
> +these functions another MACRO that will instantiate this local variable
>  and point it at the correct place.
> -As an example, prior to using the "V_ifnet" structure, we must
> -add the following MACRO at the head of a code block enclosing the references.
> -  INIT_VNET_NET(initial_value);
> +As an example, prior to using the "V_ifnet" structure in a program block,
> +we must add the following MACRO at the head of a code block enclosing the
> +references to set up module-specific base pointer variable:
> +  INIT_VNET_NET(initial_valu);
>  
>  When VIMAGE is not defined, this will evaluate to nothing but when it
>  IS defined, it will evaluate to:
>    struct vnet_net *vnet_net = (initial_value);
>  
>  The initial value is usually something like "curvnet" which in turn
> -is a macro that derives the virtual machine reference from the current thread.
> +is a macro that derives the vnet affinity from the current thread.
>  It could also be (m->m_ifp->if_vnet) if we were receiving an mbuf.
>  
>  In the case where it is just one function in a module calling
> @@ -125,17 +196,17 @@
>  marked as "unused"). 
>  
>  Usually, when a packet enters the system it is carried through the processing 
> -path via a single thread, and that thread will set its virtual machine 
> +path via a single thread, and that thread will set its virtual environment
>  reference to that indicated by the packet on picking up that new packet.
>  This means that in the normal inbound processing path as well as the
>  outgoing process path the current thread can be used to indicate the
> -current virtual machine. In the case of timer initiated events, best practice
> -would also be to set the current virtual machine reference to that indicated
> -calculated by whatever way that would be done, so that any functions called
> -could rely on the current thread being a good reference for the correct
> -virtual machine.
> +current virtual environment. In the case of timer initiated events, best
> +practice would also be to set the current virtual module reference to that
> +indicated calculated by whatever way that would be done, so that any functions
> +called could rely on the current thread being a good reference for the correct
> +virtual module.
>  
> -When a new module is defined for virtualisation. The following
> +When a new VNET submodule is defined for virtualisation, the following
>  structure defining macro is used to define it to the framework. 
>  
>  
> @@ -150,17 +221,18 @@
>                  .vmi_struct_size        =                               \
>                          sizeof(struct vnet_##m_name_lc),                \
>                  .vmi_symmap             = m_symmap                      \
> +
>  The ID  we allocated in the temporary  first step  in "Details" is
> -the first entry here. Eventually this should be automatically done
> +the first entry here; eventually this should be automatically done
>  by module name. The DEPENDSON field tells us the order that modules
> -should be initialised in a new virtual machine. This may later need
> +should be initialised in a new virtual environment. This may later need
>  to be changes to a list of text module names for dynamic calculation.
> -The rest of the fields are self explanatory..
> +The rest of the fields are self explanatory.
>  With the exception of the symmap entry.
>  The symmap allows us to intercept calls by libkvm to the 
>  linker when it is looking up symbols and to redirect it
>  dynamically. this allows for example "netstat -r" to find the 
> -routing tables for THIS virtual machine. (cute eh?)  
> +routing tables for THIS virtual environment.
>  (of course that won't work for core dumps). (XXX *needs thought *)
>  
>  As example of virtualising a dummy module named the FOO module
> @@ -194,11 +266,13 @@
>  #endif /* !_FOO_VFOO_H_ */
>  =========================================================
>  
> -For each time the foo module is initiated for a new virtual machine,
> +For each time the foo module is initiated for a new virtual environment,
>  the foo_bar structure must be initiated, so a new foo_creator and destructor 
>  functions are defined for the module. The Module will call these when a new 
> -virtual machine is created or destroyed. The constructor must be called once
> -for the base machine when the system is booted, even when VIMAGE is not defined. 
> +virtual environment is created or destroyed. The constructor must be called
> +once for the base machine when the system is booted, even when options VIMAGE
> +is not defined. 
> +
>  ==================== in module foo.c ======
>  #include "opt_vimage.h"
>  [...]
> @@ -229,7 +303,7 @@
>    
>  #ifdef VIMAGE
>  /* If we have symbols we need to divert for libkvm
> - * then put them in here. We may net need to do anything if
> + * then put them in here. We may not need to do anything if
>   * the symbols are not used by libkvm.
>   */
>  static struct vnet_symmap vnet_net_symmap[] = {
> @@ -239,7 +313,7 @@
>  };
>  /*
>   * Declare our module and state that we want to be done after the 
> - * loopback interface is initialised for the virtual machine.
> + * loopback interface is initialised for the virtual environment.
>   */
>  VNET_MOD_DECLARE(FOO, foo, vnet_foo_iattach,
>      vnet_foo_idetach, LOIF, vnet_foo_symmap)
> @@ -295,7 +369,7 @@
>  		/* Initialize everything. */
>  		/* put your code here */
>  #ifdef VIMAGE
> -		/* This will do the work for each vortual machine. */
> +		/* This will do the work for each vortual environment. */
>  		vnet_mod_register(&vnet_foo_modinfo);
>  #else /* !VIMAGE */
>  #ifdef FUTURE
> @@ -309,7 +383,7 @@
>  	case MOD_UNLOAD:
>  		/* You can't unload it because an interface may be using it. */
>  		/* this needs work */
> -		/* Should refuse to unload if any virtual machines */
> +		/* Should refuse to unload if any virtual environment */
>  		/* are using this still. */
>  		/* MARKO, fill in here */
>  		error = EBUSY;




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?48A5C16F.2070306>