From: Julian Elischer
To: Perforce Change Reviews
Date: Tue, 18 Aug 2009 07:03:25 GMT
Subject: PERFORCE change 167462 for review
Message-Id: <200908180703.n7I73PAF004970@repoman.freebsd.org>
List-Id: p4 projects tree changes

http://perforce.freebsd.org/chv.cgi?CH=167462

Change 167462 by julian@julian-mac on 2009/08/18 07:03:06

	Wordsmithing. Spelling errors.

Affected files ...

.. //depot/projects/vimage/porting_to_vimage.txt#16 edit

Differences ...
==== //depot/projects/vimage/porting_to_vimage.txt#16 (text+ko) ====

@@ -1,4 +1,4 @@
-July 7 2009
+August 17 2009
 Julian Elischer
 ===================
 
@@ -9,18 +9,18 @@
 to operate on multiple independent instances of its state so that it can
 participate in a virtual machine / virtual environment scenario. It refers
 to a part of the Jail infrastructure in FreeBSD. For historical reasons
-Virtual network stack enabled jails are known a vimage enabled jails or vnet
-enabled jails. The currently correct term is the latter, which is a
-contraction of the first. In the future other parts of the system may be
-virtualized using the same technology and the term to cover all such components
-would be VIMAGE enhanced modules.
+"Virtual network stack enabled jails"(1) are also known as "vimage enabled
+jails"(2) or "vnet enabled jails"(3). The currently correct term is the
+latter, which is a contraction of the first. In the future other parts of
+the system may be virtualized using the same technology and the term to
+cover all such components would be VIMAGE enhanced modules.
 
 The implementation approach taken by the vimage framework is a redefinition
 of selected global state variables to evaluate to constructs that allow for
 the virtualized state to be stored and resolved in appropriate instances of
 'jail' specific container storage regions. The code operating on virtualized
-state has to conform to a set of rules described further below, among other
-things in order to allow for all the changes to be conditionally compilable,
+state has to conform to a set of rules described further below. Among other
+things in order to allow for all the changes to be conditionally compilable.
 i.e. permitting the virtualized code to fall back to operation on global state.
 
 The rest of this document will discuss NETWORK virtualization
@@ -36,7 +36,7 @@
 Prepending of "V_" prefixes to variable references helps in visual
 discrimination between global and virtualized state.
 It is also possible to use an alternative syntax, of VNET(foo_bar) to
-achieve the same thing. The deveopers felt that V_foo_bar was less
+achieve the same thing. The developers felt that V_foo_bar was less
 visually distracting while still providing enough clues to the reader
 that the variable is virtualized. In fact the V_foo_bar macro is locally
 defined near the definition of foo_bar to be an alias for
@@ -55,7 +55,7 @@
 a jail, usually the default (null) jail, and jails currently hang off of
 a processes ucred. This relationship defines a process's administrative
 affinity to a vnet and thus indirectly to all of its state. All network
-interfaces and sockets hold pointers back to their parent vnets.
+interfaces and sockets hold pointers back to their associated vnets.
 This relationship is obviously entirely independent from proc->ucred->jail
 bindings. Hence, when a process opens a socket, the socket will get bound
 to a vnet instance hanging off of proc->ucred->jail->vnet, but once such a
@@ -105,7 +105,7 @@
 
 There are several steps need in virtualisation.
 
-1/ decide whether the module needs to be virtualised.
+1/ Decide whether the module needs to be virtualised.
 
    If the module is a driver for specific hardware, it makes sense that
    there be only one instance of the driver as there is only one piece of
@@ -148,7 +148,7 @@
    * Sometimes an action (timer trigerred or trigerred by module load or
      unload simply has to check all the vimage or module instances.
      There are macro (pairs) for this which will iterate through all the
-     VNET or instances.
+     VNET or instances. (see sample code below).
 
    This covers most of the cases, however in some cases it may still be
    required for the module to stash away the virtual environment instance
@@ -164,14 +164,15 @@
    any of these are called. The modevent handler may veto load or teardown.
    On Shutdown, only the modevent handler is called so it may have to simulate
    the calling of the other handlers if clean shutdown is a requirement
-   of your module. (see sample code below).
+   of your module. (see sample code below). Don't forget to unregister
+   event handlers, and destroy locks and condition variables.
 
-6/ Add the code described below to the files that make up the module
+6/ Add the code described below to the files that make up the module.
 
 Details:
 (VNET implementation details)
 Firstly the file must be included. Depending on what
-code you use you may find you also need one of more of: ,
+code you use you may find you also need one or more of: ,
 and . These requirements may change slightly as the ABI settles.
@@ -187,11 +188,15 @@
 static VNET_DEFINE(int, foo) = 3;
 VNET_DEFINE(struct bar, thebar) = { 1,2,3 };
 
+extern int foo;
+in an include file might become:
+VNET_DECLARE(int foo);
+
 Normal rules regarding 'static/extern' apply. The initial values that you
-give in this way will be stored and used as teh initial values for
+give in this way will be stored and used as the initial values for
 EACH NEW INSTANCE of these variables as new jails/vnets are created.
 
-As mentioned above, accesses to virtualised symbols are achieved via macros,
+As mentioned above, accesses to virtualized symbols are achieved via macros,
 which generally are of the same name as the original symbol but with a "V_"
 prepended, thus the head of the interface list, called 'ifnet' is replaced
 whereever used with "V_ifnet". We do this, by adding the following
@@ -314,7 +319,9 @@
 	case MOD_SHUTDOWN:
 		/*
 		 * this is called once but you may want to shut down
-		 * things in each jail.
+		 * things in each jail, or something global.
+		 * In that case it's up to us to simulate the SYSUNINIT()
+		 * or the VNET_SYSUNINIT()
 		 */
 		{
 			VNET_ITERATOR_DECL(vnet_iter);
@@ -327,7 +334,7 @@
 			VNET_LIST_RUNLOCK();
 		}
 		/* you may need to shutdown something global.
 		 */
-		mymod_destroy();
+		mymod_uninit();
 		break;
 	default:
@@ -343,13 +350,13 @@
 	0
 };
 
-#define MYMOD_MAJOR_ORDER SI_SUB_PROTO_IFATTACHDOMAIN /* for example */
+#define MYMOD_MAJOR_ORDER SI_SUB_PROTO_BEGIN /* for example */
 #define MYMOD_MODULE_ORDER (SI_ORDER_ANY + 64) /* not fussy */
 #define MYMOD_SYSINIT_ORDER (MYMOD_MODULE_ORDER + 1) /* a bit later */
-#define MYMOD_VNET_ORDER (MYMOD_SYSINIT_ORDER + 1 ) /* later still */
+#define MYMOD_VNET_ORDER (MYMOD_MODULE_ORDER + 2) /* later still */
 
 DECLARE_MODULE(mymod, mymodmod, MYMOD_MAJOR_ORDER, MYMOD_MODULE_ORDER);
-MODULE_DEPEND(dummynet, ipfw, 2, 2, 2);
+MODULE_DEPEND(mymod, ipfw, 2, 2, 2); /* depend on ipfw version (exactly) 2 */
 MODULE_VERSION(mymod, 1);
 
 SYSINIT(mymod_init, MYMOD_MAJOR_ORDER, MYMOD_SYSINIT_ORDER,
@@ -368,12 +375,12 @@
 On BOOT, the order of evaluation will be:
   In a NON-VIMAGE kernel where the module is compiled:
   MODEVENT, SYSINIT and VNET_SYSINIT both runm with order defined by their
-  order declarations. {good foot shooting aterial if you get it wrong!}
+  order declarations. {good foot shooting material if you get it wrong!}
 
-  In a VIMAGE kernel where the module is compiled:
-  MODEVNET, SYSINIT and VNET_SYSINIT both runm with order defined by their
-  order declarations. AND in addition, the VNET_SYSINIT being
-  repeated once for every new jail/vnet.
+  In a VIMAGE kernel where the module is compiled in:
+  MODEVNET, SYSINIT and VNET_SYSINIT all run with order defined by their
+  order declarations. AND in addition, the VNET_SYSINIT is
+  repeated once for every existing or new jail/vnet.
 
 On loading a vnet enabled kernel module after boot:
   MODEVENT("event = load");
@@ -382,12 +389,12 @@
   AND in addition, VNET_SYSINIT being called for each new jail created.
 
 On unloading of module:
-  MODEVENT("event = quiesce")
-  MODEVENT("event = unload")
+  MODEVENT("event = MOD_QUIESCE")
+  MODEVENT("event = MOD_UNLOAD")
   VNET_SYSUNINIT called for every jail/vnet
   SYSUNINIT
 
-On system shutdown: [check/fix this ]
+On system shutdown:
   MODEVENT(shutdown)
 
 NOTICE that while the order of the SYSINIT and VNET_SYSINIT is reversed from
@@ -396,14 +403,14 @@
 things which are order dependent using MODEVENTs.
 
 Or, put another way,
-Since MODEVENT is called first during l module load, it would, by the
+Since MODEVENT is called first during module load, it would, by the
 assumption that everything is reversed, be easy to assume that MODEVENT
 is called AFTER the SYSINITS during unload. This is in fact not the case.
 (and I have the scars to prove it).
 It might be make some sense if the "QUIESCE" was called before the
 SYSINIT/SYSUNINIT and the UNLOAD called after.. with a millisecond
-sleep between them, but this is not the case either. 
+sleep between them, but this is not the case either.
 
 Since initial values are copied into the virtualized variables on each
 new instantiatin, it is quite possible to have modules for which
@@ -412,6 +419,7 @@
 
 Sometimes there is a need to iterate through the vnets. See the modevent
 shutdown handler (above) for an example of how to do this.
+Don't forget the locks.
 
 In the case where you are loading a new protocol, or domain (protocol family)
 there are some "shortcuts" that are in place to allow you to maintain a bit
@@ -434,9 +442,14 @@
 teardown.) In this case one needs to be absolutely sure that both your
 domain and protocol initializers can be called multiple times, once for
 each vnet. One can still add SYSINITs for once only initialization,
-or use the modevent handler
+or use the modevent handler. I prefer to do as much explicitly
+in the SYSINITS and VNET_SYSINITS as then you have no surprises.
 
 finally:
 The command to make a new jail with a new vnet:
 jail -c host.hostname=test path=/ vnet command=/bin/tcsh
+jail -c host.hostname=test path=/ children.max=4 vnet command=/bin/tcsh
+(children.max allows hierarchical jail creation).
+Note that the command must come last.
+