From owner-freebsd-virtualization@FreeBSD.ORG Mon Sep 21 07:55:00 2009 Return-Path: Delivered-To: freebsd-virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA58B1065679 for ; Mon, 21 Sep 2009 07:55:00 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outL.internet-mail-service.net (outl.internet-mail-service.net [216.240.47.235]) by mx1.freebsd.org (Postfix) with ESMTP id 8B5038FC1E for ; Mon, 21 Sep 2009 07:55:00 +0000 (UTC) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id C3D11961D8; Mon, 21 Sep 2009 00:55:00 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 730922D6012; Mon, 21 Sep 2009 00:54:59 -0700 (PDT) Message-ID: <4AB73152.1020703@elischer.org> Date: Mon, 21 Sep 2009 00:54:59 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812) MIME-Version: 1.0 To: FreeBSD virtualization mailing list , Robert Watson , Marko Zec , "Bjoern A. Zeeb" Content-Type: multipart/mixed; boundary="------------030507070006070008040707" Cc: Subject: reducing the memory redirections needed for vnet use. X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Sep 2009 07:55:00 -0000 This is a multi-part message in MIME format. --------------030507070006070008040707 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit I've been running the folllowing set of diffs as a proof of concept change for a couple of months now. The diff is to cache the vnet address fixup constant in the pcpu area. This means that the address of V_xxx is now (%fs:pcpu_offset)+symbol (*) the cost in the machine dependent part of the scheduler is minimal and I have never been able to measure the difference in context times (though I didn't try that hard). The important partof this however is not so oonbvious. If we wanted to have per-vnet-per-cpu memory for per cpu stats collection in vnets, then this approach becomes the blueprint as to how that must be done. Obviously memory to hold all the data needs to be allocated, but once that is donem the pcpu specific section of the memory need sto be selected by teh scheduerl as teh thread moves between cpus. This patch shows where that would occur, and also, where it would be stored. In the patch the preprocessor variable VNET_PCPU_DATA should be ignored as it is just my personal way of turning the feature on and off for testing. (*) I'm not convinced that the equivalent cannot be done in a single instruction addressing mode on some cpus.. certainly I think it may have been possible on the 68000 family from memory. --------------030507070006070008040707 Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="pcpu.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="pcpu.diff" Index: kern/kern_synch.c =================================================================== --- kern/kern_synch.c (revision 197380) +++ kern/kern_synch.c (working copy) @@ -65,6 +65,10 @@ #include +#if defined(VIMAGE) && defined(VNET_PCPU_DATA) +#include +#endif + #ifdef XEN #include #include @@ -447,6 +451,14 @@ PT_UPDATES_FLUSH(); #endif sched_switch(td, newtd, flags); +#if defined(VIMAGE) && defined(VNET_PCPU_DATA) + if (td->td_vnet) { + PCPU_SET(vnet_data_adj, td->td_vnet->vnet_data_base); + } else { + /* Added to a vnet symbol, gives something near 0 */ + PCPU_SET(vnet_data_adj, -VNET_START); + } +#endif KTR_STATE1(KTR_SCHED, "thread", sched_tdname(td), "running", "prio:%d", td->td_priority); Index: kern/kern_fork.c =================================================================== --- kern/kern_fork.c (revision 197380) +++ kern/kern_fork.c (working copy) @@ -540,7 +540,7 @@ td2->td_sigmask = td->td_sigmask; td2->td_flags = TDF_INMEM; -#ifdef VIMAGE +#ifdef VIMAGE /* if we ever make curvnet always valid, change this. */ td2->td_vnet = NULL; td2->td_vnet_lpush = NULL; #endif @@ -817,6 +817,10 @@ td = curthread; p = td->td_proc; +#if defined(VIMAGE) && defined(VNET_PCPU_DATA) + /* if we ever make curvnet always valid, change this. */ + PCPU_SET(vnet_data_adj, (uintptr_t)NULL); +#endif KASSERT(p->p_state == PRS_NORMAL, ("executing process is still new")); CTR4(KTR_PROC, "fork_exit: new thread %p (td_sched %p, pid %d, %s)", Index: net/vnet.c =================================================================== --- net/vnet.c (revision 197380) +++ net/vnet.c (working copy) @@ -145,15 +145,7 @@ * module will find every network stack instance with proper default values. */ -/* - * Location of the kernel's 'set_vnet' linker set. - */ -extern uintptr_t *__start_set_vnet; -extern uintptr_t *__stop_set_vnet; -#define VNET_START (uintptr_t)&__start_set_vnet -#define VNET_STOP (uintptr_t)&__stop_set_vnet - /* * Number of bytes of data in the 'set_vnet' linker set, and hence the total * size of all kernel virtualized global variables, and the malloc(9) type Index: net/vnet.h =================================================================== --- net/vnet.h (revision 197380) +++ net/vnet.h (working copy) @@ -92,6 +92,14 @@ #include /* + * Location of the kernel's 'set_vnet' linker set. + */ +extern uintptr_t *__start_set_vnet; +extern uintptr_t *__stop_set_vnet; + +#define VNET_START (uintptr_t)&__start_set_vnet +#define VNET_STOP (uintptr_t)&__stop_set_vnet +/* * Functions to allocate and destroy virtual network stacks. */ struct vnet *vnet_alloc(void); @@ -107,6 +115,28 @@ * Various macros -- get and set the current network stack, but also * assertions. */ +#ifdef VNET_PCPU_DATA + +#define VNET_PCPU_SET(a, b) PCPU_SET(a, b) + +#define CURVNET_DO_RESTORE() do { \ + if ((curvnet = saved_vnet) == NULL) { \ + PCPU_SET(vnet_data_adj, (uintptr_t)NULL); \ + } else { \ + PCPU_SET(vnet_data_adj, saved_vnet->vnet_data_base); \ + } \ +} while (0) + +#else + +#define VNET_PCPU_SET(a, b) + +#define CURVNET_DO_RESTORE() do { \ + curvnet = saved_vnet; \ +} while (0) + +#endif + #ifdef VNET_DEBUG #define VNET_ASSERT(condition) \ if (!(condition)) { \ @@ -120,34 +150,44 @@ struct vnet *saved_vnet = curvnet; \ const char *saved_vnet_lpush = curthread->td_vnet_lpush; \ curvnet = arg; \ + VNET_PCPU_SET(vnet_data_adj, curvnet->vnet_data_base); \ curthread->td_vnet_lpush = __FUNCTION__; -#define CURVNET_SET_VERBOSE(arg) \ +#define CURVNET_SET_VERBOSE(arg) \ CURVNET_SET_QUIET(arg) \ if (saved_vnet) \ - printf("CURVNET_SET(%p) in %s() on cpu %d, prev %p in %s()\n", \ + printf("CURVNET_SET(%p) in %s() on cpu %d," \ + " prev %p in %s()\n", \ curvnet, curthread->td_vnet_lpush, curcpu, \ saved_vnet, saved_vnet_lpush); #define CURVNET_SET(arg) CURVNET_SET_VERBOSE(arg) -#define CURVNET_RESTORE() \ +#define CURVNET_RESTORE() do { \ VNET_ASSERT(saved_vnet == NULL || \ saved_vnet->vnet_magic_n == VNET_MAGIC_N); \ - curvnet = saved_vnet; \ - curthread->td_vnet_lpush = saved_vnet_lpush; + CURVNET_DO_RESTORE(); \ + curthread->td_vnet_lpush = saved_vnet_lpush; \ +} while (0) + #else /* !VNET_DEBUG */ + #define VNET_ASSERT(condition) #define CURVNET_SET(arg) \ - struct vnet *saved_vnet = curvnet; \ - curvnet = arg; +struct vnet *saved_vnet = curvnet; \ + do { \ + curvnet = arg; \ + VNET_PCPU_SET(vnet_data_adj, curvnet->vnet_data_base); \ +} while (0) #define CURVNET_SET_VERBOSE(arg) CURVNET_SET(arg) #define CURVNET_SET_QUIET(arg) CURVNET_SET(arg) -#define CURVNET_RESTORE() \ - curvnet = saved_vnet; +#define CURVNET_RESTORE() do { \ + CURVNET_DO_RESTORE(); \ +} while (0) + #endif /* VNET_DEBUG */ extern struct vnet *vnet0; @@ -205,8 +245,13 @@ #define VNET_VNET_PTR(vnet, n) _VNET_PTR((vnet)->vnet_data_base, n) #define VNET_VNET(vnet, n) (*VNET_VNET_PTR((vnet), n)) +#ifdef VNET_PCPU_DATA +#define VNET(n) _VNET(PCPU_GET(vnet_data_adj), n) +#define VNET_PTR(n) _VNET_PTR(PCPU_GET(vnet_data_adj), n) +#else #define VNET_PTR(n) VNET_VNET_PTR(curvnet, n) #define VNET(n) VNET_VNET(curvnet, n) +#endif /* * Virtual network stack allocator interfaces from the kernel linker. @@ -324,6 +369,7 @@ #define CURVNET_SET(arg) #define CURVNET_SET_QUIET(arg) #define CURVNET_RESTORE() +#define VNET_PCPU_SET(a, b) #define VNET_LIST_RLOCK() #define VNET_LIST_RLOCK_NOSLEEP() Index: ddb/db_sym.c =================================================================== --- ddb/db_sym.c (revision 197380) +++ ddb/db_sym.c (working copy) @@ -64,12 +64,6 @@ static int db_cpu = -1; #ifdef VIMAGE -extern uintptr_t *__start_set_vnet; -extern uintptr_t *__stop_set_vnet; - -#define VNET_START (uintptr_t)&__start_set_vnet -#define VNET_STOP (uintptr_t)&__stop_set_vnet - static void *db_vnet = NULL; #endif Index: sys/pcpu.h =================================================================== --- sys/pcpu.h (revision 197380) +++ sys/pcpu.h (working copy) @@ -161,6 +161,11 @@ */ uintptr_t pc_dynamic; +#define VNET_PCPU_DATA 1 /* also in vnet.h */ +#if defined(VIMAGE) && defined(VNET_PCPU_DATA) /* maybe always here? */ + uintptr_t pc_vnet_data_adj; /* offset to per vnet data area */ +#endif + /* * Keep MD fields last, so that CPU-specific variations on a * single architecture don't result in offset variations of --------------030507070006070008040707-- From owner-freebsd-virtualization@FreeBSD.ORG Mon Sep 21 07:58:56 2009 Return-Path: Delivered-To: freebsd-virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B83CA1065672 for ; Mon, 21 Sep 2009 07:58:56 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outD.internet-mail-service.net (outd.internet-mail-service.net [216.240.47.227]) by mx1.freebsd.org (Postfix) with ESMTP id 5FA8E8FC12 for ; Mon, 21 Sep 2009 07:58:56 +0000 (UTC) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id A8BCDAE083; Mon, 21 Sep 2009 00:59:42 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id E6B582D6013; Mon, 21 Sep 2009 00:58:54 -0700 (PDT) Message-ID: <4AB7323E.8070809@elischer.org> Date: Mon, 21 Sep 2009 00:58:54 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812) MIME-Version: 1.0 To: FreeBSD virtualization mailing list , Marko Zec , "rw >> Robert Watson" Content-Type: multipart/mixed; boundary="------------040707000408090009040208" Cc: Subject: virtualizing pfil X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Sep 2009 07:58:56 -0000 This is a multi-part message in MIME format. --------------040707000408090009040208 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit here is a patch that effectively make pfil a per-vnet feature, this is needed because some features (notibly ipfw) are enabled and disabled by connecting and disconnecing from the network stack using pfil. --------------040707000408090009040208 Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="pfil.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="pfil.diff" Index: netinet/raw_ip.c =================================================================== --- netinet/raw_ip.c (revision 197375) +++ netinet/raw_ip.c (working copy) @@ -84,9 +84,9 @@ * The data hooks are not used here but it is convenient * to keep them all in one place. */ -int (*ip_fw_ctl_ptr)(struct sockopt *) = NULL; +VNET_DEFINE(ip_fw_chk_ptr_t, ip_fw_chk_ptr) = NULL; +VNET_DEFINE(ip_fw_ctl_ptr_t, ip_fw_ctl_ptr) = NULL; int (*ip_dn_ctl_ptr)(struct sockopt *) = NULL; -int (*ip_fw_chk_ptr)(struct ip_fw_args *args) = NULL; int (*ip_dn_io_ptr)(struct mbuf **m, int dir, struct ip_fw_args *fwa) = NULL; /* @@ -523,8 +523,8 @@ case IP_FW_TABLE_LIST: case IP_FW_NAT_GET_CONFIG: case IP_FW_NAT_GET_LOG: - if (ip_fw_ctl_ptr != NULL) - error = ip_fw_ctl_ptr(sopt); + if (V_ip_fw_ctl_ptr != NULL) + error = V_ip_fw_ctl_ptr(sopt); else error = ENOPROTOOPT; break; @@ -584,8 +584,8 @@ case IP_FW_TABLE_FLUSH: case IP_FW_NAT_CFG: case IP_FW_NAT_DEL: - if (ip_fw_ctl_ptr != NULL) - error = ip_fw_ctl_ptr(sopt); + if (V_ip_fw_ctl_ptr != NULL) + error = V_ip_fw_ctl_ptr(sopt); else error = ENOPROTOOPT; break; Index: netinet/ip_output.c =================================================================== --- netinet/ip_output.c (revision 197375) +++ netinet/ip_output.c (working copy) @@ -489,12 +489,12 @@ #endif /* IPSEC */ /* Jump over all PFIL processing if hooks are not active. */ - if (!PFIL_HOOKED(&inet_pfil_hook)) + if (!PFIL_HOOKED(&V_inet_pfil_hook)) goto passout; /* Run through list of hooks for output packets. */ odst.s_addr = ip->ip_dst.s_addr; - error = pfil_run_hooks(&inet_pfil_hook, &m, ifp, PFIL_OUT, inp); + error = pfil_run_hooks(&V_inet_pfil_hook, &m, ifp, PFIL_OUT, inp); if (error != 0 || m == NULL) goto done; Index: netinet/ipfw/ip_fw_pfil.c =================================================================== --- netinet/ipfw/ip_fw_pfil.c (revision 197375) +++ netinet/ipfw/ip_fw_pfil.c (working copy) @@ -515,42 +515,54 @@ int ipfw_chg_hook(SYSCTL_HANDLER_ARGS) { - int enable = *(int *)arg1; + int enable; + int oldenable; int error; -#ifdef VIMAGE /* Since enabling is global, only let base do it. */ - if (! IS_DEFAULT_VNET(curvnet)) - return (EPERM); + if (arg1 == &VNET_NAME(fw_enable)) { + enable = V_fw_enable; + } +#ifdef INET6 + else if (arg1 == &VNET_NAME(fw6_enable)) { + enable = V_fw6_enable; + } #endif + else + return (EINVAL); + + oldenable = enable; + error = sysctl_handle_int(oidp, &enable, 0, req); + if (error) return (error); enable = (enable) ? 1 : 0; - if (enable == *(int *)arg1) + if (enable == oldenable) return (0); - if (arg1 == &V_fw_enable) { + if (arg1 == &VNET_NAME(fw_enable)) { if (enable) error = ipfw_hook(); else error = ipfw_unhook(); + if (error) + return (error); + V_fw_enable = enable; } #ifdef INET6 - if (arg1 == &V_fw6_enable) { + else if (arg1 == &VNET_NAME(fw6_enable)) { if (enable) error = ipfw6_hook(); else error = ipfw6_unhook(); + if (error) + return (error); + V_fw6_enable = enable; } #endif - if (error) - return (error); - - *(int *)arg1 = enable; - return (0); } Index: netinet/ipfw/ip_fw2.c =================================================================== --- netinet/ipfw/ip_fw2.c (revision 197375) +++ netinet/ipfw/ip_fw2.c (working copy) @@ -2495,6 +2495,10 @@ } IPFW_RLOCK(chain); + if (! V_ipfw_vnet_ready) { /* shutting down, leave NOW. */ + IPFW_RUNLOCK(chain); + return (IP_FW_PASS); /* accept */ + } mtag = m_tag_find(m, PACKET_TAG_DIVERT, NULL); if (args->rule) { /* @@ -4637,29 +4641,21 @@ printf("limited to %d packets/entry by default\n", V_verbose_limit); - /* - * Hook us up to pfil. - * Eventually pfil will be per vnet. - */ - if ((error = ipfw_hook()) != 0) { - printf("ipfw_hook() error\n"); - return (error); - } -#ifdef INET6 - if ((error = ipfw6_hook()) != 0) { - printf("ipfw6_hook() error\n"); - return (error); - } -#endif - /* - * Other things that are only done the first time. - * (now that we are guaranteed of success). - */ - ip_fw_ctl_ptr = ipfw_ctl; - ip_fw_chk_ptr = ipfw_chk; return (error); } +/********************** + * Called for the removal of the last instance only on module unload. + */ +static void +ipfw_destroy(void) +{ + + uma_zdestroy(ipfw_dyn_rule_zone); + IPFW_DYN_LOCK_DESTROY(); + printf("IP firewall unloaded\n"); +} + /**************** * Stuff that must be initialized for every instance * (including the first of course). @@ -4743,19 +4739,30 @@ /* First set up some values that are compile time options */ V_ipfw_vnet_ready = 1; /* Open for business */ - return (0); -} -/********************** - * Called for the removal of the last instance only on module unload. - */ -static void -ipfw_destroy(void) -{ + /* Hook up the raw inputs */ + V_ip_fw_ctl_ptr = ipfw_ctl; + V_ip_fw_chk_ptr = ipfw_chk; - uma_zdestroy(ipfw_dyn_rule_zone); - IPFW_DYN_LOCK_DESTROY(); - printf("IP firewall unloaded\n"); + /* + * Hook us up to pfil. + */ + if (V_fw_enable) { + if ((error = ipfw_hook()) != 0) { + printf("ipfw_hook() error\n"); + return (error); + } + } +#ifdef INET6 + if (V_fw6_enable) { + if ((error = ipfw6_hook()) != 0) { + printf("ipfw6_hook() error\n"); + /* XXX should we unhook everything else? */ + return (error); + } + } +#endif + return (0); } /*********************** @@ -4767,9 +4774,18 @@ struct ip_fw *reap; V_ipfw_vnet_ready = 0; /* tell new callers to go away */ - callout_drain(&V_ipfw_timeout); + ipfw_unhook(); +#ifdef INET6 + ipfw6_unhook(); +#endif + /* layer2 and other entrypoints still come in this way. */ + V_ip_fw_chk_ptr = NULL; + V_ip_fw_ctl_ptr = NULL; + IPFW_WLOCK(&V_layer3_chain); /* We wait on the wlock here until the last user leaves */ + IPFW_WUNLOCK(&V_layer3_chain); IPFW_WLOCK(&V_layer3_chain); + callout_drain(&V_ipfw_timeout); flush_tables(&V_layer3_chain); V_layer3_chain.reap = NULL; free_chain(&V_layer3_chain, 1 /* kill default rule */); @@ -4803,21 +4819,10 @@ /* Called once at module load or * system boot if compiled in. */ break; + case MOD_QUIESCE: + /* Called before unload. May veto unloading. */ + break; case MOD_UNLOAD: - break; - case MOD_QUIESCE: - /* Yes, the unhooks can return errors, we can safely ignore - * them. Eventually these will be done per jail as they - * shut down. We will wait on each vnet's l3 lock as existing - * callers go away. - */ - ipfw_unhook(); -#ifdef INET6 - ipfw6_unhook(); -#endif - /* layer2 and other entrypoints still come in this way. */ - ip_fw_chk_ptr = NULL; - ip_fw_ctl_ptr = NULL; /* Called during unload. */ break; case MOD_SHUTDOWN: @@ -4866,4 +4871,3 @@ VNET_SYSUNINIT(vnet_ipfw_uninit, IPFW_SI_SUB_FIREWALL, IPFW_VNET_ORDER, vnet_ipfw_uninit, NULL); - Index: netinet/ip_input.c =================================================================== --- netinet/ip_input.c (revision 197375) +++ netinet/ip_input.c (working copy) @@ -170,7 +170,7 @@ &VNET_NAME(ip_checkinterface), 0, "Verify packet arrives on correct interface"); -struct pfil_head inet_pfil_hook; /* Packet filter hooks */ +VNET_DEFINE(struct pfil_head, inet_pfil_hook); /* Packet filter hooks */ static struct netisr_handler ip_nh = { .nh_name = "ip", @@ -318,6 +318,13 @@ NULL, UMA_ALIGN_PTR, 0); maxnipq_update(); + /* Initialize packet filter hooks. */ + V_inet_pfil_hook.ph_type = PFIL_TYPE_AF; + V_inet_pfil_hook.ph_af = AF_INET; + if ((i = pfil_head_register(&V_inet_pfil_hook)) != 0) + printf("%s: WARNING: unable to register pfil hook, " + "error %d\n", __func__, i); + #ifdef FLOWTABLE TUNABLE_INT_FETCH("net.inet.ip.output_flowtable_size", &V_ip_output_flowtable_size); @@ -348,13 +355,6 @@ ip_protox[pr->pr_protocol] = pr - inetsw; } - /* Initialize packet filter hooks. */ - inet_pfil_hook.ph_type = PFIL_TYPE_AF; - inet_pfil_hook.ph_af = AF_INET; - if ((i = pfil_head_register(&inet_pfil_hook)) != 0) - printf("%s: WARNING: unable to register pfil hook, " - "error %d\n", __func__, i); - /* Start ipport_tick. */ callout_init(&ipport_tick_callout, CALLOUT_MPSAFE); callout_reset(&ipport_tick_callout, 1, ipport_tick, NULL); @@ -510,11 +510,11 @@ */ /* Jump over all PFIL processing if hooks are not active. */ - if (!PFIL_HOOKED(&inet_pfil_hook)) + if (!PFIL_HOOKED(&V_inet_pfil_hook)) goto passin; odst = ip->ip_dst; - if (pfil_run_hooks(&inet_pfil_hook, &m, ifp, PFIL_IN, NULL) != 0) + if (pfil_run_hooks(&V_inet_pfil_hook, &m, ifp, PFIL_IN, NULL) != 0) return; if (m == NULL) /* consumed by filter */ return; Index: netinet/ip_var.h =================================================================== --- netinet/ip_var.h (revision 197375) +++ netinet/ip_var.h (working copy) @@ -244,14 +244,20 @@ extern void (*ip_rsvp_force_done)(struct socket *); extern void (*rsvp_input_p)(struct mbuf *m, int off); -extern struct pfil_head inet_pfil_hook; /* packet filter hooks */ +VNET_DECLARE(struct pfil_head, inet_pfil_hook); /* packet filter hooks */ +#define V_inet_pfil_hook VNET(inet_pfil_hook) void in_delayed_cksum(struct mbuf *m); /* ipfw and dummynet hooks. Most are declared in raw_ip.c */ struct ip_fw_args; -extern int (*ip_fw_chk_ptr)(struct ip_fw_args *args); -extern int (*ip_fw_ctl_ptr)(struct sockopt *); +typedef int (*ip_fw_chk_ptr_t)(struct ip_fw_args *args); +typedef int (*ip_fw_ctl_ptr_t)(struct sockopt *); +VNET_DECLARE(ip_fw_chk_ptr_t, ip_fw_chk_ptr); +VNET_DECLARE(ip_fw_ctl_ptr_t, ip_fw_ctl_ptr); +#define V_ip_fw_chk_ptr VNET(ip_fw_chk_ptr) +#define V_ip_fw_ctl_ptr VNET(ip_fw_ctl_ptr) + extern int (*ip_dn_ctl_ptr)(struct sockopt *); extern int (*ip_dn_io_ptr)(struct mbuf **m, int dir, struct ip_fw_args *fwa); extern void (*ip_dn_ruledel_ptr)(void *); /* in ip_fw2.c */ Index: netinet/ip_fastfwd.c =================================================================== --- netinet/ip_fastfwd.c (revision 197375) +++ netinet/ip_fastfwd.c (working copy) @@ -351,10 +351,11 @@ /* * Run through list of ipfilter hooks for input packets */ - if (!PFIL_HOOKED(&inet_pfil_hook)) + if (!PFIL_HOOKED(&V_inet_pfil_hook)) goto passin; - if (pfil_run_hooks(&inet_pfil_hook, &m, m->m_pkthdr.rcvif, PFIL_IN, NULL) || + if (pfil_run_hooks( + &V_inet_pfil_hook, &m, m->m_pkthdr.rcvif, PFIL_IN, NULL) || m == NULL) goto drop; @@ -438,10 +439,10 @@ /* * Run through list of hooks for output packets. */ - if (!PFIL_HOOKED(&inet_pfil_hook)) + if (!PFIL_HOOKED(&V_inet_pfil_hook)) goto passout; - if (pfil_run_hooks(&inet_pfil_hook, &m, ifp, PFIL_OUT, NULL) || m == NULL) { + if (pfil_run_hooks(&V_inet_pfil_hook, &m, ifp, PFIL_OUT, NULL) || m == NULL) { goto drop; } Index: netgraph/ng_bridge.c =================================================================== --- netgraph/ng_bridge.c (revision 197375) +++ netgraph/ng_bridge.c (working copy) @@ -634,7 +634,7 @@ /* Run packet through ipfw processing, if enabled */ #if 0 - if (priv->conf.ipfw[linkNum] && V_fw_enable && ip_fw_chk_ptr != NULL) { + if (priv->conf.ipfw[linkNum] && V_fw_enable && V_ip_fw_chk_ptr != NULL) { /* XXX not implemented yet */ } #endif Index: net/if_bridge.c =================================================================== --- net/if_bridge.c (revision 197375) +++ net/if_bridge.c (working copy) @@ -109,6 +109,7 @@ #include #include #include +#include #include /* for struct arpcom */ #include @@ -1800,9 +1801,9 @@ return; } - if (PFIL_HOOKED(&inet_pfil_hook) + if (PFIL_HOOKED(&V_inet_pfil_hook) #ifdef INET6 - || PFIL_HOOKED(&inet6_pfil_hook) + || PFIL_HOOKED(&V_inet6_pfil_hook) #endif ) { if (bridge_pfil(&m, sc->sc_ifp, ifp, PFIL_OUT) != 0) @@ -2062,9 +2063,9 @@ ETHER_BPF_MTAP(ifp, m); /* run the packet filter */ - if (PFIL_HOOKED(&inet_pfil_hook) + if (PFIL_HOOKED(&V_inet_pfil_hook) #ifdef INET6 - || PFIL_HOOKED(&inet6_pfil_hook) + || PFIL_HOOKED(&V_inet6_pfil_hook) #endif ) { BRIDGE_UNLOCK(sc); @@ -2102,9 +2103,9 @@ BRIDGE_UNLOCK(sc); - if (PFIL_HOOKED(&inet_pfil_hook) + if (PFIL_HOOKED(&V_inet_pfil_hook) #ifdef INET6 - || PFIL_HOOKED(&inet6_pfil_hook) + || PFIL_HOOKED(&V_inet6_pfil_hook) #endif ) { if (bridge_pfil(&m, ifp, dst_if, PFIL_OUT) != 0) @@ -2243,7 +2244,7 @@ #ifdef INET6 # define OR_PFIL_HOOKED_INET6 \ - || PFIL_HOOKED(&inet6_pfil_hook) + || PFIL_HOOKED(&V_inet6_pfil_hook) #else # define OR_PFIL_HOOKED_INET6 #endif @@ -2260,7 +2261,7 @@ iface->if_ipackets++; \ /* Filter on the physical interface. */ \ if (pfil_local_phys && \ - (PFIL_HOOKED(&inet_pfil_hook) \ + (PFIL_HOOKED(&V_inet_pfil_hook) \ OR_PFIL_HOOKED_INET6)) { \ if (bridge_pfil(&m, NULL, ifp, \ PFIL_IN) != 0 || m == NULL) { \ @@ -2349,9 +2350,9 @@ } /* Filter on the bridge interface before broadcasting */ - if (runfilt && (PFIL_HOOKED(&inet_pfil_hook) + if (runfilt && (PFIL_HOOKED(&V_inet_pfil_hook) #ifdef INET6 - || PFIL_HOOKED(&inet6_pfil_hook) + || PFIL_HOOKED(&V_inet6_pfil_hook) #endif )) { if (bridge_pfil(&m, sc->sc_ifp, NULL, PFIL_OUT) != 0) @@ -2396,9 +2397,9 @@ * pointer so we do not redundantly filter on the bridge for * each interface we broadcast on. */ - if (runfilt && (PFIL_HOOKED(&inet_pfil_hook) + if (runfilt && (PFIL_HOOKED(&V_inet_pfil_hook) #ifdef INET6 - || PFIL_HOOKED(&inet6_pfil_hook) + || PFIL_HOOKED(&V_inet6_pfil_hook) #endif )) { if (used == 0) { @@ -3037,7 +3038,7 @@ goto bad; } - if (ip_fw_chk_ptr && pfil_ipfw != 0 && dir == PFIL_OUT && ifp != NULL) { + if (V_ip_fw_chk_ptr && pfil_ipfw != 0 && dir == PFIL_OUT && ifp != NULL) { struct dn_pkt_tag *dn_tag; error = -1; @@ -3057,7 +3058,7 @@ args.next_hop = NULL; args.eh = &eh2; args.inp = NULL; /* used by ipfw uid/gid/jail rules */ - i = ip_fw_chk_ptr(&args); + i = V_ip_fw_chk_ptr(&args); *mp = args.m; if (*mp == NULL) @@ -3109,21 +3110,21 @@ * in_if -> bridge_if -> out_if */ if (pfil_bridge && dir == PFIL_OUT && bifp != NULL) - error = pfil_run_hooks(&inet_pfil_hook, mp, bifp, + error = pfil_run_hooks(&V_inet_pfil_hook, mp, bifp, dir, NULL); if (*mp == NULL || error != 0) /* filter may consume */ break; if (pfil_member && ifp != NULL) - error = pfil_run_hooks(&inet_pfil_hook, mp, ifp, + error = pfil_run_hooks(&V_inet_pfil_hook, mp, ifp, dir, NULL); if (*mp == NULL || error != 0) /* filter may consume */ break; if (pfil_bridge && dir == PFIL_IN && bifp != NULL) - error = pfil_run_hooks(&inet_pfil_hook, mp, bifp, + error = pfil_run_hooks(&V_inet_pfil_hook, mp, bifp, dir, NULL); if (*mp == NULL || error != 0) /* filter may consume */ @@ -3163,21 +3164,21 @@ #ifdef INET6 case ETHERTYPE_IPV6: if (pfil_bridge && dir == PFIL_OUT && bifp != NULL) - error = pfil_run_hooks(&inet6_pfil_hook, mp, bifp, + error = pfil_run_hooks(&V_inet6_pfil_hook, mp, bifp, dir, NULL); if (*mp == NULL || error != 0) /* filter may consume */ break; if (pfil_member && ifp != NULL) - error = pfil_run_hooks(&inet6_pfil_hook, mp, ifp, + error = pfil_run_hooks(&V_inet6_pfil_hook, mp, ifp, dir, NULL); if (*mp == NULL || error != 0) /* filter may consume */ break; if (pfil_bridge && dir == PFIL_IN && bifp != NULL) - error = pfil_run_hooks(&inet6_pfil_hook, mp, bifp, + error = pfil_run_hooks(&V_inet6_pfil_hook, mp, bifp, dir, NULL); break; #endif Index: net/if_ethersubr.c =================================================================== --- net/if_ethersubr.c (revision 197375) +++ net/if_ethersubr.c (working copy) @@ -434,7 +434,7 @@ { #if defined(INET) || defined(INET6) - if (ip_fw_chk_ptr && V_ether_ipfw != 0) { + if (V_ip_fw_chk_ptr && V_ether_ipfw != 0) { if (ether_ipfw_chk(&m, ifp, 0) == 0) { if (m) { m_freem(m); @@ -502,7 +502,7 @@ args.next_hop = NULL; /* we do not support forward yet */ args.eh = &save_eh; /* MAC header for bridged/MAC packets */ args.inp = NULL; /* used by ipfw uid/gid/jail rules */ - i = ip_fw_chk_ptr(&args); + i = V_ip_fw_chk_ptr(&args); m = args.m; if (m != NULL) { /* @@ -775,7 +775,7 @@ * Allow dummynet and/or ipfw to claim the frame. * Do not do this for PROMISC frames in case we are re-entered. */ - if (ip_fw_chk_ptr && V_ether_ipfw != 0 && !(m->m_flags & M_PROMISC)) { + if (V_ip_fw_chk_ptr && V_ether_ipfw != 0 && !(m->m_flags & M_PROMISC)) { if (ether_ipfw_chk(&m, NULL, 0) == 0) { if (m) m_freem(m); /* dropped; free mbuf chain */ Index: net/pfil.c =================================================================== --- net/pfil.c (revision 197375) +++ net/pfil.c (working copy) @@ -56,8 +56,9 @@ static int pfil_list_remove(pfil_list_t *, int (*)(void *, struct mbuf **, struct ifnet *, int, struct inpcb *), void *); -LIST_HEAD(, pfil_head) pfil_head_list = - LIST_HEAD_INITIALIZER(&pfil_head_list); +LIST_HEAD(pfilheadhead, pfil_head); +VNET_DEFINE(struct pfilheadhead, pfil_head_list); +#define V_pfil_head_list VNET(pfil_head_list) /* * pfil_run_hooks() runs the specified packet filter hooks. @@ -97,7 +98,7 @@ struct pfil_head *lph; PFIL_LIST_LOCK(); - LIST_FOREACH(lph, &pfil_head_list, ph_list) { + LIST_FOREACH(lph, &V_pfil_head_list, ph_list) { if (ph->ph_type == lph->ph_type && ph->ph_un.phu_val == lph->ph_un.phu_val) { PFIL_LIST_UNLOCK(); @@ -108,7 +109,7 @@ ph->ph_nhooks = 0; TAILQ_INIT(&ph->ph_in); TAILQ_INIT(&ph->ph_out); - LIST_INSERT_HEAD(&pfil_head_list, ph, ph_list); + LIST_INSERT_HEAD(&V_pfil_head_list, ph, ph_list); PFIL_LIST_UNLOCK(); return (0); } @@ -143,7 +144,7 @@ struct pfil_head *ph; PFIL_LIST_LOCK(); - LIST_FOREACH(ph, &pfil_head_list, ph_list) + LIST_FOREACH(ph, &V_pfil_head_list, ph_list) if (ph->ph_type == type && ph->ph_un.phu_val == val) break; PFIL_LIST_UNLOCK(); @@ -284,3 +285,45 @@ } return ENOENT; } + +/**************** + * Stuff that must be initialized for every instance + * (including the first of course). + */ +static int +vnet_pfil_init(const void *unused) +{ + LIST_INIT(&V_pfil_head_list); + return (0); +} + +/*********************** + * Called for the removal of each instance. + */ +static int +vnet_pfil_uninit(const void *unused) +{ + /* XXX should panic if list is not empty */ + return 0; +} + +/* Define startup order. */ +#define PFIL_SYSINIT_ORDER SI_SUB_PROTO_BEGIN +#define PFIL_MODEVENT_ORDER (SI_ORDER_FIRST) /* On boot slot in here. */ +#define PFIL_VNET_ORDER (PFIL_MODEVENT_ORDER + 2) /* Later still. */ + +/* + * Starting up. + * VNET_SYSINIT is called for each existing vnet and each new vnet. + */ +VNET_SYSINIT(vnet_pfil_init, PFIL_SYSINIT_ORDER, PFIL_VNET_ORDER, + vnet_pfil_init, NULL); + +/* + * Closing up shop. These are done in REVERSE ORDER, + * Not called on reboot. + * VNET_SYSUNINIT is called for each exiting vnet as it exits. + */ +VNET_SYSUNINIT(vnet_pfil_uninit, PFIL_SYSINIT_ORDER, PFIL_VNET_ORDER, + vnet_pfil_uninit, NULL); + Index: netinet6/ip6_var.h =================================================================== --- netinet6/ip6_var.h (revision 197375) +++ netinet6/ip6_var.h (working copy) @@ -358,7 +358,8 @@ #endif #define V_ip6_use_defzone VNET(ip6_use_defzone) -extern struct pfil_head inet6_pfil_hook; /* packet filter hooks */ +VNET_DECLARE (struct pfil_head, inet6_pfil_hook); /* packet filter hooks */ +#define V_inet6_pfil_hook VNET(inet6_pfil_hook) extern struct pr_usrreqs rip6_usrreqs; struct sockopt; Index: netinet6/ip6_output.c =================================================================== --- netinet6/ip6_output.c (revision 197375) +++ netinet6/ip6_output.c (working copy) @@ -805,12 +805,12 @@ } /* Jump over all PFIL processing if hooks are not active. */ - if (!PFIL_HOOKED(&inet6_pfil_hook)) + if (!PFIL_HOOKED(&V_inet6_pfil_hook)) goto passout; odst = ip6->ip6_dst; /* Run through list of hooks for output packets. */ - error = pfil_run_hooks(&inet6_pfil_hook, &m, ifp, PFIL_OUT, inp); + error = pfil_run_hooks(&V_inet6_pfil_hook, &m, ifp, PFIL_OUT, inp); if (error != 0 || m == NULL) goto done; ip6 = mtod(m, struct ip6_hdr *); Index: netinet6/ip6_forward.c =================================================================== --- netinet6/ip6_forward.c (revision 197375) +++ netinet6/ip6_forward.c (working copy) @@ -551,11 +551,11 @@ in6_clearscope(&ip6->ip6_dst); /* Jump over all PFIL processing if hooks are not active. */ - if (!PFIL_HOOKED(&inet6_pfil_hook)) + if (!PFIL_HOOKED(&V_inet6_pfil_hook)) goto pass; /* Run through list of hooks for output packets. */ - error = pfil_run_hooks(&inet6_pfil_hook, &m, rt->rt_ifp, PFIL_OUT, NULL); + error = pfil_run_hooks(&V_inet6_pfil_hook, &m, rt->rt_ifp, PFIL_OUT, NULL); if (error != 0) goto senderr; if (m == NULL) Index: netinet6/ip6_input.c =================================================================== --- netinet6/ip6_input.c (revision 197375) +++ netinet6/ip6_input.c (working copy) @@ -152,7 +152,7 @@ struct rwlock in6_ifaddr_lock; RW_SYSINIT(in6_ifaddr_lock, &in6_ifaddr_lock, "in6_ifaddr_lock"); -struct pfil_head inet6_pfil_hook; +VNET_DEFINE (struct pfil_head, inet6_pfil_hook); static void ip6_init2(void *); static struct ip6aux *ip6_setdstifaddr(struct mbuf *, struct in6_ifaddr *); @@ -247,6 +247,13 @@ V_ip6_desync_factor = arc4random() % MAX_TEMP_DESYNC_FACTOR; + /* Initialize packet filter hooks. */ + V_inet6_pfil_hook.ph_type = PFIL_TYPE_AF; + V_inet6_pfil_hook.ph_af = AF_INET6; + if ((i = pfil_head_register(&V_inet6_pfil_hook)) != 0) + printf("%s: WARNING: unable to register pfil hook, " + "error %d\n", __func__, i); + /* Skip global initialization stuff for non-default instances. */ if (!IS_DEFAULT_VNET(curvnet)) return; @@ -275,13 +282,6 @@ ip6_protox[pr->pr_protocol] = pr - inet6sw; } - /* Initialize packet filter hooks. */ - inet6_pfil_hook.ph_type = PFIL_TYPE_AF; - inet6_pfil_hook.ph_af = AF_INET6; - if ((i = pfil_head_register(&inet6_pfil_hook)) != 0) - printf("%s: WARNING: unable to register pfil hook, " - "error %d\n", __func__, i); - netisr_register(&ip6_nh); } @@ -515,10 +515,11 @@ odst = ip6->ip6_dst; /* Jump over all PFIL processing if hooks are not active. */ - if (!PFIL_HOOKED(&inet6_pfil_hook)) + if (!PFIL_HOOKED(&V_inet6_pfil_hook)) goto passin; - if (pfil_run_hooks(&inet6_pfil_hook, &m, m->m_pkthdr.rcvif, PFIL_IN, NULL)) + if (pfil_run_hooks(&V_inet6_pfil_hook, &m, + m->m_pkthdr.rcvif, PFIL_IN, NULL)) return; if (m == NULL) /* consumed by filter */ return; --------------040707000408090009040208--