From owner-freebsd-virtualization@FreeBSD.ORG Mon May 9 11:07:18 2011
Return-Path:
Delivered-To: freebsd-virtualization@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 110E610656B2 for ; Mon, 9 May 2011 11:07:18 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id F3F6D8FC2B for ; Mon, 9 May 2011 11:07:17 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p49B7H16070793 for ; Mon, 9 May 2011 11:07:17 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p49B7Hqd070791 for freebsd-virtualization@FreeBSD.org; Mon, 9 May 2011 11:07:17 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 9 May 2011 11:07:17 GMT
Message-Id: <201105091107.p49B7Hqd070791@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-virtualization@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-virtualization@FreeBSD.org
X-BeenThere: freebsd-virtualization@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussion of various virtualization techniques FreeBSD supports."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 May 2011 11:07:18 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD
users. These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
a kern/152047  virtualization[vimage] [panic] TUN\TAP under jail with vimage crashe
o kern/148155  virtualization[vimage] Kernel panic with PF/IPFilter + VIMAGE kernel
a kern/147950  virtualization[vimage] [carp] VIMAGE + CARP = kernel crash
s kern/143808  virtualization[pf] pf does not work inside jail
a kern/141696  virtualization[rum] [panic] rum(4)+ vimage = kernel panic

5 problems total.

From owner-freebsd-virtualization@FreeBSD.ORG Mon May 9 13:13:13 2011
Return-Path:
Delivered-To: freebsd-virtualization@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B9FB7106564A for ; Mon, 9 May 2011 13:13:13 +0000 (UTC) (envelope-from to.my.trociny@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 297308FC18 for ; Mon, 9 May 2011 13:13:12 +0000 (UTC)
Received: by bwz12 with SMTP id 12so5870516bwz.13 for ; Mon, 09 May 2011 06:13:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:from:to:cc:subject:sender:date:message-id :user-agent:mime-version:content-type; bh=+RZmCLmxPoXrY7GeQKlF4d5WPdLiBcE4BjYuIMqZcd4=; b=Mv7lS+wsgRGELGNwqjyVFu98JyPDHh7C0UP++JDQyFH4l9o1VGk4GxYQt6B5o5TSj9 Z2kt5CBpK1hZXYWe1s5G9zb4MjVHxiHMUfbik/gkELath+WQy7jdj3K1ebB6JFsLIkDo P0HR6vYG6eOUtt3K0WSX1QmjOVbTZ27sQXtUo=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:sender:date:message-id:user-agent:mime-version :content-type; b=xuX3lGg6bftpKvkCN5lLZXMn+jAVFuO3AUaSPyrvT8+Z7w5wxwEgC3u0sFzOqThQZ1 XTG+sosFkJxY7axfEH1RWWwrFYrAXCQarqOwfAXsFD266sGqvcs7N0TiHF3dCjuUL+Rh 1f9aK9FxW8PzofkGAVXY/F776wPfyNbOBfT1A=
Received: by 10.204.133.27 with SMTP id d27mr2382195bkt.69.1304945309308; Mon, 09 May 2011 05:48:29 -0700 (PDT)
Received: from localhost ([95.69.172.154]) by
    mx.google.com with ESMTPS id d25sm1163866bkd.5.2011.05.09.05.48.26 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 09 May 2011 05:48:27 -0700 (PDT)
From: Mikolaj Golub
To: freebsd-virtualization@FreeBSD.org
Sender: Mikolaj Golub
Date: Mon, 09 May 2011 15:48:25 +0300
Message-ID: <86aaewdopy.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: Kostik Belousov , Pawel Jakub Dawidek
Subject: vnet: acessing module's virtualized global variables from another module
X-BeenThere: freebsd-virtualization@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussion of various virtualization techniques FreeBSD supports."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 May 2011 13:13:13 -0000

--=-=-=

Hi,

Trying ipfw_nat under a VIMAGE kernel, I got this panic on module load:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x4
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc09f098e
stack pointer = 0x28:0xf563b944
frame pointer = 0x28:0xf563b998
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 4264 (kldload)

witness_checkorder(c6d5e91c,9,ca0ac2e3,223,0,...) at witness_checkorder+0x6e
_rw_wlock(c6d5e91c,ca0ac2e3,223,0,c0e8f795,...) at _rw_wlock+0x82
ipfw_nat_modevent(c98a48c0,0,0,75,0,...) at ipfw_nat_modevent+0x41
module_register_init(ca0ad508,0,c0e8d834,e6,0,...) at module_register_init+0xa7
linker_load_module(0,f563bc18,c0e8d834,3fc,f563bc28,...) at linker_load_module+0xa05
kern_kldload(c86835c0,c72d3400,f563bc40,0,c8d0d000,...) at kern_kldload+0x133
kldload(c86835c0,f563bcec,c09e8940,c86835c0,0,...) at kldload+0x74
syscallenter(c86835c0,f563bce4,c0ce05dd,c1022150,0,...) at syscallenter+0x263
syscall(f563bd28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (304, FreeBSD ELF32, kldload), eip = 0x280da00b, esp = 0xbfbfe79c, ebp = 0xbfbfec88 -

It crashed on accessing data from the virtualized global variable V_layer3_chain in ipfw_nat_modevent(). V_layer3_chain is defined in the ipfw module, and it turns out that &V_layer3_chain returns the wrong location from anywhere but ipfw.ko.

Maybe this is a known issue, but I have not found any info about it, so below are the details of my investigation into why this happens.

Virtualized global variables are defined using the VNET_DEFINE() macro, which places them in the 'set_vnet' linker set (in the base kernel or in a module). This is used to

1) copy these "default" values to each virtual network stack instance when created;

2) act as unique global names by which the variable can be referred to. The location of a per-virtual instance variable is calculated at run-time, as in the example below for the layer3_chain variable in the default vnet (vnet0):

    vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain      (1)

For modules things are more complicated. When a module is loaded, its global variables from the 'set_vnet' linker set are copied to the kernel 'set_vnet', and for the module to be able to access them the linker relocates all references accordingly (kern/link_elf.c:elf_relocaddr()):

    if (x >= ef->vnet_start && x < ef->vnet_stop)
        return ((x - ef->vnet_start) + ef->vnet_base);

So from inside the module access to its virtualized variables works, but from the outside we get the wrong location using a calculation like (1) above, because &vnet_entry_layer3_chain returns the address of the variable in the module's 'set_vnet'.

The workaround is to compile such modules into the kernel or to use a hack I have done for ipfw_nat -- add a function to the ipfw module which returns the location of the virtualized layer3_chain variable, and use this location instead of the V_layer3_chain macro (see the attached patch).
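
The relocation effect described above can be reproduced in a small userland model. This is only a sketch under stated assumptions: the arrays, the struct elf_file_model type, the model_* functions and the offsets 32 and 4 are hypothetical stand-ins, not kernel code. Only the range test and formula (1) are taken from the message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userland model only: these arrays stand in for linker sets and for
 * per-vnet storage. All names, sizes and offsets are hypothetical.
 */
enum {
	KERNEL_SET_SIZE = 64,	/* kernel 'set_vnet', with room for modules */
	MODULE_SET_SIZE = 16,	/* the module's own 'set_vnet' */
	MODULE_SLOT_OFFSET = 32,	/* assumed slot of the module copy in the kernel set */
	VAR_OFFSET = 4		/* assumed offset of the variable in the module set */
};

static unsigned char kernel_set_vnet[KERNEL_SET_SIZE];	/* master copy */
static unsigned char module_set_vnet[MODULE_SET_SIZE];	/* as linked into the .ko */
static unsigned char vnet0_data[KERNEL_SET_SIZE];	/* vnet0's private copy */

/* The three fields used by the elf_relocaddr() excerpt quoted above. */
struct elf_file_model {
	uintptr_t vnet_start;	/* start of the module's 'set_vnet' */
	uintptr_t vnet_stop;	/* end of the module's 'set_vnet' */
	uintptr_t vnet_base;	/* where the set was copied in the kernel set */
};

/* Range test from the quoted excerpt; other addresses pass through here. */
static uintptr_t
model_relocaddr(const struct elf_file_model *ef, uintptr_t x)
{
	if (x >= ef->vnet_start && x < ef->vnet_stop)
		return ((x - ef->vnet_start) + ef->vnet_base);
	return (x);
}

/* Formula (1): per-vnet location = vnet_data_base + address of the entry. */
static uintptr_t
model_vnet_location(uintptr_t vnet_data_base, uintptr_t entry)
{
	return (vnet_data_base + entry);
}

/* Does the computed location fall inside vnet0's private copy? */
static int
in_vnet0(uintptr_t addr)
{
	uintptr_t lo = (uintptr_t)vnet0_data;

	return (addr >= lo && addr < lo + KERNEL_SET_SIZE);
}

struct lookup_result {
	uintptr_t inside;	/* computed with the relocated reference */
	uintptr_t outside;	/* computed with the unrelocated reference */
	int inside_in_vnet0;
	int outside_in_vnet0;
};

struct lookup_result
model_lookup(void)
{
	struct elf_file_model ef = {
		.vnet_start = (uintptr_t)module_set_vnet,
		.vnet_stop = (uintptr_t)module_set_vnet + MODULE_SET_SIZE,
		.vnet_base = (uintptr_t)kernel_set_vnet + MODULE_SLOT_OFFSET,
	};
	/* Chosen so that base + &kernel_set_vnet[i] == &vnet0_data[i]. */
	uintptr_t data_base = (uintptr_t)vnet0_data - (uintptr_t)kernel_set_vnet;
	uintptr_t entry = (uintptr_t)&module_set_vnet[VAR_OFFSET];
	struct lookup_result r;

	assert(MODULE_SLOT_OFFSET + MODULE_SET_SIZE <= KERNEL_SET_SIZE);
	r.inside = model_vnet_location(data_base, model_relocaddr(&ef, entry));
	r.outside = model_vnet_location(data_base, entry);
	r.inside_in_vnet0 = in_vnet0(r.inside);
	r.outside_in_vnet0 = in_vnet0(r.outside);
	return (r);
}
```

In this model the "inside" lookup lands at vnet0_data + 36, while the "outside" lookup lands outside vnet0's storage altogether, which is the same kind of wrong location as in the panic above.
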

But I suppose the problem is not new and there might be a better approach already invented to deal with this?

-- 
Mikolaj Golub

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=ipfw_nat.patch

Index: sys/netinet/ipfw/ip_fw_private.h
===================================================================
--- sys/netinet/ipfw/ip_fw_private.h (revision 221673)
+++ sys/netinet/ipfw/ip_fw_private.h (working copy)
@@ -201,6 +201,8 @@ VNET_DECLARE(int, fw_verbose);
 VNET_DECLARE(struct ip_fw_chain, layer3_chain);
 #define V_layer3_chain VNET(layer3_chain)
+void* vnet_entry_addr_layer3_chain(void);
+#define V_addr_layer3_chain (vnet_entry_addr_layer3_chain())
 VNET_DECLARE(u_int32_t, set_disable);
 #define V_set_disable VNET(set_disable)
Index: sys/netinet/ipfw/ip_fw_nat.c
===================================================================
--- sys/netinet/ipfw/ip_fw_nat.c (revision 221673)
+++ sys/netinet/ipfw/ip_fw_nat.c (working copy)
@@ -62,7 +62,7 @@ ifaddr_change(void *arg __unused, struct ifnet *if
 	struct ifaddr *ifa;
 	struct ip_fw_chain *chain;
-	chain = &V_layer3_chain;
+	chain = V_addr_layer3_chain;
 	IPFW_WLOCK(chain);
 	/* Check every nat entry... */
 	LIST_FOREACH(ptr, &chain->nat, _next) {
@@ -345,7 +345,7 @@ ipfw_nat_cfg(struct sockopt *sopt)
 {
 	struct cfg_nat *cfg, *ptr;
 	char *buf;
-	struct ip_fw_chain *chain = &V_layer3_chain;
+	struct ip_fw_chain *chain = V_addr_layer3_chain;
 	size_t len;
 	int gencnt, error = 0;
@@ -421,7 +421,7 @@ static int
 ipfw_nat_del(struct sockopt *sopt)
 {
 	struct cfg_nat *ptr;
-	struct ip_fw_chain *chain = &V_layer3_chain;
+	struct ip_fw_chain *chain = V_addr_layer3_chain;
 	int i;
 	sooptcopyin(sopt, &i, sizeof i, sizeof i);
@@ -444,7 +444,7 @@ ipfw_nat_del(struct sockopt *sopt)
 static int
 ipfw_nat_get_cfg(struct sockopt *sopt)
 {
-	struct ip_fw_chain *chain = &V_layer3_chain;
+	struct ip_fw_chain *chain = V_addr_layer3_chain;
 	struct cfg_nat *n;
 	struct cfg_redir *r;
 	struct cfg_spool *s;
@@ -509,7 +509,7 @@ ipfw_nat_get_log(struct sockopt *sopt)
 	int i, size;
 	struct ip_fw_chain *chain;
-	chain = &V_layer3_chain;
+	chain = V_addr_layer3_chain;
 	IPFW_RLOCK(chain);
 	/* one pass to count, one to copy the data */
@@ -543,8 +543,11 @@ ipfw_nat_get_log(struct sockopt *sopt)
 static void
 ipfw_nat_init(void)
 {
+	struct ip_fw_chain *chain;
-	IPFW_WLOCK(&V_layer3_chain);
+	chain = V_addr_layer3_chain;
+
+	IPFW_WLOCK(chain);
 	/* init ipfw hooks */
 	ipfw_nat_ptr = ipfw_nat;
 	lookup_nat_ptr = lookup_nat;
@@ -552,7 +555,7 @@ ipfw_nat_init(void)
 	ipfw_nat_del_ptr = ipfw_nat_del;
 	ipfw_nat_get_cfg_ptr = ipfw_nat_get_cfg;
 	ipfw_nat_get_log_ptr = ipfw_nat_get_log;
-	IPFW_WUNLOCK(&V_layer3_chain);
+	IPFW_WUNLOCK(chain);
 	V_ifaddr_event_tag = EVENTHANDLER_REGISTER(
 	    ifaddr_event, ifaddr_change, NULL, EVENTHANDLER_PRI_ANY);
@@ -564,7 +567,7 @@ ipfw_nat_destroy(void)
 	struct cfg_nat *ptr, *ptr_temp;
 	struct ip_fw_chain *chain;
-	chain = &V_layer3_chain;
+	chain = V_addr_layer3_chain;
 	IPFW_WLOCK(chain);
 	LIST_FOREACH_SAFE(ptr, &chain->nat, _next, ptr_temp) {
 		LIST_REMOVE(ptr, _next);
Index: sys/netinet/ipfw/ip_fw2.c
===================================================================
--- sys/netinet/ipfw/ip_fw2.c (revision 221673)
+++ sys/netinet/ipfw/ip_fw2.c (working copy)
@@ -134,6 +134,11 @@ VNET_DEFINE(int, verbose_limit);
 /* layer3_chain contains the list of rules for layer 3 */
 VNET_DEFINE(struct ip_fw_chain, layer3_chain);
+void*
+vnet_entry_addr_layer3_chain(void)
+{
+	return &V_layer3_chain;
+}
 ipfw_nat_t *ipfw_nat_ptr = NULL;
 struct cfg_nat *(*lookup_nat_ptr)(struct nat_list *, int);

--=-=-=--

From owner-freebsd-virtualization@FreeBSD.ORG Mon May 9 14:33:41 2011
Return-Path:
Delivered-To: freebsd-virtualization@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10BD0106566C; Mon, 9 May 2011 14:33:41 +0000 (UTC) (envelope-from zec@fer.hr)
Received: from munja.zvne.fer.hr (munja.zvne.fer.hr [161.53.66.248]) by mx1.freebsd.org (Postfix) with ESMTP id 936B48FC08; Mon, 9 May 2011 14:33:40 +0000 (UTC)
Received: from sluga.fer.hr ([161.53.66.244]) by munja.zvne.fer.hr with Microsoft SMTPSVC(6.0.3790.4675); Mon, 9 May 2011 16:21:35 +0200
Received: from localhost ([161.53.19.8]) by sluga.fer.hr with Microsoft SMTPSVC(6.0.3790.4675); Mon, 9 May 2011 16:21:35 +0200
From: Marko Zec
To: freebsd-virtualization@freebsd.org
Date: Mon, 9 May 2011 16:21:15 +0200
User-Agent: KMail/1.9.10
References: <86aaewdopy.fsf@kopusha.home.net>
In-Reply-To: <86aaewdopy.fsf@kopusha.home.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <201105091621.16414.zec@fer.hr>
X-OriginalArrivalTime: 09 May 2011 14:21:35.0404 (UTC) FILETIME=[66F166C0:01CC0E54]
Cc: Mikolaj Golub , Kostik Belousov
Subject: Re: vnet: acessing module's virtualized global variables from another module
X-BeenThere: freebsd-virtualization@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussion of various virtualization techniques FreeBSD supports."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 May 2011 14:33:41 -0000

On Monday 09 May 2011 14:48:25 Mikolaj Golub wrote:
> Hi,
>
> Trying ipfw_nat under VIMAGE kernel I got this panic on the module load:

Hi,

I think the problem here is that curvnet context is not set properly on entry to ipfw_nat_modevent(). The canonical way to initialize VNET-enabled subsystems is to trigger them using VNET_SYSINIT() macros (instead of using modevent mechanisms), which in turn ensure that:

a) the initializer function gets invoked for each existing vnet
b) curvnet context is set properly on entry to initializer functions and

Cheers,

Marko

> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address = 0x4
> fault code = supervisor read, page not present
> instruction pointer = 0x20:0xc09f098e
> stack pointer = 0x28:0xf563b944
> frame pointer = 0x28:0xf563b998
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 4264 (kldload)
>
> witness_checkorder(c6d5e91c,9,ca0ac2e3,223,0,...) at
> witness_checkorder+0x6e _rw_wlock(c6d5e91c,ca0ac2e3,223,0,c0e8f795,...) at
> _rw_wlock+0x82
> ipfw_nat_modevent(c98a48c0,0,0,75,0,...) at ipfw_nat_modevent+0x41
> module_register_init(ca0ad508,0,c0e8d834,e6,0,...) at
> module_register_init+0xa7
> linker_load_module(0,f563bc18,c0e8d834,3fc,f563bc28,...) at
> linker_load_module+0xa05
> kern_kldload(c86835c0,c72d3400,f563bc40,0,c8d0d000,...) at
> kern_kldload+0x133 kldload(c86835c0,f563bcec,c09e8940,c86835c0,0,...) at
> kldload+0x74 syscallenter(c86835c0,f563bce4,c0ce05dd,c1022150,0,...)
at > syscallenter+0x263 syscall(f563bd28) at syscall+0x34 > Xint0x80_syscall() at Xint0x80_syscall+0x21 > --- syscall (304, FreeBSD ELF32, kldload), eip = 0x280da00b, esp = > 0xbfbfe79c, ebp = 0xbfbfec88 - > > It crashed on acessing data from virtualized global variable V_layer3_chain > in ipfw_nat_modevent(). V_layer3_chain is defined in ipfw module and it > turns out that &V_layer3_chain returns wrong location from anywhere but > ipfw.ko. > > May be this is a known issue, but I have not found info about this, so > below are details of investigation why this happens. > > Virtualized global variables are defined using the VNET_DEFINE() macro, > which places them in the 'set_vnet' linker set (in the base kernel or in > module). This is used to > > 1) copy these "default" values to each virtual network stack instance when > created; > > 2) act as unique global names by which the variable can be referred to. The > location of a per-virtual instance variable is calculated at run-time like > in the example below for layer3_chain variable in the default vnet (vnet0): > > vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain (1) > > For modules the thing is more complicated. When a module is loaded its > global variables from 'set_vnet' linker set are copied to the kernel > 'set_vnet', and for module to be able to access them the linker reallocates > all references accordingly (kern/link_elf.c:elf_relocaddr()): > > if (x >= ef->vnet_start && x < ef->vnet_stop) > return ((x - ef->vnet_start) + ef->vnet_base); > > So from inside the module the access to its virtualized variables works, > but from the outside we get wrong location using calculation like above > (1), because &vnet_entry_layer3_chain returns address of the variable in > the module's 'set_vnet'. 
> > The workaround is to compile such modules into the kernel or use a hack I > have done for ipfw_nat -- add the function to ipfw module which returns the > location of virtualized layer3_chain variable and use this location instead > of V_layer3_chain macro (see the attached patch). > > But I suppose the problem is not a new and there might be better approach > already invented to deal with this? From owner-freebsd-virtualization@FreeBSD.ORG Mon May 9 17:05:34 2011 Return-Path: Delivered-To: freebsd-virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BFB001065670; Mon, 9 May 2011 17:05:34 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id AD4F18FC12; Mon, 9 May 2011 17:05:33 +0000 (UTC) Received: by bwz12 with SMTP id 12so6152146bwz.13 for ; Mon, 09 May 2011 10:05:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:from:to:cc:subject:references:x-comment-to :sender:date:in-reply-to:message-id:user-agent:mime-version :content-type; bh=POkeOlKyoiKOEgZfj7nr/Z1Kkwd1z7biKmJQhBAGxrw=; b=h4nEjyKUZP3B8F6joZQqAewqqA7HAlDOoUcR3WyLj09dQR6WXetFVcMM/2BHYTMdKE PBoOqpSX+bsKEQ+E07AH8IcMECUvV95ny99sMC/H5CRiW7rZThQWi+wBsORq1D9v8GBh AzjQ5UMP5k2u17SshtkfvIiDJv+HGs9ADS61Q= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:x-comment-to:sender:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=ekhjI6rCL935GHivXD8vAyW+4kkvlvvegbtp7ms0X3Z/6a61uH1Obl7XQfIbVB1Vmx ZAdo/l629Gi5dcYGxKdMypGMQuUSedycGZW/5f9xVqDI7ZoYiz7uXPFpZNgVJ85E7S+/ Tkl2xiPXBnJo/Xb5Twv+7kS2Jxx/NAdNjYkF4= Received: by 10.204.83.228 with SMTP id g36mr74420bkl.30.1304960732208; Mon, 09 May 2011 10:05:32 -0700 (PDT) Received: from localhost ([95.69.172.154]) by mx.google.com with ESMTPS id 
    q25sm3791255bkk.22.2011.05.09.10.05.29 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 09 May 2011 10:05:31 -0700 (PDT)
From: Mikolaj Golub
To: Marko Zec
References: <86aaewdopy.fsf@kopusha.home.net> <201105091621.16414.zec@fer.hr>
X-Comment-To: Marko Zec
Sender: Mikolaj Golub
Date: Mon, 09 May 2011 20:05:28 +0300
In-Reply-To: <201105091621.16414.zec@fer.hr> (Marko Zec's message of "Mon, 9 May 2011 16:21:15 +0200")
Message-ID: <864o53yfc7.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Mikolaj Golub , Kostik Belousov , freebsd-virtualization@freebsd.org
Subject: Re: vnet: acessing module's virtualized global variables from another module
X-BeenThere: freebsd-virtualization@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussion of various virtualization techniques FreeBSD supports."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 May 2011 17:05:34 -0000

On Mon, 9 May 2011 16:21:15 +0200 Marko Zec wrote:

 MZ> On Monday 09 May 2011 14:48:25 Mikolaj Golub wrote:
 >> Hi,
 >>
 >> Trying ipfw_nat under VIMAGE kernel I got this panic on the module load:

 MZ> Hi,

 MZ> I think the problem here is that curvnet context is not set properly on entry
 MZ> to ipfw_nat_modevent(). The canonical way to initialize VNET-enabled
 MZ> subsystems is to trigger them using VNET_SYSINIT() macros (instead of using
 MZ> modevent mechanisms), which in turn ensure that:

 MZ> a) that the initializer function gets invoked for each existing vnet
 MZ> b) curvnet context is set properly on entry to initializer functions and

hm, sorry, but I don't see how curvnet context might help here. To me this does not look like a curvnet context problem, or my understanding of how it works is completely wrong.

Below is a kgdb session on a live VIMAGE system with ipfw.ko loaded.

Let's look at some kernel virtualized variable:

(kgdb) p vnet_entry_ifnet
$1 = {tqh_first = 0x0, tqh_last = 0x0}
(kgdb) p &vnet_entry_ifnet
$2 = (struct ifnethead *) 0x8102d488

As expected, the address is in the kernel 'set_vnet':

kopusha:/usr/src/sys% kldstat |grep kernel
1 69 0x80400000 1092700 kernel
kopusha:/usr/src/sys% nm /boot/kernel/kernel |grep __start_set_vnet
8102d480 A __start_set_vnet

default vnet:

(kgdb) p vnet0
$3 = (struct vnet *) 0x86d9b000

Calculate ifnet location on vnet0 (a la VNET_VNET(vnet0, ifnet)):

(kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) & vnet_entry_ifnet
0x86d9c008

Access it:

(kgdb) p *((struct ifnethead *)0x86d9c008)
$4 = {tqh_first = 0x86da5c00, tqh_last = 0x89489c0c}
(kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_dname
$7 = 0x80e8b480 "usbus"
(kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_vnet
$8 = (struct vnet *) 0x86d9b000

Everything looks good. Now try the same with the virtualized variable layer3_chain from the ipfw module:

(kgdb) p vnet_entry_layer3_chain
$9 = {rules = 0x0, reap = 0x0, default_rule = 0x0, n_rules = 0, static_len = 0, map = 0x0, nat = {lh_first = 0x0}, tables = {0x0 }, rwmtx = {lock_object = { lo_name = 0x0, lo_flags = 0, lo_data = 0, lo_witness = 0x0}, rw_lock = 0}, uh_lock = { lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0, lo_witness = 0x0}, rw_lock = 0}, id = 0, gencnt = 0}

The "master" variable looks good (initialized to zeros); what about its address?

(kgdb) p &vnet_entry_layer3_chain
$10 = (struct ip_fw_chain *) 0x894a5c00

It points to the 'set_vnet' of ipfw.ko:

kopusha# kldstat |grep ipfw.ko
13 2 0x89495000 11000 ipfw.ko
kopusha:/usr/src/sys% nm /boot/kernel/ipfw.ko |grep __start_set_vnet
00010be0 A __start_set_vnet
kopusha:/usr/src/sys% printf "0x%x\n" $((0x89495000 + 0x00010be0))
0x894a5be0

Calculate layer3_chain location on vnet0 (a la VNET_VNET(vnet0, layer3_chain)):

(kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain
0x8f214780

Try to read it:

(kgdb) p ((struct ip_fw_chain *)0x8f214780)->rwmtx
$13 = {lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0, lo_witness = 0x0}, rw_lock = 0}
(kgdb) p ((struct ip_fw_chain *)0x8f214780)->rules
$14 = (struct ip_fw *) 0x6

The data looks wrong. But this is how the variable is accessed by ipfw_nat. I see the same in the crash image:

(kgdb) where
...
#11 0xc09a4882 in _rw_wlock (rw=0xc6d5e91c, file=0xca0ac2e3 "/usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c", line=547) at /usr/src/sys/kern/kern_rwlock.c:238
#12 0xca0ab841 in ipfw_nat_modevent (mod=0xc98a48c0, type=0, unused=0x0) at /usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c:547

Note rw=0xc6d5e91c (it crashed on it). And I get the same address doing what I did above:

(kgdb) VNET_VNET vnet0 vnet_entry_layer3_chain
at 0xc6d5e700 of type = struct ip_fw_chain
(kgdb) p &((struct ip_fw_chain *)0xc6d5e700)->rwmtx
$8 = (struct rwlock *) 0xc6d5e91c

Thus ipfw_nat was in vnet0 context then. I saw crashes (in other modules) when the context was not initialised, and they looked different.
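
The arithmetic in the live-system part of the session can be checked by hand. The sketch below only replays the numbers printed by kgdb, kldstat and nm above; the macro and function names are made up for this illustration, and the vnet_data_base value is inferred from the ifnet lookup rather than read from the kernel.

```c
#include <stdint.h>

/* Values copied from the session above (32-bit kernel addresses). */
#define ADDR_ENTRY_IFNET        UINT32_C(0x8102d488)	/* &vnet_entry_ifnet */
#define ADDR_VNET0_IFNET        UINT32_C(0x86d9c008)	/* ifnet location on vnet0 */
#define ADDR_ENTRY_LAYER3_CHAIN UINT32_C(0x894a5c00)	/* &vnet_entry_layer3_chain */
#define IPFW_KO_LOAD_ADDR       UINT32_C(0x89495000)	/* from kldstat */
#define IPFW_KO_SET_VNET_OFF    UINT32_C(0x00010be0)	/* from nm: __start_set_vnet */

/* vnet_data_base implied by the ifnet lookup: location minus entry address. */
uint32_t
implied_vnet_data_base(void)
{
	return (ADDR_VNET0_IFNET - ADDR_ENTRY_IFNET);
}

/* Formula (1) applied to the unrelocated address of layer3_chain. */
uint32_t
unrelocated_layer3_chain(void)
{
	return (implied_vnet_data_base() + ADDR_ENTRY_LAYER3_CHAIN);
}

/* Offset of layer3_chain from the start of the 'set_vnet' of ipfw.ko. */
uint32_t
layer3_chain_offset_in_module_set(void)
{
	return (ADDR_ENTRY_LAYER3_CHAIN -
	    (IPFW_KO_LOAD_ADDR + IPFW_KO_SET_VNET_OFF));
}
```

With these inputs the implied vnet_data_base is 0x05d6eb80, adding it to &vnet_entry_layer3_chain gives 0x8f214780 (the address kgdb printed), and the variable sits 0x20 bytes into the module's 'set_vnet'.
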

The right location was 0x86d9c160 (found by adding a print to the ipfw module; I don't know an easier way):

(kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rwmtx
$1 = {lock_object = {lo_name = 0x932ba4b3 "IPFW static rules", lo_flags = 69402624, lo_data = 0, lo_witness = 0x86d6ab30}, rw_lock = 1}
(kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rules
$2 = (struct ip_fw *) 0x8f2d1c80

So I don't see a way to reach a module's virtualized variable from outside the module even if you are in the right vnet context. The linker, when loading the module and allocating the variable on vnet stacks in 'modspace', possesses this information and relocates addresses in the module, so they are accessible from inside the module, but not from outside.

 MZ> Cheers,

 MZ> Marko

 >> Fatal trap 12: page fault while in kernel mode
 >> cpuid = 1; apic id = 01
 >> fault virtual address = 0x4
 >> fault code = supervisor read, page not present
 >> instruction pointer = 0x20:0xc09f098e
 >> stack pointer = 0x28:0xf563b944
 >> frame pointer = 0x28:0xf563b998
 >> code segment = base 0x0, limit 0xfffff, type 0x1b
 >> = DPL 0, pres 1, def32 1, gran 1
 >> processor eflags = interrupt enabled, resume, IOPL = 0
 >> current process = 4264 (kldload)
 >>
 >> witness_checkorder(c6d5e91c,9,ca0ac2e3,223,0,...) at
 >> witness_checkorder+0x6e _rw_wlock(c6d5e91c,ca0ac2e3,223,0,c0e8f795,...) at
 >> _rw_wlock+0x82
 >> ipfw_nat_modevent(c98a48c0,0,0,75,0,...) at ipfw_nat_modevent+0x41
 >> module_register_init(ca0ad508,0,c0e8d834,e6,0,...) at
 >> module_register_init+0xa7
 >> linker_load_module(0,f563bc18,c0e8d834,3fc,f563bc28,...) at
 >> linker_load_module+0xa05
 >> kern_kldload(c86835c0,c72d3400,f563bc40,0,c8d0d000,...) at
 >> kern_kldload+0x133 kldload(c86835c0,f563bcec,c09e8940,c86835c0,0,...) at
 >> kldload+0x74 syscallenter(c86835c0,f563bce4,c0ce05dd,c1022150,0,...)
at >> syscallenter+0x263 syscall(f563bd28) at syscall+0x34 >> Xint0x80_syscall() at Xint0x80_syscall+0x21 >> --- syscall (304, FreeBSD ELF32, kldload), eip = 0x280da00b, esp = >> 0xbfbfe79c, ebp = 0xbfbfec88 - >> >> It crashed on acessing data from virtualized global variable V_layer3_chain >> in ipfw_nat_modevent(). V_layer3_chain is defined in ipfw module and it >> turns out that &V_layer3_chain returns wrong location from anywhere but >> ipfw.ko. >> >> May be this is a known issue, but I have not found info about this, so >> below are details of investigation why this happens. >> >> Virtualized global variables are defined using the VNET_DEFINE() macro, >> which places them in the 'set_vnet' linker set (in the base kernel or in >> module). This is used to >> >> 1) copy these "default" values to each virtual network stack instance when >> created; >> >> 2) act as unique global names by which the variable can be referred to. The >> location of a per-virtual instance variable is calculated at run-time like >> in the example below for layer3_chain variable in the default vnet (vnet0): >> >> vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain (1) >> >> For modules the thing is more complicated. When a module is loaded its >> global variables from 'set_vnet' linker set are copied to the kernel >> 'set_vnet', and for module to be able to access them the linker reallocates >> all references accordingly (kern/link_elf.c:elf_relocaddr()): >> >> if (x >= ef->vnet_start && x < ef->vnet_stop) >> return ((x - ef->vnet_start) + ef->vnet_base); >> >> So from inside the module the access to its virtualized variables works, >> but from the outside we get wrong location using calculation like above >> (1), because &vnet_entry_layer3_chain returns address of the variable in >> the module's 'set_vnet'. 
>> >> The workaround is to compile such modules into the kernel or use a hack I >> have done for ipfw_nat -- add the function to ipfw module which returns the >> location of virtualized layer3_chain variable and use this location instead >> of V_layer3_chain macro (see the attached patch). >> >> But I suppose the problem is not a new and there might be better approach >> already invented to deal with this? -- Mikolaj Golub From owner-freebsd-virtualization@FreeBSD.ORG Mon May 9 22:11:01 2011 Return-Path: Delivered-To: freebsd-virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E36C106566B; Mon, 9 May 2011 22:11:01 +0000 (UTC) (envelope-from zec@fer.hr) Received: from munja.zvne.fer.hr (munja.zvne.fer.hr [161.53.66.248]) by mx1.freebsd.org (Postfix) with ESMTP id 09E648FC1A; Mon, 9 May 2011 22:11:00 +0000 (UTC) Received: from sluga.fer.hr ([161.53.66.244]) by munja.zvne.fer.hr with Microsoft SMTPSVC(6.0.3790.4675); Tue, 10 May 2011 00:10:58 +0200 Received: from localhost ([161.53.19.8]) by sluga.fer.hr with Microsoft SMTPSVC(6.0.3790.4675); Tue, 10 May 2011 00:10:58 +0200 From: Marko Zec To: freebsd-virtualization@freebsd.org Date: Tue, 10 May 2011 00:10:36 +0200 User-Agent: KMail/1.9.10 References: <86aaewdopy.fsf@kopusha.home.net> <201105091621.16414.zec@fer.hr> <864o53yfc7.fsf@kopusha.home.net> In-Reply-To: <864o53yfc7.fsf@kopusha.home.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <201105100010.37726.zec@fer.hr> X-OriginalArrivalTime: 09 May 2011 22:10:58.0314 (UTC) FILETIME=[F9586EA0:01CC0E95] Cc: Mikolaj Golub , Robert Watson , Bjoern Zeeb , Kostik Belousov Subject: Re: vnet: acessing module's virtualized global variables from another module X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization 
 techniques FreeBSD supports."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 May 2011 22:11:01 -0000

On Monday 09 May 2011 19:05:28 Mikolaj Golub wrote:
> On Mon, 9 May 2011 16:21:15 +0200 Marko Zec wrote:
>
> MZ> On Monday 09 May 2011 14:48:25 Mikolaj Golub wrote:
> >> Hi,
> >>
> >> Trying ipfw_nat under VIMAGE kernel I got this panic on the module
> >> load:
>
> MZ> Hi,
>
> MZ> I think the problem here is that curvnet context is not set properly on entry
> MZ> to ipfw_nat_modevent(). The canonical way to initialize VNET-enabled
> MZ> subsystems is to trigger them using VNET_SYSINIT() macros (instead of using
> MZ> modevent mechanisms), which in turn ensure that:
>
> MZ> a) that the initializer function gets invoked for each existing vnet
> MZ> b) curvnet context is set properly on entry to initializer functions
> and
>
> hm, sorry, but I don't see how curvnet context might help here.

You're getting a panic in a function, i.e. in ipfw_nat_modevent(), which has ipfw_nat_init() inlined into it, where you attempt to access per-vnet data without having curvnet context set. By definition that is not supposed to work on a VIMAGE kernel, so what you observe is not unexpected at all. Please set the curvnet context using VNET_SYSINIT() macros, or by hand using CURVNET_SET() / CURVNET_RESTORE(), before accessing any V_ data.

Marko

> For me this
> does not look like curvnet context problem or my understanding how it works
> completely wrong.
>
> Below is kgdb session on live VIMAGE system with ipfw.ko loaded.
> > Let's look at some kernel virtualized variable: > > (kgdb) p vnet_entry_ifnet > $1 = {tqh_first = 0x0, tqh_last = 0x0} > (kgdb) p &vnet_entry_ifnet > $2 = (struct ifnethead *) 0x8102d488 > > As expected the address is in kernel 'set_vnet': > > kopusha:/usr/src/sys% kldstat |grep kernel > 1 69 0x80400000 1092700 kernel > kopusha:/usr/src/sys% nm /boot/kernel/kernel |grep __start_set_vnet > 8102d480 A __start_set_vnet > > default vnet: > > (kgdb) p vnet0 > $3 = (struct vnet *) 0x86d9b000 > > Calculate ifnet location on vnet0 (a la VNET_VNET(vnet0, ifnet)): > > (kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) & > vnet_entry_ifnet 0x86d9c008 > > Access it: > > (kgdb) p *((struct ifnethead *)0x86d9c008) > $4 = {tqh_first = 0x86da5c00, tqh_last = 0x89489c0c} > (kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_dname > $7 = 0x80e8b480 "usbus" > (kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_vnet > $8 = (struct vnet *) 0x86d9b000 > > Everything looks good. Now try the same with virtualized variable > layer3_chain from ipfw module: > > (kgdb) p vnet_entry_layer3_chain > $9 = {rules = 0x0, reap = 0x0, default_rule = 0x0, n_rules = 0, static_len > = 0, map = 0x0, nat = {lh_first = 0x0}, tables = {0x0 }, > rwmtx = {lock_object = { lo_name = 0x0, lo_flags = 0, lo_data = 0, > lo_witness = 0x0}, rw_lock = 0}, uh_lock = { lock_object = {lo_name = 0x0, > lo_flags = 0, lo_data = 0, lo_witness = 0x0}, rw_lock = 0}, id = 0, gencnt > = 0} > > "master" variable looks good (initialized to zeros), what about its > address? 
>
> (kgdb) p &vnet_entry_layer3_chain
> $10 = (struct ip_fw_chain *) 0x894a5c00
>
> It points to 'set_vnet' of the ipfw.ko:
>
> kopusha# kldstat |grep ipfw.ko
> 13 2 0x89495000 11000 ipfw.ko
> kopusha:/usr/src/sys% nm /boot/kernel/ipfw.ko |grep __start_set_vnet
> 00010be0 A __start_set_vnet
> kopusha:/usr/src/sys% printf "0x%x\n" $((0x89495000 + 0x00010be0))
> 0x894a5be0
>
> Calculate layer3_chain location on vnet0 (a la VNET_VNET(vnet0,
> layer3_chain)):
>
> (kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain
> 0x8f214780
>
> Try to read it:
>
> (kgdb) p ((struct ip_fw_chain *)0x8f214780)->rwmtx
> $13 = {lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0,
>   lo_witness = 0x0}, rw_lock = 0}
> (kgdb) p ((struct ip_fw_chain *)0x8f214780)->rules
> $14 = (struct ip_fw *) 0x6
>
> Data looks wrong. But this is how this variable is accessed by
> ipfw_nat. I see the same in the crash image:
>
> (kgdb) where
> ...
> #11 0xc09a4882 in _rw_wlock (rw=0xc6d5e91c,
>     file=0xca0ac2e3 "/usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c",
>     line=547) at /usr/src/sys/kern/kern_rwlock.c:238
> #12 0xca0ab841 in ipfw_nat_modevent (mod=0xc98a48c0, type=0, unused=0x0)
>     at /usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c:547
>
> note, rw=0xc6d5e91c (it crashed on it). And I get the same address doing
> what I did above:
>
> (kgdb) VNET_VNET vnet0 vnet_entry_layer3_chain
> at 0xc6d5e700 of type = struct ip_fw_chain
> (kgdb) p &((struct ip_fw_chain *)0xc6d5e700)->rwmtx
> $8 = (struct rwlock *) 0xc6d5e91c
>
> Thus ipfw_nat was in vnet0 context then. I saw crashes (in other modules)
> when the context was not initialised and they looked different.
>
> The right location was 0x86d9c160 (found by adding a print to the ipfw
> module; I don't know an easier way):
>
> (kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rwmtx
> $1 = {lock_object = {lo_name = 0x932ba4b3 "IPFW static rules",
>   lo_flags = 69402624, lo_data = 0, lo_witness = 0x86d6ab30}, rw_lock = 1}
> (kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rules
> $2 = (struct ip_fw *) 0x8f2d1c80
>
> So I don't see a way to reach a module's virtualized variable from
> outside the module even if you are in the right vnet context. The linker,
> when loading the module and allocating the variable on vnet stacks in
> 'modspace', possesses this information and it relocates addresses in the
> module, so they are accessible from inside the module, but not from
> outside.
>
> MZ> Cheers,
>
> MZ> Marko
>
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 1; apic id = 01
> >> fault virtual address = 0x4
> >> fault code = supervisor read, page not present
> >> instruction pointer = 0x20:0xc09f098e
> >> stack pointer = 0x28:0xf563b944
> >> frame pointer = 0x28:0xf563b998
> >> code segment = base 0x0, limit 0xfffff, type 0x1b
> >>              = DPL 0, pres 1, def32 1, gran 1
> >> processor eflags = interrupt enabled, resume, IOPL = 0
> >> current process = 4264 (kldload)
> >>
> >> witness_checkorder(c6d5e91c,9,ca0ac2e3,223,0,...) at witness_checkorder+0x6e
> >> _rw_wlock(c6d5e91c,ca0ac2e3,223,0,c0e8f795,...) at _rw_wlock+0x82
> >> ipfw_nat_modevent(c98a48c0,0,0,75,0,...) at ipfw_nat_modevent+0x41
> >> module_register_init(ca0ad508,0,c0e8d834,e6,0,...) at module_register_init+0xa7
> >> linker_load_module(0,f563bc18,c0e8d834,3fc,f563bc28,...) at linker_load_module+0xa05
> >> kern_kldload(c86835c0,c72d3400,f563bc40,0,c8d0d000,...) at kern_kldload+0x133
> >> kldload(c86835c0,f563bcec,c09e8940,c86835c0,0,...) at kldload+0x74
> >> syscallenter(c86835c0,f563bce4,c0ce05dd,c1022150,0,...)
> >> at syscallenter+0x263
> >> syscall(f563bd28) at syscall+0x34
> >> Xint0x80_syscall() at Xint0x80_syscall+0x21
> >> --- syscall (304, FreeBSD ELF32, kldload), eip = 0x280da00b, esp =
> >> 0xbfbfe79c, ebp = 0xbfbfec88 -
> >>
> >> It crashed on accessing data from the virtualized global variable
> >> V_layer3_chain in ipfw_nat_modevent(). V_layer3_chain is defined in
> >> the ipfw module and it turns out that &V_layer3_chain returns the wrong
> >> location from anywhere but ipfw.ko.
> >>
> >> Maybe this is a known issue, but I have not found info about this, so
> >> below are details of my investigation into why this happens.
> >>
> >> Virtualized global variables are defined using the VNET_DEFINE() macro,
> >> which places them in the 'set_vnet' linker set (in the base kernel or
> >> in module). This is used to
> >>
> >> 1) copy these "default" values to each virtual network stack instance
> >> when created;
> >>
> >> 2) act as unique global names by which the variable can be referred to.
> >> The location of a per-virtual instance variable is calculated at
> >> run-time like in the example below for layer3_chain variable in the
> >> default vnet (vnet0):
> >>
> >> vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain     (1)
> >>
> >> For modules the thing is more complicated. When a module is loaded its
> >> global variables from 'set_vnet' linker set are copied to the kernel
> >> 'set_vnet', and for the module to be able to access them the linker
> >> relocates all references accordingly
> >> (kern/link_elf.c:elf_relocaddr()):
> >>
> >> if (x >= ef->vnet_start && x < ef->vnet_stop)
> >>         return ((x - ef->vnet_start) + ef->vnet_base);
> >>
> >> So from inside the module the access to its virtualized variables
> >> works, but from the outside we get the wrong location using a calculation
> >> like (1) above, because &vnet_entry_layer3_chain returns the address of
> >> the variable in the module's 'set_vnet'.
> >>
> >> The workaround is to compile such modules into the kernel or use a hack
> >> I have done for ipfw_nat -- add a function to the ipfw module which
> >> returns the location of the virtualized layer3_chain variable and use
> >> this location instead of the V_layer3_chain macro (see the attached
> >> patch).
> >>
> >> But I suppose the problem is not a new one and there might be a better
> >> approach already invented to deal with this?
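
For reference, the address arithmetic discussed in this thread can be
replayed outside the kernel. The following is only a sketch, not kernel
code: struct vnet_region, vnet_location() and relocaddr() are hypothetical
stand-ins, the vnet_data_base value is inferred from the ifnet example
(0x86d9c008 - 0x8102d488), the vnet_base value is inferred from the
reported right location 0x86d9c160, and the size of the module's
'set_vnet' is an arbitrary assumption. Only the relocation rule itself is
taken from the quoted elf_relocaddr() fragment.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit addresses, matching the i386 system in the kgdb session. */
typedef uint32_t addr_t;

/* Hypothetical stand-in for the linker's per-file VNET bookkeeping. */
struct vnet_region {
	addr_t vnet_start;	/* start of the module's own 'set_vnet' */
	addr_t vnet_stop;	/* end of the module's own 'set_vnet' */
	addr_t vnet_base;	/* where the module's copy lives in kernel 'set_vnet' */
};

/* Calculation (1): per-vnet location = vnet_data_base + &vnet_entry_X. */
static addr_t
vnet_location(addr_t vnet_data_base, addr_t entry)
{
	return (vnet_data_base + entry);
}

/* Same rule as the quoted elf_relocaddr() fragment; other addresses pass through. */
static addr_t
relocaddr(const struct vnet_region *ef, addr_t x)
{
	assert(ef->vnet_start <= ef->vnet_stop);
	if (x >= ef->vnet_start && x < ef->vnet_stop)
		return ((x - ef->vnet_start) + ef->vnet_base);
	return (x);
}

#ifdef DEMO_MAIN
#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
	/* Inferred from the ifnet example: 0x86d9c008 - 0x8102d488. */
	addr_t base = 0x86d9c008u - 0x8102d488u;
	/* vnet_stop (size 0x1000) is an assumption; vnet_base is inferred. */
	struct vnet_region ipfw = { 0x894a5be0u, 0x894a5be0u + 0x1000u, 0x8102d5c0u };
	addr_t entry = 0x894a5c00u;	/* &vnet_entry_layer3_chain seen from outside */

	/* Unrelocated, as seen from outside ipfw.ko: prints 0x8f214780. */
	printf("0x%08" PRIx32 "\n", vnet_location(base, entry));
	/* Relocated, as seen from inside ipfw.ko: prints 0x86d9c160. */
	printf("0x%08" PRIx32 "\n", vnet_location(base, relocaddr(&ipfw, entry)));
	return (0);
}
#endif
```

Built with -DDEMO_MAIN, the first value reproduces the bogus address read
in the kgdb session and the second reproduces the location found by adding
a print to the ipfw module, under the assumptions listed above.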