From owner-freebsd-jail@FreeBSD.ORG Mon Jun 15 09:54:11 2015
Date: Mon, 15 Jun 2015 09:53:53 +0000
From: "Bjoern A. Zeeb"
To: kikuchan@uranus.dti.ne.jp
Cc: freebsd-jail@freebsd.org, freebsd-virtualization@freebsd.org
Subject: Re: How to implement jail-aware SysV IPC (with my nasty patch)
Message-Id: <2B7AA933-CB74-4737-8330-6E623A31C6DA@lists.zabbadoz.net>

Hi,

removed hackers, added virtualization.

> On 12 Jun 2015, at 01:17 , kikuchan@uranus.dti.ne.jp wrote:
>
> Hello,
>
> I’m (still) trying to figure out how jail-aware SysV IPC mechanism should be.

The best way probably is to finally get the “common” VIMAGE framework into HEAD to allow easy virtualisation of other services. That work has been sitting in perforce for a few years and simply needs updating for sysctls I think.

Then use that to virtualise things and have a vipc like we have vnets. The good news is that you have identified most places and have the cleanup functions already so it’d be a matter of transforming your changes (assuming they are correct and working fine; haven’t actually read the patch in detail;-) to the different infrastructure. And that’s the easiest part.
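Under the VNET pattern referred to here, code keeps using what look like globals through V_-prefixed macros that resolve against the current instance (curvnet for the network stack). A rough userland sketch of how a vipc could follow the same pattern is below. All names (struct vipc, curvipc, vipc_set_current(), sketch_shm_alloc_slot()) are hypothetical; the two fields merely echo file-scope globals of sysv_shm.c, and the real VIMAGE work in perforce may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instance SysV IPC state (not FreeBSD code). */
struct vipc {
	int shm_nused;      /* segments in use in this instance */
	int shm_last_free;  /* toy "next slot" counter */
};

/*
 * The "current" instance.  In a kernel this would be derived from the
 * calling thread (as curvnet is for VNET); here it is a plain global so
 * the sketch stays self-contained.
 */
static struct vipc *curvipc;

/* Former globals become accessors into the current instance. */
#define V_shm_nused     (curvipc->shm_nused)
#define V_shm_last_free (curvipc->shm_last_free)

static void
vipc_set_current(struct vipc *v)
{
	assert(v != NULL);
	curvipc = v;
}

/* Reads like pre-virtualisation code, but touches per-instance state. */
static int
sketch_shm_alloc_slot(void)
{
	V_shm_nused++;
	return V_shm_last_free++;   /* toy allocation: increasing slot numbers */
}
```

Switching curvipc is the only thing callers notice; the body of sketch_shm_alloc_slot() is unchanged from what a global-variable version would look like.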
Bjoern

From owner-freebsd-jail@FreeBSD.ORG Mon Jun 15 10:49:22 2015
Date: Mon, 15 Jun 2015 12:49:16 +0200
From: Mateusz Guzik
To: "Bjoern A. Zeeb"
Cc: kikuchan@uranus.dti.ne.jp, freebsd-jail@freebsd.org, freebsd-virtualization@freebsd.org
Subject: Re: How to implement jail-aware SysV IPC (with my nasty patch)
Message-ID: <20150615104915.GA18004@dft-labs.eu>
In-Reply-To: <2B7AA933-CB74-4737-8330-6E623A31C6DA@lists.zabbadoz.net>

On Mon, Jun 15, 2015 at 09:53:53AM +0000, Bjoern A. Zeeb wrote:
> Hi,
>
> removed hackers, added virtualization.
>
>
> > On 12 Jun 2015, at 01:17 , kikuchan@uranus.dti.ne.jp wrote:
> >
> > Hello,
> >
> > I’m (still) trying to figure out how jail-aware SysV IPC mechanism should be.
>
> The best way probably is to finally get the “common” VIMAGE framework into HEAD to allow easy virtualisation of other services. That work has been sitting in perforce for a few years and simply needs updating for sysctls I think.
>
> Then use that to virtualise things and have a vipc like we have vnets. The good news is that you have identified most places and have the cleanup functions already so it’d be a matter of transforming your changes (assuming they are correct and working fine; haven’t actually read the patch in detail;-) to the different infrastructure. And that’s the easiest part.
>
>

I have not looked at vimage too closely; maybe it is indeed the right way to go. Would definitely be interested in seeing it cleaned up and in action.

In the meantime, as I tried to explain in the previous thread, a jail-aware sysvshm poses several questions which need to be answered/taken care of before it can hit the tree. I doubt any reasonable implementation can magically avoid the problems they pose, and I definitely want to get an analysis of how the proposed implementation behaves (or how it prevents a given scenario from occurring).

Fundamentally, the basic question is how the implementation copes with processes having sysvshm mappings obtained from 2 different jails (provided they use different sysvshms).

Preferably the whole business would be /prevented/. A prevention mechanism would have to deal with shared address spaces (rfork(2) + RFMEM), threads and pre-existing mappings.

The patch posted here just puts permission checks in several places, while leaving the namespace shared, which I find to be a user-visible hack with no good justification. There is also no analysis of how this behaves when presented with the aforementioned scenario. Even if it turns out the result is harmless with the resulting code, this leaves us with a very error-prone scheme.

There is no technical problem adding a pointer to struct prison and dereferencing it instead of the current global vars. Adding proper sysctls dumping the content for a given jail is trivial, and so is providing resource limits when creating a first-level jail with a separate sysvshm. Something which cannot be as easily achieved with the patch in question.

A possible later switch to vimage would be transparent to users.
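A minimal sketch of the "pointer in struct prison" idea, assuming a toy struct prison_sketch and a hypothetical struct shm_ns that holds what used to be global state. ns_shmget() is not the real shmget(2) path (IPC_PRIVATE handling and permission checks are omitted); it only shows key lookup being resolved per jail, so two jails can use the same key_t without colliding.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NSEGS 8          /* hypothetical per-namespace segment limit */

typedef long sketch_key_t;      /* stand-in for key_t */

/* Hypothetical per-jail sysvshm namespace: former globals live here. */
struct shm_ns {
	sketch_key_t keys[SKETCH_NSEGS];
	int          used[SKETCH_NSEGS];
};

/* Toy stand-in for struct prison; only the new pointer is modeled. */
struct prison_sketch {
	struct shm_ns *pr_shm_ns;
};

/*
 * shmget(2)-like lookup-or-create keyed by key, resolved against the
 * caller's namespace instead of a global table.  Returns a slot index
 * that is only meaningful inside that namespace, or -1 if it is full.
 */
static int
ns_shmget(struct prison_sketch *pr, sketch_key_t key)
{
	struct shm_ns *ns = pr->pr_shm_ns;
	int i, free_slot = -1;

	assert(ns != NULL);
	for (i = 0; i < SKETCH_NSEGS; i++) {
		if (ns->used[i] && ns->keys[i] == key)
			return i;               /* existing segment for this key */
		if (!ns->used[i] && free_slot < 0)
			free_slot = i;          /* remember the first free slot */
	}
	if (free_slot >= 0) {
		ns->used[free_slot] = 1;
		ns->keys[free_slot] = key;
	}
	return free_slot;
}
```

Two jails asking for key 1234 both end up with slot 0 in their own namespace. That is the intended isolation, but it also means the per-namespace number alone no longer identifies a segment kernel-wide, which is the point kikuchan raises about shmmap_state later in the thread.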
-- 
Mateusz Guzik

From owner-freebsd-jail@FreeBSD.ORG Mon Jun 15 17:10:33 2015
Date: Tue, 16 Jun 2015 02:10:26 +0900
From: kikuchan@uranus.dti.ne.jp
To: "Bjoern A. Zeeb"
Subject: Re: How to implement jail-aware SysV IPC (with my nasty patch)
In-Reply-To: <2B7AA933-CB74-4737-8330-6E623A31C6DA@lists.zabbadoz.net>

On Mon, 15 Jun 2015 09:53:53 +0000, "Bjoern A. Zeeb" wrote:
> Hi,
>
> removed hackers, added virtualization.
>
>
>> On 12 Jun 2015, at 01:17 , kikuchan@uranus.dti.ne.jp wrote:
>>
>> Hello,
>>
>> I’m (still) trying to figure out how jail-aware SysV IPC mechanism should be.
>
> The best way probably is to finally get the “common” VIMAGE framework into HEAD to allow easy virtualisation of other services. That work has been sitting in perforce for a few years and simply needs updating for sysctls I think.
>
> Then use that to virtualise things and have a vipc like we have vnets. The good news is that you have identified most places and have the cleanup functions already so it’d be a matter of transforming your changes (assuming they are correct and working fine; haven’t actually read the patch in detail;-) to the different infrastructure. And that’s the easiest part.
>
>
> Bjoern

Hi Bjoern,
Thank you for your reply.

The "common" VIMAGE framework sounds good; I really want it.

I want to know what the IPC system looks like for userland after it is virtualized,
and what happens if a vnet-like vipc is implemented.

For example, jails 1, 2, 3 join vipc group A, and jails 4, 5, 6 join vipc group B??
Hmm, it looks good.

Regards,
Kikuchan

From owner-freebsd-jail@FreeBSD.ORG Mon Jun 15 17:32:55 2015
Date: Mon, 15 Jun 2015 17:32:45 +0000
From: "Bjoern A. Zeeb"
To: kikuchan@uranus.dti.ne.jp
Cc: freebsd-jail@freebsd.org, freebsd-virtualization@freebsd.org
Subject: Re: How to implement jail-aware SysV IPC (with my nasty patch)

> On 15 Jun 2015, at 17:10 , kikuchan@uranus.dti.ne.jp wrote:
>
> On Mon, 15 Jun 2015 09:53:53 +0000, "Bjoern A. Zeeb" wrote:
>> Hi,
>>
>> removed hackers, added virtualization.
>>
>>
>>> On 12 Jun 2015, at 01:17 , kikuchan@uranus.dti.ne.jp wrote:
>>>
>>> Hello,
>>>
>>> I’m (still) trying to figure out how jail-aware SysV IPC mechanism should be.
>>
>> The best way probably is to finally get the “common” VIMAGE framework into HEAD to allow easy virtualisation of other services. That work has been sitting in perforce for a few years and simply needs updating for sysctls I think.
>>
>> Then use that to virtualise things and have a vipc like we have vnets. The good news is that you have identified most places and have the cleanup functions already so it’d be a matter of transforming your changes (assuming they are correct and working fine; haven’t actually read the patch in detail;-) to the different infrastructure. And that’s the easiest part.
>>
>>
>> Bjoern
>
> Hi Bjoern,
> Thank you for your reply.
>
> The "common" VIMAGE framework sounds good; I really want it.
>
> I want to know what the IPC system looks like for userland after it is virtualized,
> and what happens if a vnet-like vipc is implemented.
>
> For example, jails 1, 2, 3 join vipc group A, and jails 4, 5, 6 join vipc group B??
> Hmm, it looks good.

That’s not exactly how it works currently, and I think the mixing of options will be harder and something we’ll have to figure out more carefully.

You would be able to say jail 1 has a vipc and jails 2 and 3 are “child jails” and inherit it (similar for 4 + 5, 6), so it’s nested but not side-by-side.

If we want more of the “mixing” and independence, we’ll have to re-think the way we “manage” jails.
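The nested model described above can be sketched as a walk up the jail hierarchy: a jail either owns a vipc instance or uses its nearest ancestor's. This assumes a toy struct prison_sketch (pr_parent mirrors the field of that name in FreeBSD's struct prison; pr_vipc and struct vipc are hypothetical) and that the host always owns an instance.

```c
#include <assert.h>
#include <stddef.h>

struct vipc { int id; };                /* hypothetical IPC instance */

struct prison_sketch {
	struct prison_sketch *pr_parent;    /* NULL for the host */
	struct vipc          *pr_vipc;      /* non-NULL only if this jail owns one */
};

/*
 * Resolve the instance a jail uses: its own if it has one, otherwise the
 * nearest ancestor's.  Sharing therefore follows the tree; there is no
 * side-by-side "group" membership.
 */
static struct vipc *
prison_vipc(struct prison_sketch *pr)
{
	while (pr->pr_vipc == NULL) {
		pr = pr->pr_parent;
		assert(pr != NULL);             /* the host always owns an instance */
	}
	return pr->pr_vipc;
}
```

Here jails 2 and 3 share jail 1's instance only because they are its children; two unrelated jails cannot join a common group unless an ancestor that owns the instance sits above both, which is the kind of side-by-side mixing that would need the rethink mentioned above.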
Bjoern

From owner-freebsd-jail@FreeBSD.ORG Mon Jun 15 18:45:36 2015
Date: Tue, 16 Jun 2015 03:45:34 +0900
From: kikuchan@uranus.dti.ne.jp
To: Mateusz Guzik
Subject: Re: How to implement jail-aware SysV IPC (with my nasty patch)
Message-ID: <3681b69c41fd9352fef30afed901661a@imap.cm.dream.jp>
In-Reply-To: <20150615104915.GA18004@dft-labs.eu>

On Mon, 15 Jun 2015 12:49:16 +0200, Mateusz Guzik wrote:
> On Mon, Jun 15, 2015 at 09:53:53AM +0000, Bjoern A. Zeeb wrote:
>> Hi,
>>
>> removed hackers, added virtualization.
>>
>>
>> > On 12 Jun 2015, at 01:17 , kikuchan@uranus.dti.ne.jp wrote:
>> >
>> > Hello,
>> >
>> > I’m (still) trying to figure out how jail-aware SysV IPC mechanism should be.
>>
>> The best way probably is to finally get the “common” VIMAGE framework into HEAD to allow easy virtualisation of other services. That work has been sitting in perforce for a few years and simply needs updating for sysctls I think.
>>
>> Then use that to virtualise things and have a vipc like we have vnets. The good news is that you have identified most places and have the cleanup functions already so it’d be a matter of transforming your changes (assuming they are correct and working fine; haven’t actually read the patch in detail;-) to the different infrastructure. And that’s the easiest part.
>>
>>
>
> I have not looked at vimage too closely; maybe it is indeed the right way to go. Would definitely be interested in seeing it cleaned up and in action.
>
> In the meantime, as I tried to explain in the previous thread, a jail-aware sysvshm poses several questions which need to be answered/taken care of before it can hit the tree. I doubt any reasonable implementation can magically avoid the problems they pose, and I definitely want to get an analysis of how the proposed implementation behaves (or how it prevents a given scenario from occurring).
>
> Fundamentally, the basic question is how the implementation copes with processes having sysvshm mappings obtained from 2 different jails (provided they use different sysvshms).
>
> Preferably the whole business would be /prevented/. A prevention mechanism would have to deal with shared address spaces (rfork(2) + RFMEM), threads and pre-existing mappings.
>
> The patch posted here just puts permission checks in several places, while leaving the namespace shared, which I find to be a user-visible hack with no good justification. There is also no analysis of how this behaves when presented with the aforementioned scenario. Even if it turns out the result is harmless with the resulting code, this leaves us with a very error-prone scheme.
>
> There is no technical problem adding a pointer to struct prison and dereferencing it instead of the current global vars. Adding proper sysctls dumping the content for a given jail is trivial, and so is providing resource limits when creating a first-level jail with a separate sysvshm. Something which cannot be as easily achieved with the patch in question.
>
> A possible later switch to vimage would be transparent to users.

Dear Mateusz,

I'm sorry if I'm annoying you, but I really want to solve these problems.

> Fundamentally, the basic question is how the implementation copes with processes having sysvshm mappings obtained from 2 different jails (provided they use different sysvshms).
>
> Preferably the whole business would be /prevented/. A prevention mechanism would have to deal with shared address spaces (rfork(2) + RFMEM), threads and pre-existing mappings.
>
> The patch posted here just puts permission checks in several places, while leaving the namespace shared, which I find to be a user-visible hack with no good justification. There is also no analysis of how this behaves when presented with the aforementioned scenario. Even if it turns out the result is harmless with the resulting code, this leaves us with a very error-prone scheme.
>
> There is no technical problem adding a pointer to struct prison and dereferencing it instead of the current global vars. Adding proper sysctls dumping the content for a given jail is trivial, and so is providing resource limits when creating a first-level jail with a separate sysvshm. Something which cannot be as easily achieved with the patch in question.

Could you try the latest patch, please?
I have justified the user-visible behavior, made it hierarchical-jail friendly, and used EINVAL instead of EACCES to prevent an information leak.
https://bz-attachments.freebsd.org/attachment.cgi?id=157661 (typo fixed)


I realized my method is a bit better while I was trying to port/write the real namespace separation.
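A hypothetical sketch of the kind of filtering described here: one shared kernel table, with each lookup checked against the caller's jail. It uses the hierarchical rule proposed in this message (objects created in a child jail are visible from the parent, but not the reverse, and sibling jails cannot see each other's objects) and returns EINVAL both for "no such id" and for "exists but not visible", so the error does not reveal that another jail's object exists. struct prison_sketch, struct seg, ipc_visible() and seg_lookup() are illustrative names, not the patch's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct prison_sketch {
	struct prison_sketch *pr_parent;    /* NULL for the host */
};

struct seg {
	struct prison_sketch *creator;      /* jail that created the object */
	int                   allocated;
};

/* Visible iff created in the viewer's jail or in one of its descendants. */
static int
ipc_visible(const struct prison_sketch *creator,
    const struct prison_sketch *viewer)
{
	for (; creator != NULL; creator = creator->pr_parent)
		if (creator == viewer)
			return 1;
	return 0;
}

/*
 * Lookup by id in the single shared table.  Unknown ids and invisible
 * objects both yield EINVAL; EACCES would confirm the object exists.
 */
static int
seg_lookup(struct seg *tbl, int n, int id, const struct prison_sketch *viewer,
    struct seg **out)
{
	assert(out != NULL);
	if (id < 0 || id >= n || !tbl[id].allocated ||
	    !ipc_visible(tbl[id].creator, viewer))
		return EINVAL;
	*out = &tbl[id];
	return 0;
}
```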
Let me explain (again) why I chose this method for SysV IPC, and could you tell me how it should be done, please?

    struct shmmap_state {
        vm_offset_t va;
        int shmid;
    };

In sysv_shm.c, struct shmmap_state, which exists per process as p->p_vmspace->vm_shm, is a lookup table for va -> shm object lookups.
Each shmmap_state entry holds a reference (here, shmid) to the shm object for a later detach, and entries are simply copied on fork.

If you split the namespace (including the shmid space) completely, shmid would no longer be a unique identifier for an IPC object in the kernel.
To make it unique, adding a reference to the prison into shmmap_state like this:

    struct shmmap_state {
        vm_offset_t va;
        struct prison *prison;
        int shmid;
    };

would be a bad idea, because after a process calls jail_attach(), the process holds a reference to another (the creator's) prison. Or do we copy the IPC object completely every time jail_attach() occurs?
How do you deal with hierarchical jails?

How about this:

    struct shmmap_state {
        vm_offset_t va;
        struct shmid_kernel *shmseg;
    };

It looks better, but remember that you split the shmid space completely by moving the global vars into a separate namespace for each jail, so *shmseg points inside each jail's "struct shmid_kernel *shmsegs" list. It would have the same problems as the previous one.

To prevent this, make a single shared list of struct shmid_kernel in the kernel? That would be the same method I used.

Hmm, or does some other, smarter way exist? What would it look like? I have no more ideas...

My method didn't touch anything about the mapping stuff, thus it behaves exactly the same as current FreeBSD on this point.

I'm not sure I properly understand what the shared address space problem is (could someone help me understand, perhaps in code?),
and I'm not sure whether current FreeBSD has the shared address space problem for sysvshm combined with jails.
If it has the problem, unfortunately my patch doesn't provide any solution for that,
but if not, my patch doesn't have the problem either, because I didn't change the code structure.
The patch just fixes key_t collisions for jails, nothing more.
So the patch is harmless for non-jail users, and I believe it's useful for jail users running with allow.sysvipc=true.

BTW, what do you think about the following design for jail-aware SysV IPC?

> - IPC objects created in a parent jail are invisible to children.
> - IPC objects created in neighbor jails are also invisible to each other.
> - IPC objects created in a child jail are VISIBLE from the parent.
> - IPC key_t spaces are separated between jails. If you see a key_t-named object from the parent, it's shown as IPC_PRIVATE.

I'm asking because I think the implementation and the behavior design can be discussed independently.

Thank you.

Regards,
Kikuchan

From owner-freebsd-jail@FreeBSD.ORG Mon Jun 15 20:28:33 2015
Date: Mon, 15 Jun 2015 16:28:28 -0400
From: Allan Jude
To: freebsd-jail@freebsd.org
Subject: Re: zfs in a jail
Message-ID: <557F356C.4060708@freebsd.org>
In-Reply-To: <20150613035921.GA22078@blazingdot.com>

On 2015-06-12 23:59, Marcus Reid wrote:
> Hi,
>
> I'm doing zfs from within a jail, and there is one thing that's giving
> me some trouble.
>
> First, the bits that get zfs working from inside a jail:
>
> /etc/jail.conf:
>     allow.mount;
>     allow.mount.zfs;
>     enforce_statfs = 1;
>
> /etc/sysctl.conf:
>     security.jail.mount_allowed=1
>     security.jail.mount_zfs_allowed=1
>     security.jail.enforce_statfs=1
>
>     zfs set jailed=on zroot/jails/git/git
>
> Finally, to get the dataset visible inside the jail, this is required
> when the jail is running:
>
>     zfs jail git zroot/jails/git/git
>
> So, in jail.conf, I do a:
>
>     exec.poststart = "zfs jail git zroot/jails/git/git"
>
> Problem: zfs is not visible in the jail after a reboot. This problem is
> understood but I don't know the solution.
>
> exec.poststart is run after exec.start (the thing that runs /etc/rc in
> the jail), so the zfs datasets are not yet visible when /etc/rc.d/zfs
> runs in the jail. So, I have to log into the jail and do a 'zfs mount -a'
> after everything comes up. Not ideal. If there were an
> exec.postcreate directive in jail.conf that ran a command on the host
> after jail creation but before /etc/rc starts, then I could run
> 'zfs jail' before the jail's init scripts are run.
>
> Am I going about that in the wrong way? jail.conf seems like the right
> place for it, because you want your storage working after a
> 'jail -rc git', right?
>
> Thanks,
>
> Marcus

If you set:

zfs_enable="YES"

in rc.conf inside the jail, it runs 'zfs mount -a' as part of the startup routine.

This is how it is expected to work.

-- 
Allan Jude

From owner-freebsd-jail@FreeBSD.ORG Mon Jun 15 21:44:35 2015
Date: Mon, 15 Jun 2015 23:44:28 +0200
From: Mateusz Guzik
To: kikuchan@uranus.dti.ne.jp
Cc: freebsd-jail@freebsd.org, freebsd-virtualization@freebsd.org
Subject: Re: How to implement jail-aware SysV IPC (with my nasty patch)
Message-ID: <20150615214427.GB18004@dft-labs.eu>
In-Reply-To: <3681b69c41fd9352fef30afed901661a@imap.cm.dream.jp>

On Tue, Jun 16, 2015 at 03:45:34AM +0900, kikuchan@uranus.dti.ne.jp wrote:
> On Mon, 15 Jun 2015 12:49:16 +0200, Mateusz Guzik wrote:
> > Fundamentally, the basic question is how the implementation copes with processes having sysvshm mappings obtained from 2 different jails (provided they use different sysvshms).
> >
> > Preferably the whole business would be /prevented/. A prevention mechanism would have to deal with shared address spaces (rfork(2) + RFMEM), threads and pre-existing mappings.
> >
> > The patch posted here just puts permission checks in several places, while leaving the namespace shared, which I find to be a user-visible hack with no good justification. There is also no analysis of how this behaves when presented with the aforementioned scenario. Even if it turns out the result is harmless with the resulting code, this leaves us with a very error-prone scheme.
> >
> > There is no technical problem with adding a pointer to struct prison
> > and dereferencing it instead of the current global vars. Adding proper
> > sysctls dumping the content for a given jail is trivial, and so is
> > providing resource limits when creating a first-level jail with a
> > separate sysvshm. Something which cannot be as easily achieved with the
> > patch in question.
>
> Could you try the latest patch, please?
> I justified the user visibility, made it hierarchical-jail friendly, and used EINVAL instead of EACCES to conceal the information leak.
> https://bz-attachments.freebsd.org/attachment.cgi?id=157661 (typo fixed)
>
>
> I realized my method is a bit better while trying to port/write the real namespace separation.
> Let me explain (again) why I chose this method for SysV IPC, and could you tell me how it should be done, please?
>
> struct shmmap_state {
>         vm_offset_t va;
>         int shmid;
> };
>
> In sysv_shm.c, struct shmmap_state, which exists per process as p->p_vmspace->vm_shm, is a lookup table for va -> shm object lookups.
> A shmmap_state entry holds a reference (here, shmid) to the shm object for a later detach, and entries are simply copied on fork.
>
> If you split the namespace (including the shmid space) completely, shmid would no longer be a unique identifier for an IPC object in the kernel.
> To make it unique, adding a reference to the prison into shmmap_state like this:
>
> struct shmmap_state {
>         vm_offset_t va;
>         struct prison *prison;
>         int shmid;
> };
>
> would be a bad idea, because after a process calls jail_attach(), the process would hold a reference to another (the creator) prison. Or would you copy the IPC object completely every time jail_attach() occurs?

As I explained in the previous thread, with a separate namespace it is a
strict requirement to prevent sharing of sysvshm mappings. With the
requirement met, there is no issue.

As you will see later in the mail, even your approach would benefit
greatly from having such a restriction.

> How do you deal with hierarchical jails?
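A small userland model may help with the shmid-uniqueness question above. It is a hypothetical sketch, not kernel code: `struct shm_ns`, `struct vmspace_model` and the `model_*` functions are invented for illustration, and only `struct shmmap_state` mirrors the two fields quoted from sysv_shm.c (with `unsigned long` standing in for `vm_offset_t`). The model tags an address space with the namespace its mappings come from and refuses an attach from a second namespace, which is one way to meet the "prevent sharing" requirement. Under that rule a bare shmid in the table is never ambiguous.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one jail's SysV shm namespace. */
struct shm_ns {
	const char *name;
};

/* Same two fields as the struct shmmap_state quoted above. */
struct shmmap_state {
	unsigned long va;	/* address the segment is attached at */
	int shmid;		/* unique only within a single namespace */
};

#define MODEL_MAXMAPS 8

/* Hypothetical model of a process's vm_shm table plus a namespace tag. */
struct vmspace_model {
	const struct shm_ns *ns;	/* namespace every mapping comes from */
	int nmaps;
	struct shmmap_state maps[MODEL_MAXMAPS];
};

/*
 * Record an attachment. If the address space already holds mappings
 * from another namespace, refuse, so (va, shmid) stays unambiguous.
 */
int
model_shmat(struct vmspace_model *vm, const struct shm_ns *ns, int shmid,
    unsigned long va)
{
	if (vm->nmaps > 0 && vm->ns != ns)
		return (EINVAL);	/* would mix two namespaces */
	if (vm->nmaps == MODEL_MAXMAPS)
		return (EMFILE);	/* model's table is full */
	vm->ns = ns;
	vm->maps[vm->nmaps].va = va;
	vm->maps[vm->nmaps].shmid = shmid;
	vm->nmaps++;
	return (0);
}

/* va -> shmid lookup, as needed on detach; -1 if va is not attached. */
int
model_lookup(const struct vmspace_model *vm, unsigned long va)
{
	for (int i = 0; i < vm->nmaps; i++)
		if (vm->maps[i].va == va)
			return (vm->maps[i].shmid);
	return (-1);
}
```

Copying the structure by value models fork: the child gets the same table and the same namespace tag, so the restriction carries over. Handling jail_attach(2) with pre-existing mappings, and address spaces shared via rfork(2) with RFMEM, as the earlier message points out, would need more than this model shows.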
>

If proper resource limiting for hierarchical jails is implemented, the
new jail either inherits or gets a new namespace, depending on the
options used.

With only simplistic support, first-level jails can inherit or get a new
namespace; the rest must inherit.

There is no issue here due to sharing prevention.

> My method doesn't touch the mapping code at all, so it behaves exactly as current FreeBSD does in this respect.
>

Sure it did.

As you noticed yourself, it makes sense to clean up sysvshms on jail
destruction, which you do in sysvshm_cleanup_for_prison_myhook.

Your code does:

        if ((shmseg->u.shm_perm.mode & SHMSEG_ALLOCATED) &&
            shmseg->cred->cr_prison == pr) {
                shm_remove(shmseg, i);
        ....

which differs from what is executed by kern_shmdt_locked.

Now let's consider a process which rforks and shares the address space
with its child. The child enters a jail and grabs a sysvshm mapping,
then exits and we kill the jail.

In effect we end up with a process with an address space which used a
mapping created in a now-destroyed jail.

Is this situation problematic? I don't see any analysis provided. Maybe
it is, maybe it so happens it is not.

The mere possibility of this scenario needlessly complicates
maintenance, and such a scenario likely has no practical purpose. As
such, it is best /prevented/.

With it prevented, there is nothing positive about your approach that I
can see.

> I'm not sure I properly understand what the shared address space problem is (could someone help me understand, perhaps in code?),
> and I'm not sure whether current FreeBSD has the shared address space problem for sysvshm combined with jails.
> If it has the problem, unfortunately my patch doesn't provide any solution for it,
> but if not, my patch doesn't have the problem either, because I didn't change the code structure.
>

As I mentioned, you sure did.

I don't know if there are any serious problems /as it is/ and I'm too
lazy to check.
I surely expect any patch doing sysvshm for jails to be provided with an
analysis of its behaviour in that regard, though.

> The patch just fixes key_t collisions for jails, nothing more.
> So the patch is harmless for non-jail users, and I believe it's useful for jail users running with allow.sysvipc=true.
>
> > BTW, what do you think about the following design for jail-aware sysvipc?
> > - IPC objects created in a parent jail are invisible to its children.
> > - IPC objects created in neighbor jails are also invisible to each other.
> > - IPC objects created in a child jail are VISIBLE from the parent.
> > - IPC key_t spaces are separated between jails. If you see a key_t named object from the parent, it's shown as IPC_PRIVATE.
>

How about the following: the jail decides whether it wants to share a
namespace with a particular child (and by extension grandchildren and so
on). Done.

There is nothing complicated to do here unless you want to try out named
namespaces which you, e.g., assign to different jails on the same level.

-- 
Mateusz Guzik
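The proposal above (a namespace pointer in struct prison, with each jail deciding whether a child shares it) can also be sketched as a userland model. This is hypothetical: `struct prison_model`, `struct ipc_ns` and the `model_*` functions are illustrative names, not FreeBSD kernel API, and the sketch implements only the simplistic variant in which first-level jails may get a new namespace while deeper jails must inherit.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-namespace state; it stands in for the global tables. */
struct ipc_ns {
	int refs;	/* number of prisons using this namespace */
	int nextid;	/* next id to hand out (stand-in for real state) */
};

/* Hypothetical, heavily trimmed stand-in for struct prison. */
struct prison_model {
	struct prison_model *parent;	/* NULL for the host */
	struct ipc_ns *ipc_ns;		/* dereferenced instead of globals */
};

/* Depth below the host: the host is 0, first-level jails are 1. */
static int
prison_depth(const struct prison_model *pr)
{
	int d = 0;

	for (; pr->parent != NULL; pr = pr->parent)
		d++;
	return (d);
}

/*
 * Create a child jail. The parent decides whether the child shares its
 * namespace; only children of the host may get a fresh one here.
 * Returns NULL on refusal or allocation failure.
 */
struct prison_model *
model_prison_create(struct prison_model *parent, bool new_ns)
{
	struct prison_model *pr;

	if (new_ns && prison_depth(parent) != 0)
		return (NULL);	/* deeper jails must inherit */
	pr = calloc(1, sizeof(*pr));
	if (pr == NULL)
		return (NULL);
	pr->parent = parent;
	if (new_ns) {
		pr->ipc_ns = calloc(1, sizeof(*pr->ipc_ns));
		if (pr->ipc_ns == NULL) {
			free(pr);
			return (NULL);
		}
	} else
		pr->ipc_ns = parent->ipc_ns;
	pr->ipc_ns->refs++;
	return (pr);
}

/* Hand out the next id from whichever namespace the jail uses. */
int
model_shmget_id(struct prison_model *pr)
{
	return (pr->ipc_ns->nextid++);
}

/* Destroy a jail; free its namespace once no prison uses it. */
void
model_prison_destroy(struct prison_model *pr)
{
	if (--pr->ipc_ns->refs == 0)
		free(pr->ipc_ns);
	free(pr);
}
```

Because every lookup goes through `pr->ipc_ns`, jails sharing a namespace see the same ids, while a jail with its own namespace starts from scratch. Together with the mapping restriction discussed earlier in the thread, no process can then hold sysvshm mappings from two namespaces at once.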