From owner-freebsd-virtualization@FreeBSD.ORG Mon Jun 15 10:49:22 2015
Date: Mon, 15 Jun 2015 12:49:16 +0200
From: Mateusz Guzik
To: "Bjoern A. Zeeb"
Cc: kikuchan@uranus.dti.ne.jp, freebsd-jail@freebsd.org, freebsd-virtualization@freebsd.org
Subject: Re: How to implement jail-aware SysV IPC (with my nasty patch)
Message-ID: <20150615104915.GA18004@dft-labs.eu>
References: <2B7AA933-CB74-4737-8330-6E623A31C6DA@lists.zabbadoz.net>
In-Reply-To: <2B7AA933-CB74-4737-8330-6E623A31C6DA@lists.zabbadoz.net>
List-Id: "Discussion of various virtualization techniques FreeBSD supports."

On Mon, Jun 15, 2015 at 09:53:53AM +0000, Bjoern A. Zeeb wrote:
> Hi,
>
> removed hackers, added virtualization.
>
>
> > On 12 Jun 2015, at 01:17 , kikuchan@uranus.dti.ne.jp wrote:
> >
> > Hello,
> >
> > I’m (still) trying to figure out how jail-aware SysV IPC mechanism should be.
>
> The best way probably is to finally get the “common” VIMAGE framework into HEAD to allow easy virtualisation of other services. That work has been sitting in perforce for a few years and simply needs updating for sysctls I think.
>
> Then use that to virtualise things and have a vipc like we have vnets. The good news is that you have identified most places and have the cleanup functions already so it’d be a matter of transforming your changes (assuming they are correct and working fine; haven’t actually read the patch in detail;-) to the different infrastructure. And that’s the easiest part.
>

I have not looked at vimage too closely; maybe it is indeed the right way to go. I would definitely be interested in seeing it cleaned up and in action.

In the meantime, as I tried to explain in the previous thread, a jail-aware sysvshm poses several questions which need to be answered or taken care of before it can hit the tree. I doubt any reasonable implementation can magically avoid the problems they pose, and I definitely want to see an analysis of how the proposed implementation behaves (or how it prevents the given scenario from occurring).

Fundamentally, the basic question is how the implementation copes with processes having sysvshm mappings obtained from 2 different jails (provided they use different sysvshms).

Preferably the whole business would be /prevented/. A prevention mechanism would have to deal with shared address spaces (rfork(2) + RFMEM), threads and pre-existing mappings.

The patch posted here just puts permission checks in several places while leaving the namespace shared, which I find to be a user-visible hack with no good justification. There is also no analysis of how it behaves when presented with the aforementioned scenario.

Even if the result turns out to be harmless with the resulting code, this leaves us with a very error-prone scheme.

There is no technical problem with adding a pointer to struct prison and dereferencing it instead of the current global vars. Adding proper sysctls dumping the content for a given jail is trivial, and so is providing resource limits when creating a first-level jail with a separate sysvshm. None of this can be achieved as easily with the patch in question.

A possible later switch to vimage would be transparent to users.

-- 
Mateusz Guzik
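
For illustration only, the "pointer in struct prison" idea mentioned above could look roughly like the userland mock below. This is a sketch under stated assumptions, not code from the patch or from the FreeBSD tree: struct mock_prison, struct shm_ns, the pr_shmns field and all function names are hypothetical, and locking, reference counting and the actual mapping of memory are left out. It only shows the state that would otherwise live in global variables being reached through the jail, with a per-namespace limit set when the jail is created.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userland mock; every name here is hypothetical. */
#define SHM_NS_MAXSEGS 8

struct shm_seg {
	int	in_use;
	long	key;
	size_t	size;
};

/* State that would otherwise be kept in global variables. */
struct shm_ns {
	struct shm_seg	segs[SHM_NS_MAXSEGS];
	int		nsegs;		/* segments currently allocated */
	int		maxsegs;	/* per-namespace resource limit */
};

/* Stand-in for struct prison, reduced to what the sketch needs. */
struct mock_prison {
	struct mock_prison	*pr_parent;
	struct shm_ns		*pr_shmns;	/* hypothetical field */
};

/*
 * Set up a jail.  With own_ns != 0 (or no parent) it gets a separate
 * namespace limited to maxsegs segments; otherwise it shares the parent's.
 */
int
mock_prison_init(struct mock_prison *pr, struct mock_prison *parent,
    int own_ns, int maxsegs)
{
	pr->pr_parent = parent;
	if (!own_ns && parent != NULL) {
		pr->pr_shmns = parent->pr_shmns;
		return (0);
	}
	if (maxsegs < 0 || maxsegs > SHM_NS_MAXSEGS)
		return (-1);
	pr->pr_shmns = calloc(1, sizeof(*pr->pr_shmns));
	if (pr->pr_shmns == NULL)
		return (-1);
	pr->pr_shmns->maxsegs = maxsegs;
	return (0);
}

/* Look up a key in the caller's namespace only; slot index or -1. */
int
shm_find(const struct mock_prison *pr, long key)
{
	const struct shm_ns *ns;
	int i;

	assert(pr != NULL && pr->pr_shmns != NULL);
	ns = pr->pr_shmns;
	for (i = 0; i < SHM_NS_MAXSEGS; i++)
		if (ns->segs[i].in_use && ns->segs[i].key == key)
			return (i);
	return (-1);
}

/* Create a segment; -1 if the key exists or the limit is reached. */
int
shm_create(struct mock_prison *pr, long key, size_t size)
{
	struct shm_ns *ns;
	int i;

	if (shm_find(pr, key) != -1)
		return (-1);
	ns = pr->pr_shmns;
	if (ns->nsegs >= ns->maxsegs)
		return (-1);
	for (i = 0; i < SHM_NS_MAXSEGS; i++) {
		if (!ns->segs[i].in_use) {
			ns->segs[i].in_use = 1;
			ns->segs[i].key = key;
			ns->segs[i].size = size;
			ns->nsegs++;
			return (i);
		}
	}
	return (-1);
}
```

In this mock, a jail initialised with own_ns set cannot see a key created by the host, while a jail sharing the parent's namespace can. The mock does not address the harder question raised above of a process holding mappings obtained from two different namespaces.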