From: Robert Watson
Date: Tue, 4 Apr 2006 11:47:14 +0100 (BST)
To: Peter Jeremy
Cc: freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: new feature: private IPC for every jail
Message-ID: <20060404112938.G76562@fledge.watson.org>
In-Reply-To: <20060404100750.GG683@turion.vk2pj.dyndns.org>

On Tue, 4 Apr 2006, Peter Jeremy wrote:

> On Mon, 2006-Apr-03 16:34:59 +0100, Robert Watson wrote:
>> (2) The name space model for system v ipc is flat, so while it's desirable
>>     to allow the administrator in the host environment to monitor and
>>     control resource use in the jail (for example, delete allocated but
>>     unused segments), doing that requires developing an administrative
>>     model for it.
>
> The SysV SHM name space is made up of a 32-bit user-selected key which is
> mapped into a 32-bit (system chosen) identifier, which (on FreeBSD) is made
> up of a 16-bit pool identifier (in the range 0..shmmni-1) and a 16-bit
> generation counter.
>
> At the expense of restricting shmmni, the generation counter and JAIL_MAX,
> it would seem possible to embed prison.pr_id into the shmid and treat pr_id
> as an (implicit) part of the key - insisting they must match for jailed
> processes.  Since the name space remains the same, ipcs and ipcrm would not
> be affected and a non-jailed ipcrm could delete jailed IPC by identifier.
>
> On the surface, this approach looks easier than having a distinct name space
> associated with each prison (as per kern/48471) and has the advantage of
> allowing non-jailed processes to manage jailed IPC.  The disadvantage is
> restricting the ranges of various counters - though I believe they are
> overly generous by default.
>
> This doesn't really address the problem of SysV IPC and jails becoming more
> intimately entwined.

Hmm.  This sounds like it might be workable.  To make sure I understand your
proposal:

- We add a new prison ID field to the in-kernel description of each segment,
  semaphore, message queue, etc.  This is initialized to the prison ID of the
  process creating the object at the time of creation.

- shmget(), et al, in addition to matching the key when searching for an
  existing object, will also attempt to match the prison ID of the object to
  that of the calling process.  For the sake of completeness, we will use
  prison ID 0 for unjailed processes (or something along those lines).  This
  guarantees that two jails, or even the host and a jail, will never receive
  an ID already allocated to another jail, and in particular, not an ID for
  an object from another jail with the same key as might be used in the
  current jail.

- shmat(), et al, will perform an access control check to confirm that if a
  process is jailed, its prison ID matches that of the object.

Is it necessary, as you suggest, to change the IPC ID name space at all?  I
assume applications do consistently use shmget() to look up IDs, and that
they can't/don't make assumptions about long-term persistence of those
mappings across boot (which is effectively what a jail restart is)?  Is the
behavior of IXSEQ_TO_IPCID() something that has documented or relied-upon
properties, or are we free to perform a mapping from a name (key) to an
object (id) in any way we choose?

I guess another change is also needed:

- At jail termination, we GC all resources with the prison ID in question.
  This prevents a future jail from turning up with the same ID and seeing old
  shared memory (etc) segments.

Robert N M Watson
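
The scheme listed above (a prison ID stored per object, key lookup that also
matches the prison ID, an access check on use, and GC at jail termination)
can be sketched as a small userland model in C.  This is only an
illustration under stated assumptions, not FreeBSD kernel code: the table,
the functions model_get(), model_access() and model_jail_gc(), and the
constants NSEG and HOST_PRID are hypothetical names invented here.  The ID
is built from a slot index and a generation counter, in the spirit of the
16-bit/16-bit split described in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>

#define NSEG      8   /* hypothetical table size, standing in for shmmni */
#define HOST_PRID 0   /* assumption: prison ID 0 means "not jailed" */

struct seg {
    bool     in_use;
    long     key;     /* user-selected key */
    int      prid;    /* prison ID of the creating process */
    unsigned seq;     /* generation counter, bumped when the slot is freed */
};

static struct seg segs[NSEG];

/* Combine slot index and generation into an ID (low 16 bits: index). */
static int make_id(int ix, unsigned seq)
{
    assert(ix >= 0 && ix < NSEG);
    return (int)(((seq & 0x7fffu) << 16) | ((unsigned)ix & 0xffffu));
}

/* Lookup-or-create: an object matches only if key AND prison ID match. */
int model_get(long key, int prid)
{
    int ix;

    for (ix = 0; ix < NSEG; ix++)
        if (segs[ix].in_use && segs[ix].key == key && segs[ix].prid == prid)
            return make_id(ix, segs[ix].seq);

    for (ix = 0; ix < NSEG; ix++) {
        if (!segs[ix].in_use) {
            segs[ix].in_use = true;
            segs[ix].key = key;
            segs[ix].prid = prid;   /* recorded at creation time */
            return make_id(ix, segs[ix].seq);
        }
    }
    return -1;                      /* table full */
}

/* Access check: a jailed caller may only use objects from its own prison. */
bool model_access(int id, int prid)
{
    int ix = id & 0xffff;

    if (id < 0 || ix >= NSEG || !segs[ix].in_use)
        return false;
    if (make_id(ix, segs[ix].seq) != id)
        return false;               /* stale ID from an earlier generation */
    if (prid != HOST_PRID && segs[ix].prid != prid)
        return false;
    return true;
}

/* Jail termination: free every object owned by the prison. */
int model_jail_gc(int prid)
{
    int ix, freed = 0;

    for (ix = 0; ix < NSEG; ix++) {
        if (segs[ix].in_use && segs[ix].prid == prid) {
            segs[ix].in_use = false;
            segs[ix].seq++;         /* invalidate IDs handed out earlier */
            freed++;
        }
    }
    return freed;
}
```

In this model, model_get(42, 1) and model_get(42, 2) return different IDs
even though the key is the same; jail 2 is refused access to jail 1's ID,
while an unjailed caller (HOST_PRID) is allowed, which keeps host-side
administration by ID possible.  After model_jail_gc(1), the old ID no longer
validates, so a later jail reusing prison ID 1 gets a fresh object rather
than the old one.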