From owner-freebsd-virtualization@FreeBSD.ORG Sun May 3 10:32:49 2009 Return-Path: Delivered-To: virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C9D8B1065670 for ; Sun, 3 May 2009 10:32:49 +0000 (UTC) (envelope-from pcc@gmx.net) Received: from mail.gmx.net (mail.gmx.net [213.165.64.20]) by mx1.freebsd.org (Postfix) with SMTP id 11F0C8FC08 for ; Sun, 3 May 2009 10:32:48 +0000 (UTC) (envelope-from pcc@gmx.net) Received: (qmail 32159 invoked by uid 0); 3 May 2009 10:32:47 -0000 Received: from 84.163.208.160 by www057.gmx.net with HTTP; Sun, 03 May 2009 12:32:44 +0200 (CEST) Content-Type: text/plain; charset="iso-8859-1" Date: Sun, 03 May 2009 12:32:44 +0200 From: "Peter Cornelius" In-Reply-To: <49FC78DA.2010201@elischer.org> Message-ID: <20090503103244.44760@gmx.net> MIME-Version: 1.0 References: <20090413.220932.74699777.sthaug@nethelp.no> <49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net> <200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net> <20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org> To: Julian Elischer X-Authenticated: #491680 X-Flags: 0001 X-Mailer: WWW-Mail 6100 (Global Message Exchange) X-Priority: 5 X-Provags-ID: V01U2FsdGVkX18g2cN6GyIgJWHp+m8s5xrGWrf5XCAfa4LI+0YsyV BbnE5xLT/7Mj9rkSGgCS5bq/WqtD1qydGnxA== Content-Transfer-Encoding: 8bit X-GMX-UID: qhaVBEdwbHIhTtK4PjQ0UDsiJihyalD0 X-FuHaFi: 0.75 Cc: virtualization@freebsd.org Subject: Re: VIMAGE X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 May 2009 10:32:50 -0000 Re... > The situation is that right now jail and vimage are > orthogonal (ish) however in the future, > vimage will become a set of options on jail. Ah. 
SO it probably is kinda useless to try and stick a couple of jails 'inside' a vimage. Rgds., Peter. From owner-freebsd-virtualization@FreeBSD.ORG Sun May 3 17:51:51 2009 Return-Path: Delivered-To: virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D4C5A106567C for ; Sun, 3 May 2009 17:51:51 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outK.internet-mail-service.net (outk.internet-mail-service.net [216.240.47.234]) by mx1.freebsd.org (Postfix) with ESMTP id BA50F8FC08 for ; Sun, 3 May 2009 17:51:51 +0000 (UTC) (envelope-from julian@elischer.org) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id D52BD232E; Sun, 3 May 2009 10:51:51 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 2063F2D62C1; Sun, 3 May 2009 10:51:51 -0700 (PDT) Message-ID: <49FDD9B9.7090403@elischer.org> Date: Sun, 03 May 2009 10:51:53 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302) MIME-Version: 1.0 To: Peter Cornelius References: <20090413.220932.74699777.sthaug@nethelp.no> <49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net> <200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net> <20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org> <20090503103244.44760@gmx.net> In-Reply-To: <20090503103244.44760@gmx.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: virtualization@freebsd.org Subject: Re: VIMAGE X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD 
supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 May 2009 17:51:52 -0000 Peter Cornelius wrote: > Re... > >> The situation is that right now jail and vimage are >> orthogonal (ish) however in the future, >> vimage will become a set of options on jail. > > Ah. SO it probably is kinda useless to try and stick a couple of jails 'inside' a vimage. no you will be able to nest jails. some of them may have the vimage options and some may not. > > Rgds., > > Peter. From owner-freebsd-virtualization@FreeBSD.ORG Sun May 3 18:06:03 2009 Return-Path: Delivered-To: virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9F7781065673 for ; Sun, 3 May 2009 18:06:03 +0000 (UTC) (envelope-from nvass9573@gmx.com) Received: from mail.gmx.com (unknown [213.165.64.42]) by mx1.freebsd.org (Postfix) with SMTP id D9C408FC15 for ; Sun, 3 May 2009 18:06:02 +0000 (UTC) (envelope-from nvass9573@gmx.com) Received: (qmail invoked by alias); 03 May 2009 18:06:00 -0000 Received: from ipa114.43.107.79.tellas.gr (EHLO [169.254.0.4]) [79.107.43.114] by mail.gmx.com (mp-eu002) with SMTP; 03 May 2009 20:06:00 +0200 X-Authenticated: #46156728 X-Provags-ID: V01U2FsdGVkX1+7Cvxph067UhakhbBA/8GPnnrzCsJdKkC4uYztfC 6Drc5F0itYwFz1 Message-ID: <49FDDD02.3090803@gmx.com> Date: Sun, 03 May 2009 21:05:54 +0300 From: Nikos Vassiliadis User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) MIME-Version: 1.0 To: Julian Elischer References: <20090413.220932.74699777.sthaug@nethelp.no> <49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net> <200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net> <20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org> <20090503103244.44760@gmx.net> <49FDD9B9.7090403@elischer.org> In-Reply-To: <49FDD9B9.7090403@elischer.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 
7bit X-Y-GMX-Trusted: 0 X-FuHaFi: 0.71 Cc: virtualization@freebsd.org Subject: Re: VIMAGE X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 May 2009 18:06:03 -0000 Julian Elischer wrote: > Peter Cornelius wrote: >> Re... >> >>> The situation is that right now jail and vimage are >>> orthogonal (ish) however in the future, >>> vimage will become a set of options on jail. >> >> Ah. SO it probably is kinda useless to try and stick a couple of jails >> 'inside' a vimage. > > no you will be able to nest jails. > some of them may have the vimage options and some may not. What about vimages without jails? I can imagine some applications of VIMAGE which completely lack user-space processing. If I recall correctly a jail exists as far there is at least one process associated with it. Would that be feasible? Having a vimage with no processes? 
From owner-freebsd-virtualization@FreeBSD.ORG Mon May 4 02:55:55 2009 Return-Path: Delivered-To: virtualization@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B35E9106566B for ; Mon, 4 May 2009 02:55:55 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from gritton.org (gritton.org [161.58.222.4]) by mx1.freebsd.org (Postfix) with ESMTP id 77A4A8FC08 for ; Mon, 4 May 2009 02:55:55 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from glorfindel.gritton.org (c-76-27-80-223.hsd1.ut.comcast.net [76.27.80.223]) (authenticated bits=0) by gritton.org (8.13.6.20060614/8.13.6) with ESMTP id n442trPv023146; Sun, 3 May 2009 20:55:54 -0600 (MDT) Message-ID: <49FE5937.3000606@FreeBSD.org> Date: Sun, 03 May 2009 20:55:51 -0600 From: Jamie Gritton User-Agent: Thunderbird 2.0.0.19 (X11/20090220) MIME-Version: 1.0 To: Nikos Vassiliadis References: <20090413.220932.74699777.sthaug@nethelp.no> <49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net> <200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net> <20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org> <20090503103244.44760@gmx.net> <49FDD9B9.7090403@elischer.org> <49FDDD02.3090803@gmx.com> In-Reply-To: <49FDDD02.3090803@gmx.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.94.2/9321/Sun May 3 16:33:47 2009 on gritton.org X-Virus-Status: Clean X-Mailman-Approved-At: Mon, 04 May 2009 03:59:52 +0000 Cc: virtualization@FreeBSD.org Subject: Re: VIMAGE X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 May 2009 02:55:55 -0000 Nikos Vassiliadis wrote: > Julian Elischer wrote: >> Peter Cornelius wrote: >>> Re... 
>>> >>>> The situation is that right now jail and vimage are >>>> orthogonal (ish) however in the future, >>>> vimage will become a set of options on jail. >>> >>> Ah. SO it probably is kinda useless to try and stick a couple of >>> jails 'inside' a vimage. >> >> no you will be able to nest jails. >> some of them may have the vimage options and some may not. > > What about vimages without jails? > I can imagine some applications of VIMAGE which completely > lack user-space processing. If I recall correctly a jail > exists as far there is at least one process associated with > it. Would that be feasible? > Having a vimage with no processes? Jails will be able to exist without processes, and in fact with nothing more than a vimage attached. But much of vimage only makes sense in conjunction with processes - a process attached to a vimage can see that vimage's network interfaces. There are still things like routing that work independently of processes, I suppose, but it seems to me much of what a vimage does is provide the network stack to the processes it's tied to. 
- Jamie From owner-freebsd-virtualization@FreeBSD.ORG Mon May 4 05:12:59 2009 Return-Path: Delivered-To: virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7EC901065673 for ; Mon, 4 May 2009 05:12:59 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outF.internet-mail-service.net (outf.internet-mail-service.net [216.240.47.229]) by mx1.freebsd.org (Postfix) with ESMTP id 61FD08FC1B for ; Mon, 4 May 2009 05:12:59 +0000 (UTC) (envelope-from julian@elischer.org) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id 7A12A21F8; Sun, 3 May 2009 22:12:59 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id C58972D6244; Sun, 3 May 2009 22:12:58 -0700 (PDT) Message-ID: <49FE795E.9040902@elischer.org> Date: Sun, 03 May 2009 22:13:02 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302) MIME-Version: 1.0 To: Nikos Vassiliadis References: <20090413.220932.74699777.sthaug@nethelp.no> <49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net> <200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net> <20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org> <20090503103244.44760@gmx.net> <49FDD9B9.7090403@elischer.org> <49FDDD02.3090803@gmx.com> In-Reply-To: <49FDDD02.3090803@gmx.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: virtualization@freebsd.org Subject: Re: VIMAGE X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 May 2009 05:12:59 -0000 Nikos Vassiliadis wrote: > Julian Elischer wrote: >> Peter Cornelius wrote: >>> Re... >>> >>>> The situation is that right now jail and vimage are >>>> orthogonal (ish) however in the future, >>>> vimage will become a set of options on jail. >>> >>> Ah. SO it probably is kinda useless to try and stick a couple of >>> jails 'inside' a vimage. >> >> no you will be able to nest jails. >> some of them may have the vimage options and some may not. > > What about vimages without jails? > I can imagine some applications of VIMAGE which completely > lack user-space processing. If I recall correctly a jail > exists as far there is at least one process associated with > it. Would that be feasible? > Having a vimage with no processes? at this time yes From owner-freebsd-virtualization@FreeBSD.ORG Mon May 4 02:50:37 2009 Return-Path: Delivered-To: virtualization@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1A91A1065674; Mon, 4 May 2009 02:50:37 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from gritton.org (gritton.org [161.58.222.4]) by mx1.freebsd.org (Postfix) with ESMTP id 7C1B88FC17; Mon, 4 May 2009 02:50:31 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from glorfindel.gritton.org (c-76-27-80-223.hsd1.ut.comcast.net [76.27.80.223]) (authenticated bits=0) by gritton.org (8.13.6.20060614/8.13.6) with ESMTP id n442VZl1020415; Sun, 3 May 2009 20:31:36 -0600 (MDT) Message-ID: <49FE5387.3020503@FreeBSD.org> Date: Sun, 03 May 2009 20:31:35 -0600 From: Jamie Gritton User-Agent: Thunderbird 2.0.0.19 (X11/20090220) MIME-Version: 1.0 To: jail@FreeBSD.org, virtualization@FreeBSD.org, current@FreeBSD.org Content-Type: multipart/mixed; boundary="------------080804010107070301060002" X-Virus-Scanned: ClamAV 0.94.2/9321/Sun May 3 16:33:47 2009 on gritton.org 
X-Virus-Status: Clean X-Mailman-Approved-At: Mon, 04 May 2009 06:47:16 +0000 Cc: Subject: New jail framework - the userland side X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 May 2009 02:50:37 -0000 This is a multi-part message in MIME format. --------------080804010107070301060002 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi all. I recently added some new jail-related system calls to extend the current jail system with an nmount-inspired name=value interface. This not only adds a new interface to the jail system, but allows for future extensions. For the first step, I've just added new system calls to set and read jail parameters. This is step 2: altering jail(8) and jls(8) to work with the new jails. With the included patch, the old "jail path hostname ip-number command..." command line turns to a more general "jail foo=bar baz=bletch ...". There's a set of core parameters to set the things jails can already do, plus the ability to set any parameters that other subsystems may want to tie to jails - work in progress includes the Linux MIB parameters, future ideas include separate namespaces for things like SYSV/Posix IPC. And of course, the plan is to use these new jails to tie in to the Vimage project. This patch is for the jail admin programs, and uses the current kernel as of r191673. You won't yet be able to do anything jails don't do already, but the interface is how I plan for things to look in the future. I'd appreciate comments from anyone who's interested in the future of lightweight virtualization. As a bonus, there are man pages included :-). 
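The name=value interface described above hands each parameter to the kernel as two consecutive struct iovec entries, name first and value second, much as nmount(2) does. What follows is a minimal sketch of building such a list, not code from the patch: the helper names jparam_set_string, jparam_set_int and jail_id_by_name are hypothetical, and the jail_get(2) call is only compiled on FreeBSD, so elsewhere the lookup simply returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/jail.h>
#endif

/*
 * Hypothetical helper (not part of the patch): fill one name/value pair
 * for a string parameter.  Each parameter takes two consecutive iovec
 * slots, and both lengths include the terminating NUL, as in the
 * killall.c hunk of the patch.
 */
static void
jparam_set_string(struct iovec *pair, const char *name, char *value)
{
	assert(pair != NULL && name != NULL && value != NULL);
	pair[0].iov_base = (void *)name;
	pair[0].iov_len = strlen(name) + 1;
	pair[1].iov_base = value;
	pair[1].iov_len = strlen(value) + 1;
}

/* Hypothetical helper: the same layout for an integer parameter. */
static void
jparam_set_int(struct iovec *pair, const char *name, int *value)
{
	assert(pair != NULL && name != NULL && value != NULL);
	pair[0].iov_base = (void *)name;
	pair[0].iov_len = strlen(name) + 1;
	pair[1].iov_base = value;
	pair[1].iov_len = sizeof(*value);
}

/*
 * Look up a jail ID by jail name.  Returns the jid, or -1 on error or
 * on systems that do not provide jail_get(2).
 */
static int
jail_id_by_name(char *jname)
{
	struct iovec iov[2];

	jparam_set_string(iov, "name", jname);
#ifdef __FreeBSD__
	return (jail_get(iov, 2, 0));
#else
	(void)iov;
	return (-1);
#endif
}
```

This mirrors what the patched killall -j does when its argument is not a number: it asks jail_get() for the jail whose "name" matches, then passes the returned jid to jail_attach().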
- Jamie --------------080804010107070301060002 Content-Type: text/plain; name="ju.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ju.diff" Index: usr.bin/killall/killall.1 =================================================================== --- usr.bin/killall/killall.1 (revision 191694) +++ usr.bin/killall/killall.1 (working copy) @@ -24,7 +24,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 9, 2007 +.Dd April 30, 2009 .Os .Dt KILLALL 1 .Sh NAME @@ -34,7 +34,7 @@ .Nm .Op Fl delmsvz .Op Fl help -.Op Fl j Ar jid +.Op Fl j Ar jail .Op Fl u Ar user .Op Fl t Ar tty .Op Fl c Ar procname @@ -91,9 +91,9 @@ (with or without a leading .Dq Li SIG ) , or numerically. -.It Fl j Ar jid -Kill processes in the jail specified by -.Ar jid . +.It Fl j Ar jail +Kill processes in the specified +.Ar jail . .It Fl u Ar user Limit potentially matching processes to those belonging to the specified Index: usr.bin/killall/killall.c =================================================================== --- usr.bin/killall/killall.c (revision 191694) +++ usr.bin/killall/killall.c (working copy) @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -51,7 +52,7 @@ usage(void) { - fprintf(stderr, "usage: killall [-delmsvz] [-help] [-j jid]\n"); + fprintf(stderr, "usage: killall [-delmsvz] [-help] [-j jail]\n"); fprintf(stderr, " [-u user] [-t tty] [-c cmd] [-SIGNAL] [cmd]...\n"); fprintf(stderr, "At least one option or argument to specify processes must be given.\n"); @@ -100,6 +101,7 @@ int main(int ac, char **av) { + struct iovec jparams[2]; struct kinfo_proc *procs = NULL, *newprocs; struct stat sb; struct passwd *pw; @@ -159,12 +161,21 @@ } jflag++; if (*av == NULL) - errx(1, "must specify jid"); - jid = strtol(*av, &ep, 10); - if (!*av || *ep) - errx(1, "illegal jid: %s", *av); + errx(1, "must specify jail"); + jid = strtoul(*av, &ep, 10); + if (!**av || *ep) { + *(const void **)&jparams[0].iov_base = + "name"; + jparams[0].iov_len = 
sizeof("name"); + jparams[1].iov_base = *av; + jparams[1].iov_len = strlen(*av) + 1; + jid = jail_get(jparams, 2, 0); + if (jid < 0) + errx(1, "unknown jail: %s", + *av); + } if (jail_attach(jid) == -1) - err(1, "jail_attach(): %d", jid); + err(1, "jail_attach(%d)", jid); break; case 'u': ++*av; Index: usr.sbin/jls/jls.c =================================================================== --- usr.sbin/jls/jls.c (revision 191694) +++ usr.sbin/jls/jls.c (working copy) @@ -1,6 +1,7 @@ /*- * Copyright (c) 2003 Mike Barcroft * Copyright (c) 2008 Bjoern A. Zeeb + * Copyright (c) 2009 James Gritton * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -23,18 +24,20 @@ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. - * - * $FreeBSD$ */ +#include +__FBSDID("$FreeBSD$"); + #include -#include #include +#include #include +#include -#include +#include #include -#include + #include #include #include @@ -43,215 +46,672 @@ #include #include -#define FLAG_A 0x00001 -#define FLAG_V 0x00002 +#define SJPARAM "security.jail.param" +#define ARRAY_SLOP 5 -#ifdef SUPPORT_OLD_XPRISON -static -char *print_xprison_v1(void *p, char *end, unsigned flags) +#define CTLTYPE_BOOL (CTLTYPE + 1) +#define CTLTYPE_NOBOOL (CTLTYPE + 2) +#define CTLTYPE_IPADDR (CTLTYPE + 3) +#define CTLTYPE_IP6ADDR (CTLTYPE + 4) + +#define PARAM_KEY 0x1 +#define PARAM_USER 0x2 +#define PARAM_ARRAY 0x4 +#define PARAM_OPT 0x8 + +#define PRINT_DEFAULT 0x01 +#define PRINT_VDEFAULT 0x02 +#define PRINT_HEADER 0x04 +#define PRINT_NAMEVAL 0x08 +#define PRINT_QUOTED 0x10 + +struct param { + char *name; + void *value; + size_t size; + int type; + unsigned flags; +}; + +struct iovec2 { + struct iovec name; + struct iovec value; +}; + +static struct param *params; +static int nparams; +static char errmsg[256]; + +static void add_param(const char *name, void *value, unsigned 
flags); +static int get_param(const char *name, struct param *param); +static int sort_param(const void *a, const void *b); +static char *noname(const char *name); +static char *nononame(const char *name); +static int print_jail(int pflags, int jflags); +static void quoted_print(char *str, int len); + +int +main(int argc, char **argv) { - struct xprison_v1 *xp; - struct in_addr in; + char *ep, *jname; + int c, i, jflags, jid, lastjid, pflags; - if ((char *)p + sizeof(struct xprison_v1) > end) - errx(1, "Invalid length for jail"); + jname = NULL; + pflags = jflags = jid = 0; + while ((c = getopt(argc, argv, "dj:hnqv")) >= 0) + switch (c) { + case 'd': + jflags |= JAIL_DYING; + break; + case 'j': + jid = strtoul(optarg, &ep, 10); + if (!*optarg || *ep) + jname = optarg; + break; + case 'h': + pflags |= PRINT_HEADER; + break; + case 'n': + pflags |= PRINT_NAMEVAL; + break; + case 'q': + pflags |= PRINT_QUOTED; + break; + case 'v': + pflags |= PRINT_VDEFAULT; + break; + default: + errx(1, "usage: jls [-dhnqv] [-j jail] [param ...]"); + } - xp = (struct xprison_v1 *)p; - if (flags & FLAG_V) { - printf("%6d %-29.29s %.74s\n", - xp->pr_id, xp->pr_host, xp->pr_path); - /* We are not printing an empty line here for state and name. */ - /* We are not printing an empty line here for cpusetid. */ - /* IPv4 address. */ - in.s_addr = htonl(xp->pr_ip); - printf("%6s %-15.15s\n", "", inet_ntoa(in)); + /* Add the parameters to print. 
*/ + if (optind == argc) { + if (pflags & PRINT_VDEFAULT) { + add_param("jid", NULL, PARAM_USER); + add_param("host.hostname", NULL, PARAM_USER); + add_param("path", NULL, PARAM_USER); + add_param("name", NULL, PARAM_USER); + add_param("dying", NULL, PARAM_USER); + add_param("cpuset", NULL, PARAM_USER); + add_param("ip4.addr", NULL, PARAM_USER); + add_param("ip6.addr", NULL, PARAM_USER | PARAM_OPT); + } else { + pflags |= PRINT_DEFAULT; + add_param("jid", NULL, PARAM_USER); + add_param("ip4.addr", NULL, PARAM_USER); + add_param("host.hostname", NULL, PARAM_USER); + add_param("path", NULL, PARAM_USER); + } + } else + while (optind < argc) + add_param(argv[optind++], NULL, PARAM_USER); + + /* Add the index key and errmsg parameters. */ + if (jid != 0) + add_param("jid", &jid, PARAM_KEY); + else if (jname != NULL) + add_param("name", jname, PARAM_KEY); + else + add_param("lastjid", &lastjid, PARAM_KEY); + add_param("errmsg", errmsg, PARAM_KEY); + + /* Print a header line if requested. */ + if (pflags & PRINT_VDEFAULT) + printf(" JID Hostname Path\n" + " Name State\n" + " CPUSetID\n" + " IP Address(es)\n"); + else if (pflags & PRINT_DEFAULT) + printf(" JID IP Address " + "Hostname Path\n"); + else if (pflags & PRINT_HEADER) { + for (i = 0; i < nparams; i++) + if (params[i].flags & PARAM_USER) { + if (i > 0) + putchar(' '); + fputs(params[i].name, stdout); + } + putchar('\n'); + } + + /* Fetch the jail(s) and print the parameters. 
*/ + if (jid != 0 || jname != NULL) { + if (print_jail(pflags, jflags) < 0) { + if (errmsg[0]) + errx(1, "%s", errmsg); + err(1, "jail_get"); + } } else { - printf("%6d %-15.15s %-29.29s %.74s\n", - xp->pr_id, inet_ntoa(in), xp->pr_host, xp->pr_path); + for (lastjid = 0; + (lastjid = print_jail(pflags, jflags)) >= 0; ) + ; + if (errno != 0 && errno != ENOENT) { + if (errmsg[0]) + errx(1, "%s", errmsg); + err(1, "jail_get"); + } } - return ((char *)(xp + 1)); + return (0); } -#endif -static -char *print_xprison_v3(void *p, char *end, unsigned flags) +static void +add_param(const char *name, void *value, unsigned flags) { - struct xprison *xp; - struct in_addr *iap, in; - struct in6_addr *ia6p; - char buf[INET6_ADDRSTRLEN]; - const char *state; - char *q; - uint32_t i; + struct param *param; + char *nname; + size_t mlen1, mlen2, buflen; + int mib1[CTL_MAXNAME], mib2[CTL_MAXNAME - 2]; + int i, tnparams; + char buf[MAXPATHLEN]; - if ((char *)p + sizeof(struct xprison) > end) - errx(1, "Invalid length for jail"); - xp = (struct xprison *)p; + static int paramlistsize; - if (xp->pr_state < 0 || xp->pr_state >= (int) - ((sizeof(prison_states) / sizeof(struct prison_state)))) - state = "(bogus)"; - else - state = prison_states[xp->pr_state].state_name; + /* The pseudo-parameter "all" scans the list of available parameters. */ + if (!strcmp(name, "all")) { + tnparams = nparams; + mib1[0] = 0; + mib1[1] = 2; + mlen1 = CTL_MAXNAME - 2; + if (sysctlnametomib(SJPARAM, mib1 + 2, &mlen1) < 0) + err(1, "sysctlnametomib(" SJPARAM ")"); + for (;;) { + /* Get the next parameter. */ + mlen2 = sizeof(mib2); + if (sysctl(mib1, mlen1 + 2, mib2, &mlen2, NULL, 0) < 0) + err(1, "sysctl(0.2)"); + if (mib2[0] != mib1[2] || mib2[1] != mib1[3] || + mib2[2] != mib1[4]) + break; + /* Convert it to an ascii name. 
*/ + memcpy(mib1 + 2, mib2, mlen2); + mlen1 = mlen2 / sizeof(int); + mib1[1] = 1; + buflen = sizeof(buf); + if (sysctl(mib1, mlen1 + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.1)"); + add_param(buf + sizeof(SJPARAM), NULL, flags); + /* + * Convert nobool parameters to bool if their + * counterpart is a node, otherwise discard them. + */ + param = &params[nparams - 1]; + if (param->type == CTLTYPE_NOBOOL) { + nname = nononame(param->name); + if (get_param(nname, param) >= 0 && + param->type != CTLTYPE_NODE) { + free(nname); + nparams--; + } else { + free(param->name); + param->name = nname; + param->type = CTLTYPE_BOOL; + param->size = sizeof(int); + param->value = NULL; + } + } + mib1[1] = 2; + } - /* See if we should print non-ACTIVE jails. No? */ - if ((flags & FLAG_A) == 0 && strcmp(state, "ALIVE")) { - q = (char *)(xp + 1); - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - return (q); + qsort(params + tnparams, (size_t)(nparams - tnparams), + sizeof(struct param), sort_param); + return; } - if (flags & FLAG_V) - printf("%6d %-29.29s %.74s\n", - xp->pr_id, xp->pr_host, xp->pr_path); + /* Check for repeat parameters. */ + for (i = 0; i < nparams; i++) + if (!strcmp(name, params[i].name)) { + params[i].value = value; + params[i].flags |= flags; + return; + } - /* Jail state and name. */ - if (flags & FLAG_V) - printf("%6s %-29.29s %.74s\n", - "", (xp->pr_name[0] != '\0') ? xp->pr_name : "", state); + /* Make sure there is room for the new param record. */ + if (!nparams) { + paramlistsize = 32; + params = malloc(paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "malloc"); + } else if (nparams >= paramlistsize) { + paramlistsize *= 2; + params = realloc(params, paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "realloc"); + } - /* cpusetid. 
*/ - if (flags & FLAG_V) - printf("%6s %-6d\n", - "", xp->pr_cpusetid); + /* Look up the parameter. */ + param = params + nparams++; + memset(param, 0, sizeof *param); + param->name = strdup(name); + if (param->name == NULL) + err(1, "strdup"); + param->flags = flags; + /* We have to know about pseudo-parameters without asking. */ + if (!strcmp(param->name, "lastjid")) { + param->type = CTLTYPE_INT; + param->size = sizeof(int); + goto got_type; + } + if (!strcmp(param->name, "errmsg")) { + param->type = CTLTYPE_STRING; + param->size = sizeof(errmsg); + goto got_type; + } + if (get_param(name, param) < 0) { + if (errno != ENOENT) + err(1, "sysctl(0.3.%s)", name); + /* See if this is the "no" part of an existing boolean. */ + if ((nname = nononame(name))) { + i = get_param(nname, param); + free(nname); + if (i >= 0 && param->type == CTLTYPE_BOOL) { + param->type = CTLTYPE_NOBOOL; + goto got_type; + } + } + if (flags & PARAM_OPT) { + nparams--; + return; + } + errx(1, "unknown parameter: %s", name); + } + if (param->type == CTLTYPE_NODE) { + /* + * A node isn't normally a parameter, but may be a boolean + * if its "no" counterpart exists. + */ + nname = noname(name); + i = get_param(nname, param); + free(nname); + if (i >= 0 && param->type == CTLTYPE_NOBOOL) { + param->type = CTLTYPE_BOOL; + goto got_type; + } + errx(1, "unknown parameter: %s", name); + } - q = (char *)(xp + 1); - /* IPv4 addresses. */ - iap = (struct in_addr *)(void *)q; - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - in.s_addr = 0; - for (i = 0; i < xp->pr_ip4s; i++) { - if (i == 0 || flags & FLAG_V) - in.s_addr = iap[i].s_addr; - if (flags & FLAG_V) - printf("%6s %-15.15s\n", "", inet_ntoa(in)); + got_type: + param->value = value; +} + +static int +get_param(const char *name, struct param *param) +{ + char *bufi, *p; + size_t buflen, mlen; + int mib[CTL_MAXNAME]; + char buf[MAXPATHLEN]; + + /* Look up the MIB. 
*/ + mib[0] = 0; + mib[1] = 3; + snprintf(buf, sizeof(buf), SJPARAM ".%s", name); + mlen = sizeof(mib) - 2 * sizeof(int); + if (sysctl(mib, 2, mib + 2, &mlen, buf, strlen(buf)) < 0) + return (-1); + /* Get the type and size. */ + mib[1] = 4; + buflen = sizeof(buf); + if (sysctl(mib, (mlen / sizeof(int)) + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.4.%s)", name); + param->type = *(int *)buf & CTLTYPE; + bufi = buf + sizeof(int); + p = strchr(bufi, '\0'); + if (p - 2 >= bufi && !strcmp(p - 2, ",a")) { + p[-2] = 0; + param->flags |= PARAM_ARRAY; } - /* IPv6 addresses. */ - ia6p = (struct in6_addr *)(void *)q; - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - for (i = 0; i < xp->pr_ip6s; i++) { - if (flags & FLAG_V) { - inet_ntop(AF_INET6, &ia6p[i], buf, sizeof(buf)); - printf("%6s %s\n", "", buf); + switch (param->type) { + case CTLTYPE_INT: + /* An integer parameter might be a boolean. */ + if (bufi[0] == 'B') + param->type = bufi[1] == 'N' + ? CTLTYPE_NOBOOL : CTLTYPE_BOOL; + case CTLTYPE_UINT: + param->size = sizeof(int); + break; + case CTLTYPE_LONG: + case CTLTYPE_ULONG: + param->size = sizeof(long); + break; + case CTLTYPE_STRUCT: + if (!strcmp(bufi, "S,in_addr")) { + param->type = CTLTYPE_IPADDR; + param->size = sizeof(struct in_addr); + } else if (!strcmp(bufi, "S,in6_addr")) { + param->type = CTLTYPE_IP6ADDR; + param->size = sizeof(struct in6_addr); } + break; + case CTLTYPE_STRING: + buf[0] = 0; + sysctl(mib + 2, mlen / sizeof(int), buf, &buflen, NULL, 0); + param->size = strtoul(buf, NULL, 10); + if (param->size == 0) + param->size = BUFSIZ; } + return (0); +} - /* If requested print the old style single line version. */ - if (!(flags & FLAG_V)) - printf("%6d %-15.15s %-29.29s %.74s\n", - xp->pr_id, (in.s_addr) ? 
inet_ntoa(in) : "", - xp->pr_host, xp->pr_path); +static int +sort_param(const void *a, const void *b) +{ + const struct param *parama, *paramb; + char *ap, *bp; - return (q); + /* Put top-level parameters first. */ + parama = a; + paramb = b; + ap = strchr(parama->name, '.'); + bp = strchr(paramb->name, '.'); + if (ap && !bp) + return (1); + if (bp && !ap) + return (-1); + return (strcmp(parama->name, paramb->name)); } -static void -usage(void) +static char * +noname(const char *name) { + char *nname, *p; - (void)fprintf(stderr, "usage: jls [-av]\n"); - exit(1); + nname = malloc(strlen(name) + 3); + if (nname == NULL) + err(1, "malloc"); + p = strrchr(name, '.'); + if (p != NULL) + sprintf(nname, "%.*s.no%s", p - name, name, p + 1); + else + sprintf(nname, "no%s", name); + return nname; } -int -main(int argc, char *argv[]) -{ - int ch, version; - unsigned flags; - size_t i, j, len; - void *p, *q; +static char * +nononame(const char *name) +{ + char *nname, *p; - flags = 0; - while ((ch = getopt(argc, argv, "av")) != -1) { - switch (ch) { - case 'a': - flags |= FLAG_A; - break; - case 'v': - flags |= FLAG_V; - break; - default: - usage(); - } - } - argc -= optind; - argv += optind; + p = strrchr(name, '.'); + if (strncmp(p ? 
p + 1 : name, "no", 2)) + return NULL; + nname = malloc(strlen(name) - 1); + if (nname == NULL) + err(1, "malloc"); + if (p != NULL) + sprintf(nname, "%.*s.%s", p - name, name, p + 3); + else + strcpy(nname, name + 2); + return nname; +} - if (sysctlbyname("security.jail.list", NULL, &len, NULL, 0) == -1) - err(1, "sysctlbyname(): security.jail.list"); +static int +print_jail(int pflags, int jflags) +{ + char *nname; + int i, ai, jid, count, sanity; + char ipbuf[INET6_ADDRSTRLEN]; - j = len; - for (i = 0; i < 4; i++) { - if (len <= 0) - exit(0); - p = q = malloc(len); - if (p == NULL) - err(1, "malloc()"); + static struct iovec2 *iov, *aiov; + static int narray, nkey; - if (sysctlbyname("security.jail.list", q, &len, NULL, 0) == -1) { - if (errno == ENOMEM) { - free(p); - p = NULL; - len += j; + /* Set up the parameter list(s) the first time around. */ + if (iov == NULL) { + iov = malloc(nparams * sizeof(struct iovec2)); + if (iov == NULL) + err(1, "malloc"); + for (i = narray = 0; i < nparams; i++) { + iov[i].name.iov_base = params[i].name; + iov[i].name.iov_len = strlen(params[i].name) + 1; + iov[i].value.iov_base = params[i].value; + iov[i].value.iov_len = + params[i].type == CTLTYPE_STRING && + params[i].value != NULL && + ((char *)params[i].value)[0] != '\0' + ? strlen(params[i].value) + 1 : params[i].size; + if (params[i].flags & (PARAM_KEY | PARAM_ARRAY)) { + narray++; + if (params[i].flags & PARAM_KEY) + nkey++; + } + } + if (narray > nkey) { + aiov = malloc(narray * sizeof(struct iovec2)); + if (aiov == NULL) + err(1, "malloc"); + for (i = ai = 0; i < nparams; i++) + if (params[i].flags & + (PARAM_KEY | PARAM_ARRAY)) + aiov[ai++] = iov[i]; + } + } + /* If there are array parameters, find their sizes. */ + if (aiov != NULL) { + for (ai = 0; ai < narray; ai++) + if (aiov[ai].value.iov_base == NULL) + aiov[ai].value.iov_len = 0; + if (jail_get((struct iovec *)aiov, 2 * narray, jflags) < 0) + return (-1); + } + /* Allocate storage for all parameters. 
*/ + for (i = ai = 0; i < nparams; i++) { + if (params[i].flags & (PARAM_KEY | PARAM_ARRAY)) { + if (params[i].flags & PARAM_ARRAY) { + iov[i].value.iov_len = aiov[ai].value.iov_len + + ARRAY_SLOP * params[i].size; + iov[i].value.iov_base = + malloc(iov[i].value.iov_len); + } + ai++; + } else + iov[i].value.iov_base = malloc(params[i].size); + if (iov[i].value.iov_base == NULL) + err(1, "malloc"); + if (params[i].value == NULL) + memset(iov[i].value.iov_base, 0, iov[i].value.iov_len); + } + /* + * Get the actual prison. If there are array elements, retry a few + * times in case the size changed from under us. + */ + if ((jid = jail_get((struct iovec *)iov, 2 * nparams, jflags)) < 0) { + if (errno != EINVAL || aiov == NULL || errmsg[0]) + return (-1); + for (sanity = 0;; sanity++) { + if (sanity == 10) + return (-1); + for (ai = 0; ai < narray; ai++) + if (params[i].flags & PARAM_ARRAY) + aiov[ai].value.iov_len = 0; + if (jail_get((struct iovec *)iov, 2 * narray, jflags) < + 0) + return (-1); + for (i = ai = 0; i < nparams; i++) { + if (!(params[i].flags & + (PARAM_KEY | PARAM_ARRAY))) + continue; + if (params[i].flags & PARAM_ARRAY) { + iov[i].value.iov_len = + aiov[ai].value.iov_len + + ARRAY_SLOP * params[i].size; + iov[i].value.iov_base = + realloc(iov[i].value.iov_base, + iov[i].value.iov_len); + if (iov[i].value.iov_base == NULL) + err(1, "malloc"); + } + ai++; + } + } + } + if (pflags & PRINT_VDEFAULT) { + printf("%6d %-29.29s %.74s\n" + "%6s %-29.29s %.74s\n" + "%6s %-6d\n", + *(int *)iov[0].value.iov_base, + (char *)iov[1].value.iov_base, + (char *)iov[2].value.iov_base, + "", + (char *)iov[3].value.iov_base, + *(int *)iov[4].value.iov_base ? 
"DYING" : "ACTIVE", + "", + *(int *)iov[5].value.iov_base); + count = iov[6].value.iov_len / sizeof(struct in_addr); + for (ai = 0; ai < count; ai++) + if (inet_ntop(AF_INET, + &((struct in_addr *)iov[6].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%6s %-15.15s\n", "", ipbuf); + if (!strcmp(params[7].name, "ip6.addr")) { + count = iov[7].value.iov_len / sizeof(struct in6_addr); + for (ai = 0; ai < count; ai++) + if (inet_ntop(AF_INET6, &((struct in_addr *) + iov[7].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%6s %-15.15s\n", "", ipbuf); + } + } else if (pflags & PRINT_DEFAULT) + printf("%6d %-15.15s %-29.29s %.74s\n", + *(int *)iov[0].value.iov_base, + iov[1].value.iov_len == 0 ? "-" + : inet_ntoa(*(struct in_addr *)iov[1].value.iov_base), + (char *)iov[2].value.iov_base, + (char *)iov[3].value.iov_base); + else { + for (i = 0; i < nparams; i++) { + if (!(params[i].flags & PARAM_USER)) continue; + if (i > 0) + putchar(' '); + if (pflags & PRINT_NAMEVAL) { + /* + * Generally "name=value", but for booleans + * either "name" or "noname". + */ + switch (params[i].type) { + case CTLTYPE_BOOL: + if (*(int *)iov[i].value.iov_base) + printf("%s", params[i].name); + else { + nname = noname(params[i].name); + printf("%s", nname); + free(nname); + } + break; + case CTLTYPE_NOBOOL: + if (*(int *)iov[i].value.iov_base) + printf("%s", params[i].name); + else { + nname = + nononame(params[i].name); + printf("%s", nname); + free(nname); + } + break; + default: + printf("%s=", params[i].name); + } } - err(1, "sysctlbyname(): security.jail.list"); + count = params[i].flags & PARAM_ARRAY + ? 
iov[i].value.iov_len / params[i].size : 1; + if (count == 0) + putchar('-'); + for (ai = 0; ai < count; ai++) { + if (ai > 0) + putchar(','); + switch (params[i].type) { + case CTLTYPE_INT: + printf("%d", ((int *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_UINT: + printf("%u", ((int *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_IPADDR: + if (inet_ntop(AF_INET, + &((struct in_addr *) + iov[i].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%s", ipbuf); + break; + case CTLTYPE_IP6ADDR: + if (inet_ntop(AF_INET6, + &((struct in6_addr *) + iov[i].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%s", ipbuf); + break; + case CTLTYPE_LONG: + printf("%ld", ((long *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_ULONG: + printf("%lu", ((long *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_STRING: + if (pflags & PRINT_QUOTED) + quoted_print((char *) + iov[i].value.iov_base, + params[i].size); + else + printf("%.*s", + params[i].size, (char *) + iov[i].value.iov_base); + break; + case CTLTYPE_BOOL: + case CTLTYPE_NOBOOL: + if (!(pflags & PRINT_NAMEVAL)) + printf(((int *) + iov[i].value.iov_base)[ai] + ? "true" : "false"); + } + } } - break; + putchar('\n'); } - if (p == NULL) - err(1, "sysctlbyname(): security.jail.list"); - if (len < sizeof(int)) - errx(1, "This is no prison. Kernel and userland out of sync?"); - version = *(int *)p; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. 
Kernel/userland out of sync?"); + for (i = 0; i < nparams; i++) + if (params[i].value == NULL) + free(iov[i].value.iov_base); + return (jid); +} - if (flags & FLAG_V) { - printf(" JID Hostname Path\n"); - printf(" Name State\n"); - printf(" CPUSetID\n"); - printf(" IP Address(es)\n"); - } else { - printf(" JID IP Address Hostname" - " Path\n"); +static void +quoted_print(char *str, int len) +{ + int c, qc; + char *p = str; + char *ep = str + len; + + /* An empty string needs quoting. */ + if (!*p) { + fputs("\"\"", stdout); + return; } - for (; q != NULL && (char *)q + sizeof(int) < (char *)p + len;) { - version = *(int *)q; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - switch (version) { -#ifdef SUPPORT_OLD_XPRISON - case 1: - q = print_xprison_v1(q, (char *)p + len, flags); - break; - case 2: - errx(1, "Version 2 was used by multi-IPv4 jail " - "implementations that never made it into the " - "official kernel."); - /* NOTREACHED */ - break; -#endif - case 3: - q = print_xprison_v3(q, (char *)p + len, flags); - break; - default: - errx(1, "Prison unknown. Kernel/userland out of sync?"); - /* NOTREACHED */ - break; - } + + /* + * The value will be surrounded by quotes if it contains spaces + * or quotes. + */ + qc = strchr(p, '\'') ? '"' + : strchr(p, '"') ? '\'' + : strchr(p, ' ') || strchr(p, '\t') ? 
'"' + : 0; + if (qc) + putchar(qc); + while (p < ep && (c = *p++)) { + if (c == '\\' || c == qc) + putchar('\\'); + putchar(c); } - - free(p); - exit(0); + if (qc) + putchar(qc); } Index: usr.sbin/jls/Makefile =================================================================== --- usr.sbin/jls/Makefile (revision 191694) +++ usr.sbin/jls/Makefile (working copy) @@ -4,6 +4,4 @@ MAN= jls.8 WARNS?= 6 -CFLAGS+= -DSUPPORT_OLD_XPRISON - .include Index: usr.sbin/jls/jls.8 =================================================================== --- usr.sbin/jls/jls.8 (revision 191694) +++ usr.sbin/jls/jls.8 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 29, 2008 +.Dd April 30, 2009 .Dt JLS 8 .Os .Sh NAME @@ -33,38 +33,59 @@ .Nd "list jails" .Sh SYNOPSIS .Nm -.Op Fl av +.Op Fl dhnqv +.Op Fl j Ar jail +.Op Ar parameter ... .Sh DESCRIPTION The .Nm -utility lists all jails. -By default only active jails are listed. +utility lists all active jails, or the specified jail. +Each jail is represented by one row which contains space-separated values of +the listed +.Ar parameters , +including the pseudo-parameter +.Va all +which will show all available jail parameters. +A list of available parameters can be retrieved via +.Dq Nm sysctl Fl d Va security.jail.param . .Pp -The options are as follows: -.Bl -tag -width ".Fl a" -.It Fl a -Show jails in all states, not only active ones. +If no +.Ar parameters +are given, the following four columns will be printed: +jail identifier (jid), IP address (ip4.addr), hostname (host.hostname), +and path (path). +.Pp +The following options are available: +.Bl -tag -width indent +.It Fl d +List +.Va dying +as well as active jails. +.It Fl h +Print a header line containing the parameters listed. +If no parameters are given on the command line, the default four-column +output always contains a header. +.It Fl n +Print parameters in +.Dq name=value +format, where each parameter is preceded by its name. 
+This option is ignored for the default four-column output. +.It Fl q +Put quotes around string parameters if they contain spaces or quotes, or are +the empty string. .It Fl v -Show more verbose information. -This also lists cpusets, jail state, multi-IP, etc. instead of the -classic single-IP jail output. +Print a multiple-line summary per jail, with the following parameters: +jail identifier (jid), hostname (host.hostname), path (path), +jail name (name), jail state (dying), cpuset ID (cpuset), +IP address(es) (ip4.addr and ip6.addr). +.It Fl j Ar jail +The jid or name of the +.Ar jail +to list. +Without this option, all active jails will be listed. .El -.Pp -Each jail is represented by rows which, depending on -.Fl v , -contain the following columns: -.Bl -item -offset indent -compact -.It -jail identifier (JID), hostname and path -.It -jail state and name -.It -jail cpuset -.It -followed by one IP adddress per line. -.El .Sh SEE ALSO -.Xr jail 2 , +.Xr jail_get 2 , .Xr jail 8 , .Xr jexec 8 .Sh HISTORY @@ -72,3 +93,5 @@ .Nm utility was added in .Fx 5.1 . +Extensible jail parameters were introduced in +.Fx 8.0 . Index: usr.sbin/jexec/jexec.c =================================================================== --- usr.sbin/jexec/jexec.c (revision 191694) +++ usr.sbin/jexec/jexec.c (working copy) @@ -29,12 +29,16 @@ #include #include +#include #include +#include +#include #include #include #include +#include #include #include #include @@ -43,154 +47,8 @@ #include static void usage(void); +static int addr2jid(const char *addr); -#ifdef SUPPORT_OLD_XPRISON -static -char *lookup_xprison_v1(void *p, char *end, int *id) -{ - struct xprison_v1 *xp; - - if (id == NULL) - errx(1, "Internal error. 
Invalid ID pointer."); - - if ((char *)p + sizeof(struct xprison_v1) > end) - errx(1, "Invalid length for jail"); - - xp = (struct xprison_v1 *)p; - - *id = xp->pr_id; - return ((char *)(xp + 1)); -} -#endif - -static -char *lookup_xprison_v3(void *p, char *end, int *id, char *jailname) -{ - struct xprison *xp; - char *q; - int ok; - - if (id == NULL) - errx(1, "Internal error. Invalid ID pointer."); - - if ((char *)p + sizeof(struct xprison) > end) - errx(1, "Invalid length for jail"); - - xp = (struct xprison *)p; - ok = 1; - - /* Jail state and name. */ - if (xp->pr_state < 0 || xp->pr_state >= - (int)((sizeof(prison_states) / sizeof(struct prison_state)))) - errx(1, "Invalid jail state."); - else if (xp->pr_state != PRISON_STATE_ALIVE) - ok = 0; - if (jailname != NULL) { - if (xp->pr_name[0] == '\0') - ok = 0; - else if (strcmp(jailname, xp->pr_name) != 0) - ok = 0; - } - - q = (char *)(xp + 1); - /* IPv4 addresses. */ - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if ((char *)q > end) - errx(1, "Invalid length for jail"); - /* IPv6 addresses. */ - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if ((char *)q > end) - errx(1, "Invalid length for jail"); - - if (ok) - *id = xp->pr_id; - return (q); -} - -static int -lookup_jail(int jid, char *jailname) -{ - size_t i, j, len; - void *p, *q; - int version, id, xid, count; - - if (sysctlbyname("security.jail.list", NULL, &len, NULL, 0) == -1) - err(1, "sysctlbyname(): security.jail.list"); - - j = len; - for (i = 0; i < 4; i++) { - if (len == 0) - return (-1); - p = q = malloc(len); - if (p == NULL) - err(1, "malloc()"); - - if (sysctlbyname("security.jail.list", q, &len, NULL, 0) == -1) { - if (errno == ENOMEM) { - free(p); - p = NULL; - len += j; - continue; - } - err(1, "sysctlbyname(): security.jail.list"); - } - break; - } - if (p == NULL) - err(1, "sysctlbyname(): security.jail.list"); - if (len < sizeof(int)) - errx(1, "This is no prison. 
Kernel and userland out of sync?"); - version = *(int *)p; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - - count = 0; - xid = -1; - for (; q != NULL && (char *)q + sizeof(int) < (char *)p + len;) { - version = *(int *)q; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - id = -1; - switch (version) { -#ifdef SUPPORT_OLD_XPRISON - case 1: - if (jailname != NULL) - errx(1, "Version 1 prisons did not " - "support jail names."); - q = lookup_xprison_v1(q, (char *)p + len, &id); - break; - case 2: - errx(1, "Version 2 was used by multi-IPv4 jail " - "implementations that never made it into the " - "official kernel."); - /* NOTREACHED */ - break; -#endif - case 3: - q = lookup_xprison_v3(q, (char *)p + len, &id, jailname); - break; - default: - errx(1, "Prison unknown. Kernel/userland out of sync?"); - /* NOTREACHED */ - break; - } - /* Possible match; see if we have a jail ID to match as well. */ - if (id > 0 && (jid <= 0 || id == jid)) { - xid = id; - count++; - } - } - - free(p); - - if (count == 1) - return (xid); - else if (count > 1) - errx(1, "Could not uniquely identify the jail."); - else - return (-1); -} - #define GET_USER_INFO do { \ pwd = getpwnam(username); \ if (pwd == NULL) { \ @@ -210,22 +68,18 @@ int main(int argc, char *argv[]) { + struct iovec params[2]; int jid; login_cap_t *lcap = NULL; struct passwd *pwd = NULL; gid_t groups[NGROUPS]; - int ch, ngroups, uflag, Uflag; - char *jailname, *username; + int ch, ngroups, uflag, Uflag, hflag; + char *ep, *username; + ch = uflag = Uflag = hflag = 0; + username = NULL; - ch = uflag = Uflag = 0; - jailname = username = NULL; - jid = -1; - - while ((ch = getopt(argc, argv, "i:n:u:U:")) != -1) { + while ((ch = getopt(argc, argv, "u:U:h")) != -1) { switch (ch) { - case 'n': - jailname = optarg; - break; case 'u': username = optarg; uflag = 1; @@ -234,6 +88,9 @@ username = optarg; Uflag = 1; break; + case 'h': + hflag = 1; + 
break; default: usage(); } @@ -242,22 +99,24 @@ argv += optind; if (argc < 2) usage(); - if (strlen(argv[0]) > 0) { - jid = (int)strtol(argv[0], NULL, 10); - if (errno) - err(1, "Unable to parse jail ID."); - } - if (jid <= 0 && jailname == NULL) { - fprintf(stderr, "Neither jail ID nor jail name given.\n"); - usage(); - } if (uflag && Uflag) usage(); if (uflag) GET_USER_INFO; - jid = lookup_jail(jid, jailname); - if (jid <= 0) - errx(1, "Cannot identify jail."); + if (hflag) + jid = addr2jid(argv[0]); + else { + jid = strtoul(argv[0], &ep, 10); + if (!*argv[0] || *ep) { + *(const void **)&params[0].iov_base = "name"; + params[0].iov_len = sizeof("name"); + params[1].iov_base = argv[0]; + params[1].iov_len = strlen(argv[0]) + 1; + jid = jail_get(params, 2, 0); + if (jid < 0) + errx(1, "Unknown jail: %s", argv[0]); + } + } if (jail_attach(jid) == -1) err(1, "jail_attach(): %d", jid); if (chdir("/") == -1) @@ -285,6 +144,108 @@ fprintf(stderr, "%s%s\n", "usage: jexec [-u username | -U username]", - " [-n jailname] jid command ..."); + " [-h hostname | -h ip-number | jail] command ..."); exit(1); } + +static int +addr2jid(const char *addr) +{ + struct iovec params[6]; + struct in_addr ia; + struct in6_addr ia6; + int cnt, doip, foundjid, ii, jid, lastjid, sanity; + char hostbuf[MAXHOSTNAMELEN]; + + if (inet_pton(AF_INET, addr, &ia) > 0) + doip = 4; + else if (inet_pton(AF_INET6, addr, &ia6) > 0) + doip = 6; + else + doip = 0; + + *(const void **)&params[0].iov_base = "lastjid"; + params[0].iov_len = sizeof("lastjid"); + params[1].iov_base = &lastjid; + params[1].iov_len = sizeof(lastjid); + switch (doip) { + case 4: + *(const void **)&params[2].iov_base = "ip4.addr"; + params[2].iov_len = sizeof("ip4.addr"); + *(const void **)&params[4].iov_base = "host.hostname"; + params[4].iov_len = sizeof("host.hostname"); + params[5].iov_base = hostbuf; + params[5].iov_len = MAXHOSTNAMELEN; + break; + case 6: + *(const void **)&params[2].iov_base = "ip6.addr"; + params[2].iov_len = 
sizeof("ip6.addr"); + *(const void **)&params[4].iov_base = "host.hostname"; + params[4].iov_len = sizeof("host.hostname"); + params[5].iov_base = hostbuf; + params[5].iov_len = MAXHOSTNAMELEN; + break; + default: + *(const void **)&params[2].iov_base = "host.hostname"; + params[2].iov_len = sizeof("host.hostname"); + params[3].iov_base = hostbuf; + params[3].iov_len = MAXHOSTNAMELEN; + } + + cnt = foundjid = sanity = 0; + for (jid = 0;; jid = lastjid) { + if (doip != 0) { + params[3].iov_base = NULL; + params[3].iov_len = 0; + if (jail_get(params, 4, 0) < 0) + break; + params[3].iov_len += 5 * sizeof(struct in6_addr); + params[3].iov_base = malloc(params[3].iov_len); + jid = jail_get(params, 6, 0); + } else + jid = jail_get(params, 4, 0); + if (jid > 0) { + sanity = 0; + if (!strcmp(hostbuf, addr)) { + cnt++; + foundjid = jid; + } else switch (doip) { + case 4: + for (ii = (params[3].iov_len / + sizeof(struct in_addr)) - 1; ii >= 0; ii--) + if (((struct in_addr *)params[3]. + iov_base)[ii].s_addr == ia.s_addr) { + cnt++; + foundjid = jid; + break; + } + break; + case 6: + for (ii = (params[3].iov_len / + sizeof(struct in6_addr)) - 1; ii >= 0; + ii--) + if (IN6_ARE_ADDR_EQUAL(&ia6, + &((struct in6_addr *) + params[3].iov_base)[ii])) { + cnt++; + foundjid = jid; + break; + } + } + } else if (errno == ENOENT || ++sanity > 10) + break; + else + jid = lastjid; + if (doip != 0) + free(params[3].iov_base); + } + switch (cnt) + { + case 0: + errx(1, "Unknown jail: %s", addr); + case 1: + return foundjid; + default: + errx(1, "Could not uniquely identify the jail: %s", addr); + } +} Index: usr.sbin/jexec/jexec.8 =================================================================== --- usr.sbin/jexec/jexec.8 (revision 191694) +++ usr.sbin/jexec/jexec.8 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 29, 2008 +.Dd April 30, 2009 .Dt JEXEC 8 .Os .Sh NAME @@ -34,36 +34,22 @@ .Sh SYNOPSIS .Nm .Op Fl u Ar username | Fl U Ar username -.Op Fl n Ar jailname -.Ar jid 
command ... +.Op Fl h Ar hostname | Fl h Ar ip | Ar jid | Ar name +.Ar command ... .Sh DESCRIPTION The .Nm utility executes .Ar command -inside the jail identified by either -.Ar jailname +inside the jail identified by +.Ar hostname , +.Ar ip , +.Ar jid , or -.Ar jid -or both. +.Ar name . .Pp -If the jail cannot be identified uniquely by the given parameters, -an error message is printed. -.Nm -will also check the state of the jail (once supported) to be -.Dv ALIVE -and ignore jails in other states. -The mandatory argument -.Ar jid -is the unique jail identifier as given by -.Xr jls 8 . -In case you only want to match on other criteria, give an empty string. -.Pp The following options are available: .Bl -tag -width indent -.It Fl n Ar jailname -The name of the jail, if given upon creation of the jail. -This is not the hostname of the jail. .It Fl u Ar username The user name from host environment as whom the .Ar command @@ -73,6 +59,9 @@ .Ar command should run. .El +.Sh "CAUTIONS" +Only a jail's jid or name is guaranteed to uniquely identify the jail. +Hostname or ip only work here if matched to one unique jail. .Sh SEE ALSO .Xr jail_attach 2 , .Xr jail 8 , Index: usr.sbin/jexec/Makefile =================================================================== --- usr.sbin/jexec/Makefile (revision 191694) +++ usr.sbin/jexec/Makefile (working copy) @@ -6,6 +6,4 @@ LDADD= -lutil WARNS?= 6 -CFLAGS+= -DSUPPORT_OLD_XPRISON - .include Index: usr.sbin/jail/jail.c =================================================================== --- usr.sbin/jail/jail.c (revision 191694) +++ usr.sbin/jail/jail.c (working copy) @@ -1,5 +1,6 @@ /*- * Copyright (c) 1999 Poul-Henning Kamp. + * Copyright (c) 2009 James Gritton * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without @@ -29,51 +30,43 @@ #include #include -#include #include #include -#include +#include +#include #include -#include -#include +#include #include #include #include #include +#include #include #include #include #include -#include #include #include -static void usage(void); -static int add_addresses(struct addrinfo *); -static struct in_addr *copy_addr4(void); -#ifdef INET6 -static struct in6_addr *copy_addr6(void); -#endif +#define SJPARAM "security.jail.param" +#define ERRMSG_SIZE 256 -extern char **environ; - -struct addr4entry { - STAILQ_ENTRY(addr4entry) addr4entries; - struct in_addr ip4; - int count; +struct param { + struct iovec name; + struct iovec value; }; -struct addr6entry { - STAILQ_ENTRY(addr6entry) addr6entries; -#ifdef INET6 - struct in6_addr ip6; -#endif - int count; -}; -STAILQ_HEAD(addr4head, addr4entry) addr4 = STAILQ_HEAD_INITIALIZER(addr4); -STAILQ_HEAD(addr6head, addr6entry) addr6 = STAILQ_HEAD_INITIALIZER(addr6); +static struct param *params; +static int nparams; + +static void set_param(const char *name, char *value); +static void set_param_ip_hostname(char *value, int family); +static void usage(void); + +extern char **environ; + #define GET_USER_INFO do { \ pwd = getpwnam(username); \ if (pwd == NULL) { \ @@ -94,27 +87,28 @@ main(int argc, char **argv) { login_cap_t *lcap = NULL; - struct jail j; + struct iovec rparams[2]; struct passwd *pwd = NULL; gid_t groups[NGROUPS]; - int ch, error, i, ngroups, securelevel; - int hflag, iflag, Jflag, lflag, uflag, Uflag; - char path[PATH_MAX], *jailname, *ep, *username, *JidFile, *ip; + int ch, cmdarg, i, jail_set_flags, jid, ngroups, oldargs, securelevel; + int iflag, Jflag, lflag, rflag, uflag, Uflag; + char *ep, *username, *JidFile; + char errmsg[ERRMSG_SIZE]; static char *cleanenv; const char *shell, *p = NULL; long ltmp; FILE *fp; - struct addrinfo hints, *res0; - hflag = iflag = Jflag = lflag = uflag = Uflag = 0; - 
securelevel = -1; - jailname = username = JidFile = cleanenv = NULL; + iflag = Jflag = lflag = rflag = uflag = Uflag = 0; + jail_set_flags = JAIL_CREATE | JAIL_UPDATE; + cmdarg = jid = securelevel = -1; + username = JidFile = cleanenv = NULL; fp = NULL; - while ((ch = getopt(argc, argv, "hiln:s:u:U:J:")) != -1) { + while ((ch = getopt(argc, argv, "cdilor:s:u:U:J:")) != -1) { switch (ch) { - case 'h': - hflag = 1; + case 'd': + jail_set_flags |= JAIL_DYING; break; case 'i': iflag = 1; @@ -123,9 +117,6 @@ JidFile = optarg; Jflag = 1; break; - case 'n': - jailname = optarg; - break; case 's': ltmp = strtol(optarg, &ep, 0); if (*ep || ep == optarg || ltmp > INT_MAX || !ltmp) @@ -143,13 +134,41 @@ case 'l': lflag = 1; break; + case 'c': + jail_set_flags = + (jail_set_flags & ~JAIL_UPDATE) | JAIL_CREATE; + break; + case 'o': + jail_set_flags = + (jail_set_flags & ~JAIL_CREATE) | JAIL_UPDATE; + break; + case 'r': + jid = strtoul(optarg, &ep, 10); + if (!*optarg || *ep) { + *(const void **)&rparams[0].iov_base = "name"; + rparams[0].iov_len = sizeof("name"); + rparams[1].iov_base = optarg; + rparams[1].iov_len = strlen(optarg) + 1; + jid = jail_get(rparams, 2, 0); + if (jid < 0) + errx(1, "unknown jail: %s", optarg); + } + rflag = 1; + break; default: usage(); } } argc -= optind; argv += optind; - if (argc < 4) + if (rflag) { + if (argc > 0 || iflag || Jflag || lflag || uflag || Uflag) + usage(); + if (jail_remove(jid) < 0) + err(1, "jail_remove"); + exit (0); + } + if (argc == 0) usage(); if (uflag && Uflag) usage(); @@ -157,92 +176,70 @@ usage(); if (uflag) GET_USER_INFO; - if (realpath(argv[0], path) == NULL) - err(1, "realpath: %s", argv[0]); - if (chdir(path) != 0) - err(1, "chdir: %s", path); - /* Initialize struct jail. */ - memset(&j, 0, sizeof(j)); - j.version = JAIL_API_VERSION; - j.path = path; - j.hostname = argv[1]; - if (jailname != NULL) - j.jailname = jailname; - /* Handle IP addresses. If requested resolve hostname too. 
*/ - bzero(&hints, sizeof(struct addrinfo)); - hints.ai_protocol = IPPROTO_TCP; - hints.ai_socktype = SOCK_STREAM; - if (JAIL_API_VERSION < 2) - hints.ai_family = PF_INET; - else - hints.ai_family = PF_UNSPEC; - /* Handle hostname. */ - if (hflag != 0) { - error = getaddrinfo(j.hostname, NULL, &hints, &res0); - if (error != 0) - errx(1, "failed to handle hostname: %s", - gai_strerror(error)); - error = add_addresses(res0); - freeaddrinfo(res0); - if (error != 0) - errx(1, "failed to add addresses."); + /* + * If the first argument (path) starts with a slash, and the third + * argument (IP address) starts with a digit, it is likely to be + * an old-style fixed-parameter command line. + */ + oldargs = argc >= 4 && argv[0][0] == '/' && isdigit(argv[2][0]); + if (oldargs) { + if ((jail_set_flags & (JAIL_CREATE | JAIL_UPDATE)) != + (JAIL_CREATE | JAIL_UPDATE)) + usage(); + jail_set_flags = JAIL_CREATE | JAIL_ATTACH; + set_param("path", argv[0]); + set_param("host.hostname", argv[1]); + set_param("ip4.addr", argv[2]); + cmdarg = 3; + } else { + for (i = 0; i < argc; i++) + if (!strncmp(argv[i], "command=", 8)) { + cmdarg = i; + argv[cmdarg] += 8; + jail_set_flags |= JAIL_ATTACH; + break; + } else + set_param(NULL, argv[i]); } - /* Handle IP addresses. */ - hints.ai_flags = AI_NUMERICHOST; - ip = strtok(argv[2], ","); - while (ip != NULL) { - error = getaddrinfo(ip, NULL, &hints, &res0); - if (error != 0) - errx(1, "failed to handle ip: %s", gai_strerror(error)); - error = add_addresses(res0); - freeaddrinfo(res0); - if (error != 0) - errx(1, "failed to add addresses."); - ip = strtok(NULL, ","); - } - /* Count IP addresses and add them to struct jail. 
*/ - if (!STAILQ_EMPTY(&addr4)) { - j.ip4s = STAILQ_FIRST(&addr4)->count; - j.ip4 = copy_addr4(); - if (j.ip4s > 0 && j.ip4 == NULL) - errx(1, "copy_addr4()"); - } -#ifdef INET6 - if (!STAILQ_EMPTY(&addr6)) { - j.ip6s = STAILQ_FIRST(&addr6)->count; - j.ip6 = copy_addr6(); - if (j.ip6s > 0 && j.ip6 == NULL) - errx(1, "copy_addr6()"); - } -#endif + errmsg[0] = 0; + set_param("errmsg", errmsg); if (Jflag) { fp = fopen(JidFile, "w"); if (fp == NULL) errx(1, "Could not create JidFile: %s", JidFile); } - i = jail(&j); - if (i == -1) - err(1, "syscall failed with"); + jid = jail_set(&params->name, 2 * nparams, jail_set_flags); + if (jid < 0) { + if (errmsg[0] != '\0') + errx(1, "%s", errmsg); + err(1, "jail_set"); + } if (iflag) { - printf("%d\n", i); + printf("%d\n", jid); fflush(stdout); } if (Jflag) { - if (fp != NULL) { + if (oldargs) fprintf(fp, "%d\t%s\t%s\t%s\t%s\n", - i, j.path, j.hostname, argv[2], argv[3]); - (void)fclose(fp); - } else { - errx(1, "Could not write JidFile: %s", JidFile); + jid, (char *)params[0].value.iov_base, + argv[1], argv[2], argv[3]); + else { + fprintf(fp, "%d", jid); + for (i = 0; i < argc; i++) + fprintf(fp, "\t%s", argv[i]); + fprintf(fp, "\n"); } + (void)fclose(fp); } if (securelevel > 0) { if (sysctlbyname("kern.securelevel", NULL, 0, &securelevel, sizeof(securelevel))) err(1, "Can not set securelevel to %d", securelevel); } + if (cmdarg < 0) + exit(0); if (username != NULL) { if (Uflag) GET_USER_INFO; @@ -272,158 +269,256 @@ if (p) setenv("TERM", p, 1); } - if (execv(argv[3], argv + 3) != 0) - err(1, "execv: %s", argv[3]); - exit(0); + execvp(argv[cmdarg], argv + cmdarg); + err(1, "execvp: %s", argv[cmdarg]); } static void -usage(void) +set_param(const char *name, char *value) { + struct param *param; + char *ep, *p; + size_t buflen, mlen; + int i, nval, mib[CTL_MAXNAME]; + char buf[MAXPATHLEN]; - (void)fprintf(stderr, "%s%s%s\n", - "usage: jail [-hi] [-n jailname] [-J jid_file] ", - "[-s securelevel] [-l -u username | -U username] ", - 
"path hostname [ip[,..]] command ..."); - exit(1); -} + static int paramlistsize; -static int -add_addresses(struct addrinfo *res0) -{ - int error; - struct addrinfo *res; - struct addr4entry *a4p; - struct sockaddr_in *sai; + /* Separate the name from the value, if not done already. */ + if (name == NULL) { + name = value; + if ((value = strchr(value, '='))) + *value++ = '\0'; + } + + /* Handle pseudo-parameters separately. */ + if (!strcmp(name, "ip4_hostname")) { + set_param_ip_hostname(value, AF_INET); + return; + } #ifdef INET6 - struct addr6entry *a6p; - struct sockaddr_in6 *sai6; + if (!strcmp(name, "ip6_hostname")) { + set_param_ip_hostname(value, AF_INET6); + return; + } #endif - int count; - error = 0; - for (res = res0; res && error == 0; res = res->ai_next) { - switch (res->ai_family) { - case AF_INET: - sai = (struct sockaddr_in *)(void *)res->ai_addr; - STAILQ_FOREACH(a4p, &addr4, addr4entries) { - if (bcmp(&sai->sin_addr, &a4p->ip4, - sizeof(struct in_addr)) == 0) { - err(1, "Ignoring duplicate IPv4 address."); - break; - } - } - a4p = (struct addr4entry *) malloc( - sizeof(struct addr4entry)); - if (a4p == NULL) { - error = 1; - break; - } - bzero(a4p, sizeof(struct addr4entry)); - bcopy(&sai->sin_addr, &a4p->ip4, - sizeof(struct in_addr)); - if (!STAILQ_EMPTY(&addr4)) - count = STAILQ_FIRST(&addr4)->count; - else - count = 0; - STAILQ_INSERT_TAIL(&addr4, a4p, addr4entries); - STAILQ_FIRST(&addr4)->count = count + 1; + /* Check for repeat parameters */ + for (i = 0; i < nparams; i++) + if (!strcmp(name, params[i].name.iov_base)) { + memcpy(params + i, params + i + 1, + (--nparams - i) * sizeof(struct param)); break; + } + + /* Make sure there is room for the new param record. 
*/ + if (!nparams) { + paramlistsize = 32; + params = malloc(paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "malloc"); + } else if (nparams >= paramlistsize) { + paramlistsize *= 2; + params = realloc(params, paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "realloc"); + } + + /* Look up the parameter. */ + param = params + nparams++; + *(const void **)&param->name.iov_base = name; + param->name.iov_len = strlen(name) + 1; + /* Trivial values - no value or errmsg. */ + if (value == NULL) { + param->value.iov_base = value; + param->value.iov_len = 0; + return; + } + if (!strcmp(name, "errmsg")) { + param->value.iov_base = value; + param->value.iov_len = ERRMSG_SIZE; + return; + } + mib[0] = 0; + mib[1] = 3; + snprintf(buf, sizeof(buf), SJPARAM ".%s", name); + mlen = sizeof(mib) - 2 * sizeof(int); + if (sysctl(mib, 2, mib + 2, &mlen, buf, strlen(buf)) < 0) + errx(1, "unknown parameter: %s", name); + mib[1] = 4; + buflen = sizeof(buf); + if (sysctl(mib, (mlen / sizeof(int)) + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.4.%s)", name); + /* + * See if this is an array type. + * Treat non-arrays as an array of one. + */ + p = strchr(buf + sizeof(int), '\0'); + nval = 1; + if (p - 2 >= buf && !strcmp(p - 2, ",a")) { + if (value[0] == '\0' || + (value[0] == '-' && value[1] == '\0')) { + param->value.iov_base = value; + param->value.iov_len = 0; + return; + } + p[-2] = 0; + for (p = strchr(value, ','); p; p = strchr(p + 1, ',')) { + *p = 0; + nval++; + } + } + + /* Set the values according to the parameter type. 
*/ + switch (*(int *)buf & CTLTYPE) { + case CTLTYPE_INT: + case CTLTYPE_UINT: + param->value.iov_len = nval * sizeof(int); + break; + case CTLTYPE_LONG: + case CTLTYPE_ULONG: + param->value.iov_len = nval * sizeof(long); + break; + case CTLTYPE_STRUCT: + if (!strcmp(buf + sizeof(int), "S,in_addr")) + param->value.iov_len = nval * sizeof(struct in_addr); #ifdef INET6 - case AF_INET6: - sai6 = (struct sockaddr_in6 *)(void *)res->ai_addr; - STAILQ_FOREACH(a6p, &addr6, addr6entries) { - if (bcmp(&sai6->sin6_addr, &a6p->ip6, - sizeof(struct in6_addr)) == 0) { - err(1, "Ignoring duplicate IPv6 address."); - break; - } + else if (!strcmp(buf + sizeof(int), "S,in6_addr")) + param->value.iov_len = nval * sizeof(struct in6_addr); +#endif + else + errx(1, "%s: unknown parameter structure (%s)", + name, buf + sizeof(int)); + break; + case CTLTYPE_STRING: + if (!strcmp(name, "path")) { + param->value.iov_base = malloc(MAXPATHLEN); + if (param->value.iov_base == NULL) + err(1, "malloc"); + if (realpath(value, param->value.iov_base) == NULL) + err(1, "%s: realpath(%s)", name, value); + if (chdir(param->value.iov_base) != 0) + err(1, "chdir: %s", + (char *)param->value.iov_base); + } else + param->value.iov_base = value; + param->value.iov_len = strlen(param->value.iov_base) + 1; + return; + default: + errx(1, "%s: unknown parameter type %d (%s)", + name, *(int *)buf, buf + sizeof(int)); + } + param->value.iov_base = malloc(param->value.iov_len); + for (i = 0; i < nval; i++) { + switch (*(int *)buf & CTLTYPE) { + case CTLTYPE_INT: + ((int *)param->value.iov_base)[i] = + strtol(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_UINT: + ((unsigned *)param->value.iov_base)[i] = + strtoul(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_LONG: + ((long *)param->value.iov_base)[i] = + strtol(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: 
non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_ULONG: + ((unsigned long *)param->value.iov_base)[i] = + strtoul(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_STRUCT: + if (!strcmp(buf + sizeof(int), "S,in_addr")) { + if (inet_pton(AF_INET, value, + &((struct in_addr *) + param->value.iov_base)[i]) != 1) + errx(1, "%s: not an IPv4 address: %s", + name, value); } - a6p = (struct addr6entry *) malloc( - sizeof(struct addr6entry)); - if (a6p == NULL) { - error = 1; - break; +#ifdef INET6 + else if (!strcmp(buf + sizeof(int), "S,in6_addr")) { + if (inet_pton(AF_INET6, value, + &((struct in6_addr *) + param->value.iov_base)[i]) != 1) + errx(1, "%s: not an IPv6 address: %s", + name, value); } - bzero(a6p, sizeof(struct addr6entry)); - bcopy(&sai6->sin6_addr, &a6p->ip6, - sizeof(struct in6_addr)); - if (!STAILQ_EMPTY(&addr6)) - count = STAILQ_FIRST(&addr6)->count; - else - count = 0; - STAILQ_INSERT_TAIL(&addr6, a6p, addr6entries); - STAILQ_FIRST(&addr6)->count = count + 1; - break; #endif - default: - err(1, "Address family %d not supported. Ignoring.\n", - res->ai_family); - break; } + value = strchr(value, '\0') + 1; } - - return (error); } -static struct in_addr * -copy_addr4(void) +static void +set_param_ip_hostname(char *value, int family) { - size_t len; - struct in_addr *ip4s, *p, ia; - struct addr4entry *a4p; + struct addrinfo hints, *ai0, *ai; + char *avalue, *nextav; + socklen_t avlen; + int error; - if (STAILQ_EMPTY(&addr4)) - return NULL; + /* Look up the hostname in the specified address family. 
*/ + memset(&hints, 0, sizeof(hints)); + hints.ai_family = family; + error = getaddrinfo(value, NULL, &hints, &ai0); + if (error != 0) + errx(1, "hostname %s: %s", value, gai_strerror(error)); - len = STAILQ_FIRST(&addr4)->count * sizeof(struct in_addr); - - ip4s = p = (struct in_addr *)malloc(len); - if (ip4s == NULL) - return (NULL); - - bzero(p, len); - - while (!STAILQ_EMPTY(&addr4)) { - a4p = STAILQ_FIRST(&addr4); - STAILQ_REMOVE_HEAD(&addr4, addr4entries); - ia.s_addr = a4p->ip4.s_addr; - bcopy(&ia, p, sizeof(struct in_addr)); - p++; - free(a4p); + /* Convert the addresses to ASCII so set_param can convert them back. */ + avlen = 0; + for (ai = ai0; ai; ai = ai->ai_next) + avlen++; + avlen *= +#ifdef INET6 + family == AF_INET6 ? INET6_ADDRSTRLEN : +#endif + INET_ADDRSTRLEN; + avalue = malloc(avlen); + if (avalue == NULL) + err(1, "malloc"); + avalue[0] = 0; + for (nextav = avalue, ai = ai0; ai; ai = ai->ai_next) { + if (inet_ntop(family, +#ifdef INET6 + family == AF_INET6 ? + (void *)&((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr : +#endif + (void *)&((struct sockaddr_in *)ai->ai_addr)->sin_addr, + nextav, avlen - (nextav - avalue)) == NULL) + err(1, "inet_ntop"); + if (ai->ai_next) { + nextav = strchr(nextav, '\0'); + *nextav++ = ','; + } } - - return (ip4s); + set_param( +#ifdef INET6 + family == AF_INET6 ?
"ip6.addr" : +#endif + "ip4.addr", avalue); } -#ifdef INET6 -static struct in6_addr * -copy_addr6(void) +static void +usage(void) { - size_t len; - struct in6_addr *ip6s, *p; - struct addr6entry *a6p; - if (STAILQ_EMPTY(&addr6)) - return NULL; - - len = STAILQ_FIRST(&addr6)->count * sizeof(struct in6_addr); - - ip6s = p = (struct in6_addr *)malloc(len); - if (ip6s == NULL) - return (NULL); - - bzero(p, len); - - while (!STAILQ_EMPTY(&addr6)) { - a6p = STAILQ_FIRST(&addr6); - STAILQ_REMOVE_HEAD(&addr6, addr6entries); - bcopy(&a6p->ip6, p, sizeof(struct in6_addr)); - p++; - free(a6p); - } - - return (ip6s); + (void)fprintf(stderr, + "usage: jail [-d] [-i] [-J jid_file] [-s securelevel]\n" + " [-l -u username | -U username]\n" + " [[-c | -o] param=value ... [command=command ...] |\n" + " path hostname ip command ...]\n" + " jail [-r jail]\n"); + exit(1); } -#endif - Index: usr.sbin/jail/jail.8 =================================================================== --- usr.sbin/jail/jail.8 (revision 191694) +++ usr.sbin/jail/jail.8 (working copy) @@ -1,5 +1,6 @@ .\" .\" Copyright (c) 2000, 2003 Robert N. M. Watson +.\" Copyright (c) 2008 James Gritton .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without @@ -33,49 +34,37 @@ .\" .\" $FreeBSD$ .\" -.Dd January 24, 2009 +.Dd April 30, 2009 .Dt JAIL 8 .Os .Sh NAME .Nm jail -.Nd "imprison process and its descendants" +.Nd "create or modify a system jail" .Sh SYNOPSIS .Nm -.Op Fl hi -.Op Fl n Ar jailname +.Op Fl di .Op Fl J Ar jid_file .Op Fl s Ar securelevel .Op Fl l u Ar username | Fl U Ar username -.Ar path hostname [ip[,..]] command ... +.Op Fl c | o +.Op Ar parameter=value ... | path hostname ip command ... +.Br +.Nm +.Op Fl r Ar jail .Sh DESCRIPTION The .Nm -utility imprisons a process and all future descendants. +utility creates a new jail or modifies an existing jail, optionally +imprisoning the current process (and future descendants) inside it. 
.Pp The options are as follows: -.Bl -tag -width ".Fl u Ar username" -.It Fl h -Resolve -.Va hostname -and add all IP addresses returned by the resolver -to the list of -.Va ip-addresses -for this prison. -This may affect default address selection for outgoing IPv4 connections -of prisons. -The address first returned by the resolver for each address family -will be used as primary address. -See -.Va ip-addresses -further down for details. +.Bl -tag -width indent +.It Fl d +Allow making changes to a +.Va +dying jail. .It Fl i Output the jail identifier of the newly created jail. -.It Fl n Ar jailname -Assign and administrative name to the jail that can be used for management -or auditing purposes. -The system will -.Sy not enforce -the name to be unique. .It Fl J Ar jid_file Write a .Ar jid_file @@ -100,7 +89,10 @@ .It Fl s Ar securelevel Sets the .Va kern.securelevel -sysctl variable to the specified value inside the newly created jail. +MIB entry to the specified value inside the newly created jail. +This is equivalent to setting the jail's +.Va securelevel +parameter. .It Fl u Ar username The user name from host environment as whom the .Ar command @@ -109,20 +101,156 @@ The user name from jailed environment as whom the .Ar command should run. -.It Ar path +.It Fl c +Create a new jail, but do not modify an existing one. +Default behavior is to allow modification if a +.Va jid +or +.Va name +parameter refers to an existing jail. +.It Fl o +Only modify an existing jail, but do not create one. +One of the +.Va jid +or +.Va name +parameters must exist and refer to an existing jail. +.It Fl r +Remove the +.Ar jail +specified by jid or name. +All jailed processes are killed. +.El +.Pp +.Ar Parameters +are listed in +.Dq name=value +form, following the options. +Some parameters are boolean, and do not have a value but are set by the +name alone with or without a +.Dq no +prefix, e.g. +.Va persist +or +.Va nopersist . 
+Any parameters not set will be given default values, generally based on the +current environment. +.Pp +The pseudo-parameter +.Va command +specifies that the current process should enter the new (or modified) jail, +and run the specified command. +It must be the last parameter specified, because it includes not only +the value following the +.Sq = +sign, but also passes the rest of the arguments to the command. +.Pp +Instead of supplying named +.Ar parameters , +four fixed parameters may be supplied in order on the command line: +.Ar path , +.Ar hostname , +.Ar ip , +and +.Ar command . +As the +.Va jid +and +.Va name +parameters aren't in this list, this mode will always create a new jail, and +the +.Fl c +and +.Fl o +options don't apply. +.Pp +Jails have a set of core parameters, and modules can add their own jail +parameters. +The current set of available parameters can be retrieved via +.Dq Nm sysctl Fl d Va security.jail.param . +Some of the notable core parameters include: +.Bl -tag -width indent +.It Va jid +The jail identifier. +This will be assigned automatically to a new jail (or can be explicitly +set), and can be used to identify the jail for later modification, or +for such commands as +.Xr jls 8 +or +.Xr jexec 8 . +.It Va name +The jail name. +This is an arbitrary string that identifies a jail. +Like the +.Va jid , +it can be passed to later +.Nm +commands, or to +.Xr jls 8 +or +.Xr jexec 8 . +If no +.Va name +is supplied, a default is assumed that is the same as the +.Va jid . +.It Va path Directory which is to be the root of the prison. -.It Ar hostname -Hostname of the prison. -.It Ar ip-addresses -None, one or more IPv4 and IPv6 addresses assigned to the prison. -The first address of each address family that was assigned to the jail will -be used as the source address in case source address selection on unbound -sockets cannot find a better match. +The +.Va command +(if any) is run from this directory, as are commands from +.Xr jexec 8 .
+.It Va ip4.addr +A comma-separated list of IPv4 addresses assigned to the prison. +If this is set, the jail is restricted to using only these addresses. +Any attempts to use other addresses fail, and attempts to use wildcard +addresses silently use the jailed address instead. +For IPv4 the first address given will be used as the source address +in case source address selection on unbound sockets cannot find a better +match. It is only possible to start multiple jails with the same IP address, if none of the jails has more than this single overlapping IP address -assigned to itself for the address family in question. -.It Ar command -Pathname of the program which is to be executed. +assigned to itself. +.Pp +A list of zero elements (an empty string) will stop the jail from using IPv4 +entirely; setting the boolean parameter +.Ar noip4 +will not restrict the jail at all. +.It Va ip6.addr +A list of IPv6 addresses assigned to the prison, the counterpart to +.Ar ip4.addr +above. +.It Va host.hostname +Hostname of the prison. +If not specified, a jail will use the system hostname. +.It Va ip4_hostname +.It Va ip6_hostname +These pseudo-parameters actually set the jail's +.Va ip4.addr +and +.Va ip6.addr +parameters, but will get those addresses by resolving the supplied hostname. +.It Va securelevel +The value of the jail's +.Va kern.securelevel +sysctl. +A jail never has a lower securelevel than the default system, but by +setting this parameter it may have a higher one. +If the system securelevel is changed, any jail securelevels will be at +least as secure. +.It Va persist +Setting this boolean parameter allows a jail to exist without any +processes. +Normally, a jail is destroyed as its last process exits. +.It Va command +The command to run after creating or modifying the jail. +This command is run inside the jail, under the +.Va path +directory. +A new jail must have either the +.Va persist +or +.Va command +parameter set.
.El .Pp Jails are typically set up using one of two philosophies: either to @@ -142,10 +270,6 @@ This manual page documents the configuration steps necessary to support either of these steps, although the configuration steps may be refined based on local requirements. -.Pp -Please see the -.Xr jail 2 -man page for further details. .Sh EXAMPLES .Ss "Setting up a Jail Directory Tree" To set up a jail directory tree containing an entire @@ -605,7 +729,7 @@ a jail. This functionality is disabled by default, but can be enabled by setting this MIB entry to 1. -.It Va security.jail.jail_max_af_ips +.It Va security.jail.max_af_ips This MIB entry determines how may address per address family a prison may have. The default is 255. .El @@ -641,7 +765,7 @@ .Xr ps 1 , .Xr quota 1 , .Xr chroot 2 , -.Xr jail 2 , +.Xr jail_set 2 , .Xr jail_attach 2 , .Xr procfs 5 , .Xr rc.conf 5 , @@ -665,6 +789,8 @@ .Nm utility appeared in .Fx 4.0 . +Extensible jail parameters were introduced in +.Fx 8.0 . .Sh AUTHORS .An -nosplit The jail feature was written by @@ -683,6 +809,9 @@ originally done by .An Pawel Jakub Dawidek for IPv4. +.Pp +.An James Gritton +added the extensible jail parameters. .Sh BUGS Jail currently lacks the ability to allow access to specific jail information via Index: sys/sys/jail.h =================================================================== --- sys/sys/jail.h (revision 191694) +++ sys/sys/jail.h (working copy) @@ -84,19 +84,11 @@ struct in6_addr pr_ip6[]; #endif }; -#define XPRISON_VERSION 3 +#define XPRISON_VERSION 3 -static const struct prison_state { - int pr_state; - const char * state_name; -} prison_states[] = { -#define PRISON_STATE_INVALID 0 - { PRISON_STATE_INVALID, "INVALID" }, -#define PRISON_STATE_ALIVE 1 - { PRISON_STATE_ALIVE, "ALIVE" }, -#define PRISON_STATE_DYING 2 - { PRISON_STATE_DYING, "DYING" }, -}; +#define PRISON_STATE_INVALID 0 +#define PRISON_STATE_ALIVE 1 +#define PRISON_STATE_DYING 2 /* * Flags for jail_set and jail_get. 
--------------080804010107070301060002-- From owner-freebsd-virtualization@FreeBSD.ORG Mon May 4 06:42:11 2009 Return-Path: Delivered-To: virtualization@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 07D0A1065676; Mon, 4 May 2009 06:42:11 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id C05998FC19; Mon, 4 May 2009 06:42:10 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 96DDA78D0B; Mon, 4 May 2009 06:25:20 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.3/8.14.3) with ESMTP id n446PKQI004425; Mon, 4 May 2009 06:25:20 GMT (envelope-from phk@critter.freebsd.dk) To: Jamie Gritton From: "Poul-Henning Kamp" In-Reply-To: Your message of "Sun, 03 May 2009 20:31:35 CST." <49FE5387.3020503@FreeBSD.org> Date: Mon, 04 May 2009 06:25:20 +0000 Message-ID: <4424.1241418320@critter.freebsd.dk> Sender: phk@critter.freebsd.dk X-Mailman-Approved-At: Mon, 04 May 2009 06:47:16 +0000 Cc: virtualization@FreeBSD.org, jail@FreeBSD.org, current@FreeBSD.org Subject: Re: New jail framework - the userland side X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 May 2009 06:42:11 -0000 In message <49FE5387.3020503@FreeBSD.org>, Jamie Gritton writes: >Hi all. I recently added some new jail-related system calls to extend >the current jail system with an nmount-inspired name=value interface. 
I think this is a great move in the right direction, my only concern is that we should try to share as much of the string-munging code between the nmount and jail implementations as possible. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-virtualization@FreeBSD.ORG Mon May 4 12:27:33 2009 Return-Path: Delivered-To: virtualization@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5163E1065678 for ; Mon, 4 May 2009 12:27:33 +0000 (UTC) (envelope-from nvass9573@gmx.com) Received: from mail.gmx.com (unknown [213.165.64.42]) by mx1.freebsd.org (Postfix) with SMTP id 866738FC1D for ; Mon, 4 May 2009 12:27:32 +0000 (UTC) (envelope-from nvass9573@gmx.com) Received: (qmail invoked by alias); 04 May 2009 12:27:30 -0000 Received: from ipa114.43.107.79.tellas.gr (EHLO [169.254.0.4]) [79.107.43.114] by mail.gmx.com (mp-eu005) with SMTP; 04 May 2009 14:27:30 +0200 X-Authenticated: #46156728 X-Provags-ID: V01U2FsdGVkX1+KNyVDOa6KHwRjka8QRHPvYmT/SD5jRK9aboWJ6Z Erl42sni4yBflc Message-ID: <49FEDF25.9060901@gmx.com> Date: Mon, 04 May 2009 15:27:17 +0300 From: Nikos Vassiliadis User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) MIME-Version: 1.0 To: Jamie Gritton References: <20090413.220932.74699777.sthaug@nethelp.no> <49E57076.7040509@elischer.org> <20090424202923.235660@gmx.net> <200904242249.27640.zec@icir.org> <20090425133006.311010@gmx.net> <20090502131259.31160@gmx.net> <49FC78DA.2010201@elischer.org> <20090503103244.44760@gmx.net> <49FDD9B9.7090403@elischer.org> <49FDDD02.3090803@gmx.com> <49FE5937.3000606@FreeBSD.org> In-Reply-To: <49FE5937.3000606@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Y-GMX-Trusted: 0 X-FuHaFi: 0.67 Cc: virtualization@FreeBSD.org Subject: 
Re: VIMAGE X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 May 2009 12:27:33 -0000 Jamie Gritton wrote: > Jails will be able to exist without processes, and in fact with nothing > more than a vimage attached. Ah that's what I was looking for. But much of vimage only makes sense in > conjunction with processes - a process attached to a vimage can see that > vimage's network interfaces. There are still things like routing that > work independent of processes I suppose, but it seems to me much what a > vimage does is provide the network stack to the processes it's tied to. Yet, VIMAGE is very similar in concept with VRF (http://en.wikipedia.org/wiki/VRF) and I think FreeBSD will look very promising in router-like applications:) Maybe there are applications of VIMAGE which haven't been considered by its developers. Time will tell... 
Nikos From owner-freebsd-virtualization@FreeBSD.ORG Mon May 4 13:17:58 2009 Return-Path: Delivered-To: virtualization@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CA40E106566B; Mon, 4 May 2009 13:17:58 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from gritton.org (gritton.org [161.58.222.4]) by mx1.freebsd.org (Postfix) with ESMTP id 8EB8C8FC21; Mon, 4 May 2009 13:17:58 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from glorfindel.gritton.org (c-76-27-80-223.hsd1.ut.comcast.net [76.27.80.223]) (authenticated bits=0) by gritton.org (8.13.6.20060614/8.13.6) with ESMTP id n44DHu8T003446; Mon, 4 May 2009 07:17:57 -0600 (MDT) Message-ID: <49FEEB03.7060908@FreeBSD.org> Date: Mon, 04 May 2009 07:17:55 -0600 From: Jamie Gritton User-Agent: Thunderbird 2.0.0.19 (X11/20090220) MIME-Version: 1.0 To: Poul-Henning Kamp References: <4424.1241418320@critter.freebsd.dk> In-Reply-To: <4424.1241418320@critter.freebsd.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.94.2/9325/Mon May 4 06:17:20 2009 on gritton.org X-Virus-Status: Clean Cc: virtualization@FreeBSD.org, jail@FreeBSD.org, current@FreeBSD.org Subject: Re: New jail framework - the userland side X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 May 2009 13:17:59 -0000 Poul-Henning Kamp wrote: > In message <49FE5387.3020503@FreeBSD.org>, Jamie Gritton writes: > >> Hi all. I recently added some new jail-related system calls to extend >> the current jail system with an nmount-inspired name=value interface. 
> > I think this is a great move in the right direction, my only concern is > that we should try to share as much of the string-munging code between > the nmount and jail implementations as possible. Most of it is shared - jail actually calls vfs_getopt and related calls from the family. I might want to spin those functions off into their own subsystem at some point, now that they're officially used outside of VFS. I did have to extend things somewhat for jail_get, as nmount is write-only and only had to deal with one module at a time (the filesystem type). Those extensions are available for use elsewhere, as I suspect filesystems and jails aren't the only place where we could use name-based extensibility. - Jamie From owner-freebsd-virtualization@FreeBSD.ORG Thu May 7 16:01:13 2009 Return-Path: Delivered-To: freebsd-virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 86283106566C for ; Thu, 7 May 2009 16:01:13 +0000 (UTC) (envelope-from Carlos.Paniago@cnptia.embrapa.br) Received: from plutao.cnptia.embrapa.br (plutao.cnptia.embrapa.br [200.0.70.9]) by mx1.freebsd.org (Postfix) with ESMTP id 3E3568FC16 for ; Thu, 7 May 2009 16:01:13 +0000 (UTC) (envelope-from Carlos.Paniago@cnptia.embrapa.br) Received: from localhost (localhost.cnptia.embrapa.br [127.0.0.1]) by plutao.cnptia.embrapa.br (Postfix) with ESMTP id 26AD38477C for ; Thu, 7 May 2009 12:41:59 -0300 (BRT) X-Virus-Scanned: amavisd-new at cnptia.embrapa.br Received: from plutao.cnptia.embrapa.br ([127.0.0.1]) by localhost (plutao.cnptia.embrapa.br [127.0.0.1]) (amavisd-new, port 10024) with LMTP id bpgjij0BcPvy for ; Thu, 7 May 2009 12:41:55 -0300 (BRT) Received: from a037.cnptia.embrapa.br (a037.cnptia.embrapa.br [10.129.252.37]) by plutao.cnptia.embrapa.br (Postfix) with ESMTP id D629084730 for ; Thu, 7 May 2009 12:41:55 -0300 (BRT) Message-ID: <4A030140.4060509@cnptia.embrapa.br> Date: Thu, 07 May 2009 12:41:52
-0300 From: Carlos Fernando Assis Paniago User-Agent: Thunderbird 2.0.0.21 (X11/20090406) MIME-Version: 1.0 To: freebsd-virtualization@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Thu, 07 May 2009 16:08:28 +0000 Subject: Port of virtualbox in FreeBSD i386/amd64. X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 May 2009 16:01:13 -0000 People: look at the article below: http://miwi.bsdcrew.de/2009/05/virtualbox-on-freebsd/ we can get the port in: svn co http://svn.bluelife.at/projects/packages/blueports/emulators/virtualbox/ I'm compiling in an amd64 machine... Now we can help with virtualbox, because there is a port for it (unofficial). Paniago From owner-freebsd-virtualization@FreeBSD.ORG Fri May 8 04:35:11 2009 Return-Path: Delivered-To: freebsd-virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 45335106564A for ; Fri, 8 May 2009 04:35:11 +0000 (UTC) (envelope-from clcchu@hotmail.com) Received: from col0-omc2-s9.col0.hotmail.com (col0-omc2-s9.col0.hotmail.com [65.55.34.83]) by mx1.freebsd.org (Postfix) with ESMTP id 234AD8FC12 for ; Fri, 8 May 2009 04:35:10 +0000 (UTC) (envelope-from clcchu@hotmail.com) Received: from COL101-W30 ([65.55.34.72]) by col0-omc2-s9.col0.hotmail.com with Microsoft SMTPSVC(6.0.3790.3959); Thu, 7 May 2009 21:35:10 -0700 Message-ID: X-Originating-IP: [123.203.190.250] From: Clarence Chu To: , Date: Fri, 8 May 2009 12:35:10 +0800 Importance: Normal In-Reply-To: <4A030140.4060509@cnptia.embrapa.br> References: <4A030140.4060509@cnptia.embrapa.br> MIME-Version: 1.0 X-OriginalArrivalTime: 08 May 2009 04:35:10.0982 (UTC) FILETIME=[5F6D6A60:01C9CF96]
Content-Type: text/plain; charset="big5" Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: RE: Port of virtualbox in FreeBSD i386/amd64, emulators/qemu X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 May 2009 04:35:11 -0000 > Subject: Port of virtualbox in FreeBSD i386/amd64. > > People: look at the article bellow: > > http://miwi.bsdcrew.de/2009/05/virtualbox-on-freebsd/ > > we can get the port in: > > svn co > http://svn.bluelife.at/projects/packages/blueports/emulators/virtualbox/ > > FYI, ports/emulators/qemu as of version 0.10.3 may act as VMM for Vista-x86, MacOSX/Leopard (Hackintosh). not to mention -arm, -x86_64, -etc. virtualbox can only be x86-only VMM, AFAIK. It's definitely great news to be able to run VirtualBox under FreeBSD! Best wishes, Clarence CHU
From owner-freebsd-virtualization@FreeBSD.ORG Fri May 8 20:04:09 2009 Return-Path: Delivered-To: freebsd-virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3FE26106564A for ; Fri, 8 May 2009 20:04:09 +0000 (UTC) (envelope-from scrappy@hub.org) Received: from hub.org (hub.org [200.46.204.220]) by mx1.freebsd.org (Postfix) with ESMTP id 0FA438FC08 for ; Fri, 8 May 2009 20:04:09 +0000 (UTC) (envelope-from scrappy@hub.org) Received: from localhost (maia-1.hub.org [200.46.208.211]) by hub.org (Postfix) with ESMTP id 7ADF753BC82; Fri, 8 May 2009 16:44:45 -0300 (ADT) Received: from hub.org ([200.46.204.220]) by localhost (mx1.hub.org [200.46.208.211]) (amavisd-maia, port 10024) with ESMTP id 83268-10; Fri, 8 May 2009 16:44:40 -0300 (ADT) Received: by hub.org (Postfix, from userid 1002) id 16A0453BC99; Fri, 8 May 2009 16:44:44 -0300 (ADT) Received: from localhost (localhost [127.0.0.1]) by hub.org (Postfix) with ESMTP id 1552A53BC86; Fri, 8 May 2009 16:44:43 -0300 (ADT) Date: Fri, 8 May 2009 16:44:43 -0300 (ADT) From: "Marc G. Fournier" To: Clarence Chu In-Reply-To: Message-ID: <20090508164247.F3563@hub.org> References: <4A030140.4060509@cnptia.embrapa.br> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Fri, 08 May 2009 20:20:15 +0000 Cc: carlos.paniago@cnptia.embrapa.br, freebsd-virtualization@freebsd.org Subject: RE: Port of virtualbox in FreeBSD i386/amd64, emulators/qemu X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports."
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 May 2009 20:04:09 -0000 On Fri, 8 May 2009, Clarence Chu wrote: > ports/emulators/qemu as of version 0.10.3 may act as VMM for Vista-x86, > MacOSX/Leopard (Hackintosh). not to mention -arm, -x86_64, -etc. I have something like 6 QEMU VPSs running on one physical server that clients are using to run fully networked Linux environments, and haven't heard any complaints ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664 From owner-freebsd-virtualization@FreeBSD.ORG Sat May 9 06:08:43 2009 Return-Path: Delivered-To: virtualization@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7166C106566C; Sat, 9 May 2009 06:08:43 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from gritton.org (gritton.org [161.58.222.4]) by mx1.freebsd.org (Postfix) with ESMTP id 885E08FC08; Sat, 9 May 2009 06:08:39 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from glorfindel.gritton.org (c-76-27-80-223.hsd1.ut.comcast.net [76.27.80.223]) (authenticated bits=0) by gritton.org (8.13.6.20060614/8.13.6) with ESMTP id n4968aUk064000; Sat, 9 May 2009 00:08:37 -0600 (MDT) Message-ID: <4A051DE3.30705@FreeBSD.org> Date: Sat, 09 May 2009 00:08:35 -0600 From: Jamie Gritton User-Agent: Thunderbird 2.0.0.19 (X11/20090220) MIME-Version: 1.0 To: jail@FreeBSD.org, virtualization@FreeBSD.org Content-Type: multipart/mixed; boundary="------------080507020907010909080502" X-Virus-Scanned: ClamAV 0.94.2/9348/Fri May 8 20:35:20 2009 on gritton.org X-Virus-Status: Clean Cc: Subject: Hierarchical jails X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 May 2009 06:08:43 -0000 This is a multi-part message in MIME format. --------------080507020907010909080502 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Here's the first round of hierarchical jails under the new framework. Instead of creds having either a prison or a NULL pointer, they all have a prison pointer with the default being the global "prison0" that contains information about the real environment. Jailed root may (if granted permission) create prisons that would be under its place in the hierarchy, but may not alter (or even see) prisons at its level or above. The JID space is flat, i.e. every prison in the system has a unique ID. The prison name space is hierarchical, with jails having dot-separated component names. prison0 contains three fields that were system globals: pr_root, pr_host, and pr_securelevel. I've kept the globals rootvnode and hostname, and take care that when one is changed the other changes too (not yet true for hostname - read on). But I've actually removed the global securelevel, instead forcing people to use securelevel_gt() and securelevel_ge() (or in very rare cases to check prison0.pr_securelevel directly). I chose to do that because while using the global rootvnode and hostname may be incorrect, using the wrong securelevel is, well, insecure. Actually it would be insecure to use the wrong rootvnode too, but I'm not convinced removing that global is worth the headache. Other globals are subsumed into prison0, but they were only ever part of the jail system anyway: the various jail-related permission bits and such administrative things as prisoncount. The prison hierarchy keeps track of restrictions placed on prisons, and will reflect them downward so a child jail is always at least as restricted as its ancestors. 
It doesn't go the other way though: if a prison's restrictions are loosened, the children stay as they are. This patch doesn't have anything for userland, and hierarchical jails won't work without that patch (because jails don't have permission to create sub-jails by default, and jail(2) can't grant that permission). A userland patch will follow soon, very similar to the version I posted here recently. - Jamie --------------080507020907010909080502 Content-Type: text/plain; name="jh.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="jh.diff" Index: lib/libc/sys/jail.2 =================================================================== --- lib/libc/sys/jail.2 (revision 191896) +++ lib/libc/sys/jail.2 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd April 29, 2009 +.Dd May 8, 2009 .Dt JAIL 2 .Os .Sh NAME @@ -283,7 +283,7 @@ It is possible to identify a process as jailed by examining .Dq Li /proc//status : it will show a field near the end of the line, either as -a single hyphen for a process at large, or the hostname currently +a single hyphen for a process at large, or the name currently set for the prison for jailed processes. .Sh ERRORS The @@ -292,7 +292,10 @@ will fail if: .Bl -tag -width Er .It Bq Er EPERM -This process is not allowed to create a jail. +This process is not allowed to create a jail, either because it is not +the super-user, or the +.Va security.jail.allow_jails +sysctl MIB is not set. .It Bq Er EFAULT .Fa jail points to an address outside the allocated address space of the process. @@ -308,7 +311,10 @@ will fail if: .Bl -tag -width Er .It Bq Er EPERM -This process is not allowed to create a jail. +This process is not allowed to create a jail, either because it is not +the super-user, or the +.Va security.jail.allow_jails +sysctl MIB is not set. .It Bq Er EPERM A jail parameter was set to a less restrictive value then the current environment. @@ -429,4 +435,4 @@ who contributed it to .Fx . 
.An James Gritton -added the extensible jail parameters. +added the extensible jail parameters and hierarchical jails. Index: sys/ufs/ufs/ufs_vnops.c =================================================================== --- sys/ufs/ufs/ufs_vnops.c (revision 191896) +++ sys/ufs/ufs/ufs_vnops.c (working copy) @@ -61,7 +61,6 @@ #include #include #include -#include #include Index: sys/kern/kern_jail.c =================================================================== --- sys/kern/kern_jail.c (revision 191896) +++ sys/kern/kern_jail.c (working copy) @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -48,7 +49,6 @@ #include #include #include -#include #include #include #include @@ -74,61 +74,43 @@ SYSCTL_NODE(_security, OID_AUTO, jail, CTLFLAG_RW, 0, "Jail rules"); -int jail_set_hostname_allowed = 1; -SYSCTL_INT(_security_jail, OID_AUTO, set_hostname_allowed, CTLFLAG_RW, - &jail_set_hostname_allowed, 0, - "Processes in jail can set their hostnames"); +/* prison0 describes what is "real" about the system.
*/ +struct prison prison0 = { + .pr_id = 0, + .pr_name = "0", + .pr_ref = 1, + .pr_uref = 1, + .pr_path = "/", + .pr_securelevel = -1, + .pr_children = LIST_HEAD_INITIALIZER(&prison0.pr_children), + .pr_flags = PR_ALLOW_ALL, + .pr_def_perms = PR_ALLOW_SET_HOSTNAME | + PR_RESTRICT_SOCKET_UNIXIPROUTE, + .pr_def_enforce_statfs = 2, +#if defined(INET) || defined(INET6) + .pr_def_max_af_ips = 255, +#endif +}; +MTX_SYSINIT(prison0, &prison0.pr_mtx, "jail mutex", MTX_DEF); -int jail_socket_unixiproute_only = 1; -SYSCTL_INT(_security_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW, - &jail_socket_unixiproute_only, 0, - "Processes in jail are limited to creating UNIX/IP/route sockets only"); - -int jail_sysvipc_allowed = 0; -SYSCTL_INT(_security_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW, - &jail_sysvipc_allowed, 0, - "Processes in jail can use System V IPC primitives"); - -static int jail_enforce_statfs = 2; -SYSCTL_INT(_security_jail, OID_AUTO, enforce_statfs, CTLFLAG_RW, - &jail_enforce_statfs, 0, - "Processes in jail cannot see all mounted file systems"); - -int jail_allow_raw_sockets = 0; -SYSCTL_INT(_security_jail, OID_AUTO, allow_raw_sockets, CTLFLAG_RW, - &jail_allow_raw_sockets, 0, - "Prison root can create raw sockets"); - -int jail_chflags_allowed = 0; -SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW, - &jail_chflags_allowed, 0, - "Processes in jail can alter system file flags"); - -int jail_mount_allowed = 0; -SYSCTL_INT(_security_jail, OID_AUTO, mount_allowed, CTLFLAG_RW, - &jail_mount_allowed, 0, - "Processes in jail can mount/unmount jail-friendly file systems"); - -int jail_max_af_ips = 255; -SYSCTL_INT(_security_jail, OID_AUTO, jail_max_af_ips, CTLFLAG_RW, - &jail_max_af_ips, 0, - "Number of IP addresses a jail may have at most per address family"); - -/* allprison, lastprid, and prisoncount are protected by allprison_lock. */ +/* allprison and lastprid are protected by allprison_lock. 
*/ struct sx allprison_lock; SX_SYSINIT(allprison_lock, &allprison_lock, "allprison"); struct prisonlist allprison = TAILQ_HEAD_INITIALIZER(allprison); int lastprid = 0; -int prisoncount = 0; static int do_jail_attach(struct thread *td, struct prison *pr); static void prison_complete(void *context, int pending); static void prison_deref(struct prison *pr, int flags); +static char *prison_path(struct prison *pr1, struct prison *pr2); +static void prison_remove1(struct prison *pr); #ifdef INET static int _prison_check_ip4(struct prison *pr, struct in_addr *ia); +static int prison_restrict_ip4(struct prison *pr, struct in_addr *newip4); #endif #ifdef INET6 static int _prison_check_ip6(struct prison *pr, struct in6_addr *ia6); +static int prison_restrict_ip6(struct prison *pr, struct in6_addr *newip6); #endif static int sysctl_jail_list(SYSCTL_HANDLER_ARGS); @@ -139,7 +121,46 @@ #define PD_LIST_SLOCKED 0x08 #define PD_LIST_XLOCKED 0x10 +/* + * Parameter names corresponding to PR_* flag values + */ +static char *pr_flag_names[] = { + [0] = "persist", #ifdef INET + [2] = "ipv4", +#endif +#ifdef INET6 + [3] = "ipv6", +#endif + [16] = "perm.set_hostname_allowed", + "perm.sysvipc_allowed", + "perm.allow_raw_sockets", + "perm.chflags_allowed", + "perm.mount_allowed", + "perm.allow_quotas", + "perm.allow_jails", + "perm.socket_unixiproute_only", +}; + +static char *pr_flag_nonames[] = { + [0] = "nopersist", +#ifdef INET + [2] = "noipv4", +#endif +#ifdef INET6 + [3] = "noipv6", +#endif + [16] = "perm.noset_hostname_allowed", + "perm.nosysvipc_allowed", + "perm.noallow_raw_sockets", + "perm.nochflags_allowed", + "perm.nomount_allowed", + "perm.noallow_quotas", + "perm.noallow_jails", + "perm.nosocket_unixiproute_only", +}; + +#ifdef INET static int qcmp_v4(const void *ip1, const void *ip2) { @@ -277,7 +298,7 @@ return (error); tmplen = MAXPATHLEN + MAXHOSTNAMELEN + MAXHOSTNAMELEN; #ifdef INET - if (j.ip4s > jail_max_af_ips) + if (j.ip4s > td->td_ucred->cr_prison->pr_max_af_ips) 
return (EINVAL); tmplen += j.ip4s * sizeof(struct in_addr); #else @@ -285,7 +306,7 @@ return (EINVAL); #endif #ifdef INET6 - if (j.ip6s > jail_max_af_ips) + if (j.ip6s > td->td_ucred->cr_prison->pr_max_af_ips) return (EINVAL); tmplen += j.ip6s * sizeof(struct in6_addr); #else @@ -420,23 +441,24 @@ #endif struct vfsopt *opt; struct vfsoptlist *opts; - struct prison *pr, *deadpr, *tpr; + struct prison *pr, *deadpr, *mypr, *ppr, *tpr; struct vnode *root; char *errmsg, *host, *name, *p, *path; void *op; - int created, cuflags, error, errmsg_len, errmsg_pos; - int gotslevel, jid, len; + size_t namelen, onamelen; + int created, cuflags, descend, enforce, error, errmsg_len, errmsg_pos; + int gotenforce, gotslevel, fi, jid, len; int slevel, vfslocked; #if defined(INET) || defined(INET6) - int ii; + int ii, ij, gotmaxips, maxips; #endif #ifdef INET - int ip4s; + int ip4s, ip4a, redo_ip4; #endif #ifdef INET6 - int ip6s; + int ip6s, ip6a, redo_ip6; #endif - unsigned pr_flags, ch_flags; + unsigned pr_flags, ch_flags, tflags; char numbuf[12]; error = priv_check(td, PRIV_JAIL_SET); @@ -444,6 +466,9 @@ error = priv_check(td, PRIV_JAIL_ATTACH); if (error) return (error); + mypr = ppr = td->td_ucred->cr_prison; + if ((flags & JAIL_CREATE) && !(mypr->pr_flags & PR_ALLOW_JAILS)) + return (EPERM); if (flags & ~JAIL_SET_MASK) return (EINVAL); @@ -461,12 +486,15 @@ if (error) return (error); #ifdef INET + ip4a = 0; ip4 = NULL; #endif #ifdef INET6 + ip6a = 0; ip6 = NULL; #endif + again: error = vfs_copyopt(opts, "jid", &jid, sizeof(jid)); if (error == ENOENT) jid = 0; @@ -481,9 +509,33 @@ else gotslevel = 1; + error = vfs_copyopt(opts, "perm.enforce_statfs", &enforce, + sizeof(enforce)); + gotenforce = error == 0; + if (gotenforce) { + if (enforce < 0 || enforce > 2) + return (EINVAL); + } else if (error != ENOENT) + goto done_free; + +#if defined(INET) || defined(INET6) + error = vfs_copyopt(opts, "perm.max_af_ips", &maxips, sizeof(maxips)); + gotmaxips = error == 0; + if (maxips) { + 
if (maxips < 1) + return (EINVAL); + } else if (error != ENOENT) + goto done_free; +#endif + pr_flags = ch_flags = 0; - vfs_flagopt(opts, "persist", &pr_flags, PR_PERSIST); - vfs_flagopt(opts, "nopersist", &ch_flags, PR_PERSIST); + for (fi = 0; fi < sizeof(pr_flag_names) / sizeof(pr_flag_names[0]); + fi++) { + if (pr_flag_names[fi] == NULL) + continue; + vfs_flagopt(opts, pr_flag_names[fi], &pr_flags, 1 << fi); + vfs_flagopt(opts, pr_flag_nonames[fi], &ch_flags, 1 << fi); + } ch_flags |= pr_flags; if ((flags & (JAIL_CREATE | JAIL_UPDATE | JAIL_ATTACH)) == JAIL_CREATE && !(pr_flags & PR_PERSIST)) { @@ -524,6 +576,7 @@ } } + /* This might be the second time around for this option. */ #ifdef INET error = vfs_getopt(opts, "ip4.addr", &op, &ip4s); if (error == ENOENT) @@ -533,43 +586,54 @@ else if (ip4s & (sizeof(*ip4) - 1)) { error = EINVAL; goto done_free; - } else if (ip4s > 0) { - ip4s /= sizeof(*ip4); - if (ip4s > jail_max_af_ips) { - error = EINVAL; - vfs_opterror(opts, "too many IPv4 addresses"); - goto done_errmsg; - } - ip4 = malloc(ip4s * sizeof(*ip4), M_PRISON, M_WAITOK); - bcopy(op, ip4, ip4s * sizeof(*ip4)); - /* - * IP addresses are all sorted but ip[0] to preserve the - * primary IP address as given from userland. This special IP - * is used for unbound outgoing connections as well for - * "loopback" traffic. - */ - if (ip4s > 1) - qsort(ip4 + 1, ip4s - 1, sizeof(*ip4), qcmp_v4); - /* - * Check for duplicate addresses and do some simple zero and - * broadcast checks. If users give other bogus addresses it is - * their problem. - * - * We do not have to care about byte order for these checks so - * we will do them in NBO. 
- */ - for (ii = 0; ii < ip4s; ii++) { - if (ip4[ii].s_addr == INADDR_ANY || - ip4[ii].s_addr == INADDR_BROADCAST) { + } else { + ch_flags |= PR_IP4_USER; + pr_flags |= PR_IP4_USER; + if (ip4s > 0) { + ip4s /= sizeof(*ip4); + if (gotmaxips && ip4s > maxips) { error = EINVAL; - goto done_free; + vfs_opterror(opts, "too many IPv4 addresses"); + goto done_errmsg; } - if ((ii+1) < ip4s && - (ip4[0].s_addr == ip4[ii+1].s_addr || - ip4[ii].s_addr == ip4[ii+1].s_addr)) { - error = EINVAL; - goto done_free; + if (ip4a < ip4s) { + ip4a = ip4s; + free(ip4, M_PRISON); + ip4 = NULL; } + if (ip4 == NULL) + ip4 = malloc(ip4a * sizeof(*ip4), M_PRISON, + M_WAITOK); + bcopy(op, ip4, ip4s * sizeof(*ip4)); + /* + * IP addresses are all sorted but ip[0] to preserve + * the primary IP address as given from userland. + * This special IP is used for unbound outgoing + * connections as well for "loopback" traffic. + */ + if (ip4s > 1) + qsort(ip4 + 1, ip4s - 1, sizeof(*ip4), qcmp_v4); + /* + * Check for duplicate addresses and do some simple + * zero and broadcast checks. If users give other bogus + * addresses it is their problem. + * + * We do not have to care about byte order for these + * checks so we will do them in NBO. 
+ */ + for (ii = 0; ii < ip4s; ii++) { + if (ip4[ii].s_addr == INADDR_ANY || + ip4[ii].s_addr == INADDR_BROADCAST) { + error = EINVAL; + goto done_free; + } + if ((ii+1) < ip4s && + (ip4[0].s_addr == ip4[ii+1].s_addr || + ip4[ii].s_addr == ip4[ii+1].s_addr)) { + error = EINVAL; + goto done_free; + } + } } } #endif @@ -583,29 +647,40 @@ else if (ip6s & (sizeof(*ip6) - 1)) { error = EINVAL; goto done_free; - } else if (ip6s > 0) { - ip6s /= sizeof(*ip6); - if (ip6s > jail_max_af_ips) { - error = EINVAL; - vfs_opterror(opts, "too many IPv6 addresses"); - goto done_errmsg; - } - ip6 = malloc(ip6s * sizeof(*ip6), M_PRISON, M_WAITOK); - bcopy(op, ip6, ip6s * sizeof(*ip6)); - if (ip6s > 1) - qsort(ip6 + 1, ip6s - 1, sizeof(*ip6), qcmp_v6); - for (ii = 0; ii < ip6s; ii++) { - if (IN6_IS_ADDR_UNSPECIFIED(&ip6[0])) { + } else { + ch_flags |= PR_IP6_USER; + pr_flags |= PR_IP6_USER; + if (ip6s > 0) { + ip6s /= sizeof(*ip6); + if (gotmaxips && ip6s > maxips) { error = EINVAL; - goto done_free; + vfs_opterror(opts, "too many IPv6 addresses"); + goto done_errmsg; } - if ((ii+1) < ip6s && - (IN6_ARE_ADDR_EQUAL(&ip6[0], &ip6[ii+1]) || - IN6_ARE_ADDR_EQUAL(&ip6[ii], &ip6[ii+1]))) - { - error = EINVAL; - goto done_free; + if (ip6a < ip6s) { + ip6a = ip6s; + free(ip6, M_PRISON); + ip6 = NULL; } + if (ip6 == NULL) + ip6 = malloc(ip6a * sizeof(*ip6), M_PRISON, + M_WAITOK); + bcopy(op, ip6, ip6s * sizeof(*ip6)); + if (ip6s > 1) + qsort(ip6 + 1, ip6s - 1, sizeof(*ip6), qcmp_v6); + for (ii = 0; ii < ip6s; ii++) { + if (IN6_IS_ADDR_UNSPECIFIED(&ip6[0])) { + error = EINVAL; + goto done_free; + } + if ((ii+1) < ip6s && + (IN6_ARE_ADDR_EQUAL(&ip6[0], &ip6[ii+1]) || + IN6_ARE_ADDR_EQUAL(&ip6[ii], &ip6[ii+1]))) + { + error = EINVAL; + goto done_free; + } + } } } #endif @@ -627,13 +702,15 @@ error = EINVAL; goto done_free; } - if (len > MAXPATHLEN) { - error = ENAMETOOLONG; - goto done_free; - } if (len < 2 || (len == 2 && path[0] == '/')) path = NULL; else { + /* Leave room for a real-root full 
pathname. */ + if (len + (path[0] == '/' && strcmp(mypr->pr_path, "/") + ? strlen(mypr->pr_path) : 0) > MAXPATHLEN) { + error = ENAMETOOLONG; + goto done_free; + } NDINIT(&nd, LOOKUP, MPSAFE | FOLLOW, UIO_SYSSPACE, path, td); error = namei(&nd); @@ -683,7 +760,13 @@ } pr = NULL; if (jid != 0) { - /* See if a requested jid already exists. */ + /* + * See if a requested jid already exists. There is an + * information leak here if the jid exists but is not within + * the caller's jail hierarchy. Jail creators will get EEXIST + * even though they cannot see the jail, and CREATE | UPDATE + * will return ENOENT which is not normally a valid error. + */ if (jid < 0) { error = EINVAL; vfs_opterror(opts, "negative jid"); @@ -691,6 +774,7 @@ } pr = prison_find(jid); if (pr != NULL) { + ppr = pr->pr_parent; /* Create: jid must not exist. */ if (cuflags == JAIL_CREATE) { mtx_unlock(&pr->pr_mtx); @@ -699,7 +783,10 @@ jid); goto done_unlock_list; } - if (pr->pr_uref == 0) { + if (!prison_ischild(mypr, pr)) { + mtx_unlock(&pr->pr_mtx); + pr = NULL; + } else if (pr->pr_uref == 0) { if (!(flags & JAIL_DYING)) { mtx_unlock(&pr->pr_mtx); error = ENOENT; @@ -717,7 +804,7 @@ * name. */ if (name == NULL) - name = pr->pr_name; + name = prison_name(mypr, pr); } } } @@ -738,12 +825,42 @@ * because that is the jail being updated). */ if (name != NULL) { + p = strrchr(name, '.'); + if (p != NULL) { + /* + * This is a hierarchical name. Split it into the + * parent and child names, and make sure the parent + * exists or matches an already found jail. 
+ */ + *p = '\0'; + if (pr != NULL) { + if (strncmp(name, ppr->pr_name, p - name) || + ppr->pr_name[p - name] != '\0') { + mtx_unlock(&pr->pr_mtx); + error = EINVAL; + vfs_opterror(opts, + "cannot change jail's parent"); + goto done_unlock_list; + } + } else { + ppr = prison_find_name(mypr, name); + if (ppr == NULL) { + error = ENOENT; + vfs_opterror(opts, + "jail \"%s\" not found", name); + goto done_unlock_list; + } + mtx_unlock(&ppr->pr_mtx); + } + name = p + 1; + } if (name[0] != '\0') { + namelen = strlen(ppr->pr_name) + 1; + name_again: deadpr = NULL; - name_again: - TAILQ_FOREACH(tpr, &allprison, pr_list) { + FOREACH_PRISON_CHILD(ppr, tpr) { if (tpr != pr && tpr->pr_ref > 0 && - !strcmp(tpr->pr_name, name)) { + !strcmp(tpr->pr_name + namelen, name)) { if (pr == NULL && cuflags != JAIL_CREATE) { mtx_lock(&tpr->pr_mtx); @@ -763,7 +880,7 @@ /* * Create, or update(jid): * name must not exist in an - * active jail. + * active sibling jail. */ error = EEXIST; if (pr != NULL) @@ -810,6 +927,15 @@ /* If there's no prison to update, create a new one and link it in. */ if (pr == NULL) { created = 1; + mtx_lock(&ppr->pr_mtx); + if (ppr->pr_ref == 0 || (ppr->pr_flags & PR_REMOVE)) { + mtx_unlock(&ppr->pr_mtx); + error = ENOENT; + goto done_unlock_list; + } + ppr->pr_ref++; + ppr->pr_uref++; + mtx_unlock(&ppr->pr_mtx); pr = malloc(sizeof(*pr), M_PRISON, M_WAITOK | M_ZERO); if (jid == 0) { /* Find the next free jid. */ @@ -829,7 +955,9 @@ vfs_opterror(opts, "no available jail IDs"); free(pr, M_PRISON); - goto done_unlock_list; + prison_deref(ppr, PD_DEREF | + PD_DEUREF | PD_LIST_XLOCKED); + goto done_releroot; } jid++; goto findnext; @@ -848,24 +976,56 @@ } if (tpr == NULL) TAILQ_INSERT_TAIL(&allprison, pr, pr_list); - prisoncount++; + LIST_INSERT_HEAD(&ppr->pr_children, pr, pr_sibling); + for (tpr = ppr; tpr != NULL; tpr = tpr->pr_parent) + tpr->pr_prisoncount++; + pr->pr_parent = ppr; pr->pr_id = jid; + + /* Set some default values, and inherit some from the parent. 
*/ if (name == NULL) name = ""; if (path == NULL) { path = "/"; - root = rootvnode; + root = mypr->pr_root; vref(root); } +#ifdef INET + pr->pr_flags |= ppr->pr_flags & PR_IP4; + pr->pr_ip4s = ppr->pr_ip4s; + if (ppr->pr_ip4 != NULL) { + pr->pr_ip4 = malloc(pr->pr_ip4s * + sizeof(struct in_addr), M_PRISON, M_WAITOK); + bcopy(ppr->pr_ip4, pr->pr_ip4, + pr->pr_ip4s * sizeof(*pr->pr_ip4)); + } +#endif +#ifdef INET6 + pr->pr_flags |= ppr->pr_flags & PR_IP6; + pr->pr_ip6s = ppr->pr_ip6s; + if (ppr->pr_ip6 != NULL) { + pr->pr_ip6 = malloc(pr->pr_ip6s * + sizeof(struct in6_addr), M_PRISON, M_WAITOK); + bcopy(ppr->pr_ip6, pr->pr_ip6, + pr->pr_ip6s * sizeof(*pr->pr_ip6)); + } +#endif + pr->pr_securelevel = ppr->pr_securelevel; + pr->pr_flags |= ppr->pr_def_perms; + pr->pr_enforce_statfs = ppr->pr_def_enforce_statfs; +#if defined(INET) || defined(INET6) + pr->pr_max_af_ips = ppr->pr_def_max_af_ips; +#endif - mtx_init(&pr->pr_mtx, "jail mutex", NULL, MTX_DEF); + LIST_INIT(&pr->pr_children); + mtx_init(&pr->pr_mtx, "jail mutex", NULL, MTX_DEF | MTX_DUPOK); /* * Allocate a dedicated cpuset for each jail. * Unlike other initial settings, this may return an erorr. */ - error = cpuset_create_root(td, &pr->pr_cpuset); + error = cpuset_create_root(ppr, &pr->pr_cpuset); if (error) { prison_deref(pr, PD_LIST_XLOCKED); goto done_releroot; @@ -887,103 +1047,425 @@ } /* Do final error checking before setting anything. */ - error = 0; + if (gotslevel) { + if (slevel < ppr->pr_securelevel) { + error = EPERM; + goto done_deref_locked; + } + } + if (gotenforce) { + if (enforce < ppr->pr_enforce_statfs) { + error = EPERM; + goto done_deref_locked; + } + } #if defined(INET) || defined(INET6) - if ( -#ifdef INET - ip4s > 0 -#ifdef INET6 - || + if (gotmaxips) { + if (maxips > ppr->pr_max_af_ips) { + error = EPERM; + goto done_deref_locked; + } + } #endif -#endif -#ifdef INET6 - ip6s > 0 -#endif - ) - /* - * Check for conflicting IP addresses. 
We permit them if there - * is no more than 1 IP on each jail. If there is a duplicate - * on a jail with more than one IP stop checking and return - * error. - */ - TAILQ_FOREACH(tpr, &allprison, pr_list) { - if (tpr == pr || tpr->pr_uref == 0) - continue; #ifdef INET - if ((ip4s > 0 && tpr->pr_ip4s > 1) || - (ip4s > 1 && tpr->pr_ip4s > 0)) - for (ii = 0; ii < ip4s; ii++) + if (ch_flags & PR_IP4_USER) { + if (!gotmaxips && ip4s > pr->pr_max_af_ips) { + error = EINVAL; + vfs_opterror(opts, "too many IPv4 addresses"); + goto done_deref_locked; + } + if (ppr->pr_flags & PR_IP4) { + if (!(pr_flags & PR_IP4_USER)) { + /* + * Silently ignore attempts to make the IP + * addresses unrestricted when the parent is + * restricted; in other words, interpret + * "unrestricted" as "as unrestricted as + * possible". + */ + ip4s = ppr->pr_ip4s; + if (ip4s == 0) { + free(ip4, M_PRISON); + ip4 = NULL; + } else if (ip4s <= ip4a) { + /* Inherit the parent's address(es). */ + bcopy(ppr->pr_ip4, ip4, + ip4s * sizeof(*ip4)); + } else { + /* + * There's no room for the parent's + * address list. Allocate some more. + */ + ip4a = ip4s; + free(ip4, M_PRISON); + ip4 = malloc(ip4a * sizeof(*ip4), + M_PRISON, M_NOWAIT); + if (ip4 != NULL) + bcopy(ppr->pr_ip4, ip4, + ip4s * sizeof(*ip4)); + else { + /* Allocation failed without + * sleeping. Unlocking the + * prison now will invalidate + * some checks and prematurely + * show an unfinished new jail. + * So let go of everything and + * start over. + */ + prison_deref(pr, created + ? PD_LOCKED | + PD_LIST_XLOCKED + : PD_DEREF | PD_LOCKED | + PD_LIST_XLOCKED); + if (root != NULL) { + vfslocked = + VFS_LOCK_GIANT( + root->v_mount); + vrele(root); + VFS_UNLOCK_GIANT( + vfslocked); + } + ip4 = malloc(ip4a * + sizeof(*ip4), M_PRISON, + M_WAITOK); + goto again; + } + } + } else if (ip4s > 0) { + /* + * Make sure the new set of IP addresses is a + * subset of the parent's list. 
Don't worry + * about the parent being unlocked, as any + * setting is done with allprison_lock held. + */ + for (ij = 0; ij < ppr->pr_ip4s; ij++) + if (ip4[0].s_addr == + ppr->pr_ip4[ij].s_addr) + break; + if (ij == ppr->pr_ip4s) { + error = EPERM; + goto done_deref_locked; + } + if (ip4s > 1) { + for (ii = ij = 1; ii < ip4s; ii++) { + if (ip4[ii].s_addr == + ppr->pr_ip4[0]. s_addr) + continue; + for (; ij < ppr->pr_ip4s; ij++) + if (ip4[ii].s_addr == + ppr->pr_ip4[ij].s_addr) + break; + } + if (ij == ppr->pr_ip4s) { + error = EPERM; + goto done_deref_locked; + } + } + } + } + if (ip4s > 0) { + /* + * Check for conflicting IP addresses. We permit them + * if there is no more than one IP on each jail. If + * there is a duplicate on a jail with more than one + * IP stop checking and return error. + */ + FOREACH_PRISON_DESCENDANT(&prison0, tpr, descend) { + if (tpr == pr || tpr->pr_uref == 0) { + descend = 0; + continue; + } + if (!(tpr->pr_flags & PR_IP4_USER)) + continue; + descend = 0; + if (tpr->pr_ip4 == NULL || + (ip4s == 1 && tpr->pr_ip4s == 1)) + continue; + for (ii = 0; ii < ip4s; ii++) { if (_prison_check_ip4(tpr, &ip4[ii]) == 0) { - error = EINVAL; + error = EADDRINUSE; vfs_opterror(opts, "IPv4 addresses clash"); goto done_deref_locked; } + } + } + } + } #endif #ifdef INET6 - if ((ip6s > 0 && tpr->pr_ip6s > 1) || - (ip6s > 1 && tpr->pr_ip6s > 0)) - for (ii = 0; ii < ip6s; ii++) + if (ch_flags & PR_IP6_USER) { + if (!gotmaxips && ip6s > pr->pr_max_af_ips) { + error = EINVAL; + vfs_opterror(opts, "too many IPv6 addresses"); + goto done_deref_locked; + } + if (ppr->pr_flags & PR_IP6) { + if (!(pr_flags & PR_IP6_USER)) { + /* + * Silently ignore attempts to make the IP + * addresses unrestricted when the parent is + * restricted. + */ + ip6s = ppr->pr_ip6s; + if (ip6s == 0) { + free(ip6, M_PRISON); + ip6 = NULL; + } else if (ip6s <= ip6a) { + /* Inherit the parent's address(es). 
*/ + bcopy(ppr->pr_ip6, ip6, + ip6s * sizeof(*ip6)); + } else { + /* + * There's no room for the parent's + * address list. + */ + ip6a = ip6s; + free(ip6, M_PRISON); + ip6 = malloc(ip6a * sizeof(*ip6), + M_PRISON, M_NOWAIT); + if (ip6 != NULL) + bcopy(ppr->pr_ip6, ip6, + ip6s * sizeof(*ip6)); + else { + prison_deref(pr, created + ? PD_LOCKED | + PD_LIST_XLOCKED + : PD_DEREF | PD_LOCKED | + PD_LIST_XLOCKED); + if (root != NULL) { + vfslocked = + VFS_LOCK_GIANT( + root->v_mount); + vrele(root); + VFS_UNLOCK_GIANT( + vfslocked); + } + ip6 = malloc(ip6a * + sizeof(*ip6), M_PRISON, + M_WAITOK); + goto again; + } + } + } else if (ip6s > 0) { + /* + * Make sure the new set of IP addresses is a + * subset of the parent's list. + */ + for (ij = 0; ij < ppr->pr_ip6s; ij++) + if (IN6_ARE_ADDR_EQUAL(&ip6[0], + &ppr->pr_ip6[ij])) + break; + if (ij == ppr->pr_ip6s) { + error = EPERM; + goto done_deref_locked; + } + if (ip6s > 1) { + for (ii = ij = 1; ii < ip6s; ii++) { + if (IN6_ARE_ADDR_EQUAL(&ip6[ii], + &ppr->pr_ip6[0])) + continue; + for (; ij < ppr->pr_ip6s; ij++) + if (IN6_ARE_ADDR_EQUAL( + &ip6[ii], + &ppr->pr_ip6[ij])) + break; + } + if (ij == ppr->pr_ip6s) { + error = EPERM; + goto done_deref_locked; + } + } + } + } + if (ip6s > 0) { + /* Check for conflicting IP addresses. */ + FOREACH_PRISON_DESCENDANT(&prison0, tpr, descend) { + if (tpr == pr || tpr->pr_uref == 0) { + descend = 0; + continue; + } + if (!(tpr->pr_flags & PR_IP6_USER)) + continue; + descend = 0; + if (tpr->pr_ip6 == NULL || + (ip6s == 1 && tpr->pr_ip6s == 1)) + continue; + for (ii = 0; ii < ip6s; ii++) { if (_prison_check_ip6(tpr, &ip6[ii]) == 0) { - error = EINVAL; + error = EADDRINUSE; vfs_opterror(opts, "IPv6 addresses clash"); goto done_deref_locked; } -#endif + } + } } + } #endif - if (error == 0 && name != NULL) { + if (name != NULL) { /* Give a default name of the jid. 
*/ if (name[0] == '\0') snprintf(name = numbuf, sizeof(numbuf), "%d", jid); else if (strtoul(name, &p, 10) != jid && *p == '\0') { error = EINVAL; vfs_opterror(opts, "name cannot be numeric"); + goto done_deref_locked; } + if (strlen(ppr->pr_name) + strlen(name) + 2 > + sizeof(pr->pr_name)) { + error = ENAMETOOLONG; + goto done_deref_locked; + } } - if (error) { - done_deref_locked: - /* - * Some parameter had an error so do not set anything. - * If this is a new jail, it will go away without ever - * having been seen. - */ - prison_deref(pr, created - ? PD_LOCKED | PD_LIST_XLOCKED - : PD_DEREF | PD_LOCKED | PD_LIST_XLOCKED); - goto done_releroot; + if ((PR_ALLOW_ALL & pr_flags & ~ppr->pr_flags) | + (PR_RESTRICT_ALL & ch_flags & ~pr_flags & ppr->pr_flags)) { + error = EPERM; + goto done_deref_locked; } /* Set the parameters of the prison. */ #ifdef INET - if (ip4s >= 0) { - pr->pr_ip4s = ip4s; - free(pr->pr_ip4, M_PRISON); - pr->pr_ip4 = ip4; - ip4 = NULL; + redo_ip4 = 0; + if (ch_flags & PR_IP4_USER) { + if (pr_flags & PR_IP4_USER) { + /* Some restriction set. */ + pr->pr_flags |= PR_IP4; + if (ip4s >= 0) { + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4s = ip4s; + pr->pr_ip4 = ip4; + ip4 = NULL; + } + } else if (ppr->pr_flags & PR_IP4) { + /* This restriction cleared, but keep inherited. */ + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4s = ip4s; + pr->pr_ip4 = ip4; + ip4 = NULL; + } else { + /* Restriction cleared, now unrestricted. */ + pr->pr_flags &= ~PR_IP4; + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4s = 0; + } + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + if (prison_restrict_ip4(tpr, NULL)) { + redo_ip4 = 1; + descend = 0; + } + } } #endif #ifdef INET6 - if (ip6s >= 0) { - pr->pr_ip6s = ip6s; - free(pr->pr_ip6, M_PRISON); - pr->pr_ip6 = ip6; - ip6 = NULL; + redo_ip6 = 0; + if (ch_flags & PR_IP6_USER) { + if (pr_flags & PR_IP6_USER) { + /* Some restriction set. 
*/ + pr->pr_flags |= PR_IP6; + if (ip6s >= 0) { + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6s = ip6s; + pr->pr_ip6 = ip6; + ip6 = NULL; + } + } else if (ppr->pr_flags & PR_IP6) { + /* This restriction cleared, but keep inherited. */ + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6s = ip6s; + pr->pr_ip6 = ip6; + ip6 = NULL; + } else { + /* Restriction cleared, now unrestricted. */ + pr->pr_flags &= ~PR_IP6; + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6s = 0; + } + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + if (prison_restrict_ip6(tpr, NULL)) { + redo_ip6 = 1; + descend = 0; + } + } } #endif - if (gotslevel) + if (gotslevel) { pr->pr_securelevel = slevel; - if (name != NULL) - strlcpy(pr->pr_name, name, sizeof(pr->pr_name)); + /* Set all child jails to be at least this level. */ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) + if (tpr->pr_securelevel < slevel) + tpr->pr_securelevel = slevel; + } + if (gotenforce) { + pr->pr_enforce_statfs = enforce; + if (pr->pr_def_enforce_statfs < enforce) + pr->pr_def_enforce_statfs = enforce; + /* Pass this restriction on to the children. */ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) + if (tpr->pr_enforce_statfs < enforce) { + tpr->pr_enforce_statfs = enforce; + if (tpr->pr_def_enforce_statfs < enforce) + tpr->pr_def_enforce_statfs = enforce; + } + } +#if defined(INET) || defined(INET6) + if (gotmaxips) { + pr->pr_max_af_ips = maxips; + if (pr->pr_def_max_af_ips > maxips) + pr->pr_def_max_af_ips = maxips; + /* Pass this restriction on to the children. 
*/ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) + if (tpr->pr_max_af_ips > maxips) { + tpr->pr_max_af_ips = maxips; + if (tpr->pr_def_max_af_ips > maxips) + tpr->pr_def_max_af_ips = maxips; + } + } +#endif + if (name != NULL) { + onamelen = strlen(pr->pr_name); + if (ppr == &prison0) + strlcpy(pr->pr_name, name, sizeof(pr->pr_name)); + else + snprintf(pr->pr_name, sizeof(pr->pr_name), "%s.%s", + ppr->pr_name, name); + namelen = strlen(pr->pr_name); + /* Change this component of child names. */ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + bcopy(tpr->pr_name + onamelen, tpr->pr_name + namelen, + strlen(tpr->pr_name + onamelen) + 1); + bcopy(pr->pr_name, tpr->pr_name, namelen); + } + } if (path != NULL) { - strlcpy(pr->pr_path, path, sizeof(pr->pr_path)); + /* Try to keep a real-rooted full pathname. */ + if (path[0] == '/' && strcmp(mypr->pr_path, "/")) + snprintf(pr->pr_path, sizeof pr->pr_path, "%s%s", + mypr->pr_path, path); + else + strlcpy(pr->pr_path, path, sizeof(pr->pr_path)); pr->pr_root = root; } if (host != NULL) strlcpy(pr->pr_host, host, sizeof(pr->pr_host)); + if ((tflags = PR_ALLOW_ALL & ch_flags & ~pr_flags)) { + /* Clear allow bits on sysctl and all children. */ + pr->pr_def_perms &= ~tflags; + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + tpr->pr_flags &= ~tflags; + tpr->pr_def_perms &= ~tflags; + } + } + if ((tflags = PR_RESTRICT_ALL & pr_flags)) { + /* Set restrict bits on sysctl and all children. */ + pr->pr_def_perms |= tflags; + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + tpr->pr_flags |= tflags; + tpr->pr_def_perms |= tflags; + } + } /* * Persistent prisons get an extra reference, and prisons losing their * persist flag lose that reference. Only do this for existing prisons @@ -1002,6 +1484,44 @@ pr->pr_flags = (pr->pr_flags & ~ch_flags) | pr_flags; mtx_unlock(&pr->pr_mtx); + /* Locks may have prevented a complete restriction of child IP + * addresses. If so, allocate some more memory and try again. 
+ */ +#ifdef INET + while (redo_ip4) { + ip4s = pr->pr_ip4s; + ip4 = malloc(ip4s * sizeof(*ip4), M_PRISON, M_WAITOK); + mtx_lock(&pr->pr_mtx); + redo_ip4 = 0; + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + if (prison_restrict_ip4(tpr, ip4)) { + if (ip4 != NULL) + ip4 = NULL; + else + redo_ip4 = 1; + } + } + mtx_unlock(&pr->pr_mtx); + } +#endif +#ifdef INET6 + while (redo_ip6) { + ip6s = pr->pr_ip6s; + ip6 = malloc(ip6s * sizeof(*ip6), M_PRISON, M_WAITOK); + mtx_lock(&pr->pr_mtx); + redo_ip6 = 0; + FOREACH_PRISON_DESCENDANT_LOCKED(pr, tpr, descend) { + if (prison_restrict_ip6(tpr, ip6)) { + if (ip6 != NULL) + ip6 = NULL; + else + redo_ip6 = 1; + } + } + mtx_unlock(&pr->pr_mtx); + } +#endif + /* Let the modules do their work. */ sx_downgrade(&allprison_lock); if (created) { @@ -1054,6 +1574,11 @@ td->td_retval[0] = pr->pr_id; goto done_errmsg; + done_deref_locked: + prison_deref(pr, created + ? PD_LOCKED | PD_LIST_XLOCKED + : PD_DEREF | PD_LOCKED | PD_LIST_XLOCKED); + goto done_releroot; done_unlock_list: sx_xunlock(&allprison_lock); done_releroot: @@ -1131,6 +1656,7 @@ } SYSCTL_JAIL_PARAM(, jid, CTLTYPE_INT | CTLFLAG_RD, "I", "Jail ID"); +SYSCTL_JAIL_PARAM(, parent, CTLTYPE_INT | CTLFLAG_RD, "I", "Jail parent ID"); SYSCTL_JAIL_PARAM_STRING(, name, CTLFLAG_RW, MAXHOSTNAMELEN, "Jail name"); SYSCTL_JAIL_PARAM(, cpuset, CTLTYPE_INT | CTLFLAG_RD, "I", "Jail cpuset ID"); SYSCTL_JAIL_PARAM_STRING(, path, CTLFLAG_RD, MAXPATHLEN, "Jail root path"); @@ -1147,16 +1673,44 @@ #ifdef INET SYSCTL_JAIL_PARAM_NODE(ip4, "Jail IPv4 address virtualization"); +SYSCTL_JAIL_PARAM(, noip4, CTLTYPE_INT | CTLFLAG_RW, + "BN", "Jail w/ no IP address virtualization"); SYSCTL_JAIL_PARAM_STRUCT(_ip4, addr, CTLFLAG_RW, sizeof(struct in_addr), "S,in_addr,a", "Jail IPv4 addresses"); #endif #ifdef INET6 SYSCTL_JAIL_PARAM_NODE(ip6, "Jail IPv6 address virtualization"); +SYSCTL_JAIL_PARAM(, noip6, CTLTYPE_INT | CTLFLAG_RW, + "BN", "Jail w/ no IP address virtualization"); 
SYSCTL_JAIL_PARAM_STRUCT(_ip6, addr, CTLFLAG_RW, sizeof(struct in6_addr), "S,in6_addr,a", "Jail IPv6 addresses"); #endif +SYSCTL_JAIL_PARAM_NODE(perm, "Jail permissions"); +SYSCTL_JAIL_PARAM(_perm, set_hostname_allowed, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may set hostname"); +SYSCTL_JAIL_PARAM(_perm, sysvipc_allowed, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may use SYSV IPC"); +SYSCTL_JAIL_PARAM(_perm, allow_raw_sockets, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may create raw sockets"); +SYSCTL_JAIL_PARAM(_perm, chflags_allowed, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may alter system file flags"); +SYSCTL_JAIL_PARAM(_perm, mount_allowed, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may mount/unmount jail-friendly file systems"); +SYSCTL_JAIL_PARAM(_perm, allow_quotas, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may set file quotas"); +SYSCTL_JAIL_PARAM(_perm, allow_jails, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail may create child jails"); +SYSCTL_JAIL_PARAM(_perm, socket_unixiproute_only, CTLTYPE_INT | CTLFLAG_RW, + "B", "Jail limited to creating UNIX/IPv4/IPv6/route sockets only"); +SYSCTL_JAIL_PARAM(_perm, enforce_statfs, CTLTYPE_INT | CTLFLAG_RW, + "I", "Jail cannot see all mounted file systems"); +#if defined(INET) || defined(INET6) +SYSCTL_JAIL_PARAM(_perm, max_af_ips, CTLTYPE_INT | CTLFLAG_RW, + "I", "Number of IP addresses a jail may have at most per address family"); +#endif + /* * struct jail_get_args { * struct iovec *iovp; @@ -1188,28 +1742,21 @@ int kern_jail_get(struct thread *td, struct uio *optuio, int flags) { - struct prison *pr; + struct prison *pr, *mypr; struct vfsopt *opt; struct vfsoptlist *opts; char *errmsg, *name; - int error, errmsg_len, errmsg_pos, i, jid, len, locked, pos; + int error, errmsg_len, errmsg_pos, fi, i, jid, len, locked, pos; if (flags & ~JAIL_GET_MASK) return (EINVAL); - if (jailed(td->td_ucred)) { - /* - * Don't allow a jailed process to see any jails, - * not even its own. 
- */ - vfs_opterror(opts, "jail not found"); - return (ENOENT); - } /* Get the parameter list. */ error = vfs_buildopts(optuio, &opts); if (error) return (error); errmsg_pos = vfs_getopt_pos(opts, "errmsg"); + mypr = td->td_ucred->cr_prison; /* * Find the prison specified by one of: lastjid, jid, name. @@ -1218,7 +1765,7 @@ error = vfs_copyopt(opts, "lastjid", &jid, sizeof(jid)); if (error == 0) { TAILQ_FOREACH(pr, &allprison, pr_list) { - if (pr->pr_id > jid) { + if (pr->pr_id > jid && prison_ischild(mypr, pr)) { mtx_lock(&pr->pr_mtx); if (pr->pr_ref > 0 && (pr->pr_uref > 0 || (flags & JAIL_DYING))) @@ -1237,7 +1784,7 @@ error = vfs_copyopt(opts, "jid", &jid, sizeof(jid)); if (error == 0) { if (jid != 0) { - pr = prison_find(jid); + pr = prison_find_child(mypr, jid); if (pr != NULL) { if (pr->pr_uref == 0 && !(flags & JAIL_DYING)) { mtx_unlock(&pr->pr_mtx); @@ -1261,7 +1808,7 @@ error = EINVAL; goto done_unlock_list; } - pr = prison_find_name(name); + pr = prison_find_name(mypr, name); if (pr != NULL) { if (pr->pr_uref == 0 && !(flags & JAIL_DYING)) { mtx_unlock(&pr->pr_mtx); @@ -1290,14 +1837,18 @@ error = vfs_setopt(opts, "jid", &pr->pr_id, sizeof(pr->pr_id)); if (error != 0 && error != ENOENT) goto done_deref; - error = vfs_setopts(opts, "name", pr->pr_name); + i = pr->pr_parent == mypr ? 
0 : pr->pr_parent->pr_id; + error = vfs_setopt(opts, "parent", &i, sizeof(i)); if (error != 0 && error != ENOENT) goto done_deref; + error = vfs_setopts(opts, "name", prison_name(mypr, pr)); + if (error != 0 && error != ENOENT) + goto done_deref; error = vfs_setopt(opts, "cpuset", &pr->pr_cpuset->cs_id, sizeof(pr->pr_cpuset->cs_id)); if (error != 0 && error != ENOENT) goto done_deref; - error = vfs_setopts(opts, "path", pr->pr_path); + error = vfs_setopts(opts, "path", prison_path(mypr, pr)); if (error != 0 && error != ENOENT) goto done_deref; #ifdef INET @@ -1319,14 +1870,29 @@ error = vfs_setopts(opts, "host.hostname", pr->pr_host); if (error != 0 && error != ENOENT) goto done_deref; - i = pr->pr_flags & PR_PERSIST ? 1 : 0; - error = vfs_setopt(opts, "persist", &i, sizeof(i)); + error = vfs_setopt(opts, "perm.enforce_statfs", &pr->pr_enforce_statfs, + sizeof(pr->pr_enforce_statfs)); if (error != 0 && error != ENOENT) goto done_deref; - i = !i; - error = vfs_setopt(opts, "nopersist", &i, sizeof(i)); +#if defined(INET) || defined(INET6) + error = vfs_setopt(opts, "perm.max_af_ips", &pr->pr_max_af_ips, + sizeof(pr->pr_max_af_ips)); if (error != 0 && error != ENOENT) goto done_deref; +#endif + for (fi = 0; fi < sizeof(pr_flag_names) / sizeof(pr_flag_names[0]); + fi++) { + if (pr_flag_names[fi] == NULL) + continue; + i = (pr->pr_flags & (1 << fi)) ? 1 : 0; + error = vfs_setopt(opts, pr_flag_names[fi], &i, sizeof(i)); + if (error != 0 && error != ENOENT) + goto done_deref; + i = !i; + error = vfs_setopt(opts, pr_flag_nonames[fi], &i, sizeof(i)); + if (error != 0 && error != ENOENT) + goto done_deref; + } i = (pr->pr_uref == 0); error = vfs_setopt(opts, "dying", &i, sizeof(i)); if (error != 0 && error != ENOENT) @@ -1402,6 +1968,159 @@ } /* + * Jail permission sysctls. These are companions to the jail parameters of + * similar names, and provide the default values for child jails. 
+ */ + +static int +sysctl_jail_perm(SYSCTL_HANDLER_ARGS) +{ + struct prison *pr, *cpr; + int descend, error, i; + + pr = req->td->td_ucred->cr_prison; + + /* Get the current flag value, and convert it to a boolean. */ + i = (pr->pr_def_perms & arg2) ? 1 : 0; + error = sysctl_handle_int(oidp, &i, 0, req); + if (error || !req->newptr) + return (error); + i = i ? arg2 : 0; + /* Do not allow more than the current prison itself can do. */ + sx_slock(&allprison_lock); + mtx_lock(&pr->pr_mtx); + if ((i & PR_ALLOW_ALL & ~pr->pr_flags) | + (arg2 & PR_RESTRICT_ALL & pr->pr_flags & ~i)) { + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (EPERM); + } + pr->pr_def_perms = (pr->pr_def_perms & ~arg2) | i; + /* Reflect restrictions to child jails. */ + if ((arg2 & PR_ALLOW_ALL & ~i) | (arg2 & PR_RESTRICT_ALL & i)) + FOREACH_PRISON_DESCENDANT_LOCKED(pr, cpr, descend) { + cpr->pr_flags = (cpr->pr_flags & ~arg2) | i; + cpr->pr_def_perms = (cpr->pr_def_perms & ~arg2) | i; + } + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (0); +} + +SYSCTL_PROC(_security_jail, OID_AUTO, set_hostname_allowed, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_SET_HOSTNAME, sysctl_jail_perm, "I", + "Processes in jail can set their hostnames"); +SYSCTL_PROC(_security_jail, OID_AUTO, socket_unixiproute_only, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_RESTRICT_SOCKET_UNIXIPROUTE, sysctl_jail_perm, "I", + "Processes in jail are limited to creating UNIX/IP/route sockets only"); +SYSCTL_PROC(_security_jail, OID_AUTO, sysvipc_allowed, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_SYSVIPC, sysctl_jail_perm, "I", + "Processes in jail can use System V IPC primitives"); +SYSCTL_PROC(_security_jail, OID_AUTO, allow_raw_sockets, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_RAW_SOCKETS, sysctl_jail_perm, "I", + "Prison root can create raw sockets"); 
+SYSCTL_PROC(_security_jail, OID_AUTO, chflags_allowed, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_CHFLAGS, sysctl_jail_perm, "I", + "Processes in jail can alter system file flags"); +SYSCTL_PROC(_security_jail, OID_AUTO, mount_allowed, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_MOUNT, sysctl_jail_perm, "I", + "Processes in jail can mount/unmount jail-friendly file systems"); +SYSCTL_PROC(_security_jail, OID_AUTO, allow_quotas, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_QUOTAS, sysctl_jail_perm, "I", + "Processes in jail can set file quotas"); +SYSCTL_PROC(_security_jail, OID_AUTO, allow_jails, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, PR_ALLOW_JAILS, sysctl_jail_perm, "I", + "Processes in jail can create child jails"); + +static int +sysctl_jail_enforce_statfs(SYSCTL_HANDLER_ARGS) +{ + struct prison *pr, *cpr; + int descend, error, i; + + pr = req->td->td_ucred->cr_prison; + + i = pr->pr_def_enforce_statfs; + error = sysctl_handle_int(oidp, &i, 0, req); + if (error || !req->newptr) + return (error); + if (i < 0 || i > 2) + return (EINVAL); + /* Do not allow more than the current prison itself can do. */ + sx_slock(&allprison_lock); + mtx_lock(&pr->pr_mtx); + if (i < pr->pr_enforce_statfs) { + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (EPERM); + } + pr->pr_def_enforce_statfs = i; + /* Reflect restrictions to child jails. 
*/ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, cpr, descend) + if (cpr->pr_enforce_statfs < i) { + cpr->pr_enforce_statfs = i; + if (cpr->pr_def_enforce_statfs < i) + cpr->pr_def_enforce_statfs = i; + } + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (0); +} +SYSCTL_PROC(_security_jail, OID_AUTO, enforce_statfs, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, 0, sysctl_jail_enforce_statfs, "I", + "Processes in jail cannot see all mounted file systems"); + +#if defined(INET) || defined(INET6) +static int +sysctl_jail_max_af_ips(SYSCTL_HANDLER_ARGS) +{ + struct prison *pr, *cpr; + int descend, error, i; + + pr = req->td->td_ucred->cr_prison; + + i = pr->pr_def_max_af_ips; + error = sysctl_handle_int(oidp, &i, 0, req); + if (error || !req->newptr) + return (error); + if (i < 1) + return (EINVAL); + /* Do not allow more than the current prison itself can do. */ + sx_slock(&allprison_lock); + mtx_lock(&pr->pr_mtx); + if (i > pr->pr_max_af_ips) { + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (EPERM); + } + pr->pr_def_max_af_ips = i; + /* Reflect restrictions to child jails. 
*/ + FOREACH_PRISON_DESCENDANT_LOCKED(pr, cpr, descend) + if (cpr->pr_max_af_ips > i) { + cpr->pr_max_af_ips = i; + if (cpr->pr_def_max_af_ips > i) + cpr->pr_def_max_af_ips = i; + } + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (0); +} +SYSCTL_PROC(_security_jail, OID_AUTO, max_af_ips, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_PRISON | CTLFLAG_MPSAFE, + NULL, 0, sysctl_jail_max_af_ips, "I", + "Number of IP addresses a jail may have at most per address family"); +#endif + +/* * struct jail_remove_args { * int jid; * }; @@ -1409,21 +2128,61 @@ int jail_remove(struct thread *td, struct jail_remove_args *uap) { - struct prison *pr; - struct proc *p; - int deuref, error; + struct prison *pr, *cpr, *lpr, *tpr; + int descend, error; error = priv_check(td, PRIV_JAIL_REMOVE); if (error) return (error); sx_xlock(&allprison_lock); - pr = prison_find(uap->jid); + pr = prison_find_child(td->td_ucred->cr_prison, uap->jid); if (pr == NULL) { sx_xunlock(&allprison_lock); return (EINVAL); } + /* Remove all descendants of this prison, then remove this prison. */ + pr->pr_ref++; + pr->pr_flags |= PR_REMOVE; + if (!LIST_EMPTY(&pr->pr_children)) { + mtx_unlock(&pr->pr_mtx); + lpr = NULL; + FOREACH_PRISON_DESCENDANT(pr, cpr, descend) { + mtx_lock(&cpr->pr_mtx); + if (cpr->pr_ref > 0) { + tpr = cpr; + cpr->pr_ref++; + cpr->pr_flags |= PR_REMOVE; + } else { + /* Already removed - do not do it again. */ + tpr = NULL; + } + mtx_unlock(&cpr->pr_mtx); + if (lpr != NULL) { + mtx_lock(&lpr->pr_mtx); + prison_remove1(lpr); + sx_xlock(&allprison_lock); + } + lpr = tpr; + } + if (lpr != NULL) { + mtx_lock(&lpr->pr_mtx); + prison_remove1(lpr); + sx_xlock(&allprison_lock); + } + mtx_lock(&pr->pr_mtx); + } + prison_remove1(pr); + return (0); +} + +static void +prison_remove1(struct prison *pr) +{ + struct proc *p; + int deuref; + /* If the prison was persistent, it is not anymore. 
*/ deuref = 0; if (pr->pr_flags & PR_PERSIST) { @@ -1432,17 +2191,18 @@ pr->pr_flags &= ~PR_PERSIST; } - /* If there are no references left, remove the prison now. */ - if (pr->pr_ref == 0) { + /* + * jail_remove added a reference. If that's the only one, remove + * the prison now. + */ + KASSERT(pr->pr_ref > 0, + ("prison_remove1 removing a dead prison (jid=%d)", pr->pr_id)); + if (pr->pr_ref == 1) { prison_deref(pr, deuref | PD_DEREF | PD_LOCKED | PD_LIST_XLOCKED); - return (0); + return; } - /* - * Keep a temporary reference to make sure this prison sticks around. - */ - pr->pr_ref++; mtx_unlock(&pr->pr_mtx); sx_xunlock(&allprison_lock); /* @@ -1457,9 +2217,8 @@ PROC_UNLOCK(p); } sx_sunlock(&allproc_lock); - /* Remove the temporary reference. */ + /* Remove the temporary reference added by jail_remove. */ prison_deref(pr, deuref | PD_DEREF); - return (0); } @@ -1479,7 +2238,7 @@ return (error); sx_slock(&allprison_lock); - pr = prison_find(uap->jid); + pr = prison_find_child(td->td_ucred->cr_prison, uap->jid); if (pr == NULL) { sx_sunlock(&allprison_lock); return (EINVAL); @@ -1501,6 +2260,7 @@ static int do_jail_attach(struct thread *td, struct prison *pr) { + struct prison *ppr; struct proc *p; struct ucred *newcred, *oldcred; int vfslocked, error; @@ -1528,6 +2288,7 @@ /* * Reparent the newly attached process to this jail. */ + ppr = td->td_ucred->cr_prison; p = td->td_proc; error = cpuset_setproc_update_set(p, pr->pr_cpuset); if (error) @@ -1555,6 +2316,7 @@ p->p_ucred = newcred; PROC_UNLOCK(p); crfree(oldcred); + prison_deref(ppr, PD_DEREF | PD_DEUREF); return (0); e_unlock: VOP_UNLOCK(pr->pr_root, 0); @@ -1562,7 +2324,7 @@ VFS_UNLOCK_GIANT(vfslocked); e_revert_osd: /* Tell modules this thread is still in its old jail after all. 
*/ - (void)osd_jail_call(td->td_ucred->cr_prison, PR_METHOD_ATTACH, td); + (void)osd_jail_call(ppr, PR_METHOD_ATTACH, td); prison_deref(pr, PD_DEREF | PD_DEUREF); return (error); } @@ -1588,18 +2350,42 @@ } /* - * Look for the named prison. Returns a locked prison or NULL. + * Find a prison that is a descendant of mypr. Returns a locked prison or NULL. */ struct prison * -prison_find_name(const char *name) +prison_find_child(struct prison *mypr, int prid) { + struct prison *pr; + int descend; + + sx_assert(&allprison_lock, SX_LOCKED); + FOREACH_PRISON_DESCENDANT(mypr, pr, descend) { + if (pr->pr_id == prid) { + mtx_lock(&pr->pr_mtx); + if (pr->pr_ref > 0) + return (pr); + mtx_unlock(&pr->pr_mtx); + } + } + return (NULL); +} + +/* + * Look for the name relative to mypr. Returns a locked prison or NULL. + */ +struct prison * +prison_find_name(struct prison *mypr, const char *name) +{ struct prison *pr, *deadpr; + size_t mylen; + int descend; sx_assert(&allprison_lock, SX_LOCKED); + mylen = mypr == &prison0 ? 0 : strlen(mypr->pr_name) + 1; again: deadpr = NULL; - TAILQ_FOREACH(pr, &allprison, pr_list) { - if (!strcmp(pr->pr_name, name)) { + FOREACH_PRISON_DESCENDANT(mypr, pr, descend) { + if (!strcmp(pr->pr_name + mylen, name)) { mtx_lock(&pr->pr_mtx); if (pr->pr_ref > 0) { if (pr->pr_uref > 0) @@ -1609,7 +2395,7 @@ mtx_unlock(&pr->pr_mtx); } } - /* There was no valid prison - perhaps there was a dying one */ + /* There was no valid prison - perhaps there was a dying one. */ if (deadpr != NULL) { mtx_lock(&deadpr->pr_mtx); if (deadpr->pr_ref == 0) { @@ -1663,66 +2449,87 @@ static void prison_deref(struct prison *pr, int flags) { + struct prison *ppr, *tpr; int vfslocked; if (!(flags & PD_LOCKED)) mtx_lock(&pr->pr_mtx); + /* Decrement the user references in a separate loop. 
*/ if (flags & PD_DEUREF) { - pr->pr_uref--; + for (tpr = pr;; tpr = tpr->pr_parent) { + if (tpr != pr) + mtx_lock(&tpr->pr_mtx); + if (--tpr->pr_uref > 0) + break; + KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); + mtx_unlock(&tpr->pr_mtx); + } /* Done if there were only user references to remove. */ if (!(flags & PD_DEREF)) { - mtx_unlock(&pr->pr_mtx); + mtx_unlock(&tpr->pr_mtx); if (flags & PD_LIST_SLOCKED) sx_sunlock(&allprison_lock); else if (flags & PD_LIST_XLOCKED) sx_xunlock(&allprison_lock); return; } + if (tpr != pr) { + mtx_unlock(&tpr->pr_mtx); + mtx_lock(&pr->pr_mtx); + } } - if (flags & PD_DEREF) - pr->pr_ref--; - /* If the prison still has references, nothing else to do. */ - if (pr->pr_ref > 0) { - mtx_unlock(&pr->pr_mtx); - if (flags & PD_LIST_SLOCKED) - sx_sunlock(&allprison_lock); - else if (flags & PD_LIST_XLOCKED) - sx_xunlock(&allprison_lock); - return; - } - KASSERT(pr->pr_uref == 0, - ("%s: Trying to remove an active prison (jid=%d).", __func__, - pr->pr_id)); - mtx_unlock(&pr->pr_mtx); - if (flags & PD_LIST_SLOCKED) { - if (!sx_try_upgrade(&allprison_lock)) { - sx_sunlock(&allprison_lock); - sx_xlock(&allprison_lock); + for (;;) { + if (flags & PD_DEREF) + pr->pr_ref--; + /* If the prison still has references, nothing else to do. 
*/ + if (pr->pr_ref > 0) { + mtx_unlock(&pr->pr_mtx); + if (flags & PD_LIST_SLOCKED) + sx_sunlock(&allprison_lock); + else if (flags & PD_LIST_XLOCKED) + sx_xunlock(&allprison_lock); + return; } - } else if (!(flags & PD_LIST_XLOCKED)) - sx_xlock(&allprison_lock); - TAILQ_REMOVE(&allprison, pr, pr_list); - prisoncount--; - sx_xunlock(&allprison_lock); + mtx_unlock(&pr->pr_mtx); + if (flags & PD_LIST_SLOCKED) { + if (!sx_try_upgrade(&allprison_lock)) { + sx_sunlock(&allprison_lock); + sx_xlock(&allprison_lock); + } + } else if (!(flags & PD_LIST_XLOCKED)) + sx_xlock(&allprison_lock); - if (pr->pr_root != NULL) { - vfslocked = VFS_LOCK_GIANT(pr->pr_root->v_mount); - vrele(pr->pr_root); - VFS_UNLOCK_GIANT(vfslocked); - } - mtx_destroy(&pr->pr_mtx); + TAILQ_REMOVE(&allprison, pr, pr_list); + LIST_REMOVE(pr, pr_sibling); + ppr = pr->pr_parent; + for (tpr = ppr; tpr != NULL; tpr = tpr->pr_parent) + tpr->pr_prisoncount--; + sx_downgrade(&allprison_lock); + + if (pr->pr_root != NULL) { + vfslocked = VFS_LOCK_GIANT(pr->pr_root->v_mount); + vrele(pr->pr_root); + VFS_UNLOCK_GIANT(vfslocked); + } + mtx_destroy(&pr->pr_mtx); #ifdef INET - free(pr->pr_ip4, M_PRISON); + free(pr->pr_ip4, M_PRISON); #endif #ifdef INET6 - free(pr->pr_ip6, M_PRISON); + free(pr->pr_ip6, M_PRISON); #endif - if (pr->pr_cpuset != NULL) - cpuset_rel(pr->pr_cpuset); - osd_jail_exit(pr); - free(pr, M_PRISON); + if (pr->pr_cpuset != NULL) + cpuset_rel(pr->pr_cpuset); + osd_jail_exit(pr); + free(pr, M_PRISON); + + /* Removing a prison frees a reference on its parent. */ + pr = ppr; + mtx_lock(&pr->pr_mtx); + flags = PD_DEREF | PD_LIST_SLOCKED; + } } void @@ -1768,10 +2575,94 @@ #ifdef INET /* + * Restrict a prison's IP address list with its parent's, possibly replacing + * it. Return true if the replacement buffer was used (or would have been). 
+ */ +static int +prison_restrict_ip4(struct prison *pr, struct in_addr *newip4) +{ + int ii, ij, used; + struct prison *ppr; + + ppr = pr->pr_parent; + if (!(pr->pr_flags & PR_IP4_USER)) { + /* This has no user settings, so just copy the parent's list. */ + if (pr->pr_ip4s < ppr->pr_ip4s) { + /* + * There's no room for the parent's list. Use the + * new list buffer, which is assumed to be big enough + * (if it was passed). If there's no buffer, try to + * allocate one. + */ + used = 1; + if (newip4 == NULL) { + newip4 = malloc(ppr->pr_ip4s * sizeof(*newip4), + M_PRISON, M_NOWAIT); + if (newip4 != NULL) + used = 0; + } + if (newip4 != NULL) { + pr->pr_ip4s = ppr->pr_ip4s; + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4 = newip4; + bcopy(ppr->pr_ip4, newip4, + pr->pr_ip4s * sizeof(*newip4)); + pr->pr_flags |= PR_IP4; + } + return (used); + } + pr->pr_ip4s = ppr->pr_ip4s; + if (pr->pr_ip4s > 0) + bcopy(ppr->pr_ip4, pr->pr_ip4, + pr->pr_ip4s * sizeof(*newip4)); + else if (pr->pr_ip4 != NULL) { + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4 = NULL; + } + pr->pr_flags = + (pr->pr_flags & ~PR_IP4) | (ppr->pr_flags & PR_IP4); + } else if (pr->pr_ip4s > 0 && (ppr->pr_flags & PR_IP4)) { + /* Remove addresses that aren't in the parent. 
*/ + for (ij = 0; ij < ppr->pr_ip4s; ij++) + if (pr->pr_ip4[0].s_addr == ppr->pr_ip4[ij].s_addr) + break; + if (ij == ppr->pr_ip4s) + bcopy(pr->pr_ip4 + 1, pr->pr_ip4, + --pr->pr_ip4s * sizeof(*pr->pr_ip4)); + for (ii = ij = 1; ii < pr->pr_ip4s; ii++) { + if (pr->pr_ip4[ii].s_addr == ppr->pr_ip4[0].s_addr) + continue; + for (; ij < ppr->pr_ip4s; ij++) { + if (qcmp_v4(&pr->pr_ip4[ii], + &ppr->pr_ip4[ij].s_addr) <= 0) + break; + } + if (ij == ppr->pr_ip4s) { + pr->pr_ip4s = ii; + break; + } + if (qcmp_v4(&pr->pr_ip4[ii], &ppr->pr_ip4[ij]) > 0) { + if (ii < --pr->pr_ip4s) + bcopy(pr->pr_ip4 + ii + 1, + pr->pr_ip4 + ii, + (pr->pr_ip4s - ii) * + sizeof(*pr->pr_ip4)); + ii--; + } + } + if (pr->pr_ip4s == 0) { + free(pr->pr_ip4, M_PRISON); + pr->pr_ip4 = NULL; + } + } + return (0); +} + +/* * Pass back primary IPv4 address of this jail. * - * If not jailed return success but do not alter the address. Caller has to - * make sure to initialize it correctly (e.g. INADDR_ANY). + * If not restricted return success but do not alter the address. Caller has + * to make sure to initialize it correctly (e.g. INADDR_ANY). * * Returns 0 on success, EAFNOSUPPORT if the jail doesn't allow IPv4. * Address returned in NBO. @@ -1784,10 +2675,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia != NULL, ("%s: ia is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP4)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP4)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip4 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1799,12 +2694,36 @@ } /* + * Return true if pr1 and pr2 have the same IPv4 address restrictions. + */ +int +prison_equal_ip4(struct prison *pr1, struct prison *pr2) +{ + if (pr1 == pr2) + return (1); + + /* + * jail_set maintains an exclusive hold on allprison_lock while it + * changes the IP addresses, so only a shared hold is needed. 
This is + * easier than locking the two prisons which would require finding the + * proper locking order and end up needing allprison_lock anyway. + */ + sx_slock(&allprison_lock); + while (pr1 != &prison0 && !(pr1->pr_flags & PR_IP4_USER)) + pr1 = pr1->pr_parent; + while (pr2 != &prison0 && !(pr2->pr_flags & PR_IP4_USER)) + pr2 = pr2->pr_parent; + sx_sunlock(&allprison_lock); + return (pr1 == pr2); +} + +/* * Make sure our (source) address is set to something meaningful to this * jail. * - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow IPv4. - * Address passed in in NBO and returned in NBO. + * Returns 0 if jail doesn't restrict IPv4 or if address belongs to jail, + * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail + * doesn't allow IPv4. Address passed in in NBO and returned in NBO. */ int prison_local_ip4(struct ucred *cred, struct in_addr *ia) @@ -1816,10 +2735,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia != NULL, ("%s: ia is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP4)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP4)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip4 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1861,10 +2784,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia != NULL, ("%s: ia is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP4)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP4)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip4 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1886,9 +2813,9 @@ /* * Check if given address belongs to the jail referenced by cred/prison. 
* - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow IPv4. - * Address passed in in NBO. + * Returns 0 if jail doesn't restrict IPv4 or if address belongs to jail, + * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail + * doesn't allow IPv4. Address passed in in NBO. */ static int _prison_check_ip4(struct prison *pr, struct in_addr *ia) @@ -1929,10 +2856,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia != NULL, ("%s: ia is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP4)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP4)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip4 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1945,11 +2876,93 @@ #endif #ifdef INET6 +static int +prison_restrict_ip6(struct prison *pr, struct in6_addr *newip6) +{ + int ii, ij, used; + struct prison *ppr; + + ppr = pr->pr_parent; + if (!(pr->pr_flags & PR_IP6_USER)) { + /* This has no user settings, so just copy the parent's list. */ + if (pr->pr_ip6s < ppr->pr_ip6s) { + /* + * There's no room for the parent's list. Use the + * new list buffer, which is assumed to be big enough + * (if it was passed). If there's no buffer, try to + * allocate one. 
+ */ + used = 1; + if (newip6 == NULL) { + newip6 = malloc(ppr->pr_ip6s * sizeof(*newip6), + M_PRISON, M_NOWAIT); + if (newip6 != NULL) + used = 0; + } + if (newip6 != NULL) { + pr->pr_ip6s = ppr->pr_ip6s; + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6 = newip6; + bcopy(ppr->pr_ip6, newip6, + ppr->pr_ip6s * sizeof(*newip6)); + pr->pr_flags |= PR_IP6; + } + return (used); + } + pr->pr_ip6s = ppr->pr_ip6s; + if (pr->pr_ip6s > 0) + bcopy(ppr->pr_ip6, pr->pr_ip6, + pr->pr_ip6s * sizeof(*newip6)); + else if (pr->pr_ip6 != NULL) { + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6 = NULL; + } + pr->pr_flags = + (pr->pr_flags & ~PR_IP6) | (ppr->pr_flags & PR_IP6); + } else if (pr->pr_ip6s > 0 && (ppr->pr_flags & PR_IP6)) { + /* Remove addresses that aren't in the parent. */ + for (ij = 0; ij < ppr->pr_ip6s; ij++) + if (IN6_ARE_ADDR_EQUAL(&pr->pr_ip6[0], + &ppr->pr_ip6[ij])) + break; + if (ij == ppr->pr_ip6s) + bcopy(pr->pr_ip6 + 1, pr->pr_ip6, + --pr->pr_ip6s * sizeof(*pr->pr_ip6)); + for (ii = ij = 1; ii < pr->pr_ip6s; ii++) { + if (IN6_ARE_ADDR_EQUAL(&pr->pr_ip6[ii], + &ppr->pr_ip6[0])) + continue; + for (; ij < ppr->pr_ip6s; ij++) { + if (qcmp_v6(&pr->pr_ip6[ii], + &ppr->pr_ip6[ij]) <= 0) + break; + } + if (ij == ppr->pr_ip6s) { + pr->pr_ip6s = ii; + break; + } + if (qcmp_v6(&pr->pr_ip6[ii], &ppr->pr_ip6[ij]) > 0) { + if (ii < --pr->pr_ip6s) + bcopy(pr->pr_ip6 + ii + 1, + pr->pr_ip6 + ii, + (pr->pr_ip6s - ii) * + sizeof(*pr->pr_ip6)); + ii--; + } + } + if (pr->pr_ip6s == 0) { + free(pr->pr_ip6, M_PRISON); + pr->pr_ip6 = NULL; + } + } + return 0; +} + /* * Pass back primary IPv6 address for this jail. * - * If not jailed return success but do not alter the address. Caller has to - * make sure to initialize it correctly (e.g. IN6ADDR_ANY_INIT). + * If not restricted return success but do not alter the address. Caller has + * to make sure to initialize it correctly (e.g. IN6ADDR_ANY_INIT). * * Returns 0 on success, EAFNOSUPPORT if the jail doesn't allow IPv6. 
*/ @@ -1961,10 +2974,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia6 != NULL, ("%s: ia6 is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP6)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP6)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip6 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -1976,13 +2993,32 @@ } /* + * Return true if pr1 and pr2 have the same IPv6 address restrictions. + */ +int +prison_equal_ip6(struct prison *pr1, struct prison *pr2) +{ + if (pr1 == pr2) + return (1); + + sx_slock(&allprison_lock); + while (pr1 != &prison0 && !(pr1->pr_flags & PR_IP6_USER)) + pr1 = pr1->pr_parent; + while (pr2 != &prison0 && !(pr2->pr_flags & PR_IP6_USER)) + pr2 = pr2->pr_parent; + sx_sunlock(&allprison_lock); + return (pr1 == pr2); +} + +/* * Make sure our (source) address is set to something meaningful to this jail. * * v6only should be set based on (inp->inp_flags & IN6P_IPV6_V6ONLY != 0) * when needed while binding. * - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow IPv6. + * Returns 0 if jail doesn't restrict IPv6 or if address belongs to jail, + * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail + * doesn't allow IPv6. 
*/ int prison_local_ip6(struct ucred *cred, struct in6_addr *ia6, int v6only) @@ -1993,10 +3029,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia6 != NULL, ("%s: ia6 is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP6)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP6)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip6 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -2037,10 +3077,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia6 != NULL, ("%s: ia6 is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP6)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP6)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip6 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -2062,8 +3106,9 @@ /* * Check if given address belongs to the jail referenced by cred/prison. * - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow IPv6. + * Returns 0 if jail doesn't restrict IPv6 or if address belongs to jail, + * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail + * doesn't allow IPv6. 
*/ static int _prison_check_ip6(struct prison *pr, struct in6_addr *ia6) @@ -2104,10 +3149,14 @@ KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); KASSERT(ia6 != NULL, ("%s: ia6 is NULL", __func__)); - if (!jailed(cred)) + pr = cred->cr_prison; + if (!(pr->pr_flags & PR_IP6)) return (0); - pr = cred->cr_prison; mtx_lock(&pr->pr_mtx); + if (!(pr->pr_flags & PR_IP6)) { + mtx_unlock(&pr->pr_mtx); + return (0); + } if (pr->pr_ip6 == NULL) { mtx_unlock(&pr->pr_mtx); return (EAFNOSUPPORT); @@ -2128,34 +3177,42 @@ int prison_check_af(struct ucred *cred, int af) { + struct prison *pr; int error; KASSERT(cred != NULL, ("%s: cred is NULL", __func__)); - - if (!jailed(cred)) - return (0); - + pr = cred->cr_prison; error = 0; switch (af) { #ifdef INET case AF_INET: - if (cred->cr_prison->pr_ip4 == NULL) - error = EAFNOSUPPORT; + if (pr->pr_flags & PR_IP4) + { + mtx_lock(&pr->pr_mtx); + if ((pr->pr_flags & PR_IP4) && pr->pr_ip4 == NULL) + error = EAFNOSUPPORT; + mtx_unlock(&pr->pr_mtx); + } break; #endif #ifdef INET6 case AF_INET6: - if (cred->cr_prison->pr_ip6 == NULL) - error = EAFNOSUPPORT; + if (pr->pr_flags & PR_IP6) + { + mtx_lock(&pr->pr_mtx); + if ((pr->pr_flags & PR_IP6) && pr->pr_ip6 == NULL) + error = EAFNOSUPPORT; + mtx_unlock(&pr->pr_mtx); + } break; #endif case AF_LOCAL: case AF_ROUTE: break; default: - if (jail_socket_unixiproute_only) + if (pr->pr_flags & PR_RESTRICT_SOCKET_UNIXIPROUTE) error = EAFNOSUPPORT; } return (error); @@ -2165,9 +3222,9 @@ * Check if given address belongs to the jail referenced by cred (wrapper to * prison_check_ip[46]). * - * Returns 0 if not jailed or if address belongs to jail, EADDRNOTAVAIL if - * the address doesn't belong, or EAFNOSUPPORT if the jail doesn't allow - * the address family. IPv4 Address passed in in NBO. + * Returns 0 if jail doesn't restrict the address family or if address belongs + * to jail, EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if + * the jail doesn't allow the address family. 
IPv4 Address passed in in NBO. */ int prison_if(struct ucred *cred, struct sockaddr *sa) @@ -2199,7 +3256,7 @@ break; #endif default: - if (jailed(cred) && jail_socket_unixiproute_only) + if (cred->cr_prison->pr_flags & PR_RESTRICT_SOCKET_UNIXIPROUTE) error = EAFNOSUPPORT; } return (error); @@ -2212,13 +3269,20 @@ prison_check(struct ucred *cred1, struct ucred *cred2) { - if (jailed(cred1)) { - if (!jailed(cred2)) - return (ESRCH); - if (cred2->cr_prison != cred1->cr_prison) - return (ESRCH); - } + return (cred1->cr_prison == cred2->cr_prison || + prison_ischild(cred1->cr_prison, cred2->cr_prison) ? 0 : ESRCH); +} +/* + * Return 1 if pr2 is a descendant of pr1, otherwise 0. + */ +int +prison_ischild(struct prison *pr1, struct prison *pr2) +{ + + for (pr2 = pr2->pr_parent; pr2 != NULL; pr2 = pr2->pr_parent) + if (pr1 == pr2) + return (1); return (0); } @@ -2229,7 +3293,7 @@ jailed(struct ucred *cred) { - return (cred->cr_prison != NULL); + return (cred->cr_prison != &prison0); } /* @@ -2265,12 +3329,12 @@ struct statfs *sp; size_t len; - if (!jailed(cred) || jail_enforce_statfs == 0) + pr = cred->cr_prison; + if (pr->pr_enforce_statfs == 0) return (0); - pr = cred->cr_prison; if (pr->pr_root->v_mount == mp) return (0); - if (jail_enforce_statfs == 2) + if (pr->pr_enforce_statfs == 2) return (ENOENT); /* * If jail's chroot directory is set to "/" we should be able to see @@ -2300,9 +3364,9 @@ struct prison *pr; size_t len; - if (!jailed(cred) || jail_enforce_statfs == 0) + pr = cred->cr_prison; + if (pr->pr_enforce_statfs == 0) return; - pr = cred->cr_prison; if (prison_canseemount(cred, mp) != 0) { bzero(sp->f_mntonname, sizeof(sp->f_mntonname)); strlcpy(sp->f_mntonname, "[restricted]", @@ -2416,6 +3480,13 @@ case PRIV_MQ_ADMIN: /* + * Jail operations within a jail work on child jails. 
+ */ + case PRIV_JAIL_ATTACH: + case PRIV_JAIL_SET: + case PRIV_JAIL_REMOVE: + + /* * Jail implements its own inter-process limits, so allow * root processes in jail to change scheduling on other * processes in the same jail. Likewise for signalling. @@ -2467,7 +3538,7 @@ * setting system flags. */ case PRIV_VFS_SYSFLAGS: - if (jail_chflags_allowed) + if (cred->cr_prison->pr_flags & PR_ALLOW_CHFLAGS) return (0); else return (EPERM); @@ -2480,7 +3551,7 @@ case PRIV_VFS_UNMOUNT: case PRIV_VFS_MOUNT_NONUSER: case PRIV_VFS_MOUNT_OWNER: - if (jail_mount_allowed) + if (cred->cr_prison->pr_flags & PR_ALLOW_MOUNT) return (0); else return (EPERM); @@ -2503,7 +3574,7 @@ * Conditionally allow creating raw sockets in jail. */ case PRIV_NETINET_RAW: - if (jail_allow_raw_sockets) + if (cred->cr_prison->pr_flags & PR_ALLOW_RAW_SOCKETS) return (0); else return (EPERM); @@ -2526,11 +3597,61 @@ } } +/* + * Return the part of pr2's name that is relative to pr1, or the whole name + * if it does not directly follow. + */ + +char * +prison_name(struct prison *pr1, struct prison *pr2) +{ + char *name; + + /* Jails see themselves as "0" (if they see themselves at all). */ + if (pr1 == pr2) + return "0"; + name = pr2->pr_name; + if (prison_ischild(pr1, pr2)) { + /* + * pr1 isn't locked (and allprison_lock may not be either) + * so its length can't be counted on. But the number of dots + * can be counted on - and counted. + */ + for (; pr1 != &prison0; pr1 = pr1->pr_parent) + name = strchr(name, '.') + 1; + } + return (name); +} + +/* + * Return the part of pr2's path that is relative to pr1, or the whole path + * if it does not directly follow. 
+ */ +static char * +prison_path(struct prison *pr1, struct prison *pr2) +{ + char *path1, *path2; + int len1; + + path1 = pr1->pr_path; + path2 = pr2->pr_path; + if (!strcmp(path1, "/")) + return (path2); + len1 = strlen(path1); + if (strncmp(path1, path2, len1)) + return (path2); + if (path2[len1] == '\0') + return "/"; + if (path2[len1] == '/') + return (path2 + len1); + return (path2); +} + static int sysctl_jail_list(SYSCTL_HANDLER_ARGS) { struct xprison *xp; - struct prison *pr; + struct prison *pr, *cpr; #ifdef INET struct in_addr *ip4 = NULL; int ip4s = 0; @@ -2539,62 +3660,60 @@ struct in_addr *ip6 = NULL; int ip6s = 0; #endif - int error; + int descend, error; - if (jailed(req->td->td_ucred)) - return (0); - xp = malloc(sizeof(*xp), M_TEMP, M_WAITOK); + pr = req->td->td_ucred->cr_prison; error = 0; sx_slock(&allprison_lock); - TAILQ_FOREACH(pr, &allprison, pr_list) { + FOREACH_PRISON_DESCENDANT(pr, cpr, descend) { again: - mtx_lock(&pr->pr_mtx); + mtx_lock(&cpr->pr_mtx); #ifdef INET - if (pr->pr_ip4s > 0) { - if (ip4s < pr->pr_ip4s) { - ip4s = pr->pr_ip4s; - mtx_unlock(&pr->pr_mtx); + if (cpr->pr_ip4s > 0) { + if (ip4s < cpr->pr_ip4s) { + ip4s = cpr->pr_ip4s; + mtx_unlock(&cpr->pr_mtx); ip4 = realloc(ip4, ip4s * sizeof(struct in_addr), M_TEMP, M_WAITOK); goto again; } - bcopy(pr->pr_ip4, ip4, - pr->pr_ip4s * sizeof(struct in_addr)); + bcopy(cpr->pr_ip4, ip4, + cpr->pr_ip4s * sizeof(struct in_addr)); } #endif #ifdef INET6 - if (pr->pr_ip6s > 0) { - if (ip6s < pr->pr_ip6s) { - ip6s = pr->pr_ip6s; - mtx_unlock(&pr->pr_mtx); + if (cpr->pr_ip6s > 0) { + if (ip6s < cpr->pr_ip6s) { + ip6s = cpr->pr_ip6s; + mtx_unlock(&cpr->pr_mtx); ip6 = realloc(ip6, ip6s * sizeof(struct in6_addr), M_TEMP, M_WAITOK); goto again; } - bcopy(pr->pr_ip6, ip6, - pr->pr_ip6s * sizeof(struct in6_addr)); + bcopy(cpr->pr_ip6, ip6, + cpr->pr_ip6s * sizeof(struct in6_addr)); } #endif - if (pr->pr_ref == 0) { - mtx_unlock(&pr->pr_mtx); + if (cpr->pr_ref == 0) { + mtx_unlock(&cpr->pr_mtx); 
continue; } bzero(xp, sizeof(*xp)); xp->pr_version = XPRISON_VERSION; - xp->pr_id = pr->pr_id; - xp->pr_state = pr->pr_uref > 0 + xp->pr_id = cpr->pr_id; + xp->pr_state = cpr->pr_uref > 0 ? PRISON_STATE_ALIVE : PRISON_STATE_DYING; - strlcpy(xp->pr_path, pr->pr_path, sizeof(xp->pr_path)); - strlcpy(xp->pr_host, pr->pr_host, sizeof(xp->pr_host)); - strlcpy(xp->pr_name, pr->pr_name, sizeof(xp->pr_name)); + strlcpy(xp->pr_path, prison_path(pr, cpr), sizeof(xp->pr_path)); + strlcpy(xp->pr_host, cpr->pr_host, sizeof(xp->pr_host)); + strlcpy(xp->pr_name, prison_name(pr, cpr), sizeof(xp->pr_name)); #ifdef INET - xp->pr_ip4s = pr->pr_ip4s; + xp->pr_ip4s = cpr->pr_ip4s; #endif #ifdef INET6 - xp->pr_ip6s = pr->pr_ip6s; + xp->pr_ip6s = cpr->pr_ip6s; #endif - mtx_unlock(&pr->pr_mtx); + mtx_unlock(&cpr->pr_mtx); error = SYSCTL_OUT(req, xp, sizeof(*xp)); if (error) break; @@ -2649,6 +3768,7 @@ static void db_show_prison(struct prison *pr) { + int fi; #if defined(INET) || defined(INET6) int ii; #endif @@ -2659,6 +3779,7 @@ db_printf("prison %p:\n", pr); db_printf(" jid = %d\n", pr->pr_id); db_printf(" name = %s\n", pr->pr_name); + db_printf(" parent = %p\n", pr->pr_parent); db_printf(" ref = %d\n", pr->pr_ref); db_printf(" uref = %d\n", pr->pr_uref); db_printf(" path = %s\n", pr->pr_path); @@ -2666,10 +3787,18 @@ ? 
pr->pr_cpuset->cs_id : -1); db_printf(" root = %p\n", pr->pr_root); db_printf(" securelevel = %d\n", pr->pr_securelevel); + db_printf(" child = %p\n", LIST_FIRST(&pr->pr_children)); + db_printf(" sibling = %p\n", LIST_NEXT(pr, pr_sibling)); db_printf(" flags = %x", pr->pr_flags); - if (pr->pr_flags & PR_PERSIST) - db_printf(" persist"); + for (fi = 0; fi < sizeof(pr_flag_names) / sizeof(pr_flag_names[0]); + fi++) + if (pr_flag_names[fi] != NULL && (pr->pr_flags & (1 << fi))) + db_printf(" %s", pr_flag_names[fi]); db_printf("\n"); + db_printf(" enforce_statfs = %d\n", pr->pr_enforce_statfs); +#if defined(INET) || defined(INET6) + db_printf(" max_af_ips = %d\n", pr->pr_max_af_ips); +#endif db_printf(" host.hostname = %s\n", pr->pr_host); #ifdef INET db_printf(" ip4s = %d\n", pr->pr_ip4s); @@ -2692,7 +3821,11 @@ struct prison *pr; if (!have_addr) { - /* Show all prisons in the list. */ + /* + * Show all prisons in the list, and prison0 which is not + * listed. + */ + db_show_prison(&prison0); TAILQ_FOREACH(pr, &allprison, pr_list) { db_show_prison(pr); if (db_pager_quit) @@ -2701,18 +3834,22 @@ return; } - /* Look for a prison with the ID and with references. */ - TAILQ_FOREACH(pr, &allprison, pr_list) - if (pr->pr_id == addr && pr->pr_ref > 0) - break; - if (pr == NULL) - /* Look again, without requiring a reference. */ + if (addr == 0) + pr = &prison0; + else { + /* Look for a prison with the ID and with references. */ TAILQ_FOREACH(pr, &allprison, pr_list) - if (pr->pr_id == addr) + if (pr->pr_id == addr && pr->pr_ref > 0) break; - if (pr == NULL) - /* Assume address points to a valid prison. */ - pr = (struct prison *)addr; + if (pr == NULL) + /* Look again, without requiring a reference. */ + TAILQ_FOREACH(pr, &allprison, pr_list) + if (pr->pr_id == addr) + break; + if (pr == NULL) + /* Assume address points to a valid prison. 
*/ + pr = (struct prison *)addr; + } db_show_prison(pr); } Index: sys/kern/sysv_msg.c =================================================================== --- sys/kern/sysv_msg.c (revision 191896) +++ sys/kern/sysv_msg.c (working copy) @@ -337,7 +337,7 @@ { int error; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); if (uap->which < 0 || uap->which >= sizeof(msgcalls)/sizeof(msgcalls[0])) @@ -410,7 +410,7 @@ int rval, error, msqix; register struct msqid_kernel *msqkptr; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); msqix = IPCID_TO_IX(msqid); @@ -564,7 +564,7 @@ DPRINTF(("msgget(0x%x, 0%o)\n", key, msgflg)); - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&msq_mtx); @@ -674,7 +674,7 @@ register struct msg *msghdr; short next; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&msq_mtx); @@ -1012,7 +1012,7 @@ int msqix, error = 0; short next; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); msqix = IPCID_TO_IX(msqid); Index: sys/kern/vfs_syscalls.c =================================================================== --- sys/kern/vfs_syscalls.c (revision 191896) +++ sys/kern/vfs_syscalls.c (working copy) @@ -164,12 +164,6 @@ return (0); } -/* XXX PRISON: could be per prison flag */ -static int prison_quotas; -#if 0 -SYSCTL_INT(_kern_prison, OID_AUTO, quotas, CTLFLAG_RW, &prison_quotas, 0, ""); -#endif - /* * Change filesystem quotas. 
*/ @@ -198,7 +192,7 @@ AUDIT_ARG(cmd, uap->cmd); AUDIT_ARG(uid, uap->uid); - if (jailed(td->td_ucred) && !prison_quotas) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_QUOTAS)) return (EPERM); NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF | MPSAFE | AUDITVNODE1, UIO_USERSPACE, uap->path, td); Index: sys/kern/init_main.c =================================================================== --- sys/kern/init_main.c (revision 191896) +++ sys/kern/init_main.c (working copy) @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -436,6 +437,7 @@ td->td_oncpu = 0; td->td_flags = TDF_INMEM|TDP_KTHREAD; td->td_cpuset = cpuset_thread0(); + prison0.pr_cpuset = cpuset_ref(td->td_cpuset); p->p_peers = 0; p->p_leader = p; @@ -452,7 +454,7 @@ p->p_ucred->cr_ngroups = 1; /* group 0 */ p->p_ucred->cr_uidinfo = uifind(0); p->p_ucred->cr_ruidinfo = uifind(0); - p->p_ucred->cr_prison = NULL; /* Don't jail it. */ + p->p_ucred->cr_prison = &prison0; #ifdef VIMAGE p->p_ucred->cr_vnet = LIST_FIRST(&vnet_head); #endif Index: sys/kern/sysv_sem.c =================================================================== --- sys/kern/sysv_sem.c (revision 191896) +++ sys/kern/sysv_sem.c (working copy) @@ -344,7 +344,7 @@ { int error; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); if (uap->which < 0 || uap->which >= sizeof(semcalls)/sizeof(semcalls[0])) @@ -583,7 +583,7 @@ DPRINTF(("call to semctl(%d, %d, %d, 0x%p)\n", semid, semnum, cmd, arg)); - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); array = NULL; @@ -855,7 +855,7 @@ struct ucred *cred = td->td_ucred; DPRINTF(("semget(0x%x, %d, 0%o)\n", key, nsems, semflg)); - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&sem_mtx); @@ -982,7 +982,7 @@ #endif 
DPRINTF(("call to semop(%d, %p, %u)\n", semid, sops, nsops)); - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); semid = IPCID_TO_IX(semid); /* Convert back to zero origin */ Index: sys/kern/kern_proc.c =================================================================== --- sys/kern/kern_proc.c (revision 191896) +++ sys/kern/kern_proc.c (working copy) @@ -739,8 +739,8 @@ /* If jailed(cred), emulate the old P_JAILED flag. */ if (jailed(cred)) { kp->ki_flag |= P_JAILED; - /* If inside a jail, use 0 as a jail ID. */ - if (!jailed(curthread->td_ucred)) + /* If inside the jail, use 0 as a jail ID. */ + if (cred->cr_prison != curthread->td_ucred->cr_prison) kp->ki_jid = cred->cr_prison->pr_id; } } Index: sys/kern/kern_linker.c =================================================================== --- sys/kern/kern_linker.c (revision 191896) +++ sys/kern/kern_linker.c (working copy) @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -375,7 +376,7 @@ int foundfile, error; /* Refuse to load modules if securelevel raised */ - if (securelevel > 0) + if (prison0.pr_securelevel > 0) return (EPERM); KLD_LOCK_ASSERT(); @@ -580,7 +581,7 @@ int error, i; /* Refuse to unload modules if securelevel raised. 
*/ - if (securelevel > 0) + if (prison0.pr_securelevel > 0) return (EPERM); KLD_LOCK_ASSERT(); Index: sys/kern/sysv_shm.c =================================================================== --- sys/kern/sysv_shm.c (revision 191896) +++ sys/kern/sysv_shm.c (working copy) @@ -303,7 +303,7 @@ int i; int error = 0; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); shmmap_s = p->p_vmspace->vm_shm; @@ -357,7 +357,7 @@ int rv; int error = 0; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); shmmap_s = p->p_vmspace->vm_shm; @@ -480,7 +480,7 @@ struct shmid_kernel *shmseg; struct oshmid_ds outbuf; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); shmseg = shm_find_segment_by_shmid(uap->shmid); @@ -542,7 +542,7 @@ int error = 0; struct shmid_kernel *shmseg; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); @@ -823,7 +823,7 @@ int segnum, mode; int error; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); mtx_lock(&Giant); mode = uap->shmflg & ACCESSPERMS; @@ -861,7 +861,7 @@ #if defined(__i386__) && (defined(COMPAT_FREEBSD4) || defined(COMPAT_43)) int error; - if (!jail_sysvipc_allowed && jailed(td->td_ucred)) + if (!(td->td_ucred->cr_prison->pr_flags & PR_ALLOW_SYSVIPC)) return (ENOSYS); if (uap->which < 0 || uap->which >= sizeof(shmcalls)/sizeof(shmcalls[0])) Index: sys/kern/vfs_mount.c =================================================================== --- sys/kern/vfs_mount.c (revision 191896) +++ sys/kern/vfs_mount.c (working copy) @@ -1421,6 +1421,11 @@ root_mount_done(void) { + /* Keep 
prison0's root in sync with the global rootvnode. */ + mtx_lock(&prison0.pr_mtx); + prison0.pr_root = rootvnode; + vref(prison0.pr_root); + mtx_unlock(&prison0.pr_mtx); /* * Use a mutex to prevent the wakeup being missed and waiting for * an extra 1 second sleep. Index: sys/kern/kern_exit.c =================================================================== --- sys/kern/kern_exit.c (revision 191896) +++ sys/kern/kern_exit.c (working copy) @@ -454,9 +454,8 @@ p->p_xstat = rv; p->p_xthread = td; - /* In case we are jailed tell the prison that we are gone. */ - if (jailed(p->p_ucred)) - prison_proc_free(p->p_ucred->cr_prison); + /* Tell the prison that we are gone. */ + prison_proc_free(p->p_ucred->cr_prison); #ifdef KDTRACE_HOOKS /* Index: sys/kern/kern_prot.c =================================================================== --- sys/kern/kern_prot.c (revision 191896) +++ sys/kern/kern_prot.c (working copy) @@ -1262,33 +1262,25 @@ * (securelevel >= level). Note that the logic is inverted -- these * functions return EPERM on "success" and 0 on "failure". * + * Due to care taken when setting the securelevel, we know that no jail will + * be less secure than its parent (or the physical system), so it is sufficient + * to test the current jail only. + * * XXXRW: Possibly since this has to do with privilege, it should move to * kern_priv.c. */ int securelevel_gt(struct ucred *cr, int level) { - int active_securelevel; - active_securelevel = securelevel; - KASSERT(cr != NULL, ("securelevel_gt: null cr")); - if (cr->cr_prison != NULL) - active_securelevel = imax(cr->cr_prison->pr_securelevel, - active_securelevel); - return (active_securelevel > level ? EPERM : 0); + return (cr->cr_prison->pr_securelevel > level ? 
EPERM : 0); } int securelevel_ge(struct ucred *cr, int level) { - int active_securelevel; - active_securelevel = securelevel; - KASSERT(cr != NULL, ("securelevel_ge: null cr")); - if (cr->cr_prison != NULL) - active_securelevel = imax(cr->cr_prison->pr_securelevel, - active_securelevel); - return (active_securelevel >= level ? EPERM : 0); + return (cr->cr_prison->pr_securelevel >= level ? EPERM : 0); } /* @@ -1822,7 +1814,7 @@ /* * Free a prison, if any. */ - if (jailed(cr)) + if (cr->cr_prison != NULL) prison_free(cr->cr_prison); #ifdef AUDIT audit_cred_destroy(cr); @@ -1857,8 +1849,7 @@ (caddr_t)&src->cr_startcopy)); uihold(dest->cr_uidinfo); uihold(dest->cr_ruidinfo); - if (jailed(dest)) - prison_hold(dest->cr_prison); + prison_hold(dest->cr_prison); #ifdef AUDIT audit_cred_copy(src, dest); #endif Index: sys/kern/kern_descrip.c =================================================================== --- sys/kern/kern_descrip.c (revision 191896) +++ sys/kern/kern_descrip.c (working copy) @@ -2363,24 +2363,25 @@ } /* - * Scan all active processes to see if any of them have a current or root - * directory of `olddp'. If so, replace them with the new mount point. + * Scan all active processes and prisons to see if any of them have a current + * or root directory of `olddp'. If so, replace them with the new mount point. 
*/ void mountcheckdirs(struct vnode *olddp, struct vnode *newdp) { struct filedesc *fdp; + struct prison *pr; struct proc *p; int nrele; if (vrefcnt(olddp) == 1) return; + nrele = 0; sx_slock(&allproc_lock); FOREACH_PROC_IN_SYSTEM(p) { fdp = fdhold(p); if (fdp == NULL) continue; - nrele = 0; FILEDESC_XLOCK(fdp); if (fdp->fd_cdir == olddp) { vref(newdp); @@ -2392,17 +2393,40 @@ fdp->fd_rdir = newdp; nrele++; } + if (fdp->fd_jdir == olddp) { + vref(newdp); + fdp->fd_jdir = newdp; + nrele++; + } FILEDESC_XUNLOCK(fdp); fddrop(fdp); - while (nrele--) - vrele(olddp); } sx_sunlock(&allproc_lock); if (rootvnode == olddp) { - vrele(rootvnode); vref(newdp); rootvnode = newdp; + nrele++; } + mtx_lock(&prison0.pr_mtx); + if (prison0.pr_root == olddp) { + vref(newdp); + prison0.pr_root = newdp; + nrele++; + } + mtx_unlock(&prison0.pr_mtx); + sx_slock(&allprison_lock); + TAILQ_FOREACH(pr, &allprison, pr_list) { + mtx_lock(&pr->pr_mtx); + if (pr->pr_root == olddp) { + vref(newdp); + pr->pr_root = newdp; + nrele++; + } + mtx_unlock(&pr->pr_mtx); + } + sx_sunlock(&allprison_lock); + while (nrele--) + vrele(olddp); } struct filedesc_to_leader * Index: sys/kern/kern_fork.c =================================================================== --- sys/kern/kern_fork.c (revision 191896) +++ sys/kern/kern_fork.c (working copy) @@ -46,6 +46,7 @@ #include #include #include +#include #include #include #include @@ -54,7 +55,6 @@ #include #include #include -#include #include #include #include @@ -455,9 +455,8 @@ p2->p_ucred = crhold(td->td_ucred); - /* In case we are jailed tell the prison that we exist. */ - if (jailed(p2->p_ucred)) - prison_proc_hold(p2->p_ucred->cr_prison); + /* Tell the prison that we exist. 
*/ + prison_proc_hold(p2->p_ucred->cr_prison); PROC_UNLOCK(p2); Index: sys/kern/kern_cpuset.c =================================================================== --- sys/kern/kern_cpuset.c (revision 191896) +++ sys/kern/kern_cpuset.c (working copy) @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -53,7 +54,6 @@ #include #include #include -#include /* Must come after sys/proc.h */ #include @@ -225,23 +225,16 @@ KASSERT(td != NULL, ("[%s:%d] td is NULL", __func__, __LINE__)); if (set != NULL && jailed(td->td_ucred)) { - struct cpuset *rset, *jset; - struct prison *pr; + struct cpuset *jset, *tset; - rset = cpuset_refroot(set); - - pr = td->td_ucred->cr_prison; - mtx_lock(&pr->pr_mtx); - cpuset_ref(pr->pr_cpuset); - jset = pr->pr_cpuset; - mtx_unlock(&pr->pr_mtx); - - if (jset->cs_id != rset->cs_id) { + jset = td->td_ucred->cr_prison->pr_cpuset; + for (tset = set; tset != NULL; tset = tset->cs_parent) + if (tset == jset) + break; + if (tset == NULL) { cpuset_rel(set); set = NULL; } - cpuset_rel(jset); - cpuset_rel(rset); } return (set); @@ -303,7 +296,7 @@ /* * Recursively check for errors that would occur from applying mask to * the tree of sets starting at 'set'. Checks for sets that would become - * empty as well as RDONLY flags. + * empty as well as RDONLY flags. Do not check jails. */ static int cpuset_testupdate(struct cpuset *set, cpuset_t *mask) @@ -320,14 +313,19 @@ CPU_COPY(&set->cs_mask, &newmask); CPU_AND(&newmask, mask); error = 0; - LIST_FOREACH(nset, &set->cs_children, cs_siblings) + LIST_FOREACH(nset, &set->cs_children, cs_siblings) { + if (set->cs_flags & CPU_SET_ROOT) + continue; if ((error = cpuset_testupdate(nset, &newmask)) != 0) break; + } return (error); } /* - * Applies the mask 'mask' without checking for empty sets or permissions. + * Apply the mask 'mask' to the cpuset and its children. Ignore permission + * errors, and replace any empty sets (which may occur under jails) with their + * parent's mask. 
*/ static void cpuset_update(struct cpuset *set, cpuset_t *mask) @@ -336,6 +334,8 @@ mtx_assert(&cpuset_lock, MA_OWNED); CPU_AND(&set->cs_mask, mask); + if (CPU_EMPTY(&set->cs_mask)) + CPU_COPY(mask, &set->cs_mask); LIST_FOREACH(nset, &set->cs_children, cs_siblings) cpuset_update(nset, &set->cs_mask); @@ -456,25 +456,14 @@ struct prison *pr; sx_slock(&allprison_lock); - pr = prison_find(id); + pr = prison_find_child(curthread->td_ucred->cr_prison, id); sx_sunlock(&allprison_lock); if (pr == NULL) return (ESRCH); - if (jailed(curthread->td_ucred)) { - if (curthread->td_ucred->cr_prison == pr) { - cpuset_ref(pr->pr_cpuset); - set = pr->pr_cpuset; - } - } else { - cpuset_ref(pr->pr_cpuset); - set = pr->pr_cpuset; - } + cpuset_ref(pr->pr_cpuset); + *setp = pr->pr_cpuset; mtx_unlock(&pr->pr_mtx); - if (set) { - *setp = set; - return (0); - } - return (ESRCH); + return (0); } case CPU_WHICH_IRQ: return (0); @@ -731,21 +720,17 @@ * In case of no error, returns the set in *setp locked with a reference. 
*/ int -cpuset_create_root(struct thread *td, struct cpuset **setp) +cpuset_create_root(struct prison *pr, struct cpuset **setp) { struct cpuset *root; struct cpuset *set; int error; - KASSERT(td != NULL, ("[%s:%d] invalid td", __func__, __LINE__)); + KASSERT(pr != NULL, ("[%s:%d] invalid pr", __func__, __LINE__)); KASSERT(setp != NULL, ("[%s:%d] invalid setp", __func__, __LINE__)); - thread_lock(td); - root = cpuset_refroot(td->td_cpuset); - thread_unlock(td); - - error = cpuset_create(setp, td->td_cpuset, &root->cs_mask); - cpuset_rel(root); + root = pr->pr_cpuset; + error = cpuset_create(setp, root, &root->cs_mask); if (error) return (error); Index: sys/kern/vfs_cache.c =================================================================== --- sys/kern/vfs_cache.c (revision 191896) +++ sys/kern/vfs_cache.c (working copy) @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -1078,6 +1079,7 @@ char *bp; int error, i, slash_prefixed; struct namecache *ncp; + struct vnode *pr_root; #ifdef KDTRACE_HOOKS struct vnode *startvp = vp; #endif @@ -1130,7 +1132,8 @@ buflen--; slash_prefixed = 1; } - while (vp != rdir && vp != rootvnode) { + pr_root = td->td_ucred->cr_prison->pr_root; + while (vp != rdir && vp != pr_root && vp != rootvnode) { if (vp->v_vflag & VV_ROOT) { if (vp->v_iflag & VI_DOOMED) { /* forced unmount */ CACHE_RUNLOCK(); Index: sys/kern/kern_mib.c =================================================================== --- sys/kern/kern_mib.c (revision 191896) +++ sys/kern/kern_mib.c (working copy) @@ -52,6 +52,7 @@ #include #include #include +#include #include #include @@ -228,7 +229,7 @@ pr = req->td->td_ucred->cr_prison; if (pr != NULL) { - if (!jail_set_hostname_allowed && req->newptr) + if (!(pr->pr_flags & PR_ALLOW_SET_HOSTNAME) && req->newptr) return (EPERM); /* * Process is in jail, so make a local copy of jail @@ -277,55 +278,43 @@ &regression_securelevel_nonmonotonic, 0, "securelevel may be lowered"); #endif -int securelevel = 
-1; -static struct mtx securelevel_mtx; - -MTX_SYSINIT(securelevel_lock, &securelevel_mtx, "securelevel mutex lock", - MTX_DEF); - static int sysctl_kern_securelvl(SYSCTL_HANDLER_ARGS) { - struct prison *pr; - int error, level; + struct prison *pr, *cpr; + int descend, error, level; pr = req->td->td_ucred->cr_prison; /* - * If the process is in jail, return the maximum of the global and - * local levels; otherwise, return the global level. Perform a - * lockless read since the securelevel is an integer. + * Reading the securelevel is easy, since the current jail's level + * is known to be at least as secure as any higher levels. Perform + * a lockless read since the securelevel is an integer. */ - if (pr != NULL) - level = imax(securelevel, pr->pr_securelevel); - else - level = securelevel; + level = pr->pr_securelevel; error = sysctl_handle_int(oidp, &level, 0, req); if (error || !req->newptr) return (error); + /* Permit update only if the new securelevel exceeds the old. */ + sx_slock(&allprison_lock); + mtx_lock(&pr->pr_mtx); + if (!regression_securelevel_nonmonotonic && + level < pr->pr_securelevel) { + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); + return (EPERM); + } + pr->pr_securelevel = level; /* - * Permit update only if the new securelevel exceeds the - * global level, and local level if any. + * Set all child jails to be at least this level, but do not lower + * them (even if regression_securelevel_nonmonotonic). 
*/ - if (pr != NULL) { - mtx_lock(&pr->pr_mtx); - if (!regression_securelevel_nonmonotonic && - (level < imax(securelevel, pr->pr_securelevel))) { - mtx_unlock(&pr->pr_mtx); - return (EPERM); - } - pr->pr_securelevel = level; - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&securelevel_mtx); - if (!regression_securelevel_nonmonotonic && - (level < securelevel)) { - mtx_unlock(&securelevel_mtx); - return (EPERM); - } - securelevel = level; - mtx_unlock(&securelevel_mtx); + FOREACH_PRISON_DESCENDANT_LOCKED(pr, cpr, descend) { + if (cpr->pr_securelevel < level) + cpr->pr_securelevel = level; } + mtx_unlock(&pr->pr_mtx); + sx_sunlock(&allprison_lock); return (error); } Index: sys/kern/vfs_subr.c =================================================================== --- sys/kern/vfs_subr.c (revision 191896) +++ sys/kern/vfs_subr.c (working copy) @@ -467,22 +467,14 @@ return (EPERM); /* - * If the file system was mounted outside a jail and a jailed thread - * tries to access it, deny immediately. + * If the file system was mounted outside the jail of the calling + * thread, deny immediately. */ - if (!jailed(mp->mnt_cred) && jailed(td->td_ucred)) + if (mp->mnt_cred->cr_prison != td->td_ucred->cr_prison && + !prison_ischild(td->td_ucred->cr_prison, mp->mnt_cred->cr_prison)) return (EPERM); /* - * If the file system was mounted inside different jail that the jail of - * the calling thread, deny immediately. - */ - if (jailed(mp->mnt_cred) && jailed(td->td_ucred) && - mp->mnt_cred->cr_prison != td->td_ucred->cr_prison) { - return (EPERM); - } - - /* * If file system supports delegated administration, we don't check * for the PRIV_VFS_MOUNT_OWNER privilege - it will be better verified * by the file system itself. 
@@ -2900,7 +2892,7 @@ db_printf(" mnt_cred = { uid=%u ruid=%u", (u_int)mp->mnt_cred->cr_uid, (u_int)mp->mnt_cred->cr_ruid); - if (mp->mnt_cred->cr_prison != NULL) + if (jailed(mp->mnt_cred)) db_printf(", jail=%d", mp->mnt_cred->cr_prison->pr_id); db_printf(" }\n"); db_printf(" mnt_ref = %d\n", mp->mnt_ref); Index: sys/netinet/in_pcb.c =================================================================== --- sys/netinet/in_pcb.c (revision 191896) +++ sys/netinet/in_pcb.c (working copy) @@ -600,7 +600,7 @@ goto done; } - if (cred == NULL || !jailed(cred)) { + if (cred == NULL || !(cred->cr_prison->pr_flags & PR_IP4)) { laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; } @@ -644,7 +644,7 @@ struct ifnet *ifp; /* If not jailed, use the default returned. */ - if (cred == NULL || !jailed(cred)) { + if (cred == NULL || !(cred->cr_prison->pr_flags & PR_IP4)) { ia = (struct in_ifaddr *)sro.ro_rt->rt_ifa; laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; @@ -709,7 +709,7 @@ if (ia == NULL) ia = ifatoia(ifa_ifwithnet(sintosa(&sain))); - if (cred == NULL || !jailed(cred)) { + if (cred == NULL || !(cred->cr_prison->pr_flags & PR_IP4)) { #if __FreeBSD_version < 800000 if (ia == NULL) ia = (struct in_ifaddr *)sro.ro_rt->rt_ifa; @@ -1220,7 +1220,8 @@ * Found? */ if (cred == NULL || - inp->inp_cred->cr_prison == cred->cr_prison) + prison_equal_ip4(cred->cr_prison, + inp->inp_cred->cr_prison)) return (inp); } } @@ -1252,7 +1253,8 @@ LIST_FOREACH(inp, &phd->phd_pcblist, inp_portlist) { wildcard = 0; if (cred != NULL && - inp->inp_cred->cr_prison != cred->cr_prison) + !prison_equal_ip4(inp->inp_cred->cr_prison, + cred->cr_prison)) continue; #ifdef INET6 /* XXX inp locking */ @@ -1333,7 +1335,7 @@ * the inp here, without any checks. * Well unless both bound with SO_REUSEPORT? 
*/ - if (jailed(inp->inp_cred)) + if (inp->inp_cred->cr_prison->pr_flags & PR_IP4) return (inp); if (tmpinp == NULL) tmpinp = inp; @@ -1378,7 +1380,7 @@ (inp->inp_flags & INP_FAITH) == 0) continue; - injail = jailed(inp->inp_cred); + injail = inp->inp_cred->cr_prison->pr_flags & PR_IP4; if (injail) { if (prison_check_ip4(inp->inp_cred, &laddr) != 0) Index: sys/netinet/udp_usrreq.c =================================================================== --- sys/netinet/udp_usrreq.c (revision 191896) +++ sys/netinet/udp_usrreq.c (working copy) @@ -988,7 +988,7 @@ * Remember addr if jailed, to prevent * rebinding. */ - if (jailed(td->td_ucred)) + if (td->td_ucred->cr_prison->pr_flags & PR_IP4) inp->inp_laddr = laddr; inp->inp_lport = lport; if (in_pcbinshash(inp) != 0) { Index: sys/fs/procfs/procfs_status.c =================================================================== --- sys/fs/procfs/procfs_status.c (revision 191896) +++ sys/fs/procfs/procfs_status.c (working copy) @@ -151,10 +151,11 @@ sbuf_printf(sb, ",%lu", (u_long)cr->cr_groups[i]); } - if (jailed(p->p_ucred)) { - mtx_lock(&p->p_ucred->cr_prison->pr_mtx); - sbuf_printf(sb, " %s", p->p_ucred->cr_prison->pr_host); - mtx_unlock(&p->p_ucred->cr_prison->pr_mtx); + if (jailed(cr)) { + mtx_lock(&cr->cr_prison->pr_mtx); + sbuf_printf(sb, " %s", + prison_name(td->td_ucred->cr_prison, cr->cr_prison)); + mtx_unlock(&cr->cr_prison->pr_mtx); } else { sbuf_printf(sb, " -"); } Index: sys/nfsserver/nfs_srvsock.c =================================================================== --- sys/nfsserver/nfs_srvsock.c (revision 191896) +++ sys/nfsserver/nfs_srvsock.c (working copy) @@ -43,6 +43,7 @@ #include #include +#include #include #include #include @@ -699,6 +700,8 @@ nd = malloc(sizeof (struct nfsrv_descript), M_NFSRVDESC, M_WAITOK); nd->nd_cr = crget(); + nd->nd_cr->cr_prison = &prison0; + prison_hold(&prison0); NFSD_LOCK(); nd->nd_md = nd->nd_mrep = m; nd->nd_nam2 = nam; Index: sys/compat/freebsd32/freebsd32_misc.c 
=================================================================== --- sys/compat/freebsd32/freebsd32_misc.c (revision 191896) +++ sys/compat/freebsd32/freebsd32_misc.c (working copy) @@ -112,8 +112,6 @@ CTASSERT(sizeof(struct stat32) == 96); CTASSERT(sizeof(struct sigaction32) == 24); -extern int jail_max_af_ips; - static int freebsd32_kevent_copyout(void *arg, struct kevent *kevp, int count); static int freebsd32_kevent_copyin(void *arg, struct kevent *kevp, int count); @@ -2126,7 +2124,7 @@ return (error); tmplen = MAXPATHLEN + MAXHOSTNAMELEN + MAXHOSTNAMELEN; #ifdef INET - if (j32.ip4s > jail_max_af_ips) + if (j32.ip4s > td->td_ucred->cr_prison->pr_max_af_ips) return (EINVAL); tmplen += j32.ip4s * sizeof(struct in_addr); #else @@ -2134,7 +2132,7 @@ return (EINVAL); #endif #ifdef INET6 - if (j32.ip6s > jail_max_af_ips) + if (j32.ip6s > td->td_ucred->cr_prison->pr_max_af_ips) return (EINVAL); tmplen += j32.ip6s * sizeof(struct in6_addr); #else Index: sys/compat/linux/linux_mib.c =================================================================== --- sys/compat/linux/linux_mib.c (revision 191896) +++ sys/compat/linux/linux_mib.c (working copy) @@ -57,16 +57,18 @@ int pr_use_linux26; /* flag to determine whether to use 2.6 emulation */ }; +static struct linux_prison lprison0 = { + .pr_osname = "Linux", + .pr_osrelease = "2.6.16", + .pr_oss_version = 0x030600, + .pr_use_linux26 = 1, +}; + static unsigned linux_osd_jail_slot; SYSCTL_NODE(_compat, OID_AUTO, linux, CTLFLAG_RW, 0, "Linux mode"); -static struct mtx osname_lock; -MTX_SYSINIT(linux_osname, &osname_lock, "linux osname", MTX_DEF); - -static char linux_osname[LINUX_MAX_UTSNAME] = "Linux"; - static int linux_sysctl_osname(SYSCTL_HANDLER_ARGS) { @@ -86,9 +88,6 @@ 0, 0, linux_sysctl_osname, "A", "Linux kernel OS name"); -static char linux_osrelease[LINUX_MAX_UTSNAME] = "2.6.16"; -static int linux_use_linux26 = 1; - static int linux_sysctl_osrelease(SYSCTL_HANDLER_ARGS) { @@ -108,8 +107,6 @@ 0, 0, 
linux_sysctl_osrelease, "A", "Linux kernel OS release"); -static int linux_oss_version = 0x030600; - static int linux_sysctl_oss_version(SYSCTL_HANDLER_ARGS) { @@ -130,69 +127,74 @@ "Linux OSS version"); /* - * Returns holding the prison mutex if return non-NULL. + * Find a prison with Linux info. + * Return the Linux info and the (locked) prison. */ static struct linux_prison * -linux_get_prison(struct thread *td, struct prison **prp) +linux_find_prison(struct prison *spr, struct prison **prp) { struct prison *pr; struct linux_prison *lpr; - KASSERT(td == curthread, ("linux_get_prison() called on !curthread")); - *prp = pr = td->td_ucred->cr_prison; - if (pr == NULL || !linux_osd_jail_slot) - return (NULL); - mtx_lock(&pr->pr_mtx); - lpr = osd_jail_get(pr, linux_osd_jail_slot); - if (lpr == NULL) + if (!linux_osd_jail_slot) + /* In case osd_register failed. */ + spr = &prison0; + for (pr = spr;; pr = pr->pr_parent) { + mtx_lock(&pr->pr_mtx); + lpr = (pr == &prison0) + ? &lprison0 + : osd_jail_get(pr, linux_osd_jail_slot); + if (lpr != NULL) + break; mtx_unlock(&pr->pr_mtx); + } + *prp = pr; return (lpr); } /* - * Ensure a prison has its own Linux info. The prison should be locked on - * entrance and will be locked on exit (though it may get unlocked in the - * interrim). + * Ensure a prison has its own Linux info. If lprp is non-null, point it to + * the Linux info and lock the prison. */ static int linux_alloc_prison(struct prison *pr, struct linux_prison **lprp) { + struct prison *ppr; struct linux_prison *lpr, *nlpr; int error; /* If this prison already has Linux info, return that. */ error = 0; - mtx_assert(&pr->pr_mtx, MA_OWNED); - lpr = osd_jail_get(pr, linux_osd_jail_slot); - if (lpr != NULL) + lpr = linux_find_prison(pr, &ppr); + if (ppr == pr) goto done; /* * Allocate a new info record. Then check again, in case something * changed during the allocation. 
*/ - mtx_unlock(&pr->pr_mtx); + mtx_unlock(&ppr->pr_mtx); nlpr = malloc(sizeof(struct linux_prison), M_PRISON, M_WAITOK); - mtx_lock(&pr->pr_mtx); - lpr = osd_jail_get(pr, linux_osd_jail_slot); - if (lpr != NULL) { + lpr = linux_find_prison(pr, &ppr); + if (ppr == pr) { free(nlpr, M_PRISON); goto done; } + /* Inherit the initial values from the ancestor. */ + mtx_lock(&pr->pr_mtx); error = osd_jail_set(pr, linux_osd_jail_slot, nlpr); - if (error) - free(nlpr, M_PRISON); - else { + if (error == 0) { + bcopy(lpr, nlpr, sizeof(*lpr)); lpr = nlpr; - mtx_lock(&osname_lock); - strncpy(lpr->pr_osname, linux_osname, LINUX_MAX_UTSNAME); - strncpy(lpr->pr_osrelease, linux_osrelease, LINUX_MAX_UTSNAME); - lpr->pr_oss_version = linux_oss_version; - lpr->pr_use_linux26 = linux_use_linux26; - mtx_unlock(&osname_lock); + } else { + free(nlpr, M_PRISON); + lpr = NULL; } -done: + mtx_unlock(&ppr->pr_mtx); + done: if (lprp != NULL) *lprp = lpr; + else + mtx_unlock(&pr->pr_mtx); return (error); } @@ -202,7 +204,6 @@ static int linux_prison_create(void *obj, void *data) { - int error; struct prison *pr = obj; struct vfsoptlist *opts = data; @@ -212,10 +213,7 @@ * Inherit a prison's initial values from its parent * (different from NULL which also inherits changes). */ - mtx_lock(&pr->pr_mtx); - error = linux_alloc_prison(pr, NULL); - mtx_unlock(&pr->pr_mtx); - return (error); + return linux_alloc_prison(pr, NULL); } static int @@ -223,8 +221,7 @@ { struct vfsoptlist *opts = data; char *osname, *osrelease; - size_t len; - int error, oss_version; + int error, len, oss_version; /* Check that the parameters are correct. */ (void)vfs_flagopt(opts, "linux", NULL, 0); @@ -263,8 +260,7 @@ struct prison *pr = obj; struct vfsoptlist *opts = data; char *osname, *osrelease; - size_t len; - int error, gotversion, nolinux, oss_version, yeslinux; + int error, gotversion, len, nolinux, oss_version, yeslinux; /* Set the parameters, which should be correct. 
*/ yeslinux = vfs_flagopt(opts, "linux", NULL, 0); @@ -281,7 +277,7 @@ yeslinux = 1; error = vfs_copyopt(opts, "linux.oss_version", &oss_version, sizeof(oss_version)); - gotversion = error == 0; + gotversion = (error == 0); yeslinux |= gotversion; if (nolinux) { /* "nolinux": inherit the parent's Linux info. */ @@ -293,7 +289,6 @@ * "linux" or "linux.*": * the prison gets its own Linux info. */ - mtx_lock(&pr->pr_mtx); error = linux_alloc_prison(pr, &lpr); if (error) { mtx_unlock(&pr->pr_mtx); @@ -328,14 +323,18 @@ linux_prison_get(void *obj, void *data) { struct linux_prison *lpr; + struct prison *ppr; struct prison *pr = obj; struct vfsoptlist *opts = data; int error, i; - mtx_lock(&pr->pr_mtx); - /* Tell whether this prison has its own Linux info. */ - lpr = osd_jail_get(pr, linux_osd_jail_slot); - i = lpr != NULL; + /* + * Report on the prison that actually has the Linux info. It's + * kind of bogus to give an ancestor's info, but leave it to the + * caller to check the flag set below. + */ + lpr = linux_find_prison(pr, &ppr); + i = (ppr == pr); error = vfs_setopt(opts, "linux", &i, sizeof(i)); if (error != 0 && error != ENOENT) goto done; @@ -343,39 +342,20 @@ error = vfs_setopt(opts, "nolinux", &i, sizeof(i)); if (error != 0 && error != ENOENT) goto done; - /* - * It's kind of bogus to give the root info, but leave it to the caller - * to check the above flag. 
- */ - if (lpr != NULL) { - error = vfs_setopts(opts, "linux.osname", lpr->pr_osname); - if (error != 0 && error != ENOENT) - goto done; - error = vfs_setopts(opts, "linux.osrelease", lpr->pr_osrelease); - if (error != 0 && error != ENOENT) - goto done; - error = vfs_setopt(opts, "linux.oss_version", - &lpr->pr_oss_version, sizeof(lpr->pr_oss_version)); - if (error != 0 && error != ENOENT) - goto done; - } else { - mtx_lock(&osname_lock); - error = vfs_setopts(opts, "linux.osname", linux_osname); - if (error != 0 && error != ENOENT) - goto done; - error = vfs_setopts(opts, "linux.osrelease", linux_osrelease); - if (error != 0 && error != ENOENT) - goto done; - error = vfs_setopt(opts, "linux.oss_version", - &linux_oss_version, sizeof(linux_oss_version)); - if (error != 0 && error != ENOENT) - goto done; - mtx_unlock(&osname_lock); - } + error = vfs_setopts(opts, "linux.osname", lpr->pr_osname); + if (error != 0 && error != ENOENT) + goto done; + error = vfs_setopts(opts, "linux.osrelease", lpr->pr_osrelease); + if (error != 0 && error != ENOENT) + goto done; + error = vfs_setopt(opts, "linux.oss_version", &lpr->pr_oss_version, + sizeof(lpr->pr_oss_version)); + if (error != 0 && error != ENOENT) + goto done; error = 0; done: - mtx_unlock(&pr->pr_mtx); + mtx_unlock(&ppr->pr_mtx); return (error); } @@ -402,11 +382,8 @@ if (linux_osd_jail_slot > 0) { /* Copy the system linux info to any current prisons. 
*/ sx_xlock(&allprison_lock); - TAILQ_FOREACH(pr, &allprison, pr_list) { - mtx_lock(&pr->pr_mtx); + TAILQ_FOREACH(pr, &allprison, pr_list) (void)linux_alloc_prison(pr, NULL); - mtx_unlock(&pr->pr_mtx); - } sx_xunlock(&allprison_lock); } } @@ -425,15 +402,9 @@ struct prison *pr; struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - bcopy(lpr->pr_osname, dst, LINUX_MAX_UTSNAME); - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - bcopy(linux_osname, dst, LINUX_MAX_UTSNAME); - mtx_unlock(&osname_lock); - } + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + bcopy(lpr->pr_osname, dst, LINUX_MAX_UTSNAME); + mtx_unlock(&pr->pr_mtx); } int @@ -442,16 +413,9 @@ struct prison *pr; struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - strlcpy(lpr->pr_osname, osname, LINUX_MAX_UTSNAME); - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - strcpy(linux_osname, osname); - mtx_unlock(&osname_lock); - } - + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + strlcpy(lpr->pr_osname, osname, LINUX_MAX_UTSNAME); + mtx_unlock(&pr->pr_mtx); return (0); } @@ -461,15 +425,9 @@ struct prison *pr; struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - bcopy(lpr->pr_osrelease, dst, LINUX_MAX_UTSNAME); - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - bcopy(linux_osrelease, dst, LINUX_MAX_UTSNAME); - mtx_unlock(&osname_lock); - } + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + bcopy(lpr->pr_osrelease, dst, LINUX_MAX_UTSNAME); + mtx_unlock(&pr->pr_mtx); } int @@ -479,12 +437,9 @@ struct linux_prison *lpr; int use26; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - use26 = lpr->pr_use_linux26; - mtx_unlock(&pr->pr_mtx); - } else - use26 = linux_use_linux26; + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + use26 = lpr->pr_use_linux26; + mtx_unlock(&pr->pr_mtx); return (use26); } @@ -494,20 +449,10 @@ struct prison *pr; 
struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - strlcpy(lpr->pr_osrelease, osrelease, LINUX_MAX_UTSNAME); - lpr->pr_use_linux26 = - strlen(osrelease) >= 3 && osrelease[2] == '6'; - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - strcpy(linux_osrelease, osrelease); - linux_use_linux26 = - strlen(osrelease) >= 3 && osrelease[2] == '6'; - mtx_unlock(&osname_lock); - } - + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + strlcpy(lpr->pr_osrelease, osrelease, LINUX_MAX_UTSNAME); + lpr->pr_use_linux26 = strlen(osrelease) >= 3 && osrelease[2] == '6'; + mtx_unlock(&pr->pr_mtx); return (0); } @@ -518,12 +463,9 @@ struct linux_prison *lpr; int version; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - version = lpr->pr_oss_version; - mtx_unlock(&pr->pr_mtx); - } else - version = linux_oss_version; + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + version = lpr->pr_oss_version; + mtx_unlock(&pr->pr_mtx); return (version); } @@ -533,16 +475,9 @@ struct prison *pr; struct linux_prison *lpr; - lpr = linux_get_prison(td, &pr); - if (lpr != NULL) { - lpr->pr_oss_version = oss_version; - mtx_unlock(&pr->pr_mtx); - } else { - mtx_lock(&osname_lock); - linux_oss_version = oss_version; - mtx_unlock(&osname_lock); - } - + lpr = linux_find_prison(td->td_ucred->cr_prison, &pr); + lpr->pr_oss_version = oss_version; + mtx_unlock(&pr->pr_mtx); return (0); } Index: sys/net/rtsock.c =================================================================== --- sys/net/rtsock.c (revision 191896) +++ sys/net/rtsock.c (working copy) @@ -373,6 +373,8 @@ /* * As a last resort return the 'default' jail address. */ + ia = ((struct sockaddr_in *)rt->rt_ifa->ifa_addr)-> + sin_addr; if (prison_get_ip4(cred, &ia) != 0) return (ESRCH); } @@ -414,6 +416,8 @@ /* * As a last resort return the 'default' jail address. 
*/ + ia6 = ((struct sockaddr_in6 *)rt->rt_ifa->ifa_addr)-> + sin6_addr; if (prison_get_ip6(cred, &ia6) != 0) return (ESRCH); } Index: sys/netinet6/in6_pcb.c =================================================================== --- sys/netinet6/in6_pcb.c (revision 191896) +++ sys/netinet6/in6_pcb.c (working copy) @@ -666,7 +666,8 @@ inp->inp_lport == lport) { /* Found. */ if (cred == NULL || - inp->inp_cred->cr_prison == cred->cr_prison) + prison_equal_ip6(cred->cr_prison, + inp->inp_cred->cr_prison)) return (inp); } } @@ -698,7 +699,8 @@ LIST_FOREACH(inp, &phd->phd_pcblist, inp_portlist) { wildcard = 0; if (cred != NULL && - inp->inp_cred->cr_prison != cred->cr_prison) + !prison_equal_ip6(cred->cr_prison, + inp->inp_cred->cr_prison)) continue; /* XXX inp locking */ if ((inp->inp_vflag & INP_IPV6) == 0) @@ -838,7 +840,7 @@ * the inp here, without any checks. * Well unless both bound with SO_REUSEPORT? */ - if (jailed(inp->inp_cred)) + if (inp->inp_cred->cr_prison->pr_flags & PR_IP6) return (inp); if (tmpinp == NULL) tmpinp = inp; @@ -878,7 +880,7 @@ if (faith && (inp->inp_flags & INP_FAITH) == 0) continue; - injail = jailed(inp->inp_cred); + injail = inp->inp_cred->cr_prison->pr_flags & PR_IP6; if (injail) { if (prison_check_ip6(inp->inp_cred, laddr) != 0) Index: sys/contrib/ipfilter/netinet/ip_nat.c =================================================================== --- sys/contrib/ipfilter/netinet/ip_nat.c (revision 191896) +++ sys/contrib/ipfilter/netinet/ip_nat.c (working copy) @@ -662,7 +662,11 @@ return EPERM; } # else +# if defined(__FreeBSD_version) && (__FreeBSD_version >= 500034) + if (securelevel_ge(curthread->td_ucred, 3) && (mode & FWRITE)) { +# else if ((securelevel >= 3) && (mode & FWRITE)) { +# endif return EPERM; } # endif Index: sys/contrib/ipfilter/netinet/ip_fil_freebsd.c =================================================================== --- sys/contrib/ipfilter/netinet/ip_fil_freebsd.c (revision 191896) +++ 
sys/contrib/ipfilter/netinet/ip_fil_freebsd.c (working copy) @@ -318,8 +318,10 @@ # if (__FreeBSD_version >= 500024) struct thread *p; # if (__FreeBSD_version >= 500043) +# define p_cred td_ucred # define p_uid td_ucred->cr_ruid # else +# define p_cred t_proc->p_cred # define p_uid t_proc->p_cred->p_ruid # endif # else @@ -342,7 +344,11 @@ SPL_INT(s); #if (BSD >= 199306) && defined(_KERNEL) +# if (__FreeBSD_version >= 500034) + if (securelevel_ge(p->p_cred, 3) && (mode & FWRITE)) +# else if ((securelevel >= 3) && (mode & FWRITE)) +# endif return EPERM; #endif Index: sys/security/mac_bsdextended/mac_bsdextended.c =================================================================== --- sys/security/mac_bsdextended/mac_bsdextended.c (revision 191896) +++ sys/security/mac_bsdextended/mac_bsdextended.c (working copy) @@ -271,8 +271,8 @@ } if (rule->mbr_subject.mbs_flags & MBS_PRISON_DEFINED) { - match = (cred->cr_prison != NULL && - cred->cr_prison->pr_id == rule->mbr_subject.mbs_prison); + match = + (cred->cr_prison->pr_id == rule->mbr_subject.mbs_prison); if (rule->mbr_subject.mbs_neg & MBS_PRISON_DEFINED) match = !match; if (!match) Index: sys/sys/cpuset.h =================================================================== --- sys/sys/cpuset.h (revision 191896) +++ sys/sys/cpuset.h (working copy) @@ -169,6 +169,7 @@ #define CPU_SET_RDONLY 0x0002 /* No modification allowed. 
*/ extern cpuset_t *cpuset_root; +struct prison; struct proc; struct thread; @@ -176,7 +177,7 @@ struct cpuset *cpuset_ref(struct cpuset *); void cpuset_rel(struct cpuset *); int cpuset_setthread(lwpid_t id, cpuset_t *); -int cpuset_create_root(struct thread *, struct cpuset **); +int cpuset_create_root(struct prison *, struct cpuset **); int cpuset_setproc_update_set(struct proc *, struct cpuset *); #else Index: sys/sys/jail.h =================================================================== --- sys/sys/jail.h (revision 191896) +++ sys/sys/jail.h (working copy) @@ -122,8 +122,8 @@ #include #include -#include -#include +#include +#include #include #define JAIL_MAX 999999 @@ -137,8 +137,6 @@ #include -struct cpuset; - /* * This structure describes a prison. It is pointed to by all struct * ucreds's of the inmates. pr_ref keeps track of them and is used to @@ -162,7 +160,7 @@ struct vnode *pr_root; /* (c) vnode to rdir */ char pr_host[MAXHOSTNAMELEN]; /* (p) jail hostname */ char pr_name[MAXHOSTNAMELEN]; /* (p) admin jail name */ - void *pr_spare; /* was pr_linux */ + struct prison *pr_parent; /* (c) containing jail */ int pr_securelevel; /* (p) securelevel */ struct task pr_task; /* (d) destroy task */ struct mtx pr_mtx; @@ -171,6 +169,14 @@ struct in_addr *pr_ip4; /* (p) v4 IPs of jail */ int pr_ip6s; /* (p) number of v6 IPs */ struct in6_addr *pr_ip6; /* (p) v6 IPs of jail */ + LIST_HEAD(, prison) pr_children; /* (a) list of child jails */ + LIST_ENTRY(prison) pr_sibling; /* (a) next in parent's list */ + int pr_prisoncount; /* (a) number of child jails */ + int pr_enforce_statfs; /* (p) statfs permission */ + int pr_max_af_ips; /* (p) IP address limit */ + unsigned pr_def_perms; /* (p) child PR_PERM_* flags */ + int pr_def_enforce_statfs; /* (p) child statfs */ + int pr_def_max_af_ips; /* (p) child IP limit */ }; #endif /* _KERNEL || _WANT_PRISON */ @@ -179,7 +185,24 @@ * Flag bits set via options or internally */ #define PR_PERSIST 0x00000001 /* Can exist 
without processes */ +#define PR_IP4_USER 0x00000004 /* Virtualize IPv4 addresses */ +#define PR_IP6_USER 0x00000008 /* Virtualize IPv6 addresses */ + +#define PR_ALLOW_SET_HOSTNAME 0x00010000 +#define PR_ALLOW_SYSVIPC 0x00020000 +#define PR_ALLOW_RAW_SOCKETS 0x00040000 +#define PR_ALLOW_CHFLAGS 0x00080000 +#define PR_ALLOW_MOUNT 0x00100000 +#define PR_ALLOW_QUOTAS 0x00200000 +#define PR_ALLOW_JAILS 0x00400000 +#define PR_RESTRICT_SOCKET_UNIXIPROUTE 0x00800000 + +#define PR_ALLOW_ALL 0x007f0000 +#define PR_RESTRICT_ALL 0x00800000 + #define PR_REMOVE 0x01000000 /* In process of being removed */ +#define PR_IP4 0x02000000 /* Virtualize IPv4 (maybe inherited) */ +#define PR_IP6 0x04000000 /* Virtualize IPv6 (maybe inherited) */ /* * OSD methods @@ -192,17 +215,67 @@ #define PR_MAXMETHOD 5 /* - * Sysctl-set variables that determine global jail policy - * - * XXX MIB entries will need to be protected by a mutex. + * Lock/unlock a prison. + * XXX These exist not so much for general convenience, but to be useable in + * the FOREACH_PRISON_DESCENDANT_LOCKED macro which can't handle them in + * non-function form as currently defined. */ -extern int jail_set_hostname_allowed; -extern int jail_socket_unixiproute_only; -extern int jail_sysvipc_allowed; -extern int jail_getfsstat_jailrootonly; -extern int jail_allow_raw_sockets; -extern int jail_chflags_allowed; +static __inline void +prison_lock(struct prison *pr) +{ + mtx_lock(&pr->pr_mtx); +} +static __inline void +prison_unlock(struct prison *pr) +{ + mtx_unlock(&pr->pr_mtx); +} + +/* Traverse a prison's immediate children */ +#define FOREACH_PRISON_CHILD(ppr, cpr) \ + LIST_FOREACH(cpr, &(ppr)->pr_children, pr_sibling) + +/* + * Preorder traversal of all of a prison's descendants. + * This ugly loop allows the macro to be followed by a single block + * as expected in a looping primitive. 
+ */ +#define FOREACH_PRISON_DESCENDANT(ppr, cpr, descend) \ + for ((cpr) = (ppr), (descend) = 1; \ + ((cpr) = ((descend) && !LIST_EMPTY(&(cpr)->pr_children)) \ + ? LIST_FIRST(&(cpr)->pr_children) \ + : (cpr) == (ppr) \ + ? NULL \ + : ((descend) = LIST_NEXT(cpr, pr_sibling) != NULL) \ + ? LIST_NEXT(cpr, pr_sibling) \ + : (cpr)->pr_parent);) \ + if (!(descend)) \ + ; \ + else + +/* + * As above, but lock descendants on the way down and unlock on the way up. + */ +#define FOREACH_PRISON_DESCENDANT_LOCKED(ppr, cpr, descend) \ + for ((cpr) = (ppr), (descend) = 1; \ + ((cpr) = ((descend) && !LIST_EMPTY(&(cpr)->pr_children)) \ + ? LIST_FIRST(&(cpr)->pr_children) \ + : (cpr) == (ppr) \ + ? NULL \ + : (prison_unlock(cpr), \ + (descend) = LIST_NEXT(cpr, pr_sibling) != NULL) \ + ? LIST_NEXT(cpr, pr_sibling) \ + : (cpr)->pr_parent);) \ + if ((descend) ? (prison_lock(cpr), 0) : 1) \ + ; \ + else + +/* + * Attributes of the physical system, and the root of the jail tree. + */ +extern struct prison prison0; + TAILQ_HEAD(prisonlist, prison); extern struct prisonlist allprison; extern struct sx allprison_lock; @@ -240,18 +313,22 @@ void prison_enforce_statfs(struct ucred *cred, struct mount *mp, struct statfs *sp); struct prison *prison_find(int prid); -struct prison *prison_find_name(const char *name); +struct prison *prison_find_child(struct prison *mypr, int prid); +struct prison *prison_find_name(struct prison *mypr, const char *name); void prison_free(struct prison *pr); void prison_free_locked(struct prison *pr); void prison_hold(struct prison *pr); void prison_hold_locked(struct prison *pr); void prison_proc_hold(struct prison *); void prison_proc_free(struct prison *); +int prison_ischild(struct prison *pr1, struct prison *pr2); +int prison_equal_ip4(struct prison *, struct prison *); int prison_get_ip4(struct ucred *cred, struct in_addr *ia); int prison_local_ip4(struct ucred *cred, struct in_addr *ia); int prison_remote_ip4(struct ucred *cred, struct in_addr *ia); int 
prison_check_ip4(struct ucred *cred, struct in_addr *ia); #ifdef INET6 +int prison_equal_ip6(struct prison *, struct prison *); int prison_get_ip6(struct ucred *, struct in6_addr *); int prison_local_ip6(struct ucred *, struct in6_addr *, int); int prison_remote_ip6(struct ucred *, struct in6_addr *); @@ -259,6 +336,7 @@ #endif int prison_check_af(struct ucred *cred, int af); int prison_if(struct ucred *cred, struct sockaddr *sa); +char *prison_name(struct prison *pr1, struct prison *pr2); int prison_priv_check(struct ucred *cred, int priv); int sysctl_jail_param(struct sysctl_oid *, void *, int , struct sysctl_req *); Index: sys/sys/systm.h =================================================================== --- sys/sys/systm.h (revision 191896) +++ sys/sys/systm.h (working copy) @@ -45,8 +45,6 @@ #include #include /* for people using printf mainly */ -extern int securelevel; /* system security level (see init(8)) */ - extern int cold; /* nonzero if we are doing a cold boot */ extern int rebooting; /* boot() has been called. 
*/
 extern const char *panicstr; /* panic message */

--------------080507020907010909080502--

From owner-freebsd-virtualization@FreeBSD.ORG Sat May 9 06:46:44 2009
Return-Path:
Delivered-To: virtualization@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D2416106566B
	for ; Sat, 9 May 2009 06:46:44 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outL.internet-mail-service.net (outl.internet-mail-service.net [216.240.47.235])
	by mx1.freebsd.org (Postfix) with ESMTP id B6E408FC12
	for ; Sat, 9 May 2009 06:46:44 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from idiom.com (mx0.idiom.com [216.240.32.160])
	by out.internet-mail-service.net (Postfix) with ESMTP id EAC7A6C719;
	Fri, 8 May 2009 23:47:03 -0700 (PDT)
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
Received: from julian-mac.elischer.org (unknown [24.114.252.230])
	by idiom.com (Postfix) with ESMTP id 219AF2D600F;
	Fri, 8 May 2009 23:46:43 -0700 (PDT)
Message-ID: <4A0526D7.7090000@elischer.org>
Date: Fri, 08 May 2009 23:46:47 -0700
From: Julian Elischer
User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302)
MIME-Version: 1.0
To: Jamie Gritton
References: <4A051DE3.30705@FreeBSD.org>
In-Reply-To: <4A051DE3.30705@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: virtualization@FreeBSD.org, jail@FreeBSD.org
Subject: Re: Hierarchical jails
X-BeenThere: freebsd-virtualization@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussion of various virtualization techniques FreeBSD supports."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 09 May 2009 06:46:45 -0000

Jamie Gritton wrote:
> Here's the first round of hierarchical jails under the new framework.
>
> Instead of creds having either a prison or a NULL pointer, they all have
> a prison pointer with the default being the global "prison0" that
> contains information about the real environment. Jailed root may (if
> granted permission) create prisons that would be under its place in the
> hierarchy, but may not alter (or even see) prisons at its level or
> above.

agree

>
> The JID space is flat, i.e. every prison in the system has a unique ID.
> The prison name space is hierarchical, with jails having dot-separated
> component names.

this matches vimage, and I agree.

>
> prison0 contains three fields that were system globals: pr_root,
> pr_host, and pr_securelevel. I've kept the globals rootvnode and
> hostname, and take care that when one is changed the other changes too
> (not yet true for hostname - read on). But I've actually removed the
> global securelevel, instead forcing people to use securelevel_gt() and
> securelevel_ge() (or in very rare cases to check prison0.pr_securelevel
> directly). I chose to do that because while using the global rootvnode
> and hostname may be incorrect, using the wrong securelevel is, well,
> insecure. Actually it would be insecure to use the wrong rootvnode too,
> but I'm not convinced removing that global is worth the headache.

fair enough at this time.

>
> Other globals are subsumed into prison0, but they were only ever part of
> the jail system anyway: the various jail-related permission bits and
> such administrative things as prisoncount.
>
> The prison hierarchy keeps track of restrictions placed on prisons, and
> will reflect them downward so a child jail is always at least as
> restricted as its ancestors. It doesn't go the other way though: if a
> prison's restrictions are loosened, the children stay as they are.

yes. I agree.
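
The downward-only rule quoted above can be illustrated with a small user-space sketch. It is not taken from the patch: struct toy_prison, toy_create(), toy_restrict() and toy_loosen() are hypothetical names, the fixed-size child array stands in for the kernel's list of child jails, and capping a loosened prison at its parent's permissions is an assumption drawn from the "at least as restricted as its ancestors" wording.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical user-space stand-in for struct prison; not the kernel type. */
struct toy_prison {
	struct toy_prison *parent;
	struct toy_prison *children[TOY_MAX_CHILDREN];
	int nchildren;
	unsigned allow;		/* permission bits, in the spirit of PR_ALLOW_* */
};

/* Create a prison; a child never starts with more than its parent allows. */
static void
toy_create(struct toy_prison *pr, struct toy_prison *parent, unsigned allow)
{
	pr->parent = parent;
	pr->nchildren = 0;
	pr->allow = allow;
	if (parent != NULL) {
		assert(parent->nchildren < TOY_MAX_CHILDREN);
		pr->allow &= parent->allow;
		parent->children[parent->nchildren++] = pr;
	}
}

/* Tightening: clear the bits here and in every descendant. */
static void
toy_restrict(struct toy_prison *pr, unsigned bits)
{
	int i;

	pr->allow &= ~bits;
	for (i = 0; i < pr->nchildren; i++)
		toy_restrict(pr->children[i], bits);
}

/* Loosening: only this prison changes, and only up to its parent's limit. */
static void
toy_loosen(struct toy_prison *pr, unsigned bits)
{
	if (pr->parent != NULL)
		bits &= pr->parent->allow;
	pr->allow |= bits;
}
```

Restricting a prison in this model walks its whole subtree, while loosening touches a single node, which is the asymmetry the description calls out.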
>
> This patch doesn't have anything for userland, and hierarchical jails
> won't work without that patch (because jails don't have permission to
> create sub-jails by default, and jail(2) can't grant that permission).
> A userland patch will follow soon, very similar to the version I posted
> here recently.
>
> - Jamie

patch removed by mailing list...
(but I saw it in the privately received version...)

>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> freebsd-virtualization@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
> To unsubscribe, send any mail to "freebsd-virtualization-unsubscribe@freebsd.org"

From owner-freebsd-virtualization@FreeBSD.ORG Sat May 9 06:53:30 2009
Return-Path:
Delivered-To: virtualization@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E4669106566B;
	Sat, 9 May 2009 06:53:30 +0000 (UTC)
	(envelope-from jamie@FreeBSD.org)
Received: from gritton.org (gritton.org [161.58.222.4])
	by mx1.freebsd.org (Postfix) with ESMTP id 559548FC0C;
	Sat, 9 May 2009 06:53:30 +0000 (UTC)
	(envelope-from jamie@FreeBSD.org)
Received: from glorfindel.gritton.org (c-76-27-80-223.hsd1.ut.comcast.net [76.27.80.223])
	(authenticated bits=0)
	by gritton.org (8.13.6.20060614/8.13.6) with ESMTP id n496rSx4069030;
	Sat, 9 May 2009 00:53:28 -0600 (MDT)
Message-ID: <4A052867.2090806@FreeBSD.org>
Date: Sat, 09 May 2009 00:53:27 -0600
From: Jamie Gritton
User-Agent: Thunderbird 2.0.0.19 (X11/20090220)
MIME-Version: 1.0
To: jail@FreeBSD.org, virtualization@FreeBSD.org
Content-Type: multipart/mixed; boundary="------------020909010407060109070301"
X-Virus-Scanned: ClamAV 0.94.2/9348/Fri May 8 20:35:20 2009 on gritton.org
X-Virus-Status: Clean
Cc:
Subject: Hierarchical jails (user side)
X-BeenThere: freebsd-virtualization@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussion of various virtualization techniques FreeBSD supports."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 09 May 2009 06:53:31 -0000

This is a multi-part message in MIME format.
--------------020909010407060109070301
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

These are the extended versions of jail and jls to handle the arbitrary
name-value pairs, as well as small changes to jexec and killall. There's
actually nothing hierarchical about these programs; they just allow
setting the parameters necessary to allow hierarchical prisons. They
also include a bit of text in jail(8) about hierarchical jails and
what's necessary to set them up.

This might look familiar - it's exactly the same patches I posted a few
days back, with the exception of the jail(8) man page.

I'd appreciate if anyone who's interested in hierarchical jails, or in
the new jail subsystem, or in vimage (which will soon merge with this)
could try out these patches and give a bit of feedback.

- Jamie

--------------020909010407060109070301
Content-Type: text/plain; name="jhu.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="jhu.diff"

Index: usr.bin/killall/killall.1
===================================================================
--- usr.bin/killall/killall.1 (revision 191896)
+++ usr.bin/killall/killall.1 (working copy)
@@ -24,7 +24,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 9, 2007
+.Dd April 30, 2009
 .Os
 .Dt KILLALL 1
 .Sh NAME
@@ -34,7 +34,7 @@
 .Nm
 .Op Fl delmsvz
 .Op Fl help
-.Op Fl j Ar jid
+.Op Fl j Ar jail
 .Op Fl u Ar user
 .Op Fl t Ar tty
 .Op Fl c Ar procname
@@ -91,9 +91,9 @@
 (with or without a leading
 .Dq Li SIG ) ,
 or numerically.
-.It Fl j Ar jid
-Kill processes in the jail specified by
-.Ar jid .
+.It Fl j Ar jail
+Kill processes in the specified
+.Ar jail .
.It Fl u Ar user Limit potentially matching processes to those belonging to the specified Index: usr.bin/killall/killall.c =================================================================== --- usr.bin/killall/killall.c (revision 191896) +++ usr.bin/killall/killall.c (working copy) @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -51,7 +52,7 @@ usage(void) { - fprintf(stderr, "usage: killall [-delmsvz] [-help] [-j jid]\n"); + fprintf(stderr, "usage: killall [-delmsvz] [-help] [-j jail]\n"); fprintf(stderr, " [-u user] [-t tty] [-c cmd] [-SIGNAL] [cmd]...\n"); fprintf(stderr, "At least one option or argument to specify processes must be given.\n"); @@ -100,6 +101,7 @@ int main(int ac, char **av) { + struct iovec jparams[2]; struct kinfo_proc *procs = NULL, *newprocs; struct stat sb; struct passwd *pw; @@ -159,12 +161,21 @@ } jflag++; if (*av == NULL) - errx(1, "must specify jid"); - jid = strtol(*av, &ep, 10); - if (!*av || *ep) - errx(1, "illegal jid: %s", *av); + errx(1, "must specify jail"); + jid = strtoul(*av, &ep, 10); + if (!**av || *ep) { + *(const void **)&jparams[0].iov_base = + "name"; + jparams[0].iov_len = sizeof("name"); + jparams[1].iov_base = *av; + jparams[1].iov_len = strlen(*av) + 1; + jid = jail_get(jparams, 2, 0); + if (jid < 0) + errx(1, "unknown jail: %s", + *av); + } if (jail_attach(jid) == -1) - err(1, "jail_attach(): %d", jid); + err(1, "jail_attach(%d)", jid); break; case 'u': ++*av; Index: usr.sbin/jls/jls.c =================================================================== --- usr.sbin/jls/jls.c (revision 191896) +++ usr.sbin/jls/jls.c (working copy) @@ -1,6 +1,7 @@ /*- * Copyright (c) 2003 Mike Barcroft * Copyright (c) 2008 Bjoern A. Zeeb + * Copyright (c) 2009 James Gritton * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without @@ -23,18 +24,20 @@ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. - * - * $FreeBSD$ */ +#include +__FBSDID("$FreeBSD$"); + #include -#include #include +#include #include +#include -#include +#include #include -#include + #include #include #include @@ -43,215 +46,672 @@ #include #include -#define FLAG_A 0x00001 -#define FLAG_V 0x00002 +#define SJPARAM "security.jail.param" +#define ARRAY_SLOP 5 -#ifdef SUPPORT_OLD_XPRISON -static -char *print_xprison_v1(void *p, char *end, unsigned flags) +#define CTLTYPE_BOOL (CTLTYPE + 1) +#define CTLTYPE_NOBOOL (CTLTYPE + 2) +#define CTLTYPE_IPADDR (CTLTYPE + 3) +#define CTLTYPE_IP6ADDR (CTLTYPE + 4) + +#define PARAM_KEY 0x1 +#define PARAM_USER 0x2 +#define PARAM_ARRAY 0x4 +#define PARAM_OPT 0x8 + +#define PRINT_DEFAULT 0x01 +#define PRINT_VDEFAULT 0x02 +#define PRINT_HEADER 0x04 +#define PRINT_NAMEVAL 0x08 +#define PRINT_QUOTED 0x10 + +struct param { + char *name; + void *value; + size_t size; + int type; + unsigned flags; +}; + +struct iovec2 { + struct iovec name; + struct iovec value; +}; + +static struct param *params; +static int nparams; +static char errmsg[256]; + +static void add_param(const char *name, void *value, unsigned flags); +static int get_param(const char *name, struct param *param); +static int sort_param(const void *a, const void *b); +static char *noname(const char *name); +static char *nononame(const char *name); +static int print_jail(int pflags, int jflags); +static void quoted_print(char *str, int len); + +int +main(int argc, char **argv) { - struct xprison_v1 *xp; - struct in_addr in; + char *ep, *jname; + int c, i, jflags, jid, lastjid, pflags; - if ((char *)p + sizeof(struct xprison_v1) > end) - errx(1, "Invalid length for jail"); + jname = NULL; + pflags = jflags = jid = 0; + while ((c = getopt(argc, argv, 
"dj:hnqv")) >= 0) + switch (c) { + case 'd': + jflags |= JAIL_DYING; + break; + case 'j': + jid = strtoul(optarg, &ep, 10); + if (!*optarg || *ep) + jname = optarg; + break; + case 'h': + pflags |= PRINT_HEADER; + break; + case 'n': + pflags |= PRINT_NAMEVAL; + break; + case 'q': + pflags |= PRINT_QUOTED; + break; + case 'v': + pflags |= PRINT_VDEFAULT; + break; + default: + errx(1, "usage: jls [-dhnqv] [-j jail] [param ...]"); + } - xp = (struct xprison_v1 *)p; - if (flags & FLAG_V) { - printf("%6d %-29.29s %.74s\n", - xp->pr_id, xp->pr_host, xp->pr_path); - /* We are not printing an empty line here for state and name. */ - /* We are not printing an empty line here for cpusetid. */ - /* IPv4 address. */ - in.s_addr = htonl(xp->pr_ip); - printf("%6s %-15.15s\n", "", inet_ntoa(in)); + /* Add the parameters to print. */ + if (optind == argc) { + if (pflags & PRINT_VDEFAULT) { + add_param("jid", NULL, PARAM_USER); + add_param("host.hostname", NULL, PARAM_USER); + add_param("path", NULL, PARAM_USER); + add_param("name", NULL, PARAM_USER); + add_param("dying", NULL, PARAM_USER); + add_param("cpuset", NULL, PARAM_USER); + add_param("ip4.addr", NULL, PARAM_USER); + add_param("ip6.addr", NULL, PARAM_USER | PARAM_OPT); + } else { + pflags |= PRINT_DEFAULT; + add_param("jid", NULL, PARAM_USER); + add_param("ip4.addr", NULL, PARAM_USER); + add_param("host.hostname", NULL, PARAM_USER); + add_param("path", NULL, PARAM_USER); + } + } else + while (optind < argc) + add_param(argv[optind++], NULL, PARAM_USER); + + /* Add the index key and errmsg parameters. */ + if (jid != 0) + add_param("jid", &jid, PARAM_KEY); + else if (jname != NULL) + add_param("name", jname, PARAM_KEY); + else + add_param("lastjid", &lastjid, PARAM_KEY); + add_param("errmsg", errmsg, PARAM_KEY); + + /* Print a header line if requested. 
*/ + if (pflags & PRINT_VDEFAULT) + printf(" JID Hostname Path\n" + " Name State\n" + " CPUSetID\n" + " IP Address(es)\n"); + else if (pflags & PRINT_DEFAULT) + printf(" JID IP Address " + "Hostname Path\n"); + else if (pflags & PRINT_HEADER) { + for (i = 0; i < nparams; i++) + if (params[i].flags & PARAM_USER) { + if (i > 0) + putchar(' '); + fputs(params[i].name, stdout); + } + putchar('\n'); + } + + /* Fetch the jail(s) and print the parameters. */ + if (jid != 0 || jname != NULL) { + if (print_jail(pflags, jflags) < 0) { + if (errmsg[0]) + errx(1, "%s", errmsg); + err(1, "jail_get"); + } } else { - printf("%6d %-15.15s %-29.29s %.74s\n", - xp->pr_id, inet_ntoa(in), xp->pr_host, xp->pr_path); + for (lastjid = 0; + (lastjid = print_jail(pflags, jflags)) >= 0; ) + ; + if (errno != 0 && errno != ENOENT) { + if (errmsg[0]) + errx(1, "%s", errmsg); + err(1, "jail_get"); + } } - return ((char *)(xp + 1)); + return (0); } -#endif -static -char *print_xprison_v3(void *p, char *end, unsigned flags) +static void +add_param(const char *name, void *value, unsigned flags) { - struct xprison *xp; - struct in_addr *iap, in; - struct in6_addr *ia6p; - char buf[INET6_ADDRSTRLEN]; - const char *state; - char *q; - uint32_t i; + struct param *param; + char *nname; + size_t mlen1, mlen2, buflen; + int mib1[CTL_MAXNAME], mib2[CTL_MAXNAME - 2]; + int i, tnparams; + char buf[MAXPATHLEN]; - if ((char *)p + sizeof(struct xprison) > end) - errx(1, "Invalid length for jail"); - xp = (struct xprison *)p; + static int paramlistsize; - if (xp->pr_state < 0 || xp->pr_state >= (int) - ((sizeof(prison_states) / sizeof(struct prison_state)))) - state = "(bogus)"; - else - state = prison_states[xp->pr_state].state_name; + /* The pseudo-parameter "all" scans the list of available parameters.
*/ + if (!strcmp(name, "all")) { + tnparams = nparams; + mib1[0] = 0; + mib1[1] = 2; + mlen1 = CTL_MAXNAME - 2; + if (sysctlnametomib(SJPARAM, mib1 + 2, &mlen1) < 0) + err(1, "sysctlnametomib(" SJPARAM ")"); + for (;;) { + /* Get the next parameter. */ + mlen2 = sizeof(mib2); + if (sysctl(mib1, mlen1 + 2, mib2, &mlen2, NULL, 0) < 0) + err(1, "sysctl(0.2)"); + if (mib2[0] != mib1[2] || mib2[1] != mib1[3] || + mib2[2] != mib1[4]) + break; + /* Convert it to an ascii name. */ + memcpy(mib1 + 2, mib2, mlen2); + mlen1 = mlen2 / sizeof(int); + mib1[1] = 1; + buflen = sizeof(buf); + if (sysctl(mib1, mlen1 + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.1)"); + add_param(buf + sizeof(SJPARAM), NULL, flags); + /* + * Convert nobool parameters to bool if their + * counterpart is a node, otherwise discard them. + */ + param = &params[nparams - 1]; + if (param->type == CTLTYPE_NOBOOL) { + nname = nononame(param->name); + if (get_param(nname, param) >= 0 && + param->type != CTLTYPE_NODE) { + free(nname); + nparams--; + } else { + free(param->name); + param->name = nname; + param->type = CTLTYPE_BOOL; + param->size = sizeof(int); + param->value = NULL; + } + } + mib1[1] = 2; + } - /* See if we should print non-ACTIVE jails. No? */ - if ((flags & FLAG_A) == 0 && strcmp(state, "ALIVE")) { - q = (char *)(xp + 1); - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - return (q); + qsort(params + tnparams, (size_t)(nparams - tnparams), + sizeof(struct param), sort_param); + return; } - if (flags & FLAG_V) - printf("%6d %-29.29s %.74s\n", - xp->pr_id, xp->pr_host, xp->pr_path); + /* Check for repeat parameters. */ + for (i = 0; i < nparams; i++) + if (!strcmp(name, params[i].name)) { + params[i].value = value; + params[i].flags |= flags; + return; + } - /* Jail state and name.
*/ - if (flags & FLAG_V) - printf("%6s %-29.29s %.74s\n", - "", (xp->pr_name[0] != '\0') ? xp->pr_name : "", state); + /* Make sure there is room for the new param record. */ + if (!nparams) { + paramlistsize = 32; + params = malloc(paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "malloc"); + } else if (nparams >= paramlistsize) { + paramlistsize *= 2; + params = realloc(params, paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "realloc"); + } - /* cpusetid. */ - if (flags & FLAG_V) - printf("%6s %-6d\n", - "", xp->pr_cpusetid); + /* Look up the parameter. */ + param = params + nparams++; + memset(param, 0, sizeof *param); + param->name = strdup(name); + if (param->name == NULL) + err(1, "strdup"); + param->flags = flags; + /* We have to know about pseudo-parameters without asking. */ + if (!strcmp(param->name, "lastjid")) { + param->type = CTLTYPE_INT; + param->size = sizeof(int); + goto got_type; + } + if (!strcmp(param->name, "errmsg")) { + param->type = CTLTYPE_STRING; + param->size = sizeof(errmsg); + goto got_type; + } + if (get_param(name, param) < 0) { + if (errno != ENOENT) + err(1, "sysctl(0.3.%s)", name); + /* See if this is the "no" part of an existing boolean. */ + if ((nname = nononame(name))) { + i = get_param(nname, param); + free(nname); + if (i >= 0 && param->type == CTLTYPE_BOOL) { + param->type = CTLTYPE_NOBOOL; + goto got_type; + } + } + if (flags & PARAM_OPT) { + nparams--; + return; + } + errx(1, "unknown parameter: %s", name); + } + if (param->type == CTLTYPE_NODE) { + /* + * A node isn't normally a parameter, but may be a boolean + * if its "no" counterpart exists. + */ + nname = noname(name); + i = get_param(nname, param); + free(nname); + if (i >= 0 && param->type == CTLTYPE_NOBOOL) { + param->type = CTLTYPE_BOOL; + goto got_type; + } + errx(1, "unknown parameter: %s", name); + } - q = (char *)(xp + 1); - /* IPv4 addresses.
*/ - iap = (struct in_addr *)(void *)q; - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - in.s_addr = 0; - for (i = 0; i < xp->pr_ip4s; i++) { - if (i == 0 || flags & FLAG_V) - in.s_addr = iap[i].s_addr; - if (flags & FLAG_V) - printf("%6s %-15.15s\n", "", inet_ntoa(in)); + got_type: + param->value = value; +} + +static int +get_param(const char *name, struct param *param) +{ + char *bufi, *p; + size_t buflen, mlen; + int mib[CTL_MAXNAME]; + char buf[MAXPATHLEN]; + + /* Look up the MIB. */ + mib[0] = 0; + mib[1] = 3; + snprintf(buf, sizeof(buf), SJPARAM ".%s", name); + mlen = sizeof(mib) - 2 * sizeof(int); + if (sysctl(mib, 2, mib + 2, &mlen, buf, strlen(buf)) < 0) + return (-1); + /* Get the type and size. */ + mib[1] = 4; + buflen = sizeof(buf); + if (sysctl(mib, (mlen / sizeof(int)) + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.4.%s)", name); + param->type = *(int *)buf & CTLTYPE; + bufi = buf + sizeof(int); + p = strchr(bufi, '\0'); + if (p - 2 >= bufi && !strcmp(p - 2, ",a")) { + p[-2] = 0; + param->flags |= PARAM_ARRAY; } - /* IPv6 addresses. */ - ia6p = (struct in6_addr *)(void *)q; - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if (q > end) - errx(1, "Invalid length for jail"); - for (i = 0; i < xp->pr_ip6s; i++) { - if (flags & FLAG_V) { - inet_ntop(AF_INET6, &ia6p[i], buf, sizeof(buf)); - printf("%6s %s\n", "", buf); + switch (param->type) { + case CTLTYPE_INT: + /* An integer parameter might be a boolean. */ + if (bufi[0] == 'B') + param->type = bufi[1] == 'N' + ? 
CTLTYPE_NOBOOL : CTLTYPE_BOOL; + case CTLTYPE_UINT: + param->size = sizeof(int); + break; + case CTLTYPE_LONG: + case CTLTYPE_ULONG: + param->size = sizeof(long); + break; + case CTLTYPE_STRUCT: + if (!strcmp(bufi, "S,in_addr")) { + param->type = CTLTYPE_IPADDR; + param->size = sizeof(struct in_addr); + } else if (!strcmp(bufi, "S,in6_addr")) { + param->type = CTLTYPE_IP6ADDR; + param->size = sizeof(struct in6_addr); } + break; + case CTLTYPE_STRING: + buf[0] = 0; + sysctl(mib + 2, mlen / sizeof(int), buf, &buflen, NULL, 0); + param->size = strtoul(buf, NULL, 10); + if (param->size == 0) + param->size = BUFSIZ; } + return (0); +} - /* If requested print the old style single line version. */ - if (!(flags & FLAG_V)) - printf("%6d %-15.15s %-29.29s %.74s\n", - xp->pr_id, (in.s_addr) ? inet_ntoa(in) : "", - xp->pr_host, xp->pr_path); +static int +sort_param(const void *a, const void *b) +{ + const struct param *parama, *paramb; + char *ap, *bp; - return (q); + /* Put top-level parameters first. 
*/ + parama = a; + paramb = b; + ap = strchr(parama->name, '.'); + bp = strchr(paramb->name, '.'); + if (ap && !bp) + return (1); + if (bp && !ap) + return (-1); + return (strcmp(parama->name, paramb->name)); } -static void -usage(void) +static char * +noname(const char *name) { + char *nname, *p; - (void)fprintf(stderr, "usage: jls [-av]\n"); - exit(1); + nname = malloc(strlen(name) + 3); + if (nname == NULL) + err(1, "malloc"); + p = strrchr(name, '.'); + if (p != NULL) + sprintf(nname, "%.*s.no%s", p - name, name, p + 1); + else + sprintf(nname, "no%s", name); + return nname; } -int -main(int argc, char *argv[]) -{ - int ch, version; - unsigned flags; - size_t i, j, len; - void *p, *q; +static char * +nononame(const char *name) +{ + char *nname, *p; - flags = 0; - while ((ch = getopt(argc, argv, "av")) != -1) { - switch (ch) { - case 'a': - flags |= FLAG_A; - break; - case 'v': - flags |= FLAG_V; - break; - default: - usage(); - } - } - argc -= optind; - argv += optind; + p = strrchr(name, '.'); + if (strncmp(p ? p + 1 : name, "no", 2)) + return NULL; + nname = malloc(strlen(name) - 1); + if (nname == NULL) + err(1, "malloc"); + if (p != NULL) + sprintf(nname, "%.*s.%s", p - name, name, p + 3); + else + strcpy(nname, name + 2); + return nname; +} - if (sysctlbyname("security.jail.list", NULL, &len, NULL, 0) == -1) - err(1, "sysctlbyname(): security.jail.list"); +static int +print_jail(int pflags, int jflags) +{ + char *nname; + int i, ai, jid, count, sanity; + char ipbuf[INET6_ADDRSTRLEN]; - j = len; - for (i = 0; i < 4; i++) { - if (len <= 0) - exit(0); - p = q = malloc(len); - if (p == NULL) - err(1, "malloc()"); + static struct iovec2 *iov, *aiov; + static int narray, nkey; - if (sysctlbyname("security.jail.list", q, &len, NULL, 0) == -1) { - if (errno == ENOMEM) { - free(p); - p = NULL; - len += j; + /* Set up the parameter list(s) the first time around. 
*/ + if (iov == NULL) { + iov = malloc(nparams * sizeof(struct iovec2)); + if (iov == NULL) + err(1, "malloc"); + for (i = narray = 0; i < nparams; i++) { + iov[i].name.iov_base = params[i].name; + iov[i].name.iov_len = strlen(params[i].name) + 1; + iov[i].value.iov_base = params[i].value; + iov[i].value.iov_len = + params[i].type == CTLTYPE_STRING && + params[i].value != NULL && + ((char *)params[i].value)[0] != '\0' + ? strlen(params[i].value) + 1 : params[i].size; + if (params[i].flags & (PARAM_KEY | PARAM_ARRAY)) { + narray++; + if (params[i].flags & PARAM_KEY) + nkey++; + } + } + if (narray > nkey) { + aiov = malloc(narray * sizeof(struct iovec2)); + if (aiov == NULL) + err(1, "malloc"); + for (i = ai = 0; i < nparams; i++) + if (params[i].flags & + (PARAM_KEY | PARAM_ARRAY)) + aiov[ai++] = iov[i]; + } + } + /* If there are array parameters, find their sizes. */ + if (aiov != NULL) { + for (ai = 0; ai < narray; ai++) + if (aiov[ai].value.iov_base == NULL) + aiov[ai].value.iov_len = 0; + if (jail_get((struct iovec *)aiov, 2 * narray, jflags) < 0) + return (-1); + } + /* Allocate storage for all parameters. */ + for (i = ai = 0; i < nparams; i++) { + if (params[i].flags & (PARAM_KEY | PARAM_ARRAY)) { + if (params[i].flags & PARAM_ARRAY) { + iov[i].value.iov_len = aiov[ai].value.iov_len + + ARRAY_SLOP * params[i].size; + iov[i].value.iov_base = + malloc(iov[i].value.iov_len); + } + ai++; + } else + iov[i].value.iov_base = malloc(params[i].size); + if (iov[i].value.iov_base == NULL) + err(1, "malloc"); + if (params[i].value == NULL) + memset(iov[i].value.iov_base, 0, iov[i].value.iov_len); + } + /* + * Get the actual prison. If there are array elements, retry a few + * times in case the size changed from under us. 
+ */ + if ((jid = jail_get((struct iovec *)iov, 2 * nparams, jflags)) < 0) { + if (errno != EINVAL || aiov == NULL || errmsg[0]) + return (-1); + for (sanity = 0;; sanity++) { + if (sanity == 10) + return (-1); + for (ai = 0; ai < narray; ai++) + if (params[i].flags & PARAM_ARRAY) + aiov[ai].value.iov_len = 0; + if (jail_get((struct iovec *)iov, 2 * narray, jflags) < + 0) + return (-1); + for (i = ai = 0; i < nparams; i++) { + if (!(params[i].flags & + (PARAM_KEY | PARAM_ARRAY))) + continue; + if (params[i].flags & PARAM_ARRAY) { + iov[i].value.iov_len = + aiov[ai].value.iov_len + + ARRAY_SLOP * params[i].size; + iov[i].value.iov_base = + realloc(iov[i].value.iov_base, + iov[i].value.iov_len); + if (iov[i].value.iov_base == NULL) + err(1, "malloc"); + } + ai++; + } + } + } + if (pflags & PRINT_VDEFAULT) { + printf("%6d %-29.29s %.74s\n" + "%6s %-29.29s %.74s\n" + "%6s %-6d\n", + *(int *)iov[0].value.iov_base, + (char *)iov[1].value.iov_base, + (char *)iov[2].value.iov_base, + "", + (char *)iov[3].value.iov_base, + *(int *)iov[4].value.iov_base ? "DYING" : "ACTIVE", + "", + *(int *)iov[5].value.iov_base); + count = iov[6].value.iov_len / sizeof(struct in_addr); + for (ai = 0; ai < count; ai++) + if (inet_ntop(AF_INET, + &((struct in_addr *)iov[6].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%6s %-15.15s\n", "", ipbuf); + if (!strcmp(params[7].name, "ip6.addr")) { + count = iov[7].value.iov_len / sizeof(struct in6_addr); + for (ai = 0; ai < count; ai++) + if (inet_ntop(AF_INET6, &((struct in6_addr *) + iov[7].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%6s %s\n", "", ipbuf); + } + } else if (pflags & PRINT_DEFAULT) + printf("%6d %-15.15s %-29.29s %.74s\n", + *(int *)iov[0].value.iov_base, + iov[1].value.iov_len == 0 ?
"-" + : inet_ntoa(*(struct in_addr *)iov[1].value.iov_base), + (char *)iov[2].value.iov_base, + (char *)iov[3].value.iov_base); + else { + for (i = 0; i < nparams; i++) { + if (!(params[i].flags & PARAM_USER)) continue; + if (i > 0) + putchar(' '); + if (pflags & PRINT_NAMEVAL) { + /* + * Generally "name=value", but for booleans + * either "name" or "noname". + */ + switch (params[i].type) { + case CTLTYPE_BOOL: + if (*(int *)iov[i].value.iov_base) + printf("%s", params[i].name); + else { + nname = noname(params[i].name); + printf("%s", nname); + free(nname); + } + break; + case CTLTYPE_NOBOOL: + if (*(int *)iov[i].value.iov_base) + printf("%s", params[i].name); + else { + nname = + nononame(params[i].name); + printf("%s", nname); + free(nname); + } + break; + default: + printf("%s=", params[i].name); + } } - err(1, "sysctlbyname(): security.jail.list"); + count = params[i].flags & PARAM_ARRAY + ? iov[i].value.iov_len / params[i].size : 1; + if (count == 0) + putchar('-'); + for (ai = 0; ai < count; ai++) { + if (ai > 0) + putchar(','); + switch (params[i].type) { + case CTLTYPE_INT: + printf("%d", ((int *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_UINT: + printf("%u", ((int *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_IPADDR: + if (inet_ntop(AF_INET, + &((struct in_addr *) + iov[i].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%s", ipbuf); + break; + case CTLTYPE_IP6ADDR: + if (inet_ntop(AF_INET6, + &((struct in6_addr *) + iov[i].value.iov_base)[ai], + ipbuf, sizeof(ipbuf)) == NULL) + err(1, "inet_ntop"); + else + printf("%s", ipbuf); + break; + case CTLTYPE_LONG: + printf("%ld", ((long *) + iov[i].value.iov_base)[ai]); + case CTLTYPE_ULONG: + printf("%lu", ((long *) + iov[i].value.iov_base)[ai]); + break; + case CTLTYPE_STRING: + if (pflags & PRINT_QUOTED) + quoted_print((char *) + iov[i].value.iov_base, + params[i].size); + else + printf("%.*s", + params[i].size, (char *) + 
iov[i].value.iov_base); + break; + case CTLTYPE_BOOL: + case CTLTYPE_NOBOOL: + if (!(pflags & PRINT_NAMEVAL)) + printf(((int *) + iov[i].value.iov_base)[ai] + ? "true" : "false"); + } + } } - break; + putchar('\n'); } - if (p == NULL) - err(1, "sysctlbyname(): security.jail.list"); - if (len < sizeof(int)) - errx(1, "This is no prison. Kernel and userland out of sync?"); - version = *(int *)p; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); + for (i = 0; i < nparams; i++) + if (params[i].value == NULL) + free(iov[i].value.iov_base); + return (jid); +} - if (flags & FLAG_V) { - printf(" JID Hostname Path\n"); - printf(" Name State\n"); - printf(" CPUSetID\n"); - printf(" IP Address(es)\n"); - } else { - printf(" JID IP Address Hostname" - " Path\n"); +static void +quoted_print(char *str, int len) +{ + int c, qc; + char *p = str; + char *ep = str + len; + + /* An empty string needs quoting. */ + if (!*p) { + fputs("\"\"", stdout); + return; } - for (; q != NULL && (char *)q + sizeof(int) < (char *)p + len;) { - version = *(int *)q; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - switch (version) { -#ifdef SUPPORT_OLD_XPRISON - case 1: - q = print_xprison_v1(q, (char *)p + len, flags); - break; - case 2: - errx(1, "Version 2 was used by multi-IPv4 jail " - "implementations that never made it into the " - "official kernel."); - /* NOTREACHED */ - break; -#endif - case 3: - q = print_xprison_v3(q, (char *)p + len, flags); - break; - default: - errx(1, "Prison unknown. Kernel/userland out of sync?"); - /* NOTREACHED */ - break; - } + + /* + * The value will be surrounded by quotes if it contains spaces + * or quotes. + */ + qc = strchr(p, '\'') ? '"' + : strchr(p, '"') ? '\'' + : strchr(p, ' ') || strchr(p, '\t') ? 
'"' + : 0; + if (qc) + putchar(qc); + while (p < ep && (c = *p++)) { + if (c == '\\' || c == qc) + putchar('\\'); + putchar(c); } - - free(p); - exit(0); + if (qc) + putchar(qc); } Index: usr.sbin/jls/Makefile =================================================================== --- usr.sbin/jls/Makefile (revision 191896) +++ usr.sbin/jls/Makefile (working copy) @@ -4,6 +4,4 @@ MAN= jls.8 WARNS?= 6 -CFLAGS+= -DSUPPORT_OLD_XPRISON - .include Index: usr.sbin/jls/jls.8 =================================================================== --- usr.sbin/jls/jls.8 (revision 191896) +++ usr.sbin/jls/jls.8 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 29, 2008 +.Dd April 30, 2009 .Dt JLS 8 .Os .Sh NAME @@ -33,38 +33,59 @@ .Nd "list jails" .Sh SYNOPSIS .Nm -.Op Fl av +.Op Fl dhnqv +.Op Fl j Ar jail +.Op Ar parameter ... .Sh DESCRIPTION The .Nm -utility lists all jails. -By default only active jails are listed. +utility lists all active jails, or the specified jail. +Each jail is represented by one row which contains space-separated values of +the listed +.Ar parameters , +including the pseudo-parameter +.Va all +which will show all available jail parameters. +A list of available parameters can be retrieved via +.Dq Nm sysctl Fl d Va security.jail.param . .Pp -The options are as follows: -.Bl -tag -width ".Fl a" -.It Fl a -Show jails in all states, not only active ones. +If no +.Ar parameters +are given, the following four columns will be printed: +jail identifier (jid), IP address (ip4.addr), hostname (host.hostname), +and path (path). +.Pp +The following options are available: +.Bl -tag -width indent +.It Fl d +List +.Va dying +as well as active jails. +.It Fl h +Print a header line containing the parameters listed. +If no parameters are given on the command line, the default four-column +output always contains a header. +.It Fl n +Print parameters in +.Dq name=value +format, where each parameter is preceded by its name. 
+This option is ignored for the default four-column output. +.It Fl q +Put quotes around string parameters if they contain spaces or quotes, or are +the empty string. .It Fl v -Show more verbose information. -This also lists cpusets, jail state, multi-IP, etc. instead of the -classic single-IP jail output. +Print a multiple-line summary per jail, with the following parameters: +jail identifier (jid), hostname (host.hostname), path (path), +jail name (name), jail state (dying), cpuset ID (cpuset), +IP address(es) (ip4.addr and ip6.addr). +.It Fl j Ar jail +The jid or name of the +.Ar jail +to list. +Without this option, all active jails will be listed. .El -.Pp -Each jail is represented by rows which, depending on -.Fl v , -contain the following columns: -.Bl -item -offset indent -compact -.It -jail identifier (JID), hostname and path -.It -jail state and name -.It -jail cpuset -.It -followed by one IP adddress per line. -.El .Sh SEE ALSO -.Xr jail 2 , +.Xr jail_get 2 , .Xr jail 8 , .Xr jexec 8 .Sh HISTORY @@ -72,3 +93,5 @@ .Nm utility was added in .Fx 5.1 . +Extensible jail parameters were introduced in +.Fx 8.0 . Index: usr.sbin/jexec/jexec.c =================================================================== --- usr.sbin/jexec/jexec.c (revision 191896) +++ usr.sbin/jexec/jexec.c (working copy) @@ -29,12 +29,16 @@ #include #include +#include #include +#include +#include #include #include #include +#include #include #include #include @@ -43,154 +47,8 @@ #include static void usage(void); +static int addr2jid(const char *addr); -#ifdef SUPPORT_OLD_XPRISON -static -char *lookup_xprison_v1(void *p, char *end, int *id) -{ - struct xprison_v1 *xp; - - if (id == NULL) - errx(1, "Internal error. 
Invalid ID pointer."); - - if ((char *)p + sizeof(struct xprison_v1) > end) - errx(1, "Invalid length for jail"); - - xp = (struct xprison_v1 *)p; - - *id = xp->pr_id; - return ((char *)(xp + 1)); -} -#endif - -static -char *lookup_xprison_v3(void *p, char *end, int *id, char *jailname) -{ - struct xprison *xp; - char *q; - int ok; - - if (id == NULL) - errx(1, "Internal error. Invalid ID pointer."); - - if ((char *)p + sizeof(struct xprison) > end) - errx(1, "Invalid length for jail"); - - xp = (struct xprison *)p; - ok = 1; - - /* Jail state and name. */ - if (xp->pr_state < 0 || xp->pr_state >= - (int)((sizeof(prison_states) / sizeof(struct prison_state)))) - errx(1, "Invalid jail state."); - else if (xp->pr_state != PRISON_STATE_ALIVE) - ok = 0; - if (jailname != NULL) { - if (xp->pr_name[0] == '\0') - ok = 0; - else if (strcmp(jailname, xp->pr_name) != 0) - ok = 0; - } - - q = (char *)(xp + 1); - /* IPv4 addresses. */ - q += (xp->pr_ip4s * sizeof(struct in_addr)); - if ((char *)q > end) - errx(1, "Invalid length for jail"); - /* IPv6 addresses. */ - q += (xp->pr_ip6s * sizeof(struct in6_addr)); - if ((char *)q > end) - errx(1, "Invalid length for jail"); - - if (ok) - *id = xp->pr_id; - return (q); -} - -static int -lookup_jail(int jid, char *jailname) -{ - size_t i, j, len; - void *p, *q; - int version, id, xid, count; - - if (sysctlbyname("security.jail.list", NULL, &len, NULL, 0) == -1) - err(1, "sysctlbyname(): security.jail.list"); - - j = len; - for (i = 0; i < 4; i++) { - if (len == 0) - return (-1); - p = q = malloc(len); - if (p == NULL) - err(1, "malloc()"); - - if (sysctlbyname("security.jail.list", q, &len, NULL, 0) == -1) { - if (errno == ENOMEM) { - free(p); - p = NULL; - len += j; - continue; - } - err(1, "sysctlbyname(): security.jail.list"); - } - break; - } - if (p == NULL) - err(1, "sysctlbyname(): security.jail.list"); - if (len < sizeof(int)) - errx(1, "This is no prison. 
Kernel and userland out of sync?"); - version = *(int *)p; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - - count = 0; - xid = -1; - for (; q != NULL && (char *)q + sizeof(int) < (char *)p + len;) { - version = *(int *)q; - if (version > XPRISON_VERSION) - errx(1, "Sci-Fi prison. Kernel/userland out of sync?"); - id = -1; - switch (version) { -#ifdef SUPPORT_OLD_XPRISON - case 1: - if (jailname != NULL) - errx(1, "Version 1 prisons did not " - "support jail names."); - q = lookup_xprison_v1(q, (char *)p + len, &id); - break; - case 2: - errx(1, "Version 2 was used by multi-IPv4 jail " - "implementations that never made it into the " - "official kernel."); - /* NOTREACHED */ - break; -#endif - case 3: - q = lookup_xprison_v3(q, (char *)p + len, &id, jailname); - break; - default: - errx(1, "Prison unknown. Kernel/userland out of sync?"); - /* NOTREACHED */ - break; - } - /* Possible match; see if we have a jail ID to match as well. */ - if (id > 0 && (jid <= 0 || id == jid)) { - xid = id; - count++; - } - } - - free(p); - - if (count == 1) - return (xid); - else if (count > 1) - errx(1, "Could not uniquely identify the jail."); - else - return (-1); -} - #define GET_USER_INFO do { \ pwd = getpwnam(username); \ if (pwd == NULL) { \ @@ -210,22 +68,18 @@ int main(int argc, char *argv[]) { + struct iovec params[2]; int jid; login_cap_t *lcap = NULL; struct passwd *pwd = NULL; gid_t groups[NGROUPS]; - int ch, ngroups, uflag, Uflag; - char *jailname, *username; + int ch, ngroups, uflag, Uflag, hflag; + char *ep, *username; + ch = uflag = Uflag = hflag = 0; + username = NULL; - ch = uflag = Uflag = 0; - jailname = username = NULL; - jid = -1; - - while ((ch = getopt(argc, argv, "i:n:u:U:")) != -1) { + while ((ch = getopt(argc, argv, "u:U:h")) != -1) { switch (ch) { - case 'n': - jailname = optarg; - break; case 'u': username = optarg; uflag = 1; @@ -234,6 +88,9 @@ username = optarg; Uflag = 1; break; + case 'h': + hflag = 1; + 
break; default: usage(); } @@ -242,22 +99,24 @@ argv += optind; if (argc < 2) usage(); - if (strlen(argv[0]) > 0) { - jid = (int)strtol(argv[0], NULL, 10); - if (errno) - err(1, "Unable to parse jail ID."); - } - if (jid <= 0 && jailname == NULL) { - fprintf(stderr, "Neither jail ID nor jail name given.\n"); - usage(); - } if (uflag && Uflag) usage(); if (uflag) GET_USER_INFO; - jid = lookup_jail(jid, jailname); - if (jid <= 0) - errx(1, "Cannot identify jail."); + if (hflag) + jid = addr2jid(argv[0]); + else { + jid = strtoul(argv[0], &ep, 10); + if (!*argv[0] || *ep) { + *(const void **)&params[0].iov_base = "name"; + params[0].iov_len = sizeof("name"); + params[1].iov_base = argv[0]; + params[1].iov_len = strlen(argv[0]) + 1; + jid = jail_get(params, 2, 0); + if (jid < 0) + errx(1, "Unknown jail: %s", argv[0]); + } + } if (jail_attach(jid) == -1) err(1, "jail_attach(): %d", jid); if (chdir("/") == -1) @@ -285,6 +144,108 @@ fprintf(stderr, "%s%s\n", "usage: jexec [-u username | -U username]", - " [-n jailname] jid command ..."); + " [-h hostname | -h ip-number | jail] command ..."); exit(1); } + +static int +addr2jid(const char *addr) +{ + struct iovec params[6]; + struct in_addr ia; + struct in6_addr ia6; + int cnt, doip, foundjid, ii, jid, lastjid, sanity; + char hostbuf[MAXHOSTNAMELEN]; + + if (inet_pton(AF_INET, addr, &ia) > 0) + doip = 4; + else if (inet_pton(AF_INET6, addr, &ia6) > 0) + doip = 6; + else + doip = 0; + + *(const void **)&params[0].iov_base = "lastjid"; + params[0].iov_len = sizeof("lastjid"); + params[1].iov_base = &lastjid; + params[1].iov_len = sizeof(lastjid); + switch (doip) { + case 4: + *(const void **)&params[2].iov_base = "ip4.addr"; + params[2].iov_len = sizeof("ip4.addr"); + *(const void **)&params[4].iov_base = "host.hostname"; + params[4].iov_len = sizeof("host.hostname"); + params[5].iov_base = hostbuf; + params[5].iov_len = MAXHOSTNAMELEN; + break; + case 6: + *(const void **)&params[2].iov_base = "ip6.addr"; + params[2].iov_len =
sizeof("ip6.addr"); + *(const void **)&params[4].iov_base = "host.hostname"; + params[4].iov_len = sizeof("host.hostname"); + params[5].iov_base = hostbuf; + params[5].iov_len = MAXHOSTNAMELEN; + break; + default: + *(const void **)&params[2].iov_base = "host.hostname"; + params[2].iov_len = sizeof("host.hostname"); + params[3].iov_base = hostbuf; + params[3].iov_len = MAXHOSTNAMELEN; + } + + cnt = foundjid = sanity = 0; + for (jid = 0;; jid = lastjid) { + if (doip != 0) { + params[3].iov_base = NULL; + params[3].iov_len = 0; + if (jail_get(params, 4, 0) < 0) + break; + params[3].iov_len += 5 * sizeof(struct in6_addr); + params[3].iov_base = malloc(params[3].iov_len); + jid = jail_get(params, 6, 0); + } else + jid = jail_get(params, 4, 0); + if (jid > 0) { + sanity = 0; + if (!strcmp(hostbuf, addr)) { + cnt++; + foundjid = jid; + } else switch (doip) { + case 4: + for (ii = (params[3].iov_len / + sizeof(struct in_addr)) - 1; ii >= 0; ii--) + if (((struct in_addr *)params[3]. + iov_base)[ii].s_addr == ia.s_addr) { + cnt++; + foundjid = jid; + break; + } + break; + case 6: + for (ii = (params[3].iov_len / + sizeof(struct in6_addr)) - 1; ii >= 0; + ii--) + if (IN6_ARE_ADDR_EQUAL(&ia6, + &((struct in6_addr *) + params[3].iov_base)[ii])) { + cnt++; + foundjid = jid; + break; + } + } + } else if (errno == ENOENT || ++sanity > 10) + break; + else + jid = lastjid; + if (doip != 0) + free(params[3].iov_base); + } + switch (cnt) + { + case 0: + errx(1, "Unknown jail: %s", addr); + case 1: + return foundjid; + default: + errx(1, "Could not uniquely identify the jail: %s", addr); + } +} Index: usr.sbin/jexec/jexec.8 =================================================================== --- usr.sbin/jexec/jexec.8 (revision 191896) +++ usr.sbin/jexec/jexec.8 (working copy) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 29, 2008 +.Dd April 30, 2009 .Dt JEXEC 8 .Os .Sh NAME @@ -34,36 +34,22 @@ .Sh SYNOPSIS .Nm .Op Fl u Ar username | Fl U Ar username -.Op Fl n Ar jailname -.Ar jid
command ... +.Op Fl h Ar hostname | Fl h Ar ip | Ar jid | Ar name +.Ar command ... .Sh DESCRIPTION The .Nm utility executes .Ar command -inside the jail identified by either -.Ar jailname +inside the jail identified by +.Ar hostname , +.Ar ip , +.Ar jid , or -.Ar jid -or both. +.Ar name . .Pp -If the jail cannot be identified uniquely by the given parameters, -an error message is printed. -.Nm -will also check the state of the jail (once supported) to be -.Dv ALIVE -and ignore jails in other states. -The mandatory argument -.Ar jid -is the unique jail identifier as given by -.Xr jls 8 . -In case you only want to match on other criteria, give an empty string. -.Pp The following options are available: .Bl -tag -width indent -.It Fl n Ar jailname -The name of the jail, if given upon creation of the jail. -This is not the hostname of the jail. .It Fl u Ar username The user name from host environment as whom the .Ar command @@ -73,6 +59,9 @@ .Ar command should run. .El +.Sh "CAUTIONS" +Only a jail's jid or name is guaranteed to uniquely identify the jail. +Hostname or ip only work here if matched to one unique jail. .Sh SEE ALSO .Xr jail_attach 2 , .Xr jail 8 , Index: usr.sbin/jexec/Makefile =================================================================== --- usr.sbin/jexec/Makefile (revision 191896) +++ usr.sbin/jexec/Makefile (working copy) @@ -6,6 +6,4 @@ LDADD= -lutil WARNS?= 6 -CFLAGS+= -DSUPPORT_OLD_XPRISON - .include Index: usr.sbin/jail/jail.c =================================================================== --- usr.sbin/jail/jail.c (revision 191896) +++ usr.sbin/jail/jail.c (working copy) @@ -1,5 +1,6 @@ /*- * Copyright (c) 1999 Poul-Henning Kamp. + * Copyright (c) 2009 James Gritton * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without @@ -29,51 +30,43 @@ #include #include -#include #include #include -#include +#include +#include #include -#include -#include +#include #include #include #include #include +#include #include #include #include #include -#include #include #include -static void usage(void); -static int add_addresses(struct addrinfo *); -static struct in_addr *copy_addr4(void); -#ifdef INET6 -static struct in6_addr *copy_addr6(void); -#endif +#define SJPARAM "security.jail.param" +#define ERRMSG_SIZE 256 -extern char **environ; - -struct addr4entry { - STAILQ_ENTRY(addr4entry) addr4entries; - struct in_addr ip4; - int count; +struct param { + struct iovec name; + struct iovec value; }; -struct addr6entry { - STAILQ_ENTRY(addr6entry) addr6entries; -#ifdef INET6 - struct in6_addr ip6; -#endif - int count; -}; -STAILQ_HEAD(addr4head, addr4entry) addr4 = STAILQ_HEAD_INITIALIZER(addr4); -STAILQ_HEAD(addr6head, addr6entry) addr6 = STAILQ_HEAD_INITIALIZER(addr6); +static struct param *params; +static int nparams; + +static void set_param(const char *name, char *value); +static void set_param_ip_hostname(char *value, int family); +static void usage(void); + +extern char **environ; + #define GET_USER_INFO do { \ pwd = getpwnam(username); \ if (pwd == NULL) { \ @@ -94,27 +87,28 @@ main(int argc, char **argv) { login_cap_t *lcap = NULL; - struct jail j; + struct iovec rparams[2]; struct passwd *pwd = NULL; gid_t groups[NGROUPS]; - int ch, error, i, ngroups, securelevel; - int hflag, iflag, Jflag, lflag, uflag, Uflag; - char path[PATH_MAX], *jailname, *ep, *username, *JidFile, *ip; + int ch, cmdarg, i, jail_set_flags, jid, ngroups, oldargs, securelevel; + int iflag, Jflag, lflag, rflag, uflag, Uflag; + char *ep, *username, *JidFile; + char errmsg[ERRMSG_SIZE]; static char *cleanenv; const char *shell, *p = NULL; long ltmp; FILE *fp; - struct addrinfo hints, *res0; - hflag = iflag = Jflag = lflag = uflag = Uflag = 0; - 
securelevel = -1; - jailname = username = JidFile = cleanenv = NULL; + iflag = Jflag = lflag = rflag = uflag = Uflag = 0; + jail_set_flags = JAIL_CREATE | JAIL_UPDATE; + cmdarg = jid = securelevel = -1; + username = JidFile = cleanenv = NULL; fp = NULL; - while ((ch = getopt(argc, argv, "hiln:s:u:U:J:")) != -1) { + while ((ch = getopt(argc, argv, "cdilor:s:u:U:J:")) != -1) { switch (ch) { - case 'h': - hflag = 1; + case 'd': + jail_set_flags |= JAIL_DYING; break; case 'i': iflag = 1; @@ -123,9 +117,6 @@ JidFile = optarg; Jflag = 1; break; - case 'n': - jailname = optarg; - break; case 's': ltmp = strtol(optarg, &ep, 0); if (*ep || ep == optarg || ltmp > INT_MAX || !ltmp) @@ -143,13 +134,41 @@ case 'l': lflag = 1; break; + case 'c': + jail_set_flags = + (jail_set_flags & ~JAIL_UPDATE) | JAIL_CREATE; + break; + case 'o': + jail_set_flags = + (jail_set_flags & ~JAIL_CREATE) | JAIL_UPDATE; + break; + case 'r': + jid = strtoul(optarg, &ep, 10); + if (!*optarg || *ep) { + *(const void **)&rparams[0].iov_base = "name"; + rparams[0].iov_len = sizeof("name"); + rparams[1].iov_base = optarg; + rparams[1].iov_len = strlen(optarg) + 1; + jid = jail_get(rparams, 2, 0); + if (jid < 0) + errx(1, "unknown jail: %s", optarg); + } + rflag = 1; + break; default: usage(); } } argc -= optind; argv += optind; - if (argc < 4) + if (rflag) { + if (argc > 0 || iflag || Jflag || lflag || uflag || Uflag) + usage(); + if (jail_remove(jid) < 0) + err(1, "jail_remove"); + exit (0); + } + if (argc == 0) usage(); if (uflag && Uflag) usage(); @@ -157,92 +176,70 @@ usage(); if (uflag) GET_USER_INFO; - if (realpath(argv[0], path) == NULL) - err(1, "realpath: %s", argv[0]); - if (chdir(path) != 0) - err(1, "chdir: %s", path); - /* Initialize struct jail. */ - memset(&j, 0, sizeof(j)); - j.version = JAIL_API_VERSION; - j.path = path; - j.hostname = argv[1]; - if (jailname != NULL) - j.jailname = jailname; - /* Handle IP addresses. If requested resolve hostname too. 
*/ - bzero(&hints, sizeof(struct addrinfo)); - hints.ai_protocol = IPPROTO_TCP; - hints.ai_socktype = SOCK_STREAM; - if (JAIL_API_VERSION < 2) - hints.ai_family = PF_INET; - else - hints.ai_family = PF_UNSPEC; - /* Handle hostname. */ - if (hflag != 0) { - error = getaddrinfo(j.hostname, NULL, &hints, &res0); - if (error != 0) - errx(1, "failed to handle hostname: %s", - gai_strerror(error)); - error = add_addresses(res0); - freeaddrinfo(res0); - if (error != 0) - errx(1, "failed to add addresses."); + /* + * If the first argument (path) starts with a slash, and the third + * argument (IP address) starts with a digit, it is likely to be + * an old-style fixed-parameter command line. + */ + oldargs = argc >= 4 && argv[0][0] == '/' && isdigit(argv[2][0]); + if (oldargs) { + if ((jail_set_flags & (JAIL_CREATE | JAIL_UPDATE)) != + (JAIL_CREATE | JAIL_UPDATE)) + usage(); + jail_set_flags = JAIL_CREATE | JAIL_ATTACH; + set_param("path", argv[0]); + set_param("host.hostname", argv[1]); + set_param("ip4.addr", argv[2]); + cmdarg = 3; + } else { + for (i = 0; i < argc; i++) + if (!strncmp(argv[i], "command=", 8)) { + cmdarg = i; + argv[cmdarg] += 8; + jail_set_flags |= JAIL_ATTACH; + break; + } else + set_param(NULL, argv[i]); } - /* Handle IP addresses. */ - hints.ai_flags = AI_NUMERICHOST; - ip = strtok(argv[2], ","); - while (ip != NULL) { - error = getaddrinfo(ip, NULL, &hints, &res0); - if (error != 0) - errx(1, "failed to handle ip: %s", gai_strerror(error)); - error = add_addresses(res0); - freeaddrinfo(res0); - if (error != 0) - errx(1, "failed to add addresses."); - ip = strtok(NULL, ","); - } - /* Count IP addresses and add them to struct jail. 
*/ - if (!STAILQ_EMPTY(&addr4)) { - j.ip4s = STAILQ_FIRST(&addr4)->count; - j.ip4 = copy_addr4(); - if (j.ip4s > 0 && j.ip4 == NULL) - errx(1, "copy_addr4()"); - } -#ifdef INET6 - if (!STAILQ_EMPTY(&addr6)) { - j.ip6s = STAILQ_FIRST(&addr6)->count; - j.ip6 = copy_addr6(); - if (j.ip6s > 0 && j.ip6 == NULL) - errx(1, "copy_addr6()"); - } -#endif + errmsg[0] = 0; + set_param("errmsg", errmsg); if (Jflag) { fp = fopen(JidFile, "w"); if (fp == NULL) errx(1, "Could not create JidFile: %s", JidFile); } - i = jail(&j); - if (i == -1) - err(1, "syscall failed with"); + jid = jail_set(&params->name, 2 * nparams, jail_set_flags); + if (jid < 0) { + if (errmsg[0] != '\0') + errx(1, "%s", errmsg); + err(1, "jail_set"); + } if (iflag) { - printf("%d\n", i); + printf("%d\n", jid); fflush(stdout); } if (Jflag) { - if (fp != NULL) { + if (oldargs) fprintf(fp, "%d\t%s\t%s\t%s\t%s\n", - i, j.path, j.hostname, argv[2], argv[3]); - (void)fclose(fp); - } else { - errx(1, "Could not write JidFile: %s", JidFile); + jid, (char *)params[0].value.iov_base, + argv[1], argv[2], argv[3]); + else { + fprintf(fp, "%d", jid); + for (i = 0; i < argc; i++) + fprintf(fp, "\t%s", argv[i]); + fprintf(fp, "\n"); } + (void)fclose(fp); } if (securelevel > 0) { if (sysctlbyname("kern.securelevel", NULL, 0, &securelevel, sizeof(securelevel))) err(1, "Can not set securelevel to %d", securelevel); } + if (cmdarg < 0) + exit(0); if (username != NULL) { if (Uflag) GET_USER_INFO; @@ -272,158 +269,256 @@ if (p) setenv("TERM", p, 1); } - if (execv(argv[3], argv + 3) != 0) - err(1, "execv: %s", argv[3]); - exit(0); + execvp(argv[cmdarg], argv + cmdarg); + err(1, "execvp: %s", argv[cmdarg]); } static void -usage(void) +set_param(const char *name, char *value) { + struct param *param; + char *ep, *p; + size_t buflen, mlen; + int i, nval, mib[CTL_MAXNAME]; + char buf[MAXPATHLEN]; - (void)fprintf(stderr, "%s%s%s\n", - "usage: jail [-hi] [-n jailname] [-J jid_file] ", - "[-s securelevel] [-l -u username | -U username] ", -
"path hostname [ip[,..]] command ..."); - exit(1); -} + static int paramlistsize; -static int -add_addresses(struct addrinfo *res0) -{ - int error; - struct addrinfo *res; - struct addr4entry *a4p; - struct sockaddr_in *sai; + /* Separate the name from the value, if not done already. */ + if (name == NULL) { + name = value; + if ((value = strchr(value, '='))) + *value++ = '\0'; + } + + /* Handle pseudo-parameters separately. */ + if (!strcmp(name, "ip4_hostname")) { + set_param_ip_hostname(value, AF_INET); + return; + } #ifdef INET6 - struct addr6entry *a6p; - struct sockaddr_in6 *sai6; + if (!strcmp(name, "ip6_hostname")) { + set_param_ip_hostname(value, AF_INET6); + return; + } #endif - int count; - error = 0; - for (res = res0; res && error == 0; res = res->ai_next) { - switch (res->ai_family) { - case AF_INET: - sai = (struct sockaddr_in *)(void *)res->ai_addr; - STAILQ_FOREACH(a4p, &addr4, addr4entries) { - if (bcmp(&sai->sin_addr, &a4p->ip4, - sizeof(struct in_addr)) == 0) { - err(1, "Ignoring duplicate IPv4 address."); - break; - } - } - a4p = (struct addr4entry *) malloc( - sizeof(struct addr4entry)); - if (a4p == NULL) { - error = 1; - break; - } - bzero(a4p, sizeof(struct addr4entry)); - bcopy(&sai->sin_addr, &a4p->ip4, - sizeof(struct in_addr)); - if (!STAILQ_EMPTY(&addr4)) - count = STAILQ_FIRST(&addr4)->count; - else - count = 0; - STAILQ_INSERT_TAIL(&addr4, a4p, addr4entries); - STAILQ_FIRST(&addr4)->count = count + 1; + /* Check for repeat parameters */ + for (i = 0; i < nparams; i++) + if (!strcmp(name, params[i].name.iov_base)) { + memcpy(params + i, params + i + 1, + (--nparams - i) * sizeof(struct param)); break; + } + + /* Make sure there is room for the new param record. 
*/ + if (!nparams) { + paramlistsize = 32; + params = malloc(paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "malloc"); + } else if (nparams >= paramlistsize) { + paramlistsize *= 2; + params = realloc(params, paramlistsize * sizeof(*params)); + if (params == NULL) + err(1, "realloc"); + } + + /* Look up the parameter. */ + param = params + nparams++; + *(const void **)&param->name.iov_base = name; + param->name.iov_len = strlen(name) + 1; + /* Trivial values - no value or errmsg. */ + if (value == NULL) { + param->value.iov_base = value; + param->value.iov_len = 0; + return; + } + if (!strcmp(name, "errmsg")) { + param->value.iov_base = value; + param->value.iov_len = ERRMSG_SIZE; + return; + } + mib[0] = 0; + mib[1] = 3; + snprintf(buf, sizeof(buf), SJPARAM ".%s", name); + mlen = sizeof(mib) - 2 * sizeof(int); + if (sysctl(mib, 2, mib + 2, &mlen, buf, strlen(buf)) < 0) + errx(1, "unknown parameter: %s", name); + mib[1] = 4; + buflen = sizeof(buf); + if (sysctl(mib, (mlen / sizeof(int)) + 2, buf, &buflen, NULL, 0) < 0) + err(1, "sysctl(0.4.%s)", name); + /* + * See if this is an array type. + * Treat non-arrays as an array of one. + */ + p = strchr(buf + sizeof(int), '\0'); + nval = 1; + if (p - 2 >= buf && !strcmp(p - 2, ",a")) { + if (value[0] == '\0' || + (value[0] == '-' && value[1] == '\0')) { + param->value.iov_base = value; + param->value.iov_len = 0; + return; + } + p[-2] = 0; + for (p = strchr(value, ','); p; p = strchr(p + 1, ',')) { + *p = 0; + nval++; + } + } + + /* Set the values according to the parameter type.
*/ + switch (*(int *)buf & CTLTYPE) { + case CTLTYPE_INT: + case CTLTYPE_UINT: + param->value.iov_len = nval * sizeof(int); + break; + case CTLTYPE_LONG: + case CTLTYPE_ULONG: + param->value.iov_len = nval * sizeof(long); + break; + case CTLTYPE_STRUCT: + if (!strcmp(buf + sizeof(int), "S,in_addr")) + param->value.iov_len = nval * sizeof(struct in_addr); #ifdef INET6 - case AF_INET6: - sai6 = (struct sockaddr_in6 *)(void *)res->ai_addr; - STAILQ_FOREACH(a6p, &addr6, addr6entries) { - if (bcmp(&sai6->sin6_addr, &a6p->ip6, - sizeof(struct in6_addr)) == 0) { - err(1, "Ignoring duplicate IPv6 address."); - break; - } + else if (!strcmp(buf + sizeof(int), "S,in6_addr")) + param->value.iov_len = nval * sizeof(struct in6_addr); +#endif + else + errx(1, "%s: unknown parameter structure (%s)", + name, buf + sizeof(int)); + break; + case CTLTYPE_STRING: + if (!strcmp(name, "path")) { + param->value.iov_base = malloc(MAXPATHLEN); + if (param->value.iov_base == NULL) + err(1, "malloc"); + if (realpath(value, param->value.iov_base) == NULL) + err(1, "%s: realpath(%s)", name, value); + if (chdir(param->value.iov_base) != 0) + err(1, "chdir: %s", + (char *)param->value.iov_base); + } else + param->value.iov_base = value; + param->value.iov_len = strlen(param->value.iov_base) + 1; + return; + default: + errx(1, "%s: unknown parameter type %d (%s)", + name, *(int *)buf, buf + sizeof(int)); + } + param->value.iov_base = malloc(param->value.iov_len); + for (i = 0; i < nval; i++) { + switch (*(int *)buf & CTLTYPE) { + case CTLTYPE_INT: + ((int *)param->value.iov_base)[i] = + strtol(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_UINT: + ((unsigned *)param->value.iov_base)[i] = + strtoul(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_LONG: + ((long *)param->value.iov_base)[i] = + strtol(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: 
non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_ULONG: + ((unsigned long *)param->value.iov_base)[i] = + strtoul(value, &ep, 10); + if (ep[0] != '\0') + errx(1, "%s: non-integer value \"%s\"", + name, value); + break; + case CTLTYPE_STRUCT: + if (!strcmp(buf + sizeof(int), "S,in_addr")) { + if (inet_pton(AF_INET, value, + &((struct in_addr *) + param->value.iov_base)[i]) != 1) + errx(1, "%s: not an IPv4 address: %s", + name, value); } - a6p = (struct addr6entry *) malloc( - sizeof(struct addr6entry)); - if (a6p == NULL) { - error = 1; - break; +#ifdef INET6 + else if (!strcmp(buf + sizeof(int), "S,in6_addr")) { + if (inet_pton(AF_INET6, value, + &((struct in6_addr *) + param->value.iov_base)[i]) != 1) + errx(1, "%s: not an IPv6 address: %s", + name, value); } - bzero(a6p, sizeof(struct addr6entry)); - bcopy(&sai6->sin6_addr, &a6p->ip6, - sizeof(struct in6_addr)); - if (!STAILQ_EMPTY(&addr6)) - count = STAILQ_FIRST(&addr6)->count; - else - count = 0; - STAILQ_INSERT_TAIL(&addr6, a6p, addr6entries); - STAILQ_FIRST(&addr6)->count = count + 1; - break; #endif - default: - err(1, "Address family %d not supported. Ignoring.\n", - res->ai_family); - break; } + value = strchr(value, '\0') + 1; } - - return (error); } -static struct in_addr * -copy_addr4(void) +static void +set_param_ip_hostname(char *value, int family) { - size_t len; - struct in_addr *ip4s, *p, ia; - struct addr4entry *a4p; + struct addrinfo hints, *ai0, *ai; + char *avalue, *nextav; + socklen_t avlen; + int error; - if (STAILQ_EMPTY(&addr4)) - return NULL; + /* Look up the hostname in the specified address family. 
*/ + memset(&hints, 0, sizeof(hints)); + hints.ai_family = family; + error = getaddrinfo(value, NULL, &hints, &ai0); + if (error != 0) + errx(1, "hostname %s: %s", value, gai_strerror(error)); - len = STAILQ_FIRST(&addr4)->count * sizeof(struct in_addr); - - ip4s = p = (struct in_addr *)malloc(len); - if (ip4s == NULL) - return (NULL); - - bzero(p, len); - - while (!STAILQ_EMPTY(&addr4)) { - a4p = STAILQ_FIRST(&addr4); - STAILQ_REMOVE_HEAD(&addr4, addr4entries); - ia.s_addr = a4p->ip4.s_addr; - bcopy(&ia, p, sizeof(struct in_addr)); - p++; - free(a4p); + /* Convert the addresses to ASCII so set_param can convert them back. */ + avlen = 0; + for (ai = ai0; ai; ai = ai->ai_next) + avlen++; + avlen *= +#ifdef INET6 + family == AF_INET6 ? INET6_ADDRSTRLEN : +#endif + INET_ADDRSTRLEN; + avalue = malloc(avlen); + if (avalue == NULL) + err(1, "malloc"); + avalue[0] = 0; + for (nextav = avalue, ai = ai0; ai; ai = ai->ai_next) { + if (inet_ntop(family, +#ifdef INET6 + family == AF_INET6 ? + (void *)&((struct sockaddr_in6 *)&ai->ai_addr)->sin6_addr : +#endif + (void *)&((struct sockaddr_in *)&ai->ai_addr)->sin_addr, + nextav, avlen - (nextav - avalue)) == NULL) + err(1, "inet_ntop"); + if (ai->ai_next) { + nextav = strchr(nextav, '\0'); + *nextav++ = ','; + } } - - return (ip4s); + set_param( +#ifdef INET6 + family == AF_INET6 ? 
"ip6.addr" : +#endif + "ip4.addr", avalue); } -#ifdef INET6 -static struct in6_addr * -copy_addr6(void) +static void +usage(void) { - size_t len; - struct in6_addr *ip6s, *p; - struct addr6entry *a6p; - if (STAILQ_EMPTY(&addr6)) - return NULL; - - len = STAILQ_FIRST(&addr6)->count * sizeof(struct in6_addr); - - ip6s = p = (struct in6_addr *)malloc(len); - if (ip6s == NULL) - return (NULL); - - bzero(p, len); - - while (!STAILQ_EMPTY(&addr6)) { - a6p = STAILQ_FIRST(&addr6); - STAILQ_REMOVE_HEAD(&addr6, addr6entries); - bcopy(&a6p->ip6, p, sizeof(struct in6_addr)); - p++; - free(a6p); - } - - return (ip6s); + (void)fprintf(stderr, + "usage: jail [-d] [-i] [-J jid_file] [-s securelevel]\n" + " [-l -u username | -U username]\n" + " [[-c | -o] param=value ... [command=command ...] |\n" + " path hostname ip command ...]\n" + " jail [-r jail]\n"); + exit(1); } -#endif - Index: usr.sbin/jail/jail.8 =================================================================== --- usr.sbin/jail/jail.8 (revision 191896) +++ usr.sbin/jail/jail.8 (working copy) @@ -1,5 +1,6 @@ .\" .\" Copyright (c) 2000, 2003 Robert N. M. Watson +.\" Copyright (c) 2008 James Gritton .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without @@ -33,49 +34,37 @@ .\" .\" $FreeBSD$ .\" -.Dd January 24, 2009 +.Dd May 9, 2009 .Dt JAIL 8 .Os .Sh NAME .Nm jail -.Nd "imprison process and its descendants" +.Nd "create or modify a system jail" .Sh SYNOPSIS .Nm -.Op Fl hi -.Op Fl n Ar jailname +.Op Fl di .Op Fl J Ar jid_file .Op Fl s Ar securelevel .Op Fl l u Ar username | Fl U Ar username -.Ar path hostname [ip[,..]] command ... +.Op Fl c | o +.Op Ar parameter=value ... | path hostname ip command ... +.Br +.Nm +.Op Fl r Ar jail .Sh DESCRIPTION The .Nm -utility imprisons a process and all future descendants. +utility creates a new jail or modifies an existing jail, optionally +imprisoning the current process (and future descendants) inside it. 
.Pp The options are as follows: -.Bl -tag -width ".Fl u Ar username" -.It Fl h -Resolve -.Va hostname -and add all IP addresses returned by the resolver -to the list of -.Va ip-addresses -for this prison. -This may affect default address selection for outgoing IPv4 connections -of prisons. -The address first returned by the resolver for each address family -will be used as primary address. -See -.Va ip-addresses -further down for details. +.Bl -tag -width indent +.It Fl d +Allow making changes to a +.Va +dying jail. .It Fl i Output the jail identifier of the newly created jail. -.It Fl n Ar jailname -Assign and administrative name to the jail that can be used for management -or auditing purposes. -The system will -.Sy not enforce -the name to be unique. .It Fl J Ar jid_file Write a .Ar jid_file @@ -100,7 +89,10 @@ .It Fl s Ar securelevel Sets the .Va kern.securelevel -sysctl variable to the specified value inside the newly created jail. +MIB entry to the specified value inside the newly created jail. +This is equivalent to setting the jail's +.Va securelevel +parameter. .It Fl u Ar username The user name from host environment as whom the .Ar command @@ -109,20 +101,159 @@ The user name from jailed environment as whom the .Ar command should run. -.It Ar path +.It Fl c +Create a new jail, but do not modify an existing one. +Default behavior is to allow modification if a +.Va jid +or +.Va name +parameter refers to an existing jail. +.It Fl o +Only modify an existing jail, but do not create one. +One of the +.Va jid +or +.Va name +parameters must exist and refer to an existing jail. +.It Fl r +Remove the +.Ar jail +specified by jid or name. +All jailed processes are killed, and all children of this jail are also +removed. +.El +.Pp +.Ar Parameters +are listed in +.Dq name=value +form, following the options. +Some parameters are boolean, and do not have a value but are set by the +name alone with or without a +.Dq no +prefix, e.g. +.Va persist +or +.Va nopersist . 
+Any parameters not set will be given default values, generally based on the +current environment. +.Pp +The pseudo-parameter +.Va command +specifies that the current process should enter the new (or modified) jail, +and run the specified command. +It must be the last parameter specified, because it includes not only +the value following the +.Sq = +sign, but also passes the rest of the arguments to the command. +.Pp +Instead of supplying named +.Ar parameters , +four fixed parameters may be supplied in order on the command line: +.Ar path , +.Ar hostname , +.Ar ip , +and +.Ar command . +As the +.Va jid +and +.Va name +parameters aren't in this list, this mode will always create a new jail, and +the +.Fl c +and +.Fl o +options don't apply. +.Pp +Jails have a set of core parameters, and modules can add their own jail +parameters. +The current set of available parameters can be retrieved via +.Dq Nm sysctl Fl d Va security.jail.param . +Some of the notable core parameters include: +.Bl -tag -width indent +.It Va jid +The jail identifier. +This will be assigned automatically to a new jail (or can be explicitly +set), and can be used to identify the jail for later modification, or +for such commands as +.Xr jls 8 +or +.Xr jexec 8 . +.It Va name +The jail name. +This is an arbitrary string that identifies a jail (except it may not +contain a +.Sq \&. ) . +Like the +.Va jid , +it can be passed to later +.Nm +commands, or to +.Xr jls 8 +or +.Xr jexec 8 . +If no +.Va name +is supplied, a default is assumed that is the same as the +.Va jid . +.It Va path Directory which is to be the root of the prison. -.It Ar hostname -Hostname of the prison. -.It Ar ip-addresses -None, one or more IPv4 and IPv6 addresses assigned to the prison. -The first address of each address family that was assigned to the jail will -be used as the source address in case source address selection on unbound -sockets cannot find a better match.
+The +.Va command +(if any) is run from this directory, as are commands from +.Xr jexec 8 . +.It Va ip4.addr +A comma-separated list of IPv4 addresses assigned to the prison. +If this is set, the jail is restricted to using only these addresses. +Any attempts to use other addresses fail, and attempts to use wildcard +addresses silently use the jailed address instead. +For IPv4 the first address given will be used as the source address +in case source address selection on unbound sockets cannot find a better +match. It is only possible to start multiple jails with the same IP address, if none of the jails has more than this single overlapping IP address -assigned to itself for the address family in question. -.It Ar command -Pathname of the program which is to be executed. +assigned to itself. +.Pp +A list of zero elements (an empty string) will stop the jail from using IPv4 +entirely; setting the boolean parameter +.Ar noip4 +will not restrict the jail at all. +.It Va ip6.addr +A list of IPv6 addresses assigned to the prison, the counterpart to +.Ar ip4.addr +above. +.It Va host.hostname +Hostname of the prison. +If not specified, a jail will use the system hostname. +.It Va ip4_hostname +.It Va ip6_hostname +These pseudo-parameters actually set the jail's +.Va ip4 +and +.Va ip6 +parameters, but will get those addresses by resolving the supplied hostname. +.It Va securelevel +The value of the jail's +.Va kern.securelevel +sysctl. +A jail never has a lower securelevel than the default system, but by +setting this parameter it may have a higher one. +If the system securelevel is changed, any jail securelevels will be at +least as secure. +.It Va persist +Setting this boolean parameter allows a jail to exist without any +processes. +Normally, a jail is destroyed as its last process exits. +.It Va command +The command to run after creating or modifying the jail. +This command is run inside the jail, under the +.Va path +directory.
+A new jail must have either the +.Va persist +or +.Va command +parameter set. .El .Pp Jails are typically set up using one of two philosophies: either to @@ -142,10 +273,6 @@ This manual page documents the configuration steps necessary to support either of these steps, although the configuration steps may be refined based on local requirements. -.Pp -Please see the -.Xr jail 2 -man page for further details. .Sh EXAMPLES .Ss "Setting up a Jail Directory Tree" To set up a jail directory tree containing an entire @@ -359,15 +486,6 @@ virtual host interface, and then start the jail's .Pa /etc/rc script from within the jail. -.Pp -NOTE: If you plan to allow untrusted users to have root access inside the -jail, you may wish to consider setting the -.Va security.jail.set_hostname_allowed -sysctl variable to 0. -Please see the management discussion later in this document as to why this -may be a good idea. -If you do decide to set this variable, -it must be set before starting any jails, and once each boot. .Bd -literal -offset indent ifconfig ed0 inet alias 192.0.2.100/32 mount -t procfs proc /data/jail/192.0.2.100/proc @@ -445,7 +563,7 @@ .Pp The .Pa /proc/ Ns Ar pid Ns Pa /status -file contains, as its last field, the hostname of the jail in which the +file contains, as its last field, the name of the jail in which the process runs, or .Dq Li - to indicate that the process is not running within a jail. @@ -454,21 +572,7 @@ command also shows a .Ql J flag for processes in a jail. -However, the hostname for a jail may be, by -default, modified from within the jail, so the -.Pa /proc -status entry is unreliable by default. -To disable the setting of the hostname -from within a jail, set the -.Va security.jail.set_hostname_allowed -sysctl variable in the host environment to 0, which will affect all jails. -You can have this sysctl set on each boot using -.Xr sysctl.conf 5 . 
-Just add the following line to -.Pa /etc/sysctl.conf : .Pp -.Dl security.jail.set_hostname_allowed=0 -.Pp You can also list/kill processes based on their jail ID. To show processes and their jail ID, use the following command: .Pp @@ -510,8 +614,6 @@ the host environment using .Xr sysctl 8 MIB variables. -Currently, these variables affect all jails on the system, although in -the future this functionality may be finer grained. .Bl -tag -width XXX .It Va security.jail.allow_raw_sockets This MIB entry determines whether or not prison root is allowed to @@ -555,12 +657,6 @@ .Xr hostname 1 or .Xr sethostname 3 . -In the current jail implementation, the ability to set the hostname from -within the jail can impact management tools relying on the accuracy of jail -information in -.Pa /proc . -As such, this should be disabled in environments where privileged access to -jails is given out to untrusted parties. .It Va security.jail.socket_unixiproute_only The jail functionality binds an IPv4 address to each jail, and limits access to other network addresses in the IPv4 space that may be available @@ -605,12 +701,30 @@ a jail. This functionality is disabled by default, but can be enabled by setting this MIB entry to 1. -.It Va security.jail.jail_max_af_ips +.It Va security.jail.allow_jails +This MIB entry determines if a privileged user inside a jail can create +sub-jails under that jail. It is disabled by default, but can be enabled by +setting this MIB entry to 1. See the section below for more information on +hierarchical jails. +.It Va security.jail.max_af_ips This MIB entry determines how may address per address family a prison may have. The default is 255. .El .Pp -The read-only sysctl variable +These variables affect all jails on the system. Finer grained control is +available via per-jail boolean parameters in the +.Va perm +group. 
For example, to globally allow raw socket creation, you can set the +.Va security.jail.allow_raw_sockets +MIB entry; to allow a single jail to create raw sockets, set its +.Va perm.allow_raw_sockets +parameter. Or to disallow a single jail from setting its hostname, set +.Va perm.noset_hostname_allowed . +These per-jail permission parameters default to the current value of the +associated sysctls at the time of jail creation, but changing the sysctls +won't change the behavior of existing jails. +.Pp +The read-only MIB entry .Va security.jail.jailed can be used to determine if a process is running inside a jail (value is one) or not (value is zero). @@ -632,6 +746,68 @@ .Va kern.securelevel and .Va kern.hostname . +.Ss "Hierarchical Jails" +By setting the +.Va security.jail.allow_jails +MIB entry or a jail's +.Va perm.allow_jails +parameter, processes within a jail may be able to create jails of their own. +These child jails are kept in a hierarchy, with jails only able to see and/or +modify their own jails (or those jails' children). +Each jail has a read-only +.Va parent +parameter, containing the +.Va jid +of the jail that created it; a +.Va jid +of 0 indicates the jail is a child of the current jail (or is a top-level +jail if the current process isn't jailed). +Jail parameters that are normally inherited from the base system, are in +the hierarchical case inherited from the jail that created them. +.Pp +The global sysctl MIB entries listed above (with the exception of +.Va security.jail.jailed_sockets_first ) +are per-jail, and can be used to define the default permissions of child +jails. +Jailed processes are not allowed to confer greater permissions than they +themselves are given, e.g. if a jail is created with +.Va perm.noset_hostname_allowed , +it is not able to set its +.Va security.jail.set_hostname_allowed +sysctl. +Similarly, such restrictions as +.Va ip4 +and +.Va securelevel +may not be bypassed in child jails.
+.Pp +A child jail may in turn create its own child jails, unless its own +.Va perm.noallow_jails +parameter is set (remember, it defaults to the parent jail's value). +These jails are visible to and can be modified by their parent and all +ancestors. +.Pp +Jail names reflect this hierarchy, with a full name being an MIB-type string +separated by dots. +For example, if a base system process creates a jail +.Dq foo , +and a process under that jail creates another jail +.Dq bar , +then the second jail will be seen as +.Dq foo.bar +in the base system (though it is only seen as +.Dq bar +to any processes inside jail +.Dq foo ) . +Jids on the other hand exist in a single space, and each jail must have a +unique jid. +.Pp +Like the names, a child jail's +.Va path +is relative to its creator's own +.Va path . +This is by virtue of the child jail being created in the chrooted +environment of the first jail. .Sh SEE ALSO .Xr killall 1 , .Xr lsvfs 1 , @@ -641,7 +817,7 @@ .Xr ps 1 , .Xr quota 1 , .Xr chroot 2 , -.Xr jail 2 , +.Xr jail_set 2 , .Xr jail_attach 2 , .Xr procfs 5 , .Xr rc.conf 5 , @@ -665,6 +841,8 @@ .Nm utility appeared in .Fx 4.0 . +Extensible jail parameters were introduced in +.Fx 8.0 . .Sh AUTHORS .An -nosplit The jail feature was written by @@ -683,6 +861,9 @@ originally done by .An Pawel Jakub Dawidek for IPv4. +.Pp +.An James Gritton +added the extensible jail parameters and hierarchical jails.
.Sh BUGS Jail currently lacks the ability to allow access to specific jail information via --------------020909010407060109070301-- From owner-freebsd-virtualization@FreeBSD.ORG Sat May 9 09:57:47 2009 Return-Path: Delivered-To: virtualization@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BA0641065672; Sat, 9 May 2009 09:57:47 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id 787BF8FC14; Sat, 9 May 2009 09:57:47 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from localhost (localhost.codelab.cz [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 9FBE119E044; Sat, 9 May 2009 11:38:46 +0200 (CEST) Received: from [192.168.1.2] (r5bb235.net.upc.cz [86.49.61.235]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id 760C419E043; Sat, 9 May 2009 11:38:44 +0200 (CEST) Message-ID: <4A054F24.5030206@quip.cz> Date: Sat, 09 May 2009 11:38:44 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 X-Accept-Language: cz, cs, en, en-us MIME-Version: 1.0 To: Jamie Gritton References: <4A051DE3.30705@FreeBSD.org> In-Reply-To: <4A051DE3.30705@FreeBSD.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: virtualization@FreeBSD.org, jail@FreeBSD.org Subject: Re: Hierarchical jails X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 May 2009 09:57:48 -0000 Jamie Gritton wrote: > Here's the first round of hierarchical jails under the new framework. 
> > Instead of creds having either a prison or a NULL pointer, they all have > a prison pointer with the default being the global "prison0" that > contains information about the real environment. Jailed root may (if > granted permission) create prisons that would be under its place in the > hierarchy, but may not alter (or even see) prisons at its level or > above. > > The JID space is flat, i.e. every prison in the system has a unique ID. > The prison name space is hierarchical, with jails having dot-separated > component names. [...] I am glad that you are working on this feature! I added info and links to these patches on the wiki: http://wiki.freebsd.org/Jails I hope I will have some free time to test them soon. Miroslav Lachman
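The naming rule quoted above (flat JIDs, dot-separated hierarchical names) can be illustrated with a small C sketch. This is only an illustration under stated assumptions: the helper jail_relative_name() is hypothetical and is not part of the patch or of any FreeBSD API. It mimics the behaviour the jail.8 text describes, where a jail seen as "foo.bar" from the base system is seen as just "bar" from inside jail "foo". An empty viewer name is assumed here to stand for the base system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: given the full dotted
 * name of a jail as seen from the base system, and the full dotted name
 * of the jail doing the looking, return the name as the viewer would see
 * it. Returns NULL when the jail is not a descendant of the viewer, since
 * a jail may not see jails at its own level or above.
 */
static const char *
jail_relative_name(const char *full, const char *viewer)
{
	size_t vlen;

	assert(full != NULL && viewer != NULL);
	vlen = strlen(viewer);
	/* Assumption: an empty viewer name means the unjailed base system. */
	if (vlen == 0)
		return (full);
	/* The viewer's name must match a whole leading component path. */
	if (strncmp(full, viewer, vlen) != 0 || full[vlen] != '.')
		return (NULL);
	/* Skip the viewer's own name and the separating dot. */
	return (full + vlen + 1);
}
```

Requiring a '.' immediately after the viewer's name keeps a jail called "fo" from being treated as an ancestor of "foo.bar", and it also makes a jail invisible to itself under this sketch.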