Date:      Tue, 9 Jun 2015 21:26:46 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        kikuchan@uranus.dti.ne.jp
Cc:        freebsd-jail@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: [patch] separate SysV IPC namespace for jail
Message-ID:  <20150609192646.GC2039@dft-labs.eu>
In-Reply-To: <e4fac688781e0b491d328e632eaa6322@imap.cm.dream.jp>
References:  <CAG40kxFFnfvbLbqVprPC0oZ%2BnbKDYGxdvgd-vxWXFfN%2B3NQ0_A@mail.gmail.com> <20150605235348.GA9965@dft-labs.eu> <CAG40kxEaOAmcOCwb7p6NF6sgox-KysKh2RJgG7og1fi0WL0-Sg@mail.gmail.com> <20150607013929.GA9182@dft-labs.eu> <CAG40kxFaD%2BTS3Asb7ZiRW67XLtMOe6ChDEVgkSnt1Ji3013j4w@mail.gmail.com> <20150607083734.GB9182@dft-labs.eu> <CAG40kxGFvgP0Zhoseo%2BDi2Zk2J6kf0jA8isZD5UDOoqnWdkqYQ@mail.gmail.com> <20150608171702.GA15516@dft-labs.eu> <e4fac688781e0b491d328e632eaa6322@imap.cm.dream.jp>

On Wed, Jun 10, 2015 at 01:43:59AM +0900, kikuchan@uranus.dti.ne.jp wrote:
> > I only briefly looked at the patch. The fact that you perform outside of
> > ipcperm looks suspicious but may be harmless, so at best it's a bad
> > style. If you need ipc mechanism-specific functions, make them call
> > ipcperm instead.
> 
> Sorry, I guess EACCES misled you.
> I should have chosen another value and/or concealed information for each jail completely.
> 
> I intended to demonstrate that it's enough to achieve IPC key_t space separation (to make PostgreSQL work) for each jail without having a shmid_kernel struct for each jail.
> 

Every technical problem with providing entirely separate ipcs also has
to be solved with this approach.

This approach is actually harder to get right and has no benefit that I
can see.

One example downside is resource limiting: with separate namespaces,
implementing per-namespace limits is a non-problem.

> > Well, as I said in my first mail the idea is to make ipc code look at
> > structures assigned to given jail, so that we can have multiple jails
> > with only their own objects. No "well, this id is used by other jail",
> > unless the namespace is explicitly shared.
> 
> Ok, now I understand what the idea is, and maybe it was done by Nick once before at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=48471
> 

> There are two identifiers, ID and KEY (shmid and shmkey for SHM), on a SysV IPC object.
> The KEY space must be separated for each jail, I think, but the ID is determined by the kernel and userland users don't care about its value. Do you really think it should be managed separately in the kernel?
> I agree that "multiple jails with only their own objects" is basically good design, but if you want to support hierarchical jails in particular, the objects will be referenced by multiple jails.
> If ID space separation is not so important, separating the internal namespace for each jail is too complicated for simple KEY space separation, I think.
> 

Not separating stuff is more complicated.
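
To illustrate why separate structures keep things simple, here is a
minimal userland sketch, not kernel code. The names ipc_ns, shm_seg,
ns_shmget and NS_MAXSEG are hypothetical and do not come from the patch
or from the FreeBSD tree. Lookups only ever see the table they are
handed, so there is no "this id is used by another jail" case, and a
per-namespace limit is one comparison against the table size.

```c
#include <assert.h>
#include <stddef.h>

#define NS_MAXSEG 8		/* hypothetical per-namespace segment limit */

struct shm_seg {		/* stand-in for one SysV shm segment */
	long	key;
	int	in_use;
};

struct ipc_ns {			/* hypothetical namespace, one per jail */
	struct shm_seg	segs[NS_MAXSEG];
	int		nsegs;	/* segments allocated in this namespace */
};

/*
 * Find or create the segment for 'key', looking only at 'ns'.
 * Returns the id (slot index), or -1 if this namespace is full.
 */
static int
ns_shmget(struct ipc_ns *ns, long key)
{
	int i, free_slot = -1;

	assert(ns != NULL);
	for (i = 0; i < NS_MAXSEG; i++) {
		if (ns->segs[i].in_use && ns->segs[i].key == key)
			return (i);
		if (!ns->segs[i].in_use && free_slot == -1)
			free_slot = i;
	}
	if (free_slot == -1)
		return (-1);	/* limit applies to this namespace only */
	ns->segs[free_slot].key = key;
	ns->segs[free_slot].in_use = 1;
	ns->nsegs++;
	return (free_slot);
}
```

Two jails asking for the same key against their own ipc_ns get
independent segments, and filling one table does not affect the other.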

> I really should have implemented concealing the information instead of returning EACCES, sorry. ;/
> 
> Before jumping to a conclusion, I want to know whether the *current* SHM code has any problems with sharing an underlying vm page between processes jailed in different jails, especially on fork and jail_attach. A multithreaded process, perhaps?
> 

There is definitely no problem sharing /a page/. There may be a problem
sharing a page which was obtained from sysvshm.

> I'm also trying to port Nick's code to 10/stable

This patch is old and deals with the mostly mechanical part of the work.
In particular, it DOES NOT deal with any concerns I already expressed.
This is understandable to some extent since there were no multilevel
jails at the time and some people may have felt securing against host
root is not necessary.

The patch will likely have a lot of conflicts and it will be way faster
to write from scratch.

> I guess it did not happen on 4.8 because of the lack of jail_attach.
> For example, a process attaches to shmid = 65535 on jid=1, then changes its jail to jid=2; if shmid = 65535 exists on jid=2, the process refers to the wrong vm mapping unless the shmmap_state data is maintained for the process on every jail_attach.

This is an example problem.
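
The scenario quoted above can be modelled in a few lines of userland C.
This is only a sketch under loud assumptions: segs, proc_model,
resolve_attached and naive_jail_attach are hypothetical names, the
tables are tiny instead of using shmid 65535, and the real state lives
in the kernel's shmmap_state, which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define NJAILS	3
#define NIDS	4

/* Hypothetical per-jail tables: segs[jid][shmid] names a backing object. */
static const char *segs[NJAILS][NIDS];

struct proc_model {
	int	jid;		/* jail the process currently belongs to */
	int	attached_id;	/* shmid recorded at attach time, -1 if none */
};

/* Resolve the recorded shmid in the process's *current* jail. */
static const char *
resolve_attached(const struct proc_model *p)
{
	assert(p->jid >= 0 && p->jid < NJAILS);
	if (p->attached_id < 0 || p->attached_id >= NIDS)
		return (NULL);
	return (segs[p->jid][p->attached_id]);
}

/* Model of a jail change that does not fix up the attachment record. */
static void
naive_jail_attach(struct proc_model *p, int jid)
{
	assert(jid >= 0 && jid < NJAILS);
	p->jid = jid;
}
```

If both jails have an object under the same numeric id, the record made
in the first jail silently resolves to the second jail's object after
the move, which is the wrong-mapping case described in the quote.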

> Maybe this behavior is something related to the race that you mentioned before?

It is not.

> 
> 
> > For instance back then I could not find any reliable mechanism to tell
> > me whether given process has a shared address space. There is only a
> > vm_refcnt counter in vmspace which is modified on various occasions,
> 
> Hmm, sorry I can't understand what the problem is here...
> I'm not good at kernel internals yet, so I don't know the details of when processes share the address space, and I have no idea why you want to know whether a process has a shared address space or not...
> 

rfork has a flag which makes the new process share the address space
with the parent. So when one of these processes jails somewhere, we can
end up with mappings from separate namespaces.
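
A rough userland model of that situation follows. The flag in question
is RFMEM from rfork(2); everything else here (vmspace_model, shproc,
share_vmspace, mixed_namespaces) is a hypothetical name made up for the
sketch. It also assumes the reference count only tracks sharing, which
is not true of the real vm_refcnt, as noted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

struct vmspace_model {
	int	refcnt;		/* number of processes using this space */
};

struct shproc {
	int			 jid;	/* jail id of the process */
	struct vmspace_model	*vm;	/* address space, possibly shared */
};

/* Model of rfork with RFMEM: the child reuses the parent's address space. */
static void
share_vmspace(const struct shproc *parent, struct shproc *child)
{
	assert(parent != NULL && parent->vm != NULL && child != NULL);
	child->jid = parent->jid;
	child->vm = parent->vm;
	child->vm->refcnt++;
}

/* True when two processes share one address space but sit in different jails. */
static int
mixed_namespaces(const struct shproc *a, const struct shproc *b)
{
	return (a->vm == b->vm && a->jid != b->jid);
}
```

Once one of the two processes changes its jail, any segment it attaches
lands in an address space that also holds mappings made under the other
jail's namespace.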

> 
> > It is easy to implement it for "private purposes" (i.e.
> > disregarding possible attacks with jailing processes). The real work is
> > making the whole business safe.
> 
> I agree.
> 
> Is there any project ongoing for this sysvipc issue?
> If so, what needs to be done?
> 

I am unaware of any work being done in the area.

I stated what needs to be done in my first e-mail.

-- 
Mateusz Guzik <mjguzik gmail.com>


