From: John Baldwin
To: Konstantin Belousov
Cc: arch@freebsd.org
Subject: Re: Extending MADV_PROTECT
Date: Thu, 9 May 2013 08:14:52 -0400

On Thursday, May 09, 2013 4:25:38 am Konstantin Belousov wrote:
> On Wed, May 08, 2013 at 12:09:49PM -0400, John Baldwin wrote:
> > On Wednesday, May 08, 2013 5:58:27 am Konstantin Belousov wrote:
> > > On Tue, May 07, 2013 at 02:33:27PM -0400, John Baldwin wrote:
> > > > One of the issues I have with our current MADV_PROTECT is that it
> > > > isn't very administrative-friendly.
> > > > That is, as a sysadmin I can't
> > > > easily protect arbitrary processes from the OOM killer. Instead, the
> > > > binary has to be changed to invoke madvise(). Furthermore, once the
> > > > protection is granted it can't be revoked. Also, any binaries that
> > > > want this have to be run as root. Instead, I would like to be able
> > > > to both set and revoke this for existing processes and possibly even
> > > > allow it to be inherited (so I can tag a top-level daemon that forks
> > > > and have all its future children be protected, for example). To that
> > > > end I've whipped up a simple patch (against 8, but it should port to
> > > > HEAD easily if folks think it is a good idea) to add a new pprotect()
> > > > system call and userland program (protect) that can be used similarly
> > > > to ktrace(1), either as a modifier when running a new program or as a
> > > > tool for setting or clearing protection for existing processes.
> > > >
> > > > The inherit feature isn't implemented yet, but it should be simple
> > > > to do. One would simply need a new flag that PPROT_INHERIT sets that
> > > > is checked on fork and propagates P_PROTECTED if it is set. Also,
> > > > one other thought I had is that at some point we might want to make
> > > > P_PROTECTED more fine-grained, e.g. by allowing for OOM "priorities".
> > > > To that end, it may make sense to add a new argument to protect,
> > > > though you could also reserve part of the 'op' parameter to encode a
> > > > priority.
> > >
> > > Wouldn't pprot_setchildren() miss a child for which the new pid and
> > > struct proc are already allocated in do_fork(), but which is not yet
> > > linked into the process tree? If true, I think this does not
> > > fulfill the promise of PPROT_DESCEND.
> >
> > ktrace has the same issue, and really, this is just a race.
> > If the user
> > had run the command a few nanoseconds earlier, the proc wouldn't be
> > allocated at all, and I doubt a user would notice the difference between
> > those two cases. If you are doing this programmatically, then that is a
> > race that the program can handle. It isn't any different from having a
> > new process begin its fork() a few nanoseconds after this returns,
> > either. This is why, if you want that behavior, you would use -di (the
> > same applies to ktrace).
>
> So to get this correct, a person should first enable inheritance, and only
> then turn on the protection on the subtree? This sounds somewhat sloppy,
> but fine.

Yes, ktrace works the same way. In practice, however, if you know your
process isn't actively forking (e.g. a daemon that forks a child at startup
but then doesn't fork again), you can use -d just fine.

> > > Since the syscall is meant to be extended in the future, would it make
> > > more sense to add a multiplexer, e.g. procctl(2), one operation of which
> > > would be PROCCTL_PROTECT?
> >
> > Do we expect it to do more than adjust protection? We already have a few
> > other process-control system calls (e.g. ptrace()). It's hard to ensure
> > it is sufficiently generic when only abstracting from one use case.
>
> You mentioned a priority, and I think the ability to pass a structure to
> the sub-function of the syscall is better than carving bits out of the int
> argument or introducing a new syscall.

I think the priority would still be a pprotect operation. In some ways it
would be nice to be able to do ioctls on processes, and maybe this could be
structured similarly?

    int procctl(int pid, unsigned long cmd, ...)

(So it's basically ioctl, but with the 'fd' replaced by a 'pid'. This would
also mean that in the future, with Robert's pdfork(), you could perhaps have
ioctl on a process fd just forward the request to procctl.)

-- 
John Baldwin