Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 14 Apr 2018 21:04:15 +0300
From:      Konstantin Belousov <kib@freebsd.org>
To:        Thomas Munro <munro@ip9.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: PROC_SET_PDEATHSIG to get a signal when your parent exits
Message-ID:  <20180414180415.GC1774@kib.kiev.ua>
In-Reply-To: <CADLWmXWGReyZn9ZoM9q8gs=%2Bbqw7e4-Ti1HNEPxYoJJ=6gZHhA@mail.gmail.com>
References:  <CADLWmXWGReyZn9ZoM9q8gs=%2Bbqw7e4-Ti1HNEPxYoJJ=6gZHhA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Apr 14, 2018 at 04:44:51PM +1200, Thomas Munro wrote:
> Hello,
> 
> In PostgreSQL we are planning to make use of Linux's
> prctl(PR_SET_PDEATHSIG) facility which tells the kernel to signal you
> when your parent process exits.  This will cut down on polling and
> contention on a pipe in certain busy loops, where we need to be able
> to exit quickly if something kills the main supervisor process but we
> aren't waiting in kevent/epoll/poll-type primitive.  We considered
> SIGIO but that'd require either one pipe per child process or a common
> process group, neither of which works for us.  So I decided to try
> adding a new command PROC_SET_PDEATHSIG to procctl() to work exactly
> like Linux.
> 
> Please see attached early draft patch.  This is my first attempt at
> writing a patch for FreeBSD, so I expect that it contains newbie
> mistakes and style problems and I would be grateful for any feedback.
> Is this design sane?  Is that the best syscall to use for this?  Did I
> get the ptrace/orphan stuff right?  Should this somehow share
> something with the Linux compat support for prctl(PR_SET_PDEATHSIG,
> ...)?  Would people want this?
Do we support prctl(2) in linuxolator ?  Even if yes, I doubt that
there is PR_SET_PDEATHSIG feature.

> 
> Here's a quick and dirty userspace program I came up with while
> developing the patch:
> 
> https://github.com/macdice/test-deathsig/blob/master/test-deathsig.c
> 
> That tests a few interesting cases I thought of, but note that I
> haven't yet tested 32 bit support or the setuid/getuid logic.
Compile the test program using "cc -m32 -o test-deathsig-32 ...."
to easily get binary for compat32 testing.

We have test framework, and several procctl(2) tests already.
Please look at the tests/sys/kern/reaper.c.  You might consider
converting your test to the framework and also adding it to the
base system.

Other comments inline.  You may create the account at
https://reviews.freebsd.org and putting patch there.

> From b9a6730697ba09f7ee0f81129f3f976df826d1cd Mon Sep 17 00:00:00 2001
> From: Thomas Munro <munro@ip9.org>
> Date: Fri, 13 Apr 2018 23:41:30 +0100
> Subject: [PATCH] Support PROC_SET_PDEATHSIG as a command to procctl.
> 
> Allow processes to request the delivery of a signal upon death
> of their parent process.  This provides approximately the same
> behaviour as prctl(PR_SET_PDEATHSIG, ...) on Linux.
> 
> Thomas Munro <munro@ip9.org>
> ---
>  lib/libc/sys/procctl.2  | 29 +++++++++++++++++++++++++++++
>  sys/kern/kern_exec.c    |  4 ++++
>  sys/kern/kern_exit.c    | 13 +++++++++++++
>  sys/kern/kern_procctl.c | 33 ++++++++++++++++++++++++++++++++-
>  sys/kern/kern_thread.c  |  4 ++--
>  sys/sys/proc.h          |  1 +
>  sys/sys/procctl.h       |  2 ++
>  7 files changed, 83 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/libc/sys/procctl.2 b/lib/libc/sys/procctl.2
> index af96ae6e04c..e49ef36995f 100644
> --- a/lib/libc/sys/procctl.2
> +++ b/lib/libc/sys/procctl.2
> @@ -391,6 +391,24 @@ otherwise.
>  See the note about sysctl
>  .Dv kern.trap_enotcap
>  above, which gives independent global control of signal delivery.
> +.It Dv PROC_SET_PDEATHSIG
> +Request the delivery of a signal when the parent of the calling
> +process exits.
> +Use id type P_PID and id 0.
Use
.Fa id
type
.Dv P_PID
and
.Fa id
0.

In fact, '0' is ok as an alias for the current pid, but not allowing current
pid as the argument is strange.

I can see why you only allow the process to set the parent signal only
on itself, but also can see that it does not cost anything to allow
arbitrary pid.

But I do not see how would pfind(0) return curproc.  Don't you need to add
some code for this in kern_procctl() ?


> +The value is cleared for the children
> +of fork() and when executing set-user-ID or set-group-ID binaries.
.Xr fork 2
> +.Fa arg
> +must point to a value of type int indicating the signal
> +that should be delivered to the caller.  Use zero to cancel the
Start new sentence from the new line.

> +delivery of signals.
> +.It Dv PROC_GET_PDEATHSIG
> +Query the current signal number that will be delivered when the parent
> +of the calling process exits.
> +Use id type P_PID and id 0.
Same markup.

> +.Fa arg
> +must point to a memory location that can hold a value of type int.
> +If signal delivery has not been requested, it will contain zero
> +on return.
>  .El
>  .Sh NOTES
>  Disabling tracing on a process should not be considered a security
> @@ -487,6 +505,12 @@ parameter for the
>  or
>  .Dv PROC_TRAPCAP_CTL
>  request is invalid.
> +.It Bq Er EINVAL
> +The
> +.Dv PROC_SET_PDEATHSIG
> +or
> +.Dv PROC_GET_PDEATHSIG
> +request referenced an unsupported id, id_type or invalid signal number.
>  .El
>  .Sh SEE ALSO
>  .Xr dtrace 1 ,
> @@ -506,3 +530,8 @@ function appeared in
>  The reaper facility is based on a similar feature of Linux and
>  DragonflyBSD, and first appeared in
>  .Fx 10.2 .
> +The
> +.Dv PROC_SET_PDEATHSIG
> +facility is based on the prctl(PR_SET_PDEATHSIG, ...) feature of Linux,
> +and first appeared in
> +.Fx 12.0 .
> diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c
> index 21b25e0148c..679915a6e6e 100644
> --- a/sys/kern/kern_exec.c
> +++ b/sys/kern/kern_exec.c
> @@ -522,6 +522,10 @@ do_execve(struct thread *td, struct image_args *args, struct mac *mac_p)
>  	credential_changing |= will_transition;
>  #endif
>  
> +	/* Don't inherit PROC_SET_PDEATHSIG value if setuid/setgid. */
> +	if (credential_changing)
> +		imgp->proc->p_pdeathsig = 0;
> +
What is the Linux behaviour on exec ?  It seems of negative value to
receive a signal which disposition is reset to default.  In other words,
for the non-suid image, you typicall get the process terminated and
perhaps core dumped when parent is exiting.

>  	if (credential_changing &&
>  #ifdef CAPABILITY_MODE
>  	    ((oldcred->cr_flags & CRED_FLAG_CAPMODE) == 0) &&
> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
> index f672a2c7358..55f08a004eb 100644
> --- a/sys/kern/kern_exit.c
> +++ b/sys/kern/kern_exit.c
> @@ -480,6 +480,12 @@ exit1(struct thread *td, int rval, int signo)
>  				PROC_LOCK(q->p_reaper);
>  				pksignal(q->p_reaper, SIGCHLD, ksi1);
>  				PROC_UNLOCK(q->p_reaper);
> +			} else if (q->p_pdeathsig > 0) {
> +				/*
> +				 * The child asked to received a signal
> +				 * when we exit.
> +				 */
> +				kern_psignal(q, q->p_pdeathsig);
Don't you need to lock q around kern_psignal() ?  Test your
patch with WITNESS and INVARIANTS kernel options enabled.

>  			}
>  		} else {
>  			/*
> @@ -520,6 +526,13 @@ exit1(struct thread *td, int rval, int signo)
>  	 */
>  	while ((q = LIST_FIRST(&p->p_orphans)) != NULL) {
>  		PROC_LOCK(q);
> +		/*
> +		 * If we are the real parent of this process
> +		 * but it has been reparented to a debugger, then
> +		 * check if it asked for a signal when we exit.
> +		 */
> +		if (q->p_pdeathsig > 0)
> +			kern_psignal(q, q->p_pdeathsig);
>  		CTR2(KTR_PTRACE, "exit: pid %d, clearing orphan %d", p->p_pid,
>  		    q->p_pid);
>  		clear_orphan(q);
> diff --git a/sys/kern/kern_procctl.c b/sys/kern/kern_procctl.c
> index dddc6b9c1d3..c0adeed570e 100644
> --- a/sys/kern/kern_procctl.c
> +++ b/sys/kern/kern_procctl.c
> @@ -431,7 +431,7 @@ sys_procctl(struct thread *td, struct procctl_args *uap)
>  		struct procctl_reaper_pids rp;
>  		struct procctl_reaper_kill rk;
>  	} x;
> -	int error, error1, flags;
> +	int error, error1, flags, signum;
>  
>  	switch (uap->com) {
>  	case PROC_SPROTECT:
> @@ -467,6 +467,15 @@ sys_procctl(struct thread *td, struct procctl_args *uap)
>  	case PROC_TRAPCAP_STATUS:
>  		data = &flags;
>  		break;
> +	case PROC_SET_PDEATHSIG:
> +		error = copyin(uap->data, &signum, sizeof(signum));
> +		if (error != 0)
> +			return (error);
> +		data = &signum;
> +		break;
> +	case PROC_GET_PDEATHSIG:
> +		data = &signum;
> +		break;
>  	default:
>  		return (EINVAL);
>  	}
> @@ -486,6 +495,10 @@ sys_procctl(struct thread *td, struct procctl_args *uap)
>  		if (error == 0)
>  			error = copyout(&flags, uap->data, sizeof(flags));
>  		break;
> +	case PROC_GET_PDEATHSIG:
> +		if (error == 0)
> +			error = copyout(&signum, uap->data, sizeof(signum));
> +		break;
>  	}
>  	return (error);
>  }
> @@ -527,6 +540,7 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data)
>  	struct pgrp *pg;
>  	struct proc *p;
>  	int error, first_error, ok;
> +	int signum;
>  	bool tree_locked;
>  
>  	switch (com) {
> @@ -560,6 +574,23 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data)
>  	case PROC_TRAPCAP_STATUS:
>  		tree_locked = false;
>  		break;
> +	case PROC_SET_PDEATHSIG:
> +		signum = *(int *)data;
> +		if (idtype != P_PID || id != 0 ||
> +		    (signum != 0 && !_SIG_VALID(signum)))
> +			return (EINVAL);
> +		PROC_LOCK(td->td_proc);
> +		td->td_proc->p_pdeathsig = signum;
> +		PROC_UNLOCK(td->td_proc);
> +		return (0);
> +	case PROC_GET_PDEATHSIG:
> +		if (idtype != P_PID || id != 0)
> +			return (EINVAL);
> +		PROC_LOCK(td->td_proc);
> +		signum = td->td_proc->p_pdeathsig;
> +		PROC_UNLOCK(td->td_proc);
> +		*(int *)data = signum;
Why do you need to mediate assignment through the signum ?

Note that the locking around the lone integer assignment is excessive,
but it does not hurt.

> +		return (0);
>  	default:
>  		return (EINVAL);
>  	}
> diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c
> index 4d2f020063e..b01cd99e597 100644
> --- a/sys/kern/kern_thread.c
> +++ b/sys/kern/kern_thread.c
> @@ -91,7 +91,7 @@ _Static_assert(offsetof(struct proc, p_pid) == 0xbc,
>      "struct proc KBI p_pid");
>  _Static_assert(offsetof(struct proc, p_filemon) == 0x3d0,
>      "struct proc KBI p_filemon");
> -_Static_assert(offsetof(struct proc, p_comm) == 0x3e0,
> +_Static_assert(offsetof(struct proc, p_comm) == 0x3e4,
>      "struct proc KBI p_comm");
>  _Static_assert(offsetof(struct proc, p_emuldata) == 0x4b8,
>      "struct proc KBI p_emuldata");
> @@ -111,7 +111,7 @@ _Static_assert(offsetof(struct proc, p_pid) == 0x74,
>      "struct proc KBI p_pid");
>  _Static_assert(offsetof(struct proc, p_filemon) == 0x27c,
>      "struct proc KBI p_filemon");
> -_Static_assert(offsetof(struct proc, p_comm) == 0x288,
> +_Static_assert(offsetof(struct proc, p_comm) == 0x28c,
>      "struct proc KBI p_comm");
>  _Static_assert(offsetof(struct proc, p_emuldata) == 0x314,
>      "struct proc KBI p_emuldata");
> diff --git a/sys/sys/proc.h b/sys/sys/proc.h
> index d63362d7232..19e99439188 100644
> --- a/sys/sys/proc.h
> +++ b/sys/sys/proc.h
> @@ -624,6 +624,7 @@ struct proc {
>  	u_int		p_treeflag;	/* (e) P_TREE flags */
>  	int		p_pendingexits; /* (c) Count of pending thread exits. */
>  	struct filemon	*p_filemon;	/* (c) filemon-specific data. */
> +	int		p_pdeathsig;	/* (c) Signal from parent on exit. */
>  /* End area that is zeroed on creation. */
>  #define	p_endzero	p_magic
>  
> diff --git a/sys/sys/procctl.h b/sys/sys/procctl.h
> index 17144e9f0c6..cf0a69475e4 100644
> --- a/sys/sys/procctl.h
> +++ b/sys/sys/procctl.h
> @@ -51,6 +51,8 @@
>  #define	PROC_TRACE_STATUS	8	/* query tracing status */
>  #define	PROC_TRAPCAP_CTL	9	/* trap capability errors */
>  #define	PROC_TRAPCAP_STATUS	10	/* query trap capability status */
> +#define PROC_SET_PDEATHSIG	11	/* set parent death signal */
> +#define PROC_GET_PDEATHSIG	12	/* get parent death signal */
Note that other commands names follow the pattern
	PROC_<FACILITY>_<CMD>,
in your case it would be PROC_PDEATHSIG_SET/GET.

Please use tab after #define.

>  
>  /* Operations for PROC_SPROTECT (passed in integer arg). */
>  #define	PPROT_OP(x)	((x) & 0xf)
> -- 
> 2.16.2
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180414180415.GC1774>