Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 2 Dec 2019 02:11:14 +0300
From:      Dmitry Marakasov <amdmi3@amdmi3.ru>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: How can kill(-1, 0) return EPERM?
Message-ID:  <20191201231114.GG4071@hades.panopticon>
In-Reply-To: <20191201144813.GD10580@kib.kiev.ua>
References:  <20191129151606.GD4071@hades.panopticon> <20191129164509.GE4071@hades.panopticon> <20191129225834.GY10580@kib.kiev.ua> <20191201002411.GF4071@hades.panopticon> <20191201144813.GD10580@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
* Konstantin Belousov (kostikbel@gmail.com) wrote:

> > > > > I'm helping to investigate some userspace issue [1], where kill(-1, SIGKILL)
> > > > > fails with EPERM. I've managed to isolate this case in a small program:
> > > > > 
> > > > > 
> > > > > ```
> > > > > #include <err.h>
> > > > > #include <errno.h>
> > > > > #include <signal.h>
> > > > > #include <stdio.h>
> > > > > #include <string.h>
> > > > > #include <unistd.h>
> > > > > 
> > > > > int main() {
> > > > >     if (setuid(66) == -1)  // uucp, just for the test
> > > > >         err(1, "setuid");
> > > > > 
> > > > >     int res = kill(-1, 0);  // <- fails with EPERM
> > > > >     fprintf(stderr, "kill(-1, 0) result=%d, errno=%s\n", res, strerror(errno));
> > > > > 
> > > > >     return 0;
> > > > > }
> > > > > ```
> > > > > 
> > > > > when run from root on 12.1 kill call fails with EPERM. However I cannot
> > > > > comprehend what it is caused by and how it's even possible: kill(2) manpage
> > > > > says that with pid=-1 kill should only send (and in this case of sig=0,
> > > > > /not/ send) signals to the processes belonging to the current uid, so there
> > > > > should be no permission problems. I've also looked into the kernel code
> > > > > (sys_kill, killpg1), and it matches to what manpage says, I see no way
> > > > > for it to return EPERM: sys_kill() should fall through to the switch, call
> > > > > killpg1() with all=1 and killpg1() if(all) branch may only set `ret` to
> > > > > either 0 or ESRCH. Am I missing something, or is there a problem somewhere?
> > > > 
> > > > It looks like I have misread the `else if' path of this core.
> > > > 
> > > >     if (all) {
> > > >         /*
> > > >          * broadcast
> > > >          */
> > > >         sx_slock(&allproc_lock);
> > > >         FOREACH_PROC_IN_SYSTEM(p) {
> > > >             if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
> > > >                 p == td->td_proc || p->p_state == PRS_NEW) {
> > > >                 continue;
> > > >             }
> > > >             PROC_LOCK(p);
> > > >             err = p_cansignal(td, p, sig);
> > > >             if (err == 0) {
> > > >                 if (sig)
> > > >                     pksignal(p, sig, ksi);
> > > >                 ret = err;
> > > >             }
> > > >             else if (ret == ESRCH)
> > > >                 ret = err;
> > > >             PROC_UNLOCK(p);
> > > >         }
> > > >         sx_sunlock(&allproc_lock);
> > > >     } ...
> > > > 
> > > > so it's clear now where EPERM comes from. However it looks like the
> > > > behavior contradicts the manpage - there are no signs of check that
> > > > the signalled process has the same uid as the caller.
> > > 
> > > I am not sure what you mean by 'signs of check'.  Look at p_cansignal()
> > > and cr_cansignal() implementation.
> > 
> > I've meant that according to the manpage
> > 
> >      If pid is -1:
> >              If the user has super-user privileges, the signal is sent to all
> >              processes excluding system processes (with P_SYSTEM flag set),
> >              process with ID 1 (usually init(8)), and the process sending the
> >              signal.  If the user is not the super user, the signal is sent to
> >              all processes with the same uid as the user excluding the process
> >              sending the signal.  No error is returned if any process could be
> >              signaled.
> > 
> > IMO there should be an additional check in this condition:
> > 
> >              if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
> >                  p == td->td_proc || p->p_state == PRS_NEW) {
> >                  continue;
> >              }
> > 
> > E.g. something like
> > 
> >              if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
> >                  p == td->td_proc || p->p_state == PRS_NEW ||
> >                  (td->td_ucred->cr_ruid != 0 &&
> > 			p->td_ucred->cr_ruid != td->td_ucred->cr_ruid) {
> >                  continue;
> >              }
> > 
> > e.g. it should not even attempt to signal processes with other uids.
> Why ?  You are trying to outguess p_cansignal(), which could deny
> action for much more reasons, so you would get EPERM still, e.g. if the
> target is suid.  Or, p_cansignal() also might allow to send the signal
> even for mismatched uids, again look at it code.

Exactly because of that - p_cansignal behaves in it's own way, which
doesn't match the indended/documented kill(2) behavior. You're right
in a sence that plain uid check is not sufficient though.

> I might guess that your complain is really about a different aspect
> of it.  If you look at the posix description of the EPERM error from
> kill(2) (really kill(3)), it says
> [EPERM]  The process does not have permission to send the signal to
>          any receiving process.
> In other words, we should not return EPERM if we signalled at least one
> of the process.
> 
> Is this the problem ?

It's not limited with this aspect. Here are some possible cases:

- no processes with the same UID
- one process with the same UID which can be signalled
- one process with the same UID which cannot (say, because of MAC) be signalled
- case of one process with the different UID which can be signalled

The kill(-1, 0) with the given UID results are/should be, correspondingly:

As per documentation: ESRCH, 0, EPERM, ESRCH
As per current implementation: EPERM, 0, EPERM, 0
As per implementation which relies solely on *_cansignal:
    ESRCH, 0, ESRCH, 0 (unrelated process signalled)

As you can see, relying soleley on p_cansignel would fix one case, but
break the others. We may need a trimmed down variant of p_cansignal for
this. Or make the latter take the mask of which checks it should
perform or skip.

-- 
Dmitry Marakasov   .   55B5 0596 FF1E 8D84 5F56  9510 D35A 80DD F9D2 F77D
amdmi3@amdmi3.ru  ..:              https://github.com/AMDmi3




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20191201231114.GG4071>