Date:      Tue, 12 Jul 2016 10:05:02 -0700
From:      Mark Johnston <markj@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-current@FreeBSD.org
Subject:   Re: ptrace attach in multi-threaded processes
Message-ID:  <20160712170502.GA71220@wkstn-mjohnston.west.isilon.com>
In-Reply-To: <20160712055753.GI38613@kib.kiev.ua>
References:  <20160712011938.GA51319@wkstn-mjohnston.west.isilon.com> <20160712055753.GI38613@kib.kiev.ua>

On Tue, Jul 12, 2016 at 08:57:53AM +0300, Konstantin Belousov wrote:
> On Mon, Jul 11, 2016 at 06:19:38PM -0700, Mark Johnston wrote:
> > Hi,
> > 
> > It seems to be possible for ptrace(PT_ATTACH) to race with the delivery
> > of a signal to the same process. ptrace(PT_ATTACH) sets P_TRACED and
> > sends SIGSTOP to a thread in the target process. Consider the case where
> > a signal is delivered to a second thread, and both threads are executing
> > ast() concurrently. The two threads will both call issignal() and from
> > there call ptracestop() because P_TRACED is set, though they will be
> > serialized by the proc lock. If the thread receiving SIGSTOP wins the
> > race, it will suspend first and set p->p_xthread. The second thread will
> > also suspend in ptracestop(), overwriting the p_xthread field set by the
> > first thread. Later, ptrace(PT_DETACH) will unsuspend the threads, but
> > it will set td->td_xsig only in the second thread. This means that the
> > first thread will return SIGSTOP from ptracestop() and subsequently
> > suspend the process, which seems rather incorrect.
> Why ?  In particular, why delivering STOP after attach, in the described
> situation, is perceived as incorrect ?  Parallel STOPs, one from attach,
> and other from kill(2), must result in two stops.

I suppose it is not strictly incorrect. I find it surprising that a
PT_ATTACH followed by a PT_DETACH may leave the process in a different
state than it was in before the attach. This means that it is not
possible to gcore a process without potentially leaving it stopped, for
instance. This result may occur in a single-threaded process
as well, since a signal may already be queued when the PT_ATTACH handler
sends SIGSTOP.
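
To make the sequence concrete, here is a toy user-space model of the
interleaving described in my first message. It is only a sketch and not
the kernel code: the model_* names are hypothetical, the structures keep
just the two fields discussed here (p_xthread and td_xsig), and locking,
suspension and wakeup are left out entirely. The one rule it encodes is
that detach rewrites td_xsig only in the thread that p_xthread points
to, which is my reading of the behaviour, so treat it as an assumption.

```c
/*
 * Toy model of two threads stopping in ptracestop() before a detach.
 * NOT FreeBSD kernel code: all model_* names are made up for illustration.
 */
#include <assert.h>
#include <signal.h>
#include <stddef.h>

struct model_thread {
	int td_xsig;			/* signal returned when the stop ends */
};

struct model_proc {
	struct model_thread *p_xthread;	/* most recent thread to stop */
};

/* A thread stops for 'sig'; the last thread to stop overwrites p_xthread. */
void
model_ptracestop_enter(struct model_proc *p, struct model_thread *td, int sig)
{
	td->td_xsig = sig;
	p->p_xthread = td;
}

/* Detach with 'sig' (0 = deliver nothing): only p_xthread is updated. */
void
model_detach(struct model_proc *p, int sig)
{
	assert(p->p_xthread != NULL);
	p->p_xthread->td_xsig = sig;
	p->p_xthread = NULL;
}

/* What the resumed thread's stop returns to its caller. */
int
model_ptracestop_return(const struct model_thread *td)
{
	return (td->td_xsig);
}

/* Replay the interleaving; return the signal the first thread still sees. */
int
model_race(void)
{
	struct model_proc p = { NULL };
	struct model_thread t1 = { 0 }, t2 = { 0 };

	model_ptracestop_enter(&p, &t1, SIGSTOP);	/* from the attach */
	model_ptracestop_enter(&p, &t2, SIGUSR1);	/* unrelated signal */
	model_detach(&p, 0);

	/* The second thread had its signal replaced by the detach. */
	assert(model_ptracestop_return(&t2) == 0);
	return (model_ptracestop_return(&t1));
}
```

In this model, model_race() returns SIGSTOP: the first thread was never
updated by the detach, so it goes on to act on the stop signal even
though the tracer is already gone.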

To me it just seems a bit strange that ptrace's mechanism for stopping
the target - sending SIGSTOP - interacts this way with ptrace's handling
of signals - ptracestop(). Specifically, PT_ATTACH does not rely on the
SA_STOP property of SIGSTOP to stop the process, but rather on the
special signal handling in ptracestop().

> 
> The bit about overwriting p_xsig/p_xthread indeed initially sounds worrisome,
> but probably not too much.  The only consequence of reassigning p_xthread
> is the selection of the 'lead' thread in sys_process.c, it seems.
> 
> > 
> > The above is just a theory to explain an unexpectedly-stopped
> > multi-threaded process that I've observed. Is there some mechanism I'm
> > missing that prevents multiple threads from suspending in ptracestop()
> > at the same time? If not, then I think that's the root of the problem,
> > since p_xthread is pretty clearly not meant to be overwritten this way.
> Again, why ?
> 
> Note the comment 
> 		 * Just make wait() to work, the last stopped thread
>                  * will win.
> which seems to point to the situation.

Indeed, I somehow missed that. I had assumed that the leaked TDB_XSIG
represented a bug in ptracestop().

> 
> > Moreover, in my scenario I see a thread with TDB_XSIG set even after
> > ptrace(PT_DETACH) was called (P_TRACED is cleared).
> This is interesting, we indeed do not clear the flag consistently.
> But again, the only consequence seems to be a possible invalid reporting
> of events.


