Date:      Sun, 27 Oct 2013 15:53:07 -0400
From:      Mark Johnston <markj@freebsd.org>
To:        symbolics@gmx.com
Cc:        dtrace@freebsd.org
Subject:   Re: Firefox crash during dtrace attach under -CURRENT
Message-ID:  <20131027195307.GA3206@charmander.uwaterloo.ca>
In-Reply-To: <20131025145956.GA26814@lemon>
References:  <20131023203009.GA92945@lemon> <20131024025902.GA2286@charmander> <20131025104706.GB1705@lemon> <20131025145956.GA26814@lemon>

On Fri, Oct 25, 2013 at 03:59:56PM +0100, symbolics@gmx.com wrote:
> On Fri, Oct 25, 2013 at 11:47:06AM +0100, symbolics@gmx.com wrote:
> > On Wed, Oct 23, 2013 at 10:59:02PM -0400, Mark Johnston wrote:
> > > On Wed, Oct 23, 2013 at 09:30:09PM +0100, symbolics@gmx.com wrote:
> > > > Hi,
> > > > 
> > > > http://dtrace.org/blogs/brendan/2011/02/11/dtrace-pid-provider-arguments/
> > > > 
> > > > I tried to follow some of the examples but I crash the Firefox process
> > > > each time. Sometimes DTrace manages to collect a little data before the
> > > > process dies.
> > > > 
> > > > [...]
> > > > 
> > > > Is this a known problem or should I send a PR?
> > > 
> > > Thanks for reporting this: I was able to reproduce the crash and managed
> > > to find a nasty pair of bugs. Could you test the patch below and let me
> > > know if it fixes the problem for you as well? If you see more crashes,
> > > please include the backtrace and signo from gdb again; it would likely
> > > be a different problem that needs to be debugged and fixed separately.
> > 
> > Hi Mark,
> > 
> > This helps, but there may still be some issues. The first time I used
> > this, I found that when I killed the DTrace process, Firefox went down
> > too with a SIGTRAP. I have a possibly unhelpful core from this:
> > 
> 
> Another data point. I attached to mutt and reviewed some of the calls it
> was making. Subsequently I killed DTrace, went to look at other
> things, and a while later went back to check my mail. On attempting to
> change into a different mail folder, mutt died with a SIGTRAP. It seems
> like DTrace isn't tidying up after itself?
> 
> (gdb) bt
> #0  0x0000000800722541 in r_debug_state (rd=0x802425480, m=0x7fffffff6c28)
>     at /usr/home/dm/git/freebsd/libexec/rtld-elf/rtld.c:3491
> #1  0x0000000000000000 in ?? ()

Ok, I think I've figured out this one too. As you note, dtrace(1) isn't
cleaning up some of its breakpoints properly when it detaches. In
particular, it doesn't stop the victim process before trying to remove
breakpoints with ptrace(2), but ptrace requires the target process to be
stopped and otherwise fails with EBUSY. So the breakpoint in the rtld
gets left behind, and it turns out that r_debug_state() is called every
time a process tries to dlopen() a shared object. The next dlopen() after
dtrace exits therefore hits the stale breakpoint and the process takes a
SIGTRAP.
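To see why a leftover breakpoint turns into a SIGTRAP long after dtrace(1) has gone away, here is a minimal sketch of how a software breakpoint is planted and removed. It assumes amd64, where a breakpoint is the one-byte int3 instruction (0xCC), and it works on an ordinary byte buffer standing in for the victim's text. The names (sw_bkpt, bkpt_insert, bkpt_remove, would_trap) are hypothetical and are not libproc's API; a real debugger performs these writes in the other process with ptrace(PT_IO, ...), which is the step that fails with EBUSY while the process is running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* amd64 int3: executing this byte raises a breakpoint trap (SIGTRAP). */
#define BP_INSN 0xCC

/* Hypothetical record of one planted breakpoint. */
struct sw_bkpt {
	size_t  off;		/* offset of the patched instruction */
	uint8_t saved;		/* original byte that int3 replaced */
	int     enabled;
};

/* Plant: remember the original byte, then overwrite it with int3. */
static void
bkpt_insert(uint8_t *text, struct sw_bkpt *bp, size_t off)
{
	bp->off = off;
	bp->saved = text[off];
	text[off] = BP_INSN;
	bp->enabled = 1;
}

/*
 * Remove: put the original byte back.  In a real debugger this is a
 * write into the victim's memory; if that write is refused (ptrace(2)
 * returning EBUSY for a running process) and the error is ignored, the
 * int3 stays in place and the next call through this address traps.
 */
static void
bkpt_remove(uint8_t *text, struct sw_bkpt *bp)
{
	if (!bp->enabled)
		return;
	assert(text[bp->off] == BP_INSN);	/* still our breakpoint */
	text[bp->off] = bp->saved;
	bp->enabled = 0;
}

/* Would executing the instruction at `off` trap? */
static int
would_trap(const uint8_t *text, size_t off)
{
	return (text[off] == BP_INSN);
}
```

In the mutt case the "remove" step never happened, so the buffer was left in the state would_trap() reports as trapping, at r_debug_state().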

mutt was a good example since it seems to dlopen() iconv-related stuff
as I scan through my inbox. One can inspect this with DTrace :) using
something like

	dtrace -p <pid> -n 'pid$target::dlopen:entry /arg0 != 0/ { trace(copyinstr(arg0)); }'

With this observation it becomes easy to reproduce the problem using a
test program that does something like

	#include <dlfcn.h>
	#include <unistd.h>
	int main(void) {
		while (1) {
			dlopen("/lib/libnonexistent.so.100", RTLD_LAZY);
			sleep(1);
		}
	}

A somewhat crude patch which fixes this for me is below; it just sends
SIGSTOP to the target process, waits for it to stop, removes the
breakpoints and then resumes it with SIGCONT. Does anyone see any
problems with this? Perhaps it should be libproc's responsibility to
ensure that the victim process is stopped before calling
ptrace(PT_IO, ...) to add or remove breakpoints?
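The stop, wait, resume ordering that the patch adds to dt_proc_control() can also be written as two small helpers, which is roughly the shape a libproc-level version might take. This is a minimal sketch under stated assumptions: the function names (stop_process, resume_process, spawn_idle_child) are hypothetical and not libproc's API, it uses the portable waitpid(2) with WUNTRACED in place of the patch's wait4(2) call, and it uses an ordinary child process as the victim without attaching with ptrace(2), so it only illustrates the ordering.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Stop `pid` and wait until the stop has been reported, mirroring the
 * patch's kill(SIGSTOP) + wait4() sequence.  Returns 0 if the process is
 * now stopped, -1 otherwise (e.g. it exited or is not our child).
 */
static int
stop_process(pid_t pid)
{
	int status;
	pid_t r;

	assert(pid > 0);	/* kill(0, ...) or kill(-1, ...) would hit a group */
	if (kill(pid, SIGSTOP) != 0)
		return (-1);
	/* WUNTRACED makes waitpid() report stopped children too. */
	do {
		r = waitpid(pid, &status, WUNTRACED);
	} while (r == -1 && errno == EINTR);
	if (r != pid || !WIFSTOPPED(status))
		return (-1);
	return (0);
}

/* Let a process stopped by stop_process() run again. */
static int
resume_process(pid_t pid)
{
	return (kill(pid, SIGCONT));
}

/* Hypothetical stand-in victim: a child that idles until it is killed. */
static pid_t
spawn_idle_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return (pid);	/* -1 if fork() failed */
}
```

A caller would do its ptrace(PT_IO, ...) writes only between a successful stop_process() and resume_process(), and skip them (logging, as the patch does with dt_dprintf()) when stop_process() fails. Keeping both steps in one place would spare each breakpoint add/remove path from repeating the sequence.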

Thanks,
-Mark

diff --git a/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c b/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c
index d40a0ae..6ed78e4 100644
--- a/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c
+++ b/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c
@@ -505,7 +505,7 @@ dt_proc_control(void *arg)
 	dt_proc_t *dpr = datap->dpcd_proc;
 	dt_proc_hash_t *dph = dpr->dpr_hdl->dt_procs;
 	struct ps_prochandle *P = dpr->dpr_proc;
-	int pid = dpr->dpr_pid;
+	int pid = dpr->dpr_pid, status;
 
 #if defined(sun)
 	int pfd = Pctlfd(P);
@@ -702,7 +702,22 @@ pwait_locked:
 	 */
 	(void) pthread_mutex_lock(&dpr->dpr_lock);
 
+#if defined(__FreeBSD__)
+	/*
+	 * On FreeBSD, the victim process must be stopped before ptrace(2) can
+	 * be used to remove breakpoints.
+	 */
+	if (kill(dpr->dpr_pid, SIGSTOP) == 0 &&
+	    wait4(dpr->dpr_pid, &status, WSTOPPED | WEXITED, NULL) != -1 &&
+	    WIFSTOPPED(status)) {
+		dt_proc_bpdestroy(dpr, B_TRUE);
+		kill(dpr->dpr_pid, SIGCONT);
+	} else
+		dt_dprintf("pid %d: failed to remove breakpoints\n",
+		    dpr->dpr_pid);
+#else
 	dt_proc_bpdestroy(dpr, B_TRUE);
+#endif
 	dpr->dpr_done = B_TRUE;
 	dpr->dpr_tid = 0;


