From: Mark Johnston
Date: Sun, 27 Oct 2013 15:53:07 -0400
To: symbolics@gmx.com
Cc: dtrace@freebsd.org
Subject: Re: Firefox crash during dtrace attach under -CURRENT
Message-ID: <20131027195307.GA3206@charmander.uwaterloo.ca>
In-Reply-To: <20131025145956.GA26814@lemon>

On Fri, Oct 25, 2013 at 03:59:56PM +0100, symbolics@gmx.com wrote:
> On Fri, Oct 25, 2013 at 11:47:06AM +0100, symbolics@gmx.com wrote:
> > On Wed, Oct 23, 2013 at 10:59:02PM -0400, Mark Johnston wrote:
> > > On Wed, Oct 23, 2013 at 09:30:09PM +0100, symbolics@gmx.com wrote:
> > > > Hi,
> > > >
> > > > http://dtrace.org/blogs/brendan/2011/02/11/dtrace-pid-provider-arguments/
> > > >
> > > > I tried to follow some of the examples but I crash the Firefox process
> > > > each time. Sometimes DTrace manages to collect a little data before the
> > > > death.
> > > >
> > > > [...]
> > > >
> > > > Is this a known problem or should I send a PR?
> > >
> > > Thanks for reporting this: I was able to reproduce the crash and managed
> > > to find a nasty pair of bugs. Could you test the patch below and let me
> > > know if it fixes the problem for you as well?
> > > If you see more crashes, please include the backtrace and signo from
> > > gdb again; it would likely be a different problem that needs to be
> > > debugged and fixed separately.
> >
> > Hi Mark,
> >
> > This helps but there still may be some issues. First time I used this
> > I found that when I killed the DTrace process Firefox went down too
> > with a SIGTRAP. I have a possibly unhelpful core from this:
> >
>
> Another data point. I attached to mutt and reviewed some of the calls it
> was making. Subsequently I killed DTrace, went to look at other
> things and a while later went back to check my mail. On attempting to
> change into a different mail folder mutt died with a SIGTRAP. It seems
> like DTrace isn't tidying up after itself?
>
> (gdb) bt
> #0  0x0000000800722541 in r_debug_state (rd=0x802425480, m=0x7fffffff6c28)
>     at /usr/home/dm/git/freebsd/libexec/rtld-elf/rtld.c:3491
> #1  0x0000000000000000 in ?? ()

Ok, I think I've figured out this one too. As you note, dtrace(1) isn't
cleaning up some of its breakpoints properly when it detaches. In
particular, it's not stopping the victim process before it tries to
remove breakpoints using ptrace(2); however, ptrace requires the target
process to be stopped, else it fails with EBUSY. So the breakpoint in
the rtld gets left behind, and it turns out that r_debug_state() is
called every time a process tries to dlopen() a shared object. mutt was
a good example since it seems to dlopen() iconv-related stuff as I scan
through my inbox; one can inspect this with DTrace. :)
i.e. with something like

    'pid$target::dlopen:entry {trace(copyinstr(arg0));}'

With this observation it becomes easy to reproduce the problem using a
test program that does something like

    while (1) {
        dlopen("/lib/libnonexistent.so.100", RTLD_LAZY);
        sleep(1);
    }

A somewhat crude patch which fixes this for me is below; it just adds
code to send SIGSTOP to the target process before trying to remove
breakpoints. Does anyone see any problems with this?
Perhaps it should be libproc's responsibility to ensure that the victim
process is stopped before trying a ptrace(PT_IO, ...) to add/remove
breakpoints?

Thanks,
-Mark

diff --git a/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c b/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c
index d40a0ae..6ed78e4 100644
--- a/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c
+++ b/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c
@@ -505,7 +505,7 @@ dt_proc_control(void *arg)
 	dt_proc_t *dpr = datap->dpcd_proc;
 	dt_proc_hash_t *dph = dpr->dpr_hdl->dt_procs;
 	struct ps_prochandle *P = dpr->dpr_proc;
-	int pid = dpr->dpr_pid;
+	int pid = dpr->dpr_pid, status;
 
 #if defined(sun)
 	int pfd = Pctlfd(P);
@@ -702,7 +702,22 @@ pwait_locked:
 	 */
 	(void) pthread_mutex_lock(&dpr->dpr_lock);
 
+#if defined(__FreeBSD__)
+	/*
+	 * On FreeBSD, the victim process must be stopped before ptrace(2) can
+	 * be used to remove breakpoints.
+	 */
+	if (kill(dpr->dpr_pid, SIGSTOP) == 0 &&
+	    wait4(dpr->dpr_pid, &status, WSTOPPED | WEXITED, NULL) != -1 &&
+	    WIFSTOPPED(status)) {
+		dt_proc_bpdestroy(dpr, B_TRUE);
+		kill(dpr->dpr_pid, SIGCONT);
+	} else
+		dt_dprintf("pid %d: failed to remove breakpoints\n",
+		    dpr->dpr_pid);
+#else
 	dt_proc_bpdestroy(dpr, B_TRUE);
+#endif
 	dpr->dpr_done = B_TRUE;
 	dpr->dpr_tid = 0;
 
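
For anyone who wants to try the reproduction described in the message,
the dlopen() loop can be fleshed out roughly as follows. This is only a
sketch: the helper name dlopen_loop and its iterations/delay parameters
are made up here for illustration, and only the dlopen() call on a
nonexistent library is taken from the message itself. A bounded loop is
used instead of while (1) so the helper can return.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the original message): call dlopen()
 * on `path` `iterations` times, sleeping `delay` seconds after each
 * attempt. Returns the number of attempts that failed.
 *
 * Per the message, each dlopen() call goes through r_debug_state() in
 * the runtime linker, which is where the stale breakpoint is left.
 */
int
dlopen_loop(const char *path, int iterations, unsigned int delay)
{
	int failures = 0;

	for (int i = 0; i < iterations; i++) {
		void *handle = dlopen(path, RTLD_LAZY);

		if (handle == NULL) {
			/* Expected for a nonexistent library. */
			failures++;
			fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
		} else {
			(void)dlclose(handle);
		}
		if (delay > 0)
			(void)sleep(delay);
	}
	return (failures);
}
```

Calling dlopen_loop("/lib/libnonexistent.so.100", 600, 1) from main()
would leave about ten minutes to attach dtrace(1) with -p, kill it, and
see whether the process later dies with SIGTRAP. On systems where
dlopen() is not in libc the program needs to be linked with -ldl.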