From owner-freebsd-dtrace@FreeBSD.ORG  Wed Oct 30 12:37:18 2013
Return-Path: <owner-freebsd-dtrace@FreeBSD.ORG>
Delivered-To: dtrace@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 97750210
 for <dtrace@freebsd.org>; Wed, 30 Oct 2013 12:37:18 +0000 (UTC)
 (envelope-from symbolics@gmx.com)
Received: from mout.gmx.net (mout.gmx.net [212.227.17.20])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 284132D8E
 for <dtrace@freebsd.org>; Wed, 30 Oct 2013 12:37:18 +0000 (UTC)
Received: from lemon ([80.7.17.14]) by mail.gmx.com (mrgmx003) with ESMTPSA
 (Nemesis) id 0LkgAG-1WBeie2Dp6-00aTLL for <dtrace@freebsd.org>; Wed, 30 Oct
 2013 13:37:16 +0100
Received: by lemon (Postfix, from userid 1001)
 id 0DCCEEB372; Wed, 30 Oct 2013 12:37:16 +0000 (GMT)
Date: Wed, 30 Oct 2013 12:37:16 +0000
From: symbolics@gmx.com
To: dtrace@freebsd.org
Subject: Re: Firefox crash during dtrace attach under -CURRENT
Message-ID: <20131030123716.GA2037@lemon>
References: <20131023203009.GA92945@lemon> <20131024025902.GA2286@charmander>
 <20131025104706.GB1705@lemon> <20131025145956.GA26814@lemon>
 <20131027195307.GA3206@charmander.uwaterloo.ca>
 <20131030081507.GA1674@lemon>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131030081507.GA1674@lemon>
X-Provags-ID: V03:K0:Ox5UAYYJzTToDs8qKz7Azzz64GEvuq+Zzgg29A3Ogg/lgr0oEHN
 QSZTGiyqlXExVaDiolV1dtVsKyo9tXg/DEJNFshdw9vE1Y80HMni5SvbnpAIGTF/GOemS7K
 j583R2PWxdSc52af2Q44AgiUvz5V6mqk9WYN/526UgATOI5+PW3snYwOMOlT2Dg661gpHoC
 rS5Fp2n2ySgrMNBtiTFMQ==
X-BeenThere: freebsd-dtrace@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "A discussion list for developers working on DTrace in FreeBSD."
 <freebsd-dtrace.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-dtrace>, 
 <mailto:freebsd-dtrace-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-dtrace>
List-Post: <mailto:freebsd-dtrace@freebsd.org>
List-Help: <mailto:freebsd-dtrace-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-dtrace>,
 <mailto:freebsd-dtrace-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Oct 2013 12:37:18 -0000

On Wed, Oct 30, 2013 at 08:15:07AM +0000, symbolics@gmx.com wrote:
> On Sun, Oct 27, 2013 at 03:53:07PM -0400, Mark Johnston wrote:
> > On Fri, Oct 25, 2013 at 03:59:56PM +0100, symbolics@gmx.com wrote:
> > > On Fri, Oct 25, 2013 at 11:47:06AM +0100, symbolics@gmx.com wrote:
> > > > On Wed, Oct 23, 2013 at 10:59:02PM -0400, Mark Johnston wrote:
> > > > > On Wed, Oct 23, 2013 at 09:30:09PM +0100, symbolics@gmx.com wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > http://dtrace.org/blogs/brendan/2011/02/11/dtrace-pid-provider-arguments/
> > > > > > 
> > > > > > I tried to follow some of the examples but I crash the Firefox process
> > > > > > each time. Sometimes DTrace manages to collect a little data before the
> > > > > > death.
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > Is this a known problem or should I send a PR?
> > > > > 
> > > > > Thanks for reporting this: I was able to reproduce the crash and managed
> > > > > to find a nasty pair of bugs. Could you test the patch below and let me
> > > > > know if it fixes the problem for you as well? If you see more crashes,
> > > > > please include the backtrace and signo from gdb again; it would likely
> > > > > be a different problem that needs to be debugged and fixed separately.
> > > > 
> > > > Hi Mark,
> > > > 
> > > > This helps but there still may be some issues. First time I used this
> > > > I found that when I killed the DTrace process Firefox went down too
> > > > with a SIGTRAP. I have a possibly unhelpful core from this:
> > > > 
> > > 
> > > Another data point. I attached to mutt and reviewed some of the calls it
> > > was making. Subsequently I killed DTrace, went to look at other
> > > things and a while later went back to check my mail. On attempting to
> > > change into a different mail folder mutt died with a SIGTRAP. It seems
> > > like DTrace isn't tidying up after itself?
> > > 
> > > (gdb) bt
> > > #0  0x0000000800722541 in r_debug_state (rd=0x802425480, m=0x7fffffff6c28)
> > >     at /usr/home/dm/git/freebsd/libexec/rtld-elf/rtld.c:3491
> > > #1  0x0000000000000000 in ?? ()
> > 
> > Ok, I think I've figured out this one too. As you note, dtrace(1) isn't
> > cleaning up some of its breakpoints properly when it detaches. In
> > particular, it's not stopping the victim process before it tries to
> > remove breakpoints using ptrace(2); however, ptrace requires the target
> > process to be stopped, else it will return EBUSY. So the breakpoint in
> > the rtld gets left behind, and it turns out that r_debug_state() is called
> > every time a process tries to dlopen() a shared object.
> > 
> > mutt was a good example since it seems to dlopen() iconv-related stuff
> > as I scan through my inbox; one can inspect this with DTrace. :)
> > i.e. with something like
> > 
> > 	'pid$target::dlopen:entry {trace(copyinstr(arg0));}'
> > 
> > With this observation it becomes easy to reproduce the problem using a
> > test program that does something like
> > 
> > 	while (1) {
> > 		dlopen("/lib/libnonexistent.so.100", RTLD_LAZY);
> > 		sleep(1);
> > 	}
> > 
> > A somewhat crude patch which fixes this for me is below; it just adds
> > code to send SIGSTOP to the target process before trying to remove
> > breakpoints. Does anyone see any problems with this? Perhaps it should
> > be libproc's responsibility to ensure that the victim process is stopped
> > before trying a ptrace(PT_IO, ...) to add/remove breakpoints?
> > 
> > Thanks,
> > -Mark
> > 
> 
> Hi Mark,
> 
> I've tried the patch but I can still reproduce the crash using mutt. I
> attached to the running mutt process with dtruss and watched that work a
> little bit, killed dtruss and carried on using mutt. I then tried to
> open a large mail folder and it crashed with a SIGTRAP. Backtrace, FWIW:
> 
> (gdb) bt
> #0  0x0000000800722541 in r_debug_state (rd=0x802425480, m=0x7fffffff84a8)
>     at /usr/home/dm/git/freebsd/libexec/rtld-elf/rtld.c:3491
> #1  0x0000000000000001 in ?? ()
> #2  0x0000000102c00000 in ?? ()
> #3  0x0000000000000001 in ?? ()
> #4  0x00000002028000c0 in ?? ()
> #5  0x0000000000000001 in ?? ()
> #6  0x0000000000000002 in ?? ()
> #7  0x00007fffffff8d00 in ?? ()
> #8  0x0000000801f25e3c in arena_avail_insert (arena=0xffffffff, 
>     chunk=0x7fffffff84a8, pageind=34395310320, npages=<value optimized out>, 
>     maybe_adjac_pred=<value optimized out>, maybe_adjac_succ=false)
>     at /usr/obj/usr/home/dm/git/freebsd/lib/libc/jemalloc_arena.c:274
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> 

One quick addition: this crash and backtrace are perfectly reproducible.
I did a second round of kernel building and testing to make sure I
hadn't messed up.
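
For anyone else who wants to try this without mutt, the dlopen() loop
Mark describes above can be fleshed out roughly as follows. This is only
a sketch: the helper name poke_rtld() and its count/delay parameters are
my own invention for illustration, and the library path is the
deliberately nonexistent one from Mark's example.

```c
#include <dlfcn.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from Mark's mail): call dlopen() on `path`
 * `count` times, sleeping `delay` seconds between attempts. A negative
 * count loops forever, matching the original while (1) loop. Returns
 * the number of dlopen() calls that failed. The point is that every
 * call goes through the runtime linker, whether or not the load
 * succeeds.
 */
static unsigned long
poke_rtld(const char *path, int count, unsigned int delay)
{
	unsigned long failures = 0;

	while (count != 0) {
		void *handle = dlopen(path, RTLD_LAZY);

		if (handle == NULL)
			failures++;
		else
			(void)dlclose(handle);
		if (delay > 0)
			(void)sleep(delay);
		if (count > 0)
			count--;
	}
	return (failures);
}
```

Calling poke_rtld("/lib/libnonexistent.so.100", -1, 1) from main() gives
the endless loop from Mark's description. Attaching dtrace to that
process and then killing dtrace should, if a breakpoint is left behind
in the rtld, make the next dlopen() die with SIGTRAP.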

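To make the "stop the victim first" idea from Mark's explanation
concrete, here is a rough sketch of what I understand the detach path
needs to do before it touches breakpoints. This is not Mark's patch,
which I'm not reproducing here; stop_and_wait() is a hypothetical name,
and it assumes the caller is allowed to wait on the target (its parent,
or the process tracing it).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>

/*
 * Hypothetical helper: send SIGSTOP to `pid` and block until the
 * process is reported as stopped. Returns 0 once it is stopped, or -1
 * if the signal could not be sent, the wait failed, or the process
 * exited instead of stopping. Only after this returns 0 would it make
 * sense to issue the ptrace(2) requests that remove breakpoints.
 */
static int
stop_and_wait(pid_t pid)
{
	int status;

	if (kill(pid, SIGSTOP) == -1)
		return (-1);
	for (;;) {
		if (waitpid(pid, &status, WUNTRACED) == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return (-1);
		}
		/* Any stop will do; an exit or kill means we lost it. */
		return (WIFSTOPPED(status) ? 0 : -1);
	}
}
```

The wait matters as much as the signal: kill() returning does not mean
the target has actually stopped yet, so going straight on to ptrace
could presumably still hit the EBUSY case Mark mentions.
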
--sym