From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: freebsd-hackers@freebsd.org, Steven Hartland
Date: Fri, 18 Dec 2009 10:09:51 -0500
Message-Id: <200912181009.51798.jhb@freebsd.org>
In-Reply-To: <28F90357192743E085ABEE7CD4C9FDF9@multiplay.co.uk>
Subject: Re: Passenger hangs on live and SEGV on tests possible threading / kernel bug?

On Thursday 17 December 2009 12:27:17 pm Steven Hartland wrote:
> ----- Original Message -----
> From: "John Baldwin"
> > For the hang it seems you have a thread waiting in a blocking read(), a thread
> > waiting in a blocking accept(), and lots of threads creating condition
> > variables.  However, the pthread_cond_init() in libpthread (libthr on FreeBSD)
> > doesn't call pthread_cleanup_push(), so your stack trace doesn't make sense to
> > me.  However, that may be gdb getting confused.  The pthread_cleanup_push()
> > frame may be cond_init().  However, it doesn't call umtx_op() (the
> > _thr_umutex_init() call it makes just initializes the structure, it doesn't
> > make a _umtx_op() system call).  You might try posting on threads@ to try to
> > get more info on this, but your pthread_cond_init() stack traces don't really
> > make sense.  Can you rebuild libc and libthr with debug symbols?
> >
> > For example:
> >
> > # cd /usr/src/lib/libc
> > # make clean
> > # make DEBUG_FLAGS=-g
> > # make DEBUG_FLAGS=-g install
> >
> > However, if you are hanging in read(), that usually means you have a socket
> > that just doesn't have data.  That might be an application bug of some sort.
> >
> > The segv trace doesn't include the first part of GDB messages which show which
> > thread actually had a seg fault.  It looks like it was the thread that was
> > throwing an exception.  However, nanosleep() doesn't throw exceptions, so that
> > stack trace doesn't really make sense either.  Perhaps that stack is hosed by
> > the exception handling code?
>
> I've uploaded two more traces for the oxt test failure / segv.
> http://code.google.com/p/phusion-passenger/issues/detail?id=441#c1
>
> From looking at the test case, it is testing the capture of failures and its
> ability to create a stack trace output, so that may give others some
> indication where the issue may be?
>
> I will look to do the same for the hang issue but that's on a live site so
> will need to schedule some downtime before I can get those rebuilt and then
> wait for it to hang again, which could be quite some time :(

Hmmm, the only seg fault I see is happening down inside libgcc in the stack
unwinding code, and that is third-party code from gcc.

--
John Baldwin