From owner-freebsd-threads@FreeBSD.ORG Sun Oct 24 10:22:23 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2B20C16A4CE; Sun, 24 Oct 2004 10:22:23 +0000 (GMT) Received: from smtp.des.no (flood.des.no [217.116.83.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id DE3D643D2D; Sun, 24 Oct 2004 10:22:22 +0000 (GMT) (envelope-from des@des.no) Received: by smtp.des.no (Pony Express, from userid 666) id 614D75313; Sun, 24 Oct 2004 12:22:21 +0200 (CEST) Received: from dwp.des.no (des.no [80.203.228.37]) by smtp.des.no (Pony Express) with ESMTP id 6136E5310; Sun, 24 Oct 2004 12:22:14 +0200 (CEST) Received: by dwp.des.no (Postfix, from userid 2602) id 1CD29B861; Sun, 24 Oct 2004 12:22:14 +0200 (CEST) To: Sam Leffler References: <417866BF.1000200@errno.com> From: des@des.no (=?iso-8859-1?q?Dag-Erling_Sm=F8rgrav?=) Date: Sun, 24 Oct 2004 12:22:14 +0200 In-Reply-To: <417866BF.1000200@errno.com> (Sam Leffler's message of "Thu, 21 Oct 2004 18:47:43 -0700") Message-ID: User-Agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Checker-Version: SpamAssassin 2.64 (2004-01-11) on flood.des.no X-Spam-Level: X-Spam-Status: No, hits=0.0 required=5.0 tests=AWL autolearn=no version=2.64 cc: kde@freebsd.org cc: current@freebsd.org cc: Mikhail Teterin cc: kde-freebsd@freebsd.kde.org cc: Robert Watson cc: threads@freebsd.org cc: Michael Nottebrock Subject: Re: [kde-freebsd] unkillable multithreaded processes stuck in `STOP' state X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Oct 2004 10:22:23 -0000 Sam Leffler writes: > On my recent -current laptop (updated last week) I can reliably run > 
gdb on a program, break main, and quit. The process being debugged is > left in STOP state and is unkillable. try 'procctl '... DES --=20 Dag-Erling Sm=F8rgrav - des@des.no From owner-freebsd-threads@FreeBSD.ORG Sun Oct 24 21:20:04 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E236216A4CE for ; Sun, 24 Oct 2004 21:20:04 +0000 (GMT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id B3A8C43D48 for ; Sun, 24 Oct 2004 21:20:03 +0000 (GMT) (envelope-from pete@he.iki.fi) Received: from [193.64.42.134] (h86.vuokselantie10.fi [193.64.42.134]) by silver.he.iki.fi (8.13.1/8.11.4) with ESMTP id i9OLK1W9054015 for ; Mon, 25 Oct 2004 00:20:02 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <417C1C82.1090502@he.iki.fi> Date: Mon, 25 Oct 2004 00:20:02 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040803 X-Accept-Language: en-us, en MIME-Version: 1.0 To: threads@freebsd.org Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Subject: cpu statistics missing X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Oct 2004 21:20:05 -0000 Is 5.3 going to have no CPU utilization statistics for threaded programs? 
Pete From owner-freebsd-threads@FreeBSD.ORG Mon Oct 25 04:28:25 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9512616A4CE for ; Mon, 25 Oct 2004 04:28:25 +0000 (GMT) Received: from pimout1-ext.prodigy.net (pimout1-ext.prodigy.net [207.115.63.77]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1F29D43D5A for ; Mon, 25 Oct 2004 04:28:24 +0000 (GMT) (envelope-from julian@elischer.org) Received: from [192.168.1.102] (adsl-68-123-122-146.dsl.snfc21.pacbell.net [68.123.122.146])i9P4SLWC439614; Mon, 25 Oct 2004 00:28:22 -0400 Message-ID: <417C80E5.60100@elischer.org> Date: Sun, 24 Oct 2004 21:28:21 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.8a3) Gecko/20041017 X-Accept-Language: en, hu MIME-Version: 1.0 To: Petri Helenius References: <417C1C82.1090502@he.iki.fi> In-Reply-To: <417C1C82.1090502@he.iki.fi> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: threads@freebsd.org Subject: Re: cpu statistics missing X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Oct 2004 04:28:25 -0000 Petri Helenius wrote: > > Is 5.3 going to have no CPU utilization statistics for threaded programs? > > Pete > > _______________________________________________ > freebsd-threads@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-threads > To unsubscribe, send any mail to "freebsd-threads-unsubscribe@freebsd.org" welll.. ummmmm.. no. I mean yes it is not.. 
From owner-freebsd-threads@FreeBSD.ORG Mon Oct 25 04:53:31 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 64D4F16A4CE for ; Mon, 25 Oct 2004 04:53:31 +0000 (GMT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id 32F3543D1D for ; Mon, 25 Oct 2004 04:53:30 +0000 (GMT) (envelope-from pete@he.iki.fi) Received: from [193.64.42.134] (h86.vuokselantie10.fi [193.64.42.134]) by silver.he.iki.fi (8.13.1/8.11.4) with ESMTP id i9P4rNkq059958; Mon, 25 Oct 2004 07:53:23 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <417C86C4.5030802@he.iki.fi> Date: Mon, 25 Oct 2004 07:53:24 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040803 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Elischer References: <417C1C82.1090502@he.iki.fi> <417C80E5.60100@elischer.org> In-Reply-To: <417C80E5.60100@elischer.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: threads@freebsd.org Subject: Re: cpu statistics missing X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Oct 2004 04:53:31 -0000 Julian Elischer wrote: > Petri Helenius wrote: > >> >> Is 5.3 going to have no CPU utilization statistics for threaded >> programs? >> >> > > welll.. ummmmm.. no. > I mean yes it is not.. > > What could be the reason I'm getting all-zeroes on percentages on ps and top for a program using libpthread? For example in top the TIME column increases but the WCPU and CPU columns stand at 0.00%. 
Pete

From owner-freebsd-threads@FreeBSD.ORG Mon Oct 25 05:16:14 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 8AAB316A4CE
    for ; Mon, 25 Oct 2004 05:16:14 +0000 (GMT)
Received: from pimout3-ext.prodigy.net (pimout3-ext.prodigy.net [207.115.63.102])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 4045443D5A
    for ; Mon, 25 Oct 2004 05:16:14 +0000 (GMT)
    (envelope-from julian@elischer.org)
Received: from [192.168.1.102] (adsl-68-123-122-146.dsl.snfc21.pacbell.net [68.123.122.146])i9P5GBSX040484;
    Mon, 25 Oct 2004 01:16:12 -0400
Message-ID: <417C8C1B.3060101@elischer.org>
Date: Sun, 24 Oct 2004 22:16:11 -0700
From: Julian Elischer
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.8a3) Gecko/20041017
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Petri Helenius
References: <417C1C82.1090502@he.iki.fi> <417C80E5.60100@elischer.org> <417C86C4.5030802@he.iki.fi>
In-Reply-To: <417C86C4.5030802@he.iki.fi>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
cc: threads@freebsd.org
Subject: Re: cpu statistics missing
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 25 Oct 2004 05:16:14 -0000

Petri Helenius wrote:

> Julian Elischer wrote:
>
>> Petri Helenius wrote:
>>
>>>
>>> Is 5.3 going to have no CPU utilization statistics for threaded
>>> programs?
>>>
>>>
>>
>> welll.. ummmmm.. no.
>> I mean yes it is not..
>>
>>
> What could be the reason I'm getting all-zeroes on percentages on ps and
> top for a program using libpthread?
> For example in top the TIME column increases but the WCPU and CPU
> columns stand at 0.00%.

there is no consistent kernel entity to which these stats can be assigned.

>
> Pete
>
> _______________________________________________
> freebsd-threads@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-threads
> To unsubscribe, send any mail to "freebsd-threads-unsubscribe@freebsd.org"

From owner-freebsd-threads@FreeBSD.ORG Mon Oct 25 11:02:24 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 3D8B816A4D1
    for ; Mon, 25 Oct 2004 11:02:24 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 16F1E43D46
    for ; Mon, 25 Oct 2004 11:02:24 +0000 (GMT)
    (envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1])
    by freefall.freebsd.org (8.12.11/8.12.11) with ESMTP id i9PB2NdV078394
    for ; Mon, 25 Oct 2004 11:02:23 GMT
    (envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost)
    by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i9PB2MX9078388
    for freebsd-threads@freebsd.org; Mon, 25 Oct 2004 11:02:22 GMT
    (envelope-from owner-bugmaster@freebsd.org)
Date: Mon, 25 Oct 2004 11:02:22 GMT
Message-Id: <200410251102.i9PB2MX9078388@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster
To: freebsd-threads@FreeBSD.org
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 25 Oct 2004 11:02:24 -0000

Current FreeBSD problem reports

Critical problems

S Submitted    Tracker       Resp.   Description
-------------------------------------------------------------------------------
o [2004/04/22] threads/65883 threads libkse's sigwait does not work after fork

1 problem total.

Serious problems

S Submitted    Tracker       Resp.   Description
-------------------------------------------------------------------------------
o [2000/07/18] kern/20016    threads pthreads: Cannot set scheduling timer/Can
o [2000/08/26] kern/20861    threads libc_r does not honor socket timeouts
o [2001/01/20] threads/24472 threads libc_r does not honor SO_SNDTIMEO/SO_RCVT
o [2001/01/25] threads/24632 threads libc_r delicate deviation from libc in ha
o [2001/01/25] kern/24641    threads pthread_rwlock_rdlock can deadlock
o [2001/11/26] bin/32295     threads pthread dont dequeue signals
o [2002/02/01] threads/34536 threads accept() blocks other threads
o [2002/05/25] kern/38549    threads the procces compiled whith pthread stoppe
o [2002/06/27] threads/39922 threads [PATCH?] Threaded applications executed w
o [2002/08/04] kern/41331    threads Pthread library open sets O_NONBLOCK flag
o [2003/03/02] threads/48856 threads Setting SIGCHLD to SIG_IGN still leaves z
o [2003/03/10] threads/49087 threads Signals lost in programs linked with libc
o [2003/05/08] threads/51949 threads thread in accept cannot be cancelled
s [2004/03/15] kern/64313    threads FreeBSD (OpenBSD) pthread implicit set/un
o [2004/08/26] threads/70975 threads unexpected and unreliable behaviour when
o [2004/09/14] threads/71725 threads Mysql Crashes frequently giving Sock Erro
o [2004/10/05] threads/72353 threads Assertion fails in /usr/src/lib/libpthrea
o [2004/10/07] threads/72429 threads threads blocked in stdio (fgets, etc) are
o [2004/10/21] threads/72953 threads fork() unblocks blocked signals w/o PTHRE

19 problems total.

Non-critical problems

S Submitted    Tracker       Resp.   Description
-------------------------------------------------------------------------------
o [2000/05/26] kern/18824    threads gethostbyname is not thread safe
o [2000/06/13] kern/19247    threads uthread_sigaction.c does not do anything
o [2000/10/21] kern/22190    threads A threaded read(2) from a socketpair(2) f
o [2001/09/09] threads/30464 threads pthread mutex attributes -- pshared
o [2002/05/02] threads/37676 threads libc_r: msgsnd(), msgrcv(), pread(), pwri
s [2002/07/16] threads/40671 threads pthread_cancel doesn't remove thread from
o [2004/07/13] threads/69020 threads pthreads library leaks _gc_mutex
o [2004/09/21] threads/71966 threads Mlnet Core Dumped : Fatal error '_pq_inse

8 problems total.

From owner-freebsd-threads@FreeBSD.ORG Wed Oct 27 20:15:17 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 18DE416A4D1
    for ; Wed, 27 Oct 2004 20:15:17 +0000 (GMT)
Received: from mail4.speakeasy.net (mail4.speakeasy.net [216.254.0.204])
    by mx1.FreeBSD.org (Postfix) with ESMTP id D79B743D49
    for ; Wed, 27 Oct 2004 20:15:13 +0000 (GMT)
    (envelope-from jhb@FreeBSD.org)
Received: (qmail 22596 invoked from network); 27 Oct 2004 20:15:13 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63])
    (envelope-sender ) encrypted SMTP
    for ; 27 Oct 2004 20:15:13 -0000
Received: from [10.50.40.221] (gw1.twc.weather.com [216.133.140.1])
    (authenticated bits=0)
    by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i9RKF2iN022438;
    Wed, 27 Oct 2004 16:15:07 -0400 (EDT)
    (envelope-from jhb@FreeBSD.org)
From: John Baldwin
To: Daniel Eischen
Date: Wed, 27 Oct 2004 14:29:14 -0400
User-Agent: KMail/1.6.2
References:
In-Reply-To:
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200410271429.14164.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63
    (2004-01-11) on server.baldwin.cx
cc: threads@FreeBSD.org
Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 27 Oct 2004 20:15:17 -0000

On Thursday 21 October 2004 07:04 pm, Daniel Eischen wrote:
> On Thu, 21 Oct 2004, John Baldwin wrote:
> > The behavior seems more to be this:
> >
> > - thread does pthread_cond_wait*(c1)
> > - thread enqueued on c1
> > - thread interrupted by a signal while on c1 but still in PS_RUNNING
>
> This shouldn't happen when signals are deferred.  It should
> only happen when the state is PS_COND_WAIT after we've
> context switched to the scheduler.
>
> > - thread saves state which excludes the PTHREAD_FLAGS_IN_CONDQ flag
> >   (among others)
>
> Right, because it assumes that the thread will be backed out of
> any mutex or CV queues prior to invoking the signal handler.
>
> > - thread calls _cond_wait_backout() if state is PS_COND_WAIT (but it's
> >   not in - this case, this is the normal case though, which is why it's ok
> >   to not save the CONDQ flag in the saved state above)
>
> Right.  The problem is, how is the thread getting setup for a signal
> while signals are deferred and the state has not yet been changed
> from PS_RUNNING to PS_COND_WAIT?
>
> > - thread executes signal handler
> > - thread restores state
> > - pthread_condwait*() see that interrupted is 0, so don't try to remove
> >   the thread from the condition variable (also, PTHREAD_FLAGS_IN_CONDQ
> >   isn't set either, so we can't detect this case that way)
> > - thread returns from pthread_cond_wait() (maybe due to timeout, etc.)
> > - thread calls pthread_cond_wait*(c2)
> > - thread enqueued on c2
> > - another thread does pthread_cond_broadcast(c2), and bewm
> >
> > My question is: is it possible for the thread to get interrupted and
> > chosen to run a signal while it is on c1 somehow given my patch to defer
> > signals around the wait loops (and is that patch correct btw given the
> > above scenario?)
>
> Yes (and yes I think).  Deferring signals just means that the signal handler
> won't try to install a signal frame on the current thread; instead it just
> queues the signal and the scheduler will pick it up and send it to the
> correct thread.
>
> I do think signals should be deferred for condition variables so
> that setting the thread state (to PS_COND_WAIT) is atomic.
>
> It's not obvious to me where the bug is.  If you had a simple
> test case to reproduce it that would help.

FWIW, we are having (I think) the same problem on 5.3 with libpthread.  The
panic there is in the mutex code about an assertion failing because a thread
is on a syncq when it is not supposed to be.

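A minimal sketch of the sequence described above (wait on c1, take signals, then wait on c2 and get a broadcast), written against plain POSIX threads rather than libc_r internals. It is not the reproducer asked for in the thread and has not been shown to trigger the bug; `run_scenario`, `pester`, `advance`, `stage` and `handled` are hypothetical names chosen for illustration. On a correct implementation the waiter should get through both waits no matter how many handlers run in between:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c1 = PTHREAD_COND_INITIALIZER;
static pthread_cond_t c2 = PTHREAD_COND_INITIALIZER;
static int stage;                      /* protected by mtx */
static volatile sig_atomic_t handled;  /* bumped only by the handler */

static void
on_usr1(int sig)
{
    (void)sig;
    handled++;
}

static void *
waiter(void *arg)
{
    sigset_t set;

    (void)arg;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);

    pthread_mutex_lock(&mtx);
    while (stage < 1)                  /* first wait; handlers may run here */
        pthread_cond_wait(&c1, &mtx);
    while (stage < 2)                  /* then wait on a different CV */
        pthread_cond_wait(&c2, &mtx);
    assert(stage == 2);
    pthread_mutex_unlock(&mtx);
    return (NULL);
}

/* Send n SIGUSR1s to the given thread, about 1 ms apart. */
static void
pester(pthread_t tid, int n)
{
    struct timespec ts = { 0, 1000000 };
    int i;

    for (i = 0; i < n; i++) {
        pthread_kill(tid, SIGUSR1);
        nanosleep(&ts, NULL);
    }
}

/* Publish a new stage under the mutex and wake every waiter on cv. */
static void
advance(int to, pthread_cond_t *cv)
{
    pthread_mutex_lock(&mtx);
    stage = to;
    pthread_cond_broadcast(cv);
    pthread_mutex_unlock(&mtx);
}

/* Returns the number of handler runs, or -1 if setup failed. */
int
run_scenario(void)
{
    struct sigaction act;
    sigset_t set;
    pthread_t tid;

    stage = 0;
    handled = 0;

    /* Keep SIGUSR1 away from this thread; the waiter unblocks it. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    memset(&act, 0, sizeof(act));
    act.sa_handler = on_usr1;
    act.sa_flags = SA_RESTART;
    sigfillset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, NULL) != 0)
        return (-1);
    if (pthread_create(&tid, NULL, waiter, NULL) != 0)
        return (-1);

    pester(tid, 20);        /* sent before stage 1: waiter is heading for c1 */
    advance(1, &c1);
    pester(tid, 20);        /* sent before stage 2: waiter is heading for c2 */
    advance(2, &c2);
    pthread_join(tid, NULL);
    return ((int)handled);
}
```

Calling `run_scenario()` from `main()` and checking for a return value of at least 1 is enough to see that the handler ran and both waits completed; a hang in `pthread_join()` or an assertion inside the thread library would point at the kind of queue corruption discussed above. The predicate loops matter because POSIX allows `pthread_cond_wait()` to return spuriously after a signal handler has run.
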
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-threads@FreeBSD.ORG Wed Oct 27 22:30:17 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 235C516A4CE; Wed, 27 Oct 2004 22:30:17 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id AA86943D2D; Wed, 27 Oct 2004 22:30:16 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i9RMUFKw022657; Wed, 27 Oct 2004 18:30:15 -0400 (EDT) Date: Wed, 27 Oct 2004 18:30:15 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: John Baldwin In-Reply-To: <200410271429.14164.jhb@FreeBSD.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: threads@freebsd.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Oct 2004 22:30:17 -0000 On Wed, 27 Oct 2004, John Baldwin wrote: > On Thursday 21 October 2004 07:04 pm, Daniel Eischen wrote: > > On Thu, 21 Oct 2004, John Baldwin wrote: > > > The behavior seems more to be this: > > > > > > - thread does pthread_cond_wait*(c1) > > > - thread enqueued on c1 > > > - thread interrupted by a signal while on c1 but still in PS_RUNNING > > > > This shouldn't happen when signals are deferred. It should > > only happen when the state is PS_COND_WAIT after we've > > context switched to the scheduler. 
> > > > > - thread saves state which excludes the PTHREAD_FLAGS_IN_CONDQ flag > > > (among others) > > > > Right, because it assumes that the thread will be backed out of > > any mutex or CV queues prior to invoking the signal handler. > > > > > - thread calls _cond_wait_backout() if state is PS_COND_WAIT (but it's > > > not in - this case, this is the normal case though, which is why it's ok > > > to not save the CONDQ flag in the saved state above) > > > > Right. The problem is, how is the thread getting setup for a signal > > while signals are deferred and the state has not yet been changed > > from PS_RUNNING to PS_COND_WAIT? > > > > > - thread executes signal handler > > > - thread restores state > > > - pthread_condwait*() see that interrupted is 0, so don't try to remove > > > the thread from the condition variable (also, PTHREAD_FLAGS_IN_CONDQ > > > isn't set either, so we can't detect this case that way) > > > - thread returns from pthread_cond_wait() (maybe due to timeout, etc.) > > > - thread calls pthread_cond_wait*(c2) > > > - thread enqueued on c2 > > > - another thread does pthread_cond_broadcast(c2), and bewm > > > > > > My question is is it possible for the thread to get interrupted and > > > chosen to run a signal while it is on c1 somehow given my patch to defer > > > signals around the wait loops (and is that patch correct btw given the > > > above scenario?) > > > > Yes (and yes I think). Defering signals just means that the signal handler > > won't try to install a signal frame on the current thread; instead it just > > queues the signal and the scheduler will pick it up and send it to the > > correct thread. > > > > I do think signals should be deferred for condition variables so > > that setting the thread state (to PS_COND_WAIT) is atomic. > > > > It's not obvious to be where the bug is. If you had a simple > > test case to reproduce it that would help. > > FWIW, we are having (I think) the same problem on 5.3 with libpthread. 
The
> panic there is in the mutex code about an assertion failing because a thread
> is on a syncq when it is not supposed to be.

David and I recently fixed some races in pthread_join() and
pthread_exit() in -current libpthread.  Don't know if those
were responsible...

Here's a test program that shows correct behavior with both
libc_r and libpthread in -current.

-- 
Dan Eischen

#include <pthread.h>    /* pthread_create(), mutex calls, pthread_yield() */
#include <signal.h>     /* sigaction(), sigprocmask(), kill() */
#include <stdio.h>      /* printf() */
#include <unistd.h>     /* sleep(), getpid() */

pthread_mutex_t gm;

/* Report which thread the signal was delivered to. */
static void
handler(int sig)
{
    printf("Thread %p Got signal %d\n", (void *)pthread_self(), sig);
}

/*
 * The only thread with SIGUSR1 unblocked.  It blocks on gm, which
 * main() locks before creating this thread and never releases.
 */
static void *
waiter(void *arg)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    for (;;) {
        pthread_mutex_lock(&gm);
        printf("Waiter locked mutex.");
        sleep(1);
        pthread_mutex_unlock(&gm);
        printf("Waiter unlocked mutex.");
        sleep(1);
    }
    return (NULL);
}

int
main(int argc, char *argv[])
{
    struct sigaction act;
    sigset_t set;
    pthread_t tid;

    /* Block SIGUSR1 here; the waiter thread unblocks it for itself. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, NULL);

    /* Install a handler for SIGUSR1. */
    act.sa_handler = handler;
    act.sa_flags = SA_RESTART;
    sigfillset(&act.sa_mask);
    sigaction(SIGUSR1, &act, NULL);

    pthread_mutex_init(&gm, NULL);
    pthread_mutex_lock(&gm);
    pthread_create(&tid, NULL, waiter, NULL);

    /* Keep signalling the process while the waiter sits on the mutex. */
    for (;;) {
        kill(getpid(), SIGUSR1);
        pthread_yield();
    }
    return (0);
}

From owner-freebsd-threads@FreeBSD.ORG Wed Oct 27 23:26:58 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 09D9916A4CE;
    Wed, 27 Oct 2004 23:26:58 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id DAAEE43D1D;
    Wed, 27 Oct 2004 23:26:57 +0000 (GMT)
    (envelope-from davidxu@freebsd.org)
Received: from [127.0.0.1] (davidxu@localhost [127.0.0.1]) i9RNQuZa031859;
    Wed, 27 Oct 2004 23:26:57 GMT
    (envelope-from davidxu@freebsd.org)
Message-ID: <41802EE5.6070200@freebsd.org>
Date: Thu, 28 Oct 2004 07:27:33 +0800
From: David Xu
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.7.2) Gecko/20041004
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Daniel Eischen
References:
In-Reply-To:
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: threads@freebsd.org
cc: John Baldwin
Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 27 Oct 2004 23:26:58 -0000

Daniel Eischen wrote:

>> FWIW, we are having (I think) the same problem on 5.3 with
>> libpthread.  The
>>
>>panic there is in the mutex code about an assertion failing because a thread
>>is on a syncq when it is not supposed to be.
>>
>>
>
>David and I recently fixed some races in pthread_join() and
>pthread_exit() in -current libpthread.
Don't know if those >were responsible... > > > That fix should be MFCed ASAP. >Here's a test program that shows correct behavior with both >libc_r and libpthread in -current. > > > From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 00:55:52 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4F9CA16A4CE; Thu, 28 Oct 2004 00:55:52 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id A941743D3F; Thu, 28 Oct 2004 00:55:51 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 707577A423; Wed, 27 Oct 2004 17:55:49 -0700 (PDT) Message-ID: <41804394.7020306@elischer.org> Date: Wed, 27 Oct 2004 17:55:48 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: David Xu References: <41802EE5.6070200@freebsd.org> In-Reply-To: <41802EE5.6070200@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: threads@freebsd.org cc: re@freebsd.org cc: John Baldwin Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 00:55:52 -0000 David, do you have revision numbers of what needs to be MFC'd? David Xu wrote: > Daniel Eischen wrote: > >>> FWIW, we are having (I think) the same problem on 5.3 with >>> libpthread. The >>> >>> panic there is in the mutex code about an assertion failing because >>> a thread >>> is on a syncq when it is not supposed to be. 
>>> >> >> >> David and I recently fixed some races in pthread_join() and >> pthread_exit() in -current libpthread. Don't know if those >> were responsible... >> >> >> > That fix should be MFCed ASAP. > >> Here's a test program that shows correct behavior with both >> libc_r and libpthread in -current. >> >> >> > > _______________________________________________ > freebsd-threads@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-threads > To unsubscribe, send any mail to > "freebsd-threads-unsubscribe@freebsd.org" From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 19:53:16 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6F41B16A4CE for ; Thu, 28 Oct 2004 19:53:16 +0000 (GMT) Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2845E43D3F for ; Thu, 28 Oct 2004 19:53:16 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 28649 invoked from network); 28 Oct 2004 19:53:15 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 28 Oct 2004 19:53:14 -0000 Received: from [10.50.40.221] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i9SJqvEI030410; Thu, 28 Oct 2004 15:52:59 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: Daniel Eischen Date: Thu, 28 Oct 2004 15:54:07 -0400 User-Agent: KMail/1.6.2 References: In-Reply-To: MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200410281554.07222.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: threads@FreeBSD.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: 
freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 19:53:16 -0000 On Wednesday 27 October 2004 06:30 pm, Daniel Eischen wrote: > On Wed, 27 Oct 2004, John Baldwin wrote: > > On Thursday 21 October 2004 07:04 pm, Daniel Eischen wrote: > > > On Thu, 21 Oct 2004, John Baldwin wrote: > > > > The behavior seems more to be this: > > > > > > > > - thread does pthread_cond_wait*(c1) > > > > - thread enqueued on c1 > > > > - thread interrupted by a signal while on c1 but still in PS_RUNNING > > > > > > This shouldn't happen when signals are deferred. It should > > > only happen when the state is PS_COND_WAIT after we've > > > context switched to the scheduler. > > > > > > > - thread saves state which excludes the PTHREAD_FLAGS_IN_CONDQ flag > > > > (among others) > > > > > > Right, because it assumes that the thread will be backed out of > > > any mutex or CV queues prior to invoking the signal handler. > > > > > > > - thread calls _cond_wait_backout() if state is PS_COND_WAIT (but > > > > it's not in - this case, this is the normal case though, which is why > > > > it's ok to not save the CONDQ flag in the saved state above) > > > > > > Right. The problem is, how is the thread getting setup for a signal > > > while signals are deferred and the state has not yet been changed > > > from PS_RUNNING to PS_COND_WAIT? > > > > > > > - thread executes signal handler > > > > - thread restores state > > > > - pthread_condwait*() see that interrupted is 0, so don't try to > > > > remove the thread from the condition variable (also, > > > > PTHREAD_FLAGS_IN_CONDQ isn't set either, so we can't detect this case > > > > that way) > > > > - thread returns from pthread_cond_wait() (maybe due to timeout, > > > > etc.) 
- thread calls pthread_cond_wait*(c2) > > > > - thread enqueued on c2 > > > > - another thread does pthread_cond_broadcast(c2), and bewm > > > > > > > > My question is is it possible for the thread to get interrupted and > > > > chosen to run a signal while it is on c1 somehow given my patch to > > > > defer signals around the wait loops (and is that patch correct btw > > > > given the above scenario?) > > > > > > Yes (and yes I think). Defering signals just means that the signal > > > handler won't try to install a signal frame on the current thread; > > > instead it just queues the signal and the scheduler will pick it up and > > > send it to the correct thread. > > > > > > I do think signals should be deferred for condition variables so > > > that setting the thread state (to PS_COND_WAIT) is atomic. > > > > > > It's not obvious to be where the bug is. If you had a simple > > > test case to reproduce it that would help. > > > > FWIW, we are having (I think) the same problem on 5.3 with libpthread. > > The panic there is in the mutex code about an assertion failing because a > > thread is on a syncq when it is not supposed to be. > > David and I recently fixed some races in pthread_join() and > pthread_exit() in -current libpthread. Don't know if those > were responsible... > > Here's a test program that shows correct behavior with both > libc_r and libpthread in -current. We've started testing on -current and are seeing several problems with libpthread. Using a UP kernel (machines have single processor with HTT) seems to make it better, but we seem to be getting SIG 11's in pthread_testcancel() as well as the failed lock assertions that were mentioned earlier on the list in the PR. Just running monodevelop from the bsd-sharp stuff mentioned earlier can break in that one of the processes dies with the assertion failure. 
If you let the other processes run, then you can run it again and get the window to pop up, but then clicking on any of the controls results in the pthread_testcancel() crash. FWIW, I think the reason that the stack traces look weird in the PR's thread may be due to catching a signal. When we were looking at the problems with libc_r on 4.x we would get some weird looking backtraces sometimes when the assertion in uthread_sig.c that I added failed. Seems that gdb doesn't handle the signal frames very well. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 20:27:58 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8A52B16A4CE; Thu, 28 Oct 2004 20:27:58 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1BA0543D49; Thu, 28 Oct 2004 20:27:58 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i9SKRukD009928; Thu, 28 Oct 2004 16:27:56 -0400 (EDT) Date: Thu, 28 Oct 2004 16:27:56 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: John Baldwin In-Reply-To: <200410281554.07222.jhb@FreeBSD.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: threads@freebsd.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 20:27:58 -0000 On Thu, 28 Oct 2004, John Baldwin wrote: > On Wednesday 27 October 2004 06:30 pm, Daniel Eischen wrote: > > On 
Wed, 27 Oct 2004, John Baldwin wrote: > > > > > > FWIW, we are having (I think) the same problem on 5.3 with libpthread. > > > The panic there is in the mutex code about an assertion failing because a > > > thread is on a syncq when it is not supposed to be. > > > > David and I recently fixed some races in pthread_join() and > > pthread_exit() in -current libpthread. Don't know if those > > were responsible... > > > > Here's a test program that shows correct behavior with both > > libc_r and libpthread in -current. > > We've started testing on -current and are seeing several problems with > libpthread. Using a UP kernel (machines have single processor with HTT) > seems to make it better, but we seem to be getting SIG 11's in > pthread_testcancel() as well as the failed lock assertions that were > mentioned earlier on the list in the PR. Just running monodevelop from the > bsd-sharp stuff mentioned earlier can break in that one of the processes dies > with the assertion failure. If you let the other processes run, then you can > run it again and get the window to pop up, but then clicking on any of the > controls results in the pthread_testcancel() crash. FWIW, I think the reason > that the stack traces look weird in the PR's thread may be due to catching a > signal. When we were looking at the problems with libc_r on 4.x we would get > some weird looking backtraces sometimes when the assertion in uthread_sig.c > that I added failed. Seems that gdb doesn't handle the signal frames very > well. You also want to make sure you're not running out of stack space for your threads. Is the code trying to install signal frames on threads itself? That could cause the problems you are seeing. 
-- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 20:39:03 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 392A416A4CF; Thu, 28 Oct 2004 20:39:03 +0000 (GMT) Received: from freebsd3.cimlogic.com.au (adsl-20-121.swiftdsl.com.au [218.214.20.121]) by mx1.FreeBSD.org (Postfix) with ESMTP id 500D743D48; Thu, 28 Oct 2004 20:39:02 +0000 (GMT) (envelope-from jb@cimlogic.com.au) Received: by freebsd3.cimlogic.com.au (Postfix, from userid 102) id 8C63A6A9BC; Fri, 29 Oct 2004 06:39:00 +1000 (EST) Date: Fri, 29 Oct 2004 06:39:00 +1000 From: John Birrell To: John Baldwin Message-ID: <20041028203900.GF47792@freebsd3.cimlogic.com.au> References: <200410281554.07222.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200410281554.07222.jhb@FreeBSD.org> User-Agent: Mutt/1.4.2.1i cc: Daniel Eischen cc: threads@FreeBSD.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 20:39:03 -0000 On Thu, Oct 28, 2004 at 03:54:07PM -0400, John Baldwin wrote: > We've started testing on -current and are seeing several problems with > libpthread. Using a UP kernel (machines have single processor with HTT) > seems to make it better, but we seem to be getting SIG 11's in > pthread_testcancel() as well as the failed lock assertions that were > mentioned earlier on the list in the PR. Just running monodevelop from the > bsd-sharp stuff mentioned earlier can break in that one of the processes dies > with the assertion failure. 
If you let the other processes run, then you can > run it again and get the window to pop up, but then clicking on any of the > controls results in the pthread_testcancel() crash. FWIW, I think the reason > that the stack traces look weird in the PR's thread may be due to catching a > signal. When we were looking at the problems with libc_r on 4.x we would get > some weird looking backtraces sometimes when the assertion in uthread_sig.c > that I added failed. Seems that gdb doesn't handle the signal frames very > well. I have a server running -current as of July 23 which runs a process that often SIG 11's in pthread_testcancel() too. I've never been able to make sense of the back trace because it always shows the initialisation path for a module, yet for the process to run and serve web requests, that initialisation path must have been completed. I've assumed there is a bug in my code elsewhere in the application and that GDB is telling me the truth. -- John Birrell From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 20:41:44 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0BEBB16A4CE; Thu, 28 Oct 2004 20:41:44 +0000 (GMT) Received: from lakermmtao07.cox.net (lakermmtao07.cox.net [68.230.240.32]) by mx1.FreeBSD.org (Postfix) with ESMTP id 47BD243D31; Thu, 28 Oct 2004 20:41:43 +0000 (GMT) (envelope-from mezz7@cox.net) Received: from mezz.mezzweb.com ([68.103.32.140]) by lakermmtao07.cox.net (InterMail vM.6.01.03.04 201-2131-111-106-20040729) with ESMTP id <20041028204136.KIBN22771.lakermmtao07.cox.net@mezz.mezzweb.com>; Thu, 28 Oct 2004 16:41:36 -0400 Date: Thu, 28 Oct 2004 15:42:00 -0500 To: "Daniel Eischen" References: From: "Jeremy Messenger" Content-Type: text/plain; format=flowed; delsp=yes; charset=us-ascii MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-ID: In-Reply-To: User-Agent: Opera M2/7.54 (Linux, build 751) cc: 
threads@freebsd.org cc: John Baldwin Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 20:41:44 -0000 On Wed, 27 Oct 2004 18:30:15 -0400 (EDT), Daniel Eischen wrote: > On Wed, 27 Oct 2004, John Baldwin wrote: > >> On Thursday 21 October 2004 07:04 pm, Daniel Eischen wrote: >> > On Thu, 21 Oct 2004, John Baldwin wrote: >> > > The behavior seems more to be this: >> > > >> > > - thread does pthread_cond_wait*(c1) >> > > - thread enqueued on c1 >> > > - thread interrupted by a signal while on c1 but still in PS_RUNNING >> > >> > This shouldn't happen when signals are deferred. It should >> > only happen when the state is PS_COND_WAIT after we've >> > context switched to the scheduler. >> > >> > > - thread saves state which excludes the PTHREAD_FLAGS_IN_CONDQ flag >> > > (among others) >> > >> > Right, because it assumes that the thread will be backed out of >> > any mutex or CV queues prior to invoking the signal handler. >> > >> > > - thread calls _cond_wait_backout() if state is PS_COND_WAIT (but >> it's >> > > not in - this case, this is the normal case though, which is why >> it's ok >> > > to not save the CONDQ flag in the saved state above) >> > >> > Right. The problem is, how is the thread getting setup for a signal >> > while signals are deferred and the state has not yet been changed >> > from PS_RUNNING to PS_COND_WAIT? >> > >> > > - thread executes signal handler >> > > - thread restores state >> > > - pthread_condwait*() see that interrupted is 0, so don't try to >> remove >> > > the thread from the condition variable (also, PTHREAD_FLAGS_IN_CONDQ >> > > isn't set either, so we can't detect this case that way) >> > > - thread returns from pthread_cond_wait() (maybe due to timeout, >> etc.) 
>> > > - thread calls pthread_cond_wait*(c2)
>> > > - thread enqueued on c2
>> > > - another thread does pthread_cond_broadcast(c2), and bewm
>> > >
>> > > My question is: is it possible for the thread to get interrupted and
>> > > chosen to run a signal while it is on c1 somehow given my patch to
>> > > defer signals around the wait loops (and is that patch correct btw
>> > > given the above scenario?)
>> >
>> > Yes (and yes I think).  Deferring signals just means that the signal
>> > handler won't try to install a signal frame on the current thread;
>> > instead it just queues the signal and the scheduler will pick it up
>> > and send it to the correct thread.
>> >
>> > I do think signals should be deferred for condition variables so
>> > that setting the thread state (to PS_COND_WAIT) is atomic.
>> >
>> > It's not obvious to me where the bug is.  If you had a simple
>> > test case to reproduce it that would help.
>>
>> FWIW, we are having (I think) the same problem on 5.3 with libpthread.
>> The panic there is in the mutex code about an assertion failing because
>> a thread is on a syncq when it is not supposed to be.
>
> David and I recently fixed some races in pthread_join() and
> pthread_exit() in -current libpthread.  Don't know if those
> were responsible...
>
> Here's a test program that shows correct behavior with both
> libc_r and libpthread in -current.

I am not a programmer, but I wonder whether these race fixes have anything
to do with SIGUSR1. I ask because
lang/mono/files/patch-libgc_include_private_gcconfig.h is what fixes the
Mono build/install hang. If I remove that patch, should the race fixes take
care of the hang? Maybe I can backport the fixes from -CURRENT to RELENG_5
myself and test it.
The patch in lang/mono/files/patch-libgc_include_private_gcconfig.h looks
like this:

=====================================
-# define SIG_SUSPEND SIGUSR1
-# define SIG_THR_RESTART SIGUSR2
+# define SIG_SUSPEND SIGTSTP
+# define SIG_THR_RESTART SIGCONT
=====================================

Cheers,
Mezz

--
mezz7@cox.net - mezz@FreeBSD.org
FreeBSD GNOME Team
http://www.FreeBSD.org/gnome/ - gnome@FreeBSD.org

From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 20:49:11 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5E49C16A4CF;
	Thu, 28 Oct 2004 20:49:11 +0000 (GMT)
Received: from lakermmtao01.cox.net (lakermmtao01.cox.net [68.230.240.38])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AD9B843D54;
	Thu, 28 Oct 2004 20:49:10 +0000 (GMT) (envelope-from mezz7@cox.net)
Received: from mezz.mezzweb.com ([68.103.32.140])
	by lakermmtao01.cox.net (InterMail vM.6.01.03.04 201-2131-111-106-20040729)
	with ESMTP id <20041028204901.LGIN4770.lakermmtao01.cox.net@mezz.mezzweb.com>;
	Thu, 28 Oct 2004 16:49:01 -0400
To: "Daniel Eischen"
References:
Message-ID:
Date: Thu, 28 Oct 2004 15:49:21 -0500
From: "Jeremy Messenger"
Content-Type: text/plain; format=flowed; delsp=yes; charset=us-ascii
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: Opera M2/7.54 (Linux, build 751)
cc: threads@freebsd.org
cc: John Baldwin
Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 28 Oct 2004 20:49:11 -0000

On Thu, 28 Oct 2004 16:27:56 -0400 (EDT), Daniel Eischen wrote:

> On Thu, 28 Oct 2004, John Baldwin wrote:
>
>> On Wednesday 27 October 2004 06:30 pm, Daniel Eischen wrote:
>> > On Wed, 27 Oct 2004, John Baldwin wrote:
>> > >
>> > > FWIW, we are having (I think) the same problem on 5.3 with
>> > > libpthread.  The panic there is in the mutex code about an
>> > > assertion failing because a thread is on a syncq when it is
>> > > not supposed to be.
>> >
>> > David and I recently fixed some races in pthread_join() and
>> > pthread_exit() in -current libpthread.  Don't know if those
>> > were responsible...
>> >
>> > Here's a test program that shows correct behavior with both
>> > libc_r and libpthread in -current.
>>
>> We've started testing on -current and are seeing several problems with
>> libpthread.  Using a UP kernel (machines have single processor with HTT)
>> seems to make it better, but we seem to be getting SIG 11's in
>> pthread_testcancel() as well as the failed lock assertions that were
>> mentioned earlier on the list in the PR.  Just running monodevelop from
>> the bsd-sharp stuff mentioned earlier can break in that one of the
>> processes dies with the assertion failure.  If you let the other
>> processes run, then you can run it again and get the window to pop up,
>> but then clicking on any of the controls results in the
>> pthread_testcancel() crash.  FWIW, I think the reason that the stack
>> traces look weird in the PR's thread may be due to catching a signal.
>> When we were looking at the problems with libc_r on 4.x we would get
>> some weird looking backtraces sometimes when the assertion in
>> uthread_sig.c that I added failed.  Seems that gdb doesn't handle the
>> signal frames very well.
>
> You also want to make sure you're not running out of stack space
> for your threads.
>
> Is the code trying to install signal frames on threads itself?
> That could cause the problems you are seeing.

Does it have to do with these lines in Mono's threads.c? Do they look fine?

${WRKSRC}/mono/io-layer/threads.c (lines 264 to 292):
=====================================================
	/* Set a 2M stack size.  This is the default on Linux, but BSD
	 * needs it.  (The original bug report from Martin Dvorak
	 * set the size to 2M-4k.  I don't know why it's short by 4k, so
	 * I'm leaving it as 2M until I'm told differently.)
	 */
	thr_ret = pthread_attr_init(&attr);
	g_assert (thr_ret == 0);

	/* defaults of 2Mb for 32bits and 4Mb for 64bits */
	if (stacksize == 0){
#if HAVE_VALGRIND_MEMCHECK_H
		if (RUNNING_ON_VALGRIND)
			stacksize = 1 << 20;
		else
			stacksize = (SIZEOF_VOID_P / 2) * 1024 * 1024;
#else
		stacksize = (SIZEOF_VOID_P / 2) * 1024 * 1024;
#endif
	}

#ifdef HAVE_PTHREAD_ATTR_SETSTACKSIZE
	thr_ret = pthread_attr_setstacksize(&attr, stacksize);
	g_assert (thr_ret == 0);
#endif

	ret=_wapi_timed_thread_create(&thread_private_handle->thread, &attr,
				      create, start, thread_exit, param,
				      handle);
=====================================================

Cheers,
Mezz

--
mezz7@cox.net - mezz@FreeBSD.org
FreeBSD GNOME Team
http://www.FreeBSD.org/gnome/ - gnome@FreeBSD.org

From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 20:57:38 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8D43216A4CE;
	Thu, 28 Oct 2004 20:57:38 +0000 (GMT)
Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3731E43D53;
	Thu, 28 Oct 2004 20:57:38 +0000 (GMT) (envelope-from deischen@freebsd.org)
Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11])
	i9SKvYZv015235; Thu, 28 Oct 2004 16:57:34 -0400 (EDT)
Date: Thu, 28 Oct 2004 16:57:34 -0400 (EDT)
From: Daniel Eischen
X-X-Sender: eischen@sea.ntplx.net
To: John Birrell
In-Reply-To: <20041028203900.GF47792@freebsd3.cimlogic.com.au>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net)
cc: threads@freebsd.org
cc: John Baldwin
Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Daniel Eischen
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 28 Oct 2004 20:57:38 -0000

On Fri, 29 Oct 2004, John Birrell wrote:

> On Thu, Oct 28, 2004 at 03:54:07PM -0400, John Baldwin wrote:
> > We've started testing on -current and are seeing several problems with
> > libpthread.  Using a UP kernel (machines have single processor with HTT)
> > seems to make it better, but we seem to be getting SIG 11's in
> > pthread_testcancel() as well as the failed lock assertions that were
> > mentioned earlier on the list in the PR.  Just running monodevelop from the
> > bsd-sharp stuff mentioned earlier can break in that one of the processes dies
> > with the assertion failure.  If you let the other processes run, then you can
> > run it again and get the window to pop up, but then clicking on any of the
> > controls results in the pthread_testcancel() crash.  FWIW, I think the reason
> > that the stack traces look weird in the PR's thread may be due to catching a
> > signal.  When we were looking at the problems with libc_r on 4.x we would get
> > some weird looking backtraces sometimes when the assertion in uthread_sig.c
> > that I added failed.  Seems that gdb doesn't handle the signal frames very
> > well.
>
> I have a server running -current as of July 23 which runs a process that often
> SIG 11's in pthread_testcancel() too. I've never been able to make sense of the
> back trace because it always shows the initialisation path for a module, yet
> for the process to run and serve web requests, that initialisation path must
> have been completed. I've assumed there is a bug in my code elsewhere in the
> application and that GDB is telling me the truth.

Hmm, a [sig]longjmp() out of a signal handler to the context of
a different thread?
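
To make the suspected failure mode concrete, here is a minimal sketch of the safe counterpart: a handler that only ever jumps back into the thread it is running on. Everything in it is an assumption made for illustration (the names jump_env, jump_armed, on_sigusr1 and recover_in_same_thread are hypothetical, and nothing here is taken from Mono or from the server mentioned above). It keeps the jump buffer in thread-local storage, so a handler that happens to run on some other thread finds its own copy unarmed and simply returns instead of jumping onto a foreign stack.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Per-thread jump target; both names are hypothetical. */
static _Thread_local sigjmp_buf jump_env;
static _Thread_local volatile sig_atomic_t jump_armed;

/* Jump only if the thread running the handler armed its own buffer. */
static void on_sigusr1(int signo)
{
    (void)signo;
    if (jump_armed) {
        jump_armed = 0;
        siglongjmp(jump_env, 1);
    }
    /* Otherwise return normally: this thread has nothing to unwind to. */
}

/*
 * Returns 1 if the handler unwound back into this function, 0 if the
 * signal handler returned normally, and -1 if sigaction() failed.
 */
int recover_in_same_thread(void)
{
    struct sigaction sa, old;
    int jumped;

    assert(!jump_armed);            /* not already armed on this thread */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(jump_env, 1) == 0) {
        jump_armed = 1;
        raise(SIGUSR1);             /* delivered to the calling thread */
        jump_armed = 0;             /* reached only if no jump happened */
        jumped = 0;
    } else {
        jumped = 1;                 /* arrived here via siglongjmp() */
    }

    (void)sigaction(SIGUSR1, &old, NULL);
    return jumped;
}
```

If the jump buffer were a single process-wide global instead, a signal delivered to thread B could unwind to a frame on thread A's stack, which would be consistent with a backtrace that shows code paths the crashing thread could not be executing. Whether that is what happens in the cases reported above is not established.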
-- Dan From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 22:43:45 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AC1D616A4CE; Thu, 28 Oct 2004 22:43:45 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 571D343D64; Thu, 28 Oct 2004 22:43:45 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i9SMhiqc026987; Thu, 28 Oct 2004 18:43:44 -0400 (EDT) Date: Thu, 28 Oct 2004 18:43:44 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: John Baldwin In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: threads@freebsd.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 22:43:45 -0000 On Thu, 28 Oct 2004, Daniel Eischen wrote: > On Thu, 28 Oct 2004, John Baldwin wrote: > > > On Wednesday 27 October 2004 06:30 pm, Daniel Eischen wrote: > > > On Wed, 27 Oct 2004, John Baldwin wrote: > > > > > > > > FWIW, we are having (I think) the same problem on 5.3 with libpthread. > > > > The panic there is in the mutex code about an assertion failing because a > > > > thread is on a syncq when it is not supposed to be. > > > > > > David and I recently fixed some races in pthread_join() and > > > pthread_exit() in -current libpthread. Don't know if those > > > were responsible... > > > > > > Here's a test program that shows correct behavior with both > > > libc_r and libpthread in -current. 
> > > > We've started testing on -current and are seeing several problems with > > libpthread. Using a UP kernel (machines have single processor with HTT) > > seems to make it better, but we seem to be getting SIG 11's in > > pthread_testcancel() as well as the failed lock assertions that were > > mentioned earlier on the list in the PR. Just running monodevelop from the > > bsd-sharp stuff mentioned earlier can break in that one of the processes dies > > with the assertion failure. If you let the other processes run, then you can > > run it again and get the window to pop up, but then clicking on any of the > > controls results in the pthread_testcancel() crash. FWIW, I think the reason > > that the stack traces look weird in the PR's thread may be due to catching a > > signal. When we were looking at the problems with libc_r on 4.x we would get > > some weird looking backtraces sometimes when the assertion in uthread_sig.c > > that I added failed. Seems that gdb doesn't handle the signal frames very > > well. > > You also want to make sure you're not running out of stack space > for your threads. > > Is the code trying to install signal frames on threads itself? > That could cause the problems you are seeing. I went back to the monodoc test case in the PR. 
Running under the debugger gives this:

(gdb) run /usr/local/lib/mono/1.0/mcs.exe -out:browser.exe ./browser.cs ./list.cs ./elabel.cs ./history.cs ./Contributions.cs ./XmlNodeWriter.cs -resource:./../monodoc.png,monodoc.png -resource:./browser.glade,browser.glade -pkg:gtkhtml-sharp -pkg:glade-sharp -r:System.Web.Services -r:./monodoc.dll
Starting program: /usr/local/bin/mono /usr/local/lib/mono/1.0/mcs.exe -out:browser.exe ./browser.cs ./list.cs ./elabel.cs ./history.cs ./Contributions.cs ./XmlNodeWriter.cs -resource:./../monodoc.png,monodoc.png -resource:./browser.glade,browser.glade -pkg:gtkhtml-sharp -pkg:glade-sharp -r:System.Web.Services -r:./monodoc.dll
[Switching to Thread 1 (LWP 100074)]

Breakpoint 1, 0x0804862e in main ()
(gdb) cont
Continuing.
[Switching to Thread 4 (LWP 100128)]

Breakpoint 2, 0x2842c801 in __assert () from /lib/libc.so.5
(gdb) bt
#0  0x2842c801 in __assert () from /lib/libc.so.5
#1  0x2837ce4e in _lock_acquire (lck=0x8062f00, lu=0x8110e48, prio=674751930)
    at /opt/FreeBSD/src/lib/libpthread/sys/lock.c:171
#2  0x2837010b in mutex_lock_common (curthread=0x8110e00, m=0x28482434, abstime=0x0)
    at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:495
#3  0x28371677 in __pthread_mutex_lock (m=0x28482434)
    at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:796
#4  0x28171cc6 in WaitForSingleObjectEx (handle=0xe, timeout=500, alertable=0)
    at handles-private.h:97
#5  0x2816b116 in CreateProcess (appname=0xd, cmdline=0x8092ac4, process_attrs=0x0, thread_attrs=0x0, inherit_handles=1, create_flags=1024, new_environ=0x0, cwd=0x0, startup=0xbf8ec78c, process_info=0xbf8ec77c)
    at processes.c:427
#6  0x2813ef4f in ves_icall_System_Diagnostics_Process_Start_internal (appname=0x80f89d8, cmd=0x8092ab8, dirname=0x808ff30, stdin_handle=0x2837e5ba, stdout_handle=0x2837e5ba, stderr_handle=0x2837e5ba, process_info=0xbf8ec964)
    at process.c:870
#7  0x28f548ff in ?? ()
#8  0x080f89d8 in ?? ()
#9  0x08092ab8 in ?? ()
#10 0x0808ff30 in ?? ()
#11 0x00000009 in ?? ()
#12 0x0000000d in ?? ()
#13 0x0000000b in ?? ()
#14 0xbf8ec964 in ?? ()
#15 0x0812d420 in ?? ()
#16 0x0812d408 in ?? ()
#17 0x0820d300 in ?? ()
#18 0x0808ff30 in ?? ()
#19 0x08092ab8 in ?? ()
#20 0x080f89d8 in ?? ()
#21 0xbf8ec838 in ?? ()
#22 0x28f548cc in ?? ()
#23 0xbf8ec98c in ?? ()
#24 0x28f542aa in ?? ()
---Type <return> to continue, or q <return> to quit---
#25 0x080f89d8 in ?? ()
#26 0x08092ab8 in ?? ()
#27 0x0808ff30 in ?? ()
#28 0x00000009 in ?? ()
#29 0x0000000d in ?? ()
#30 0x0000000b in ?? ()
#31 0xbf8ec964 in ?? ()
#32 0x28371bfe in mutex_unlock_common (m=0xb, add_reference=134818488)
    at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:984
Previous frame inner to this frame (corrupt stack?)
(gdb) info threads
  5 Thread 2 (LWP 100137)  0x2837bfd3 in kse_release () at kse_release.S:2
  4 Thread 3 (sleeping)  0x28373d0f in _thr_sched_switch_unlocked (curthread=0x8110000) at pthread_md.h:225
* 3 Thread 4 (LWP 100128)  0x2842c801 in __assert () from /lib/libc.so.5
  2 Thread 1 (sleeping)  0x28373d0f in _thr_sched_switch_unlocked (curthread=0x8053000) at pthread_md.h:225
(gdb) thread 3
[Switching to thread 3 (Thread 4 (LWP 100128))]#0  0x2842c801 in __assert () from /lib/libc.so.5
(gdb) frame 2
#2  0x2837010b in mutex_lock_common (curthread=0x8110e00, m=0x28482434, abstime=0x0)
    at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:495
495             THR_LOCK_ACQUIRE(curthread, &(*m)->m_lock);
(gdb) print curthread->uniqueid
$36 = 3
(gdb) print/x curthread->magic
$37 = 0xd09ba115
(gdb) print/x **m
$39 = {m_lock = {l_head = 0x7273752f, l_tail = 0x636f6c2f, l_type = 0x6c2f6c61,
    l_wait = 0x6d2f6269, l_wakeup = 0x726f6373}, m_type = 0x2e62696c,
  m_protocol = 0x7c6c6c64, m_queue = {tqh_first = 0x74737953,
    tqh_last = 0x522e6d65}, m_owner = 0x69746e75, m_flags = 0x532e656d,
  m_count = 0x61697265, m_refcount = 0x617a696c, m_prio = 0x6e6f6974,
  m_saved_prio = 0x6553492e, m_qe = {tqe_next = 0x6c616972,
    tqe_prev = 0x62617a69}}

The thread seems to be correct, but the mutex is trashed.
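
Every byte of the `print/x **m` output above falls in the printable ASCII range, which hints at what overwrote the mutex. A minimal sketch that decodes the words, assuming a little-endian i386 target and 4-byte fields laid out contiguously in the order gdb printed them (both are assumptions; the helper name words_to_ascii and the array name trashed_mutex_words are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 17 words from the gdb dump of **m, in the order gdb printed them. */
const uint32_t trashed_mutex_words[17] = {
    0x7273752f, 0x636f6c2f, 0x6c2f6c61, 0x6d2f6269, 0x726f6373,
    0x2e62696c, 0x7c6c6c64, 0x74737953, 0x522e6d65, 0x69746e75,
    0x532e656d, 0x61697265, 0x617a696c, 0x6e6f6974, 0x6553492e,
    0x6c616972, 0x62617a69
};

/*
 * Hypothetical helper: write the bytes of each word, least significant
 * byte first, into out as a NUL-terminated string.  Non-printable bytes
 * are replaced by '.'.  Returns the number of characters written.
 */
size_t words_to_ascii(const uint32_t *words, size_t nwords,
                      char *out, size_t outlen)
{
    size_t n = 0;

    assert(words != NULL && out != NULL && outlen > 0);
    for (size_t i = 0; i < nwords; i++) {
        for (int shift = 0; shift < 32; shift += 8) {
            unsigned char c = (unsigned char)((words[i] >> shift) & 0xff);

            if (n + 1 >= outlen) {      /* keep room for the terminator */
                out[n] = '\0';
                return n;
            }
            out[n++] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
        }
    }
    out[n] = '\0';
    return n;
}
```

Decoded this way, the 17 words read `/usr/local/lib/mscorlib.dll|System.Runtime.Serialization.ISerializab`, so under those assumptions the memory at *m holds string data (a Mono assembly path and type name) rather than a mutex structure. The dump alone does not say whether the mutex was overwritten or the pointer was wrong to begin with.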
It's not a valid mutex and the lock type (l_type) does indeed have LCK_PRIORITY set. Note that libpthread doesn't create any locks of this type, so this trips the assertion failure. -- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 22:49:29 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DFB7616A4CE; Thu, 28 Oct 2004 22:49:29 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id B4E6743D5D; Thu, 28 Oct 2004 22:49:29 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 6A97B7A424; Thu, 28 Oct 2004 15:49:29 -0700 (PDT) Message-ID: <41817778.4070801@elischer.org> Date: Thu, 28 Oct 2004 15:49:28 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: David Xu References: <41804394.7020306@elischer.org> <41804D8E.2030003@freebsd.org> In-Reply-To: <41804D8E.2030003@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: threads@freebsd.org cc: re@freebsd.org cc: John Baldwin Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables a nd signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 22:49:30 -0000 re, how about it? 
David Xu wrote: > Here is the cvs log: > > Revision Changes Path > 1.58 +1 -0 src/lib/libpthread/thread/thr_create.c > 1.14 +1 -1 src/lib/libpthread/thread/thr_find_thread.c > 1.115 +27 -10 src/lib/libpthread/thread/thr_kern.c > 1.119 +15 -11 src/lib/libpthread/thread/thr_private.h > 1.81 +1 -2 src/lib/libpthread/thread/thr_sig.c > > Julian Elischer wrote: > >> David, do you have revision numbers of what needs to be MFC'd? >> >> >> David Xu wrote: >> >> >>> Daniel Eischen wrote: >>> >>> >>>>> FWIW, we are having (I think) the same problem on 5.3 with >>>>> libpthread. The >>>>> >>>>> panic there is in the mutex code about an assertion failing >>>>> because a thread >>>>> is on a syncq when it is not supposed to be. >>>>> >>>> >>>> >>>> >>>> David and I recently fixed some races in pthread_join() and >>>> pthread_exit() in -current libpthread. Don't know if those >>>> were responsible... >>>> >>>> >>>> >>> >>> That fix should be MFCed ASAP. >>> >>> >>>> Here's a test program that shows correct behavior with both >>>> libc_r and libpthread in -current. 
>>>> From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 23:00:25 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6FC9016A4CE; Thu, 28 Oct 2004 23:00:25 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3092A43D46; Thu, 28 Oct 2004 23:00:25 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 0B0E37A424; Thu, 28 Oct 2004 16:00:25 -0700 (PDT) Message-ID: <41817A08.9000706@elischer.org> Date: Thu, 28 Oct 2004 16:00:24 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: David Xu References: <41804394.7020306@elischer.org> <41804D8E.2030003@freebsd.org> In-Reply-To: <41804D8E.2030003@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: threads@freebsd.org cc: re@freebsd.org cc: John Baldwin Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables a nd signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 23:00:25 -0000 David Xu wrote: > Here is the cvs log: > > Revision Changes Path > 1.58 +1 -0 src/lib/libpthread/thread/thr_create.c > 1.14 +1 -1 src/lib/libpthread/thread/thr_find_thread.c > 1.115 +27 -10 src/lib/libpthread/thread/thr_kern.c > 1.119 +15 -11 src/lib/libpthread/thread/thr_private.h > 1.81 +1 -2 src/lib/libpthread/thread/thr_sig.c commit message was: 1. 
Move thread list flags into new separate member, and atomically put DEAD thread on GC list, this closes a race between pthread_join and thr_cleanup. 2. Introduce a mutex to protect tcb initialization, tls allocation and deallocation code in rtld seems no lock protection or it is broken, under stress testing, memory is corrupted. translates to: > julian@julian:cvs diff -u > cvs server: Diffing . > Index: thr_create.c > =================================================================== > RCS file: /home/ncvs/src/lib/libpthread/thread/thr_create.c,v > retrieving revision 1.57 > diff -u -r1.57 thr_create.c > --- thr_create.c 12 Aug 2004 12:12:12 -0000 1.57 > +++ thr_create.c 28 Oct 2004 22:55:58 -0000 > @@ -234,6 +234,7 @@ > new_thread->specific_data_count = 0; > new_thread->cleanup = NULL; > new_thread->flags = 0; > + new_thread->tlflags = 0; > new_thread->continuation = NULL; > new_thread->wakeup_time.tv_sec = -1; > new_thread->lock_switch = 0; > Index: thr_find_thread.c > =================================================================== > RCS file: /home/ncvs/src/lib/libpthread/thread/thr_find_thread.c,v > retrieving revision 1.13 > diff -u -r1.13 thr_find_thread.c > --- thr_find_thread.c 17 Jul 2003 23:02:30 -0000 1.13 > +++ thr_find_thread.c 28 Oct 2004 22:55:58 -0000 > @@ -90,7 +90,7 @@ > if (curthread != NULL) > curthread->critical_count--; > if ((thread->refcount == 0) && > - (thread->flags & THR_FLAGS_GC_SAFE) != 0) > + (thread->tlflags & TLFLAGS_GC_SAFE) != 0) > THR_GCLIST_ADD(thread); > KSE_LOCK_RELEASE(curkse, &_thread_list_lock); > _kse_critical_leave(crit); > Index: thr_kern.c > =================================================================== > RCS file: /home/ncvs/src/lib/libpthread/thread/thr_kern.c,v > retrieving revision 1.112 > diff -u -r1.112 thr_kern.c > --- thr_kern.c 15 Aug 2004 16:28:05 -0000 1.112 > +++ thr_kern.c 28 Oct 2004 22:55:58 -0000 > @@ -139,6 +139,9 @@ > static struct thread_hash_head thr_hashtable[THREAD_HASH_QUEUES]; > #define 
THREAD_HASH(thrd) ((unsigned long)thrd % THREAD_HASH_QUEUES) > > +/* Lock for thread tcb constructor/destructor */ > +static pthread_mutex_t _tcb_mutex; > + > #ifdef DEBUG_THREAD_KERN > static void dump_queues(struct kse *curkse); > #endif > @@ -166,7 +169,7 @@ > struct pthread_sigframe *psf); > static int thr_timedout(struct pthread *thread, struct timespec *curtime); > static void thr_unlink(struct pthread *thread); > -static void thr_destroy(struct pthread *thread); > +static void thr_destroy(struct pthread *curthread, struct pthread *thread); > static void thread_gc(struct pthread *thread); > static void kse_gc(struct pthread *thread); > static void kseg_gc(struct pthread *thread); > @@ -240,7 +243,7 @@ > _thr_stack_free(&thread->attr); > if (thread->specific != NULL) > free(thread->specific); > - thr_destroy(thread); > + thr_destroy(curthread, thread); > } > } > > @@ -285,14 +288,14 @@ > /* Free the free threads. */ > while ((thread = TAILQ_FIRST(&free_threadq)) != NULL) { > TAILQ_REMOVE(&free_threadq, thread, tle); > - thr_destroy(thread); > + thr_destroy(curthread, thread); > } > free_thread_count = 0; > > /* Free the to-be-gc'd threads.
*/ > while ((thread = TAILQ_FIRST(&_thread_gc_list)) != NULL) { > TAILQ_REMOVE(&_thread_gc_list, thread, gcle); > - thr_destroy(thread); > + thr_destroy(curthread, thread); > } > TAILQ_INIT(&gc_ksegq); > _gc_count = 0; > @@ -381,6 +384,7 @@ > if (_lock_init(&_thread_list_lock, LCK_ADAPTIVE, > _kse_lock_wait, _kse_lock_wakeup) != 0) > PANIC("Unable to initialize thread list lock"); > + _pthread_mutex_init(&_tcb_mutex, NULL); > active_kse_count = 0; > active_kseg_count = 0; > _gc_count = 0; > @@ -1204,7 +1208,6 @@ > thread->kseg = _kse_initial->k_kseg; > thread->kse = _kse_initial; > } > - thread->flags |= THR_FLAGS_GC_SAFE; > > /* > * We can't hold the thread list lock while holding the > @@ -1213,6 +1216,7 @@ > KSE_SCHED_UNLOCK(curkse, curkse->k_kseg); > DBG_MSG("Adding thread %p to GC list\n", thread); > KSE_LOCK_ACQUIRE(curkse, &_thread_list_lock); > + thread->tlflags |= TLFLAGS_GC_SAFE; > THR_GCLIST_ADD(thread); > KSE_LOCK_RELEASE(curkse, &_thread_list_lock); > if (sys_scope) { > @@ -1252,7 +1256,7 @@ > /* Check the threads waiting for GC. 
*/ > for (td = TAILQ_FIRST(&_thread_gc_list); td != NULL; td = td_next) { > td_next = TAILQ_NEXT(td, gcle); > - if ((td->flags & THR_FLAGS_GC_SAFE) == 0) > + if ((td->tlflags & TLFLAGS_GC_SAFE) == 0) > continue; > else if (((td->attr.flags & PTHREAD_SCOPE_SYSTEM) != 0) && > ((td->kse->k_kcb->kcb_kmbx.km_flags & KMF_DONE) == 0)) { > @@ -2382,7 +2386,14 @@ > if ((thread == NULL) && > ((thread = malloc(sizeof(struct pthread))) != NULL)) { > bzero(thread, sizeof(struct pthread)); > - if ((thread->tcb = _tcb_ctor(thread, curthread == NULL)) == NULL) { > + if (curthread) { > + _pthread_mutex_lock(&_tcb_mutex); > + thread->tcb = _tcb_ctor(thread, 0 /* not initial tls */); > + _pthread_mutex_unlock(&_tcb_mutex); > + } else { > + thread->tcb = _tcb_ctor(thread, 1 /* initial tls */); > + } > + if (thread->tcb == NULL) { > free(thread); > thread = NULL; > } else { > @@ -2418,7 +2429,7 @@ > thread->name = NULL; > } > if ((curthread == NULL) || (free_thread_count >= MAX_CACHED_THREADS)) { > - thr_destroy(thread); > + thr_destroy(curthread, thread); > } else { > /* Add the thread to the free thread list.
*/ > crit = _kse_critical_enter(); > @@ -2431,14 +2442,20 @@ > } > > static void > -thr_destroy(struct pthread *thread) > +thr_destroy(struct pthread *curthread, struct pthread *thread) > { > int i; > > for (i = 0; i < MAX_THR_LOCKLEVEL; i++) > _lockuser_destroy(&thread->lockusers[i]); > _lock_destroy(&thread->lock); > - _tcb_dtor(thread->tcb); > + if (curthread) { > + _pthread_mutex_lock(&_tcb_mutex); > + _tcb_dtor(thread->tcb); > + _pthread_mutex_unlock(&_tcb_mutex); > + } else { > + _tcb_dtor(thread->tcb); > + } > free(thread->siginfo); > free(thread); > } > Index: thr_private.h > =================================================================== > RCS file: /home/ncvs/src/lib/libpthread/thread/thr_private.h,v > retrieving revision 1.118 > diff -u -r1.118 thr_private.h > --- thr_private.h 7 Aug 2004 15:15:38 -0000 1.118 > +++ thr_private.h 28 Oct 2004 22:55:59 -0000 > @@ -753,9 +753,13 @@ > #define THR_FLAGS_IN_RUNQ 0x0004 /* in run queue using pqe link */ > #define THR_FLAGS_EXITING 0x0008 /* thread is exiting */ > #define THR_FLAGS_SUSPENDED 0x0010 /* thread is suspended */ > -#define THR_FLAGS_GC_SAFE 0x0020 /* thread safe for cleaning */ > -#define THR_FLAGS_IN_TDLIST 0x0040 /* thread in all thread list */ > -#define THR_FLAGS_IN_GCLIST 0x0080 /* thread in gc list */ > + > + /* Thread list flags; only set with thread list lock held. */ > +#define TLFLAGS_GC_SAFE 0x0001 /* thread safe for cleaning */ > +#define TLFLAGS_IN_TDLIST 0x0002 /* thread in all thread list */ > +#define TLFLAGS_IN_GCLIST 0x0004 /* thread in gc list */ > + int tlflags; > + > /* > * Base priority is the user setable and retrievable priority > * of the thread. It is only affected by explicit calls to > @@ -897,30 +901,30 @@ > * the gc list.
> */ > #define THR_LIST_ADD(thrd) do { \ > - if (((thrd)->flags & THR_FLAGS_IN_TDLIST) == 0) { \ > + if (((thrd)->tlflags & TLFLAGS_IN_TDLIST) == 0) { \ > TAILQ_INSERT_HEAD(&_thread_list, thrd, tle); \ > _thr_hash_add(thrd); \ > - (thrd)->flags |= THR_FLAGS_IN_TDLIST; \ > + (thrd)->tlflags |= TLFLAGS_IN_TDLIST; \ > } \ > } while (0) > #define THR_LIST_REMOVE(thrd) do { \ > - if (((thrd)->flags & THR_FLAGS_IN_TDLIST) != 0) { \ > + if (((thrd)->tlflags & TLFLAGS_IN_TDLIST) != 0) { \ > TAILQ_REMOVE(&_thread_list, thrd, tle); \ > _thr_hash_remove(thrd); \ > - (thrd)->flags &= ~THR_FLAGS_IN_TDLIST; \ > + (thrd)->tlflags &= ~TLFLAGS_IN_TDLIST; \ > } \ > } while (0) > #define THR_GCLIST_ADD(thrd) do { \ > - if (((thrd)->flags & THR_FLAGS_IN_GCLIST) == 0) { \ > + if (((thrd)->tlflags & TLFLAGS_IN_GCLIST) == 0) { \ > TAILQ_INSERT_HEAD(&_thread_gc_list, thrd, gcle);\ > - (thrd)->flags |= THR_FLAGS_IN_GCLIST; \ > + (thrd)->tlflags |= TLFLAGS_IN_GCLIST; \ > _gc_count++; \ > } \ > } while (0) > #define THR_GCLIST_REMOVE(thrd) do { \ > - if (((thrd)->flags & THR_FLAGS_IN_GCLIST) != 0) { \ > + if (((thrd)->tlflags & TLFLAGS_IN_GCLIST) != 0) { \ > TAILQ_REMOVE(&_thread_gc_list, thrd, gcle); \ > - (thrd)->flags &= ~THR_FLAGS_IN_GCLIST; \ > + (thrd)->tlflags &= ~TLFLAGS_IN_GCLIST; \ > _gc_count--; \ > } \ > } while (0) > Index: thr_sig.c > =================================================================== > RCS file: /home/ncvs/src/lib/libpthread/thread/thr_sig.c,v > retrieving revision 1.79 > diff -u -r1.79 thr_sig.c > --- thr_sig.c 13 Jul 2004 22:52:11 -0000 1.79 > +++ thr_sig.c 28 Oct 2004 22:55:59 -0000 > @@ -1195,8 +1195,7 @@ > thr_sigframe_save(struct pthread *thread, struct pthread_sigframe *psf) > { > /* This has to initialize all members of the sigframe. 
*/ > - psf->psf_flags = > - thread->flags & (THR_FLAGS_PRIVATE | THR_FLAGS_IN_TDLIST); > + psf->psf_flags = thread->flags & THR_FLAGS_PRIVATE; > psf->psf_interrupted = thread->interrupted; > psf->psf_timeout = thread->timeout; > psf->psf_state = thread->state; > julian@julian: > > Julian Elischer wrote: > >> David, do you have revision numbers of what needs to be MFC'd? >> >> >> David Xu wrote: >> >> >>> Daniel Eischen wrote: >>> >>> >>>>> FWIW, we are having (I think) the same problem on 5.3 with >>>>> libpthread. The >>>>> >>>>> panic there is in the mutex code about an assertion failing >>>>> because a thread >>>>> is on a syncq when it is not supposed to be. >>>>> >>>> >>>> >>>> >>>> David and I recently fixed some races in pthread_join() and >>>> pthread_exit() in -current libpthread. Don't know if those >>>> were responsible... >>>> >>>> >>>> >>> >>> That fix should be MFCed ASAP. >>> >>> >>>> Here's a test program that shows correct behavior with both >>>> libc_r and libpthread in -current. 
>>>> From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 23:08:44 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D127216A4CE; Thu, 28 Oct 2004 23:08:44 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7B2FC43D45; Thu, 28 Oct 2004 23:08:44 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i9SN8e67026728; Thu, 28 Oct 2004 19:08:40 -0400 (EDT) Date: Thu, 28 Oct 2004 19:08:40 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Julian Elischer In-Reply-To: <41817A08.9000706@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: threads@freebsd.org cc: re@freebsd.org cc: David Xu cc: John Baldwin Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables a nd signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 23:08:45 -0000 On Thu, 28 Oct 2004, Julian Elischer wrote: > > > David Xu wrote: > > > Here is the cvs log: > > > > Revision Changes Path > > 1.58 +1 -0 src/lib/libpthread/thread/thr_create.c > > 1.14 +1 -1 src/lib/libpthread/thread/thr_find_thread.c > > 1.115 +27 -10 src/lib/libpthread/thread/thr_kern.c > > 1.119 +15 -11 src/lib/libpthread/thread/thr_private.h > > 1.81 +1 -2 src/lib/libpthread/thread/thr_sig.c > > commit message was: > 1. Move thread list flags into new separate member, and atomically > put DEAD thread on GC list, this closes a race between pthread_join > and thr_cleanup. > 2. 
Introduce a mutex to protect tcb initialization, tls allocation and > deallocation code in rtld seems no lock protection or it is broken, > under stress testing, memory is corrupted. > > > translates to: Yes, these look right. -- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 23:21:08 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A597016A4CE; Thu, 28 Oct 2004 23:21:08 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 86EBA43D53; Thu, 28 Oct 2004 23:21:08 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 628327A424; Thu, 28 Oct 2004 16:21:08 -0700 (PDT) Message-ID: <41817EE4.9080302@elischer.org> Date: Thu, 28 Oct 2004 16:21:08 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: Daniel Eischen References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: threads@freebsd.org cc: re@freebsd.org cc: David Xu cc: John Baldwin Subject: MFC req for 5.x/5.3 X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 23:21:08 -0000 Daniel Eischen wrote: >On Thu, 28 Oct 2004, Julian Elischer wrote: > > > >>David Xu wrote: >> >> >> >>>Here is the cvs log: >>> >>>Revision Changes Path >>> 1.58 +1 -0 src/lib/libpthread/thread/thr_create.c >>> 1.14 +1 -1 src/lib/libpthread/thread/thr_find_thread.c >>> 1.115 +27 -10 src/lib/libpthread/thread/thr_kern.c >>> 1.119 +15 -11 src/lib/libpthread/thread/thr_private.h >>> 1.81 +1 -2 
src/lib/libpthread/thread/thr_sig.c >>> >>> >>commit message was: >>1. Move thread list flags into new separate member, and atomically >> put DEAD thread on GC list, this closes a race between pthread_join >> and thr_cleanup. >>2. Introduce a mutex to protect tcb initialization, tls allocation and >> deallocation code in rtld seems no lock protection or it is broken, >> under stress testing, memory is corrupted. >> >> >>translates to: >> [diff removed] >> >> > >Yes, these look right. > > > From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 23:29:49 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7762F16A4CE; Thu, 28 Oct 2004 23:29:49 +0000 (GMT) Received: from electra.cse.Buffalo.EDU (electra.cse.Buffalo.EDU [128.205.32.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0EE6B43D41; Thu, 28 Oct 2004 23:29:49 +0000 (GMT) (envelope-from kensmith@cse.Buffalo.EDU) Received: from electra.cse.Buffalo.EDU (kensmith@localhost [127.0.0.1]) i9SNTkIo010497; Thu, 28 Oct 2004 19:29:46 -0400 (EDT) Received: (from kensmith@localhost) by electra.cse.Buffalo.EDU (8.12.10/8.12.9/Submit) id i9SNTkHA010496; Thu, 28 Oct 2004 19:29:46 -0400 (EDT) Date: Thu, 28 Oct 2004 19:29:46 -0400 From: Ken Smith To: Julian Elischer Message-ID: <20041028232946.GA10099@electra.cse.Buffalo.EDU> References: <41804394.7020306@elischer.org> <41804D8E.2030003@freebsd.org> <41817778.4070801@elischer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <41817778.4070801@elischer.org> User-Agent: Mutt/1.4.1i cc: Daniel Eischen cc: threads@freebsd.org cc: re@freebsd.org cc: David Xu cc: John Baldwin Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables a nd signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 23:29:49 -0000 On Thu, Oct 28, 2004 at 03:49:28PM -0700, Julian Elischer wrote: > re, how about it? Give me an hour or two, yesterday was the first I saw of this so I need to research it a bit. Is that OK? > >>>>>FWIW, we are having (I think) the same problem on 5.3 with > >>>>>libpthread. The > >>>>> > >>>>>panic there is in the mutex code about an assertion failing > >>>>>because a thread > >>>>>is on a syncq when it is not supposed to be. Umm. Your patch changes only user-level code, correct? Please tell me you can only panic a debugging kernel with user-level code issues. -- Ken Smith - From there to here, from here to | kensmith@cse.buffalo.edu there, funny things are everywhere. | - Theodore Geisel | From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 23:40:05 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7D5F716A4CE; Thu, 28 Oct 2004 23:40:05 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 28EE843D45; Thu, 28 Oct 2004 23:40:05 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i9SNe1kJ003347; Thu, 28 Oct 2004 19:40:01 -0400 (EDT) Date: Thu, 28 Oct 2004 19:40:01 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Ken Smith In-Reply-To: <20041028232946.GA10099@electra.cse.Buffalo.EDU> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: threads@freebsd.org cc: re@freebsd.org cc: Julian Elischer cc: John Baldwin cc: David Xu Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables a nd signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on 
FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 23:40:05 -0000 On Thu, 28 Oct 2004, Ken Smith wrote: > On Thu, Oct 28, 2004 at 03:49:28PM -0700, Julian Elischer wrote: > > > re, how about it? > > Give me an hour or two, yesterday was the first I saw of this so I > need to research it a bit. Is that OK? > > > >>>>>FWIW, we are having (I think) the same problem on 5.3 with > > >>>>>libpthread. The > > >>>>> > > >>>>>panic there is in the mutex code about an assertion failing > > >>>>>because a thread > > >>>>>is on a syncq when it is not supposed to be. > > Umm. Your patch changes only user-level code, correct? Please tell > me you can only panic a debugging kernel with user-level code issues. User-level panic by some assertions in libpthread which are caused by a race condition that this patch closes. -- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 23:40:34 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7B72216A4CE; Thu, 28 Oct 2004 23:40:34 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 66DF543D2D; Thu, 28 Oct 2004 23:40:34 +0000 (GMT) (envelope-from davidxu@freebsd.org) Received: from [127.0.0.1] (davidxu@localhost [127.0.0.1]) i9SNeWwQ030648; Thu, 28 Oct 2004 23:40:33 GMT (envelope-from davidxu@freebsd.org) Message-ID: <41818397.8090303@freebsd.org> Date: Fri, 29 Oct 2004 07:41:11 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.7.2) Gecko/20041004 X-Accept-Language: en-us, en MIME-Version: 1.0 To: John Birrell References: <200410281554.07222.jhb@FreeBSD.org> <20041028203900.GF47792@freebsd3.cimlogic.com.au> In-Reply-To: <20041028203900.GF47792@freebsd3.cimlogic.com.au> Content-Type: text/plain; charset=us-ascii; format=flowed 
Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: threads@freebsd.org cc: John Baldwin Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 23:40:34 -0000 John Birrell wrote: >On Thu, Oct 28, 2004 at 03:54:07PM -0400, John Baldwin wrote: > > >>We've started testing on -current and are seeing several problems with >>libpthread. Using a UP kernel (machines have single processor with HTT) >>seems to make it better, but we seem to be getting SIG 11's in >>pthread_testcancel() as well as the failed lock assertions that were >>mentioned earlier on the list in the PR. Just running monodevelop from the >>bsd-sharp stuff mentioned earlier can break in that one of the processes dies >>with the assertion failure. If you let the other processes run, then you can >>run it again and get the window to pop up, but then clicking on any of the >>controls results in the pthread_testcancel() crash. FWIW, I think the reason >>that the stack traces look weird in the PR's thread may be due to catching a >>signal. When we were looking at the problems with libc_r on 4.x we would get >>some weird looking backtraces sometimes when the assertion in uthread_sig.c >>that I added failed. Seems that gdb doesn't handle the signal frames very >>well. >> >> > >I have a server running -current as of July 23 which runs a process that often >SIG 11's in pthread_testcancel() too. I've never been able to make sense of the >back trace because it always shows the initialisation path for a module, yet >for the process to run and serve web requests, that initialisation path must >have been completed. I've assumed there is a bug in my code elsewhere in the >application and that GDB is telling me the truth. 
> > > It would be nice if you could provide some example code. Even if the code contains bugs, it would still help me see how pthread_cancel can cause SIG 11, because pthread_cancel seems to check everything carefully. David Xu From owner-freebsd-threads@FreeBSD.ORG Thu Oct 28 23:44:11 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B345E16A4CE; Thu, 28 Oct 2004 23:44:11 +0000 (GMT) Received: from electra.cse.Buffalo.EDU (electra.cse.Buffalo.EDU [128.205.32.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 53D8443D45; Thu, 28 Oct 2004 23:44:11 +0000 (GMT) (envelope-from kensmith@cse.Buffalo.EDU) Received: from electra.cse.Buffalo.EDU (kensmith@localhost [127.0.0.1]) i9SNi5Io010883; Thu, 28 Oct 2004 19:44:05 -0400 (EDT) Received: (from kensmith@localhost) by electra.cse.Buffalo.EDU (8.12.10/8.12.9/Submit) id i9SNi5AV010882; Thu, 28 Oct 2004 19:44:05 -0400 (EDT) Date: Thu, 28 Oct 2004 19:44:05 -0400 From: Ken Smith To: Daniel Eischen Message-ID: <20041028234405.GC10099@electra.cse.Buffalo.EDU> References: <20041028232946.GA10099@electra.cse.Buffalo.EDU> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i cc: re@freebsd.org cc: John Baldwin cc: Julian Elischer cc: threads@freebsd.org cc: Ken Smith cc: David Xu Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Oct 2004 23:44:11 -0000 On Thu, Oct 28, 2004 at 07:40:01PM -0400, Daniel Eischen wrote: > On Thu, 28 Oct 2004, Ken Smith wrote: > > > On Thu, Oct 28, 2004 at 03:49:28PM -0700, Julian Elischer wrote: > > > > > re, how about it?
> > > > Give me an hour or two, yesterday was the first I saw of this so I > > need to research it a bit. Is that OK? > > > > > >>>>>FWIW, we are having (I think) the same problem on 5.3 with > > > >>>>>libpthread. The > > > >>>>> > > > >>>>>panic there is in the mutex code about an assertion failing > > > >>>>>because a thread > > > >>>>>is on a syncq when it is not supposed to be. > > > > Umm. Your patch changes only user-level code, correct? Please tell > > me you can only panic a debugging kernel with user-level code issues. > > User-level panic by some assertions in libpthread which are caused > by a race condition that this patch closes. > Thank you. We use the word 'panic' for too many things one of which is much scarier than others. -- Ken Smith - From there to here, from here to | kensmith@cse.buffalo.edu there, funny things are everywhere. | - Theodore Geisel | From owner-freebsd-threads@FreeBSD.ORG Fri Oct 29 01:08:26 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3D30216A4E4; Fri, 29 Oct 2004 01:08:26 +0000 (GMT) Received: from electra.cse.Buffalo.EDU (electra.cse.Buffalo.EDU [128.205.32.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id B8D0A43D1F; Fri, 29 Oct 2004 01:08:25 +0000 (GMT) (envelope-from kensmith@cse.Buffalo.EDU) Received: from electra.cse.Buffalo.EDU (kensmith@localhost [127.0.0.1]) i9T18MIo012748; Thu, 28 Oct 2004 21:08:22 -0400 (EDT) Received: (from kensmith@localhost) by electra.cse.Buffalo.EDU (8.12.10/8.12.9/Submit) id i9T18MTi012747; Thu, 28 Oct 2004 21:08:22 -0400 (EDT) Date: Thu, 28 Oct 2004 21:08:22 -0400 From: Ken Smith To: Julian Elischer Message-ID: <20041029010822.GA12081@electra.cse.Buffalo.EDU> References: <41817EE4.9080302@elischer.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="x+6KMIRAuhnl3hBn" Content-Disposition: inline 
In-Reply-To: <41817EE4.9080302@elischer.org> User-Agent: Mutt/1.4.1i cc: Daniel Eischen cc: threads@freebsd.org cc: re@freebsd.org cc: David Xu cc: John Baldwin Subject: Re: MFC req for 5.x/5.3 X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Oct 2004 01:08:26 -0000 --x+6KMIRAuhnl3hBn Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 28, 2004 at 04:21:08PM -0700, Julian Elischer wrote: > Daniel Eischen wrote: > >On Thu, 28 Oct 2004, Julian Elischer wrote: > >>David Xu wrote: > >>>Here is the cvs log: > >>> > >>>Revision Changes Path > >>> 1.58 +1 -0 src/lib/libpthread/thread/thr_create.c > >>> 1.14 +1 -1 src/lib/libpthread/thread/thr_find_thread.c > >>> 1.115 +27 -10 src/lib/libpthread/thread/thr_kern.c > >>> 1.119 +15 -11 src/lib/libpthread/thread/thr_private.h > >>> 1.81 +1 -2 src/lib/libpthread/thread/thr_sig.c > >>> > >>commit message was: > >>1. Move thread list flags into new separate member, and atomically > >> put DEAD thread on GC list, this closes a race between pthread_join > >> and thr_cleanup. > >>2. Introduce a mutex to protect tcb initialization, tls allocation and > >> deallocation code in rtld seems no lock protection or it is broken, > >> under stress testing, memory is corrupted. > >> > >>translates to: > >> >=20 > [diff removed] >=20 > > > >Yes, these look right. > > Ok. If you have done a complete buildworld/installworld test on RELENG_5 with the patches you sent please MFC it to RELENG_5 and RELENG_5_3. If you haven't done a complete buildworld/installworld test with these patches please just MFC to RELENG_5 and let me know, I'll test it there before we do the jump to RELENG_5_3. Sorry for being this jumpy about it but this does look like a slightly complicated MFC. 
It looks like there had been other changes to libpthread/thread between the RELENG_5 branch and now that you are not MFC-ing at this point. I need to do tags slips with what hits RELENG_5_3 so I need to be a bit careful with what gets that far. And I know not everyone has a ton of machines around they can test the various stages on so if it's a bit hard for you to do the full buildworld/installworld tests I can take care of that for you. Thanks. --=20 Ken Smith - From there to here, from here to | kensmith@cse.buffalo.edu there, funny things are everywhere. | - Theodore Geisel | --x+6KMIRAuhnl3hBn Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (SunOS) iD8DBQFBgZgE/G14VSmup/YRAt6dAJwPX+3XqCnWiU7o1i/JhhrenbMcjgCgi7ns bSc7E/lCARFOE2l3d02GNxE= =oGVe -----END PGP SIGNATURE----- --x+6KMIRAuhnl3hBn-- From owner-freebsd-threads@FreeBSD.ORG Fri Oct 29 04:09:51 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B458216A4CE for ; Fri, 29 Oct 2004 04:09:51 +0000 (GMT) Received: from mail.freebsd.org.cn (dns3.freebsd.org.cn [61.129.66.75]) by mx1.FreeBSD.org (Postfix) with SMTP id B488043D46 for ; Fri, 29 Oct 2004 04:09:47 +0000 (GMT) (envelope-from delphij@frontfree.net) Received: (qmail 52611 invoked by uid 0); 29 Oct 2004 04:04:40 -0000 Received: from unknown (HELO beastie.frontfree.net) (219.239.98.7) by mail.freebsd.org.cn with SMTP; 29 Oct 2004 04:04:40 -0000 Received: from localhost (localhost.frontfree.net [127.0.0.1]) by beastie.frontfree.net (Postfix) with ESMTP id 252E91316ED; Fri, 29 Oct 2004 12:09:42 +0800 (CST) Received: from beastie.frontfree.net ([127.0.0.1]) by localhost (beastie.frontfree.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 04009-03; Fri, 29 Oct 2004 12:09:31 +0800 (CST) Received: by beastie.frontfree.net (Postfix, from userid 1001) id 2B4AD1314A1; 
Fri, 29 Oct 2004 12:09:31 +0800 (CST) Date: Fri, 29 Oct 2004 12:09:31 +0800 From: Xin LI To: Ken Smith Message-ID: <20041029040931.GA920@frontfree.net> References: <41817EE4.9080302@elischer.org> <20041029010822.GA12081@electra.cse.Buffalo.EDU> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Q68bSM7Ycu6FN28Q" Content-Disposition: inline In-Reply-To: <20041029010822.GA12081@electra.cse.Buffalo.EDU> User-Agent: Mutt/1.4.2.1i X-GPG-key-ID/Fingerprint: 0xCAEEB8C0 / 43B8 B703 B8DD 0231 B333 DC28 39FB 93A0 CAEE B8C0 X-GPG-Public-Key: http://www.delphij.net/delphij.asc X-Operating-System: FreeBSD beastie.frontfree.net 5.3-delphij FreeBSD 5.3-delphij #11: Tue Oct 26 14:12:03 CST 2004 delphij@beastie.frontfree.net:/usr/obj/usr/src/sys/BEASTIE i386 X-URL: http://www.delphij.net X-By: delphij@beastie.frontfree.net X-Location: Beijing, China X-Virus-Scanned: by amavisd-new at frontfree.net cc: re@freebsd.org cc: John Baldwin cc: Daniel Eischen cc: Julian Elischer cc: threads@freebsd.org cc: David Xu Subject: Re: MFC req for 5.x/5.3 X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Oct 2004 04:09:51 -0000 --Q68bSM7Ycu6FN28Q Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi, Ken, On Thu, Oct 28, 2004 at 09:08:22PM -0400, Ken Smith wrote: > > >>>Revision Changes Path > > >>> 1.58 +1 -0 src/lib/libpthread/thread/thr_create.c > > >>> 1.14 +1 -1 src/lib/libpthread/thread/thr_find_thread.c > > >>> 1.115 +27 -10 src/lib/libpthread/thread/thr_kern.c > > >>> 1.119 +15 -11 src/lib/libpthread/thread/thr_private.h > > >>> 1.81 +1 -2 src/lib/libpthread/thread/thr_sig.c JFYI: I have tested that a RELENG_5 with ``cvs up -A'' in src/lib/libpthread/ works quite well and I have tested these on my
notebook (P4-M based). Cheers, --=20 Xin LI http://www.delphij.net/ See complete headers for GPG key and other information. --Q68bSM7Ycu6FN28Q Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (FreeBSD) iD8DBQFBgcJ7/cVsHxFZiIoRAmq7AJ9sXSlIe/5GyVvnFm/UvTFJRg/6UwCfYAaF m4z4yhIV0uBRyxfDs9RK0s4= =T3I0 -----END PGP SIGNATURE----- --Q68bSM7Ycu6FN28Q-- From owner-freebsd-threads@FreeBSD.ORG Fri Oct 29 04:52:01 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BC03116A4CE; Fri, 29 Oct 2004 04:52:01 +0000 (GMT) Received: from freebsd3.cimlogic.com.au (adsl-20-121.swiftdsl.com.au [218.214.20.121]) by mx1.FreeBSD.org (Postfix) with ESMTP id DE4C043D46; Fri, 29 Oct 2004 04:52:00 +0000 (GMT) (envelope-from jb@cimlogic.com.au) Received: by freebsd3.cimlogic.com.au (Postfix, from userid 102) id 638226A9BC; Fri, 29 Oct 2004 14:51:59 +1000 (EST) Date: Fri, 29 Oct 2004 14:51:59 +1000 From: John Birrell To: David Xu Message-ID: <20041029045159.GG47792@freebsd3.cimlogic.com.au> References: <200410281554.07222.jhb@FreeBSD.org> <20041028203900.GF47792@freebsd3.cimlogic.com.au> <41818397.8090303@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <41818397.8090303@freebsd.org> User-Agent: Mutt/1.4.2.1i cc: threads@freebsd.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Oct 2004 04:52:01 -0000 On Fri, Oct 29, 2004 at 07:41:11AM +0800, David Xu wrote: > It would be nice if you could provide some example code, even if the code > may contains bug, it is still good for me to see how 
pthread_cancel can > cause SIG 11, because pthread_cancel seems checking everything carefully. (Note that it is pthread_testcancel, not pthread_cancel, and it is libpthread making the call, not the application directly.) Sorry, there is absolutely no way I can get a simple/example program. The application only does it when it is loaded serving data at about 4 Mb/s. I haven't seen the SIG 11 on any of my test systems and I *have* tried to create test loads. It only happens on the production server serving the real world. 8-( A second instance of the same program, running on the same server, serving a different group of users at a lower bit rate has never had the problem. -- John Birrell From owner-freebsd-threads@FreeBSD.ORG Fri Oct 29 20:12:34 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D66D016A4CE; Fri, 29 Oct 2004 20:12:34 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id ACAEB43D1D; Fri, 29 Oct 2004 20:12:34 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 5B0A77A403; Fri, 29 Oct 2004 13:12:34 -0700 (PDT) Message-ID: <4182A431.2050001@elischer.org> Date: Fri, 29 Oct 2004 13:12:33 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: Xin LI References: <41817EE4.9080302@elischer.org> <20041029010822.GA12081@electra.cse.Buffalo.EDU> <20041029040931.GA920@frontfree.net> In-Reply-To: <20041029040931.GA920@frontfree.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: re@freebsd.org cc: John Baldwin cc: Daniel Eischen cc: David Xu cc: threads@freebsd.org cc: Ken Smith Subject: Re: MFC req for 5.x/5.3 X-BeenThere:
freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Oct 2004 20:12:35 -0000 Xin LI wrote: >Hi, Ken, > >On Thu, Oct 28, 2004 at 09:08:22PM -0400, Ken Smith wrote: > > >>>>>>Revision Changes Path >>>>>>1.58 +1 -0 src/lib/libpthread/thread/thr_create.c >>>>>>1.14 +1 -1 src/lib/libpthread/thread/thr_find_thread.c >>>>>>1.115 +27 -10 src/lib/libpthread/thread/thr_kern.c >>>>>>1.119 +15 -11 src/lib/libpthread/thread/thr_private.h >>>>>>1.81 +1 -2 src/lib/libpthread/thread/thr_sig.c >>>>>> >>>>>> > >JFYI: I have tested that a RELENG_5 with ``cvs up -A'' in src/lib/libpthread/ >works quite well and I have tested these on my notebook (P4-M based). > though this is not equivalent to doing a -A.. there are a few minor differences not being MFC'd.. > >Cheers, > > From owner-freebsd-threads@FreeBSD.ORG Fri Oct 29 20:25:09 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C1E7716A4CE; Fri, 29 Oct 2004 20:25:09 +0000 (GMT) Received: from electra.cse.Buffalo.EDU (electra.cse.Buffalo.EDU [128.205.32.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 300EF43D3F; Fri, 29 Oct 2004 20:25:09 +0000 (GMT) (envelope-from kensmith@cse.Buffalo.EDU) Received: from electra.cse.Buffalo.EDU (kensmith@localhost [127.0.0.1]) i9TKOwIo010931; Fri, 29 Oct 2004 16:24:58 -0400 (EDT) Received: (from kensmith@localhost) by electra.cse.Buffalo.EDU (8.12.10/8.12.9/Submit) id i9TKOw7F010930; Fri, 29 Oct 2004 16:24:58 -0400 (EDT) Date: Fri, 29 Oct 2004 16:24:58 -0400 From: Ken Smith To: Julian Elischer Message-ID: <20041029202458.GE9533@electra.cse.Buffalo.EDU> References: <41817EE4.9080302@elischer.org> <20041029010822.GA12081@electra.cse.Buffalo.EDU> <20041029040931.GA920@frontfree.net> <4182A431.2050001@elischer.org> Mime-Version: 1.0 
Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4182A431.2050001@elischer.org> User-Agent: Mutt/1.4.1i cc: re@freebsd.org cc: John Baldwin cc: Daniel Eischen cc: David Xu cc: threads@freebsd.org cc: Ken Smith Subject: Re: MFC req for 5.x/5.3 X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Oct 2004 20:25:09 -0000 On Fri, Oct 29, 2004 at 01:12:33PM -0700, Julian Elischer wrote: > Xin LI wrote: > >Hi, Ken, > > > >On Thu, Oct 28, 2004 at 09:08:22PM -0400, Ken Smith wrote: > > > > > >>>>>>Revision Changes Path > >>>>>>1.58 +1 -0 src/lib/libpthread/thread/thr_create.c > >>>>>>1.14 +1 -1 src/lib/libpthread/thread/thr_find_thread.c > >>>>>>1.115 +27 -10 src/lib/libpthread/thread/thr_kern.c > >>>>>>1.119 +15 -11 src/lib/libpthread/thread/thr_private.h > >>>>>>1.81 +1 -2 src/lib/libpthread/thread/thr_sig.c > >>>>>> > >>>>>> > > > >JFYI: I have tested that a RELENG_5 with ``cvs up -A'' in > >src/lib/libpthread/ > >works quite well and I have tested these on my notebook (P4-M based). > > > > though this is not equivalent to doing a -A.. there are a few minor > differences not being MFC'd.. Yup. That's why I thought we would need to be a little bit careful with this, it's a bit more complicated than it first seems. There is a chance for example that a piece that's not being MFC-ed added an extra #include and if the new code that's being MFC-ed relies on that it can be a bit tough to catch with the first attempt. My asking for caution on this wasn't a reflection on Julian, I'd ask anyone to be this careful about this particular MFC because it doesn't look like it's a straight "MFC everything". 
And doing an "MFC everything" for a library like this is risky at the RC2 stage, it's possible pieces of what gets swept in could have an impact (possibly negative) on the packages that use it. It would be a bit of a gamble. Thanks for your work on this guys. Greatly appreciated. -- Ken Smith - From there to here, from here to | kensmith@cse.buffalo.edu there, funny things are everywhere. | - Theodore Geisel | From owner-freebsd-threads@FreeBSD.ORG Fri Oct 29 21:09:12 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 78E8016A4CE for ; Fri, 29 Oct 2004 21:09:12 +0000 (GMT) Received: from mail4.speakeasy.net (mail4.speakeasy.net [216.254.0.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 48E8443D48 for ; Fri, 29 Oct 2004 21:09:12 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 9332 invoked from network); 29 Oct 2004 21:09:12 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 29 Oct 2004 21:09:11 -0000 Received: from [10.50.40.221] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i9TL97gj039257; Fri, 29 Oct 2004 17:09:07 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: David Xu Date: Fri, 29 Oct 2004 16:09:03 -0400 User-Agent: KMail/1.6.2 References: <20041028203900.GF47792@freebsd3.cimlogic.com.au> <41818397.8090303@freebsd.org> In-Reply-To: <41818397.8090303@freebsd.org> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200410291609.03243.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: Daniel Eischen cc: threads@FreeBSD.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: 
freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Oct 2004 21:09:12 -0000 On Thursday 28 October 2004 07:41 pm, David Xu wrote: > John Birrell wrote: > >On Thu, Oct 28, 2004 at 03:54:07PM -0400, John Baldwin wrote: > >>We've started testing on -current and are seeing several problems with > >>libpthread. Using a UP kernel (machines have single processor with HTT) > >>seems to make it better, but we seem to be getting SIG 11's in > >>pthread_testcancel() as well as the failed lock assertions that were > >>mentioned earlier on the list in the PR. Just running monodevelop from > >> the bsd-sharp stuff mentioned earlier can break in that one of the > >> processes dies with the assertion failure. If you let the other > >> processes run, then you can run it again and get the window to pop up, > >> but then clicking on any of the controls results in the > >> pthread_testcancel() crash. FWIW, I think the reason that the stack > >> traces look weird in the PR's thread may be due to catching a signal. > >> When we were looking at the problems with libc_r on 4.x we would get > >> some weird looking backtraces sometimes when the assertion in > >> uthread_sig.c that I added failed. Seems that gdb doesn't handle the > >> signal frames very well. > > > >I have a server running -current as of July 23 which runs a process that > > often SIG 11's in pthread_testcancel() too. I've never been able to make > > sense of the back trace because it always shows the initialisation path > > for a module, yet for the process to run and serve web requests, that > > initialisation path must have been completed. I've assumed there is a bug > > in my code elsewhere in the application and that GDB is telling me the > > truth. 
> > It would be nice if you could provide some example code, even if the code > may contains bug, it is still good for me to see how pthread_cancel can > cause SIG 11, because pthread_cancel seems checking everything carefully. Unfortunately the only sample code I have right now is monodevelop built from the bsd-sharp stuff. I don't have any smaller samples. Note also that it's not pthread_cancel(), but pthread_testcancel(). -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-threads@FreeBSD.ORG Fri Oct 29 21:09:20 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 84FC616A4CF for ; Fri, 29 Oct 2004 21:09:20 +0000 (GMT) Received: from mail6.speakeasy.net (mail6.speakeasy.net [216.254.0.206]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0ECBA43D5A for ; Fri, 29 Oct 2004 21:09:20 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 17400 invoked from network); 29 Oct 2004 21:09:19 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 29 Oct 2004 21:09:18 -0000 Received: from [10.50.40.221] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i9TL97gl039257; Fri, 29 Oct 2004 17:09:14 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: Daniel Eischen Date: Fri, 29 Oct 2004 16:56:25 -0400 User-Agent: KMail/1.6.2 References: In-Reply-To: MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200410291656.25609.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: threads@FreeBSD.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: 
freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Oct 2004 21:09:20 -0000 On Thursday 28 October 2004 06:43 pm, Daniel Eischen wrote: > On Thu, 28 Oct 2004, Daniel Eischen wrote: > > On Thu, 28 Oct 2004, John Baldwin wrote: > > > On Wednesday 27 October 2004 06:30 pm, Daniel Eischen wrote: > > > > On Wed, 27 Oct 2004, John Baldwin wrote: > > > > > FWIW, we are having (I think) the same problem on 5.3 with > > > > > libpthread. The panic there is in the mutex code about an assertion > > > > > failing because a thread is on a syncq when it is not supposed to > > > > > be. > > > > > > > > David and I recently fixed some races in pthread_join() and > > > > pthread_exit() in -current libpthread. Don't know if those > > > > were responsible... > > > > > > > > Here's a test program that shows correct behavior with both > > > > libc_r and libpthread in -current. > > > > > > We've started testing on -current and are seeing several problems with > > > libpthread. Using a UP kernel (machines have single processor with > > > HTT) seems to make it better, but we seem to be getting SIG 11's in > > > pthread_testcancel() as well as the failed lock assertions that were > > > mentioned earlier on the list in the PR. Just running monodevelop from > > > the bsd-sharp stuff mentioned earlier can break in that one of the > > > processes dies with the assertion failure. If you let the other > > > processes run, then you can run it again and get the window to pop up, > > > but then clicking on any of the controls results in the > > > pthread_testcancel() crash. FWIW, I think the reason that the stack > > > traces look weird in the PR's thread may be due to catching a signal. 
> > > When we were looking at the problems with libc_r on 4.x we would get > > > some weird looking backtraces sometimes when the assertion in > > > uthread_sig.c that I added failed. Seems that gdb doesn't handle the > > > signal frames very well. > > > > You also want to make sure you're not running out of stack space > > for your threads. > > > > Is the code trying to install signal frames on threads itself? > > That could cause the problems you are seeing. > > I went back to the monodoc test case in the PR. Running under > the debugger gives this: > > (gdb) run /usr/local/lib/mono/1.0/mcs.exe -out:browser.exe ./browser.cs > ./list.cs ./elabel.cs ./history.cs > ./Contributions.cs ./XmlNodeWriter.cs > -resource:./../monodoc.png,monodoc.png > -resource:./browser.glade,browser.glade -pkg:gtkhtml-sharp -pkg:glade-sharp > -r:System.Web.Services -r:./monodoc.dll Starting program: > /usr/local/bin/mono /usr/local/lib/mono/1.0/mcs.exe -out:browser.exe > ./browser.cs ./list.cs > ./elabel.cs ./history.cs ./Contributions.cs > ./XmlNodeWriter.cs -resource:./../monodoc.png,monodoc.png > -resource:./browser.glade,browser.glade -pkg:gtkhtml-sharp > -pkg:glade-sharp -r:System.Web.Services -r:./monodoc.dll > [Switching to Thread 1 (LWP 100074)] > > Breakpoint 1, 0x0804862e in main () > (gdb) cont > Continuing. 
> [Switching to Thread 4 (LWP 100128)] > > Breakpoint 2, 0x2842c801 in __assert () from /lib/libc.so.5 > (gdb) bt > #0 0x2842c801 in __assert () from /lib/libc.so.5 > #1 0x2837ce4e in _lock_acquire (lck=0x8062f00, lu=0x8110e48, > prio=674751930) at /opt/FreeBSD/src/lib/libpthread/sys/lock.c:171 > #2 0x2837010b in mutex_lock_common (curthread=0x8110e00, m=0x28482434, > abstime=0x0) at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:495 > #3 0x28371677 in __pthread_mutex_lock (m=0x28482434) > at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:796 > #4 0x28171cc6 in WaitForSingleObjectEx (handle=0xe, timeout=500, > alertable=0) at handles-private.h:97 #5 0x2816b116 in CreateProcess > (appname=0xd, cmdline=0x8092ac4, process_attrs=0x0, thread_attrs=0x0, > inherit_handles=1, create_flags=1024, new_environ=0x0, cwd=0x0, > startup=0xbf8ec78c, process_info=0xbf8ec77c) at processes.c:427 > #6 0x2813ef4f in ves_icall_System_Diagnostics_Process_Start_internal > (appname=0x80f89d8, cmd=0x8092ab8, dirname=0x808ff30, > stdin_handle=0x2837e5ba, stdout_handle=0x2837e5ba, > stderr_handle=0x2837e5ba, process_info=0xbf8ec964) at process.c:870 #7 > 0x28f548ff in ?? () > #8 0x080f89d8 in ?? () > #9 0x08092ab8 in ?? () > #10 0x0808ff30 in ?? () > #11 0x00000009 in ?? () > #12 0x0000000d in ?? () > #13 0x0000000b in ?? () > #14 0xbf8ec964 in ?? () > #15 0x0812d420 in ?? () > #16 0x0812d408 in ?? () > #17 0x0820d300 in ?? () > #18 0x0808ff30 in ?? () > #19 0x08092ab8 in ?? () > #20 0x080f89d8 in ?? () > #21 0xbf8ec838 in ?? () > #22 0x28f548cc in ?? () > #23 0xbf8ec98c in ?? () > #24 0x28f542aa in ?? () > ---Type to continue, or q to quit--- > #25 0x080f89d8 in ?? () > #26 0x08092ab8 in ?? () > #27 0x0808ff30 in ?? () > #28 0x00000009 in ?? () > #29 0x0000000d in ?? () > #30 0x0000000b in ?? () > #31 0xbf8ec964 in ?? 
() > #32 0x28371bfe in mutex_unlock_common (m=0xb, add_reference=134818488) > at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:984 > Previous frame inner to this frame (corrupt stack?) > (gdb) info threads > 5 Thread 2 (LWP 100137) 0x2837bfd3 in kse_release () at kse_release.S:2 > 4 Thread 3 (sleeping) 0x28373d0f in _thr_sched_switch_unlocked > (curthread=0x8110000) at pthread_md.h:225 > * 3 Thread 4 (LWP 100128) 0x2842c801 in __assert () from /lib/libc.so.5 > 2 Thread 1 (sleeping) 0x28373d0f in _thr_sched_switch_unlocked > (curthread=0x8053000) at pthread_md.h:225 > (gdb) thread 3 > [Switching to thread 3 (Thread 4 (LWP 100128))]#0 0x2842c801 in __assert > () from /lib/libc.so.5 (gdb) frame 2 > #2 0x2837010b in mutex_lock_common (curthread=0x8110e00, m=0x28482434, > abstime=0x0) at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:495 > 495 THR_LOCK_ACQUIRE(curthread, &(*m)->m_lock); > (gdb) print curthread->uniqueid > $36 = 3 > (gdb) print/x curthread->magic > $37 = 0xd09ba115 > (gdb) print/x **m > $39 = {m_lock = {l_head = 0x7273752f, l_tail = 0x636f6c2f, l_type = > 0x6c2f6c61, l_wait = 0x6d2f6269, l_wakeup = 0x726f6373}, m_type = > 0x2e62696c, m_protocol = 0x7c6c6c64, m_queue = { tqh_first = 0x74737953, > tqh_last = 0x522e6d65}, m_owner = 0x69746e75, m_flags = 0x532e656d, m_count > = 0x61697265, m_refcount = 0x617a696c, m_prio = 0x6e6f6974, m_saved_prio = > 0x6553492e, m_qe = {tqe_next = 0x6c616972, tqe_prev = 0x62617a69}} > > The thread seems to be correct, but the mutex is trashed. It's not > a valid mutex and the lock type (l_type) does indeed have LCK_PRIORITY > set. Note that libpthread doesn't create any locks of this type, so > this trips the assertion failure. Actually, I think we are looking at a buffer overflow in mono. 
If you treat the mutex as a string and print it you get this:

> echo "7273752f636f6c2f6c2f6c616d2f6269726f63732e62696c7c6c6c6474737953522e6d6569746e75532e656d61697265617a696c6e6f69746553492e,6c61697262617a69" | sed -e 's/../ 0x&/g' | dh
rsu/col/l/lam/birocs.bil|lldtsySR.meitnuS.emaireazilnoiteSI.

That looks like it is in network order but is a path name. Putting it back in host order gives:

/usr/local/lib/mscorlib.dll|System.Runtime.Serialization.ISe

Which is the filename and part of the class name for a .NET class, so this is starting to look like mono overflowed some malloc'd buffer that happened to be next to the mutex. Actually, the string starts at the beginning of the mutex, so it looks more like there is a bug in mono where it is treating a pthread_mutex_t as a char * or some such. Looks like it would be in mono_class_from_name() or something that it calls since that is the only place that string seems to come from.

Oh, this is a known bug in mono-1.0 that has problems with freeing mutexes and using them afterwards. To see what we are doing you'll have to get mono-1.0.2 using the stuff from the bsd sharp project. We can look at this again ourselves though, since our problems may be another case of mono using a mutex after it has been freed.
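
[Archive note] The byte-order step above can be reproduced without the local `dh` helper. What follows is only a minimal sketch, not the procedure actually used in the thread: it assumes every field shown in the gdb `print/x **m` output is a 32-bit value on a little-endian i386 machine, and the names `MUTEX_WORDS` and `words_to_text` are made up for illustration.

```python
import struct

# Field values copied from the gdb `print/x **m` output quoted in this thread,
# in structure order (m_lock.l_head first, m_qe.tqe_prev last).
# Assumption: each field is 4 bytes wide, as on i386.
MUTEX_WORDS = [
    0x7273752f, 0x636f6c2f, 0x6c2f6c61, 0x6d2f6269, 0x726f6373,  # m_lock
    0x2e62696c, 0x7c6c6c64,                                      # m_type, m_protocol
    0x74737953, 0x522e6d65,                                      # m_queue
    0x69746e75, 0x532e656d, 0x61697265, 0x617a696c,              # m_owner .. m_refcount
    0x6e6f6974, 0x6553492e,                                      # m_prio, m_saved_prio
    0x6c616972, 0x62617a69,                                      # m_qe
]


def words_to_text(words):
    """Lay each 32-bit value out as little-endian bytes and decode as ASCII."""
    raw = b"".join(struct.pack("<I", w) for w in words)
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    # Prints the memory contents in host (little-endian) byte order.
    print(words_to_text(MUTEX_WORDS))
```

Run over all seventeen fields, this yields `/usr/local/lib/mscorlib.dll|System.Runtime.Serialization.ISerializab`, which is the string quoted above plus the last two words (the `m_qe` pointers) that followed the comma in the pasted hex.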
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-threads@FreeBSD.ORG Fri Oct 29 23:58:22 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 09C6816A4CE; Fri, 29 Oct 2004 23:58:22 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id EC60143D41; Fri, 29 Oct 2004 23:58:21 +0000 (GMT) (envelope-from davidxu@freebsd.org) Received: from [127.0.0.1] (davidxu@localhost [127.0.0.1]) i9TNwJsg020817; Fri, 29 Oct 2004 23:58:20 GMT (envelope-from davidxu@freebsd.org) Message-ID: <4182D91F.6020603@freebsd.org> Date: Sat, 30 Oct 2004 07:58:23 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.2) Gecko/20040921 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Ken Smith References: <41817EE4.9080302@elischer.org> <20041029010822.GA12081@electra.cse.Buffalo.EDU> <20041029040931.GA920@frontfree.net> <4182A431.2050001@elischer.org> <20041029202458.GE9533@electra.cse.Buffalo.EDU> In-Reply-To: <20041029202458.GE9533@electra.cse.Buffalo.EDU> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: threads@freebsd.org cc: re@freebsd.org cc: Julian Elischer cc: John Baldwin Subject: Re: MFC req for 5.x/5.3 X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Oct 2004 23:58:22 -0000 Ken Smith wrote: >Yup. That's why I thought we would need to be a little bit careful >with this, it's a bit more complicated than it first seems. 
There >is a chance for example that a piece that's not being MFC-ed added >an extra #include and if the new code that's being MFC-ed relies on >that it can be a bit tough to catch with the first attempt. My >asking for caution on this wasn't a reflection on Julian, I'd ask >anyone to be this careful about this particular MFC because it doesn't >look like it's a straight "MFC everything". And doing an "MFC everything" >for a library like this is risky at the RC2 stage, it's possible pieces >of what gets swept in could have an impact (possibly negative) on the >packages that use it. It would be a bit of a gamble. > >Thanks for your work on this guys. Greatly appreciated. > > > Because the library in RELENG_5 can not pass my stress test: http://people.freebsd.org/~davidxu/thread_stress/joinstress.c I think it is a real world test case for web server like program, without this patches, I don't think libpthread can be used under heavily loaded environment. David Xu From owner-freebsd-threads@FreeBSD.ORG Sat Oct 30 00:20:49 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1A29716A4CE; Sat, 30 Oct 2004 00:20:49 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id D2EC943D49; Sat, 30 Oct 2004 00:20:48 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 5B4967A427; Fri, 29 Oct 2004 17:20:48 -0700 (PDT) Message-ID: <4182DE60.8030805@elischer.org> Date: Fri, 29 Oct 2004 17:20:48 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: David Xu References: <41817EE4.9080302@elischer.org> <20041029010822.GA12081@electra.cse.Buffalo.EDU> <20041029040931.GA920@frontfree.net> 
<4182A431.2050001@elischer.org> <20041029202458.GE9533@electra.cse.Buffalo.EDU> <4182D91F.6020603@freebsd.org> In-Reply-To: <4182D91F.6020603@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: threads@freebsd.org cc: Ken Smith cc: re@freebsd.org cc: John Baldwin Subject: Re: MFC req for 5.x/5.3 X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 30 Oct 2004 00:20:49 -0000

when compiling these against RELENG_5 I see a couple of warnings in libpthread..

cc -O -pipe -DPTHREAD_KERNEL -I/usr/src/lib/libpthread/../libc/include -I/usr/src/lib/libpthread/thread -I/usr/src/lib/libpthread/../../include -I/usr/src/lib/libpthread/arch/i386/include -I/usr/src/lib/libpthread/sys -I/usr/src/lib/libpthread/../../libexec/rtld-elf -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall -I/usr/src/lib/libpthread/../libc/i386 -c /usr/src/lib/libpthread/thread/thr_kern.c
/usr/src/lib/libpthread/thread/thr_kern.c: In function `_kse_alloc':
/usr/src/lib/libpthread/thread/thr_kern.c:2204: warning: 'crit' might be used uninitialized in this function

cc -O -pipe -DPTHREAD_KERNEL -I/usr/src/lib/libpthread/../libc/include -I/usr/src/lib/libpthread/thread -I/usr/src/lib/libpthread/../../include -I/usr/src/lib/libpthread/arch/i386/include -I/usr/src/lib/libpthread/sys -I/usr/src/lib/libpthread/../../libexec/rtld-elf -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall -I/usr/src/lib/libpthread/../libc/i386 -c /usr/src/lib/libpthread/thread/thr_mutex.c
/usr/src/lib/libpthread/thread/thr_mutex.c: In function `_pthread_mutex_init':
/usr/src/lib/libpthread/thread/thr_mutex.c:111: warning: 'type' might be used uninitialized in this function
/usr/src/lib/libpthread/thread/thr_mutex.c:112: warning: 'protocol' might be used uninitialized in this function
/usr/src/lib/libpthread/thread/thr_mutex.c:113: warning: 'ceiling' might be used uninitialized in this function
/usr/src/lib/libpthread/thread/thr_mutex.c:114: warning: 'flags' might be used uninitialized in this function

cc -O -pipe -DPTHREAD_KERNEL -I/usr/src/lib/libpthread/../libc/include -I/usr/src/lib/libpthread/thread -I/usr/src/lib/libpthread/../../include -I/usr/src/lib/libpthread/arch/i386/include -I/usr/src/lib/libpthread/sys -I/usr/src/lib/libpthread/../../libexec/rtld-elf -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall -I/usr/src/lib/libpthread/../libc/i386 -c /usr/src/lib/libpthread/thread/thr_sem.c
/usr/src/lib/libpthread/thread/thr_sem.c: In function `_sem_init':
/usr/src/lib/libpthread/thread/thr_sem.c:126: warning: assignment makes integer from pointer without a cast

cc -O -pipe -DPTHREAD_KERNEL -I/usr/src/lib/libpthread/../libc/include -I/usr/src/lib/libpthread/thread -I/usr/src/lib/libpthread/../../include -I/usr/src/lib/libpthread/arch/i386/include -I/usr/src/lib/libpthread/sys -I/usr/src/lib/libpthread/../../libexec/rtld-elf -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall -I/usr/src/lib/libpthread/../libc/i386 -c /usr/src/lib/libpthread/thread/thr_symbols.c
In file included from /usr/src/lib/libpthread/../../libexec/rtld-elf/rtld.h:40,
from /usr/src/lib/libpthread/thread/thr_symbols.c:37:
/usr/src/lib/libpthread/../../libexec/rtld-elf/i386/rtld_machdep.h: In function `reloc_jmpslot':
/usr/src/lib/libpthread/../../libexec/rtld-elf/i386/rtld_machdep.h:49: warning: implicit declaration of function `dbg'

Dan/David.. any of these anything to worry about? if not I'll commit to RELENG_5

David Xu wrote: > Ken Smith wrote: > >> Yup.
That's why I thought we would need to be a little bit careful >> with this, it's a bit more complicated than it first seems. There >> is a chance for example that a piece that's not being MFC-ed added >> an extra #include and if the new code that's being MFC-ed relies on >> that it can be a bit tough to catch with the first attempt. My >> asking for caution on this wasn't a reflection on Julian, I'd ask >> anyone to be this careful about this particular MFC because it doesn't >> look like it's a straight "MFC everything". And doing an "MFC >> everything" >> for a library like this is risky at the RC2 stage, it's possible pieces >> of what gets swept in could have an impact (possibly negative) on the >> packages that use it. It would be a bit of a gamble. >> >> Thanks for your work on this guys. Greatly appreciated. >> >> >> > Because the library in RELENG_5 can not pass my stress test: > http://people.freebsd.org/~davidxu/thread_stress/joinstress.c > I think it is a real world test case for web server like program, > without this patches, I don't think libpthread can be used under > heavily loaded environment. 
> > David Xu > > _______________________________________________ > freebsd-threads@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-threads > To unsubscribe, send any mail to > "freebsd-threads-unsubscribe@freebsd.org"

From owner-freebsd-threads@FreeBSD.ORG Sat Oct 30 00:25:06 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 95FF916A4CF; Sat, 30 Oct 2004 00:25:06 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 826C843D4C; Sat, 30 Oct 2004 00:25:06 +0000 (GMT) (envelope-from davidxu@freebsd.org) Received: from [127.0.0.1] (davidxu@localhost [127.0.0.1]) i9U0P4xT026664; Sat, 30 Oct 2004 00:25:05 GMT (envelope-from davidxu@freebsd.org) Message-ID: <4182DF64.2020505@freebsd.org> Date: Sat, 30 Oct 2004 08:25:08 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.2) Gecko/20040921 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Elischer References: <41817EE4.9080302@elischer.org> <20041029010822.GA12081@electra.cse.Buffalo.EDU> <20041029040931.GA920@frontfree.net> <4182A431.2050001@elischer.org> <20041029202458.GE9533@electra.cse.Buffalo.EDU> <4182D91F.6020603@freebsd.org> <4182DE60.8030805@elischer.org> In-Reply-To: <4182DE60.8030805@elischer.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: threads@freebsd.org cc: Ken Smith cc: re@freebsd.org cc: John Baldwin Subject: Re: MFC req for 5.x/5.3 X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 30 Oct 2004 00:25:06 -0000

I don't worry about these, I saw them a long time ago.
:-) Julian Elischer wrote: > when compiling these against RELENG_5 I see a couple of warnings in > libpthread.. > > cc -O -pipe -DPTHREAD_KERNEL > -I/usr/src/lib/libpthread/../libc/include -I/usr/s > rc/lib/libpthread/thread -I/usr/src/lib/libpthread/../../include > -I/usr/src/lib > /libpthread/arch/i386/include -I/usr/src/lib/libpthread/sys > -I/usr/src/lib/libpt > hread/../../libexec/rtld-elf > -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i3 > 86 -fno-builtin -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall > -I/usr/src/lib/libpth > read/../libc/i386 -c /usr/src/lib/libpthread/thread/thr_kern.c > /usr/src/lib/libpthread/thread/thr_kern.c: In function `_kse_alloc': > /usr/src/lib/libpthread/thread/thr_kern.c:2204: warning: 'crit' might > be used un > initialized in this function From owner-freebsd-threads@FreeBSD.ORG Sat Oct 30 04:56:48 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 23A3616A4CE; Sat, 30 Oct 2004 04:56:48 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD77D43D53; Sat, 30 Oct 2004 04:56:47 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i9U4uiOM000970; Sat, 30 Oct 2004 00:56:44 -0400 (EDT) Date: Sat, 30 Oct 2004 00:56:44 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Julian Elischer In-Reply-To: <4182DE60.8030805@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: threads@freebsd.org cc: Ken Smith cc: re@freebsd.org cc: David Xu cc: John Baldwin Subject: Re: MFC req for 5.x/5.3 X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Sat, 30 Oct 2004 04:56:48 -0000

On Fri, 29 Oct 2004, Julian Elischer wrote:

> when compiling these against RELENG_5 I see a couple of warnings in
> libpthread..
>
> cc -O -pipe -DPTHREAD_KERNEL -I/usr/src/lib/libpthread/../libc/include -I/usr/src/lib/libpthread/thread -I/usr/src/lib/libpthread/../../include -I/usr/src/lib/libpthread/arch/i386/include -I/usr/src/lib/libpthread/sys -I/usr/src/lib/libpthread/../../libexec/rtld-elf -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall -I/usr/src/lib/libpthread/../libc/i386 -c /usr/src/lib/libpthread/thread/thr_kern.c
> /usr/src/lib/libpthread/thread/thr_kern.c: In function `_kse_alloc':
> /usr/src/lib/libpthread/thread/thr_kern.c:2204: warning: 'crit' might be used uninitialized in this function
>
> cc -O -pipe -DPTHREAD_KERNEL -I/usr/src/lib/libpthread/../libc/include -I/usr/src/lib/libpthread/thread -I/usr/src/lib/libpthread/../../include -I/usr/src/lib/libpthread/arch/i386/include -I/usr/src/lib/libpthread/sys -I/usr/src/lib/libpthread/../../libexec/rtld-elf -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall -I/usr/src/lib/libpthread/../libc/i386 -c /usr/src/lib/libpthread/thread/thr_mutex.c
> /usr/src/lib/libpthread/thread/thr_mutex.c: In function `_pthread_mutex_init':
> /usr/src/lib/libpthread/thread/thr_mutex.c:111: warning: 'type' might be used uninitialized in this function
> /usr/src/lib/libpthread/thread/thr_mutex.c:112: warning: 'protocol' might be used uninitialized in this function
> /usr/src/lib/libpthread/thread/thr_mutex.c:113: warning: 'ceiling' might be used uninitialized in this function
> /usr/src/lib/libpthread/thread/thr_mutex.c:114: warning: 'flags' might be used uninitialized in this function
>
> cc -O -pipe -DPTHREAD_KERNEL -I/usr/src/lib/libpthread/../libc/include -I/usr/src/lib/libpthread/thread -I/usr/src/lib/libpthread/../../include -I/usr/src/lib/libpthread/arch/i386/include -I/usr/src/lib/libpthread/sys -I/usr/src/lib/libpthread/../../libexec/rtld-elf -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall -I/usr/src/lib/libpthread/../libc/i386 -c /usr/src/lib/libpthread/thread/thr_sem.c
> /usr/src/lib/libpthread/thread/thr_sem.c: In function `_sem_init':
> /usr/src/lib/libpthread/thread/thr_sem.c:126: warning: assignment makes integer from pointer without a cast
>
> cc -O -pipe -DPTHREAD_KERNEL -I/usr/src/lib/libpthread/../libc/include -I/usr/src/lib/libpthread/thread -I/usr/src/lib/libpthread/../../include -I/usr/src/lib/libpthread/arch/i386/include -I/usr/src/lib/libpthread/sys -I/usr/src/lib/libpthread/../../libexec/rtld-elf -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall -I/usr/src/lib/libpthread/../libc/i386 -c /usr/src/lib/libpthread/thread/thr_symbols.c
> In file included from /usr/src/lib/libpthread/../../libexec/rtld-elf/rtld.h:40,
>                  from /usr/src/lib/libpthread/thread/thr_symbols.c:37:
> /usr/src/lib/libpthread/../../libexec/rtld-elf/i386/rtld_machdep.h: In function `reloc_jmpslot':
> /usr/src/lib/libpthread/../../libexec/rtld-elf/i386/rtld_machdep.h:49: warning: implicit declaration of function `dbg'
>
> Dan/David.. any of these anything to worry about? if not I'll commit to
> RELENG_5

No, we get those on -current as well.
--
Dan

From owner-freebsd-threads@FreeBSD.ORG Sat Oct 30 22:05:24 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from [IPv6:::1] (hub.freebsd.org [216.136.204.18]) by hub.freebsd.org (Postfix) with ESMTP id 4789416A4CE; Sat, 30 Oct 2004 22:05:23 +0000 (GMT) Message-ID: <41840FB3.2020305@freebsd.org> Date: Sat, 30 Oct 2004 16:03:31 -0600 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.2) Gecko/20040929 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Elischer References: <41817EE4.9080302@elischer.org> In-Reply-To: <41817EE4.9080302@elischer.org> X-Enigmail-Version: 0.86.1.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: threads@freebsd.org cc: re@freebsd.org cc: David Xu cc: John Baldwin Subject: Re: MFC req for 5.x/5.3 X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 30 Oct 2004 22:05:25 -0000

Julian Elischer wrote:
>
> Daniel Eischen wrote:
>
>> On Thu, 28 Oct 2004, Julian Elischer wrote:
>>
>>> David Xu wrote:
>>>
>>>> Here is the cvs log:
>>>>
>>>> Revision  Changes    Path
>>>> 1.58      +1 -0      src/lib/libpthread/thread/thr_create.c
>>>> 1.14      +1 -1      src/lib/libpthread/thread/thr_find_thread.c
>>>> 1.115     +27 -10    src/lib/libpthread/thread/thr_kern.c
>>>> 1.119     +15 -11    src/lib/libpthread/thread/thr_private.h
>>>> 1.81      +1 -2      src/lib/libpthread/thread/thr_sig.c
>>>>
>>>
>>> commit message was:
>>> 1. Move thread list flags into new separate member, and atomically
>>> put DEAD thread on GC list, this closes a race between pthread_join
>>> and thr_cleanup.
>>> 2. Introduce a mutex to protect tcb initialization, tls allocation and
>>> deallocation code in rtld seems no lock protection or it is broken,
>>> under stress testing, memory is corrupted.
>>>
>>> translates to:
>>>
>
> [diff removed]
>
>>
>> Yes, these look right.
>>
>

Julian and all,

I know that re@ approved these a few days ago, but we haven't seen any
activity and we need to get RC2 out so that SACK can get validated and
we can turn to -RELEASE. I know it's very short notice, but I'm going
to retract this MFC approval and instead ask that you only commit it to
RELENG_5.

Scott