From: John Baldwin
To: Andrew Gallatin
Cc: Julian Elischer, freebsd-threads@FreeBSD.org
Date: Thu, 16 Sep 2004 13:16:43 -0400
Subject: Re: Unkillable KSE threaded proc
Message-Id: <200409161316.43010.jhb@FreeBSD.org>

On Thursday 16 September 2004 09:42 am, Andrew Gallatin wrote:
> Julian Elischer writes:
> > Andrew, please try -current on its own now..
> > I have checked in some fixes that have helped others.
>
> OK, preemption off... Still a system lockup, but a little different.
>
> The interesting thing here is that continuing and breaking into the
> debugger repeatedly seems to show that thread 0xc1646af0 is looping in
> exit. I've seen him in thread_single, thread_suspend_check, and in
> exit itself at kern_exit.c:163, etc. A breakpoint in
> thread_suspend_one never triggers, so I guess he's holding the proc
> lock and just looping forever. A breakpoint in _mtx_assert() shows
> him asserting the proc lock in thread_suspend_check at kern_thread.c:898.
> Over and over.

There is definitely some sort of infinite loop here. Stripping out the
comments in exit1() for that section of code reveals basically:

        PROC_LOCK(p);
        if (p->p_flag & P_HADTHREADS) {
retry:
                thread_suspend_check(0);
                if (thread_single(SINGLE_EXIT))
                        goto retry;
        }
        p->p_flag |= P_WEXIT;
        PROC_UNLOCK(p);

So I think it's easy to see how it can get stuck in a loop. If
thread_single() never drops the proc lock, then the other threads that are
waiting to die can never acquire it, so they can never actually exit, and
the exiting thread keeps retrying.

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org