From: Andrew Gallatin
Date: Wed, 29 Sep 2004 17:45:31 -0400 (EDT)
To: Julian Elischer
cc: David Xu, freebsd-threads@freebsd.org
Subject: Re: easy to reproduce unkillable threads

I tried a -current kernel (w/o your patch) from today (still RELENG_5 userland), and I still see the problem.

% ssh scream 'skill -9 -u gallatin'
Connection to scream closed by remote host.
% ssh scream 'ps axH | grep testc'
  580  ??  SLs    0:00.01 csh -c ps axH | grep testc
  586  ??  RL     0:00.00 grep testc
  535  p0- WL     0:06.21 ./testcdev

On scream's console, send a break to the debugger:

Stopped at      kdb_enter+0x30: leave
db> ps
  pid   proc     uarea   uid  ppid  pgrp  flag    stat  wmesg  wchan  cmd
  548 c1a39c40 e67ee000 1387  547   548 0004002 [SLPQ ttyin 0xc1830c10][SLP] csh
  547 c1a39a80 e67ed000 1387  545   545 0000100 [SLPQ select 0xc071aaa4][SLP] sshd
  545 c1817000 e5556000    0  450   545 0000100 [SLPQ sbwait 0xc1991320][SLP] sshd
  535 c1a34e00 e67e6000 1387    1   535 020c482 (threaded) testcdev
        thread 0xc164dc80 ksegrp 0xc15e57e0 [SUSP]
  511 c1a34a80 e67e4000    0    1   511 0004002 [SLPQ ttyin 0xc1705810][SLP] getty
db> trace 535
sched_switch(c164dc80,c164daf0,1,4ec51334,ed18649a) at sched_switch+0x137
mi_switch(1,c164daf0,0,0,0) at mi_switch+0x1d4
thread_single(1,c164dc80,0,0,0) at thread_single+0x1d7
exit1(c164dc80,9,0,0,c051996e) at exit1+0x115
expand_name(c164dc80,9,100,0,0) at expand_name
postsig(9,c164dc80,0,0,0) at postsig+0x204
ast(e52d1d48) at ast+0x5e4
doreti_ast() at doreti_ast+0x17
db> c

It seems to be a problem only with skill -9; skill -2 works fine.

As I said before, libthr seems to behave differently.
Rather than a single lingering thread, the polling thread (the one doing the while(1)) is stuck on the CPU (using 100% of one CPU on a dual-processor system), and the thread that was doing the cv_wait() is stuck with exactly the same stack as above:

  629 c1a1da80 e67e7000 1387    1   629 0004482 (threaded) testcdev
        thread 0xc164dc80 ksegrp 0xc15e54d0 [SUSP]
        thread 0xc1879af0 ksegrp 0xc15e54d0 [CPU 1]
db> trace 629
sched_switch(c164dc80,0,1,b5d71f28,b4e1d6c8) at sched_switch+0x137
mi_switch(1,0,c1870880,c164dc80,c164dc80) at mi_switch+0x1d4
thread_single(1,c164dc80,e52d1c54,c1b14100,c164dc80) at thread_single+0x1d7
exit1(c164dc80,9,0,e52d1ce4,c051996e) at exit1+0x115
expand_name(c164dc80,9,100,0,0) at expand_name
postsig(9,246,c06e7bd0,36,bfafefb4) at postsig+0x1a4
ast(e52d1d48) at ast+0x5e4
doreti_ast() at doreti_ast+0x17
db> c

Drew