From: Andrew Gallatin
Date: Tue, 14 Sep 2004 10:34:39 -0400 (EDT)
To: Julian Elischer
Cc: John Baldwin, freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
Message-ID: <16711.383.448500.578640@grasshopper.cs.duke.edu>
In-Reply-To: <4146AAC1.5020701@elischer.org>

Julian Elischer writes:
> Andrew Gallatin wrote:
> > Julian Elischer writes:
> > > > Maybe this would be easier to debug if I disabled preemption?
> > >
> > > I think that this would possibly GO AWAY if you disabled preemption,
> > > which would make it very hard to debug :-)
> >
> > Yes and no. You initially asked me to try in -current because of
> > some changes you'd made to the exit code. RELENG_5 (with the old
> > exit code and no preemption) shows a different problem (proc is
> > just not killable). If the proc was killable without preemption,
> > that would at least show your new code is better..
>
> try the attached diff:

This is worse. It's worse in that the application never starts running
fully, and that it seems to ignore signals entirely. I can't attach a
debugger to it to see how far it got before hanging due to the signal
problem. When it hangs (both before and after a signal is sent), the
CPU utilization is 0%.

Before it's sent a signal, it looks like this:

  573 c1f3b8c0 e88ae000 1387   517   573 000c082 (threaded)  mx_pingpong
  thread 0xc1f3e320 ksegrp 0xc19ead20 [RUNQ]
  thread 0xc1f3e4b0 ksegrp 0xc19ead20 [RUNQ]
  thread 0xc1f3e640 ksegrp 0xc19eaaf0 [SLPQ ksesigwait 0xc1f3b9c0][SLP]

db> call db_trace_thread(0xc1f3e320, -1)
sched_switch(c1f3e320,0,1,1862ccb2,994777d8) at sched_switch+0x137
mi_switch(1,0,c05fdf59,804c000,c2b8c2ec) at mi_switch+0x1ce
turnstile_wait(c1a518c0,c06c53e0,c1a4d7d0,0,1) at turnstile_wait+0x339
_mtx_lock_sleep(c06c53e0,c1f3e320,0,0,0) at _mtx_lock_sleep+0x122
vm_fault(c187a5dc,804c000,1,0,0) at vm_fault+0x214
trap_pfault(e88b8d48,1,804c800,3,804c800) at trap_pfault+0x136
trap(2f,2f,2f,805d13c,805d13c) at trap+0x201
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x804c800, esp = 0xbfbfe66c, ebp = 0xbfbfe678 ---
0

db> call db_trace_thread(0xc1f3e4b0, -1)
sched_switch(c1f3e4b0,0,1,f0007932,9935c3e9) at sched_switch+0x137
mi_switch(1,0,c19ead60,e88bbc5c,c1f3e4b0) at mi_switch+0x1ce
sleepq_switch(c19ead60,c1f3e4b0,0,e88bbc94,c04e5da6) at sleepq_switch+0x171
sleepq_timedwait_sig(c19ead60,0,c1f3b92c,c0677640,100) at sleepq_timedwait_sig+0x13
msleep(c19ead60,c1f3b92c,168,c0677640,1771) at msleep+0x37b
kse_release(c1f3e4b0,e88bbd14,4,c04c47ab,0) at kse_release+0x29b
syscall(2f,2f,2f,8054200,0) at syscall+0x2fc
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280a3d4f, esp = 0x8194f80, ebp = 0x8194fbc ---
0

db> call db_trace_thread(0xc1f3e640, -1)
sched_switch(c1f3e640,0,1,bc7c14b2,97d6ec54) at sched_switch+0x137
mi_switch(1,0,0,0,0) at mi_switch+0x1ce
sleepq_switch(c1f3b9c0,c1f3e640,0,e88bec94,c04e5da6) at sleepq_switch+0x171
sleepq_timedwait_sig(c1f3b9c0,0,0,0,0) at sleepq_timedwait_sig+0x13
msleep(c1f3b9c0,c1f3b92c,168,c0677635,bb9) at msleep+0x37b
kse_release(c1f3e640,e88bed14,4,c04c47ab,0) at kse_release+0x1a1
syscall(2f,2f,2f,1,81) at syscall+0x2fc
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280a3d4f, esp = 0xbfafef30, ebp = 0xbfafef8c ---
0

A different run, but after sending it a ^C from the command line:

  547 c1f3b1c0 e88aa000    0     1   547 000c482 (threaded)  mx_pingpong
  thread 0xc1f3e960 ksegrp 0xc19eaee0 [RUNQ]
  thread 0xc1f3eaf0 ksegrp 0xc19eaee0 [RUNQ]
  thread 0xc1f3ec80 ksegrp 0xc19eab60 [SUSP]

db> call db_trace_thread(0xc1f3e960, -1)
sched_switch(c1f3e960,0,2,e7ff39b6,d6d80c8c) at sched_switch+0x137
mi_switch(2,0,0,0,0) at mi_switch+0x1ce
ast(e88c4d48) at ast+0x4eb
doreti_ast() at doreti_ast+0x17
0

db> call db_trace_thread(0xc1f3eaf0, -1)
sched_switch(c1f3eaf0,0,1,6e2ca4e6,d6924d2f) at sched_switch+0x137
mi_switch(1,0,c19eaf20,e88c7c5c,c1f3eaf0) at mi_switch+0x1ce
sleepq_switch(c19eaf20,c1f3eaf0,0,e88c7c94,c04e5da6) at sleepq_switch+0x171
sleepq_timedwait_sig(c19eaf20,0,c1f3b22c,c0677640,100) at sleepq_timedwait_sig+0x13
msleep(c19eaf20,c1f3b22c,168,c0677640,1771) at msleep+0x37b
kse_release(c1f3eaf0,e88c7d14,4,c04c47ab,0) at kse_release+0x29b
syscall(2f,2f,2f,8054200,0) at syscall+0x2fc
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280a3d4f, esp = 0x8194f80, ebp = 0x8194fbc ---
0

db> call db_trace_thread(0xc1f3ec80, -1)
sched_switch(c1f3ec80,0,1,26e24232,4249ca0b) at sched_switch+0x137
mi_switch(1,0,0,0,0) at mi_switch+0x1ce
thread_single(1,c1f3ec80,c1f3b1c0,e88cac5c,c0500581) at thread_single+0x1d7
exit1(c1f3ec80,2,e88cacb8,c04f1736,0) at exit1+0x115
expand_name(c1f3ec80,2,c1f3ec80,e88cad48,0) at expand_name
kse_thr_interrupt(c1f3ec80,e88cad14,c,c1f3ec80,e88cad3c) at kse_thr_interrupt+0x329
syscall(2f,2f,2f,8054100,805a800) at syscall+0x2fc
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (382, FreeBSD ELF32, kse_thr_interrupt), eip = 0x280a3d6f, esp = 0xbfafee60, ebp = 0xbfafeefc ---
0

If you want line number translations, please let me know. I saved
the kernel that this came from and also took a dump.

Drew
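
For the line number translations mentioned above, one possible approach is to pull each symbol+offset pair out of the DDB trace and hand it to gdb running against a debug build of the same kernel. The following is only a minimal sketch: the helper name gdb_queries is hypothetical, and it assumes a debug kernel image matching the saved kernel is available to load into gdb.

```python
import re

# Matches the "at symbol+0xoffset" tail of a DDB stack frame line.
# Frames printed without an offset (e.g. "at expand_name") are skipped.
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_]\w*)\+(0x[0-9a-fA-F]+)")


def gdb_queries(ddb_text):
    """Return one gdb 'list' command per symbol+offset frame, in trace order."""
    return [f"list *({sym}+{off})" for sym, off in FRAME_RE.findall(ddb_text)]


if __name__ == "__main__":
    sample = (
        "sched_switch(c1f3e320,0,1,1862ccb2,994777d8) at sched_switch+0x137\n"
        "mi_switch(1,0,c05fdf59,804c000,c2b8c2ec) at mi_switch+0x1ce\n"
        "expand_name(c1f3ec80,2,c1f3ec80,e88cad48,0) at expand_name\n"
    )
    for cmd in gdb_queries(sample):
        print(cmd)
```

Each printed command can be pasted at a gdb prompt with the debug kernel loaded. The results are only meaningful if that kernel was built from exactly the same sources and options as the one that produced the trace.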