Date:      Fri, 01 Oct 2004 11:22:33 +0800
From:      David Xu <davidxu@freebsd.org>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        freebsd-threads@freebsd.org
Subject:   Re: easy to reproduce unkillable threads
Message-ID:  <415CCD79.3030401@freebsd.org>
In-Reply-To: <16731.11515.504636.53058@grasshopper.cs.duke.edu>
References:  <16728.37731.540143.307772@grasshopper.cs.duke.edu> <41589B4A.9080508@elischer.org>	<415AB791.10809@freebsd.org> <16730.48642.4481.841374@grasshopper.cs.duke.edu> <415B13E8.2090205@elischer.org> <16731.6010.446877.347190@grasshopper.cs.duke.edu> <415B1ED6.8010809@elischer.org> <16731.11515.504636.53058@grasshopper.cs.duke.edu>

Now I can reproduce it on SMP. However, if I run

  sysctl -w kern.smp.forward_signal_enabled=0

then I can always kill it. It sounds like another IPI bug.

David Xu
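To apply or check the same workaround from a program rather than with sysctl(8), here is a minimal C sketch using FreeBSD's sysctlbyname(3). It assumes kern.smp.forward_signal_enabled is an int knob, as the `sysctl -w ...=0` command suggests. Writing it needs root. On non-FreeBSD systems, or on kernels without this knob, the helper simply returns -1. The name forward_signal_sysctl() is hypothetical.

```c
#include <stddef.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: reads kern.smp.forward_signal_enabled and, if
   newval is non-NULL, also writes *newval to it (requires root).
   Returns the previous value, or -1 if the knob cannot be accessed. */
int forward_signal_sysctl(const int *newval)
{
#if defined(__FreeBSD__)
    int old;
    size_t len = sizeof old;

    /* sysctlbyname() returns the old value in 'old' and, when newval is
       non-NULL, installs the new one in the same call. */
    if (sysctlbyname("kern.smp.forward_signal_enabled", &old, &len,
                     newval, newval != NULL ? sizeof *newval : 0) != 0)
        return -1;
    return old;
#else
    (void)newval;
    return -1;  /* not FreeBSD: this sysctl does not exist here */
#endif
}
```

For example, `int zero = 0; forward_signal_sysctl(&zero);` run as root corresponds to the `sysctl -w` command, and `forward_signal_sysctl(NULL)` only reads the current setting.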

Andrew Gallatin wrote:

>I tried a -current kernel (w/o your patch) from today (still RELENG_5
>userland), and I still see the problem.
>
>% ssh scream 'skill -9 -u gallatin'
>Connection to scream closed by remote host.
>
>% ssh scream 'ps axH | grep testc'
>  580  ??  SLs    0:00.01 csh -c ps axH | grep testc
>  586  ??  RL     0:00.00 grep testc
>  535  p0- WL     0:06.21 ./testcdev
>
>On scream's console, send break to debugger..:
>Stopped at      kdb_enter+0x30: leave
>db> ps
>  pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
>  548 c1a39c40 e67ee000 1387   547   548 0004002 [SLPQ ttyin 0xc1830c10][SLP] csh
>  547 c1a39a80 e67ed000 1387   545   545 0000100 [SLPQ select 0xc071aaa4][SLP] sshd
>  545 c1817000 e5556000    0   450   545 0000100 [SLPQ sbwait 0xc1991320][SLP] sshd
>  535 c1a34e00 e67e6000 1387     1   535 020c482 (threaded)  testcdev
>   thread 0xc164dc80 ksegrp 0xc15e57e0 [SUSP]
>  511 c1a34a80 e67e4000    0     1   511 0004002 [SLPQ ttyin 0xc1705810][SLP] getty
>
>db> trace 535
>sched_switch(c164dc80,c164daf0,1,4ec51334,ed18649a) at sched_switch+0x137
>mi_switch(1,c164daf0,0,0,0) at mi_switch+0x1d4
>thread_single(1,c164dc80,0,0,0) at thread_single+0x1d7
>exit1(c164dc80,9,0,0,c051996e) at exit1+0x115
>expand_name(c164dc80,9,100,0,0) at expand_name
>postsig(9,c164dc80,0,0,0) at postsig+0x204
>ast(e52d1d48) at ast+0x5e4
>doreti_ast() at doreti_ast+0x17
>db> c
>
>
>It seems to be just a problem with skill -9.  skill -2 works fine.
>
>As I said before, libthr seems to behave differently.  Rather than a
>lingering thread, the polling thread (doing the while(1)) is stuck on
>the CPU (using 100% of one cpu in a dual system), and the thread which
>was doing the cv_wait() is stuck with the exact same stack as above:
>
> 629 c1a1da80 e67e7000 1387     1   629 0004482 (threaded)  testcdev
>   thread 0xc164dc80 ksegrp 0xc15e54d0 [SUSP]
>   thread 0xc1879af0 ksegrp 0xc15e54d0 [CPU 1]
>
>db> trace 629
>sched_switch(c164dc80,0,1,b5d71f28,b4e1d6c8) at sched_switch+0x137
>mi_switch(1,0,c1870880,c164dc80,c164dc80) at mi_switch+0x1d4
>thread_single(1,c164dc80,e52d1c54,c1b14100,c164dc80) at thread_single+0x1d7
>exit1(c164dc80,9,0,e52d1ce4,c051996e) at exit1+0x115
>expand_name(c164dc80,9,100,0,0) at expand_name
>postsig(9,246,c06e7bd0,36,bfafefb4) at postsig+0x1a4
>ast(e52d1d48) at ast+0x5e4
>doreti_ast() at doreti_ast+0x17
>db> c
>
>
>Drew

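The testcdev source is not included in this thread. What follows is a rough, hypothetical userland approximation of the scenario Drew describes: one thread blocked, one spinning in while(1), then kill -9. It substitutes pthread_cond_wait() for the kernel cv_wait() that testcdev reaches through its character device, so it may well not trigger the kernel bug. It only shows how to detect the symptom, by sending SIGKILL and checking whether the process is reaped within a timeout. The function name reproduce_kill9() and the roughly 5-second timeout are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t never = PTHREAD_COND_INITIALIZER;
static volatile unsigned long spins;

/* Stand-in for the thread sleeping in cv_wait(): waits on a condition
   variable that is never signalled. */
static void *blocked_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;)
        pthread_cond_wait(&never, &lock);
    return NULL;
}

/* Stand-in for the while(1) polling thread: burns a CPU in user mode. */
static void *spinning_thread(void *arg)
{
    (void)arg;
    for (;;)
        spins++;
    return NULL;
}

/* Child body: start both threads, then idle until killed. */
static void run_child(void)
{
    pthread_t a, b;
    if (pthread_create(&a, NULL, blocked_thread, NULL) != 0 ||
        pthread_create(&b, NULL, spinning_thread, NULL) != 0)
        _exit(2);
    for (;;)
        pause();
}

/* Hypothetical check: returns 1 if the child died from SIGKILL, 0 if it
   was still not reaped after ~5 s (the "unkillable" symptom), -1 on error. */
int reproduce_kill9(void)
{
    const struct timespec settle = { 0, 200000000L };  /* 200 ms */
    const struct timespec tick = { 0, 100000000L };    /* 100 ms */
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        run_child();
        _exit(0);  /* not reached */
    }

    nanosleep(&settle, NULL);  /* give both threads time to start */
    if (kill(pid, SIGKILL) != 0)
        return -1;

    for (int i = 0; i < 50; i++) {  /* poll up to 50 x 100 ms */
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }
    return 0;
}
```

On a kernel without the bug, reproduce_kill9() should return 1. A return of 0 means the process survived kill -9 for about 5 seconds, which matches the stuck testcdev shown above.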

