Date:      Wed, 8 Sep 2004 12:01:27 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        freebsd-threads@freebsd.org
Subject:   Unkillable KSE threaded proc
Message-ID:  <16703.11479.679335.588170@grasshopper.cs.duke.edu>


If I send a kill -9 to a threaded process in a particular way, it gets
stuck exiting forever.  (The process is started from a /bin/sh script
and killed via ssh $MACHINE skill -9 -u gallatin.)
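
A minimal user-space sketch of that kind of test, assuming a blocked worker can be approximated by pause(): it forks a child that starts a few threads which all sleep, sends the child SIGKILL, and reaps it. The helper names blocked_worker() and kill_threaded_child() are hypothetical, and the suspected real sleep (a cv_wait_sig() inside a driver) cannot be reproduced from user space, so this only exercises the generic kill path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a thread blocked in the kernel. pause() is an interruptible
 * sleep; it is only an approximation of a driver's cv_wait_sig(). */
static void *blocked_worker(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* Fork a child that starts nthreads blocked workers, SIGKILL it, and return
 * the raw wait status (or -1 on setup failure). On a healthy kernel
 * waitpid() returns promptly; a hang like the one in this report would
 * show up as waitpid() never returning. */
static int kill_threaded_child(int nthreads)
{
    int ready[2];
    char c;
    int status;
    pid_t pid;

    assert(nthreads > 0);
    if (pipe(ready) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: create the workers, report readiness, then block too. */
        close(ready[0]);
        for (int i = 0; i < nthreads; i++) {
            pthread_t t;
            if (pthread_create(&t, NULL, blocked_worker, NULL) != 0)
                _exit(2);
        }
        if (write(ready[1], "r", 1) != 1)
            _exit(3);
        for (;;)
            pause();
    }

    /* Parent: block until the child reports that all workers exist. A short
     * read means the child failed during setup; it is still reaped below. */
    close(ready[1]);
    ssize_t n = read(ready[0], &c, 1);
    (void)n;
    close(ready[0]);

    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

On a working system the returned status satisfies WIFSIGNALED() with WTERMSIG() equal to SIGKILL; running such a harness repeatedly is one way to check whether the exit path ever wedges.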

It shows up in a ddb ps like this:

3403 c1652540 e52fe000 1387     1  3401 000c402 (threaded) mx_pingpong
   thread 0xc2de4c60 ksegrp 0xc15b2200 [SUSP]

Doing a trace shows what I assume is the main thread waiting for
the other thread to exit:

db> tr 3403
sched_switch(c2de4c60,0,41508ec8,a87f7f6d,ffc00014) at sched_switch+0xa5
mi_switch(1,0,e89a3c44,c051f91d,c2de4c60) at mi_switch+0x1b6
thread_single(1,c06ea9c0,e89a3c64,c1652540,c2de4c60) at thread_single+0x1e0
exit1(c2de4c60,9,0,e89a3ce4,c0519447) at exit1+0x11d
expand_name(c2de4c60,9,100,0,0) at expand_name
postsig(9,202,c06e5db8,17f,8058f84) at postsig+0x204
ast(e89a3d48) at ast+0x5e7
doreti_ast() at doreti_ast+0x17
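
The trace suggests the exiting thread is parked in thread_single(), waiting for the rest of the process to quiesce. As a rough illustration only, here is a toy user-space analogue of that kind of wait built on a pthread mutex and condition variable. It is not FreeBSD's thread_single(); struct model_proc, model_worker(), model_single_thread_exit() and run_model() are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WORKERS 16

/* Toy model of a process; every name here is hypothetical. */
struct model_proc {
    pthread_mutex_t lock;
    pthread_cond_t  changed;   /* broadcast whenever the fields below change */
    int             nthreads;  /* live threads, including the exiting one */
    bool            exiting;   /* set once by the exiting thread */
};

static struct model_proc P = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false
};

/* Other threads: sleep until the exit request is visible, then leave.
 * A thread that never re-checks P.exiting would leave the exiter below
 * waiting forever, which is the general shape of the hang in the trace. */
static void *model_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&P.lock);
    while (!P.exiting)
        pthread_cond_wait(&P.changed, &P.lock);
    P.nthreads--;
    pthread_cond_broadcast(&P.changed);
    pthread_mutex_unlock(&P.lock);
    return NULL;
}

/* Exiting thread: announce the exit, then sleep until it is the last
 * thread left. Returns the final thread count (1 on success). */
static int model_single_thread_exit(void)
{
    int left;

    pthread_mutex_lock(&P.lock);
    P.exiting = true;
    pthread_cond_broadcast(&P.changed);
    while (P.nthreads != 1)
        pthread_cond_wait(&P.changed, &P.lock);
    left = P.nthreads;
    pthread_mutex_unlock(&P.lock);
    return left;
}

/* Start nworkers threads, run the exit protocol, and reap the workers. */
static int run_model(int nworkers)
{
    pthread_t tids[MAX_WORKERS];

    assert(nworkers >= 0 && nworkers <= MAX_WORKERS);
    P.nthreads = 1 + nworkers;
    P.exiting = false;
    for (int i = 0; i < nworkers; i++) {
        int rc = pthread_create(&tids[i], NULL, model_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    int left = model_single_thread_exit();
    for (int i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);
    return left;
}
```

The exiter's wait only terminates because every worker eventually observes the flag and decrements the count; any thread that is asleep somewhere the request does not reach keeps the count above one indefinitely.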

Looking at the proc in kgdb:

(kgdb) p $proc
$1 = (struct proc *) 0xc1652540
(kgdb) p * $proc
$2 = {
  p_list = {
    le_next = 0xc1b66e00, 
    le_prev = 0xc1b858c0
  }, 
  p_ksegrps = {
    tqh_first = 0xc2de3880, 
    tqh_last = 0xc15b2204
  }, 
  p_threads = {
    tqh_first = 0xc2de4c60, 
    tqh_last = 0xc2de4c68
  }, 
  p_suspended = {
    tqh_first = 0xc2de4c60, 
    tqh_last = 0xc2de4c88
  }, 
  p_ucred = 0xc1ac7d80, 
  p_fd = 0xc187d300, 
  p_fdtol = 0x0, 
  p_stats = 0xe52fe000, 
  p_limit = 0xc1bf1700, 
  p_upages_obj = 0xc0c4218c, 
  p_sigacts = 0xc21bc000, 
  p_flag = 0xc402, 
  p_sflag = 0x1, 
  p_state = PRS_NORMAL, 
  p_pid = 0xd4b, 
  p_hash = {
    le_next = 0x0, 
    le_prev = 0xc155552c
  }, 
  p_pglist = {
    le_next = 0x0, 
    le_prev = 0xc1b64248
  }, 
  p_pptr = 0xc1561e00, 
  p_sibling = {
    le_next = 0xc1876c40, 
    le_prev = 0xc1561e68
  }, 
  p_children = {
    lh_first = 0x0
  }, 
  p_mtx = {
    mtx_object = {
      lo_class = 0xc06e90bc, 
      lo_name = 0xc06bb669 "process lock", 
      lo_type = 0xc06bb669 "process lock", 
      lo_flags = 0x430000, 
      lo_list = {
        tqe_next = 0x0, 
        tqe_prev = 0x0
      }, 
      lo_witness = 0x0
    }, 
    mtx_lock = 0x4, 
    mtx_recurse = 0x0
  }, 
  p_oppid = 0x0, 
  p_vmspace = 0xc1af0258, 
  p_swtime = 0x3a7, 
  p_realtimer = {
    it_interval = {
      tv_sec = 0x0, 
      tv_usec = 0x0
    }, 
    it_value = {
      tv_sec = 0x0, 
      tv_usec = 0x0
    }
  }, 
  p_runtime = {
    sec = 0x8, 
    frac = 0x8ce499fd61838320
  }, 
  p_uu = 0x6ecf00, 
  p_su = 0x13a8da, 
  p_iu = 0x1, 
  p_uticks = 0x3a2, 
  p_sticks = 0xa5, 
  p_iticks = 0x0, 
  p_profthreads = 0x0, 
  p_maxthrwaits = 0x0, 
  p_traceflag = 0x0, 
  p_tracevp = 0x0, 
  p_tracecred = 0x0, 
  p_textvp = 0xc1dfaa50, 
  p_siglist = {
    __bits = {0x0, 0x0, 0x0, 0x0}
  }, 
  p_lock = 0x0, 
  p_sigiolst = {
    slh_first = 0x0
  }, 
  p_sigparent = 0x14, 
  p_sig = 0x0, 
  p_code = 0x0, 
  p_stops = 0x0, 
  p_stype = 0x0, 
  p_step = 0x0, 
  p_pfsflags = 0x0, 
  p_nlminfo = 0x0, 
  p_aioinfo = 0x0, 
  p_singlethread = 0xc2de4c60, 
  p_suspcount = 0x1, 
  p_xthread = 0x0, 
  p_magic = 0xbeefface, 
  p_comm = "mx_pingpong\0\0\0\0\0\0\0\0", 
  p_pgrp = 0xc1b64240, 
  p_sysent = 0xc06ff000, 
  p_args = 0xc2de0480, 
  p_cpulimit = 0x7fffffffffffffff, 
  p_nice = 0x0, 
  p_xstat = 0x0, 
  p_klist = {
    kl_lock = 0xc16525ac, 
    kl_list = {
      slh_first = 0x0
    }
  }, 
  p_numthreads = 0x1, 
  p_numksegrps = 0x2, 
  p_md = {
    md_ldt = 0xc181f5c0
  }, 
  p_itcallout = {
    c_links = {
      sle = {
        sle_next = 0x0
      }, 
      tqe = {
        tqe_next = 0x0, 
        tqe_prev = 0x0
      }
    }, 
    c_time = 0x0, 
    c_arg = 0x0, 
    c_func = 0, 
    c_flags = 0x8
  }, 
  p_uarea = 0xe52fe000, 
  p_acflag = 0x10, 
  p_ru = 0x0, 
  p_peers = 0x0, 
  p_leader = 0xc1652540, 
  p_emuldata = 0x0, 
  p_label = 0x0, 
  p_sched = 0xc1652700
}


This is happening as of this morning with RELENG_5 (SCHED_4BSD)
and with a ~3 month old 5-current (SCHED_4BSD).  It seems to 
happen on both i386 and amd64.

Question:  Does the ddb ps output indicate that there is another thread in the kernel?
If so, how the heck can I get a trace of it?  Neither
0xc2de4c60 nor 0xc15b2200 shows another stack when passed to
ddb's tr.

I suspect the other thread is sleeping in a cv_wait_sig() in my driver,
but it would be nice to know for sure.

% ldd mx_pingpong
mx_pingpong:
        libpthread.so.1 => /usr/lib/libpthread.so.1 (0x4807e000)
        libc.so.5 => /lib/libc.so.5 (0x480a3000)


Thanks,

Drew







