Date:      Mon, 5 Jul 2004 14:06:42 +0400 (MSD)
From:      Igor Sysoev <is@rambler-co.ru>
To:        freebsd-hackers@freebsd.org
Cc:        Jonathan Lemon <jlemon@FreeBSD.org>
Subject:   panic caused by EVFILT_SIGNAL detaching in rfork()ed thread
Message-ID:  <20040705140331.K723@is.park.rambler.ru>

While developing my HTTP server nginx, I got panics caused by detaching
the EVFILT_SIGNAL event. The worker process starts two worker threads
created by rfork(RFPROC|RFTHREAD|RFMEM). Each thread opens a kqueue and
adds an EVFILT_SIGNAL event. If the main thread of the worker process
exits abnormally (on 4.x) or simply exits (on 5.x), the kernel may panic:

panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x4
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc014bc96
stack pointer           = 0x10:0xd41d4dd4
frame pointer           = 0x10:0xd41d4dd4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 396 (nginx)
interrupt mask          = none
trap number             = 12
panic: page fault

[ skipped ]

(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc01496a3 in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc0149ae1 in panic (fmt=0xc023734c "%s") at ../../kern/kern_shutdown.c:595
#3  0xc0200eb7 in trap_fatal (frame=0xd41d4d94, eva=4)
    at ../../i386/i386/trap.c:974
#4  0xc0200b65 in trap_pfault (frame=0xd41d4d94, usermode=0, eva=4)
    at ../../i386/i386/trap.c:867
#5  0xc020070b in trap (frame={tf_fs = -736296944, tf_es = -1072234480,
      tf_ds = -961150960, tf_edi = -736334656, tf_esi = 0,
      tf_ebp = -736277036, tf_isp = -736277056, tf_ebx = -736334592,
      tf_edx = 0, tf_ecx = -736549824, tf_eax = -736334592, tf_trapno = 12,
      tf_err = 0, tf_eip = -1072382826, tf_cs = 8, tf_eflags = 66055,
      tf_esp = -736277000, tf_ss = -1072429537}) at ../../i386/i386/trap.c:466
#6  0xc014bc96 in filt_sigdetach (kn=0xd41c6d00) at ../../kern/kern_sig.c:1741
#7  0xc014061f in kqueue_close (fp=0xc6dc1900, p=0xd4192440)
    at ../../kern/kern_event.c:797
#8  0xc013f277 in fdrop (fp=0xc6dc1900, p=0xd4192440) at ../../sys/file.h:218
#9  0xc013f1bf in closef (fp=0xc6dc1900, p=0xd4192440)
    at ../../kern/kern_descrip.c:1279
#10 0xc013edcc in fdfree (p=0xd4192440) at ../../kern/kern_descrip.c:1061
#11 0xc0141a89 in exit1 (p=0xd4192440, rv=9) at ../../kern/kern_exit.c:188
#12 0xc014b5de in sigexit (p=0xd4192440, sig=9) at ../../kern/kern_sig.c:1503
#13 0xc014b358 in postsig (sig=9) at ../../kern/kern_sig.c:1406
#14 0xc0201397 in syscall2 (frame={tf_fs = 47, tf_es = -1078001617,
      tf_ds = -1078001617, tf_edi = 134823956, tf_esi = 1,
      tf_ebp = -1077938752, tf_isp = -736276524, tf_ebx = 2, tf_edx = 12,
      tf_ecx = 134811400, tf_eax = 0, tf_trapno = 7, tf_err = 2,
      tf_eip = 672013132, tf_cs = 31, tf_eflags = 514, tf_esp = -1077938780,
      tf_ss = 47}) at ../../i386/i386/trap.c:197
#15 0xc01f42a5 in Xint0x80_syscall ()
#16 0x8055d18 in ?? ()
#17 0x80544b3 in ?? ()
#18 0x8055415 in ?? ()
#19 0x8054ed6 in ?? ()
#20 0x8049c02 in ?? ()
#21 0x8049976 in ?? ()
(kgdb) fr 6
#6  0xc014bc96 in filt_sigdetach (kn=0xd41c6d00) at ../../kern/kern_sig.c:1741
1741            SLIST_REMOVE(&p->p_klist, kn, knote, kn_selnext);
(kgdb)  p *(*(struct knote *)0xd41c6d00)->kn_ptr.p_proc
$1 = {p_procq = {tqe_next = 0xd4191c20, tqe_prev = 0xc0279e50}, p_list = {
    le_next = 0x0, le_prev = 0xc0279de4}, p_cred = 0x0, p_fd = 0xc6dc2500,
  p_stats = 0xd41eecd0, p_limit = 0xc6c6b500, p_upages_obj = 0xd41d9c60,
  p_procsig = 0x0, p_flag = 24838, p_stat = 5 '\005', p_pad1 = "\000\000",
  p_pid = 402, p_hash = {le_next = 0x0, le_prev = 0xc6a61e48}, p_pglist = {
    le_next = 0x0, le_prev = 0xc6dac668}, p_pptr = 0xd4192c60, p_sibling = {
    le_next = 0x0, le_prev = 0xd4192cb0}, p_children = {lh_first = 0x0},
  p_ithandle = {callout = 0x0}, p_oppid = 0, p_dupfd = 0, p_vmspace = 0x0,
  p_estcpu = 0, p_cpticks = 0, p_pctcpu = 13, p_wchan = 0x0,
  p_wmesg = 0xc021c35f "ttywai", p_swtime = 5, p_slptime = 0, p_realtimer = {
    it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0,
      tv_usec = 0}}, p_runtime = 180692, p_uu = 31474, p_su = 149503,
  p_iu = 1, p_uticks = 4, p_sticks = 19, p_iticks = 0, p_traceflag = 0,
  p_tracep = 0x0, p_siglist = {__bits = {0, 0, 0, 0}}, p_textvp = 0x0,
  p_lock = 0 '\000', p_oncpu = 0 '\000', p_lastcpu = 0 '\000',
  p_rqindex = 6 '\006', p_locks = 0, p_simple_locks = 0, p_stops = 0,
  p_stype = 0, p_step = 0 '\000', p_pfsflags = 0 '\000', p_pad3 = "\000",
  p_retval = {0, 672525728}, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20,
  p_oldsigmask = {__bits = {0, 0, 0, 0}}, p_sig = 0, p_code = 0, p_klist = {
    slh_first = 0x0}, p_sigmask = {__bits = {0, 0, 0, 0}}, p_sigstk = {
    ss_sp = 0x0, ss_size = 0, ss_flags = 4}, p_priority = 50 '2',
  p_usrpri = 50 '2', p_nice = 0 '\000',
  p_comm = "top\000\000\000ty\000sion\000\000\000", p_pgrp = 0x0,
  p_sysent = 0xc0241900, p_rtprio = {type = 1, prio = 0}, p_prison = 0x0,
  p_args = 0xc6b66e20, p_addr = 0xd41ee000, p_md = {md_regs = 0xd41f0fa8},
  p_xstat = 0, p_acflag = 0, p_ru = 0x0, p_nthreads = 0, p_aioinfo = 0x0,
  p_wakeup = 0, p_peers = 0x0, p_leader = 0xd4191f60, p_asleep = {
    as_priority = 0, as_timo = 0}, p_emuldata = 0x0}
(kgdb)

The following patch against 4.8 resolved the panics:

------------
--- src/sys/kern/kern_sig.c        Wed Jul  3 18:43:27 2002
+++ src/sys/kern/kern_sig.c        Mon Jul  5 10:16:56 2004
@@ -1738,7 +1738,9 @@
 {
        struct proc *p = kn->kn_ptr.p_proc;

-       SLIST_REMOVE(&p->p_klist, kn, knote, kn_selnext);
+       if (!SLIST_EMPTY(&p->p_klist)) {
+               SLIST_REMOVE(&p->p_klist, kn, knote, kn_selnext);
+       }
 }

 /*
------------

For 5.x the patch is similar (tested on 5.2.1).
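
The effect of the guard can be tried in userland with the same
<sys/queue.h> macros. This is only a minimal sketch, not kernel code: the
struct knote and struct proc below are simplified, hypothetical stand-ins,
and kn_id, guarded_detach() and klist_length() are names invented for the
illustration. In the dump above p_klist.slh_first is 0x0, and SLIST_REMOVE
on an empty list starts walking from a NULL pointer whenever the element is
not the head, which appears consistent with the small fault address in the
trace.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct knote {
	SLIST_ENTRY(knote) kn_selnext;  /* linkage on the owner's klist */
	int kn_id;                      /* illustrative payload only */
};

SLIST_HEAD(klist, knote);

struct proc {
	struct klist p_klist;           /* knotes attached to this process */
};

/*
 * Guarded detach with the same shape as the patch: skip the removal when
 * the list is already empty. Returns 1 if SLIST_REMOVE ran, 0 if skipped.
 */
static int
guarded_detach(struct proc *p, struct knote *kn)
{
	assert(p != NULL && kn != NULL);

	if (SLIST_EMPTY(&p->p_klist))
		return 0;
	SLIST_REMOVE(&p->p_klist, kn, knote, kn_selnext);
	return 1;
}

/* Count the knotes currently linked on the list. */
static int
klist_length(struct proc *p)
{
	struct knote *kn;
	int n = 0;

	SLIST_FOREACH(kn, &p->p_klist, kn_selnext)
		n++;
	return n;
}
```

Note that the guard covers only the empty-list case. If the list is
non-empty but does not contain kn, SLIST_REMOVE still walks off the end,
so the check treats the symptom seen here rather than every possible
ordering of the two teardown paths.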


By the way, I found that an rfork()ed thread gets EVFILT_SIGNAL events
only if that same thread added the filter to its kqueue. If one thread adds
the filter, another thread does not get that filter's events. This is probably
caused by the implementation of the EVFILT_SIGNAL filter: like EVFILT_PROC, it
uses p->p_klist. I think this should be documented in the man page.
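
Given that behaviour, each thread has to do the registration itself. The
following is a minimal sketch of the per-thread step, assuming a helper
named open_signal_kqueue() (a hypothetical name, not taken from nginx). It
uses only kqueue(), EV_SET() and kevent(). The HAVE_KQUEUE guard and the
ENOSYS fallback are additions so the file also compiles on systems without
<sys/event.h>.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE 1
#endif

/*
 * Open a kqueue and register EVFILT_SIGNAL for signo on it.
 * Meant to be called by the thread that will later wait on the kqueue.
 * Returns the kqueue descriptor, or -1 with errno set.
 */
static int
open_signal_kqueue(int signo)
{
#ifdef HAVE_KQUEUE
	struct kevent kev;
	int kq, saved;

	kq = kqueue();
	if (kq == -1)
		return -1;

	/* ident is the signal number for EVFILT_SIGNAL. */
	EV_SET(&kev, signo, EVFILT_SIGNAL, EV_ADD | EV_ENABLE, 0, 0, NULL);

	/* Apply the change only; no events are fetched here. */
	if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1) {
		saved = errno;
		close(kq);
		errno = saved;
		return -1;
	}
	return kq;
#else
	/* No kqueue on this platform (assumed fallback for the sketch). */
	(void)signo;
	errno = ENOSYS;
	return -1;
#endif
}
```

A thread would call open_signal_kqueue(SIGHUP), for example, and then wait
on the returned descriptor with kevent(). The signal's default action still
applies, so the process normally also ignores or handles the signal via
signal() or sigaction().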


Igor Sysoev
http://sysoev.ru/en/