Date:      Fri, 18 Nov 2005 17:15:59 -0800 (PST)
From:      Garry Belka <garry@NetworkPhysics.COM>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   kern/89262: multi-threaded process hangs in kernel in fork() 
Message-ID:  <200511190115.jAJ1Fxhg061478@focus5.fractal.networkphysics.com>
Resent-Message-ID: <200511190120.jAJ1KTXE021247@freefall.freebsd.org>


>Number:         89262
>Category:       kern
>Synopsis:       multi-threaded process hangs in kernel in fork()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Nov 19 01:20:28 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Garry Belka
>Release:        FreeBSD 5.4-RELEASE and 6.0-RELEASE i386
>Organization:
Network Physics
>Environment:
System: FreeBSD tempo 5.4-RELEASE SMP
>Description:

Occasionally, a Java process hangs and cannot be killed, even with SIGKILL.

Apparently, one of the process's threads calls fork(). In the kernel, fork1() attempts to enter single-threaded mode, but thread_single() never completes: it waits until every thread except p->p_singlethread is suspended, and one of the remaining threads (thread 100616 in the dump below) is not suspended but has only the SLEEP flag set.
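The condition thread_single() is waiting for can be illustrated with a small user-space model. This is a sketch, not kernel code: the enum, model_single_done() and the state array are hypothetical, and the states follow the report's description of the dump (thread 100616 sleeping, the other threads suspended, thread 100830 the single-threading caller).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-thread states for this model only;
 * they are not the kernel's td_inhibitors bits. */
enum model_state { M_RUNNING, M_SLEEPING, M_SUSPENDED };

/*
 * Model of the condition thread_single() waits for: every thread except
 * the single-threading one (index `self`) must be suspended. A thread
 * that is merely sleeping does not count.
 */
static bool model_single_done(const enum model_state *st, size_t n, size_t self)
{
    assert(self < n);
    for (size_t i = 0; i < n; i++) {
        if (i != self && st[i] != M_SUSPENDED)
            return false;
    }
    return true;
}

/* States in the order the dump lists the threads; index 5 is the caller. */
static const enum model_state dump_states[6] = {
    M_SUSPENDED, /* 100351 */
    M_SUSPENDED, /* 100948 */
    M_SLEEPING,  /* 100616: sbwait, SLEEPING not SUSPENDED */
    M_SUSPENDED, /* 100906 */
    M_SUSPENDED, /* 100605 */
    M_RUNNING,   /* 100830: p_singlethread; its own state is not examined */
};
```

With thread 100616 only sleeping, model_single_done(dump_states, 6, 5) is false, and it stays false until that thread is suspended.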

pid    thread    thid  flags  inhib pflags  comm         wchan

 1982 0xcd150180 100351 00020c00    1 0088 java <sched_switch+323>
    mi_switch + 426 in section .text
    thread_suspend_check + 298 in section .text
    userret + 58 in section .text
    fork_return + 18 in section .text
    fork_exit + 102 in section .text
 1982 0xce120c00 100948 00000c00    1 0880 java <sched_switch+323>
    mi_switch + 426 in section .text
    thread_suspend_check + 298 in section .text
    userret + 58 in section .text
    ast + 844 in section .text
 1982 0xcd740900 100616 00000808    2 0080 java         sbwait cd557320
    mi_switch + 426 in section .text              (SLEEPING, not SUSPENDED)
    sleepq_switch + 164 in section .text
    sleepq_wait_sig + 12 in section .text
    msleep + 566 in section .text
    sbwait + 56 in section .text
    soreceive + 572 in section .text
    soo_read + 65 in section .text
    dofileread + 173 in section .text
    read + 59 in section .text
    syscall + 551 in section .text
 1982 0xc3ae7900 100906 00000808    1 0080 java
    mi_switch + 426 in section .text
    sleepq_switch + 164 in section .text
    sleepq_wait_sig + 12 in section .text
    msleep + 566 in section .text
    sbwait + 56 in section .text
    soreceive + 572 in section .text
    soo_read + 65 in section .text
    dofileread + 173 in section .text
    read + 59 in section .text
    syscall + 551 in section .text
 1982 0xcd719780 100605 00000c00    1 0880 java
    mi_switch + 426 in section .text
    thread_suspend_check + 298 in section .text
    userret + 58 in section .text
    ast + 844 in section .text
 1982 0xcd6d9000 100830 00000000    1 0880 java         (p_singlethread)
    mi_switch + 426 in section .text     - line 355
    thread_single + 497 in section .text - line 863
    fork1 + 169 in section .text         - line 257
    fork + 24 in section .text
    syscall + 551 in section .text

While the process is in single-threading mode, signals are not actually delivered: the SIGKILL stays pending on the first thread in the queue, and the result is a deadlock.
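The circular wait can be reduced to a one-line predicate. This is a hypothetical model (struct model_wait and model_sleeper_can_wake() are not kernel names) assuming, as stated above, that no signal is delivered while single-threading is in progress, and that the thread in sbwait() (an interruptible sleep, via sleepq_wait_sig) leaves it only when data arrives or a signal is delivered.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the situation described above (model only). */
struct model_wait {
    bool single_threading; /* fork1() is inside thread_single() */
    bool signal_pending;   /* e.g. the SIGKILL pending on the first thread */
    bool socket_has_data;  /* data arriving would end the sbwait() sleep */
};

/*
 * In this model, can the thread sleeping in sbwait() leave its
 * interruptible sleep? Only if data arrives or a signal is actually
 * delivered, and no signal is delivered while single-threading.
 */
static bool model_sleeper_can_wake(const struct model_wait *w)
{
    bool signal_delivered = w->signal_pending && !w->single_threading;
    return w->socket_has_data || signal_delivered;
}
```

For the hung process (single-threading, SIGKILL pending, no data on the socket) the predicate is false: the sleeper waits for a signal, and the signal waits for single-threading to finish.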


I think we got into this state because the non-suspended thread was running while thread_single() was trying to suspend every other thread. All threads were marked TDF_ASTPENDING, but shortly afterwards ast() failed to handle correctly a thread whose td->td_mailbox was non-NULL: the test below calls thread_user_enter() only for a P_SA process whose td->td_mailbox is NULL, and has no else branch.

sys/kern/subr_trap.c:ast()

        if ((p->p_flag & P_SA) && (td->td_mailbox == NULL))
                thread_user_enter(td);
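The cases of the quoted test can be enumerated with a small model. MODEL_P_SA and model_ast_calls_user_enter() are hypothetical stand-ins (the real P_SA bit value is not reproduced here); the sketch only shows that a thread in a P_SA process with a non-NULL td_mailbox does not take the thread_user_enter() branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for P_SA; not the kernel's value. */
#define MODEL_P_SA 0x1

/* Returns true when the quoted ast() test would call thread_user_enter()
 * for a thread with these process flags and this mailbox pointer. */
static bool model_ast_calls_user_enter(int p_flag, const void *td_mailbox)
{
    return (p_flag & MODEL_P_SA) && td_mailbox == NULL;
}
```

Only the (P_SA, NULL mailbox) combination returns true; the P_SA thread with a mailbox, which is the case described above, falls through with no equivalent check.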
	
>How-To-Repeat:
	Start multiple threads in Java on an SMP machine and have those
threads call system() repeatedly. The hang may take some time
to appear.
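A C analogue of this recipe, using POSIX threads instead of Java, might look like the sketch below. It is an assumption that it exercises the same kernel path: the ast() test quoted above only matters for P_SA (scheduler-activation) processes, so the outcome depends on the threading library the program is linked against. run_repro() and struct repro_arg are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-worker parameters for this sketch. */
struct repro_arg {
    int iterations; /* how many times to call system() */
    int calls_made; /* calls that returned, whatever their status */
};

static void *repro_worker(void *argp)
{
    struct repro_arg *a = argp;
    for (int i = 0; i < a->iterations; i++) {
        /* system() forks a shell from a multi-threaded process. */
        (void)system("true");
        a->calls_made++;
    }
    return NULL;
}

/* Starts nthreads workers (at most 64 here) and returns the total number
 * of system() calls they completed, or -1 on bad arguments or if a
 * thread could not be created. */
static int run_repro(int nthreads, int iterations)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct repro_arg args[MAX_THREADS];
    int started = 0, total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (; started < nthreads; started++) {
        args[started].iterations = iterations;
        args[started].calls_made = 0;
        if (pthread_create(&tids[started], NULL, repro_worker, &args[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].calls_made;
    }
    return started == nthreads ? total : -1;
}
```

For an actual reproduction attempt, a main() would call run_repro() with several threads and a very large iteration count on an SMP machine, and the program would be linked with -lpthread.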
	
>Fix:

	

--- single_suspend.patch begins here ---
Index: kern/kern_thread.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/kern/kern_thread.c,v
retrieving revision 1.3
diff -u -r1.3 kern_thread.c
--- kern/kern_thread.c  9 Jul 2005 01:27:18 -0000       1.3
+++ kern/kern_thread.c  15 Nov 2005 03:01:22 -0000
@@ -1001,6 +1001,18 @@
 }
 
 void
+thread_check_single_suspend(struct thread *td)
+{
+        struct proc *p = td->td_proc;
+
+        if (__predict_false(P_SHOULDSTOP(p))) {
+                PROC_LOCK(p);
+                thread_suspend_check(0);
+                PROC_UNLOCK(p);
+        }
+}
+
+void
 thread_unsuspend_one(struct thread *td)
 {
        struct proc *p = td->td_proc;
Index: kern/subr_trap.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/kern/subr_trap.c,v
retrieving revision 1.1.1.2
diff -u -r1.1.1.2 subr_trap.c
--- kern/subr_trap.c    8 Jul 2005 03:01:08 -0000       1.1.1.2
+++ kern/subr_trap.c    15 Nov 2005 03:01:23 -0000
@@ -171,6 +171,8 @@
 
        if ((p->p_flag & P_SA) && (td->td_mailbox == NULL))
                thread_user_enter(td);
+        else 
+               thread_check_single_suspend(td);
        /*
         * This updates the p_sflag's for the checks below in one
         * "atomic" operation with turning off the astpending flag.
Index: sys/proc.h
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/sys/proc.h,v
retrieving revision 1.1.1.5
diff -u -r1.1.1.5 proc.h
--- sys/proc.h  8 Jul 2005 03:07:51 -0000       1.1.1.5
+++ sys/proc.h  15 Nov 2005 03:01:28 -0000
@@ -887,6 +887,7 @@
 void   ksegrp_unlink(struct ksegrp *kg);
 void   thread_signal_add(struct thread *td, int sig);
 struct thread *thread_alloc(void);
+void   thread_check_single_suspend(struct thread *td);
 void   thread_exit(void) __dead2;
 int    thread_export_context(struct thread *td, int willexit);
 void   thread_free(struct thread *td);
--- single_suspend.patch ends here ---
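The idea behind the patch, an unlocked fast-path test of P_SHOULDSTOP(p) followed by a locked call into thread_suspend_check(), can be approximated in user space. In this hypothetical model a pthread mutex and condition variable stand in for PROC_LOCK and the kernel's suspend/wakeup machinery, and the mproc_* names are invented for illustration; it does not model KSE, ast() itself, or threads sleeping in sbwait().

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; mp_lock ~ PROC_LOCK, should_stop ~ P_SHOULDSTOP(p). */
struct mproc {
    pthread_mutex_t mp_lock;
    pthread_cond_t  mp_cv;
    atomic_bool     should_stop;
    atomic_bool     done;      /* tells workers to exit (model only) */
    int             suspcount; /* workers parked at the check */
};
#define MPROC_INITIALIZER \
    { .mp_lock = PTHREAD_MUTEX_INITIALIZER, .mp_cv = PTHREAD_COND_INITIALIZER }

/* Analogue of thread_check_single_suspend(): cheap unlocked test first,
 * then take the lock, recheck, and park until the request is cleared. */
static void mproc_check_suspend(struct mproc *p)
{
    if (!atomic_load(&p->should_stop))
        return;
    pthread_mutex_lock(&p->mp_lock);
    if (atomic_load(&p->should_stop)) {
        p->suspcount++;
        pthread_cond_broadcast(&p->mp_cv); /* let the single-threader recount */
        while (atomic_load(&p->should_stop))
            pthread_cond_wait(&p->mp_cv, &p->mp_lock);
        p->suspcount--;
    }
    pthread_mutex_unlock(&p->mp_lock);
}

/* A worker that keeps "returning to user mode" and so keeps hitting the check. */
static void *mproc_worker(void *arg)
{
    struct mproc *p = arg;
    while (!atomic_load(&p->done)) {
        mproc_check_suspend(p);
        sched_yield();
    }
    return NULL;
}

/* Analogue of thread_single(): request a stop, wait until `others`
 * workers are parked, and return the parked count. */
static int mproc_single(struct mproc *p, int others)
{
    int parked;
    pthread_mutex_lock(&p->mp_lock);
    atomic_store(&p->should_stop, true);
    while (p->suspcount != others)
        pthread_cond_wait(&p->mp_cv, &p->mp_lock);
    parked = p->suspcount;
    pthread_mutex_unlock(&p->mp_lock);
    return parked;
}

/* Clear the stop request and release the parked workers. */
static void mproc_unsingle(struct mproc *p)
{
    pthread_mutex_lock(&p->mp_lock);
    atomic_store(&p->should_stop, false);
    pthread_cond_broadcast(&p->mp_cv);
    pthread_mutex_unlock(&p->mp_lock);
}
```

Every worker reaches mproc_check_suspend() on each loop iteration, so mproc_single() returns once all of them are parked. A worker that never reached the check (the analogue of thread 100616) would keep suspcount below `others`, and mproc_single() would wait forever, mirroring the hang in the dump.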


>Release-Note:
>Audit-Trail:
>Unformatted:


