From owner-freebsd-threads@FreeBSD.ORG Mon Jul 11 11:02:27 2005
Return-Path:
X-Original-To: freebsd-threads@freebsd.org
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0A82F16A42D
	for ; Mon, 11 Jul 2005 11:02:27 +0000 (GMT)
	(envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B0B1943D78
	for ; Mon, 11 Jul 2005 11:02:20 +0000 (GMT)
	(envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.3/8.13.3) with ESMTP id j6BB2Kig011621
	for ; Mon, 11 Jul 2005 11:02:20 GMT
	(envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost)
	by freefall.freebsd.org (8.13.3/8.13.1/Submit) id j6BB2Ju2011615
	for freebsd-threads@freebsd.org; Mon, 11 Jul 2005 11:02:19 GMT
	(envelope-from owner-bugmaster@freebsd.org)
Date: Mon, 11 Jul 2005 11:02:19 GMT
Message-Id: <200507111102.j6BB2Ju2011615@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster
To: freebsd-threads@FreeBSD.org
Cc:
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 11 Jul 2005 11:02:27 -0000

Current FreeBSD problem reports

Critical problems

S  Submitted    Tracker        Resp.    Description
-------------------------------------------------------------------------------
o [2005/01/26] threads/76690  threads  fork hang in child for (-lc_r & -lthr)
o [2005/05/11] threads/80887  threads  ULE with SMP broke libpthread/libthr on 5

2 problems total.

Serious problems

S  Submitted    Tracker        Resp.    Description
-------------------------------------------------------------------------------
o [2000/07/18] kern/20016     threads  pthreads: Cannot set scheduling timer/Can
o [2000/08/26] kern/20861     threads  libc_r does not honor socket timeouts
o [2001/01/20] threads/24472  threads  libc_r does not honor SO_SNDTIMEO/SO_RCVT
o [2001/01/25] threads/24632  threads  libc_r delicate deviation from libc in ha
o [2001/01/25] kern/24641     threads  pthread_rwlock_rdlock can deadlock
o [2001/11/26] bin/32295      threads  pthread dont dequeue signals
o [2002/02/01] threads/34536  threads  accept() blocks other threads
o [2002/05/25] kern/38549     threads  the procces compiled whith pthread stoppe
o [2002/06/27] threads/39922  threads  [PATCH?] Threaded applications executed w
o [2002/08/04] kern/41331     threads  Pthread library open sets O_NONBLOCK flag
o [2003/03/02] threads/48856  threads  Setting SIGCHLD to SIG_IGN still leaves z
o [2003/03/10] threads/49087  threads  Signals lost in programs linked with libc
o [2003/05/08] threads/51949  threads  thread in accept cannot be cancelled
s [2004/03/15] kern/64313     threads  FreeBSD (OpenBSD) pthread implicit set/un
o [2004/08/26] threads/70975  threads  unexpected and unreliable behaviour when
o [2004/09/14] threads/71725  threads  Mysql Crashes frequently giving Sock Erro
o [2004/10/05] threads/72353  threads  Assertion fails in /usr/src/lib/libpthrea
o [2004/10/07] threads/72429  threads  threads blocked in stdio (fgets, etc) are
o [2004/10/21] threads/72953  threads  fork() unblocks blocked signals w/o PTHRE
o [2004/12/19] threads/75273  threads  FBSD 5.3 libpthread (KSE) bug
o [2004/12/21] threads/75374  threads  pthread_kill() ignores SA_SIGINFO flag
o [2005/01/26] threads/76694  threads  fork cause hang in dup()/close() function
o [2005/03/10] threads/78660  threads  Java hangs unkillably in STOP state after
o [2005/04/08] threads/79683  threads  svctcp_create() fails if multiple threads
o [2005/04/28] threads/80435  threads  panic on high loads
o [2005/05/19] threads/81258  threads  Thread specific data is sometimes assigne

26 problems total.

Non-critical problems

S  Submitted    Tracker        Resp.    Description
-------------------------------------------------------------------------------
o [2000/05/26] kern/18824     threads  gethostbyname is not thread safe
o [2000/06/13] kern/19247     threads  uthread_sigaction.c does not do anything
o [2000/10/21] kern/22190     threads  A threaded read(2) from a socketpair(2) f
o [2001/09/09] threads/30464  threads  pthread mutex attributes -- pshared
o [2002/05/02] threads/37676  threads  libc_r: msgsnd(), msgrcv(), pread(), pwri
s [2002/07/16] threads/40671  threads  pthread_cancel doesn't remove thread from
o [2004/07/13] threads/69020  threads  pthreads library leaks _gc_mutex
o [2004/09/21] threads/71966  threads  Mlnet Core Dumped : Fatal error '_pq_inse
o [2004/11/21] threads/74180  threads  KSE problem. Applications those riched ma
o [2005/01/20] threads/76513  threads  libpthread is not working
o [2005/04/13] threads/79887  threads  [patch] freopen() isn't thread-safe
o [2005/05/13] threads/80992  threads  abort() sometimes not caught by gdb depen
o [2005/05/26] threads/81534  threads  [PATCH] libc_r close() will fail on any f

13 problems total.

From owner-freebsd-threads@FreeBSD.ORG Thu Jul 14 16:25:24 2005
Message-ID: <42D691F2.3030201@palisadesys.com>
Date: Thu, 14 Jul 2005 11:25:22 -0500
From: Guy Helmer
To: freebsd-threads@freebsd.org
Subject: system scope threads entering STOP state

I have a long-running multithreaded process on FreeBSD 5.4 (SMP,
PREEMPTION, SCHED_4BSD) linked with libpthread and I'm creating the
threads with attribute PTHREAD_SCOPE_SYSTEM. The threads need to be
processing input in near-real-time or its input buffers overflow.

I've modified the program so that a thread can fork/execl/waitpid
(without WNOHANG) to use an external program for further processing on
a batch of input (sometimes via a pipe, other times via writing to a
file). However, even under a light input load, the program is now
dropping input. While running top(1) in thread mode, I occasionally
find all the program's threads are in the STOP state for several
consecutive seconds. Is there anything related to the frequent use of
fork, execve, or wait4 that would be likely to cause such a
situation? I'm not seeing anything obvious in my reading of the
kernel sources.

Thanks in advance for any help,
Guy

-- 
Guy Helmer, Ph.D.
Principal System Architect
Palisade Systems, Inc.

From owner-freebsd-threads@FreeBSD.ORG Thu Jul 14 19:17:20 2005
Message-ID: <42D6BA3E.1000306@elischer.org>
Date: Thu, 14 Jul 2005 12:17:18 -0700
From: Julian Elischer
To: Guy Helmer
References: <42D691F2.3030201@palisadesys.com>
In-Reply-To: <42D691F2.3030201@palisadesys.com>
Cc: freebsd-threads@freebsd.org
Subject: Re: system scope threads entering STOP state

Guy Helmer wrote:

> I have a long-running multithreaded process on FreeBSD 5.4 (SMP,
> PREEMPTION, SCHED_4BSD) linked with libpthread and I'm creating the
> threads with attribute PTHREAD_SCOPE_SYSTEM. The threads need to be
> processing input in near-real-time or its input buffers overflow.
>
> I've modified the program so that a thread can fork/execl/waitpid
> (without WNOHANG) to use an external program for further processing on
> a batch of input (sometimes via a pipe, other times via writing to a
> file). However, even under a light input load, the program is now
> dropping input. While running top(1) in thread mode, I occasionally
> find all the program's threads are in the STOP state for several
> consecutive seconds. Is there anything related to the frequent use of
> fork, execve, or wait4 that would be likely to cause such a
> situation? I'm not seeing anything obvious in my reading of the
> kernel sources.

During a fork the parent process is in a variant of the "STOPPED"
state, or, rather, if you look at top -H you should see that all the
threads, except for the one doing the fork, are in the STOPPED state.

This is because while a thread is forking the process needs to be
single threaded so that there is a consistent image to be copied to
the child.

The single threaded state is also entered for exit() and execve(),
though that should not affect your program.

I can't imagine why the state would persist for any length of time,
unless there is another thread that is in an uninterruptible wait. In
that case the other threads have to wait for it to complete what it is
doing and come back. I have considered whether such a thread should
not be considered "already suspended", and in fact some earlier
versions of the code did that; however, it leads to some
inconsistencies and the danger that such a thread will be suspended
holding some resource that it should not hold for any length of time.

>
> Thanks in advance for any help,
> Guy
>

From owner-freebsd-threads@FreeBSD.ORG Fri Jul 15 13:36:00 2005
Message-ID: <42D7BBB8.9050207@palisadesys.com>
Date: Fri, 15 Jul 2005 08:35:52 -0500
From: Guy Helmer
To: Julian Elischer
References: <42D691F2.3030201@palisadesys.com> <42D6BA3E.1000306@elischer.org>
In-Reply-To: <42D6BA3E.1000306@elischer.org>
Cc: freebsd-threads@freebsd.org
Subject: Re: system scope threads entering STOP state

Julian Elischer wrote:

> Guy Helmer wrote:
>
>> I have a long-running multithreaded process on FreeBSD 5.4 (SMP,
>> PREEMPTION, SCHED_4BSD) linked with libpthread and I'm creating the
>> threads with attribute PTHREAD_SCOPE_SYSTEM. The threads need to be
>> processing input in near-real-time or its input buffers overflow.
>>
>> I've modified the program so that a thread can fork/execl/waitpid
>> (without WNOHANG) to use an external program for further processing
>> on a batch of input (sometimes via a pipe, other times via writing to
>> a file). However, even under a light input load, the program is now
>> dropping input. While running top(1) in thread mode, I occasionally
>> find all the program's threads are in the STOP state for several
>> consecutive seconds. Is there anything related to the frequent use
>> of fork, execve, or wait4 that would be likely to cause such a
>> situation? I'm not seeing anything obvious in my reading of the
>> kernel sources.
>
> During a fork the parent process is in a variant of the "STOPPED"
> state, or, rather, if you look at top -H you should see that all the
> threads, except for the one doing the fork, are in the STOPPED state.
>
> This is because while a thread is forking the process needs to be
> single threaded so that there is a consistent image to be copied to
> the child.
>
> The single threaded state is also entered for exit() and execve(),
> though that should not affect your program.
>
> I can't imagine why the state would persist for any length of time,
> unless there is another thread that is in an uninterruptible wait. In
> that case the other threads have to wait for it to complete what it
> is doing and come back. I have considered whether such a thread
> should not be considered "already suspended", and in fact some
> earlier versions of the code did that; however, it leads to some
> inconsistencies and the danger that such a thread will be suspended
> holding some resource that it should not hold for any length of time.
>

Thanks for the explanation. I was that the other threads would be
stopped during a fork(2) but it looked to me like the STOP would be
brief.

Would an "uninterruptible wait" include system calls like a write(2)
of a large buffer? That would explain it...

Thanks,
Guy

-- 
Guy Helmer, Ph.D.
Principal System Architect
Palisade Systems, Inc.

From owner-freebsd-threads@FreeBSD.ORG Fri Jul 15 19:07:35 2005
From: Pascal Hofstee
To: freebsd-threads@freebsd.org
Date: Fri, 15 Jul 2005 11:59:43 -0700
Message-Id: <1121453983.672.15.camel@synergy.charterpipeline.net.lan>
Cc: David Ayers
Subject: GNUstep, libobjc and threading

Hi,

Ever since the GNUstep project decided to change how GNUstep
applications should be linked to external libraries, GNUstep builds on
the FreeBSD platform have appeared to be broken where they used to
function properly before. (The problem is applications segfaulting in
libobjc's thread initialization.)

During the last couple of days, David Ayers, one of the people with an
active GNUstep interest and a willingness to see this issue resolved,
was kind enough to let me give him access to my FreeBSD/amd64
7.0-CURRENT machine in an attempt to get to the bottom of this issue.

His final analysis boils down to the following:

----[ Analysis of GNUstep on FreeBSD pthread problem ]----

The real bug does indeed lie in the internal initialization process of
the FreeBSD libpthread library. In this case, when libobjc is not
explicitly linked into the executable (and is further toward the
bottom of the ldd dependency list), it gets initialized before
libpthread does, i.e. it starts calling libpthread functions before
the following constructor:

    void _thread_init_hack(void) __attribute__ ((constructor));

    void
    _thread_init_hack(void)
    {
        _libpthread_init(NULL);
    }

of the libpthread library is called. This constructor sets up the data
structures for _get_curthread(). The function that libobjc calls
early, pthread_key_create(), calls _get_curthread(), which still
returns NULL. This pointer is then dereferenced in the
THR_LOCK_ACQUIRE macro, resulting in the segfault.

Many of the mutex functions have already been guarded for this case
with the following code fragment:

    if (_thr_initial == NULL)
        _libpthread_init(NULL);

See also the following comment in thr_create.c:

 ...
 * Some notes on new thread creation and first time initializion
 * to enable multi-threading.
 *
 * There are basically two things that need to be done.
 *
 *   1) The internal library variables must be initialized.
 *   2) Upcalls need to be enabled to allow multiple threads
 *      to be run.
 *
 * The first may be done as a result of other pthread functions
 * being called. When _thr_initial is null, _libpthread_init is
 * called to initialize the internal variables; this also creates
 * or sets the initial thread. It'd be nice to automatically
 * have _libpthread_init called on program execution so we don't
 * have to have checks throughout the library.
 ...

So they seem to be aware of the issue but haven't guarded all
functions, such as pthread_key_create(), which libobjc calls.

I don't believe that we can find a reliable workaround within GNUstep.
I also don't know if this bug is in a stable release version of
FreeBSD. But I want to mention that a call to pthread_self() would
work around this. Yet this call would have to be done in libobjc's
__objc_init_thread_system and not in GNUstep. The other hack is to
fiddle with the link order so that the constructor above can cover up
the issue. I'll leave it up to the -make maintainers to decide what to
do.

My current tendency is to close this bug as invalid, at least until
someone verifies that the issue exists with released versions of
FreeBSD. And even then a workaround should probably be done in libobjc
(which seems to be installed by FreeBSD, but if libobjc needs updating
you might as well update libpthread instead and fix the bug properly).

----[ End of Analysis ]----

On a side note: this same problem is duplicated (likely similarly)
when using libthr, and does not occur when using libc_r.

Is the above analysis enough information to hopefully fix this problem
with FreeBSD's threading libraries, and would properly guarding the
mentioned pthread_key_create function with a similar _thr_initial
check be sufficient?

With kind regards, and hoping to get some feedback on this ...
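
[Archive note] The guard idiom described in the analysis above can be sketched in isolation. This is only an illustration under stated assumptions, not FreeBSD's libpthread code: every name beginning with toy_ is hypothetical, toy_initial merely plays the role that _thr_initial plays in the quoted fragment, and the sketch ignores the thread-safety of the lazy initialization itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a threading library's per-thread record. */
struct toy_thread {
    int id;
};

static struct toy_thread toy_main_thread;
static struct toy_thread *toy_initial = NULL; /* analogous to _thr_initial */
static int toy_next_key = 0;

/* One-time setup of the library's internal state; safe to call twice. */
void
toy_lib_init(void)
{
    if (toy_initial != NULL)
        return;
    toy_main_thread.id = 1;
    toy_initial = &toy_main_thread;
}

/* Lets a caller observe whether initialization has happened yet. */
int
toy_is_initialized(void)
{
    return toy_initial != NULL;
}

/*
 * Guarded entry point: because another library's initializer may call
 * in before this library's own constructor has run, the function
 * initializes on demand instead of assuming toy_initial is valid.
 */
int
toy_key_create(int *key)
{
    if (toy_initial == NULL)
        toy_lib_init();
    assert(toy_initial != NULL);
    if (key == NULL)
        return -1;
    *key = toy_next_key++;
    return 0;
}
```

Calling toy_key_create() before anything else has touched the library still succeeds, because the guard performs the setup that a constructor would otherwise be relied on to do first.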

-- 
Pascal Hofstee

From owner-freebsd-threads@FreeBSD.ORG Fri Jul 15 20:16:32 2005
Message-ID: <42D8199E.1060702@elischer.org>
Date: Fri, 15 Jul 2005 13:16:30 -0700
From: Julian Elischer
To: Guy Helmer
References: <42D691F2.3030201@palisadesys.com> <42D6BA3E.1000306@elischer.org> <42D7BBB8.9050207@palisadesys.com>
In-Reply-To: <42D7BBB8.9050207@palisadesys.com>
Cc: freebsd-threads@freebsd.org
Subject: Re: system scope threads entering STOP state

Guy Helmer wrote:

> Julian Elischer wrote:
>
>> Guy Helmer wrote:
>>
>>> I have a long-running multithreaded process on FreeBSD 5.4 (SMP,
>>> PREEMPTION, SCHED_4BSD) linked with libpthread and I'm creating the
>>> threads with attribute PTHREAD_SCOPE_SYSTEM. The threads need to be
>>> processing input in near-real-time or its input buffers overflow.
>>>
>>> I've modified the program so that a thread can fork/execl/waitpid
>>> (without WNOHANG) to use an external program for further processing
>>> on a batch of input (sometimes via a pipe, other times via writing
>>> to a file). However, even under a light input load, the program is
>>> now dropping input. While running top(1) in thread mode, I
>>> occasionally find all the program's threads are in the STOP state
>>> for several consecutive seconds. Is there anything related to the
>>> frequent use of fork, execve, or wait4 that would be likely to cause
>>> such a situation? I'm not seeing anything obvious in my reading of
>>> the kernel sources.
>>
>> During a fork the parent process is in a variant of the "STOPPED"
>> state, or, rather, if you look at top -H you should see that all the
>> threads, except for the one doing the fork, are in the STOPPED state.
>>
>> This is because while a thread is forking the process needs to be
>> single threaded so that there is a consistent image to be copied to
>> the child.
>>
>> The single threaded state is also entered for exit() and execve(),
>> though that should not affect your program.
>>
>> I can't imagine why the state would persist for any length of time,
>> unless there is another thread that is in an uninterruptible wait.
>> In that case the other threads have to wait for it to complete what
>> it is doing and come back. I have considered whether such a thread
>> should not be considered "already suspended", and in fact some
>> earlier versions of the code did that; however, it leads to some
>> inconsistencies and the danger that such a thread will be suspended
>> holding some resource that it should not hold for any length of time.
>>
> Thanks for the explanation. I was that the other threads would be
> stopped during a fork(2) but it looked to me like the STOP would be
> brief.

You were *what*? "aware"?, "suspicious"? :-)

>
> Would an "uninterruptible wait" include system calls like a write(2)
> of a large buffer? That would explain it...

It's hard to say. Possibly yes, if it had to allocate buffer space.
However, this is a question for others.

Is it possible to duplicate this on request?

>
> Thanks,
> Guy
>
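
[Archive note] For readers trying to reproduce the behaviour on request, the pattern discussed in this thread (a PTHREAD_SCOPE_SYSTEM thread doing fork/execl/waitpid without WNOHANG) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not Guy's program: the names run_external, struct job, worker and run_in_system_scope_thread are hypothetical, and the external program is simply /bin/sh running a command string.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "sh -c cmd" in a child and block until it
 * exits. Returns the child's exit status, or -1 on failure. Per the
 * explanation in this thread, the fork() here is the point at which
 * the other threads of the process are briefly stopped.
 */
int
run_external(const char *cmd)
{
    pid_t pid;
    int status;

    assert(cmd != NULL);
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: only exec or _exit, nothing else. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* reached only if execl() failed */
    }
    /* Parent: blocking wait (no WNOHANG), retried if interrupted. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical job record handed to the worker thread. */
struct job {
    const char *cmd;
    int result;
};

static void *
worker(void *arg)
{
    struct job *job = arg;

    job->result = run_external(job->cmd);
    return NULL;
}

/* Run one command from a thread created with system contention scope. */
int
run_in_system_scope_thread(const char *cmd)
{
    pthread_attr_t attr;
    pthread_t tid;
    struct job job = { cmd, -1 };

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0 ||
        pthread_create(&tid, &attr, worker, &job) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_destroy(&attr);
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return job.result;
}
```

To approximate the reported situation, one would call run_in_system_scope_thread() repeatedly while other threads of the same process sit in long blocking system calls, and watch the process with top -H.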