From owner-freebsd-threads@FreeBSD.ORG Sun Sep 12 14:18:41 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9B5B516A4CE for ; Sun, 12 Sep 2004 14:18:41 +0000 (GMT)
Received: from bps.jodocus.org (g157016.upc-g.chello.nl [80.57.157.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id DDFD043D48 for ; Sun, 12 Sep 2004 14:18:40 +0000 (GMT) (envelope-from joost@jodocus.org)
Received: from jodocus.org (localhost [127.0.0.1]) by bps.jodocus.org (8.13.1/8.12.10) with ESMTP id i8CEIdpf090054 for ; Sun, 12 Sep 2004 16:18:39 +0200 (CEST) (envelope-from joost@jodocus.org)
Received: (from joost@localhost) by jodocus.org (8.13.1/8.12.10/Submit) id i8CEIcm3090053 for freebsd-threads@freebsd.org; Sun, 12 Sep 2004 16:18:38 +0200 (CEST) (envelope-from joost)
Date: Sun, 12 Sep 2004 16:18:38 +0200
From: Joost Bekkers
To: freebsd-threads@freebsd.org
Message-ID: <20040912141838.GA89862@bps.jodocus.org>
Mail-Followup-To: Joost Bekkers , freebsd-threads@freebsd.org
Mime-Version: 1.0
Content-Type: text/plain; charset=unknown-8bit
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.4.2.1i
Subject: SIGILL @ pthread_create() after execv
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 12 Sep 2004 14:18:41 -0000

Hello

After upgrading to 5.3-BETA3 (from 5.2.1-p9) one of my programs started to crash with
Illegal Instruction (SIGILL) after it restarted itself with an execv

gdb is telling me:

Program terminated with signal 4, Illegal instruction.
#0 0x28274d3f in pthread_testcancel () from /usr/lib/libpthread.so.1
(gdb) where
#0 0x28274d3f in pthread_testcancel () from /usr/lib/libpthread.so.1
#1 0x2826126d in pthread_create () from /usr/lib/libpthread.so.1
#2 0x08151364 in rdns_cache_init () at rdns_cache.c:317
#3 0x081513d6 in gethostname_cached (addr=0x9cb757e "À¨dà", len=4, ttl_refresh=0) at rdns_cache.c:336
#4 0x0811b17a in dns_gethostname (desc=0x9cb756c) at fd_network.c:130
#5 0x080cb25e in fread_char (ch=0x9cb8418, fp=0x8244180) at save.c:1215
#6 0x080ca539 in load_char_obj (d=0x9cb756c, name=0xbfbfd960 "Jodocus") at save.c:930
#7 0x0811afa9 in copyover_recover_players () at fd_copyover.c:337
#8 0x0807a1d7 in main (argc=5, argv=0xbfbfec70) at comm.c:256

I'm at a loss on how to get to the bottom of this problem.

Can anybody shed some light on this?

thanks

--
greetz Joost
joost@jodocus.org

From owner-freebsd-threads@FreeBSD.ORG Sun Sep 12 17:40:17 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E80BA16A4CF for ; Sun, 12 Sep 2004 17:40:17 +0000 (GMT)
Received: from pimout2-ext.prodigy.net (pimout2-ext.prodigy.net [207.115.63.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 78F5E43D48 for ; Sun, 12 Sep 2004 17:40:17 +0000 (GMT) (envelope-from julian@elischer.org)
Received: from elischer.org (adsl-216-100-132-188.dsl.snfc21.pacbell.net [216.100.132.188])i8CHeFvd196266; Sun, 12 Sep 2004 13:40:16 -0400
Message-ID: <414489FF.3090705@elischer.org>
Date: Sun, 12 Sep 2004 10:40:15 -0700
From: Julian Elischer
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Joost Bekkers
References: <20040912141838.GA89862@bps.jodocus.org>
In-Reply-To: <20040912141838.GA89862@bps.jodocus.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-threads@freebsd.org
Subject: Re: SIGILL @ pthread_create() after execv
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 12 Sep 2004 17:40:18 -0000

Joost Bekkers wrote:
> Hello
>
> After upgrading to 5.3-BETA3 (from 5.2.1-p9) one of my programs started to crash with
> Illegal Instruction (SIGILL) after it restarted itself with an execv
>
> gdb is telling me:
>
> Program terminated with signal 4, Illegal instruction.
> #0 0x28274d3f in pthread_testcancel () from /usr/lib/libpthread.so.1
> (gdb) where
> #0 0x28274d3f in pthread_testcancel () from /usr/lib/libpthread.so.1
> #1 0x2826126d in pthread_create () from /usr/lib/libpthread.so.1
> #2 0x08151364 in rdns_cache_init () at rdns_cache.c:317
> #3 0x081513d6 in gethostname_cached (addr=0x9cb757e "?d?", len=4, ttl_refresh=0) at rdns_cache.c:336
> #4 0x0811b17a in dns_gethostname (desc=0x9cb756c) at fd_network.c:130
> #5 0x080cb25e in fread_char (ch=0x9cb8418, fp=0x8244180) at save.c:1215
> #6 0x080ca539 in load_char_obj (d=0x9cb756c, name=0xbfbfd960 "Jodocus") at save.c:930
> #7 0x0811afa9 in copyover_recover_players () at fd_copyover.c:337
> #8 0x0807a1d7 in main (argc=5, argv=0xbfbfec70) at comm.c:256
>
> I'm at a loss on how to get to the bottom of this problem.

I suspect we've screwed up execve for threaded programs :-)

Guys, I probably have to free the 'upcall' structure, or the first pthread call
after the execv will find the old one and try to upcall to the wrong place.
I'll look at this tonight, I hope.

Thanks for the report.

>
> Can anybody shed some light on this?
>
> thanks
>

From owner-freebsd-threads@FreeBSD.ORG Mon Sep 13 11:02:21 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6F38116A4CE for ; Mon, 13 Sep 2004 11:02:21 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 47F0843D31 for ; Mon, 13 Sep 2004 11:02:21 +0000 (GMT) (envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1]) by freefall.freebsd.org (8.12.11/8.12.11) with ESMTP id i8DB2Ljm048898 for ; Mon, 13 Sep 2004 11:02:21 GMT (envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost) by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i8DB2Km7048892 for freebsd-threads@freebsd.org; Mon, 13 Sep 2004 11:02:20 GMT (envelope-from owner-bugmaster@freebsd.org)
Date: Mon, 13 Sep 2004 11:02:20 GMT
Message-Id: <200409131102.i8DB2Km7048892@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster
To: freebsd-threads@FreeBSD.org
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 13 Sep 2004 11:02:21 -0000

Current FreeBSD problem reports

Critical problems

S Submitted    Tracker       Resp.   Description
-------------------------------------------------------------------------------
o [2004/04/22] threads/65883 threads libkse's sigwait does not work after fork

1 problem total.

Serious problems

S Submitted    Tracker       Resp.   Description
-------------------------------------------------------------------------------
o [2000/07/18] kern/20016    threads pthreads: Cannot set scheduling timer/Can
o [2000/08/26] kern/20861    threads libc_r does not honor socket timeouts
o [2001/01/20] threads/24472 threads libc_r does not honor SO_SNDTIMEO/SO_RCVT
o [2001/01/25] threads/24632 threads libc_r delicate deviation from libc in ha
o [2001/01/25] kern/24641    threads pthread_rwlock_rdlock can deadlock
o [2001/11/26] bin/32295     threads pthread dont dequeue signals
o [2002/02/01] threads/34536 threads accept() blocks other threads
o [2002/05/25] kern/38549    threads the procces compiled whith pthread stoppe
o [2002/06/27] threads/39922 threads [PATCH?] Threaded applications executed w
o [2002/08/04] kern/41331    threads Pthread library open sets O_NONBLOCK flag
o [2003/03/02] threads/48856 threads Setting SIGCHLD to SIG_IGN still leaves z
o [2003/03/10] threads/49087 threads Signals lost in programs linked with libc
o [2003/05/08] threads/51949 threads thread in accept cannot be cancelled
s [2004/03/15] kern/64313    threads FreeBSD (OpenBSD) pthread implicit set/un
o [2004/08/26] threads/70975 threads unexpected and unreliable behaviour when

15 problems total.

Non-critical problems

S Submitted    Tracker       Resp.   Description
-------------------------------------------------------------------------------
o [2000/05/26] kern/18824    threads gethostbyname is not thread safe
o [2000/06/13] kern/19247    threads uthread_sigaction.c does not do anything
o [2000/10/21] kern/22190    threads A threaded read(2) from a socketpair(2) f
o [2001/09/09] threads/30464 threads pthread mutex attributes -- pshared
o [2002/05/02] threads/37676 threads libc_r: msgsnd(), msgrcv(), pread(), pwri
s [2002/07/16] threads/40671 threads pthread_cancel doesn't remove thread from
o [2004/07/13] threads/69020 threads pthreads library leaks _gc_mutex

7 problems total.

From owner-freebsd-threads@FreeBSD.ORG Tue Sep 14 07:48:55 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F161616A4CE; Tue, 14 Sep 2004 07:48:55 +0000 (GMT)
Received: from pimout3-ext.prodigy.net (pimout3-ext.prodigy.net [207.115.63.102]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3075243D49; Tue, 14 Sep 2004 07:48:55 +0000 (GMT) (envelope-from julian@elischer.org)
Received: from elischer.org (adsl-67-126-115-227.dsl.snfc21.pacbell.net [67.126.115.227])i8E7mpPJ215362; Tue, 14 Sep 2004 03:48:52 -0400
Message-ID: <4146A263.2020603@elischer.org>
Date: Tue, 14 Sep 2004 00:48:51 -0700
From: Julian Elischer
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Andrew Gallatin
References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.52569.375858.857614@grasshopper.cs.duke.edu>
In-Reply-To: <16704.52569.375858.857614@grasshopper.cs.duke.edu>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: John Baldwin
cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 14 Sep 2004 07:48:56 -0000

Andrew Gallatin wrote:
> Julian Elischer writes:
> > I think that this would possibly GO AWAY if you disabled preemption.
> > which would make it very hard to debug :-)
>
> Nope, still happens w/o preempt.. And it's the "worse" problem of deadlocking
> the system rather than just having the process fail to exit.
>
> db> ps
>   pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg        wchan  cmd
>   579 c37e41c0 e8855000 1387   578   579 0004002 [SLPQ ttyin 0xc17df810][SLP] csh
>   578 c1817540 e671a000 1387   576   576 0000100 [SLPQ select 0xc06cb704][SLP] sshd
>   576 c37e4540 e8857000    0   451   576 0000100 [SLPQ sbwait 0xc1983e84][SLP] sshd
>   566 c1a1fc40 e67ba000 1387     1   564 000c482 (threaded)  mx_pingpong
>   thread 0xc37944b0 ksegrp 0xc1a20460 [CPU 0]
>   thread 0xc3794640 ksegrp 0xc1a20460 [SUSP]
>   thread 0xc187e320 ksegrp 0xc1a20460 [RUNQ]
>   thread 0xc187e4b0 ksegrp 0xc187fee0 [CPU 1]
> [...]

Can you reconfirm that this problem exists without preemption with today's -current?

From owner-freebsd-threads@FreeBSD.ORG Tue Sep 14 08:24:36 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BBCDC16A4CE; Tue, 14 Sep 2004 08:24:36 +0000 (GMT)
Received: from pimout3-ext.prodigy.net (pimout3-ext.prodigy.net [207.115.63.102]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4B45C43D1D; Tue, 14 Sep 2004 08:24:36 +0000 (GMT) (envelope-from julian@elischer.org)
Received: from elischer.org (adsl-67-126-115-227.dsl.snfc21.pacbell.net [67.126.115.227])i8E8OXPJ138314; Tue, 14 Sep 2004 04:24:34 -0400
Message-ID: <4146AAC1.5020701@elischer.org>
Date: Tue, 14 Sep 2004 01:24:33 -0700
From: Julian Elischer
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Andrew Gallatin
References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu>
 <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu>
In-Reply-To: <16704.49447.290897.602540@grasshopper.cs.duke.edu>
Content-Type: multipart/mixed; boundary="------------080701070602030008080208"
cc: John Baldwin
cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 14 Sep 2004 08:24:36 -0000

This is a multi-part message in MIME format.
--------------080701070602030008080208
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Andrew Gallatin wrote:
> Julian Elischer writes:
> > >
> > >Maybe this would be easier to debug if I disabled preemption?
> > >
> >
> >
> > I think that this would possibly GO AWAY if you disabled preemption.
> > which would make it very hard to debug :-)
> >
>
> Yes and no. You initially asked me to try in -current because of
> some changes you'd made to the exit code. RELENG_5 (with the old
> exit code and no preemption) shows a different problem (proc is
> just not killable). If the proc was killable without preemption,
> that would at least show your new code is better..

try the attached diff:

>
> Drew

--------------080701070602030008080208
Content-Type: text/plain; name="q.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="q.diff"

Index: sys/kern/kern_switch.c
===========================================================================
--- sys/kern/kern_switch.c	2004/09/14 08:14:48	#76
+++ sys/kern/kern_switch.c	2004/09/14 08:14:48
@@ -350,11 +389,10 @@
 			}
 			kg->kg_avail_opennings = 1;
 		}
-		kg->kg_avail_opennings--;
 		sched_add(td, flags);
 		return;
 	}
 
 	tda = kg->kg_last_assigned;
 	if ((kg->kg_avail_opennings <= 0) &&
 	    (tda && (tda->td_priority > td->td_priority))) {
@@ -415,7 +449,6 @@
 			td2 = TAILQ_NEXT(tda, td_runq);
 			kg->kg_last_assigned = td2;
 		}
-		kg->kg_avail_opennings--;
 		sched_add(td2, flags);
 	} else {
 		CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
Index: sys/kern/sched_4bsd.c
===========================================================================
--- sys/kern/sched_4bsd.c	2004/09/14 08:14:48	#64
+++ sys/kern/sched_4bsd.c	2004/09/14 08:14:48
@@ -1024,6 +1024,7 @@
 	}
 	if ((td->td_proc->p_flag & P_NOLOAD) == 0)
 		sched_tdcnt++;
+	td->td_ksegrp->kg_avail_opennings--;
 	runq_add(ke->ke_runq, ke);
 	ke->ke_ksegrp->kg_runq_kses++;
 	ke->ke_state = KES_ONRUNQ;
Index: sys/kern/sched_ule.c
===========================================================================
--- sys/kern/sched_ule.c	2004/09/14 08:14:48	#127
+++ sys/kern/sched_ule.c	2004/09/14 08:14:48
@@ -1773,6 +1773,7 @@
 		curthread->td_flags |= TDF_NEEDRESCHED;
 	if (preemptive && maybe_preempt(td))
 		return;
+	td->td_ksegrp->kg_avail_opennings--;
 	ke->ke_ksegrp->kg_runq_threads++;
 	ke->ke_state = KES_ONRUNQ;
 

--------------080701070602030008080208--

From owner-freebsd-threads@FreeBSD.ORG Tue Sep 14 09:50:07 2004
Return-Path:
Delivered-To: freebsd-threads@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4204B16A4CE for ; Tue, 14 Sep 2004 09:50:07 +0000 (GMT)
Received: from freefall.freebsd.org
(freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2981E43D5F for ; Tue, 14 Sep 2004 09:50:07 +0000 (GMT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) i8E9o7vb028036 for ; Tue, 14 Sep 2004 09:50:07 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i8E9o7EW028035; Tue, 14 Sep 2004 09:50:07 GMT (envelope-from gnats) Resent-Date: Tue, 14 Sep 2004 09:50:07 GMT Resent-Message-Id: <200409140950.i8E9o7EW028035@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-threads@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Birju Shah Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 364DB16A4CE for ; Tue, 14 Sep 2004 09:42:49 +0000 (GMT) Received: from www.freebsd.org (www.freebsd.org [216.136.204.117]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1B36143D4C for ; Tue, 14 Sep 2004 09:42:49 +0000 (GMT) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.12.11/8.12.11) with ESMTP id i8E9gmom055148 for ; Tue, 14 Sep 2004 09:42:48 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.12.11/8.12.11/Submit) id i8E9gm1s055147; Tue, 14 Sep 2004 09:42:48 GMT (envelope-from nobody) Message-Id: <200409140942.i8E9gm1s055147@www.freebsd.org> Date: Tue, 14 Sep 2004 09:42:48 GMT From: Birju Shah To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-2.3 Subject: threads/71725: Mysql Crashes frequently giving Sock Error X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Sep 2004 09:50:07 -0000 >Number: 71725 >Category: threads >Synopsis: 
Mysql Crashes frequently giving Sock Error
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-threads
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Sep 14 09:50:06 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator: Birju Shah
>Release: 4.8
>Organization:
>Environment:
FreeBSD hecht.ai.net 4.8-RELEASE FreeBSD 4.8-RELEASE #1: Sun Jun 22 05:42:39 EDT 2003 ainet@mercury.ai.net:/usr/src/sys/compile/www i386
>Description:
------------------OUR PROBLEM--------------------------------------------------

We have an autoresponder script (MySQL backend) which maintains 100% double
opt-in email lists. Everyone who submits a site to our search engine is sent
an email asking them to confirm their email address. The email contains a
link to a page where they enter their email address and name to confirm.

The visitors then visit the confirmation page and confirm their email address
with us. This confirmation is handled by the autoresponder Perl script, and
the backend is MySQL.

We monitor the confirmation script, and there are times when the
autoresponder script (it's a Perl script) gives a timed-out error. When we
look at the running processes on the server at that time, around 50-80
people are simultaneously executing the confirmation script described above.
Immediately afterwards MySQL crashes and gives a MySQL sock error.

The only remedy is to reboot the server to restart MySQL. We tried
restarting MySQL without rebooting the server, but nothing happens; it
doesn't restart. Once we reboot the server, everything works fine until we
receive a timed-out error again and it crashes again.

We tried upgrading the RAM on our dedicated server from 256 MB to 768 MB, but
the problem remains the same, with absolutely no change.

Frankly speaking, I am not technically savvy and I am not familiar with the
technical side of MySQL. I have been reading a lot of forums looking for a
solution to this issue, but in vain. I am hoping for a lot from you guys, as
you are the experts in this field.

Conclusion: What I have noticed is that the MySQL server first gets
overloaded, and as soon as it is overloaded it starts giving the MySQL sock
error, and the only remedy is to reboot the server. From what I have read in
different forums it seems like a bug in the libc_r pthread library, but I am
not sure.

My server configuration is

PIII 700MHz
256 MB RAM + 512 MB (recently added)
1 x 9 GB hard drive (SCSI)

------------------------------------------------------------------------

When we check the hostname error log in MySQL, it shows this:

40903 5:53:06 InnoDB: Starting shutdown...
040903 5:53:08 InnoDB: Shutdown completed
040903 5:53:08 /usr/local/libexec/mysqld: Shutdown Complete

040903 05:53:08 mysqld ended

040903 06:00:10 mysqld started
040903 6:00:10 Warning: setrlimit returned ok, but didn't change limits. Max open files is 14781 (request: 81930)
040903 6:00:10 Warning: Changed limits: max_connections: 14771 table_cache: 64
040903 6:00:10 InnoDB: Started
/usr/local/libexec/mysqld: ready for connections.
Version: '4.0.18-log' socket: '/tmp/mysql.sock' port: 3306
Fatal error '_pq_remove: Not in priority queue' at line ? in file /usr/src/lib/libc_r/uthread/uthread_priority_queue.c (errno = ?)

Number of processes running now: 0
040903 08:10:22 mysqld restarted
040903 8:10:22 Warning: setrlimit returned ok, but didn't change limits. Max open files is 14781 (request: 81930)
040903 8:10:22 Warning: Changed limits: max_connections: 14771 table_cache: 64
040903 8:10:23 InnoDB: Started
/usr/local/libexec/mysqld: ready for connections.
Version: '4.0.18-log' socket: '/tmp/mysql.sock' port: 3306
040903 13:01:38 mysqld started
040903 13:01:38 Warning: setrlimit returned ok, but didn't change limits.
Max open files is 14781 (request: 81930) 040903 13:01:38 Warning: Changed limits: max_connections: 14771 table_cache: 64 040903 13:01:40 InnoDB: Started /usr/local/libexec/mysqld: ready for connections. Version: '4.0.18-log' socket: '/tmp/mysql.sock' port: 3306 040903 14:57:09 /usr/local/libexec/mysqld: Normal shutdown >How-To-Repeat: No fix time, sometimes crashes once in a day and sometimes crashes once in 4 days. >Fix: >Release-Note: >Audit-Trail: >Unformatted: From owner-freebsd-threads@FreeBSD.ORG Tue Sep 14 14:10:19 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 055B316A4CF; Tue, 14 Sep 2004 14:10:19 +0000 (GMT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 972FA43D2F; Tue, 14 Sep 2004 14:10:18 +0000 (GMT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i8EEAFJt014562 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 14 Sep 2004 10:10:15 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i8EEA8cS068710; Tue, 14 Sep 2004 10:10:08 -0400 (EDT) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16710.64448.640458.282221@grasshopper.cs.duke.edu> Date: Tue, 14 Sep 2004 10:10:08 -0400 (EDT) To: Julian Elischer In-Reply-To: <4146A263.2020603@elischer.org> References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> 
<16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.52569.375858.857614@grasshopper.cs.duke.edu> <4146A263.2020603@elischer.org> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Sep 2004 14:10:19 -0000 Julian Elischer writes: > Andrew Gallatin wrote: > > Julian Elischer writes: > > > I think that this would possibly GO AWAY of you disab;ed preemption. > > > which would make it very hard to debug :-) > > > > Nope, still happens w/o preempt.. And its the "worse" problem of deadlocking > > the system rather than just having the process fail to exit. > > > > db> ps > > pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd > > 579 c37e41c0 e8855000 1387 578 579 0004002 [SLPQ ttyin 0xc17df810][SLP] csh > > 578 c1817540 e671a000 1387 576 576 0000100 [SLPQ select 0xc06cb704][SLP] sshd > > 576 c37e4540 e8857000 0 451 576 0000100 [SLPQ sbwait 0xc1983e84][SLP] sshd > > 566 c1a1fc40 e67ba000 1387 1 564 000c482 (threaded) mx_pingpong > > thread 0xc37944b0 ksegrp 0xc1a20460 [CPU 0] > > thread 0xc3794640 ksegrp 0xc1a20460 [SUSP] > > thread 0xc187e320 ksegrp 0xc1a20460 [RUNQ] > > thread 0xc187e4b0 ksegrp 0xc187fee0 [CPU 1] > > > > [...] > > can you reconfirm that this probelm exists without preemption with today's -current? > Yes, it seems to behave exactly the same way for a -current cvsupped one hour ago. 
Drew From owner-freebsd-threads@FreeBSD.ORG Tue Sep 14 14:34:49 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5ED3E16A4CE; Tue, 14 Sep 2004 14:34:49 +0000 (GMT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id DB68543D41; Tue, 14 Sep 2004 14:34:46 +0000 (GMT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i8EEYiJt018443 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 14 Sep 2004 10:34:44 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i8EEYdUc068729; Tue, 14 Sep 2004 10:34:39 -0400 (EDT) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16711.383.448500.578640@grasshopper.cs.duke.edu> Date: Tue, 14 Sep 2004 10:34:39 -0400 (EDT) To: Julian Elischer In-Reply-To: <4146AAC1.5020701@elischer.org> References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: ,
X-List-Received-Date: Tue, 14 Sep 2004 14:34:49 -0000

Julian Elischer writes:
> Andrew Gallatin wrote:
> > Julian Elischer writes:
> > > >
> > > >Maybe this would be easier to debug if I disabled preemption?
> > > >
> > >
> > >
> > > I think that this would possibly GO AWAY if you disabled preemption.
> > > which would make it very hard to debug :-)
> > >
> >
> > Yes and no. You initially asked me to try in -current because of
> > some changes you'd made to the exit code. RELENG_5 (with the old
> > exit code and no preemption) shows a different problem (proc is
> > just not killable). If the proc was killable without preemption,
> > that would at least show your new code is better..
>
> try the attached diff:
>

This is worse.

It's worse in that the application never starts running fully, and it
seems to ignore signals entirely. I can't attach a debugger to it
to see how far it got before hanging, due to the signal problem. When
it hangs (both before and after a signal is sent), the CPU utilization
is 0%.
Before it's sent a signal, it looks like this:

  573 c1f3b8c0 e88ae000 1387 517 573 000c082 (threaded) mx_pingpong
  thread 0xc1f3e320 ksegrp 0xc19ead20 [RUNQ]
  thread 0xc1f3e4b0 ksegrp 0xc19ead20 [RUNQ]
  thread 0xc1f3e640 ksegrp 0xc19eaaf0 [SLPQ ksesigwait 0xc1f3b9c0][SLP]

db> call db_trace_thread(0xc1f3e320, -1)
sched_switch(c1f3e320,0,1,1862ccb2,994777d8) at sched_switch+0x137
mi_switch(1,0,c05fdf59,804c000,c2b8c2ec) at mi_switch+0x1ce
turnstile_wait(c1a518c0,c06c53e0,c1a4d7d0,0,1) at turnstile_wait+0x339
_mtx_lock_sleep(c06c53e0,c1f3e320,0,0,0) at _mtx_lock_sleep+0x122
vm_fault(c187a5dc,804c000,1,0,0) at vm_fault+0x214
trap_pfault(e88b8d48,1,804c800,3,804c800) at trap_pfault+0x136
trap(2f,2f,2f,805d13c,805d13c) at trap+0x201
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x804c800, esp = 0xbfbfe66c, ebp = 0xbfbfe678 ---
0

db> call db_trace_thread(0xc1f3e4b0, -1)
sched_switch(c1f3e4b0,0,1,f0007932,9935c3e9) at sched_switch+0x137
mi_switch(1,0,c19ead60,e88bbc5c,c1f3e4b0) at mi_switch+0x1ce
sleepq_switch(c19ead60,c1f3e4b0,0,e88bbc94,c04e5da6) at sleepq_switch+0x171
sleepq_timedwait_sig(c19ead60,0,c1f3b92c,c0677640,100) at sleepq_timedwait_sig+0x13
msleep(c19ead60,c1f3b92c,168,c0677640,1771) at msleep+0x37b
kse_release(c1f3e4b0,e88bbd14,4,c04c47ab,0) at kse_release+0x29b
syscall(2f,2f,2f,8054200,0) at syscall+0x2fc
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280a3d4f, esp = 0x8194f80, ebp = 0x8194fbc ---
0

db> call db_trace_thread(0xc1f3e640, -1)
sched_switch(c1f3e640,0,1,bc7c14b2,97d6ec54) at sched_switch+0x137
mi_switch(1,0,0,0,0) at mi_switch+0x1ce
sleepq_switch(c1f3b9c0,c1f3e640,0,e88bec94,c04e5da6) at sleepq_switch+0x171
sleepq_timedwait_sig(c1f3b9c0,0,0,0,0) at sleepq_timedwait_sig+0x13
msleep(c1f3b9c0,c1f3b92c,168,c0677635,bb9) at msleep+0x37b
kse_release(c1f3e640,e88bed14,4,c04c47ab,0) at kse_release+0x1a1
syscall(2f,2f,2f,1,81) at syscall+0x2fc
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280a3d4f, esp = 0xbfafef30, ebp = 0xbfafef8c ---
0

A different run, but after sending it a ^C from the command line:

  547 c1f3b1c0 e88aa000 0 1 547 000c482 (threaded) mx_pingpong
  thread 0xc1f3e960 ksegrp 0xc19eaee0 [RUNQ]
  thread 0xc1f3eaf0 ksegrp 0xc19eaee0 [RUNQ]
  thread 0xc1f3ec80 ksegrp 0xc19eab60 [SUSP]

db> call db_trace_thread(0xc1f3e960, -1)
sched_switch(c1f3e960,0,2,e7ff39b6,d6d80c8c) at sched_switch+0x137
mi_switch(2,0,0,0,0) at mi_switch+0x1ce
ast(e88c4d48) at ast+0x4eb
doreti_ast() at doreti_ast+0x17
0

db> call db_trace_thread(0xc1f3eaf0, -1)
sched_switch(c1f3eaf0,0,1,6e2ca4e6,d6924d2f) at sched_switch+0x137
mi_switch(1,0,c19eaf20,e88c7c5c,c1f3eaf0) at mi_switch+0x1ce
sleepq_switch(c19eaf20,c1f3eaf0,0,e88c7c94,c04e5da6) at sleepq_switch+0x171
sleepq_timedwait_sig(c19eaf20,0,c1f3b22c,c0677640,100) at sleepq_timedwait_sig+0x13
msleep(c19eaf20,c1f3b22c,168,c0677640,1771) at msleep+0x37b
kse_release(c1f3eaf0,e88c7d14,4,c04c47ab,0) at kse_release+0x29b
syscall(2f,2f,2f,8054200,0) at syscall+0x2fc
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280a3d4f, esp = 0x8194f80, ebp = 0x8194fbc ---
0

db> call db_trace_thread(0xc1f3ec80, -1)
sched_switch(c1f3ec80,0,1,26e24232,4249ca0b) at sched_switch+0x137
mi_switch(1,0,0,0,0) at mi_switch+0x1ce
thread_single(1,c1f3ec80,c1f3b1c0,e88cac5c,c0500581) at thread_single+0x1d7
exit1(c1f3ec80,2,e88cacb8,c04f1736,0) at exit1+0x115
expand_name(c1f3ec80,2,c1f3ec80,e88cad48,0) at expand_name
kse_thr_interrupt(c1f3ec80,e88cad14,c,c1f3ec80,e88cad3c) at kse_thr_interrupt+0x329
syscall(2f,2f,2f,8054100,805a800) at syscall+0x2fc
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (382, FreeBSD ELF32, kse_thr_interrupt), eip = 0x280a3d6f, esp = 0xbfafee60, ebp = 0xbfafeefc ---
0

If you want line number translations, please let me know. I saved the
kernel that this came from and also took a dump.

Drew From owner-freebsd-threads@FreeBSD.ORG Tue Sep 14 15:36:53 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2AE0416A4CE; Tue, 14 Sep 2004 15:36:53 +0000 (GMT) Received: from pimout1-ext.prodigy.net (pimout1-ext.prodigy.net [207.115.63.77]) by mx1.FreeBSD.org (Postfix) with ESMTP id C3B3843D53; Tue, 14 Sep 2004 15:36:51 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (adsl-68-120-129-148.dsl.snfc21.pacbell.net [68.120.129.148])i8EFajuV087474; Tue, 14 Sep 2004 11:36:47 -0400 Message-ID: <4147100C.8000005@elischer.org> Date: Tue, 14 Sep 2004 08:36:44 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524 X-Accept-Language: en, hu MIME-Version: 1.0 To: Andrew Gallatin References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> In-Reply-To: <16711.383.448500.578640@grasshopper.cs.duke.edu> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Sep 2004 15:36:53 -0000 "bugger" Andrew Gallatin wrote: > Julian Elischer writes: > > Andrew Gallatin wrote: > > > Julian Elischer 
writes: > > > > > > > > > >Maybe this would be easier to debug if I disabled preemption? > > > > > > > > > > > > > > > > > I think that this would possibly GO AWAY if you disabled preemption. > > > > which would make it very hard to debug :-) > > > > > > > > > > Yes and no. You initially asked me to try in -current because of > > > some changes you'd made to the exit code. RELENG_5 (with the old > > > exit code and no preemption) shows a different problem (proc is > > > just not killable). If the proc was killable without preemption, > > > that would at least show your new code is better.. > > > > try the attached diff: > > > > This is worse.. > > It's worse in that the application never starts running fully, and that > it seems to ignore signals entirely. I can't attach a debugger to it > to see how far it got before hanging due to the signal problem. When > it hangs, (both before and after a signal is sent) the CPU utilization > is 0%.. Before it's sent a signal, it looks like this: > > 573 c1f3b8c0 e88ae000 1387 517 573 000c082 (threaded) mx_pingpong > thread 0xc1f3e320 ksegrp 0xc19ead20 [RUNQ] > thread 0xc1f3e4b0 ksegrp 0xc19ead20 [RUNQ] > thread 0xc1f3e640 ksegrp 0xc19eaaf0 [SLPQ ksesigwait 0xc1f3b9c0][SLP] > > > db> call db_trace_thread(0xc1f3e320, -1) > sched_switch(c1f3e320,0,1,1862ccb2,994777d8) at sched_switch+0x137 > mi_switch(1,0,c05fdf59,804c000,c2b8c2ec) at mi_switch+0x1ce > turnstile_wait(c1a518c0,c06c53e0,c1a4d7d0,0,1) at turnstile_wait+0x339 > _mtx_lock_sleep(c06c53e0,c1f3e320,0,0,0) at _mtx_lock_sleep+0x122 > vm_fault(c187a5dc,804c000,1,0,0) at vm_fault+0x214 > trap_pfault(e88b8d48,1,804c800,3,804c800) at trap_pfault+0x136 > trap(2f,2f,2f,805d13c,805d13c) at trap+0x201 > calltrap() at calltrap+0x5 > --- trap 0xc, eip = 0x804c800, esp = 0xbfbfe66c, ebp = 0xbfbfe678 --- > 0 > > db> call db_trace_thread(0xc1f3e4b0, -1) > sched_switch(c1f3e4b0,0,1,f0007932,9935c3e9) at sched_switch+0x137 > mi_switch(1,0,c19ead60,e88bbc5c,c1f3e4b0) at mi_switch+0x1ce > 
sleepq_switch(c19ead60,c1f3e4b0,0,e88bbc94,c04e5da6) at sleepq_switch+0x171 > sleepq_timedwait_sig(c19ead60,0,c1f3b92c,c0677640,100) at sleepq_timedwait_sig+0x13 > msleep(c19ead60,c1f3b92c,168,c0677640,1771) at msleep+0x37b > kse_release(c1f3e4b0,e88bbd14,4,c04c47ab,0) at kse_release+0x29b > syscall(2f,2f,2f,8054200,0) at syscall+0x2fc > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280a3d4f, esp = 0x8194f80, ebp = 0x8194fbc --- > 0 > > db> call db_trace_thread(0xc1f3e640, -1) > sched_switch(c1f3e640,0,1,bc7c14b2,97d6ec54) at sched_switch+0x137 > mi_switch(1,0,0,0,0) at mi_switch+0x1ce > sleepq_switch(c1f3b9c0,c1f3e640,0,e88bec94,c04e5da6) at sleepq_switch+0x171 > sleepq_timedwait_sig(c1f3b9c0,0,0,0,0) at sleepq_timedwait_sig+0x13 > msleep(c1f3b9c0,c1f3b92c,168,c0677635,bb9) at msleep+0x37b > kse_release(c1f3e640,e88bed14,4,c04c47ab,0) at kse_release+0x1a1 > syscall(2f,2f,2f,1,81) at syscall+0x2fc > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280a3d4f, esp = 0xbfafef30, ebp = 0xbfafef8c --- > 0 > > > A different run, but after sending it a ^C from the command line: > > 547 c1f3b1c0 e88aa000 0 1 547 000c482 (threaded) mx_pingpong > thread 0xc1f3e960 ksegrp 0xc19eaee0 [RUNQ] > thread 0xc1f3eaf0 ksegrp 0xc19eaee0 [RUNQ] > thread 0xc1f3ec80 ksegrp 0xc19eab60 [SUSP] > > db> call db_trace_thread(0xc1f3e960, -1) > sched_switch(c1f3e960,0,2,e7ff39b6,d6d80c8c) at sched_switch+0x137 > mi_switch(2,0,0,0,0) at mi_switch+0x1ce > ast(e88c4d48) at ast+0x4eb > doreti_ast() at doreti_ast+0x17 > 0 > db> call db_trace_thread(0xc1f3eaf0, -1) > sched_switch(c1f3eaf0,0,1,6e2ca4e6,d6924d2f) at sched_switch+0x137 > mi_switch(1,0,c19eaf20,e88c7c5c,c1f3eaf0) at mi_switch+0x1ce > sleepq_switch(c19eaf20,c1f3eaf0,0,e88c7c94,c04e5da6) at sleepq_switch+0x171 > sleepq_timedwait_sig(c19eaf20,0,c1f3b22c,c0677640,100) at sleepq_timedwait_sig+0x13 > 
msleep(c19eaf20,c1f3b22c,168,c0677640,1771) at msleep+0x37b > kse_release(c1f3eaf0,e88c7d14,4,c04c47ab,0) at kse_release+0x29b > syscall(2f,2f,2f,8054200,0) at syscall+0x2fc > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280a3d4f, esp = 0x8194f80, ebp = 0x8194fbc --- > 0 > db> call db_trace_thread(0xc1f3ec80, -1) > sched_switch(c1f3ec80,0,1,26e24232,4249ca0b) at sched_switch+0x137 > mi_switch(1,0,0,0,0) at mi_switch+0x1ce > thread_single(1,c1f3ec80,c1f3b1c0,e88cac5c,c0500581) at thread_single+0x1d7 > exit1(c1f3ec80,2,e88cacb8,c04f1736,0) at exit1+0x115 > expand_name(c1f3ec80,2,c1f3ec80,e88cad48,0) at expand_name > kse_thr_interrupt(c1f3ec80,e88cad14,c,c1f3ec80,e88cad3c) at kse_thr_interrupt+0x329 > syscall(2f,2f,2f,8054100,805a800) at syscall+0x2fc > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (382, FreeBSD ELF32, kse_thr_interrupt), eip = 0x280a3d6f, esp = 0xbfafee60, ebp = 0xbfafeefc --- > 0 > > > If you want line number translations, please let me know. I saved the > kernel that this came from and also took a dump. 
> > Drew From owner-freebsd-threads@FreeBSD.ORG Tue Sep 14 16:06:18 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BF8E616A4CE; Tue, 14 Sep 2004 16:06:18 +0000 (GMT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5A74E43D1F; Tue, 14 Sep 2004 16:06:18 +0000 (GMT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i8EG6GJt001935 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 14 Sep 2004 12:06:16 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i8EG6BUF068803; Tue, 14 Sep 2004 12:06:11 -0400 (EDT) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16711.5875.358882.236642@grasshopper.cs.duke.edu> Date: Tue, 14 Sep 2004 12:06:11 -0400 (EDT) To: Julian Elischer In-Reply-To: <4147100C.8000005@elischer.org> References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> <4147100C.8000005@elischer.org> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list 
List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Sep 2004 16:06:18 -0000 FWIW, for the case where there is one lingering thread, calling thread_unsuspend_one() on it seems to get it to exit.. Maybe there is some sort of race while exiting which causes the wrong number of threads to be either suspended, or unsuspended. If too many are suspended, one is left lingering. If too few are suspended, the system deadlocks because a thread never gets off the cpu. Would it help at all to try with libthr and see what it does? Let me know what more I can do to help get this fixed.. Drew PS: By "one lingering thread", I mean the case I first complained about. Eg: 540 c164e700 e52e1000 1387 1 538 000c482 (threaded) mx_pingpong thread 0xc1fb8320 ksegrp 0xc15bb850 [SUSP] db> tr 540 sched_switch(c1fb8320,0,0,15fc9814,e30bebc7) at sched_switch+0xd8 mi_switch(1,0,e881fc44,c051e6dd,c1fb8320) at mi_switch+0x1c7 thread_single(1,c06eaae0,e881fc64,c164e700,c1fb8320) at thread_single+0x1d7 exit1(c1fb8320,9,0,e881fce4,c051877e) at exit1+0x115 expand_name(c1fb8320,9,100,0,0) at expand_name postsig(9,202,c06e5dd8,17f,8058f84) at postsig+0x204 ast(e881fd48) at ast+0x5e4 doreti_ast() at doreti_ast+0x17 db> call thread_unsuspend_one(0xc1fb8320) 0xc1562640 From owner-freebsd-threads@FreeBSD.ORG Wed Sep 15 00:32:02 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 14EE916A4CE; Wed, 15 Sep 2004 00:32:02 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id E252943D53; Wed, 15 Sep 2004 00:32:01 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id C485E7A425; Tue, 14 Sep 2004 17:32:01 -0700 (PDT) Message-ID: 
<41478D81.2010005@elischer.org> Date: Tue, 14 Sep 2004 17:32:01 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: Andrew Gallatin References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> <4147100C.8000005@elischer.org> <16711.5875.358882.236642@grasshopper.cs.duke.edu> In-Reply-To: <16711.5875.358882.236642@grasshopper.cs.duke.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2004 00:32:02 -0000 Andrew, if you get a chance, there is a patch at http://www.freebsd.org/~julian/q.diff with some debugging in it whose output I'd like to see. If it crashes or hangs processes without triggering the debugging code, then even that tells me something :-) Andrew Gallatin wrote: >FWIW, for the case where there is one lingering thread, calling >thread_unsuspend_one() on it seems to get it to exit.. > >Maybe there is some sort of race while exiting which causes the wrong >number of threads to be either suspended, or unsuspended. If too many >are suspended, one is left lingering. 
If too few are suspended, the >system deadlocks because a thread never gets off the cpu. > >Would it help at all to try with libthr and see what it does? >Let me know what more I can do to help get this fixed.. > >Drew > >PS: >By "one lingering thread", I mean the case I first complained about. >Eg: > >540 c164e700 e52e1000 1387 1 538 000c482 (threaded) mx_pingpong > thread 0xc1fb8320 ksegrp 0xc15bb850 [SUSP] > >db> tr 540 >sched_switch(c1fb8320,0,0,15fc9814,e30bebc7) at sched_switch+0xd8 >mi_switch(1,0,e881fc44,c051e6dd,c1fb8320) at mi_switch+0x1c7 >thread_single(1,c06eaae0,e881fc64,c164e700,c1fb8320) at >thread_single+0x1d7 >exit1(c1fb8320,9,0,e881fce4,c051877e) at exit1+0x115 >expand_name(c1fb8320,9,100,0,0) at expand_name >postsig(9,202,c06e5dd8,17f,8058f84) at postsig+0x204 >ast(e881fd48) at ast+0x5e4 >doreti_ast() at doreti_ast+0x17 >db> call thread_unsuspend_one(0xc1fb8320) >0xc1562640 > > > > > From owner-freebsd-threads@FreeBSD.ORG Wed Sep 15 08:24:02 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B1F7C16A4CE; Wed, 15 Sep 2004 08:24:02 +0000 (GMT) Received: from pimout3-ext.prodigy.net (pimout3-ext.prodigy.net [207.115.63.102]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3773143D48; Wed, 15 Sep 2004 08:24:02 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (adsl-68-123-124-219.dsl.snfc21.pacbell.net [68.123.124.219])i8F8NwNm168130; Wed, 15 Sep 2004 04:23:59 -0400 Message-ID: <4147FC1E.2010608@elischer.org> Date: Wed, 15 Sep 2004 01:23:58 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524 X-Accept-Language: en, hu MIME-Version: 1.0 To: Andrew Gallatin References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> 
<413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> In-Reply-To: <16711.383.448500.578640@grasshopper.cs.duke.edu> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2004 08:24:02 -0000 either of : http://www.freebsd.org/~julian/q.diff or http://www.freebsd.org/~julian/r.diff Might make some difference. today's q.diff has a fix that was missing yesterday. From owner-freebsd-threads@FreeBSD.ORG Wed Sep 15 14:22:58 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DF86516A4CE; Wed, 15 Sep 2004 14:22:58 +0000 (GMT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5480A43D2F; Wed, 15 Sep 2004 14:22:58 +0000 (GMT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i8FEMuJt013888 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 15 Sep 2004 10:22:56 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i8FEMoHL070271; Wed, 15 Sep 2004 10:22:50 -0400 (EDT) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 
7bit Message-ID: <16712.20538.804004.90978@grasshopper.cs.duke.edu> Date: Wed, 15 Sep 2004 10:22:50 -0400 (EDT) To: Julian Elischer In-Reply-To: <4147FC1E.2010608@elischer.org> References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> <4147FC1E.2010608@elischer.org> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2004 14:22:59 -0000 Julian Elischer writes: > either of : > http://www.freebsd.org/~julian/q.diff > > or > > http://www.freebsd.org/~julian/r.diff > > Might make some difference. > > today's q.diff has a fix that was missing yesterday. Both seem the same as unpatched head -- app starts, runs normally, then skill -9 -u gallatin leaves threads stuck on the cpu, seemingly deadlocking the system. But -- I think I now have a clue as to what's going on. I started a ktrace of the problematic process just before doing the skill -9, and afterwards it kept on tracing. I noticed it was stuck doing this: 569 mx_pingpong RET ioctl -1 errno 4 Interrupted system call 569 mx_pingpong Events dropped. 569 mx_pingpong RET ioctl -1 errno 4 Interrupted system call 569 mx_pingpong Events dropped. 
569 mx_pingpong RET ioctl -1 errno 4 Interrupted system call

It turns out that the userspace code is basically doing:

  do {
    MUTEX_LOCK(&lock);
    should_exit = work();
    MUTEX_UNLOCK(&lock);
    ioctl(fd, DRIVER_WAIT)
  } while (!should_exit);
  return NULL;

Changing it to

<...>
    rv = ioctl(fd, DRIVER_WAIT)
  } while ((rv == 0 || rv == EWOULDBLOCK) && !should_exit);
  return NULL;

Seems like it works around the problem with your r.diff patch applied to head. The ioctl in the driver boils down to a cv_timedwait_sig(), which is where the EINTR is coming from.

Even if this is our bug, I think that a user-level bug like this should not be able to deadlock the system...

FWIW, even with the fix to the user-level code, we still have the original problem (one lingering thread using no CPU) in RELENG_5.

Drew

From owner-freebsd-threads@FreeBSD.ORG Wed Sep 15 17:55:56 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DCFBC16A4CE for ; Wed, 15 Sep 2004 17:55:56 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id C0BF143D5A for ; Wed, 15 Sep 2004 17:55:56 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id A8E4B7A43E; Wed, 15 Sep 2004 10:55:56 -0700 (PDT) Message-ID: <4148822C.7000902@elischer.org> Date: Wed, 15 Sep 2004 10:55:56 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: Andrew Gallatin References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> 
<4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> <4147FC1E.2010608@elischer.org> <16712.20538.804004.90978@grasshopper.cs.duke.edu> In-Reply-To: <16712.20538.804004.90978@grasshopper.cs.duke.edu> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2004 17:55:57 -0000 Andrew Gallatin wrote: >Julian Elischer writes: > > either of : > > http://www.freebsd.org/~julian/q.diff > > > > or > > > > http://www.freebsd.org/~julian/r.diff > > > > Might make some difference. > > > > today's q.diff has a fix that was missing yesterday. > >Both seem the same as unpatched head -- app starts, runs normally, >then skill -9 -u gallatin leaves threads stuck on the cpu, seemingly >deadlocking the system. > >But -- I think I now have a clue as to what's going on. I started a >ktrace of the problematic process just before doing the skill -9, and >afterwards it kept on tracing. > >I noticed it was stuck doing this: > > 569 mx_pingpong RET ioctl -1 errno 4 Interrupted system call > 569 mx_pingpong Events dropped. > 569 mx_pingpong RET ioctl -1 errno 4 Interrupted system call > 569 mx_pingpong Events dropped. 
> 569 mx_pingpong RET ioctl -1 errno 4 Interrupted system call > >It turns out that the userspace code is basically doing: > > do { > MUTEX_LOCK(&lock); > should_exit = work(); > MUTEX_UNLOCK(&lock); > ioctl(fd, DRIVER_WAIT) > } while (!should_exit); > return NULL; > >Changing it to > ><...> > rv = ioctl(fd, DRIVER_WAIT) > } while ((rv == 0 || rv == EWOULDBLOCK) && !should_exit); > return NULL; > >Seems like it works around the problem with your r.diff patch applied >to head. The ioctl in the driver boils down to a cv_timedwait_sig(), >which is where the EINTR is coming from. > >Even if this is our bug, I think that a user-level bug like this should >not be able to deadlock the system... > I agree.. the rule is that userland should not be able to crash the system.. so this is a bug either way.. > >FWIW, even with the fix to the user-level code, we still have the >original problem (one lingering thread using no CPU) in RELENG_5. > >Drew > > > > From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 00:16:25 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2C6E716A4CF for ; Thu, 16 Sep 2004 00:16:25 +0000 (GMT) Received: from mproxy.gmail.com (rproxy.gmail.com [64.233.170.193]) by mx1.FreeBSD.org (Postfix) with ESMTP id BF5BE43D45 for ; Thu, 16 Sep 2004 00:16:24 +0000 (GMT) (envelope-from marcus.vinicius.ferreira@gmail.com) Received: by mproxy.gmail.com with SMTP id 77so290568rnk for ; Wed, 15 Sep 2004 17:16:20 -0700 (PDT) Received: by 10.38.99.13 with SMTP id w13mr1146494rnb; Wed, 15 Sep 2004 17:16:20 -0700 (PDT) Received: by 10.38.78.36 with HTTP; Wed, 15 Sep 2004 17:16:20 -0700 (PDT) Message-ID: Date: Wed, 15 Sep 2004 21:16:20 -0300 From: Marcus Vinicius Ferreira To: freebsd-threads@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: X-BeenThere: freebsd-threads@freebsd.org 
X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Marcus Vinicius Ferreira List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 00:16:25 -0000 -- Marcus Vinicius Ferreira From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 07:37:27 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5B1F516A4CE; Thu, 16 Sep 2004 07:37:27 +0000 (GMT) Received: from pimout1-ext.prodigy.net (pimout1-ext.prodigy.net [207.115.63.77]) by mx1.FreeBSD.org (Postfix) with ESMTP id ECF2743D41; Thu, 16 Sep 2004 07:37:26 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (adsl-68-123-125-25.dsl.snfc21.pacbell.net [68.123.125.25])i8G7bOWC335864; Thu, 16 Sep 2004 03:37:24 -0400 Message-ID: <414942B3.1060703@elischer.org> Date: Thu, 16 Sep 2004 00:37:23 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524 X-Accept-Language: en, hu MIME-Version: 1.0 To: Andrew Gallatin References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> In-Reply-To: <16711.383.448500.578640@grasshopper.cs.duke.edu> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: 
Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 07:37:27 -0000 Andrew, please try -current on its own now.. I have checked in some fixes that have helped others. From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 09:37:23 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 41CBE16A4CE; Thu, 16 Sep 2004 09:37:23 +0000 (GMT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id 01AE843D46; Thu, 16 Sep 2004 09:37:22 +0000 (GMT) (envelope-from pete@he.iki.fi) Received: from [195.163.185.142] (i2-142.rommon.fi [195.163.185.142]) by silver.he.iki.fi (8.12.10/8.11.4) with ESMTP id i8G9aum1057913; Thu, 16 Sep 2004 12:36:57 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <41495EBD.7030501@he.iki.fi> Date: Thu, 16 Sep 2004 12:37:01 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040803 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Elischer References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> <414942B3.1060703@elischer.org> In-Reply-To: <414942B3.1060703@elischer.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: Andrew Gallatin cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: 
freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 09:37:23 -0000 Julian Elischer wrote: > Andrew, please try -current on its own now.. > I have checked in some fixes that have helped others. Will these make it to 5-STABLE / 5.3-RELEASE? (just concerned about how good libpthread will be on the release we plan to live off for quite a while) Pete From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 12:51:07 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BB27E16A4CE; Thu, 16 Sep 2004 12:51:07 +0000 (GMT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5188E43D1F; Thu, 16 Sep 2004 12:51:07 +0000 (GMT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i8GCp4Jt025439 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 16 Sep 2004 08:51:04 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i8GCowwL071743; Thu, 16 Sep 2004 08:50:58 -0400 (EDT) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16713.35890.516192.596992@grasshopper.cs.duke.edu> Date: Thu, 16 Sep 2004 08:50:58 -0400 (EDT) To: Julian Elischer In-Reply-To: <414942B3.1060703@elischer.org> References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> 
<4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> <414942B3.1060703@elischer.org> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 12:51:07 -0000 Julian Elischer writes: > Andrew, please try -current on its own now.. > I have checked in some fixes that have helped others. I just tried, and had 2 different results. 2 system lockups, and one lingering thread. This is with PREEMPTION. I'm going to try again in a second w/o PREEMPTION. The last system lockup was kinda interesting, here are some details. For all my test setups, there has been one mx_pingpong running as root, and one mx_pingpong running as me. After the skill, a vmstat (running as root) kept going, and showed that the test was still running (like the signal bounced off of it). Further confirmation is that the mx_pingpong running as root exited normally, indicating that the other side had run to completion. I then killed vmstat and did a 'ps ax'. The ps got stuck on the skill'ed mx_pingpong's proc lock (note the address passed to the mtx_lock in the ps's frame). 
At this point, it looked like this: KDB: enter: Line break on console [thread 100146] Stopped at kdb_enter+0x30: leave db> sho pcpu cpuid = 0 curthread = 0xc1a15960: pid 561 "ps" curpcb = 0xe67b2da0 fpcurthread = none idlethread = 0xc1561640: pid 12 "idle: cpu0" APIC ID = 0 currentldt = 0x30 db> pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd 561 c1a14a80 e67de000 0 541 561 0004002 [CPU 0] ps 551 c1647e00 e5321000 1387 1 549 000c482 (threaded) mx_pingpong thread 0xc1646c80 ksegrp 0xc15ba690 [CPU 1] thread 0xc1646af0 ksegrp 0xc15ba690 [SUSP] 541 c1a18c40 e67e8000 0 538 541 0004002 [SLPQ pause 0xc1a18c78][SLP] csh <...> db> tr kdb_enter(c066f281,46,40,c16f3140,e67b2b14) at kdb_enter+0x30 siointr1(c1637800,0,c066f049,6ad,e67b2afc) at siointr1+0xd1 siointr(c1637800,0,c06a19a0,0,4) at siointr+0x35 intr_execute_handlers(c1556e90,e67b2b14,e67b2b74,c061bf53,34) at intr_execute_handlers+0xb8 lapic_handle_intr(34) at lapic_handle_intr+0x3b Xapic_isr1() at Xapic_isr1+0x33 --- interrupt, eip = 0xc04cd32b, esp = 0xe67b2b58, ebp = 0xe67b2b74 --- _mtx_lock_sleep(c1647e6c,c1a15960,0,c065b894,3c5) at _mtx_lock_sleep+0x12e _mtx_lock_flags(c1647e6c,0,c065b894,3c5,0) at _mtx_lock_flags+0x9f sysctl_kern_proc(c0687d00,e67b2c88,0,e67b2c10,e67b2c10) at sysctl_kern_proc+0x241 sysctl_root(0,e67b2c7c,3,e67b2c10,c1a15960) at sysctl_root+0x13b userland_sysctl(c1a15960,e67b2c7c,3,0,bfbfe28c) at userland_sysctl+0x11c __sysctl(c1a15960,e67b2d14,18,8053000,6) at __sysctl+0xb0 syscall(2f,2f,2f,bfbfe28c,bfbfe2c0) at syscall+0x271 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x280f3ee7, esp = 0xbfbfe22c, ebp = 0xbfbfe258 --- According to gdb: 0xc04d085d is in sysctl_kern_proc (../../../kern/kern_proc.c:965). 960 if (p->p_state == PRS_NEW) { 961 mtx_unlock_spin(&sched_lock); 962 continue; 963 } 964 mtx_unlock_spin(&sched_lock); 965 PROC_LOCK(p); 966 /* 967 * Show a user only appropriate processes. 
968 */ 969 if (p_cansee(curthread, p)) { db> call db_trace_thread(0xc1646c80, -1) sched_switch(c1646c80,c159f190,2,117,6a5c13ea) at sched_switch+0x16e mi_switch(2,c1646c80,c1646c80,c06ad340,4) at mi_switch+0x2ad maybe_preempt(e52d1bec,e52d1b78,c04e7482,c06ad340,c1646c80) at maybe_preempt+0x192 (null)(0,c1646c88,0,c1646c90,0) at 0x240 end(c15ba690,c15ba694,c1646c80,c1646af8,c1646af0) at 0xc15ba690 end(c15e4460,c15e4464,c187a960,c187a968,0) at 0xc1a14a80 <...> db> call db_trace_thread(0xc1646af0, -1) sched_switch(c1646af0,0,1,11d,4b34ccaa) at sched_switch+0x16e mi_switch(1,0,c065cd70,335,c1647e6c) at mi_switch+0x2ad thread_single(1,0,c0659772,88,e52cec70) at thread_single+0x1d7 exit1(c1646af0,9,c065c386,996,1) at exit1+0xd5 expand_name(c1646af0,9,c065c386,928,0) at expand_name postsig(9,0,c065f070,100,1020800) at postsig+0x1e0 ast(e52ced48) at ast+0x46e doreti_ast() at doreti_ast+0x17 0 Drew From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 13:42:33 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3123016A4CE; Thu, 16 Sep 2004 13:42:33 +0000 (GMT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id C0A3743D4C; Thu, 16 Sep 2004 13:42:32 +0000 (GMT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i8GDgVJt003807 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 16 Sep 2004 09:42:31 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i8GDgP7E071783; Thu, 16 Sep 2004 09:42:25 -0400 (EDT) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16713.38977.864343.415015@grasshopper.cs.duke.edu> Date: Thu, 16 Sep 
2004 09:42:25 -0400 (EDT) To: Julian Elischer In-Reply-To: <414942B3.1060703@elischer.org> References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> <414942B3.1060703@elischer.org> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 13:42:33 -0000 Julian Elischer writes: > Andrew, please try -current on ts own now.. > I have checked in some fixes that have helped others. OK, preemption off... Still a system lockup, but a little different. The interesting thing here is that continuing and breaking into the debugger repeatedly seems to show that thread 0xc1646af0 is looping in exit. I've seen him in thread_single, thread_suspend_check, and in exit itself at kern_exit.c:163, etc. A breakpoint in thread_suspend_one never triggers, so I guess he's holding the proc lock and just looping forever. A breakpoint in _mtx_assert() shows him asserting the proc lock in thread_suspend_check at kern_thread.c:898. Over and over. I don't know how to figure out where the other cpu-bound thread is. A ktrace does not show it bouncing around in our driver's ioctl handler. 
If you have a KTR mask you think might be helpful, I'd be happy to build a ktr kernel to try to get more info from the thread on CPU1. Drew [halt - sent] KDB: enter: Line break on console [thread 100097] Stopped at kdb_enter+0x30: leave db> sho pcpu cpuid = 0 curthread = 0xc1646af0: pid 575 "mx_pingpong" curpcb = 0xe52ceda0 fpcurthread = none idlethread = 0xc1561640: pid 12 "idle: cpu0" APIC ID = 0 currentldt = 0x30 db> tr kdb_enter(c066f1a0,c063158a,a0,c16f3140,e52ceba8) at kdb_enter+0x30 siointr1(c1637800,0,c066ef68,6ad,e52ceb90) at siointr1+0xd1 siointr(c1637800,c06a18c0,c065cd10,e52ceb9c,4) at siointr+0x35 intr_execute_handlers(c1556e90,e52ceba8,e52cec08,c061bf03,34) at intr_execute_handlers+0xb8 lapic_handle_intr(34) at lapic_handle_intr+0x3b Xapic_isr1() at Xapic_isr1+0x33 --- interrupt, eip = 0xc04cd58d, esp = 0xe52cebec, ebp = 0xe52cec08 --- _mtx_assert(c186de6c,1,c065cd10,382,c186de00) at _mtx_assert+0xc thread_suspend_check(0,0,c0659712,88,e52cec68) at thread_suspend_check+0x59 exit1(c1646af0,9,c065c326,996,1) at exit1+0xc9 expand_name(c1646af0,9,c065c326,928,0) at expand_name postsig(9,0,c065ef8f,100,1020800) at postsig+0x1e0 ast(e52ced48) at ast+0x46e doreti_ast() at doreti_ast+0x17 db> ps pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd 575 c186de00 e6772000 1387 1 573 000c482 (threaded) mx_pingpong thread 0xc1646af0 ksegrp 0xc1871070 [CPU 0] thread 0xc1646c80 ksegrp 0xc1871070 [SUSP] thread 0xc1646e10 ksegrp 0xc1871070 [RUNQ] thread 0xc1648000 ksegrp 0xc15ba230 [CPU 1] db> call db_trace_thread(0xc1646c80, 10) sched_switch(c1646c80,c1646af0,1,11d,a273455a) at sched_switch+0x16e mi_switch(1,c1646af0,c065cd10,335,c186de6c) at mi_switch+0x2ad thread_single(1,0,c0659712,88,67e8ac52) at thread_single+0x1d7 exit1(c1646c80,9,c065c326,996,1) at exit1+0xd5 expand_name(c1646c80,9,c065c326,928,0) at expand_name postsig(9,0,c065ef8f,100,1020800) at postsig+0x1e0 ast(e52d1d48) at ast+0x46e doreti_ast() at doreti_ast+0x17 0 db> call db_trace_thread(0xc1646e10, 
10) sched_switch(c1646e10,0,2,117,8da55b4a) at sched_switch+0x16e mi_switch(2,0,c065ef8f,f5,1010000) at mi_switch+0x2ad ast(e52d4d48) at ast+0x3c1 doreti_ast() at doreti_ast+0x17 0 db> call db_trace_thread(0xc1648000, 10) sched_switch(18e,3a99,c15ba230,1e,0) at sched_switch+0x16e __func__.0() at __func__.0+0xacd5 0 From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 15:53:47 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C279116A4CF for ; Thu, 16 Sep 2004 15:53:47 +0000 (GMT) Received: from pimout1-ext.prodigy.net (pimout1-ext.prodigy.net [207.115.63.77]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5140143D53 for ; Thu, 16 Sep 2004 15:53:47 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (adsl-68-120-128-124.dsl.snfc21.pacbell.net [68.120.128.124])i8GFrfWC400416; Thu, 16 Sep 2004 11:53:42 -0400 Message-ID: <4149B704.7050801@elischer.org> Date: Thu, 16 Sep 2004 08:53:40 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524 X-Accept-Language: en, hu MIME-Version: 1.0 To: Petri Helenius References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> <414942B3.1060703@elischer.org> <41495EBD.7030501@he.iki.fi> In-Reply-To: <41495EBD.7030501@he.iki.fi> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: Andrew Gallatin cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: 
Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 15:53:47 -0000 Petri Helenius wrote: > Julian Elischer wrote: > >> Andrew, please try -current on its own now.. >> I have checked in some fixes that have helped others. > > > Will these make it to 5-STABLE / 5.3-RELEASE? > (just concerned about how good libpthread will be on the release we plan > to live off for quite a while) > > Pete As soon as more people have load tested it in -current we plan to MT5. From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 16:28:33 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E3D4F16A4D0; Thu, 16 Sep 2004 16:28:33 +0000 (GMT) Received: from bps.jodocus.org (g157016.upc-g.chello.nl [80.57.157.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1412A43D31; Thu, 16 Sep 2004 16:28:33 +0000 (GMT) (envelope-from joost@jodocus.org) Received: from jodocus.org (localhost [127.0.0.1]) by bps.jodocus.org (8.13.1/8.12.10) with ESMTP id i8GGSS5q000895; Thu, 16 Sep 2004 18:28:28 +0200 (CEST) (envelope-from joost@jodocus.org) Received: (from joost@localhost) by jodocus.org (8.13.1/8.12.10/Submit) id i8GGSS3u000894; Thu, 16 Sep 2004 18:28:28 +0200 (CEST) (envelope-from joost) Date: Thu, 16 Sep 2004 18:28:28 +0200 From: Joost Bekkers To: Julian Elischer , David Xu Message-ID: <20040916162828.GA855@bps.jodocus.org> Mail-Followup-To: Joost Bekkers , Julian Elischer , David Xu , freebsd-threads@freebsd.org References: <20040912141838.GA89862@bps.jodocus.org> <414489FF.3090705@elischer.org> <414645C8.8070001@elischer.org> <20040914140002.GA32528@bps.jodocus.org> <41494310.40907@elischer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
In-Reply-To: <41494310.40907@elischer.org> User-Agent: Mutt/1.4.2.1i cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 16:28:34 -0000 On Thu, Sep 16, 2004 at 12:38:56AM -0700, Julian Elischer wrote: > > I checked in David's patch, which may fox this.. > try -current . > I'm not experiencing the problem anymore. thanks. -- greetz Joost joost@jodocus.org From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 18:45:10 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8D17D16A4CE for ; Thu, 16 Sep 2004 18:45:10 +0000 (GMT) Received: from transport.cksoft.de (transport.cksoft.de [62.111.66.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id A778643D39 for ; Thu, 16 Sep 2004 18:45:09 +0000 (GMT) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from transport.cksoft.de (localhost [127.0.0.1]) by transport.cksoft.de (Postfix) with ESMTP id 940E41FFDDC for ; Thu, 16 Sep 2004 20:45:07 +0200 (CEST) Received: by transport.cksoft.de (Postfix, from userid 66) id 894351FFDD6; Thu, 16 Sep 2004 20:45:05 +0200 (CEST) Received: by mail.int.zabbadoz.net (Postfix, from userid 1060) id B991615384; Thu, 16 Sep 2004 18:43:25 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by mail.int.zabbadoz.net (Postfix) with ESMTP id AEC5815380 for ; Thu, 16 Sep 2004 18:43:26 +0000 (UTC) Date: Thu, 16 Sep 2004 18:43:26 +0000 (UTC) From: "Bjoern A. Zeeb" X-X-Sender: bz@e0-0.zab2.int.zabbadoz.net To: freebsd-threads@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS cksoft-s20020300-20031204bz on transport.cksoft.de Subject: assert in _lock_acquire ? 
X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 18:45:10 -0000 Hi, I am using a global mutex to serialize a longer debugging output amongst threads. As this is only used for internal debugging builds where I really want to see everything I do not care about performance etc. Recently I ran into problems with that - getting a core dump at the same place in this debugging function. It takes some time but I can always reproduce it. I have seen this the last weeks with at least HEAD/ULE and since yesterday with RELENG_5/4BSD. The first debugging log where I can find it is dated 20040802. At that time the machine must have been running a 5-CURRENT from around 20040625. So I finally built libpthread with env DEBUG_FLAGS=-g make all and even linked with libpthread.a instead of using the shared lib. Here's the relevant part of the backtrace: ------ cut ------- (gdb) bt full #0 _lock_acquire (lck=0x38, lu=0x80da034, prio=56) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/sys/lock.c:168 i = 135110708 lval = 672675788 __func__ = "_lock_acquire" #1 0x08076151 in mutex_handoff (curthread=0x80ee000, mutex=0x80d8980) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/thread/thr_mutex.c:1586 kmbx = (struct kse_mailbox *) 0x1 pthread = (struct pthread *) 0x80d7b80 #2 0x08075166 in mutex_unlock_common (m=0x8092d6c, add_reference=0) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/thread/thr_mutex.c:1026 curthread = (struct pthread *) 0x80ee000 kmbx = (struct kse_mailbox *) 0x0 ret = 0 #3 0x08074c24 in _pthread_mutex_unlock (m=0x8092d6c) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/thread/thr_mutex.c:879 No locals. .... 
#9 0x0806ed84 in thread_start (curthread=0x80ee000, start_routine=0x806d62c , arg=0x0) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/thread/thr_create.c:342 No locals. #10 0x2815dcbf in _ctx_start () from /lib/libc.so.5 No symbol table info available. ------ cut ------- it seems I am running into an assert() in _lock_acquire. does this make any sense ? -- Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 19:34:11 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EA7D816A4CE for ; Thu, 16 Sep 2004 19:34:11 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9AA8C43D53 for ; Thu, 16 Sep 2004 19:34:11 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i8GJY6ML029673; Thu, 16 Sep 2004 15:34:06 -0400 (EDT) Date: Thu, 16 Sep 2004 15:34:01 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: "Bjoern A. Zeeb" In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: freebsd-threads@freebsd.org Subject: Re: assert in _lock_acquire ? X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 19:34:12 -0000 On Thu, 16 Sep 2004, Bjoern A. Zeeb wrote: > Hi, > > I am using a global mutex to serialize a longer debugging > output amongst threads. As this is only used for internal > debugging builds where I really want to see everything > I do not care about performance etc. Where did you introduce the global mutex? In your application or in libpthread or libc sources? 
> Recently I ran into problems with that - getting a core dump > at the same place in this debugging function. It takes some > time but I can always reproduce it. > > I have seen this the last weeks with at least HEAD/ULE and since > yesterday with RELENG_5/4BSD. The first debugging log where I can > find it is dated 20040802. At that time the machine must have > been running a 5-CURRENT from around 20040625. > > So I finally built libpthread with > env DEBUG_FLAGS=-g make all > and even linked with libpthread.a instead of using the shared lib. > > Here's the relevant part of the backtrace: > > ------ cut ------- > (gdb) bt full > #0 _lock_acquire (lck=0x38, lu=0x80da034, prio=56) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/sys/lock.c:168 > i = 135110708 > lval = 672675788 > __func__ = "_lock_acquire" > #1 0x08076151 in mutex_handoff (curthread=0x80ee000, mutex=0x80d8980) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/thread/thr_mutex.c:1586 > kmbx = (struct kse_mailbox *) 0x1 The kse_mailbox has become corrupted. If you are using %gs for anything, that could be the cause. %gs is reserved for the threads libraries. 
-- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 19:55:09 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6361316A4CF; Thu, 16 Sep 2004 19:55:09 +0000 (GMT) Received: from transport.cksoft.de (transport.cksoft.de [62.111.66.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id B418243D41; Thu, 16 Sep 2004 19:55:08 +0000 (GMT) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from transport.cksoft.de (localhost [127.0.0.1]) by transport.cksoft.de (Postfix) with ESMTP id 2B70C1FFDDC; Thu, 16 Sep 2004 21:55:07 +0200 (CEST) Received: by transport.cksoft.de (Postfix, from userid 66) id 188901FFDD7; Thu, 16 Sep 2004 21:55:05 +0200 (CEST) Received: by mail.int.zabbadoz.net (Postfix, from userid 1060) id 1DFA9156A7; Thu, 16 Sep 2004 19:50:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by mail.int.zabbadoz.net (Postfix) with ESMTP id 1B24F1569F; Thu, 16 Sep 2004 19:50:22 +0000 (UTC) Date: Thu, 16 Sep 2004 19:50:22 +0000 (UTC) From: "Bjoern A. Zeeb" X-X-Sender: bz@e0-0.zab2.int.zabbadoz.net To: Daniel Eischen In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS cksoft-s20020300-20031204bz on transport.cksoft.de cc: freebsd-threads@freebsd.org Subject: Re: assert in _lock_acquire ? X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 19:55:09 -0000 On Thu, 16 Sep 2004, Daniel Eischen wrote: > On Thu, 16 Sep 2004, Bjoern A. Zeeb wrote: > > > I am using a global mutex to serialize a longer debugging > > output amongst threads. As this is only used for internal > > debugging builds where I really want to see everything > > I do not care about performance etc. 
> > Where did you introduce the global mutex? In your application > or in libpthread or libc sources? application; initialized from main before any further threads started. > > ------ cut ------- > > (gdb) bt full > > #0 _lock_acquire (lck=0x38, lu=0x80da034, prio=56) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/sys/lock.c:168 > > i = 135110708 > > lval = 672675788 > > __func__ = "_lock_acquire" > > #1 0x08076151 in mutex_handoff (curthread=0x80ee000, mutex=0x80d8980) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/thread/thr_mutex.c:1586 > > kmbx = (struct kse_mailbox *) 0x1 > > The kse_mailbox has become corrupted. If you are using %gs for anything, > that could be the cause. %gs is reserved for the threads libraries. also lck=0x38 looked odd to me but this may be a result of corrupted kmbx. what is %gs btw ? is there an (easy) way I can get to know when this happens ? -- Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 20:03:29 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 80BEA16A4CE for ; Thu, 16 Sep 2004 20:03:29 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 16DF443D45 for ; Thu, 16 Sep 2004 20:03:29 +0000 (GMT) (envelope-from eischen@vigrid.com) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i8GK3SML014336; Thu, 16 Sep 2004 16:03:28 -0400 (EDT) Date: Thu, 16 Sep 2004 16:03:27 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: "Bjoern A. Zeeb" In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: freebsd-threads@freebsd.org Subject: Re: assert in _lock_acquire ?
X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 20:03:29 -0000 On Thu, 16 Sep 2004, Bjoern A. Zeeb wrote: > On Thu, 16 Sep 2004, Daniel Eischen wrote: > > > On Thu, 16 Sep 2004, Bjoern A. Zeeb wrote: > > > > > I am using a global mutex to serialize a longer debugging > > > output amongst threads. As this is only used for internal > > > debugging builds where I really want to see everything > > > I do not care about performance etc. > > > > Where did you introduce the global mutex? In your application > > or in libpthread or libc sources? > > application; initialized from main before any further threads started. > > > > > ------ cut ------- > > > (gdb) bt full > > > #0 _lock_acquire (lck=0x38, lu=0x80da034, prio=56) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/sys/lock.c:168 > > > i = 135110708 > > > lval = 672675788 > > > __func__ = "_lock_acquire" > > > #1 0x08076151 in mutex_handoff (curthread=0x80ee000, mutex=0x80d8980) at /u1/src/src/RELENG_5/compile-20040914-1630/lib/libpthread/thread/thr_mutex.c:1586 > > > kmbx = (struct kse_mailbox *) 0x1 > > > > The kse_mailbox has become corrupted. If you are using %gs for anything, > > that could be the cause. %gs is reserved for the threads libraries. > > also lck=0x38 looked odd to me but this may be a result of corrupted > kmbx. > > what is %gs btw ? An i386 segment register. The older NVidia drivers used %gs and thus could not work with libpthread (or libthr). Any messages from the kernel about static LDT allocation are also hints that something is using %gs. I suspect your application is using or calling something that is changing %gs.
-- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 21:18:02 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C510A16A4CE; Thu, 16 Sep 2004 21:18:02 +0000 (GMT) Received: from bps.jodocus.org (g157016.upc-g.chello.nl [80.57.157.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 21A7943D48; Thu, 16 Sep 2004 21:18:02 +0000 (GMT) (envelope-from joost@jodocus.org) Received: from jodocus.org (localhost [127.0.0.1]) by bps.jodocus.org (8.13.1/8.12.10) with ESMTP id i8GLHvl7004922; Thu, 16 Sep 2004 23:17:57 +0200 (CEST) (envelope-from joost@jodocus.org) Received: (from joost@localhost) by jodocus.org (8.13.1/8.12.10/Submit) id i8GLHvmY004921; Thu, 16 Sep 2004 23:17:57 +0200 (CEST) (envelope-from joost) Date: Thu, 16 Sep 2004 23:17:57 +0200 From: Joost Bekkers To: Julian Elischer , David Xu , freebsd-threads@freebsd.org Message-ID: <20040916211757.GA4830@bps.jodocus.org> Mail-Followup-To: Joost Bekkers , Julian Elischer , David Xu , freebsd-threads@freebsd.org References: <20040912141838.GA89862@bps.jodocus.org> <414489FF.3090705@elischer.org> <414645C8.8070001@elischer.org> <20040914140002.GA32528@bps.jodocus.org> <41494310.40907@elischer.org> <20040916162828.GA855@bps.jodocus.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040916162828.GA855@bps.jodocus.org> User-Agent: Mutt/1.4.2.1i Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 21:18:02 -0000 On Thu, Sep 16, 2004 at 06:28:28PM +0200, Joost Bekkers wrote: > On Thu, Sep 16, 2004 at 12:38:56AM -0700, Julian Elischer wrote: > > > > I checked in David's patch, which may fox this.. > > try -current . 
> > > > I'm not experiencing the problem anymore. > Celebrated too soon.... Signals are not being delivered to the process after it did its execv. The only signal that seems to be working is KILL (-9) -- greetz Joost joost@jodocus.org From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 22:07:50 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 682DD16A4CE; Thu, 16 Sep 2004 22:07:50 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6CB5643D54; Thu, 16 Sep 2004 22:07:49 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 1F7D67A3D2; Thu, 16 Sep 2004 15:07:49 -0700 (PDT) Message-ID: <414A0EB4.5060304@elischer.org> Date: Thu, 16 Sep 2004 15:07:48 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: Joost Bekkers References: <20040912141838.GA89862@bps.jodocus.org> <414489FF.3090705@elischer.org> <414645C8.8070001@elischer.org> <20040914140002.GA32528@bps.jodocus.org> <41494310.40907@elischer.org> <20040916162828.GA855@bps.jodocus.org> <20040916211757.GA4830@bps.jodocus.org> In-Reply-To: <20040916211757.GA4830@bps.jodocus.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: David Xu cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 22:07:50 -0000 Ick. 
Joost Bekkers wrote: >On Thu, Sep 16, 2004 at 06:28:28PM +0200, Joost Bekkers wrote: > > >>On Thu, Sep 16, 2004 at 12:38:56AM -0700, Julian Elischer wrote: >> >> >>>I checked in David's patch, which may fix this.. >>>try -current . >>> >>> >>> >>I'm not experiencing the problem anymore. >> >> >> > >Celebrated too soon.... > >Signals are not being delivered to the process after it did >its execv. > >The only signal that seems to be working is KILL (-9) > the man page is: (for execve) Signals set to be ignored in the calling process are set to be ignored in the new process. Signals which are set to be caught in the calling process image are set to default action in the new process image. Blocked signals remain blocked regardless of changes to the signal action. The signal stack is reset to be undefined (see sigaction(2) for more information). so we need to keep track of all signals accepted by the process (which is an OR of the signals accepted by all the threads) and set it back to that state regardless of what thread is doing the exit. (yuck that is quite a difficult question) I wonder if the "signal gathering thread" has that info? Maybe if the signal thread exits it should look to see if the process is exec/exiting (by looking at the thread_single mode) and transfer its mask to the 'survivor' thread? David?
> > > From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 22:46:13 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 47B8916A4CE; Thu, 16 Sep 2004 22:46:13 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2010543D31; Thu, 16 Sep 2004 22:46:13 +0000 (GMT) (envelope-from davidxu@freebsd.org) Received: from [127.0.0.1] (davidxu@localhost [127.0.0.1]) i8GMkBMb073822; Thu, 16 Sep 2004 22:46:12 GMT (envelope-from davidxu@freebsd.org) Message-ID: <414A17C8.30703@freebsd.org> Date: Fri, 17 Sep 2004 06:46:32 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.7.1) Gecko/20040730 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Elischer References: <20040912141838.GA89862@bps.jodocus.org> <414489FF.3090705@elischer.org> <414645C8.8070001@elischer.org> <20040914140002.GA32528@bps.jodocus.org> <41494310.40907@elischer.org> <20040916162828.GA855@bps.jodocus.org> <20040916211757.GA4830@bps.jodocus.org> <414A0EB4.5060304@elischer.org> In-Reply-To: <414A0EB4.5060304@elischer.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 22:46:13 -0000 Julian Elischer wrote: > Ick. > > > Joost Bekkers wrote: > >> On Thu, Sep 16, 2004 at 06:28:28PM +0200, Joost Bekkers wrote: >> >> >>> On Thu, Sep 16, 2004 at 12:38:56AM -0700, Julian Elischer wrote: >>> >>> >>>> I checked in David's patch, which may fox this.. >>>> try -current . 
>>>> >>>> >>> >>> I'm not experiencing the problem anymore. >>> >>> >> >> >> Celebrated too soon.... >> >> Signals are not being delivered to the process after it did >> its execv. >> >> The only signal that seems to be working is KILL (-9) >> > > the man page is: (for execve) > Signals set to be ignored in the calling process are set to be > ignored in > the new process. Signals which are set to be caught in the calling > process image are set to default action in the new process image. > Blocked signals remain blocked regardless of changes to the signal > action. The signal stack is reset to be undefined (see > sigaction(2) for > more information). > > so we need to keep track of all signals accepted by the process (which > is an > OR of the signals accepted by all the threads) and set it back to that > state > regardless of what thread is doing the exit. > (yuck that is quite a difficult question) I wonder if the "signal > gathering thread" > has that info? > > Maybe if the signal thread exits it should look to see if the process > is exec/exiting > (by looking at the thread_single mode) and transfer its mask to the > 'survivor' thread? > > David? > I think this is because the M:N thread library masks all signals except SIGSTOP and SIGKILL, the real signal mask in userland needs to be set back to kernel, libpthread should provide a wrapper for execv syscall, Dan? Correct me if I am wrong.
Posix says: The initial thread of the new process shall inherit at least the following attributes from the calling thread: * Signal mask (see /sigprocmask/() and /pthread_sigmask/() ) * Pending signals (see /sigpending/() ) * From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 22:58:53 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3938716A4CF; Thu, 16 Sep 2004 22:58:53 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id C986043D39; Thu, 16 Sep 2004 22:58:52 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i8GMwoML026293; Thu, 16 Sep 2004 18:58:50 -0400 (EDT) Date: Thu, 16 Sep 2004 18:58:50 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: David Xu In-Reply-To: <414A17C8.30703@freebsd.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: Julian Elischer cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 22:58:53 -0000 On Fri, 17 Sep 2004, David Xu wrote: > Julian Elischer wrote: > > > Ick. > > > > > > Joost Bekkers wrote: > > > >> Celebrated too soon.... > >> > >> Signals are not being delivered to the process after it did > >> its execv. > >> > >> The only signal that seems to be working is KILL (-9) > >> > > > > the man page is: (for execve) > > Signals set to be ignored in the calling process are set to be > > ignored in > > the new process. 
Signals which are set to be caught in the calling > > process image are set to default action in the new process image. > > Blocked signals remain blocked regardless of changes to the signal > > action. The signal stack is reset to be undefined (see > > sigaction(2) for > > more information). > > > > so we need to keep track of all signals accepted by the process (which > > is an > > OR of the signals accepted by all the threads) and set it back to that > > state > > regardless of what thread is doing the exit. > > (yuck that is quite a difficult question) I wonder if the "signal > > gatherring thread" > > has that info? > > > > Maybe if the signal thread exits it should look to see if the process > > is exec/exiting > > (by looking at the thread_single mode) and transfer its mask to teh > > 'survicor' thread? > > > > David? > > > I think this becauses the M:N thread masks all signals except SIGSTOP > and SIGKILL, > the real signal mask in userland needs to be set back to kernel, > libpthread should > provide a wrapper for execv syscall, Dan? fix me if I am wrong. We do that in fork(). Is execv() not being done after a fork()? 
-- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 23:09:25 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1C1CC16A4CE; Thu, 16 Sep 2004 23:09:25 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id ECA3543D3F; Thu, 16 Sep 2004 23:09:23 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 848667A3D2; Thu, 16 Sep 2004 16:09:23 -0700 (PDT) Message-ID: <414A1D23.6070603@elischer.org> Date: Thu, 16 Sep 2004 16:09:23 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: David Xu References: <20040912141838.GA89862@bps.jodocus.org> <414489FF.3090705@elischer.org> <414645C8.8070001@elischer.org> <20040914140002.GA32528@bps.jodocus.org> <41494310.40907@elischer.org> <20040916162828.GA855@bps.jodocus.org> <20040916211757.GA4830@bps.jodocus.org> <414A0EB4.5060304@elischer.org> <414A17C8.30703@freebsd.org> In-Reply-To: <414A17C8.30703@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: Daniel Eischen cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 23:09:25 -0000 David Xu wrote: > Julian Elischer wrote: > >> Ick. 
>> >> >> Joost Bekkers wrote: >> >>> On Thu, Sep 16, 2004 at 06:28:28PM +0200, Joost Bekkers wrote: >>> >>> >>>> On Thu, Sep 16, 2004 at 12:38:56AM -0700, Julian Elischer wrote: >>>> >>>> >>>>> I checked in David's patch, which may fox this.. >>>>> try -current . >>>>> >>>>> >>>> >>>> >>>> I'm not experiencing the problem anymore. >>>> >>>> >>> >>> >>> >>> Celebrated too soon.... >>> >>> Signals are not being delivered to the process after it did >>> its execv. >>> >>> The only signal that seems to be working is KILL (-9) >>> >> >> the man page is: (for execve) >> Signals set to be ignored in the calling process are set to be >> ignored in >> the new process. Signals which are set to be caught in the calling >> process image are set to default action in the new process image. >> Blocked signals remain blocked regardless of changes to the signal >> action. The signal stack is reset to be undefined (see >> sigaction(2) for >> more information). >> >> so we need to keep track of all signals accepted by the process >> (which is an >> OR of the signals accepted by all the threads) and set it back to >> that state >> regardless of what thread is doing the exit. >> (yuck that is quite a difficult question) I wonder if the "signal >> gatherring thread" >> has that info? >> >> Maybe if the signal thread exits it should look to see if the >> process is exec/exiting >> (by looking at the thread_single mode) and transfer its mask to teh >> 'survicor' thread? >> >> David? >> > I think this becauses the M:N thread masks all signals except SIGSTOP > and SIGKILL, > the real signal mask in userland needs to be set back to kernel, > libpthread should > provide a wrapper for execv syscall, Dan? fix me if I am wrong. FOr exit() it would be ok to ignore it but for exec() we need some information.. 
maybe we need to catch the exec call in the library and make sure that it
passes the right information down.

My suggestion in KSE is to get the signal thread to have that mask, and then
the signal thread could pass it to the process as a whole.. in fact, shouldn't
the signal thread have a mask that could be set to the current mask whenever
threads change their requirements?

the signal thread could do:

    if (we woke up)
        if (process is exiting)
            set mask of curthread->td_proc->p_singlethread to my mask..

for libthr the behaviour would have to be to collect all the masks together
as each thread exits.. hmm, hang on.. that would work for all cases... if
every exiting thread in this situation OR'd its mask of acceptable signals
into the process's mask, then by the end we'd have a mask of all accepted
signals.. For KSE processes all threads except the signal thread would add
nothing, but the algorithm would still work...

David.. I assume the waiting signal thread has the mask it needs as part of
its arguments and can use this..?
> > > Posix says: > > The initial thread of the new process shall inherit at least the > following attributes > from the calling thread: > > * > > Signal mask (see /sigprocmask/() > > > and /pthread_sigmask/() > > ) > > * > > Pending signals (see /sigpending/() > > ) > > * > > From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 23:25:25 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D93EE16A4CE; Thu, 16 Sep 2004 23:25:25 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id B83EF43D49; Thu, 16 Sep 2004 23:25:25 +0000 (GMT) (envelope-from davidxu@freebsd.org) Received: from [127.0.0.1] (davidxu@localhost [127.0.0.1]) i8GNPNXQ079576; Thu, 16 Sep 2004 23:25:24 GMT (envelope-from davidxu@freebsd.org) Message-ID: <414A20F9.5000304@freebsd.org> Date: Fri, 17 Sep 2004 07:25:45 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.7.1) Gecko/20040730 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Daniel Eischen References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: Julian Elischer cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 23:25:26 -0000 Daniel Eischen wrote: >We do that in fork(). Is execv() not being done after a fork()? > > > Joost calls execv() directly in threaded process, he did not go through fork() ->execv() path. 
From owner-freebsd-threads@FreeBSD.ORG Thu Sep 16 23:31:26 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 40AB016A4CE; Thu, 16 Sep 2004 23:31:26 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id D010243D1F; Thu, 16 Sep 2004 23:31:25 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i8GNVMML011365; Thu, 16 Sep 2004 19:31:22 -0400 (EDT) Date: Thu, 16 Sep 2004 19:31:22 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: David Xu In-Reply-To: <414A20F9.5000304@freebsd.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: Julian Elischer cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2004 23:31:26 -0000 On Fri, 17 Sep 2004, David Xu wrote: > Daniel Eischen wrote: > > >We do that in fork(). Is execv() not being done after a fork()? > > > > > > > Joost calls execv() directly in threaded process, he did not go through > fork() ->execv() path. Yes, Julian just emailed me similarly. In that case, I think we need to wrap execve() and set the kernel signal mask to the threads signal mask. We don't need all the single threading stuff that is in our wrapped fork(); just __sys_sigprocmask() should be sufficient. Right? 
-- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 00:01:13 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 55E6D16A4CF; Fri, 17 Sep 2004 00:01:13 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 38DAC43D48; Fri, 17 Sep 2004 00:01:13 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id F3EC37A3D2; Thu, 16 Sep 2004 17:01:12 -0700 (PDT) Message-ID: <414A2948.9000900@elischer.org> Date: Thu, 16 Sep 2004 17:01:12 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: Daniel Eischen References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: David Xu cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 00:01:13 -0000 Daniel Eischen wrote: >On Fri, 17 Sep 2004, David Xu wrote: > > >>Daniel Eischen wrote: >> >> >>>We do that in fork(). Is execv() not being done after a fork()? >>> >>> >>> >>> >>Joost calls execv() directly in threaded process, he did not go through >>fork() ->execv() path. >> > >Yes, Julian just emailed me similarly. In that case, I think we need >to wrap execve() and set the kernel signal mask to the threads signal >mask. We don't need all the single threading stuff that is in our >wrapped fork(); just __sys_sigprocmask() should be sufficient. Right? 
> We would need to ensure that there is no chance that we could be switched to another kernel thread between the two calls. In general I'd prefer it if we had a way that worked even if the userland screwed up.. execve is often a way in which daemons recover when they feel that they have messed up in some way... e.g.: panic() { log("help I've fallen over and I can't get up"); execve( me, argc, argv, envpp); /* or whatever the args are ..ok so I need to write more userland stuff */_ } do we trust userland that much? does the signal therad just enable ALL signals? does it not maks those for which we have no consumers? I'd still prefer to do things that work for libthr as well as libpthread. _ From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 00:16:42 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1E2EB16A4CE; Fri, 17 Sep 2004 00:16:42 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id C111A43D1D; Fri, 17 Sep 2004 00:16:41 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i8H0GdML007605; Thu, 16 Sep 2004 20:16:39 -0400 (EDT) Date: Thu, 16 Sep 2004 20:16:39 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Julian Elischer In-Reply-To: <414A2948.9000900@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: David Xu cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 00:16:42 -0000 On Thu, 16 Sep 2004, Julian Elischer 
wrote:

> Daniel Eischen wrote:
>
> >On Fri, 17 Sep 2004, David Xu wrote:
> >
> >>Daniel Eischen wrote:
> >>
> >>>We do that in fork().  Is execv() not being done after a fork()?
> >>>
> >>Joost calls execv() directly in threaded process, he did not go through
> >>fork() ->execv() path.
> >>
> >
> >Yes, Julian just emailed me similarly.  In that case, I think we need
> >to wrap execve() and set the kernel signal mask to the thread's signal
> >mask.  We don't need all the single threading stuff that is in our
> >wrapped fork(); just __sys_sigprocmask() should be sufficient.  Right?
> >
>
> We would need to ensure that there is no chance that we could be
> switched to another kernel thread between the two calls.

True.

> In general I'd prefer it if we had a way that worked even if the
> userland screwed up..
> execve is often a way in which daemons recover when they feel that they
> have messed up in some way...
>
> e.g.:
> panic()
> {
>     log("help I've fallen over and I can't get up");
>     execve( me, argc, argv, envpp);  /* or whatever the args are ..ok
>                                         so I need to write more userland stuff */
> }
> do we trust userland that much?
>
> does the signal thread just enable ALL signals?
> does it not mask those for which we have no consumers?

Regardless, it doesn't have the signal mask that the execve()'ing thread
has, and that is the key issue.  The exec'd process needs to have the
signal mask of the issuing thread.

> I'd still prefer to do things that work for libthr as well as libpthread.

I don't see why this (whatever we do) has to be any different for libthr.

For libpthread, we could put ourselves in a critical region (clear
the mailbox) -- that would stop upcalls.  Does that also prevent
switching to different kernel threads?
-- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 00:44:42 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C86FE16A4CE; Fri, 17 Sep 2004 00:44:42 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id B5EAA43D1F; Fri, 17 Sep 2004 00:44:42 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 800367A3D2; Thu, 16 Sep 2004 17:44:42 -0700 (PDT) Message-ID: <414A337A.1040906@elischer.org> Date: Thu, 16 Sep 2004 17:44:42 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: Daniel Eischen References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: David Xu cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 00:44:42 -0000 Daniel Eischen wrote: >On Thu, 16 Sep 2004, Julian Elischer wrote: > >Regardless, it doesn't have the signal mask that the execve()'ing thread >has, and that is the key issue. The exec'd process needs to have the >signal mask of the issuing thread. > > > >>I'd still prefer to do things that work for libthr as well as libpthread. >> >> > >I don't see why this (whatever we do) has to be any different for libthr. > > > >For libpthread, we could put ourselves in a critical region (clear >the mailbox) -- that would stop upcalls. Does that also prevent >switching to different kernel threads? > yes I guess that would be enough. 
> > > From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 01:53:29 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 606D016A4CE for ; Fri, 17 Sep 2004 01:53:29 +0000 (GMT) Received: from mail1.speakeasy.net (mail1.speakeasy.net [216.254.0.201]) by mx1.FreeBSD.org (Postfix) with ESMTP id 37BA443D54 for ; Fri, 17 Sep 2004 01:53:29 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 31772 invoked from network); 17 Sep 2004 01:53:28 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 17 Sep 2004 01:53:28 -0000 Received: from slimer.baldwin.cx (slimer.baldwin.cx [192.168.0.16]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8H1rNN2002471; Thu, 16 Sep 2004 21:53:26 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: Andrew Gallatin Date: Thu, 16 Sep 2004 13:16:43 -0400 User-Agent: KMail/1.6.2 References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <414942B3.1060703@elischer.org> <16713.38977.864343.415015@grasshopper.cs.duke.edu> In-Reply-To: <16713.38977.864343.415015@grasshopper.cs.duke.edu> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200409161316.43010.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: Julian Elischer cc: freebsd-threads@FreeBSD.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 01:53:29 -0000 On Thursday 16 September 2004 09:42 am, Andrew Gallatin wrote: > Julian Elischer writes: > > Andrew, please try -current on ts own now.. 
> > I have checked in some fixes that have helped others.
>
> OK, preemption off...  Still a system lockup, but a little different.
>
> The interesting thing here is that continuing and breaking into the
> debugger repeatedly seems to show that thread 0xc1646af0 is looping in
> exit.  I've seen him in thread_single, thread_suspend_check, and in
> exit itself at kern_exit.c:163, etc.  A breakpoint in
> thread_suspend_one never triggers, so I guess he's holding the proc
> lock and just looping forever.  A breakpoint in _mtx_assert() shows
> him asserting the proc lock in thread_suspend_check at kern_thread.c:898.
> Over and over.

There is definitely some sort of infinite loop here.  Stripping out the
comments in exit1() for that section of code reveals basically:

        PROC_LOCK(p);
        if (p->p_flag & P_HADTHREADS) {
retry:
                thread_suspend_check(0);
                if (thread_single(SINGLE_EXIT))
                        goto retry;
        }
        p->p_flag |= P_WEXIT;
        PROC_UNLOCK(p);

So it's easy to see how it can get stuck in a loop, I think.  If
thread_single() never drops the lock, then other threads that are waiting to
die can't actually wait, because they can never get the proc lock so that
they can die.
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 04:36:25 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8BF5916A4CF; Fri, 17 Sep 2004 04:36:25 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3935843D54; Fri, 17 Sep 2004 04:36:25 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i8H4aLML027550; Fri, 17 Sep 2004 00:36:21 -0400 (EDT) Date: Fri, 17 Sep 2004 00:36:21 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: David Xu In-Reply-To: <414A17C8.30703@freebsd.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: Julian Elischer cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 04:36:25 -0000 On Fri, 17 Sep 2004, David Xu wrote: > Julian Elischer wrote: > > > so we need to keep track of all signals accepted by the process (which > > is an > > OR of the signals accepted by all the threads) and set it back to that > > state > > regardless of what thread is doing the exit. > > (yuck that is quite a difficult question) I wonder if the "signal > > gatherring thread" > > has that info? > > > > Maybe if the signal thread exits it should look to see if the process > > is exec/exiting > > (by looking at the thread_single mode) and transfer its mask to teh > > 'survicor' thread? > > > > David? 
> > > I think this becauses the M:N thread masks all signals except SIGSTOP > and SIGKILL, > the real signal mask in userland needs to be set back to kernel, > libpthread should > provide a wrapper for execv syscall, Dan? fix me if I am wrong. Potential (untested) patch at: http://people.freebsd.org/~deischen/kse/execve.diffs -- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 04:40:49 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D6F0516A4CE; Fri, 17 Sep 2004 04:40:49 +0000 (GMT) Received: from pimout3-ext.prodigy.net (pimout3-ext.prodigy.net [207.115.63.102]) by mx1.FreeBSD.org (Postfix) with ESMTP id 05A0243D2F; Fri, 17 Sep 2004 04:40:49 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (adsl-64-164-9-59.dsl.snfc21.pacbell.net [64.164.9.59])i8H4ekNm033004; Fri, 17 Sep 2004 00:40:47 -0400 Message-ID: <414A6ACD.2020600@elischer.org> Date: Thu, 16 Sep 2004 21:40:45 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524 X-Accept-Language: en, hu MIME-Version: 1.0 To: John Baldwin References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <414942B3.1060703@elischer.org> <16713.38977.864343.415015@grasshopper.cs.duke.edu> <200409161316.43010.jhb@FreeBSD.org> In-Reply-To: <200409161316.43010.jhb@FreeBSD.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: Andrew Gallatin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 04:40:50 -0000 John Baldwin wrote: > On Thursday 16 September 2004 09:42 am, Andrew Gallatin wrote: > >>Julian Elischer writes: >> > Andrew, 
please try -current on its own now..
>> > I have checked in some fixes that have helped others.
>>
>>OK, preemption off...  Still a system lockup, but a little different.
>>
>>The interesting thing here is that continuing and breaking into the
>>debugger repeatedly seems to show that thread 0xc1646af0 is looping in
>>exit.  I've seen him in thread_single, thread_suspend_check, and in
>>exit itself at kern_exit.c:163, etc.  A breakpoint in
>>thread_suspend_one never triggers, so I guess he's holding the proc
>>lock and just looping forever.  A breakpoint in _mtx_assert() shows
>>him asserting the proc lock in thread_suspend_check at kern_thread.c:898.
>>Over and over.
>
> There is definitely some sort of infinite loop here.  Stripping out the
> comments in exit1() for that section of code reveals basically:
>
>         PROC_LOCK(p);
>         if (p->p_flag & P_HADTHREADS) {
> retry:
>                 thread_suspend_check(0);
>                 if (thread_single(SINGLE_EXIT))
>                         goto retry;
>         }
>         p->p_flag |= P_WEXIT;
>         PROC_UNLOCK(p);
>
> So it's easy to see how it can get stuck in a loop, I think.  If
> thread_single() never drops the lock, then other threads that are waiting
> to die can't actually wait, because they can never get the proc lock so
> that they can die.
>

hmm, interesting..
but this code hasn't changed in ages...

in thread_single we see:

        thread_suspend_one(td);
        PROC_UNLOCK(p);
        mi_switch(SW_VOL, NULL);
        mtx_unlock_spin(&sched_lock);
        PROC_LOCK(p);
        mtx_lock_spin(&sched_lock);

so when it sleeps it releases the proc lock.
From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 07:05:25 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A9DC716A4CE; Fri, 17 Sep 2004 07:05:25 +0000 (GMT) Received: from bps.jodocus.org (g157016.upc-g.chello.nl [80.57.157.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id F176143D1D; Fri, 17 Sep 2004 07:05:24 +0000 (GMT) (envelope-from joost@jodocus.org) Received: from jodocus.org (localhost [127.0.0.1]) by bps.jodocus.org (8.13.1/8.12.10) with ESMTP id i8H75KXK013007; Fri, 17 Sep 2004 09:05:20 +0200 (CEST) (envelope-from joost@jodocus.org) Received: (from joost@localhost) by jodocus.org (8.13.1/8.12.10/Submit) id i8H75Jpa013006; Fri, 17 Sep 2004 09:05:19 +0200 (CEST) (envelope-from joost) Date: Fri, 17 Sep 2004 09:05:19 +0200 From: Joost Bekkers To: Daniel Eischen Message-ID: <20040917070519.GA12866@bps.jodocus.org> Mail-Followup-To: Joost Bekkers , Daniel Eischen , David Xu , Julian Elischer , freebsd-threads@freebsd.org References: <414A17C8.30703@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i cc: freebsd-threads@freebsd.org cc: David Xu cc: Julian Elischer Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 07:05:25 -0000 On Fri, Sep 17, 2004 at 12:36:21AM -0400, Daniel Eischen wrote: > > I think this becauses the M:N thread masks all signals except SIGSTOP > > and SIGKILL, > > the real signal mask in userland needs to be set back to kernel, > > libpthread should > > provide a wrapper for execv syscall, Dan? fix me if I am wrong. 
> > Potential (untested) patch at: > > http://people.freebsd.org/~deischen/kse/execve.diffs works, signals are arriving again. -- greetz Joost joost@jodocus.org From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 11:16:22 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5F3E616A4CE; Fri, 17 Sep 2004 11:16:22 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id E634243D2D; Fri, 17 Sep 2004 11:16:21 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i8HBGHML010020; Fri, 17 Sep 2004 07:16:17 -0400 (EDT) Date: Fri, 17 Sep 2004 07:16:17 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: David Xu In-Reply-To: <20040917070519.GA12866@bps.jodocus.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: Julian Elischer cc: freebsd-threads@freebsd.org Subject: Re: SIGILL @ pthread_create() after execv -FIXED- X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 11:16:22 -0000 On Fri, 17 Sep 2004, Joost Bekkers wrote: > On Fri, Sep 17, 2004 at 12:36:21AM -0400, Daniel Eischen wrote: > > > I think this becauses the M:N thread masks all signals except SIGSTOP > > > and SIGKILL, > > > the real signal mask in userland needs to be set back to kernel, > > > libpthread should > > > provide a wrapper for execv syscall, Dan? fix me if I am wrong. > > > > Potential (untested) patch at: > > > > http://people.freebsd.org/~deischen/kse/execve.diffs > > works, signals are arriving again. David, would you review the above patch? Thanks! 
-- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 11:47:38 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E2B8716A4CE; Fri, 17 Sep 2004 11:47:38 +0000 (GMT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 640D143D46; Fri, 17 Sep 2004 11:47:38 +0000 (GMT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i8HBlZJt029879 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 17 Sep 2004 07:47:35 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i8HBlTTE073202; Fri, 17 Sep 2004 07:47:29 -0400 (EDT) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16714.52945.827195.748164@grasshopper.cs.duke.edu> Date: Fri, 17 Sep 2004 07:47:29 -0400 (EDT) To: Julian Elischer In-Reply-To: <414A6ACD.2020600@elischer.org> References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <414942B3.1060703@elischer.org> <16713.38977.864343.415015@grasshopper.cs.duke.edu> <200409161316.43010.jhb@FreeBSD.org> <414A6ACD.2020600@elischer.org> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 11:47:39 -0000 Julian Elischer writes: > John Baldwin wrote: > > On Thursday 16 September 2004 09:42 am, Andrew Gallatin wrote: > > > >>Julian Elischer writes: > >> > Andrew, please try 
-current on its own now..
> >> > I have checked in some fixes that have helped others.
> >>
> >>OK, preemption off...  Still a system lockup, but a little different.
> >>
> >>The interesting thing here is that continuing and breaking into the
> >>debugger repeatedly seems to show that thread 0xc1646af0 is looping in
> >>exit.  I've seen him in thread_single, thread_suspend_check, and in
> >>exit itself at kern_exit.c:163, etc.  A breakpoint in
> >>thread_suspend_one never triggers, so I guess he's holding the proc
> >>lock and just looping forever.  A breakpoint in _mtx_assert() shows
> >>him asserting the proc lock in thread_suspend_check at kern_thread.c:898.
> >>Over and over.
> >
> > There is definitely some sort of infinite loop here.  Stripping out the
> > comments in exit1() for that section of code reveals basically:
> >
> >         PROC_LOCK(p);
> >         if (p->p_flag & P_HADTHREADS) {
> > retry:
> >                 thread_suspend_check(0);
> >                 if (thread_single(SINGLE_EXIT))
> >                         goto retry;
> >         }
> >         p->p_flag |= P_WEXIT;
> >         PROC_UNLOCK(p);
> >
> > So it's easy to see how it can get stuck in a loop, I think.  If
> > thread_single() never drops the lock, then other threads that are
> > waiting to die can't actually wait, because they can never get the proc
> > lock so that they can die.
> >
>
> hmm, interesting..
> but this code hasn't changed in ages...
>
> in thread_single we see:
>
>         thread_suspend_one(td);
>         PROC_UNLOCK(p);
>         mi_switch(SW_VOL, NULL);
>         mtx_unlock_spin(&sched_lock);
>         PROC_LOCK(p);
>         mtx_lock_spin(&sched_lock);
>
> so when it sleeps it releases the proc lock.

But that's the problem.  As I said above, a breakpoint in thread_suspend_one
never triggers, so this code is never called.  It must be bailing out
of thread_suspend_one() before this happens.

Did somebody fix ddb?  If yes, I can try stepping through it if you like.

Maybe a quick fix would be to drop the proc lock and tsleep for a clock
tick at the bottom of the infinite loop...
Drew From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 11:57:23 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4078B16A4CE for ; Fri, 17 Sep 2004 11:57:23 +0000 (GMT) Received: from tts.orel.ru (tts.orel.ru [213.59.64.67]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7B28A43D39 for ; Fri, 17 Sep 2004 11:57:22 +0000 (GMT) (envelope-from bel@orel.ru) Received: from orel.ru (lg.orel.ru [62.33.11.59]) by tts.orel.ru (8.12.10/8.12.10/bel) with ESMTP id i8HBvIeT030721 for ; Fri, 17 Sep 2004 15:57:19 +0400 Message-ID: <414AD11D.5000403@orel.ru> Date: Fri, 17 Sep 2004 15:57:17 +0400 From: Andrew Belashov Organization: ORIS User-Agent: Mozilla/5.0 (X11; U; FreeBSD sparc64; en-US; rv:1.6) Gecko/20040407 X-Accept-Language: ru, en-us, en MIME-Version: 1.0 To: freebsd-threads@freebsd.org X-Enigmail-Version: 0.83.5.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Zombi-Check: on netra2.orel.ru Subject: Need help for debugging libkse X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 11:57:23 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello, All! I'm debugging libkse library for FreeBSD/sparc64. 
Please, explain for me this ktrace dump: ======================================================== ~ 7877 ss CALL kse_create(0x26a000,0) ~ 7877 ss RET kse_create 0 ~ 7877 ss CALL write(0x1,0x7fdffffdc68,0x1) ~ 7877 ss RET write 2531328/0x26a000 ~ ^^^^^^^^^^^^^^^^ ~ 7877 ss CALL kse_switchin(0x270440,0x1) ~ 7877 ss RET kse_switchin JUSTRETURN ~ 7877 ss CALL write(0x1,0x7fdffffdc68,0x1) ~ 7877 ss RET write 1 ~ 7877 ss CALL write(0x1,0x7fdffffdc68,0x1) ~ 7877 ss RET write 1 ======================================================== This is normal? writing one byte, but write(2) return 0x26a000. - -- With best regards, Andrew Belashov. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFBStEawF8YpH80o/IRAgTeAKCFHQF72h//+cOtm63yBwpYJlHBNgCcD1tt 8lRNUo+NL3rroY7LS+xxGw4= =sa74 -----END PGP SIGNATURE----- From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 12:00:06 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9525C16A4CE; Fri, 17 Sep 2004 12:00:06 +0000 (GMT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 44F9543D31; Fri, 17 Sep 2004 12:00:06 +0000 (GMT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i8HC02Jt000988 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 17 Sep 2004 08:00:02 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i8HBxvQH073219; Fri, 17 Sep 2004 07:59:57 -0400 (EDT) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16714.53693.518134.409849@grasshopper.cs.duke.edu> Date: Fri, 17 Sep 2004 07:59:57 
-0400 (EDT) To: Julian Elischer In-Reply-To: <16714.52945.827195.748164@grasshopper.cs.duke.edu> References: <16703.11479.679335.588170@grasshopper.cs.duke.edu> <414942B3.1060703@elischer.org> <16713.38977.864343.415015@grasshopper.cs.duke.edu> <200409161316.43010.jhb@FreeBSD.org> <414A6ACD.2020600@elischer.org> <16714.52945.827195.748164@grasshopper.cs.duke.edu> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid cc: John Baldwin cc: freebsd-threads@freebsd.org Subject: Re: Unkillable KSE threaded proc X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 12:00:06 -0000 Andrew Gallatin writes: > > But that's the problem. As I said above, break in thread_suspend_one > never triggers, so this code is never called. It must be bailing > out of thread_suspend_one() before this happens. Oops. No coffee.. 
I meant "bailing out of thread_single()" Drew From owner-freebsd-threads@FreeBSD.ORG Fri Sep 17 12:32:06 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1774816A4CE for ; Fri, 17 Sep 2004 12:32:06 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id B7A4C43D1F for ; Fri, 17 Sep 2004 12:32:05 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i8HCW4ML017452; Fri, 17 Sep 2004 08:32:05 -0400 (EDT) Date: Fri, 17 Sep 2004 08:32:04 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Andrew Belashov In-Reply-To: <414AD11D.5000403@orel.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: freebsd-threads@freebsd.org Subject: Re: Need help for debugging libkse X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2004 12:32:06 -0000 On Fri, 17 Sep 2004, Andrew Belashov wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello, All! > > I'm debugging libkse library for FreeBSD/sparc64. > > Please, explain for me this ktrace dump: > > ======================================================== > ~ 7877 ss CALL kse_create(0x26a000,0) > ~ 7877 ss RET kse_create 0 > ~ 7877 ss CALL write(0x1,0x7fdffffdc68,0x1) > ~ 7877 ss RET write 2531328/0x26a000 > ~ ^^^^^^^^^^^^^^^^ That is an upcall. The kse_mailbox is being returned (same as in the first kse_create()). > ~ 7877 ss CALL kse_switchin(0x270440,0x1) This resumes the thread after it has become unblocked. 
> ~ 7877 ss RET kse_switchin JUSTRETURN > ~ 7877 ss CALL write(0x1,0x7fdffffdc68,0x1) > ~ 7877 ss RET write 1 > ~ 7877 ss CALL write(0x1,0x7fdffffdc68,0x1) > ~ 7877 ss RET write 1 > ======================================================== > > This is normal? > writing one byte, but write(2) return 0x26a000. It looks normal but it would be nice if "write 2531328/0x26a000" were labeled "kse_create 0x26a000" or something more appropriate. -- Dan Eischen