Date: Mon, 3 Jan 2005 21:13:55 +0100
From: Jilles Tjoelker
To: freebsd-stable@freebsd.org
cc: peters@stack.nl
Subject: unkillable processes after debugging on 5.3R
Message-ID: <20050103201355.GA49512@stack.nl>
X-Operating-System: FreeBSD 5.3-RELEASE-p2 i386

I have two unkillable processes.

System:
FreeBSD turtle.stack.nl 5.3-RELEASE-p2 FreeBSD 5.3-RELEASE-p2 #5: Thu Dec 2 17:25:55 CET 2004 jilles@snail.stack.nl:/usr/obj/usr/src/sys/SNAIL i386
The system is SMP with two CPUs.

Quoting the user:

> I was debugging with the system gdb some c++-code (with a strange
> segmentation fault). I was logged in via ssh and the connection seemed to
> freeze (no response from keyboard input) so I disconnected (~.-sequence in
> ssh).
> Logging in again on the machine, I killed the debugger and shell (I don't
> remember in which order) and tried to kill the program skilllist (pid
> 20326). The skilllist program then appeared to be using 100% CPU time and
> did not respond to any of the signals I sent. About 24 hours later, I
> discovered that a zsh-process (pid 20328) was also running at lots of
> cpu-time. The program was initially not run in the background, the nicing
> and placing into the idle queue has been done later.

> My code is c++-code, using both fd 1 and 2 for output. It is not threaded.
> It does not use fork, exec etc. It's basically a simple prog, generating
> only output, not listening for input. The working directory is mounted over
> nfs (but my code does not open files).

After the nicing and placing into the idle queue the system is properly
responsive.

Output of some commands about the processes:

 UID   PID  PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT        TIME COMMAND
1711 20326     1 475 171 20 2280 1416 -      RN   ph- 2421:35.76 ./skilllist
1711 20328     1 106  -8 20 2696 2328 -      RNE  ph-  729:34.15 -zsh (zsh)

RTPRIO
idle:25
idle:76

db> trace 20328
sched_switch(c29cd320,0,1) at sched_switch+0x143
mi_switch(1,0,c29cd320,1,c29cd320) at mi_switch+0x1ba
sleepq_switch(c2302a80) at sleepq_switch+0x133
sleepq_wait(c2302a80,0,0,0,0) at sleepq_wait+0xb
msleep(c2302a80,c2302bd8,4c,c06d14b3,0) at msleep+0x322
pipeclose(c2302a80,c2302b14,c3eba484,e9e9eb94,c050736c) at pipeclose+0x88
pipe_close(c3eba484,c29cd320) at pipe_close+0x2a
fdrop_locked(c3eba484,c29cd320,c25b0c8c,e9e9ec04,c050616f) at fdrop_locked+0xa8
fdrop(c3eba484,c29cd320,0,2,c388a000) at fdrop+0x41
closef(c3eba484,c29cd320) at closef+0x23f
fdfree(c29cd320,c3c22d70) at fdfree+0x383
exit1(c29cd320,2,1,c29cd320,c388a000) at exit1+0x4d4
sigexit(c29cd320,2,0,c3c22c5c,c29cd320) at sigexit+0xd3
postsig(2) at postsig+0x13f
ast(e9e9ed48) at ast+0x4ba
doreti_ast() at doreti_ast+0x17
db> trace 20326
sched_switch(ffffffff,c22e3000,400,8067000,df42c340) at
sched_switch+0x143
db> c

jilles@turtle /home/jilles% fstat -vp20326
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
peters   skilllist  20326 root /             2 drwxr-xr-x    1024  r
peters   skilllist  20326   wd /toad.mnt/capitalism 892355 drwxr-xr-x     512  r
peters   skilllist  20326 text /toad.mnt/capitalism 892490 -rwxr-xr-x  235868  r
peters   skilllist  20326    0 -         -       bad    -
peters   skilllist  20326    1* pipe c2302b2c <-> c2302a80      0 rw
peters   skilllist  20326    2* pipe c2302b2c <-> c2302a80      0 rw
jilles@turtle /home/jilles% fstat -vp20328
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
peters   zsh        20328 root /             2 drwxr-xr-x    1024  r
peters   zsh        20328   wd /toad.mnt/capitalism 892355 drwxr-xr-x     512  r
peters   zsh        20328 text /usr     921879 -r-xr-xr-x    3156  r
can't read sock at 0x0
peters   zsh        20328   10* error
peters   zsh        20328   12 -         -       bad    -
can't read pipe at 0x0
peters   zsh        20328   13* error
jilles@turtle /home/jilles%

The address c2302a80 occurs in the 20328 backtrace as well.

The fstat output of 20328 is unreliable: a later query returned this:

jilles@turtle /home/jilles% fstat -vp20328
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
peters   zsh        20328 root /             2 drwxr-xr-x    1024  r
peters   zsh        20328   wd /toad.mnt/capitalism 892355 drwxr-xr-x     512  r
peters   zsh        20328 text /usr     921879 -r-xr-xr-x    3156  r
unknown file type 5 for file 10 of pid 20328
unknown file type 5 for file 12 of pid 20328
can't read pipe at 0x0
peters   zsh        20328   13* error
jilles@turtle /home/jilles%

-- 
Jilles Tjoelker