Date:      Wed, 7 Oct 2015 11:39:42 +0200
From:      Palle Girgensohn <girgen@FreeBSD.org>
To:        freebsd-net@freebsd.org
Subject:   Process hung in STOPPED_SINGLE, wchan vodead, and cannot be killed or continued
Message-ID:  <60F10B6B-0B90-4728-B405-4B916CDF7FD6@FreeBSD.org>

Hi,

I see a process that is hung in a jail, and cannot be killed or continued:

# ps HO wchan,nwchan,ppid -p 92266
  PID WCHAN  NWCHAN           PPID TT  STAT    TIME COMMAND
92266 -      -                   1  -  TJ   0:00,73 /usr/local/bin/jsvc -home /usr/local/openjdk8 -server
92266 vodead fffff811a5e6b400    1  -  TJ   0:00,48 /usr/local/bin/jsvc -home /usr/local/openjdk8 -server

# top
...
  PID USERNAME     THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
92266 nobody         2  20    0  4470M   418M STOP    2   0:20   0.00% jsvc


# ps axu
USER     PID %CPU %MEM     VSZ    RSS TT  STAT STARTED    TIME COMMAND
nobody 92266  0,0  0,4 4577204 427756  -  TJ   11:02pm 0:20,08 /usr/local/bin/jsvc -home /usr/local/openjdk8 ...

# sockstat
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
nobody   jsvc       92266 15 stream (not connected)
nobody   jsvc       92266 16 tcp4   127.0.0.1:8078        *:*
?        ?          ?     ?  tcp4   127.0.0.1:8078        127.0.0.1:22789
...

# sockstat | grep '^?' |wc -l
     151

# netstat -an | less
netstat: kvm not available: /dev/mem: No such file or directory
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4     374      0 127.0.0.1.8078         127.0.0.1.32866        CLOSED
...


# procstat -t 92266
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
92266 105754 jsvc             -                 20  120 stop    -
92266 106982 jsvc             -                  2  120 stop    vodead
# procstat -k 92266
  PID    TID COMM             TDNAME           KSTACK
92266 105754 jsvc             -                mi_switch thread_suspend_switch thread_single exit1 sys_sys_exit amd64_syscall Xfast_syscall
92266 106982 jsvc             -                mi_switch sleepq_switch sleepq_wait _sleep vnode_create_vobject zfs_freebsd_open VOP_OPEN_APV vn_open_vnode vn_open_cred kern_openat amd64_syscall Xfast_syscall



8078 is the Java port that it used to listen on...


All of the '?' entries in sockstat look like this:
?        ?          ?     ?  tcp4   127.0.0.1:8078        127.0.0.1:53583


# gdb -p 92266  /usr/local/bin/jsvc
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Attaching to program: /usr/local/bin/jsvc, process 92266

[ just hangs ]...

^Z
[1]+  Stopped                 gdb -p 92266 /usr/local/bin/jsvc
[root@tranbar /]#
[root@tranbar /]#
[root@tranbar /]# kill %1
[root@tranbar /]#
[1]+  Terminated              gdb -p 92266 /usr/local/bin/jsvc
[root@tranbar /]#




The culprit to begin with could be this:

Oct  7 07:54:00 host kernel: sonewconn: pcb 0xfffff80b49171310: Listen queue overflow: 151 already in queue awaiting acceptance (6 occurrences)

This occurred all through the night, saturating a service, *very likely* the one now showing problems, but I was never there to check. The 151 lost network sockets (see sockstat above) connect the dots.
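
A minimal sketch of how those two numbers could be cross-checked. The helper names queue_depth and orphan_count are hypothetical, and /var/log/messages is only an assumption about where the kernel message was logged:

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any FreeBSD tool.

# Read kernel log lines on stdin and print the queue depth from the last
# "Listen queue overflow: N already in queue" message.
queue_depth() {
    sed -n 's/.*Listen queue overflow: \([0-9][0-9]*\) already in queue.*/\1/p' | tail -n 1
}

# Read sockstat output on stdin and count lines whose USER column is '?',
# i.e. sockets that no longer belong to an identifiable process.
orphan_count() {
    grep -c '^?'
}

# Possible usage (the log path is an assumption):
#   grep sonewconn /var/log/messages | queue_depth
#   sockstat | orphan_count
```

If both commands print the same number, that supports the idea that the orphaned sockets are the connections that were left sitting in the listen queue.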

It seems the service entered the STOP state when we tried to stop it. jsvc is similar to daemontools, and I remember seeing a reference to a parent process 92265, but I might be imagining it, since the ppid = 1.

Trying to shut down the jail we got hanging shutdown processes:

from host:/var/log/console.jailname:
...
Stopping tomcat.
Waiting for PIDS: 9226690 second watchdog timeout expired. Shutdown terminated.
Ons  7 Okt 2015 08:27:19 CEST
...


# freebsd-version -ku
10.2-RELEASE-p3
10.2-RELEASE-p3


So basically, is there a way to get rid of this process without rebooting?

Palle



