Date:      Fri, 21 Mar 2014 22:02:50 -0700
From:      Kevin Oberman <rkoberman@gmail.com>
To:        Marcelo Gondim <gondim@bsdinfo.com.br>
Cc:        FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: sshd with zombie process on FreeBSD 10.0-STABLE - workaround
Message-ID:  <CAN6yY1sf0z_jBJgBy2dZX0a3JJnyTnq76_DepXzG32GWgHHO6A@mail.gmail.com>
In-Reply-To: <532B7DEC.7010809@bsdinfo.com.br>
References:  <53016D97.5030909@bsdinfo.com.br> <CAN6yY1uucfkdXxkCF30w1Q9vffRpDLxM90Sz1XVbdn5W69vQMg@mail.gmail.com> <5329D81E.7040709@bsdinfo.com.br> <201403201058.38555.jhb@freebsd.org> <532B7DEC.7010809@bsdinfo.com.br>

On Thu, Mar 20, 2014 at 4:46 PM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:

> On 20/03/14 11:58, John Baldwin wrote:
>
>> On Wednesday, March 19, 2014 1:47:10 pm Marcelo Gondim wrote:
>>
>>> On 19/03/14 13:01, Kevin Oberman wrote:
>>>
>>>> On Wed, Mar 19, 2014 at 6:00 AM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Until a proper fix appears, I wrote the script below and put it in
>>>>> crontab to automatically kill the zombie sshd processes.
>>>>>
>>>>> the_walking_dead.sh:
>>>>>
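>>>>> #!/bin/sh
>>>>> # Kill stuck pre-auth sshd ("unknown") processes; init reaps the zombies.
>>>>> pids=`ps afx | grep 'sshd: unknown' | grep -v grep | awk '{print $1}'`
>>>>> [ -n "$pids" ] && kill -9 $pids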
>>>>>
>>>>>
>>>>> Put this in /etc/crontab:
>>>>>
>>>>> 00 1 * * *    root    the_walking_dead.sh
>>>>>
>>>>>
>>>> If 'kill -9' works, the process is not really a zombie. It simply still
>>>> has a socket open and is waiting for it to be closed before exiting.
>>>>
>>>> You might take a look at network sockets with sockstat(1) and see if
>>>> you can get any indication of why these sockets are not being closed.
>>>> It may be that the issue is not sshd but some other issue in the OS
>>>> leaving sockets open.
>>>>
>>> Hi Kevin,
>>>
>>> Here is my ps -afx output:
>>>
>>> [...]
>>> 42139  -  Is       0:00.01 sshd: unknown [priv] (sshd)
>>> 42140  -  Z        0:00.01 <defunct>
>>> 42141  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>>> 58445  -  Is       0:00.01 sshd: unknown [priv] (sshd)
>>> 58446  -  Z        0:00.02 <defunct>
>>> 58447  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>>> 65635  -  Is       0:00.01 sshd: vinicius [priv] (sshd)
>>> 65636  -  Z        0:00.01 <defunct>
>>> [...]
>>>
>>> # sockstat | grep 42140
>>> #
>>>
>>> # sockstat | grep 58446
>>> #
>>>
>>> # sockstat | grep 65636
>>> #
>>>
>>> None of the zombie processes has an associated socket.
>>>
>> Do a pstree.  I bet the zombies are children of the other processes that
>> are stuck on a socket as Kevin described.
>>
> # ps afx|grep sshd |grep unk
> 10948  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 10955  -  IW       0:00.00 sshd: unknown [pam] (sshd)       <====
> 11701  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 11704  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 25450  -  Is       0:00.01 sshd: unknown [priv] (sshd)
> 25452  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 41193  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 41196  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 42193  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 42195  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 80638  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 80640  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 81484  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 81486  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>
> With procstat I could see the socket as follows:
>
> # procstat -f 10955
>   PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
> 10955 sshd              text v r r-------  -       - - /usr/sbin/sshd
> 10955 sshd               cwd v d r-------  -       - - /
> 10955 sshd              root v d r-------  -       - - /
> 10955 sshd                 0 v c rw------  6       0 - /dev/null
> 10955 sshd                 1 v c rw------  6       0 - /dev/null
> 10955 sshd                 2 v c rw------  6       0 - /dev/null
> 10955 sshd                 3 s - rw---n--  2       0 TCP 186.xxx.xx.2:22 186.xxx.xx.8:57035
> 10955 sshd                 5 p - rw------  2       0 - -
> 10955 sshd                 6 s - rw------  2       0 UDS -
> 10955 sshd                 7 p - rw------  1       0 - -
> 10955 sshd                 8 s - rw------  2       0 UDS -
>
> I don't understand why these connections remain stuck on FreeBSD 10.0.
>
> I'll try this sysctl: net.inet.tcp.delayed_ack=0
>

If the problem is still showing up, can you see what is going on with the
socket? What state is the connection in? Try "netstat -f inet -p tcp" and
look at the state column for that connection. I'm wondering if there is
some sort of race going on where the socket hangs.
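A minimal sketch of that check, assuming the FreeBSD 10 "procstat -f" layout quoted above (protocol in field 9, local and remote address in fields 10 and 11) and netstat printing addresses as a.b.c.d.port. Only IPv4 is handled, and the helper names tcp_endpoints, to_netstat_addr and socket_state are hypothetical:

```shell
#!/bin/sh
# Sketch: show the TCP state of the socket held by a stuck sshd process.
# Assumes the procstat -f column layout seen on FreeBSD 10 (PRO in field 9,
# local/remote addresses in fields 10/11) and IPv4 only. Helper names are
# hypothetical.

# Read "procstat -f" output on stdin; print "local remote" for each TCP socket.
tcp_endpoints() {
    awk '$9 == "TCP" { print $10, $11 }'
}

# Convert procstat's "a.b.c.d:port" into netstat's "a.b.c.d.port" form.
to_netstat_addr() {
    echo "$1" | sed 's/:\([0-9]*\)$/.\1/'
}

# For each TCP socket held by PID $1, print the matching netstat line(s),
# whose last column is the connection state (ESTABLISHED, CLOSE_WAIT, ...).
socket_state() {
    procstat -f "$1" | tcp_endpoints | while read -r laddr raddr; do
        echo "$laddr <-> $raddr:"
        netstat -an -f inet -p tcp | grep -F -- "$(to_netstat_addr "$raddr")"
    done
}

# Usage: sh socket_state.sh 10955
if [ $# -gt 0 ]; then
    socket_state "$1"
fi
```

Run it against one of the "[pam]" PIDs (e.g. 10955 above). A connection stuck in CLOSE_WAIT, for instance, would point at the process never closing its end, while ESTABLISHED would suggest the peer simply went quiet.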

Ideally I'd try to capture the packets at the end of the session. Can
you do something to trigger this reliably? If so, a standard
"tcpdump -pw file.bpf host HOST" capture should do. I seem to recall that
these connections are scheduled. If so, you can put the packet capture in
a crontab to run at the same time. If you feed the capture to a tool like
Wireshark, you should get a good idea of what is happening, if not why. I
understand that the timing of this might be very tricky.
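One way to run such a capture from cron, as a sketch only: the script name, the /var/tmp output location, the capture window and the helper names capture_file and capture_window are assumptions for illustration, not anything from this thread. It starts tcpdump with the same -p/-w flags in the background, lets it run for a fixed number of seconds, then stops it with SIGTERM so the file is flushed:

```shell
#!/bin/sh
# Sketch: capture all traffic to/from one host for a fixed window, e.g. from
# cron a few minutes before the scheduled sessions start. Script name, output
# directory and helper names are hypothetical.
# Usage: capture_window.sh HOST SECONDS

# Timestamped output file for host $1.
capture_file() {
    echo "/var/tmp/sshd-$1-$(date +%Y%m%d%H%M%S).bpf"
}

# Run tcpdump against host $1 for $2 seconds, writing raw packets to $3.
# -p: do not put the interface into promiscuous mode; -w: write to a file.
capture_window() {
    tcpdump -pw "$3" host "$1" &
    pid=$!
    sleep "$2"
    # SIGTERM makes tcpdump flush and close the file; it may already have
    # exited (e.g. on a permission error), so ignore a failed kill.
    kill "$pid" 2>/dev/null || true
    wait "$pid"
}

if [ $# -eq 2 ]; then
    out=$(capture_file "$1")
    capture_window "$1" "$2" "$out"
    echo "capture written to $out"
fi
```

A matching /etc/crontab entry could look like "55 0 * * *  root  /usr/local/sbin/capture_window.sh HOST 900" (path, start time and the 15-minute window are placeholders). Use a full path, since cron's PATH will not include the script's directory. The resulting .bpf file opens directly in Wireshark.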
-- 
R. Kevin Oberman, Network Engineer, Retired
E-mail: rkoberman@gmail.com


