Date:      Sat, 22 Mar 2014 03:06:10 -0300
From:      Marcelo Gondim <gondim@bsdinfo.com.br>
To:        FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: sshd with zombie process on FreeBSD 10.0-STABLE - workaround
Message-ID:  <532D2852.1010700@bsdinfo.com.br>
In-Reply-To: <CAN6yY1sf0z_jBJgBy2dZX0a3JJnyTnq76_DepXzG32GWgHHO6A@mail.gmail.com>
References:  <53016D97.5030909@bsdinfo.com.br> <CAN6yY1uucfkdXxkCF30w1Q9vffRpDLxM90Sz1XVbdn5W69vQMg@mail.gmail.com> <5329D81E.7040709@bsdinfo.com.br> <201403201058.38555.jhb@freebsd.org> <532B7DEC.7010809@bsdinfo.com.br> <CAN6yY1sf0z_jBJgBy2dZX0a3JJnyTnq76_DepXzG32GWgHHO6A@mail.gmail.com>

On 22/03/14 02:02, Kevin Oberman wrote:
> On Thu, Mar 20, 2014 at 4:46 PM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:
>
>> On 20/03/14 11:58, John Baldwin wrote:
>>
>>> On Wednesday, March 19, 2014 1:47:10 pm Marcelo Gondim wrote:
>>>
>>>> On 19/03/14 13:01, Kevin Oberman wrote:
>>>>> On Wed, Mar 19, 2014 at 6:00 AM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:
>>>>>> Hi all,
>>>>>> Until a proper fix appears, I wrote the script below and put it in
>>>>>> crontab to automatically kill zombie sshd processes.
>>>>>>
>>>>>> the_walking_dead.sh:
>>>>>>
>>>>>> #!/bin/sh
>>>>>> kill -9 `ps afx|grep sshd|grep unknown|awk '{print $1}'`
>>>>>>
>>>>>>
>>>>>> Put this in /etc/crontab:
>>>>>>
>>>>>> 00 1 * * *    root    the_walking_dead.sh
>>>>>>
>>>>>>
>>>>> If 'kill -9' works, the process is not really a zombie. It simply still
>>>>> has a socket open and is waiting for it to be closed before exiting.
>>>>> You might take a look at network sockets with sockstat(1) and see if you
>>>>> can get any indication of why these sockets are not being closed. It may
>>>>> be that the issue is not sshd but some other issue in the OS leaving
>>>>> sockets open.
>>>>>
>>>> Hi Kevin,
>>>>
>>>> My ps -afx output is below:
>>>>
>>>> [...]
>>>> 42139  -  Is       0:00.01 sshd: unknown [priv] (sshd)
>>>> 42140  -  Z        0:00.01 <defunct>
>>>> 42141  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>>>> 58445  -  Is       0:00.01 sshd: unknown [priv] (sshd)
>>>> 58446  -  Z        0:00.02 <defunct>
>>>> 58447  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>>>> 65635  -  Is       0:00.01 sshd: vinicius [priv] (sshd)
>>>> 65636  -  Z        0:00.01 <defunct>
>>>> [...]
>>>>
>>>> # sockstat | grep 42140
>>>> #
>>>>
>>>> # sockstat | grep 58446
>>>> #
>>>>
>>>> # sockstat | grep 65636
>>>> #
>>>>
>>>> No socket is associated with the zombie processes.
>>>>
>>> Do a pstree.  I bet the zombies are children of the other processes that
>>> are stuck on a socket as Kevin described.
>>>
>> # ps afx|grep sshd |grep unk
>> 10948  -  Is       0:00.02 sshd: unknown [priv] (sshd)
>> 10955  -  IW       0:00.00 sshd: unknown [pam] (sshd)       <====
>> 11701  -  Is       0:00.02 sshd: unknown [priv] (sshd)
>> 11704  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>> 25450  -  Is       0:00.01 sshd: unknown [priv] (sshd)
>> 25452  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>> 41193  -  Is       0:00.02 sshd: unknown [priv] (sshd)
>> 41196  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>> 42193  -  Is       0:00.02 sshd: unknown [priv] (sshd)
>> 42195  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>> 80638  -  Is       0:00.02 sshd: unknown [priv] (sshd)
>> 80640  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>> 81484  -  Is       0:00.02 sshd: unknown [priv] (sshd)
>> 81486  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>>
>> With procstat I could see the socket as follows:
>>
>> # procstat -f 10955
>>    PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
>> 10955 sshd              text v r r-------  -       - - /usr/sbin/sshd
>> 10955 sshd               cwd v d r-------  -       - - /
>> 10955 sshd              root v d r-------  -       - - /
>> 10955 sshd                 0 v c rw------  6       0 - /dev/null
>> 10955 sshd                 1 v c rw------  6       0 - /dev/null
>> 10955 sshd                 2 v c rw------  6       0 - /dev/null
>> 10955 sshd                 3 s - rw---n--  2       0 TCP 186.xxx.xx.2:22 186.xxx.xx.8:57035
>> 10955 sshd                 5 p - rw------  2       0 - -
>> 10955 sshd                 6 s - rw------  2       0 UDS -
>> 10955 sshd                 7 p - rw------  1       0 - -
>> 10955 sshd                 8 s - rw------  2       0 UDS -
>>
>> I do not understand why these connections remain locked in FreeBSD 10.0.
>>
>> I'll try this sysctl: net.inet.tcp.delayed_ack=0
>>
> If the problem is still showing up, can you see what is going on with the
> socket? What is the state of the connection? Try "netstat -f inet -p tcp"
> and see what state the connection is in. I'm wondering if there is some
> sort of race going on where the socket hangs.
>
> Ideally I'd like to try and capture the packets at the end of the session.
> Can you do something to trigger this reliably? If so, the "standard"
> "tcpdump -pw file.bpf host HOST" should do. I seem to recall that these
> connections are scheduled. If so, you can put the packet capture in a
> crontab to run at the same time. If you feed this to a tool like Wireshark,
> you should get a good idea of what is happening, if not why. I understand
> that the timing of this might be very tricky.
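
A time-bounded capture of the kind Kevin describes, suitable for running from cron, might be sketched as follows. The helper name capture_for, the 600-second duration, and the output path are assumptions made for illustration and do not come from the thread; the host and port in the example call are the ones visible in the procstat output in this message.

```shell
#!/bin/sh
# Sketch only: run a command (for example tcpdump) in the background for a
# fixed number of seconds, then stop it with SIGTERM so that it can flush
# its capture file. capture_for is a hypothetical helper name.
capture_for() {
    duration=$1
    shift
    "$@" &                              # start the capture in the background
    cap_pid=$!
    sleep "$duration"
    kill "$cap_pid" 2>/dev/null || true # ignore "already exited"
    wait "$cap_pid" 2>/dev/null || true # reap the child, ignore its status
}

# Example call for a cron job (path and duration are assumptions):
# capture_for 600 tcpdump -p -w /var/tmp/sshd.pcap host 186.193.48.8 and port 4321
```

Bounding the capture keeps the file small enough to open comfortably in Wireshark; the cron entry would be scheduled shortly before the connections are expected.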
Hi Kevin,

Thanks for your help.

I ran netstat, and the connection is in the CLOSED state, as you can
see below:

# procstat -f 26177
   PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
26177 sshd              text v r r-------  -       - - /usr/sbin/sshd
26177 sshd               cwd v d r-------  -       - - /
26177 sshd              root v d r-------  -       - - /
26177 sshd                 0 v c rw------  6       0 - /dev/null
26177 sshd                 1 v c rw------  6       0 - /dev/null
26177 sshd                 2 v c rw------  6       0 - /dev/null
26177 sshd                 3 s - rw---n--  2       0 TCP 186.193.48.10:4321 186.193.48.8:50094
26177 sshd                 4 s - rw------  1       0 UDS -
26177 sshd                 5 p - rw------  2       0 - -
26177 sshd                 6 s - rw------  2       0 UDS -

# procstat -f 10110
   PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
10110 sshd              text v r r-------  -       - - /usr/sbin/sshd
10110 sshd               cwd v d r-------  -       - - /
10110 sshd              root v d r-------  -       - - /
10110 sshd                 0 v c rw------  6       0 - /dev/null
10110 sshd                 1 v c rw------  6       0 - /dev/null
10110 sshd                 2 v c rw------  6       0 - /dev/null
10110 sshd                 3 s - rw---n--  2       0 TCP 186.193.48.10:4321 186.193.48.8:63048
10110 sshd                 4 s - rw------  1       0 UDS -
10110 sshd                 5 p - rw------  2       0 - -
10110 sshd                 6 s - rw------  2       0 UDS -
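
To pull only the TCP endpoints out of procstat output like the above, a small awk filter is one option. This is a minimal sketch: the function name tcp_fds is hypothetical, and it assumes the ten-column layout shown here (PID COMM FD T V FLAGS REF OFFSET PRO NAME) with each descriptor on a single line, as procstat prints it before any mail-client wrapping.

```shell
#!/bin/sh
# Sketch only: read `procstat -f PID` output on stdin and print
# "pid fd local remote" for every TCP socket descriptor.
# Field 9 is the PRO column; fields 10 and 11 are the two endpoints.
tcp_fds() {
    awk '$9 == "TCP" { print $1, $3, $10, $11 }'
}

# Example (run as root on the affected host):
# procstat -f 26177 | tcp_fds
```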

# netstat -f inet -p tcp
Active Internet connections
Proto Recv-Q Send-Q Local Address          Foreign Address (state)
tcp4       0      0 bart.24173             pppoe17250.8728 ESTABLISHED
tcp4       0      0 bart.53795             pppoe17249.8728 TIME_WAIT
tcp4       0      0 bart.54191             pppoe149.8728 TIME_WAIT
tcp4       0      0 bart.12476             pppoe148.8728 TIME_WAIT
tcp4       0      0 bart.36846             pppoe142.8728 TIME_WAIT
tcp4       0      0 bart.39944             186.193.48.22.8728 TIME_WAIT
tcp4       0      0 bart.60233             186.193.48.25.8728 TIME_WAIT
tcp4       0      0 bart.50946             186.193.48.9.8728 TIME_WAIT
tcp4       0      0 bart.13403             186.193.48.19.8728 TIME_WAIT
tcp4       0      0 bart.36982             zeus.linuxinfo.c.8728 TIME_WAIT
tcp4       0      0 bart.rwhois            pppoe769.49896 ESTABLISHED
tcp4       0      0 bart.mysql             mail.15711 ESTABLISHED
tcp4       0      0 bart.mysql             mail.16087 ESTABLISHED
tcp4       0      0 bart.mysql             mail.25051 ESTABLISHED
tcp4       0      0 bart.mysql             mail.59126 ESTABLISHED
tcp4       0      0 bart.mysql             mail.59051 ESTABLISHED
tcp4       0      0 bart.mysql             mail.29446 ESTABLISHED
tcp4       0      0 bart.mysql             mail.45453 ESTABLISHED
tcp4       0      0 bart.mysql             mail.14938 ESTABLISHED
tcp4       0      0 bart.mysql             mail.46230 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.16930 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.28074 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.53686 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.14448 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.52487 ESTABLISHED
tcp4       0      0 bart.rwhois            186.193.48.8.50094 CLOSED     <====
tcp4       0      0 bart.mysql             mail.38286 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.32387 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.52219 ESTABLISHED
tcp4       0      0 bart.mysql             mail.52144 ESTABLISHED
tcp4       0      0 bart.mysql             mail.18862 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.52636 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.51607 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.62581 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.23071 ESTABLISHED
tcp4       0      0 bart.mysql             mail.22862 FIN_WAIT_2
tcp4       0      0 bart.rwhois            186.193.48.8.63048 CLOSED     <====
tcp4       0      0 bart.mysql             mail.42479 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.18146 ESTABLISHED
tcp4       0      0 bart.mysql             mail.46731 FIN_WAIT_2
tcp4       0      0 bart.mysql             mail.20498 ESTABLISHED
tcp4       0      0 bart.62869             186.193.48.2.1190 ESTABLISHED
tcp4       0      0 bart.mysql             mail.55353 ESTABLISHED

Cheers,
Gondim


