From: Kevin Oberman
Date: Sat, 22 Mar 2014 00:18:20 -0700
To: Marcelo Gondim
Cc: FreeBSD Stable Mailing List
Subject: Re: sshd with zombie process on FreeBSD 10.0-STABLE - workaround
List-Id: Production branch of FreeBSD source code

On Fri, Mar 21, 2014 at 11:06 PM, Marcelo Gondim wrote:

> On 22/03/14 02:02, Kevin Oberman wrote:
>
>> On Thu, Mar 20, 2014 at 4:46 PM, Marcelo Gondim wrote:
>>
>>> On 20/03/14 11:58, John Baldwin wrote:
>>>
>>>> On Wednesday, March 19, 2014 1:47:10 pm Marcelo Gondim wrote:
>>>>
>>>>> On 19/03/14 13:01, Kevin Oberman wrote:
>>>>>
>>>>>> On Wed, Mar 19, 2014 at 6:00 AM, Marcelo Gondim wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Until a solution turns up, I wrote the script below and put it in
>>>>>>> crontab to automatically kill the zombie sshd processes.
>>>>>>>
>>>>>>> the_walking_dead.sh:
>>>>>>>
>>>>>>> #!/bin/sh
>>>>>>> kill -9 `ps afx|grep sshd|grep unknown|awk '{print $1}'`
>>>>>>>
>>>>>>> Put this in /etc/crontab:
>>>>>>>
>>>>>>> 00 1 * * * root the_walking_dead.sh
>>>>>>
>>>>>> If 'kill -9' works, the process is not really a zombie. It simply
>>>>>> still has a socket open and is waiting for it to be closed before
>>>>>> exiting.
>>>>>>
>>>>>> You might take a look at network sockets with sockstat(1) and see if
>>>>>> you can get any indication of why these sockets are not being closed.
>>>>>> It may be that the issue is not sshd but some other issue in the OS
>>>>>> leaving sockets open.
>>>>>
>>>>> Hi Kevin,
>>>>>
>>>>> My ps -afx output is below:
>>>>>
>>>>> [...]
>>>>> 42139 - Is 0:00.01 sshd: unknown [priv] (sshd)
>>>>> 42140 - Z 0:00.01
>>>>> 42141 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>>>> 58445 - Is 0:00.01 sshd: unknown [priv] (sshd)
>>>>> 58446 - Z 0:00.02
>>>>> 58447 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>>>> 65635 - Is 0:00.01 sshd: vinicius [priv] (sshd)
>>>>> 65636 - Z 0:00.01
>>>>> [...]
>>>>>
>>>>> # sockstat | grep 42140
>>>>> #
>>>>>
>>>>> # sockstat | grep 58446
>>>>> #
>>>>>
>>>>> # sockstat | grep 65636
>>>>> #
>>>>>
>>>>> No socket is associated with the zombie processes.
>>>>
>>>> Do a pstree. I bet the zombies are children of the other processes that
>>>> are stuck on a socket as Kevin described.
>>>
>>> # ps afx|grep sshd |grep unk
>>> 10948 - Is 0:00.02 sshd: unknown [priv] (sshd)
>>> 10955 - IW 0:00.00 sshd: unknown [pam] (sshd) <====
>>> 11701 - Is 0:00.02 sshd: unknown [priv] (sshd)
>>> 11704 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>> 25450 - Is 0:00.01 sshd: unknown [priv] (sshd)
>>> 25452 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>> 41193 - Is 0:00.02 sshd: unknown [priv] (sshd)
>>> 41196 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>> 42193 - Is 0:00.02 sshd: unknown [priv] (sshd)
>>> 42195 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>> 80638 - Is 0:00.02 sshd: unknown [priv] (sshd)
>>> 80640 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>> 81484 - Is 0:00.02 sshd: unknown [priv] (sshd)
>>> 81486 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>>
>>> With procstat I could see the socket as follows:
>>>
>>> # procstat -f 10955
>>> PID COMM FD T V FLAGS REF OFFSET PRO NAME
>>> 10955 sshd text v r r------- - - - /usr/sbin/sshd
>>> 10955 sshd cwd v d r------- - - - /
>>> 10955 sshd root v d r------- - - - /
>>> 10955 sshd 0 v c rw------ 6 0 - /dev/null
>>> 10955 sshd 1 v c rw------ 6 0 - /dev/null
>>> 10955 sshd 2 v c rw------ 6 0 - /dev/null
>>> 10955 sshd 3 s - rw---n-- 2 0 TCP 186.xxx.xx.2:22 186.xxx.xx.8:57035
>>> 10955 sshd 5 p - rw------ 2 0 - -
>>> 10955 sshd 6 s - rw------ 2 0 UDS -
>>> 10955 sshd 7 p - rw------ 1 0 - -
>>> 10955 sshd 8 s - rw------ 2 0 UDS -
>>>
>>> I do not understand why these connections remain stuck in FreeBSD 10.0.
>>>
>>> I'll try this sysctl: net.inet.tcp.delayed_ack=0
>>
>> If the problem is still showing up, can you see what is going on with the
>> socket? What is the state of the connection? Try "netstat -f inet -p tcp"
>> and see what state the connection is in. I'm wondering if there is some
>> sort of race going on where the socket hangs.
>>
>> Ideally I'd try to capture the packets at the end of the session. Can you
>> do something to trigger this reliably? If so, a standard "tcpdump -pw
>> file.bpf host HOST" capture would do. I seem to recall that these
>> connections are scheduled. If so, you can put the packet capture in a
>> crontab to run at the same time. If you feed this to a tool like
>> wireshark, you should get a good idea of what is happening, if not why.
>> I understand that the timing of this might be very tricky.
>
> Hi Kevin,
>
> Thanks for your help.
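The lookup Kevin asks for above, finding which netstat line belongs to the socket that procstat reports for a stuck sshd, can be scripted. This is a minimal POSIX sh sketch, not something posted in the thread; the function name tcp_state_of is hypothetical. It assumes IPv4 sockets and numeric netstat output (-n), in which the foreign address is the fifth column and the state the sixth, and it rewrites procstat's addr:port form into netstat's addr.port form before comparing.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the TCP state that
# netstat reports for one peer address. IPv4 only.
#   $1     peer as "procstat -f" prints it, e.g. 186.193.48.8:50094
#   stdin  output of: netstat -n -f inet -p tcp
tcp_state_of() {
    # netstat separates address and port with a dot, procstat with a colon,
    # so turn the final ":port" into ".port".
    _peer=$(printf '%s\n' "$1" | sed 's/:\([0-9][0-9]*\)$/.\1/')
    # Column 5 is the foreign address, column 6 the state. Header lines
    # never contain a dotted address in column 5, so they never match.
    awk -v peer="$_peer" '$5 == peer { print $6 }'
}
```

Usage would be along the lines of `netstat -n -f inet -p tcp | tcp_state_of 186.193.48.8:50094`. The -n flag matters here: without it, netstat substitutes host names and /etc/services port names, which is likely why sshd sockets bound to port 4321 in the output quoted in this thread show up as bart.rwhois (4321/tcp is the registered rwhois port).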
>
> I ran netstat, and the connection state is CLOSED, as you can see below:
>
> # procstat -f 26177
> PID COMM FD T V FLAGS REF OFFSET PRO NAME
> 26177 sshd text v r r------- - - - /usr/sbin/sshd
> 26177 sshd cwd v d r------- - - - /
> 26177 sshd root v d r------- - - - /
> 26177 sshd 0 v c rw------ 6 0 - /dev/null
> 26177 sshd 1 v c rw------ 6 0 - /dev/null
> 26177 sshd 2 v c rw------ 6 0 - /dev/null
> 26177 sshd 3 s - rw---n-- 2 0 TCP 186.193.48.10:4321 186.193.48.8:50094
> 26177 sshd 4 s - rw------ 1 0 UDS -
> 26177 sshd 5 p - rw------ 2 0 - -
> 26177 sshd 6 s - rw------ 2 0 UDS -
>
> # procstat -f 10110
> PID COMM FD T V FLAGS REF OFFSET PRO NAME
> 10110 sshd text v r r------- - - - /usr/sbin/sshd
> 10110 sshd cwd v d r------- - - - /
> 10110 sshd root v d r------- - - - /
> 10110 sshd 0 v c rw------ 6 0 - /dev/null
> 10110 sshd 1 v c rw------ 6 0 - /dev/null
> 10110 sshd 2 v c rw------ 6 0 - /dev/null
> 10110 sshd 3 s - rw---n-- 2 0 TCP 186.193.48.10:4321 186.193.48.8:63048
> 10110 sshd 4 s - rw------ 1 0 UDS -
> 10110 sshd 5 p - rw------ 2 0 - -
> 10110 sshd 6 s - rw------ 2 0 UDS -
>
> # netstat -f inet -p tcp
> Active Internet connections
> Proto Recv-Q Send-Q Local Address Foreign Address (state)
> tcp4 0 0 bart.24173 pppoe17250.8728 ESTABLISHED
> tcp4 0 0 bart.53795 pppoe17249.8728 TIME_WAIT
> tcp4 0 0 bart.54191 pppoe149.8728 TIME_WAIT
> tcp4 0 0 bart.12476 pppoe148.8728 TIME_WAIT
> tcp4 0 0 bart.36846 pppoe142.8728 TIME_WAIT
> tcp4 0 0 bart.39944 186.193.48.22.8728 TIME_WAIT
> tcp4 0 0 bart.60233 186.193.48.25.8728 TIME_WAIT
> tcp4 0 0 bart.50946 186.193.48.9.8728 TIME_WAIT
> tcp4 0 0 bart.13403 186.193.48.19.8728 TIME_WAIT
> tcp4 0 0 bart.36982 zeus.linuxinfo.c.8728 TIME_WAIT
> tcp4 0 0 bart.rwhois pppoe769.49896 ESTABLISHED
> tcp4 0 0 bart.mysql mail.15711 ESTABLISHED
> tcp4 0 0 bart.mysql mail.16087 ESTABLISHED
> tcp4 0 0 bart.mysql mail.25051 ESTABLISHED
> tcp4 0 0 bart.mysql mail.59126 ESTABLISHED
> tcp4 0 0 bart.mysql mail.59051 ESTABLISHED
> tcp4 0 0 bart.mysql mail.29446 ESTABLISHED
> tcp4 0 0 bart.mysql mail.45453 ESTABLISHED
> tcp4 0 0 bart.mysql mail.14938 ESTABLISHED
> tcp4 0 0 bart.mysql mail.46230 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.16930 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.28074 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.53686 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.14448 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.52487 ESTABLISHED
> tcp4 0 0 bart.rwhois 186.193.48.8.50094 CLOSED <====
> tcp4 0 0 bart.mysql mail.38286 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.32387 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.52219 ESTABLISHED
> tcp4 0 0 bart.mysql mail.52144 ESTABLISHED
> tcp4 0 0 bart.mysql mail.18862 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.52636 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.51607 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.62581 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.23071 ESTABLISHED
> tcp4 0 0 bart.mysql mail.22862 FIN_WAIT_2
> tcp4 0 0 bart.rwhois 186.193.48.8.63048 CLOSED <====
> tcp4 0 0 bart.mysql mail.42479 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.18146 ESTABLISHED
> tcp4 0 0 bart.mysql mail.46731 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.20498 ESTABLISHED
> tcp4 0 0 bart.62869 186.193.48.2.1190 ESTABLISHED
> tcp4 0 0 bart.mysql mail.55353 ESTABLISHED

I'm sorry. I am now even more confused. Maybe I need to re-read the entire
thread. I thought that the hung processes were sshd. These are rwhois. Or
is there an ssh tunnel carrying the rwhois connections? (I see no sshd
connections in this list.)

--
R. Kevin Oberman, Network Engineer, Retired
E-mail: rkoberman@gmail.com
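The cron job quoted at the top of the thread kills every "sshd: unknown" process, including a login that happens to be mid-authentication when cron fires, and when nothing matches it runs kill -9 with no arguments, which only produces a usage error. A slightly more defensive variant might look like the sketch below. It is an assumption-based illustration, not a fix for the underlying socket problem and not something proposed in the thread: the function names and the one-hour default threshold are hypothetical, and it assumes `ps -ax -o pid,etime,command` prints the elapsed time as [[dd-]hh:]mm:ss, as FreeBSD's ps does. Killing a stuck [priv] parent also lets init reap its zombie child (state Z), which kill cannot remove directly.

```shell
#!/bin/sh
# Hypothetical variant of the_walking_dead.sh (not from the thread).

# Read "ps -ax -o pid,etime,command" output on stdin and print the PIDs of
# pre-authentication sshd processes ("sshd: unknown ...") whose elapsed
# time is at least $1 seconds. Header lines are skipped (PID not numeric).
stale_sshd_pids() {
    awk -v min="$1" '
        $1 ~ /^[0-9]+$/ && $3 == "sshd:" && $4 == "unknown" {
            # etime is [[dd-]hh:]mm:ss; split on "-" and ":".
            n = split($2, t, "[-:]")
            if (n == 4)      secs = t[1] * 86400 + t[2] * 3600 + t[3] * 60 + t[4]
            else if (n == 3) secs = t[1] * 3600 + t[2] * 60 + t[3]
            else             secs = t[1] * 60 + t[2]
            if (secs >= min) print $1
        }'
}

# Kill stale pre-auth sshd processes; default threshold is one hour.
kill_stale_sshd() {
    pids=$(ps -ax -o pid,etime,command | stale_sshd_pids "${1:-3600}")
    # Skip kill entirely when nothing matched.
    if [ -n "$pids" ]; then
        # $pids is left unquoted on purpose so each PID is a separate argument.
        kill -9 $pids
    fi
}
```

To run it from /etc/crontab, the functions would go in a script ending with a call such as `kill_stale_sshd 3600`, referenced by an absolute path; FreeBSD's /etc/crontab sets a restricted PATH, so a bare the_walking_dead.sh entry may not be found.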