From owner-freebsd-hackers@FreeBSD.ORG Wed Apr 2 14:30:15 2014
Date: Wed, 02 Apr 2014 15:30:06 +0100
From: Karl Pielorz
To: freebsd-hackers@freebsd.org
Subject: Stuck CLOSED sockets / sshd / zombies...
Message-ID: <3FE645E9723756F22EF901AE@Mail-PC.tdx.co.uk>
List-Id: Technical Discussions relating to FreeBSD

Hi All,

This issue started in -xen (subject: *Stuck sshd in urdlck), moved to -stable (subject: sshd with zombie process on FreeBSD 10.0-STABLE), and -net (subject: Server sockets staying in CLOSED for extended), but seems to have died a death in all of them.

It's affecting a number of people - predominately with sshd.

Does anyone know how I can troubleshoot this further, what the cause / fix is, or if it's already actually fixed?
"
# ps ax | grep 4344
ps axl | grep 4344
   0 4344  895 0 20 0 84868 6944 urdlck Is - 0:00.01 sshd: unknown [priv] (sshd)
  22 4345 4344 0 20 0     0    0 -      Z  - 0:00.00
   0 4346 4344 0 21 0 84868 6952 sbwait I  - 0:00.00 sshd: unknown [pam] (sshd)

# ps axd
...
  895 - Is 0:00.05 |-- /usr/sbin/sshd
 3933 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
 3934 - Z  0:00.00 | | |--
 3935 - I  0:00.00 | | `-- sshd: unknown [pam] (sshd)
 4338 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
 4339 - Z  0:00.00 | | |--
 4340 - I  0:00.00 | | `-- sshd: unknown [pam] (sshd)
 4341 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
 4342 - Z  0:00.00 | | |--
 4343 - I  0:00.00 | | `-- sshd: unknown [pam] (sshd)
 4344 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
 4345 - Z  0:00.00 | | |--
 4346 - I  0:00.00 | | `-- sshd: unknown [pam] (sshd)
...

# netstat -a -n | grep CLOSED | wc -l
59

# netstat -a | grep 54544
tcp4 0 0 192.168.0.138.22 192.168.0.45.54544 CLOSED

# sockstat | grep 4343
root sshd 4343 3 tcp4 192.168.0.138:22 192.168.0.45:54544
root sshd 4343 6 stream (not connected)
root sshd 4343 8 stream -> ??

# uname -a
FreeBSD host 10.0-STABLE FreeBSD 10.0-STABLE #0 r261289M: Thu Jan 30 13:33:35 UTC 2014 x@domain.com:/usr/src/sys/amd64/compile/GENERIC amd64
"

For a box that's doing nothing (apart from people ssh'ing in occasionally), there's obviously something wrong.

What would be the next thing to try to figure out why this is happening? I'd dearly like to know what's causing it and whether there is a fix (or if it's already fixed in -STABLE, and at which revision).

Thanks,

-Karl
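
To watch whether the numbers above keep growing, something like the following could be used. This is only a rough sketch, not anything from the earlier threads: the function names `count_tcp_states` and `zombie_parents` are made up for illustration, and both assume the column layout seen in the quoted `netstat` and `ps axl` output (state as the sixth and last field of a tcp line; PPID in field 3 and STAT in field 10).

```sh
#!/bin/sh
# Hypothetical helpers; they only read text on stdin, so they can be fed
# live command output or a saved copy of it.

# Count TCP sockets per state from `netstat -an`-style input.
# Assumption: tcp lines have exactly 6 fields, the last being the state.
count_tcp_states() {
    awk '$1 ~ /^tcp/ && NF == 6 { n[$NF]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Count zombie children per parent PID from `ps axl`-style input.
# Assumption: field 3 is PPID and field 10 is STAT, as in the output above.
zombie_parents() {
    awk '$10 ~ /^Z/ { n[$3]++ }
         END { for (p in n) print p, n[p] }' | sort -n
}
```

On the affected host these would be run as `netstat -an | count_tcp_states` and `ps axl | zombie_parents`, ideally at intervals, to see whether the CLOSED count and the set of sshd parents holding zombies change over time.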