Date: Wed, 02 Apr 2014 08:41:13 -0600
From: Ian Lepore <ian@FreeBSD.org>
To: Karl Pielorz <kpielorz_lst@tdx.co.uk>
Cc: freebsd-hackers@FreeBSD.org
Subject: Re: Stuck CLOSED sockets / sshd / zombies...
Message-ID: <1396449673.81853.264.camel@revolution.hippie.lan>
In-Reply-To: <3FE645E9723756F22EF901AE@Mail-PC.tdx.co.uk>
References: <3FE645E9723756F22EF901AE@Mail-PC.tdx.co.uk>

On Wed, 2014-04-02 at 15:30 +0100, Karl Pielorz wrote:
> Hi All,
>
> This issue started in -xen (subject: *Stuck sshd in urdlck), moved to
> -stable (subject: sshd with zombie process on FreeBSD 10.0-STABLE), and
> -net (subject: Server sockets staying in CLOSED for extended), but seems to
> have died a death in all of them.
>
> It's affecting a number of people - predominately with sshd.
>
> Does anyone know how I can troubleshoot this further, what the cause / fix
> is, or if it's already actually fixed?
>
> "
> # ps ax | grep 4344
> ps axl | grep 4344
> 0 4344 895 0 20 0 84868 6944 urdlck Is - 0:00.01 sshd: unknown
> [priv] (sshd)
> 22 4345 4344 0 20 0 0 0 - Z - 0:00.00 <defunct>
> 0 4346 4344 0 21 0 84868 6952 sbwait I - 0:00.00 sshd: unknown
> [pam] (sshd)
>
> #ps axd
> ...
> 895 - Is 0:00.05 |-- /usr/sbin/sshd
> 3933 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 3934 - Z 0:00.00 | | |-- <defunct>
> 3935 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> 4338 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 4339 - Z 0:00.00 | | |-- <defunct>
> 4340 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> 4341 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 4342 - Z 0:00.00 | | |-- <defunct>
> 4343 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> 4344 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 4345 - Z 0:00.00 | | |-- <defunct>
> 4346 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> ...
>
> #netstat -a -n | grep CLOSED | wc -l
> 59
>
> #netstat -a | grep 54544
> tcp4 0 0 192.168.0.138.22 192.168.0.45.54544 CLOSED
>
> #sockstat | grep 4343
> root sshd 4343 3 tcp4 192.168.0.138:22 192.168.0.45:54544
> root sshd 4343 6 stream (not connected)
> root sshd 4343 8 stream -> ??
>
> #uname -a
> FreeBSD host 10.0-STABLE FreeBSD 10.0-STABLE #0 r261289M: Thu Jan 30
> 13:33:35 UTC 2014 x@domain.com:/usr/src/sys/amd64/compile/GENERIC amd64
> "
>
> For a box that's doing nothing (apart from people ssh'ing in occasionally)
> - there's obviously something wrong.
>
> What would be next to try and figure out why this is happening? - as I'd
> dearly like to know what's causing it / a fix (or if it's already fixed in
> -STABLE, and at which revision)
>
> Thanks,
>
> -Karl

I don't know anything about the underlying cause of the stuck sockets or
zombies, but I suspect the thing that triggered the appearance of the
problem was the import of a newer openssh in which the
UsePrivilegeSeparation option default changed to "Sandbox" (or maybe
that was just a new option with the new version). I think of this
possibility because the extra child forked off with that option exposed
some kernel memory-management problems on the arm platform a few months
ago.

That may imply that adding "UsePrivilegeSeparation no" could be a
workaround for anyone having severe problems with this on a production
server, but it should in no way become mythology that doing this somehow
"fixes" a problem -- it would be purely a workaround, and we should keep
pursuing the actual problem.

-- Ian
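
For anyone who wants to try the workaround described above, a minimal
sketch of the configuration change follows. The file path is an
assumption (the stock FreeBSD location for the base-system sshd); the
thread itself does not name it. Turning privilege separation off removes
a layer of protection from sshd, so this is only a stopgap while the
real cause is tracked down.

```
# /etc/ssh/sshd_config  (assumed location, not stated in the thread)
#
# Temporary workaround only: this sidesteps the extra sandboxed child,
# it does not fix the stuck-socket / zombie problem itself.
# Values accepted by OpenSSH releases of this era: yes, no, sandbox.
UsePrivilegeSeparation no
```

sshd has to re-read its configuration before the change takes effect,
for example with "service sshd restart". Sessions that are already
stuck will probably not be cleaned up by this.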