Date:      Wed, 02 Apr 2014 08:41:13 -0600
From:      Ian Lepore <ian@FreeBSD.org>
To:        Karl Pielorz <kpielorz_lst@tdx.co.uk>
Cc:        freebsd-hackers@FreeBSD.org
Subject:   Re: Stuck CLOSED sockets / sshd / zombies...
Message-ID:  <1396449673.81853.264.camel@revolution.hippie.lan>
In-Reply-To: <3FE645E9723756F22EF901AE@Mail-PC.tdx.co.uk>
References:  <3FE645E9723756F22EF901AE@Mail-PC.tdx.co.uk>

On Wed, 2014-04-02 at 15:30 +0100, Karl Pielorz wrote:
> Hi All,
> 
> This issue started in -xen (subject: *Stuck sshd in urdlck), moved to 
> -stable (subject: sshd with zombie process on FreeBSD 10.0-STABLE), and 
> -net (subject: Server sockets staying in CLOSED for extended), but seems to 
> have died a death in all of them.
> 
> It's affecting a number of people - predominately with sshd.
> 
> Does anyone know how I can troubleshoot this further, what the cause / fix 
> is, or if it's already actually fixed?
> 
> "
> # ps ax | grep 4344
> ps axl | grep 4344
>    0  4344   895   0  20  0 84868 6944 urdlck  Is - 0:00.01 sshd: unknown 
> [priv] (sshd)
>   22  4345  4344   0  20  0     0    0 -       Z  - 0:00.00 <defunct>
>    0  4346  4344   0  21  0 84868 6952 sbwait  I  - 0:00.00 sshd: unknown 
> [pam] (sshd)
> 
> #ps axd
> ...
>   895  -  Is        0:00.05 |-- /usr/sbin/sshd
>  3933  -  Is        0:00.01 | |-- sshd: unknown [priv] (sshd)
>  3934  -  Z         0:00.00 | | |-- <defunct>
>  3935  -  I         0:00.00 | | `-- sshd: unknown [pam] (sshd)
>  4338  -  Is        0:00.01 | |-- sshd: unknown [priv] (sshd)
>  4339  -  Z         0:00.00 | | |-- <defunct>
>  4340  -  I         0:00.00 | | `-- sshd: unknown [pam] (sshd)
>  4341  -  Is        0:00.01 | |-- sshd: unknown [priv] (sshd)
>  4342  -  Z         0:00.00 | | |-- <defunct>
>  4343  -  I         0:00.00 | | `-- sshd: unknown [pam] (sshd)
>  4344  -  Is        0:00.01 | |-- sshd: unknown [priv] (sshd)
>  4345  -  Z         0:00.00 | | |-- <defunct>
>  4346  -  I         0:00.00 | | `-- sshd: unknown [pam] (sshd)
> ...
> 
> #netstat -a -n | grep CLOSED | wc -l
> 59
> 
> #netstat -a | grep 54544
> tcp4       0      0 192.168.0.138.22      192.168.0.45.54544     CLOSED
> 
> #sockstat | grep 4343
> root     sshd       4343  3  tcp4   192.168.0.138:22    192.168.0.45:54544
> root     sshd       4343  6  stream (not connected)
> root     sshd       4343  8  stream -> ??
> 
> #uname -a
> FreeBSD host 10.0-STABLE FreeBSD 10.0-STABLE #0 r261289M: Thu Jan 30 
> 13:33:35 UTC 2014     x@domain.com:/usr/src/sys/amd64/compile/GENERIC  amd64
> "
> 
> For a box that's doing nothing (apart from people ssh'ing in occasionally) 
> -   there's obviously something wrong.
> 
> What would be next to try and figure out why this is happening? - as I'd 
> dearly like to know what's causing it / a fix (or if it's already fixed in 
> -STABLE, and at which revision)
> 
> Thanks,
> 
> -Karl

I don't know anything about the underlying cause of the stuck sockets or
zombies, but I suspect that what triggered the problem was the import of
a newer openssh in which the default for the UsePrivilegeSeparation
option changed to "sandbox" (or maybe that was just a new option in the
new version).  I think of this possibility because the extra child
forked off with that option exposed some kernel memory-management
problems on the arm platform a few months ago.

That may imply that adding "UsePrivilegeSeparation no" could be a
workaround for anyone having severe problems with this on a production
server, but it should in no way become mythology that doing this somehow
"fixes" a problem -- it would be purely a workaround, and we should keep
pursuing the actual problem.
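
To judge whether the workaround changes anything, it may help to track
the same two symptoms Karl reported (CLOSED sockets and zombie children)
before and after the config change.  What follows is only a rough
sketch: the helper names count_closed and count_zombies are made up for
illustration, and the field positions assume the plain "netstat -a -n"
and "ps ax" layouts shown in the quoted output (not "ps axl", which has
more columns).

```shell
#!/bin/sh
# Hypothetical helpers, not part of any FreeBSD tool.
# Both read command output on stdin, so they also work on saved logs.

# Count TCP sockets whose state (last field) is CLOSED.
# Expects `netstat -a -n` style lines.
count_closed() {
    awk '$1 ~ /^tcp/ && $NF == "CLOSED" { n++ } END { print n + 0 }'
}

# Count zombie processes: STAT (3rd field of plain `ps ax`) starts with Z.
count_zombies() {
    awk '$3 ~ /^Z/ { n++ } END { print n + 0 }'
}
```

On a live box this would be used as "netstat -a -n | count_closed" and
"ps ax | count_zombies".  Numbers that stop growing after the change
would only suggest the privilege-separation child is involved; they say
nothing about the root cause.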

-- Ian