Date:      Fri, 26 Jul 2019 09:11:38 -0400
From:      Janos Dohanics <web@3dresearch.com>
To:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: Help:: Listen queue overflow killing servers
Message-ID:  <20190726091138.ffb39f75029373f85ab0edb5@3dresearch.com>
In-Reply-To: <3a62375a-432c-3533-a7bc-e5573c26fa9c@ifdnrg.com>
References:  <3a62375a-432c-3533-a7bc-e5573c26fa9c@ifdnrg.com>

On Fri, 26 Jul 2019 12:58:45 +0100
Paul Macdonald via freebsd-questions <freebsd-questions@freebsd.org>
wrote:

>
> Hi,
>
> Over the past few months I've seen several boxes (4 or 5) become
> unresponsive as a result of a listen queue overflow state.
>
> Processes stack up, none are killable, all these are within jails and
> neither the jail can be stopped nor the server rebooted (without a
> power cycle).
>
> All are on ZFS and are std apache/php/mysql servers with nothing too
> exotic.
>
> All on 12.0-RELEASE, I've only started seeing these issues recently,
> but it feels like more and more.
>
> /var/log/messages typically shows:
>
>     kernel: sonewconn: pcb 0xfffff813395e3d58: Listen queue overflow: 193 already in queue awaiting acceptance (83 occurrences)
>
> netstat -Lan shows
>
> tcp4  193/0/128        x.x.x.x.443
> tcp4  193/0/128        x.x.x.x.80
>
> Connections cannot be killed with tcpdrop (except ssh, which can!)
>
> All processes seem to be in Disk State (many, many apache processes,
> but others getting stuck too)
>
> www      60089    0.0 0.1  196588   78328  -  DJ   21:07  1:19.54 /usr/local/sbin/httpd -DNOHTTPACCEPT
> ..<snip>
>
> www      93713    0.0 0.0  183576   33164  -  DJ   23:57  0:00.01 /usr/local/sbin/httpd -DNOHTTPACCEPT
>
> but no zombies..
>
> last pid: 24773;  load averages:  0.00,  0.00, 0.00    up 52+11:41:09  11:48:02
> 918 processes: 1 running, 917 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 107M Active, 3729M Inact, 93G Wired, 27G Free
> ARC: 79G Total, 54G MFU, 23G MRU, 243M Anon, 710M Header, 1615M Other
>      73G Compressed, 191G Uncompressed, 2.60:1 Ratio
> Swap: 4096M Total, 4096M Free
>
>
> I'd appreciate any advice as at present it looks like my only option
> is to hard power cycle these
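
For reference, the three numbers in the quoted netstat -Lan output are
qlen/incqlen/maxqlen (connections waiting to be accepted, incomplete
connections, and the configured queue limit). A minimal Python sketch,
assuming lines shaped like the ones quoted above, that picks out
listeners whose queue has grown past its limit; the function names are
made up for illustration:

```python
import re

# Matches lines such as "tcp4  193/0/128   x.x.x.x.443" from `netstat -Lan`.
LINE_RE = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)$")


def parse_listen_queues(text):
    """Return one dict per listening-socket line found in `text`."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # header lines and anything unrecognized are skipped
        proto, qlen, incqlen, maxqlen, local = m.groups()
        rows.append({
            "proto": proto,
            "qlen": int(qlen),        # completed connections awaiting accept()
            "incqlen": int(incqlen),  # incomplete connections
            "maxqlen": int(maxqlen),  # configured listen queue limit
            "local": local,
        })
    return rows


def overfull(rows):
    """Listeners whose accept queue is already longer than its limit."""
    return [r for r in rows if r["qlen"] > r["maxqlen"]]


if __name__ == "__main__":
    sample = "tcp4  193/0/128        x.x.x.x.443\ntcp4  193/0/128        x.x.x.x.80\n"
    for row in overfull(parse_listen_queues(sample)):
        print(row["local"], row["qlen"], "queued, limit", row["maxqlen"])
```

Feeding it the real output (for example, saved to a file from cron)
would show when a queue starts to grow, before the box stops responding.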

I have also been trying to find a resolution to a similar problem
(FreeBSD 12.0-STABLE r345381, virtual instance, not a jail).

Apparently at random, TCP sockets on ports 110 and 143 are stuck in
CLOSE_WAIT state (cyrus 3.0.10). My understanding is that in CLOSE_WAIT
state the socket is waiting for the server application to close the
socket.
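
To illustrate that understanding, here is a small Python sketch on the
loopback interface (not cyrus itself; the function name is made up).
The client closes first, the server side sees end-of-file, and the
server's socket then sits in CLOSE_WAIT until the application calls
close():

```python
import socket


def peer_closed_first():
    """Let the client close first and return what the server then reads."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    srv.listen(1)

    client = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()
    conn.settimeout(5)          # do not hang forever if something goes wrong

    client.close()              # the client sends its FIN
    data = conn.recv(1)         # b"" means the peer has closed its side

    # From here until conn.close(), the server-side socket is in CLOSE_WAIT:
    # the kernel has acknowledged the FIN and is waiting for the application.
    conn.close()                # the application's close() ends CLOSE_WAIT
    srv.close()
    return data


if __name__ == "__main__":
    print(repr(peer_closed_first()))
```

If a server process is stuck and never reaches its close(), sockets
accumulate in that state, which matches what is described here.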

When the listen queue overflows, I too am unable to restart cyrus, even
with kill -9; reboot(8) doesn't work, and new ssh connections are not
accepted. A hard reboot is the only "remedy".

I have increased the cyrus listen queue from the default 32 to 128, but
I think that's just putting a larger bucket under a leaking roof.
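
A rough sketch of why a bigger queue only delays the symptom. It assumes
the kernel reports an overflow once the queue length exceeds 1.5 times
the listen backlog; that factor is an assumption based on reading
sonewconn() and is not confirmed in this thread, though it is consistent
with "193 already in queue" against a limit of 128. The helper names are
hypothetical:

```python
def overflow_threshold(backlog):
    """Longest queue tolerated before an overflow is reported (assumed 1.5x)."""
    return 3 * backlog // 2


def would_overflow(qlen, backlog):
    """True if a queue of `qlen` entries exceeds the assumed threshold."""
    return qlen > overflow_threshold(backlog)


if __name__ == "__main__":
    for backlog in (32, 128):
        print("backlog", backlog, "-> overflow above", overflow_threshold(backlog))
```

If nothing is calling accept(), the queue reaches either threshold
eventually; the larger backlog only buys time.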

-- 
Janos Dohanics


