Date: Fri, 26 Jul 2019 09:11:38 -0400
From: Janos Dohanics <web@3dresearch.com>
To: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: Help:: Listen queue overflow killing servers
Message-ID: <20190726091138.ffb39f75029373f85ab0edb5@3dresearch.com>
In-Reply-To: <3a62375a-432c-3533-a7bc-e5573c26fa9c@ifdnrg.com>
References: <3a62375a-432c-3533-a7bc-e5573c26fa9c@ifdnrg.com>
On Fri, 26 Jul 2019 12:58:45 +0100
Paul Macdonald via freebsd-questions <freebsd-questions@freebsd.org> wrote:

> Hi,
>
> Over the past few months i've seen several boxes (4 or 5) become
> unresponsive as a result of a Listen queue overflow state.
>
> Processes stack up, none are killable, all these are within jails and
> neither the jail can be stopped nor the server rebooted (without a
> power cycle).
>
> All are on ZFS and are std apache/php/mysql servers with nothing too
> exotic.
>
> All on 12.0-RELEASE, i've only started seeing these issues recently,
> but it feels like more and more.
>
> /var/log/messages shows typically;
>
>     kernel: sonewconn: pcb 0xfffff813395e3d58: Listen queue
> overflow: 193 already in queue awaiting acceptance (83 occurrences)
>
> netstat -Lan  shows
>
> tcp4  193/0/128                         x.x.x.x.443
> tcp4  193/0/128                         x.x.x.x.80
>
> connections cannot be killed with tcpdrop ( except ssh which can!)
>
> All processes seem to be in Disk State ( many many apache processes
> but others getting stuck too)
>
> www      60089    0.0 0.1  196588   78328  -  DJ   21:07 1:19.54 /usr/local/sbin/httpd -DNOHTTPACCEPT
> ..<snip>
>
> www      93713    0.0 0.0  183576   33164  -  DJ   23:57 0:00.01 /usr/local/sbin/httpd -DNOHTTPACCEPT
>
> but no zombies..
>
> last pid: 24773;  load averages:  0.00,  0.00, 0.00    up 52+11:41:09  11:48:02
> 918 processes: 1 running, 917 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 107M Active, 3729M Inact, 93G Wired, 27G Free
> ARC: 79G Total, 54G MFU, 23G MRU, 243M Anon, 710M Header, 1615M Other
>      73G Compressed, 191G Uncompressed, 2.60:1 Ratio
> Swap: 4096M Total, 4096M Free
>
>
> I'd appreciate any advice as at present it looks like my only option
> is to hard power cycle these

I have also been trying to find a resolution to a similar problem
(FreeBSD 12.0-STABLE r345381, virtual instance, not a jail).

Apparently at random, TCP sockets on ports 110 and 143 get stuck in the
CLOSE_WAIT state (cyrus 3.0.10). My understanding is that in CLOSE_WAIT
the peer has already closed its end of the connection, and the socket
is waiting for the local application (here, the server) to close it.

When the listen queue overflows, I too am unable to restart cyrus, even
with kill -9; reboot(8) doesn't work, and new ssh connections are not
accepted. A hard reboot is the only "remedy".

I have increased the cyrus listen queue from the default 32 to 128, but
I think that's just putting a larger bucket under a leaking roof.

-- 
Janos Dohanics
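
For anyone comparing notes, the three numbers in the netstat -Lan output
quoted above are qlen/incqlen/maxqlen: unaccepted connections, unaccepted
incomplete connections, and the configured maximum. The following is only
a minimal sketch in Python for picking out the overflowing listeners from
saved netstat -Lan output. The function names are made up for
illustration, and the threshold of 3 * maxqlen / 2 is my assumption about
when sonewconn starts logging; it happens to fit the quoted figures
(193 in queue against a limit of 128), but treat it as a guess.

```python
import re

# Matches data lines such as "tcp4  193/0/128  x.x.x.x.443".
# Header lines from netstat -Lan do not match and are skipped.
LINE_RE = re.compile(
    r"^(?P<proto>\S+)\s+"
    r"(?P<qlen>\d+)/(?P<incqlen>\d+)/(?P<maxqlen>\d+)\s+"
    r"(?P<local>\S+)$"
)


def parse_listen_queues(text):
    """Return one dict per listening socket found in netstat -Lan output."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        rows.append({
            "proto": m.group("proto"),
            "qlen": int(m.group("qlen")),
            "incqlen": int(m.group("incqlen")),
            "maxqlen": int(m.group("maxqlen")),
            "local": m.group("local"),
        })
    return rows


def overflowing(rows):
    """Return the rows whose queue length exceeds 3 * maxqlen / 2.

    Assumption: this mirrors the point at which the kernel reports
    "Listen queue overflow"; adjust the threshold if that is wrong.
    """
    return [r for r in rows if r["qlen"] > 3 * r["maxqlen"] // 2]


if __name__ == "__main__":
    sample = "tcp4  193/0/128  x.x.x.x.443\ntcp4  0/0/128  *.22\n"
    for row in overflowing(parse_listen_queues(sample)):
        print(row["proto"], row["local"], row["qlen"], row["maxqlen"])
```

Run periodically against netstat -Lan output, something like this might
at least show which listener backs up first, before the box stops
accepting ssh.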