Date:      Fri, 21 Sep 2007 15:23:45 -0700
From:      "Jeff Soule" <jsoule@webcrossing.com>
To:        freebsd-bugs@freebsd.org
Subject:   tcp listen problem
Message-ID:  <6c845d510709211523o4694af26y15c83dfebb445acd@mail.gmail.com>

We are seeing an intermittent problem with FreeBSD 6.2 and our
custom web server application, where incoming connections are
sometimes not passed to our application to be accepted. It is as
if the listen queue is "clogged" somehow, and all incoming
connections on that socket are blocked from reaching our
application. The clogged state lasts anywhere from a few minutes to
over 30 minutes, then (if we wait it out) clears, and the server runs
as if nothing had gone wrong. When it clears, the pending connections
are accepted by our application but fail with errors because they
have already timed out on the client, while new connections are
accepted and work fine. Other applications, and other ip:port pairs
in our application, all continue to work fine while the listening
socket for a particular ip:port is clogged.

Our short-term fix for the problem is to watch for incoming
connections being accepted, and if none arrive within a 2 second
period, to connect to ourselves and check that this connection
completes. If it does not, we kill the instance and restart it.
Restarting the application fixes the problem immediately (except that
the connections in the queue at the time of the restart are lost and
get errors). The drawback is that this workaround reduces our uptime
from 100% to 99.5%, which is simply not an acceptable level of
service for our customers; we have to fix this properly.
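
A rough sketch of this kind of self-check, written in Python for
brevity. It is an illustration of the idea only, not the actual server
code: the class name AcceptWatchdog, its methods, and the default
timeouts are assumptions. Note that a plain connect() can succeed even
when the application never calls accept(), because the kernel completes
the handshake on its own, so the probe below only counts as successful
once the accept path has reported a new connection.

```python
import socket
import threading
import time


class AcceptWatchdog:
    """Hypothetical helper: tracks accepts and probes the listener when quiet."""

    def __init__(self, host, port, idle_limit=2.0, probe_timeout=2.0):
        self.host = host
        self.port = port
        self.idle_limit = idle_limit        # seconds of silence before probing
        self.probe_timeout = probe_timeout  # seconds to wait for an accept
        self._lock = threading.Lock()
        self._accepted = 0
        self._last_accept = time.monotonic()

    def note_accept(self):
        """Call from the accept path each time accept() returns a connection."""
        with self._lock:
            self._accepted += 1
            self._last_accept = time.monotonic()

    def _snapshot(self):
        with self._lock:
            return self._accepted, self._last_accept

    def is_idle(self):
        """True if no connection has been accepted for idle_limit seconds."""
        _, last = self._snapshot()
        return time.monotonic() - last >= self.idle_limit

    def probe(self):
        """Connect to ourselves; True only if the application accepted something.

        Any accept during the probe window counts, including one from a real
        client, since that also shows the listener is not stuck.
        """
        before, _ = self._snapshot()
        try:
            conn = socket.create_connection(
                (self.host, self.port), timeout=self.probe_timeout
            )
        except OSError:
            return False  # the handshake itself failed or timed out
        with conn:
            deadline = time.monotonic() + self.probe_timeout
            while time.monotonic() < deadline:
                now, _ = self._snapshot()
                if now > before:
                    return True
                time.sleep(0.01)
        return False
```

A supervising thread would call is_idle() periodically and, if it
returns True and probe() then returns False, trigger the restart.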

Internal details on what we are doing:
* using select for polled I/O, with all I/O requests issued from a
single thread
* using threads to handle incoming requests within a single process
(because it is a database application, and all threads need access to
the database cache)
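
For readers unfamiliar with this arrangement, here is a minimal sketch
of the general pattern (one select-based I/O thread that accepts
connections and hands them to worker threads in the same process). It
is written in Python for brevity and is only an assumption about the
overall shape; the names io_loop, worker, and handler are hypothetical
and do not come from the actual application.

```python
import queue
import select
import socket
import threading


def io_loop(listeners, work_queue, stop_event, poll_interval=0.1):
    """Single I/O thread: poll all listening sockets, hand off new connections."""
    for lsock in listeners:
        lsock.setblocking(False)  # accept() must never block the I/O thread
    while not stop_event.is_set():
        readable, _, _ = select.select(listeners, [], [], poll_interval)
        for lsock in readable:
            # One readiness event may cover several queued connections,
            # so keep accepting until the queue is empty.
            while True:
                try:
                    conn, addr = lsock.accept()
                except BlockingIOError:
                    break  # accept queue drained
                except ConnectionAbortedError:
                    continue  # peer gave up before we accepted; try the next
                work_queue.put((conn, addr))


def worker(work_queue, handler):
    """Worker thread: shares process memory (e.g. a cache) with its peers."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel: shut down
            break
        conn, addr = item
        conn.setblocking(True)
        with conn:
            handler(conn, addr)
```

In a layout like this, a listener that select never reports as
readable would show exactly the symptom described: connections pile up
in the kernel queue while every other socket keeps being serviced.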

We've checked a tcpdump of the incoming connections, and can't see
anything unusual about the ones that clog the listen queue; they look
fine to us. So it doesn't look like an attack per se.

Incidence seems to be random. We might go 4-5 days without any
incidents, then get 10 close together in one day, or get one every
now and then.

Any help would be much appreciated, and we would be happy to hire
someone on a consulting basis to help resolve this.


