From owner-freebsd-hackers  Sat May 25 14:20:46 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id OAA14879
          for hackers-outgoing; Sat, 25 May 1996 14:20:46 -0700 (PDT)
Received: from kitten.mcs.com (Kitten.mcs.com [192.160.127.90])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id OAA14874
          for <hackers@freebsd.org>; Sat, 25 May 1996 14:20:43 -0700 (PDT)
Received: from venus.mcs.com (root@Venus.mcs.com [192.160.127.92]) by kitten.mcs.com (8.7.5/8.6.9) with SMTP id QAA07705 for <hackers@freebsd.org>; Sat, 25 May 1996 16:20:42 -0500 (CDT)
Received: by venus.mcs.com (/\==/\ Smail3.1.28.1 #28.5)
	id <m0uNQlN-000IDOC@venus.mcs.com>; Sat, 25 May 96 16:20 CDT
Message-Id: <m0uNQlN-000IDOC@venus.mcs.com>
Subject: Grrr.. is this a FreeBSD problem (TIME_WAIT again)
To: hackers@freebsd.org
Date: Sat, 25 May 1996 16:20:41 -0500 (CDT)
From: "Karl Denninger, MCSNet" <karl@mcs.com>
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi folks.

I have some custom code here which does TCP socket work, in many cases
between two processes on the same machine.

Both ends of the connection call shutdown(socket, 2) and close(socket)
before exiting, and both ensure that linger is turned off on their
respective ends of the link.
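
For reference, the teardown each end performs looks roughly like the sketch
below.  This is not the actual module code; the helper name
close_without_linger is made up for illustration, and error handling is
trimmed.  SHUT_RDWR is the POSIX name for the literal 2 passed to shutdown().

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the real daemon): turn linger off,
 * shut down both directions, then close the descriptor.
 * Returns the result of close(): 0 on success, -1 on failure.
 */
int close_without_linger(int fd)
{
    struct linger lg;

    assert(fd >= 0);          /* caller must pass an open descriptor */

    lg.l_onoff = 0;           /* linger disabled */
    lg.l_linger = 0;

    /* Return values ignored here for brevity; real code should check them. */
    (void)setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
    (void)shutdown(fd, SHUT_RDWR);   /* same as shutdown(fd, 2) */

    return close(fd);
}
```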

If the caller and callee are on DIFFERENT machines, I get no stale sockets.
This is reliable even if there are tens of new connections per minute.

If the caller and callee are on the SAME machine, I get sockets in TIME_WAIT
for 2 minutes each (grrrr) which, if the traffic is heavy enough, eventually
blocks new connections for a few minutes until they clear up.  None of the 
sockets in TIME_WAIT has output or input pending; both counts show zero.

This is a serious problem!

Interestingly enough, I can switch the end of the link which "netstat" thinks
is the "local" end by changing who calls shutdown() first!  This is also
unexpected; I would have thought that the caller ALWAYS would be the "local"
side of the connection.
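
For anyone who wants to try the close-order experiment on one box, here is a
minimal sketch.  It is an assumption-laden stand-in for the real code: the
function name loopback_close_order and its flag are invented for this
example, and both ends live in a single process rather than in two daemons.
Run it each way and look at "netstat -an" afterward to see which end holds
the TIME_WAIT entry.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Hypothetical repro: open one TCP connection over loopback, then let
 * either the callee (callee_first = 1) or the caller (callee_first = 0)
 * call shutdown() first.  Returns 0 on success, -1 on a socket error.
 */
int loopback_close_order(int callee_first)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int lfd, cfd, afd = -1, rc = -1;

    assert(callee_first == 0 || callee_first == 1);

    lfd = socket(AF_INET, SOCK_STREAM, 0);   /* listening socket */
    cfd = socket(AF_INET, SOCK_STREAM, 0);   /* caller's socket */
    if (lfd < 0 || cfd < 0)
        goto out;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0;                         /* let the kernel pick a port */

    if (bind(lfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&sa, &len) < 0 ||
        connect(cfd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        goto out;

    afd = accept(lfd, NULL, NULL);           /* callee's end */
    if (afd < 0)
        goto out;

    /* Shut down in the requested order; results ignored for brevity. */
    if (callee_first) {
        (void)shutdown(afd, SHUT_RDWR);
        (void)shutdown(cfd, SHUT_RDWR);
    } else {
        (void)shutdown(cfd, SHUT_RDWR);
        (void)shutdown(afd, SHUT_RDWR);
    }
    rc = 0;

out:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return rc;
}
```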

I've checked and rechecked -- the same code, running across two machines,
does not do this.  But when the calling and called code are on the same
system (2.1-STABLE) it does -- repeatedly and reliably.

Any ideas?  One solution would be to move the two ends onto separate
machines, but there are reasons I don't want to do that in normal
production.  And I need to use TCP (rather than local Unix domain sockets)
because the BACKUP server is on a different system (in the event the first
one crashes).

Why would this happen when the caller and callee are on the same box, but
not when the traffic actually goes across the network?  Has anyone else seen
anything like this in their experience?  Due to the structure of this module
(it's a drop-in into a stock daemon from another source) I cannot leave the
socket open across requests, and I'd like to understand the reason for
this behavior anyway.

--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
Modem: [+1 312 248-0900]     | T1 from $600 monthly; speeds to DS-3 available
Voice: [+1 312 803-MCS1]     | 21 Chicagoland POPs, ISDN, 28.8, much more
Fax: [+1 312 248-9865]       | Email to "info@mcs.net" WWW: http://www.mcs.net/
ISDN - Get it here TODAY!    | Home of Chicago's only FULL Clarinet feed!