Date:      Sat, 25 May 1996 15:37:47 -0700
From:      David Greenman <davidg@Root.COM>
To:        "Karl Denninger, MCSNet" <karl@mcs.com>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: Grrr.. is this is a FreeBSD problem (TIME_WAIT again) 
Message-ID:  <199605252237.PAA23150@Root.COM>
In-Reply-To: Your message of "Sat, 25 May 1996 16:20:41 CDT." <m0uNQlN-000IDOC@venus.mcs.com> 

>If the caller and callee are on DIFFERENT machines, I get no stale sockets.
>This is reliable even if there are tens of new connections per minute.
>
>If the caller and callee are on the SAME machine, I get sockets in TIME_WAIT
>for 2 minutes each (grrrr) which, if the traffic is heavy enough, eventually
>blocks new connections for a few minutes until they clear up.  None of the 
>sockets in TIME_WAIT has output or input pending; both counts show zero.
>
>This is a serious problem!
>
>Interestingly enough, I can switch the end of the link which "netstat" thinks
>is the "local" end by changing who calls shutdown() first!  This is also
>unexpected; I would have thought that the caller ALWAYS would be the "local"
>side of the connection.
>
>I've checked and rechecked -- the same code, running across two machines,
>does not do this.  But when the calling and called code are on the same
>system (2.1-STABLE) it does -- repeatedly and reliably.
>
>Any ideas?  While one solution would be to get the code off the same
>(common) machine, there are reasons that I don't want to do this in normal
>production.  But, I need to use TCP (rather than local Unix domain sockets)
>because the BACKUP server is on a different system (in the event the first
>one crashes).
>
>Why would this happen when the caller and callee are on the same box, but
>not when the traffic actually goes across the network?  Has anyone else seen
>anything like this in their experience?  Due to the structure of this module
>(it's a drop-in into a stock daemon from another source) I cannot leave the 
>socket open across requests, and I'd like to understand the reason for
>this behavior anyway.

   Based on what you've said thus far, it's working as it is supposed to.
There is a good discussion of the 2MSL wait ("TIME_WAIT") in "TCP/IP
Illustrated Volume 1", page 242, by W. Richard Stevens. Depending on how
your program handles its ports/connections, you might be able to use the
SO_REUSEADDR socket option to avoid the problem. See page 244.
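
   For illustration only, here is a minimal sketch of setting SO_REUSEADDR
before bind(). The helper name make_reusable_listener and the backlog of 5
are assumptions made for this example, not anything taken from the daemon
under discussion. The option does not shorten the 2MSL wait; it only lets
bind() succeed on a local port that still has connections in TIME_WAIT, so
whether it helps depends on where the program is actually failing.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket, enable SO_REUSEADDR, then
 * bind it to `port` on all local addresses and start listening.
 * Returns the listening descriptor, or -1 on any failure.
 */
int make_reusable_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Must be set before bind(); setting it afterwards has no effect on that bind. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);   /* port 0 lets the kernel choose one */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {       /* backlog of 5 is an arbitrary choice */
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would call make_reusable_listener() once at startup and then
accept() on the returned descriptor as usual.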

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project
