From owner-freebsd-hackers Sat May 25 15:37:47 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id PAA18167 for hackers-outgoing; Sat, 25 May 1996 15:37:47 -0700 (PDT)
Received: from Root.COM (implode.Root.COM [198.145.90.17]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id PAA18162 for ; Sat, 25 May 1996 15:37:43 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by Root.COM (8.7.5/8.6.5) with SMTP id PAA23150; Sat, 25 May 1996 15:37:47 -0700 (PDT)
Message-Id: <199605252237.PAA23150@Root.COM>
X-Authentication-Warning: implode.Root.COM: Host localhost [127.0.0.1] didn't use HELO protocol
To: "Karl Denninger, MCSNet"
cc: hackers@FreeBSD.ORG
Subject: Re: Grrr.. is this is a FreeBSD problem (TIME_WAIT again)
In-reply-to: Your message of "Sat, 25 May 1996 16:20:41 CDT."
From: David Greenman
Reply-To: davidg@Root.COM
Date: Sat, 25 May 1996 15:37:47 -0700
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>If the caller and callee are on DIFFERENT machines, I get no stale sockets.
>This is reliable even if there are tens of new connections per minute.
>
>If the caller and callee are on the SAME machine, I get sockets in TIME_WAIT
>for 2 minutes each (grrrr) which, if the traffic is heavy enough, eventually
>blocks new connections for a few minutes until they clear up.  None of the
>sockets in TIME_WAIT has output or input pending; both counts show zero.
>
>This is a serious problem!
>
>Interestingly enough, I can switch the end of the link which "netstat" thinks
>is the "local" end by changing who calls shutdown() first!  This is also
>unexpected; I would have thought that the caller ALWAYS would be the "local"
>side of the connection.
>
>I've checked and rechecked -- the same code, running across two machines,
>does not do this.  But when the calling and called code are on the same
>system (2.1-STABLE) it does -- repeatedly and reliably.
>
>Any ideas?
>While one solution would be to get the code off the same
>(common) machine, there are reasons that I don't want to do this in normal
>production.  But, I need to use TCP (rather than local Unix domain sockets)
>because the BACKUP server is on a different system (in the event the first
>one crashes).
>
>Why would this happen when the caller and callee are on the same box, but
>not when the traffic actually goes across the network?  Has anyone else seen
>anything like this in their experience?  Due to the structure of this module
>(it's a drop-in into a stock daemon from another source) I cannot leave the
>socket open across requests, and I'd like to understand the reason for
>this behavior anyway.

   Based on what you've said thus far, it's working as it is supposed to.
There is a good discussion of the 2MSL wait ("TIME_WAIT") in "TCP/IP
Illustrated Volume 1", page 242, by W. Richard Stevens. Depending on how your
program handles its ports/connections, you might be able to use the
SO_REUSEADDR socket option to avoid the problem. See page 244.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project
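
For reference, a minimal sketch in C of the SO_REUSEADDR suggestion above.
The helper name listen_reuseaddr and the loopback bind address are
illustrative assumptions, not taken from the program under discussion. The
option does not shorten the 2MSL wait; it only lets bind() succeed on a local
port that still has connections in TIME_WAIT, so whether it helps depends on
how the program assigns its ports.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): create a TCP socket,
 * set SO_REUSEADDR, bind it to 127.0.0.1:port and start listening.
 * Returns the socket descriptor, or -1 on any failure.
 */
int listen_reuseaddr(unsigned short port)
{
    struct sockaddr_in sa;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Must be set before bind(); it has no effect on an already-bound socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* assumption: local-only service */
    sa.sin_port = htons(port);                     /* 0 lets the kernel pick a port */

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 || listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A client that binds to a fixed local port before connect() would set the
option the same way, between socket() and bind(). A client that lets the
kernel choose an ephemeral port does not call bind() at all, and the option
changes nothing for it.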