From owner-freebsd-hackers Sat May 25 14:20:46 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id OAA14879 for hackers-outgoing; Sat, 25 May 1996 14:20:46 -0700 (PDT)
Received: from kitten.mcs.com (Kitten.mcs.com [192.160.127.90]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id OAA14874 for ; Sat, 25 May 1996 14:20:43 -0700 (PDT)
Received: from venus.mcs.com (root@Venus.mcs.com [192.160.127.92]) by kitten.mcs.com (8.7.5/8.6.9) with SMTP id QAA07705 for ; Sat, 25 May 1996 16:20:42 -0500 (CDT)
Received: by venus.mcs.com (/\==/\ Smail3.1.28.1 #28.5) id ; Sat, 25 May 96 16:20 CDT
Message-Id:
Subject: Grrr.. is this is a FreeBSD problem (TIME_WAIT again)
To: hackers@freebsd.org
Date: Sat, 25 May 1996 16:20:41 -0500 (CDT)
From: "Karl Denninger, MCSNet"
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi folks.

I have some custom code here which does TCP socket work, in many cases within the same machine.

Both ends of the connection call shutdown(socket, 2) and close(socket) before exiting, and both ensure that linger is turned off on their respective ends of the link.

If the caller and callee are on DIFFERENT machines, I get no stale sockets. This is reliable even if there are tens of new connections per minute.

If the caller and callee are on the SAME machine, I get sockets in TIME_WAIT for 2 minutes each (grrrr) which, if the traffic is heavy enough, eventually block new connections for a few minutes until they clear up. None of the sockets in TIME_WAIT has output or input pending; both counts show zero.

This is a serious problem!

Interestingly enough, I can switch the end of the link which "netstat" thinks is the "local" end by changing who calls shutdown() first! This is also unexpected; I would have thought that the caller ALWAYS would be the "local" side of the connection.
I've checked and rechecked -- the same code, running across two machines, does not do this. But when the calling and called code are on the same system (2.1-STABLE) it does -- repeatedly and reliably.

Any ideas? While one solution would be to get the code off the same (common) machine, there are reasons that I don't want to do this in normal production. But, I need to use TCP (rather than local Unix domain sockets) because the BACKUP server is on a different system (in the event the first one crashes).

Why would this happen when the caller and callee are on the same box, but not when the traffic actually goes across the network? Has anyone else seen anything like this in their experience?

Due to the structure of this module (it's a drop-in into a stock daemon from another source) I cannot leave the socket open across requests, and I'd like to understand the reason for this behavior anyway.

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
Modem: [+1 312 248-0900]     | T1 from $600 monthly; speeds to DS-3 available
Voice: [+1 312 803-MCS1]     | 21 Chicagoland POPs, ISDN, 28.8, much more
Fax:   [+1 312 248-9865]     | Email to "info@mcs.net" WWW: http://www.mcs.net/
ISDN - Get it here TODAY!    | Home of Chicago's only FULL Clarinet feed!