From: Robert Blayzor
To: Matthew Dillon
Cc: freebsd-stable@freebsd.org
Date: Thu, 29 May 2008 18:11:56 -0400
Subject: Re: Sockets stuck in FIN_WAIT_1
In-Reply-To: <200805292132.m4TLWhCv026720@apollo.backplane.com>

On May 29, 2008, at 5:32 PM, Matthew Dillon wrote:

> Now, the connection is also in a half-closed state, which means that
> one direction is closed. I can't tell which direction that is but my
> guess is that 1.1.1.1 (the apache server) closed the 1.1.1.1->2.2.2.2
> direction and the 2.2.2.2 box has a broken TCP implementation and can't
> deal with it.

This is exactly what we're seeing; it's VERY strange. I did kill off
Apache, and all the FIN_WAIT_1's stuck around, so the kernel is in fact
sending these probe packets every 60 seconds, which the client responds
to... (most of the time).

> I can suggest two things. First, the TCP connection is good but you
> still may be able to tell Apache, in the apache configuration file, to
> timeout after a certain period of time and clear the connection.

I don't think this helps, since Apache sees the connection as long gone.
As far as Apache is concerned (as far as I can tell), this connection
doesn't exist. This can be shown by killing off Apache: the connection
still lives. And while Apache is running, I have the max clients
connected most of the time... so I don't think they linger around and
jam up sockets to Apache. If they did, I think Apache would spiral down
quite quickly.

> Secondly, it may be beneficial to identify exactly what the client and
> server were talking about which caused the client to hang with a live
> tcp connection. The only way to do that is to tcpdump EVERYTHING going
> on related to the apache server, save it to a big-ass disk partition
> (like 500G), and then when you see a stuck connection go back through
> the tcpdump log file and locate it, grep it out, and review what exactly
> it was talking about.
> You'd have to tcpdump with options to tell it to
> dump the TCP data payloads.

Unfortunately that's not possible for me; there's not nearly enough
space. This is a VERY busy server, with a spiky 20Mbps+ (8-12Mbps on
average) of web traffic almost constantly. The traffic is VERY static,
just small data files and occasional large ones (12Mb+), but the
majority are 2-5k files. (It's a clamav mirror server.)

> It seems likely that the client is running an applet or javascript that
> receives a stream over the connection, and that applet or javascript
> program has locked up, causing the data sent from the server to build up
> and for the client's buffer space to run out, and start advertising the
> 0 window.

98% of the clients are clamav (freshclam) clients on various platforms.
According to p0f, most of them are various flavors of Linux, but I can't
say for sure what OS the clients are connecting from, since I'd have to
look at the OS fingerprint of the SYN packets...

Don't get me wrong, the server keeps up well: low CPU, lots of RAM free,
lots of network available, and 99% of all HTTP connections are completed
just fine. I just see these FIN_WAIT_1 connections build up over time
until the server runs out of socket space and then things just stop
working. The only way to correct it seems to be to reboot the server...
even under RELENG_7_0.... so the upgrade from 4_11 did not fix the
problem.

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor@inoc.net
http://www.inoc.net/~rblayzor/
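
For anyone wanting to watch the buildup described above without a full packet capture, here is a minimal sketch that tallies TCP socket states from `netstat -an`-style text. It is not from the thread: the function name `count_tcp_states`, the sample addresses, and the assumed six-column layout (protocol, Recv-Q, Send-Q, local address, foreign address, state) are illustrative assumptions, so the column handling may need adjusting for a given system.

```python
from collections import Counter

# Hypothetical sample in the style of `netstat -an` output.
# Addresses and queue sizes are made up for illustration.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0  33304 1.1.1.1.80             2.2.2.2.4121           FIN_WAIT_1
tcp4       0      0 1.1.1.1.80             3.3.3.3.5150           ESTABLISHED
tcp4       0  12000 1.1.1.1.80             4.4.4.4.2001           FIN_WAIT_1
tcp4       0      0 *.80                   *.*                    LISTEN
"""


def count_tcp_states(netstat_text):
    """Count TCP sockets per state, assuming six whitespace-separated columns."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Keep only TCP rows that carry a state in the sixth column;
        # this skips the header line and any non-TCP sockets.
        if len(fields) == 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(f"{state:12} {n}")
```

Run periodically against real `netstat -an` output and logged with a timestamp, a tally like this would show how quickly the FIN_WAIT_1 count grows before socket space runs out.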