Date: Fri, 25 Nov 2011 23:56:47 -0800
From: Jeremy Chadwick
To: Kris Bauer
Cc: kerbzo@gmail.com, freebsd-stable@freebsd.org, stb@lassitu.de,
    raul@turing.b2n.org, Steven Hartland, george+freebsd@m5p.com,
    FreeBSD Release Engineering Team, Lawrence Stewart
Subject: Re: TCP Reassembly Issues
Message-ID: <20111126075647.GA33048@icarus.home.lan>
References: <4ECEF6FD.5050006@freebsd.org> <4ED077BF.10205@freebsd.org>
List-Id: Production branch of FreeBSD source code

On Sat, Nov 26, 2011 at 12:49:24AM -0600, Kris Bauer wrote:
> On Fri, Nov 25, 2011 at 11:23 PM, Lawrence Stewart wrote:
>
> > On 11/25/11 13:01, Lawrence Stewart wrote:
> >
> >> On 11/24/11 18:02, Kris Bauer wrote:
> >>
> >>> Hello,
> >>>
> >>> I am currently experiencing an issue with FreeBSD 9.0-RC2 r227852
> >>> where the net.inet.tcp.reass.cursegments value is constantly
> >>> increasing (and not decreasing when there is nominal traffic with
> >>> the box). It is causing tcp slowdowns as described with kern/155407:
> >>>
> >>> Exhausted net.inet.tcp.reass.maxsegments block recovering tcp session
> >>> (for this socket and any other socket waiting for retransmitted
> >>> packets). After exhausted net.inet.tcp.reass.maxsegments allocation
> >>> new entry in tcp_reass failed (for this socket and any other socket
> >>> waiting for retransmitted packets).
> >>>
> >>> I have increased the reass.maxsegments value to 16384 to temporarily
> >>> avoid the problem, but the cursegments number keeps rising and it
> >>> seems it will occur again.
> >>>
> >>> Is this an issue that anyone else has seen? I can provide more
> >>> information if need be.
> >>>
> >>
> >> Thanks Kris, Raul and Stefan for the reports, I'll look into this.
> >>
> >
> > I think I've got it - a stupid 1 line logic bug. My apologies for missing
> > it when I reviewed the patch which introduced the bug (patch was committed
> > to head as r226113, MFCed to stable/9 as r226228).
> >
> > Due to some miscommunication, the initial patch was committed to and MFCed
> > from head much later than it should have been in the 9.0 release cycle and
> > instead of being included in the BETAs, didn't make it in until 9.0-RC1 I
> > believe i.e. only RC1 and RC2 should be experiencing the issue.
> >
> > Could those who have reported the bug and are able to recompile their
> > kernel to test a patch please try the following and report back to the list:
> >
> >
> > http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass_plugzoneleak_10.x.r227986.patch
> >
> > The patch is against head r227986 but will apply and work correctly for
> > 9.0 as well.
> >
> > Cheers,
> > Lawrence
> >
>
> I have patched, recompiled, and rebooted. net.inet.tcp.reass.cursegments
> is no longer incrementing, and connectivity is holding steady. If anything
> changes over the next couple of hours, I'll be sure to report it; but all
> preliminary signs of the problem are gone.
>
> Thanks for all the help!

Let's not be hasty in concluding everything is fixed.

Why I'm a bit on edge about this: I took the time to find the CVS
commits that induced this issue in the first place, and it seems there
is some history. The commit that caused this problem to begin with was
supposedly a fix for a different problem:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.375

A week later, that commit went from HEAD/MAIN into RELENG_9:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.374.2.2

Be sure to read the description of the problem that was being fixed in
the first place.

I've also CC'd the original problem reporter, Steven Hartland, because
we're going to need him to try the above patch from Lawrence to make
sure there aren't other problems. Meaning: for all we know, the above
fix might work great for Kris but cause problems for Steve.

This entire situation leads me to believe very few people are doing
quality testing of RELENG_9, yet we're already into 9.0-RC2. Please
don't tell me "that's exactly why you should be running RELENG_9!"; that
is completely backwards and I refuse to get into a flame war about it,
because it's this simple: 90%+ of those running FreeBSD on servers need
something that's stable; we can't risk wonkiness (especially of this
degree!) on systems taking production traffic.

Did no one actually test the change *thoroughly*? Imagine if this had
lain dormant until 9.0-RELEASE.

Lawrence: please don't take my comments personally or to mean "you broke
it and caused this mess!"
It's meant to read more along the lines of "you committed a fix for
something that broke other bits badly, but nobody noticed this,
including the original reporter of a different problem? How/why?" You
get the idea.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |