Date:      Sat, 26 Nov 2011 00:01:53 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Kris Bauer <kristoph.bauer@gmail.com>
Cc:        kerbzo@gmail.com, freebsd-stable@freebsd.org, stb@lassitu.de, raul@turing.b2n.org, george+freebsd@m5p.com, Steven Hartland <killing@multiplay.co.uk>, Lawrence Stewart <lstewart@freebsd.org>, FreeBSD Release Engineering Team <re@freebsd.org>
Subject:   Re: TCP Reassembly Issues
Message-ID:  <20111126080153.GA33335@icarus.home.lan>
In-Reply-To: <20111126075647.GA33048@icarus.home.lan>
References:  <CAPNZ-Wq38=F3o2hYuYF_unBj3SZQ52XhVhdcwQ8PE_vU9xc2YA@mail.gmail.com> <4ECEF6FD.5050006@freebsd.org> <4ED077BF.10205@freebsd.org> <CAPNZ-WqZsSjcO=dVZpOOMtB_Y_hNcj%2BpYDA4nWPXX9kY9Vj1Wg@mail.gmail.com> <20111126075647.GA33048@icarus.home.lan>

On Fri, Nov 25, 2011 at 11:56:47PM -0800, Jeremy Chadwick wrote:
> On Sat, Nov 26, 2011 at 12:49:24AM -0600, Kris Bauer wrote:
> > On Fri, Nov 25, 2011 at 11:23 PM, Lawrence Stewart <lstewart@freebsd.org> wrote:
> > 
> > > On 11/25/11 13:01, Lawrence Stewart wrote:
> > >
> > >> On 11/24/11 18:02, Kris Bauer wrote:
> > >>
> > >>> Hello,
> > >>>
> > >>> I am currently experiencing an issue with FreeBSD 9.0-RC2 r227852 where
> > >>> the net.inet.tcp.reass.cursegments value is constantly increasing (and not
> > >>> decreasing when there is nominal traffic with the box). It is causing TCP
> > >>> slowdowns as described in kern/155407:
> > >>>
> > >>> Exhausted net.inet.tcp.reass.maxsegments block recovering tcp session (for
> > >>> this socket and any other socket waiting for retransmitted packets). After
> > >>> exhausted net.inet.tcp.reass.maxsegments allocation new entry in tcp_reass
> > >>> failed (for this socket and any other socket waiting for retransmitted
> > >>> packets).
> > >>>
> > >>> I have increased the reass.maxsegments value to 16384 to temporarily avoid
> > >>> the problem, but the cursegments number keeps rising and it seems it will
> > >>> occur again.
> > >>>
> > >>> Is this an issue that anyone else has seen? I can provide more information
> > >>> if need be.
> > >>>
> > >>
> > >> Thanks Kris, Raul and Stefan for the reports, I'll look into this.
> > >>
> > >
> > > I think I've got it - a stupid 1 line logic bug. My apologies for missing
> > > it when I reviewed the patch which introduced the bug (patch was committed
> > > to head as r226113, MFCed to stable/9 as r226228).
> > >
> > > Due to some miscommunication, the initial patch was committed to and MFCed
> > > from head much later than it should have been in the 9.0 release cycle and,
> > > instead of being included in the BETAs, didn't make it in until 9.0-RC1, I
> > > believe, i.e. only RC1 and RC2 should be experiencing the issue.
> > >
> > > Could those who have reported the bug and are able to recompile their
> > > kernel to test a patch please try the following and report back to the list:
> > >
> > >
> > > http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass_plugzoneleak_10.x.r227986.patch
> > >
> > > The patch is against head r227986 but will apply and work correctly for
> > > 9.0 as well.
> > >
> > > Cheers,
> > > Lawrence
> > >
> > 
> > I have patched, recompiled, and rebooted.  net.inet.tcp.reass.cursegments
> > is no longer incrementing, and connectivity is holding steady.  If anything
> > changes over the next couple of hours, I'll be sure to report it; but all
> > preliminary signs of the problem are gone.
> > 
> > Thanks for all the help!
> 
> Let's not be hasty in concluding everything is fixed.  Why I'm a bit on
> edge about this: I took the time to find the CVS commits that induced
> this issue in the first place, and it seems there is some history.
> 
> The commit that caused this problem to begin with was supposedly a fix
> for a different problem:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.375
> 
> A week later, that commit went from HEAD/MAIN into RELENG_9:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.374.2.2
> 
> Be sure to read the description of the problem that was being fixed in
> the first place.  I've also CC'd the original problem reporter, Steven
> Hartland, because we're going to need him to try the above patch from
> Lawrence to make sure there aren't other problems.  Meaning: for all we
> know, the above fix might work great for Kris but cause problems for
> Steve.
> 
> This entire situation leads me to believe very few people are doing
> quality testing of RELENG_9, yet we're already into 9.0-RC2.  Please
> don't tell me "that's exactly why you should be running RELENG_9!"; that
> is completely backwards and I refuse to get into a flame war about it,
> because it's this simple: 90%+ of those running FreeBSD on servers need
> something that's stable; we can't risk wonkiness (especially of this
> degree!) on systems taking production traffic.  Did no one actually test
> the change *thoroughly*?  Imagine if this had lain dormant until 9.0-RELEASE.
> 
> Lawrence: please don't take my comments personally or to mean "you broke
> it and caused this mess!"  It's meant to read more along the lines of
> "you committed a fix for something that broke other bits badly, but
> nobody noticed this, including the original reporter of a different
> problem?  How/why?"  You get the idea.

Re-sending, because the "Tested by" commit line had someone who replaced
the "@" character with "-at-", so my mail client assumed the Email
address was on my local machine.  Sorry about that folks.
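
For anyone testing the patch who wants to watch the counter for longer than
a couple of hours, something along these lines may help.  It is only a rough
sketch: it assumes `sysctl -n` is on the PATH, and the helper names
(read_sysctl, looks_like_leak, headroom, watch) and the "never decreased and
grew overall" heuristic are made up for illustration; they are not part of
Lawrence's patch or of any FreeBSD tool.

```python
import subprocess
import time


def read_sysctl(name):
    """Return the integer value of a sysctl by running `sysctl -n NAME`."""
    out = subprocess.run(
        ["sysctl", "-n", name], check=True, capture_output=True, text=True
    ).stdout
    return int(out.strip())


def looks_like_leak(samples):
    """Hypothetical heuristic: the counter never dropped and grew overall.

    `samples` is a list of cursegments readings taken at intervals.
    """
    if len(samples) < 2:
        return False
    never_dropped = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_dropped and samples[-1] > samples[0]


def headroom(cur, maximum):
    """Fraction of the reassembly zone still free (0.0 means exhausted)."""
    if maximum <= 0:
        return 0.0
    return max(0.0, 1.0 - cur / maximum)


def watch(interval=60, count=10):
    """Sample cursegments `count` times, `interval` seconds apart."""
    maximum = read_sysctl("net.inet.tcp.reass.maxsegments")
    samples = []
    for _ in range(count):
        cur = read_sysctl("net.inet.tcp.reass.cursegments")
        samples.append(cur)
        print(f"cursegments={cur} maxsegments={maximum} "
              f"free={headroom(cur, maximum):.1%}")
        time.sleep(interval)
    return looks_like_leak(samples)
```

Calling watch() on an affected box should print one line per sample and
return True if the readings only ever went up.  A steadily busy machine can
trip the same heuristic without any leak, so treat a True result as a reason
to look closer rather than as proof.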

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



