Date:      Wed, 6 Sep 2006 19:05:06 +0400
From:      Ruslan Ermilov <ru@FreeBSD.org>
To:        Mike Silbersack <silby@silby.com>
Cc:        cvs-src@FreeBSD.org, Gleb Smirnoff <glebius@FreeBSD.org>, cvs-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: cvs commit: src/sys/netinet in_pcb.c tcp_subr.c tcp_timer.c tcp_var.h
Message-ID:  <20060906150506.GA7069@rambler-co.ru>
In-Reply-To: <20060906093553.L6691@odysseus.silby.com>
References:  <200609061356.k86DuZ0w016069@repoman.freebsd.org> <20060906091204.B6691@odysseus.silby.com> <20060906143204.GQ40020@FreeBSD.org> <20060906093553.L6691@odysseus.silby.com>


On Wed, Sep 06, 2006 at 09:49:15AM -0500, Mike Silbersack wrote:
>
> On Wed, 6 Sep 2006, Gleb Smirnoff wrote:
>
> >Then we found the CPU hog in the in_pcblookup_local(). I've added
> >counters and gathered stats via ktr(4). When a lag occurred, the
> >following data was gathered:
> >
> >112350 return 0x0, iterations 0, expired 0
> >112349 return 0xc5154888, iterations 19998, expired 745
>
> Ah, I think I see what's happening.  It's probably spinning because the
> heuristic isn't triggering on each entry, that doesn't surprise me.  What
> does surprise me is that it's expiring more than one entry - my original
> intent with that code was for it to free just one entry, which it would
> then use... meaning that I goofed up the implementation.
>
"it would then use"?  What do you mean?

> I had been thinking of rewriting that heuristic anyway, I'm sure that I
> can go back and find something far more efficient if you give me a few
> days.  (Or a week.)
>
I don't see much point in doing it here.  TCP's slow timeout routine
does its job pretty well.  Besides, doing it here is IMO a layering
violation.

> >1.78 hasn't yet been merged to RELENG_6, and we faced the problem on
> >RELENG_6 boxes where the periodic merging cycle is present. So the
> >problem is not in 1.78 of tcp_timer.c. We have a lot of tcptw entries
> >because we have a very big connection rate, not because they are
> >leaked or not purged.
>
> Ok, just checking.
>
> With this code removed, are you not seeing the web frontends delaying new
> connections when they can't find a free port to use?
>
TCP's slow timeout routine, which runs twice a second, takes care
of garbage collecting expired time-wait entries.  When we (just for
the sake of experiment) disabled the slowtimo code that removes the
expired entries, it still worked normally -- the tcptw zone stayed
full (or with at most one entry free), and one LRU entry was reused
each time a new one was needed.  There wasn't any noticeable delay
in processing (in our case anyway).
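
To make the two mechanisms above concrete, here is a rough user-space
sketch in C.  It is not the kernel code: tw_ring, tw_entry, tw_sweep(),
tw_add(), TW_MAX and TW_LIFETIME are all hypothetical names, a small
fixed ring stands in for the tcptw zone, and time is a plain tick
counter.  It assumes every entry gets the same lifetime, so insertion
order, expiry order and LRU order coincide.

```c
#include <assert.h>
#include <stddef.h>

#define TW_MAX      4   /* hypothetical zone limit (tiny, for illustration) */
#define TW_LIFETIME 60  /* hypothetical time-wait lifetime, in ticks */

struct tw_entry {
	unsigned short port;    /* local port held by this entry */
	long           expires; /* tick at which the entry may be dropped */
};

/* Ring of entries ordered oldest-first; head indexes the LRU entry. */
struct tw_ring {
	struct tw_entry e[TW_MAX];
	size_t          head;
	size_t          count;
};

/*
 * Periodic sweep (the role played by the slow timeout): drop entries
 * from the old end until the first one that has not expired yet.
 * Returns the number of entries freed.
 */
static size_t
tw_sweep(struct tw_ring *r, long now)
{
	size_t freed = 0;

	while (r->count > 0 && r->e[r->head].expires <= now) {
		r->head = (r->head + 1) % TW_MAX;
		r->count--;
		freed++;
	}
	return (freed);
}

/*
 * Add a new entry.  If the ring is full, recycle exactly one entry,
 * the oldest, without scanning the rest.  Returns 1 if an entry was
 * recycled, 0 otherwise.
 */
static int
tw_add(struct tw_ring *r, unsigned short port, long now)
{
	int recycled = 0;
	size_t tail;

	if (r->count == TW_MAX) {
		r->head = (r->head + 1) % TW_MAX;
		r->count--;
		recycled = 1;
	}
	tail = (r->head + r->count) % TW_MAX;
	r->e[tail].port = port;
	r->e[tail].expires = now + TW_LIFETIME;
	r->count++;
	return (recycled);
}
```

With the sweep disabled, tw_add() alone keeps the ring full and costs
one recycle per new entry, which is the behaviour we saw in the
experiment.  Neither path walks the whole list on a lookup.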


Cheers,
-- 
Ruslan Ermilov
ru@FreeBSD.org
FreeBSD committer



