Date:      Fri, 28 Oct 2011 07:46:07 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Lawrence Stewart <lstewart@freebsd.org>
Cc:        Kostik Belousov <kostikbel@gmail.com>, freebsd-net@freebsd.org, freebsd-current@freebsd.org, Andre Oppermann <andre@freebsd.org>
Subject:   Re: 9.0-RC1 panic in tcp_input: negative winow.
Message-ID:  <20111028054605.GF1667@garage.freebsd.pl>
In-Reply-To: <4EA9F76E.9010008@freebsd.org>
References:  <20111022084931.GD1697@garage.freebsd.pl> <201110240814.22368.jhb@freebsd.org> <20111026075431.GB1672@garage.freebsd.pl> <201110260753.37264.jhb@freebsd.org> <4EA9F76E.9010008@freebsd.org>

On Fri, Oct 28, 2011 at 11:29:34AM +1100, Lawrence Stewart wrote:
> On 10/26/11 22:53, John Baldwin wrote:
> > The assertion would be triggered when the next packet arrives (as I said
> > above).  Try modifying your debugging output to also log if the ACK is
> > delayed.  I suspect it is not delayed until the last one.  (Pushing out an
> > ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case
> > of an immediate ACK, rcv_nxt > rcv_adv is only a transient condition all
> > under a single lock invocation so never visible to other consumers of the
> > protocol control block.)  If that is what you see, then that confirms what
> > I guessed above and I will likely just remove the assertion in tcp_input()
> > and patch the timewait code to handle this case.
> >
>
> Pawel, have you been able to confirm John's hypothesis? [...]

Yeah, sorry. I moved the debug to the points where we drop the t_inpcb
lock and I still see rcv_nxt being greater than rcv_adv:

	tcp_do_segment:2970 negative window: tp 0xfffffe00685ee3d0 rcv_nxt 1312878324 rcv_adv 1312878187

This is just before the INP_WUNLOCK(tp->t_inpcb) under 'check_delack'
label. I see this a lot (it was logged 545 times for 11 different tp
pointers during 24h period).

	tcp_do_segment:3009 negative window: tp 0xfffffe005cfc6000 rcv_nxt 1442546453 rcv_adv 1442545722

This is just before calling tcp_output(). This one was logged 65 times
for 3 different tp pointers.
I placed a debug also after tcp_output() call, but it is not logged, so
once we return from tcp_output() everything is fine.
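
To make the two log entries above concrete, here is a minimal user-space
sketch in C of the comparison being logged. It is not the kernel code:
tcp_seq_t, struct tcb_sketch, adv_minus_nxt() and log_if_negative() are
names made up for illustration, and only rcv_nxt and rcv_adv are taken from
the messages above. The subtraction is done in 32-bit unsigned arithmetic
and then read as signed, so the result should stay meaningful across
sequence number wraparound.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* User-space stand-in for a 32-bit TCP sequence number (assumption). */
typedef uint32_t tcp_seq_t;

/* Hypothetical reduced control block: only the two fields discussed here. */
struct tcb_sketch {
	tcp_seq_t rcv_nxt;	/* next sequence number expected from the peer */
	tcp_seq_t rcv_adv;	/* right edge of the window last advertised */
};

/*
 * Signed distance rcv_adv - rcv_nxt.  The unsigned subtraction wraps
 * modulo 2^32; converting the result to int32_t yields the signed
 * distance on the usual two's-complement platforms.
 */
static int32_t
adv_minus_nxt(const struct tcb_sketch *tp)
{
	return ((int32_t)(tp->rcv_adv - tp->rcv_nxt));
}

/*
 * Print a line in the style of the debug output above and return 1 when
 * rcv_nxt is ahead of rcv_adv; return 0 and print nothing otherwise.
 */
static int
log_if_negative(const char *where, int line, const struct tcb_sketch *tp)
{
	assert(where != NULL && tp != NULL);
	if (adv_minus_nxt(tp) >= 0)
		return (0);
	printf("%s:%d negative window: rcv_nxt %u rcv_adv %u\n",
	    where, line, (unsigned)tp->rcv_nxt, (unsigned)tp->rcv_adv);
	return (1);
}
```

With the values from the first log line, adv_minus_nxt() gives -137, and
with the second, -731: rcv_nxt is ahead of rcv_adv by that many bytes at
those two points.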

The panic would have been triggered 115 times for 5 different tp pointers
during that time.

I write 'tp pointers' as I'm not 100% sure if the same pointer always
represents the same connection or if it is reused.

> [...] What I don't
> quite get is why we haven't had a lot more reports of this issue...

Maybe because my TCP/IP stack is heavily modified? ...not:)

No idea to be honest. Ask Ken to turn on INVARIANTS in 9.0-RC2 and we
will see:)

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com
