From owner-freebsd-net@FreeBSD.ORG  Mon Mar 22 12:27:22 2010
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 00572106566B
	for <freebsd-net@freebsd.org>; Mon, 22 Mar 2010 12:27:21 +0000 (UTC)
	(envelope-from cjharrer@comcast.net)
Received: from qmta06.westchester.pa.mail.comcast.net
	(qmta06.westchester.pa.mail.comcast.net [76.96.62.56])
	by mx1.freebsd.org (Postfix) with ESMTP id A2DF68FC12
	for <freebsd-net@freebsd.org>; Mon, 22 Mar 2010 12:27:21 +0000 (UTC)
Received: from omta12.westchester.pa.mail.comcast.net ([76.96.62.44])
	by qmta06.westchester.pa.mail.comcast.net with comcast
	id wAm41d0040xGWP856CTMxd; Mon, 22 Mar 2010 12:27:21 +0000
Received: from record ([69.141.194.142])
	by omta12.westchester.pa.mail.comcast.net with comcast
	id wCTL1d00L34oVgM3YCTM51; Mon, 22 Mar 2010 12:27:21 +0000
From: "Chris Harrer" <cjharrer@comcast.net>
To: "'Rui Paulo'" <rpaulo@freebsd.org>, "'Bruce Evans'" <brde@optusnet.com.au>
References: <006f01cac6d8$5fc03cb0$1f40b610$@net>
	<2FEF7D36-31B7-4FD7-9350-EECA7B38D519@freebsd.org>
	<20100321173545.O3020@delplex.bde.org>
	<8FEEC471-81FB-4FCA-BA4E-8E90048EFD8B@freebsd.org>
In-Reply-To: <8FEEC471-81FB-4FCA-BA4E-8E90048EFD8B@freebsd.org>
Date: Mon, 22 Mar 2010 08:27:18 -0400
Message-ID: <000001cac9bb$03bdfb10$0b39f130$@net>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: AcrI+JWNJWilg1rDR5S3W7Glg8ZeVQAwfR3w
Content-Language: en-us
Cc: freebsd-net@freebsd.org
Subject: RE: Bug in tcp_output?
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 22 Mar 2010 12:27:22 -0000

Bruce and Rui,

I owe you an apology, sorry.  As I was reading Bruce's response, I was
saying to myself that something isn't adding up and sure enough I went =
back
and looked and noticed that I was running with a locally modified =
tcp_var.h
(which changed the snd_cwnd container).

The "clean" version does indeed work as expected.

Again, sorry for the "noise".

Thanks.

Chris
-----Original Message-----
From: Rui Paulo [mailto:rpaulo@gmail.com] On Behalf Of Rui Paulo
Sent: Sunday, March 21, 2010 9:15 AM
To: Bruce Evans
Cc: Chris Harrer; freebsd-net@freebsd.org
Subject: Re: Bug in tcp_output?


On 21 Mar 2010, at 07:17, Bruce Evans wrote:

> On Sat, 20 Mar 2010, Rui Paulo wrote:
>=20
>> On 18 Mar 2010, at 20:19, Chris Harrer wrote:
>>>=20
>>> In the following block of code, running on a x86_64 platform, I =
believe
that
>>> cwin should be declared as an int:
>>> ...
>>>              else {
>>>=20
>>>                      long cwin;  =DF-- Should be an int
>>> ...
>>>                      if (len > 0) {
>>>=20
>>>                              cwin =3D tp->snd_cwnd -
>>>=20
>>>                                     (tp->snd_nxt - tp->sack_newdata) =
-
>>>=20
>>>                                     sack_bytes_rxmt;
>>>=20
>>>                              if (cwin < 0)
>>>=20
>>>                                     cwin =3D 0;
>>>=20
>>>                              len =3D lmin(len, cwin);
>>>=20
>>>                      }
>>>=20
>>>              }
>>>=20
>>>       }
>>>=20
>>>=20
>>>=20
>>> Consider the case where:
>>>=20
>>> sack_rxmit =3D 0
>>>=20
>>> sack_bytes_rxmt =3D 0x2238
>>>=20
>>> off =3D 0
>>>=20
>>> len =3D0xa19c
>>>=20
>>> tp->snd_cwnd =3D 0x2238
>>>=20
>>> tp->snd_nxt =3D 0xdd6d7974
>>>=20
>>> tp->sack_newdata =3D 0xdd6d6858
>>>=20
>>> In this case cwin evaluates to 0x00000000ffffe37c, which is not <0, =
but
>>> instead huge.  This causes the remaining data on the socket=92s =
so->so_snd
>>> buffer to be sent to the network causing more problems at the =
receiver
which
>>> is already dropping frames.
>>=20
>> I see. This is most likely a bug. Can you send-pr so this doesn't get
lost?
>=20
> What bug do you see?  This is most likely not a bug.  I only see the
> following bugs
> - the suggestion to increase the fragility of the code by changing =
cwin to
>  int
> - lots of whitespace lossage
> - the style bug in the declaration of cwin (nested declaration)
> - lots fragile but working code.  It depends on the machine being a =
normal
>  2's complement one.  It would fail on normal 1's complement machines =
and
>  on abnormal 2's complement ones, but so would many other things in =
the
>  kernel.
> - type and arithmetic errors that are not made at runtime resulting in =
a
>  value that wouldn't work, though the runtime value would.
>=20
> Relevant code quoted again, with the whitespace fixed:
>=20
>>>                              cwin =3D tp->snd_cwnd -
>>>                                     (tp->snd_nxt - tp->sack_newdata) =
-
>>>                                     sack_bytes_rxmt;
>=20
> On 64-bit machines, with the above values, this is:
>=20
> 				rhs =3D (u_long)0x2238UL -
> 				      ((tcp_seq)0xdd6d7974 -
> 				       (tcp_seq)0xdd6d6858) -
> 				      (int)0x2238;
> 				    =3D (u_long)0x2238UL -
> 				      ((uint32_t)0xdd6d7974 -
> 				       (uint32_t)0xdd6d6858) -
> 				      (int)0x2238;
> 				    =3D (u_long)0x2238UL -
> 				      (u_int)0x111c -
> 				      (int)0x2238;
> 				    =3D (u_long)0x111c -
> 				      (int)0x2238;
> 				    =3D (u_long)0x111c -
> 				      (u_long)0x2238;
> 				    =3D (u_long)0xffffffffffffeee4;
> 				cwin =3D (long)rhs;
> 				     =3D -(long)0x111c;
>=20
> I might have made arithmetic errors too, but I'm sure that I got the
> conversions essentially correct.  On machines with 64-bit u_longs,
> almost everything is evaluated modulo 2^64.  This gives a large =
positive
> value, but not one with the top bits set up to only the 31st as would
> happen on machines with 32-bit u_longs.  Then the final conversion to
> long gives a negative value.

Right. I made some bad calculations.

>=20
> This is fragile, but it is a standard 2's complement hack.  It would
> fail mainly on normal ones complement machines when the rhs is
> (u_long)0xFF...FF.  Then the lhs is probably negative 0, which is
> not less than 0.
>=20
> The fragility is essentially the same on machines with 32-bit u_longs.
> Almost everything is evaluated modulo 2^32...
>=20
> Using 64-bit u_longs for tp->snd_cwnd (and thus for almost the entire
> calculation) is exessive but doesn't cause any problems.
>=20
> Using a signed type for sack_bytes_rxmt asks for sign extension bugs =
but
> doesn't get them.  Here it is promoted to a u_long so there are no
> sign extension bugs for it here.
>=20
> Using a signed type for cwin is essential for the comparison of cwin
> with 0 to work.

Right.

>  This signed type should have the same size as the rhs
> to avoid even more fragility (if it were int, then you would have to
> worry about the value being changed to a non-working value by the
> implementation-defined conversion of the rhs to cwin not just for
> values larger than LONG_MAX but also for ones larger than INT_MAX.
> `int' should work in practice.  This and other things depend on the
> difference of the tcp_seq's not being anywhere near as large as
> 0x7fffffff).

I assumed that Chris saw a problem with this code after being hit by =
some
TCP/IP interop issue. Was this the case?

--
Rui Paulo