Date:      Wed, 8 Oct 1997 12:41:02 +0930
From:      Greg Lehey <grog@lemis.com>
To:        Joao Carlos Mendes Luis <jonny@coppe.ufrj.br>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: TCP problem
Message-ID:  <19971008124102.17436@lemis.com>
In-Reply-To: <199710080122.XAA00461@gaia.coppe.ufrj.br>; from Joao Carlos Mendes Luis on Tue, Oct 07, 1997 at 11:22:56PM -0200
References:  <199710080122.XAA00461@gaia.coppe.ufrj.br>

On Tue, Oct 07, 1997 at 11:22:56PM -0200, Joao Carlos Mendes Luis wrote:
> Hi,
>
>   I have an intermittent TCP problem between a FreeBSD 2.2-STABLE and a
> 2.0.27 Linux.  It's happening right now, let me show an example:
>
> 146.164.5.200 - FreeBSD
> 146.164.53.91 - Linux
>
> gaia::root [530] telnet 146.164.53.91 chargen
> Trying 146.164.53.91...
> Connected to gta.ufrj.br.
> Escape character is '^]'.
>  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg
> !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
> "#$%&'()*+,-./01^]quit
>
> telnet> quit
> Connection closed.
>
>
> In the above connection, telnet stops completely after some output.
> Here's the tcpdump output I've run in another window:
>
> gaia::root [544] tcpdump -pvn host 146.164.53.91 and port chargen
> tcpdump: listening on de0
> 22:49:35.574865 146.164.5.200.2038 > 146.164.53.91.19: S 1153321686:1153321686(0) win 65535 <mss 1460,nop,wscale 1,nop,nop,timestamp[|tcp]> (DF) [tos 0x10] (ttl 64, id 38631)
> 22:49:35.576498 146.164.53.91.19 > 146.164.5.200.2038: S 2381730138:2381730138(0) ack 1153321687 win 31744 <mss 1460> (ttl 63, id 13086)
> 22:49:35.576825 146.164.5.200.2038 > 146.164.53.91.19: . ack 1 win 164 (DF) [tos 0x10] (ttl 64, id 38632)
> 22:49:35.614489 146.164.53.91.19 > 146.164.5.200.2038: P 1:75(74) ack 1 win 31744 (DF) [tos 0x10] (ttl 63, id 13088)
> 22:49:35.804015 146.164.5.200.2038 > 146.164.53.91.19: . ack 75 win 90 (DF) [tos 0x10] (ttl 64, id 38638)
> 22:49:36.646712 146.164.53.91.19 > 146.164.5.200.2038: P 75:165(90) ack 1 win 31744 [tos 0x10] (ttl 63, id 13091)
> 22:49:36.804033 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 64, id 38646)
> 22:49:38.496284 146.164.53.91.19 > 146.164.5.200.2038: . ack 1 win 31744 [tos 0x10] (ttl 63, id 13096)
> 22:49:38.496584 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 64, id 38660)
> 22:49:41.896328 146.164.53.91.19 > 146.164.5.200.2038: . ack 1 win 31744 [tos 0x10] (ttl 63, id 13098)
> 22:49:41.896640 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 64, id 38673)
> 22:49:48.696422 146.164.53.91.19 > 146.164.5.200.2038: . ack 1 win 31744 [tos 0x10] (ttl 63, id 13105)
> 22:49:48.696692 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 64, id 38729)
> 22:50:02.296514 146.164.53.91.19 > 146.164.5.200.2038: . ack 1 win 31744 [tos 0x10] (ttl 63, id 13163)
> 22:50:02.296814 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 64, id 38844)
> ^C
> 1385 packets received by filter
> 0 packets dropped by kernel
>
>
> And here, the output on the remote machine:
>
> recreio::root [281] tcpdump -pvn host 146.164.5.200 and port chargen
> tcpdump: listening on eth0
> 22:47:16.565018 146.164.5.200.2038 > 146.164.53.91.19: S 1153321686:1153321686(0) win 65535 <mss 1460,nop,wscale 1,nop,nop,timestamp 2287877 0,nop,nop,opt-12:00134149> (DF) [tos 0x10] (ttl 63, id 38631)
> 22:47:16.565018 146.164.53.91.19 > 146.164.5.200.2038: S 2381730138:2381730138(0) ack 1153321687 win 31744 <mss 1460> (ttl 64, id 13086)
> 22:47:16.565018 146.164.5.200.2038 > 146.164.53.91.19: . ack 1 win 164 (DF) [tos 0x10] (ttl 63, id 38632)
> 22:47:16.595018 146.164.53.91.19 > 146.164.5.200.2038: P 1:75(74) ack 1 win 31744 (DF) [tos 0x10] (ttl 64, id 13088)
> 22:47:16.785018 146.164.5.200.2038 > 146.164.53.91.19: . ack 75 win 90 (DF) [tos 0x10] (ttl 63, id 38638)
> 22:47:17.635018 146.164.53.91.19 > 146.164.5.200.2038: P 75:165(90) ack 1 win 31744 [tos 0x10] (ttl 64, id 13091)
> 22:47:17.785018 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 63, id 38646)
> 22:47:19.485018 146.164.53.91.19 > 146.164.5.200.2038: . ack 1 win 31744 [tos 0x10] (ttl 64, id 13096)
> 22:47:19.485018 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 63, id 38660)
> 22:47:22.885018 146.164.53.91.19 > 146.164.5.200.2038: . ack 1 win 31744 [tos 0x10] (ttl 64, id 13098)
> 22:47:22.885018 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 63, id 38673)
> 22:47:29.685018 146.164.53.91.19 > 146.164.5.200.2038: . ack 1 win 31744 [tos 0x10] (ttl 64, id 13105)
> 22:47:29.685018 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 63, id 38729)
> 22:47:43.285018 146.164.53.91.19 > 146.164.5.200.2038: . ack 1 win 31744 [tos 0x10] (ttl 64, id 13163)
> 22:47:43.285018 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 63, id 38844)
>
> 15 packets received by filter
> 0 packets dropped by kernel
>
> As you can see, the connection is in a loop, trying to exchange
> data,

Well, it's retrying, and will probably time out and close the
connection after about 10 minutes.
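As a rough check that this really is a retry timer backing off, here is a
minimal Python sketch that measures the gaps between the bare acks sent by
146.164.53.91.  The timestamps are copied from the FreeBSD-side trace quoted
above; the names PROBES and gaps are mine, purely for illustration.

```python
from datetime import datetime

# Times of the bare acks from 146.164.53.91, as seen on the FreeBSD side
# (copied from the tcpdump output quoted above).
PROBES = [
    "22:49:38.496284",
    "22:49:41.896328",
    "22:49:48.696422",
    "22:50:02.296514",
]

def gaps(stamps):
    """Return the interval in seconds between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [round((b - a).total_seconds(), 1) for a, b in zip(times, times[1:])]

print(gaps(PROBES))  # [3.4, 6.8, 13.6]
```

Each gap is twice the previous one, which is what you would expect from an
exponentially backed-off retry.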

> but there's a problem somewhere.  The very strange thing is that it's
> intermittent.  Most of the time it works perfectly.
>
> Also curious, the chargen always stops at the same char.

The 01?

> I have no problem connecting from/to other machines to/from both of these.

Is this just a problem with a single connection, or with all
communication between the two machines?  Are there other machines on
the net?  If so, how do they communicate with these two machines?

> Rebooting the Linux machine does not solve the problem, but rebooting the
> FreeBSD one does solve, so I think it's a FreeBSD problem.  Any suggestions ?

The traces show that the machine with the trouble is 146.164.53.91.
The dumps on both sides show 146.164.53.91 retrying an ack, and
146.164.5.200 responding to it immediately.  To get the sequence
straight, look at the timestamps:

> 22:47:22.885018 146.164.53.91.19 > 146.164.5.200.2038: . ack 1 win 31744 [tos 0x10] (ttl 64, id 13098)
> 22:47:22.885018 146.164.5.200.2038 > 146.164.53.91.19: . ack 165 win 0 (DF) [tos 0x10] (ttl 63, id 38673)

Interesting.  This is the Linux box, and it claims a response in 0
µs.  In fact, since the last 4 digits of the timestamp are always
5018, I assume that its clock can't resolve anything finer than 0.01 s.
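The resolution guess can be checked mechanically.  This is only a sketch,
assuming a 10 ms tick: it takes the distinct timestamps from the Linux-side
trace quoted above and tests whether they all share the same remainder
modulo 10000 µs.  The names LINUX_STAMPS, micros and common_offset are
hypothetical helpers, not part of any tool.

```python
# Distinct timestamps from the Linux-side tcpdump quoted above.
LINUX_STAMPS = [
    "22:47:16.565018", "22:47:16.595018", "22:47:16.785018",
    "22:47:17.635018", "22:47:17.785018", "22:47:19.485018",
    "22:47:22.885018", "22:47:29.685018", "22:47:43.285018",
]

def micros(stamp):
    """Return the microsecond field of an HH:MM:SS.uuuuuu timestamp."""
    return int(stamp.split(".")[1])

def common_offset(stamps, tick_us=10_000):
    """Return the remainder shared by all stamps modulo tick_us, else None."""
    remainders = {micros(s) % tick_us for s in stamps}
    return remainders.pop() if len(remainders) == 1 else None

print(common_offset(LINUX_STAMPS))  # 5018
```

A single shared remainder means every timestamp sits on the same 10 ms grid,
so two packets inside one tick will appear to arrive at the same instant.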

Since the tcpdump on the Linux side shows the data going in, I can
only imagine it's a bug in the Linux TCP/IP stack.

So why does it recover when you reboot the FreeBSD machine?  Probably
a connection reset when the machine comes up.  It should also recover
if you reboot the Linux box, and possibly if you take the interface
down and up again.

Greg


