From owner-freebsd-hackers  Wed Nov 28 21:42:45 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from exuma.irbs.com (exuma.irbs.com [216.86.160.252])
	by hub.freebsd.org (Postfix) with ESMTP id C9C2237B417
	for <freebsd-hackers@freebsd.org>; Wed, 28 Nov 2001 21:42:39 -0800 (PST)
Received: by exuma.irbs.com (Postfix, from userid 2500)
	id 1CE0A17406; Thu, 29 Nov 2001 00:42:34 -0500 (EST)
Date: Thu, 29 Nov 2001 00:42:34 -0500
From: John Capo <jc@irbs.com>
To: freebsd-hackers@freebsd.org
Subject: Re: FreeBSD performing worse than Linux?
Message-ID: <20011129004234.A16101@exuma.irbs.com>
Reply-To: jc@irbs.com
References: <20011128153817.T61580@monorchid.lemis.com> <15364.38174.938500.946169@caddis.yogotech.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <15364.38174.938500.946169@caddis.yogotech.com>; from nate@yogotech.com on Wed, Nov 28, 2001 at 12:41:18AM -0700
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

I started noticing some TCP weirdness when I moved my bandwidth
stats site from my office to my colo facility last week.  The colo
is five miles away by road and 1200 miles away by network.  Netscape
would stop for seconds at a time while loading the graph images, but
there was no consistency: sometimes it worked properly and sometimes
not.

I also noticed a delay when dumping the contents of my spam reject
db with a perl program.  Output would pause for a second, start for
a second, pause for a second, and so on.  Piping the perl script
to cat produces continuous output.  I dismissed this behavior as a
network oddity since the web sites on my machines seemed to be
running just fine.

Now this thread comes along and I realize there is something wrong
so I did a little testing.

find / -print on one of my servers in an ssh session will fill the
pipe to my office, 256K frame, and run nicely, then get into the
starting and stopping mode after a good amount of data has been
sent.  find / -print | dd obs=1 will screw up within a few seconds
and stay that way.  Netstat in another ssh session shows data sitting
in the send queue, ready to go:

    tcp4       0  15928  server.22         client.4427    ESTABLISHED
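
For anyone without dd handy, the effect of obs=1 can be approximated
with the sketch below.  It is only an illustration of the idea, one
write call per output byte, presumably so the data reaches the
session in tiny pieces.  The function name and the ibs default are
assumptions of this sketch, not part of the test above.

```python
def copy_in_small_writes(read, write, obs=1, ibs=512):
    """Copy until EOF, calling write() once per `obs` bytes of input.

    `read(n)` returns up to n bytes (b"" at EOF); `write(b)` emits them.
    Unlike dd, a short tail of a chunk is written as-is rather than
    carried over to the next block; with obs=1 that makes no difference.
    Returns the number of write calls issued.
    """
    writes = 0
    while True:
        chunk = read(ibs)
        if not chunk:
            return writes
        for i in range(0, len(chunk), obs):
            write(chunk[i:i + obs])
            writes += 1
```

Passing lambda n: os.read(0, n) and lambda b: os.write(1, b) as the
two callables would let it stand in for dd in the pipeline.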

This is a fragment from a dump on the server side while running
find / -print | dd obs=1

21:41:46.328381 client.4427 > server.22: . ack 11249 win 17328 <nop,nop,timestamp 53827689 105528699> (DF) [tos 0x10]
21:41:46.335863 client.4427 > server.22: . ack 11345 win 17328 <nop,nop,timestamp 53827689 105528699> (DF) [tos 0x10]
21:41:46.342216 client.4427 > server.22: . ack 11441 win 17328 <nop,nop,timestamp 53827690 105528699> (DF) [tos 0x10]
21:41:46.396051 client.4427 > server.22: . ack 11489 win 17376 <nop,nop,timestamp 53827696 105528699> (DF) [tos 0x10]
21:41:46.418208 client.4427 > server.22: . ack 11489 win 17376 <nop,nop,timestamp 53827698 105528699> (DF) [tos 0x10] 
21:41:47.460903 server.22 > client.4427: . 11489:12937(1448) ack 144 win 17376 <nop,nop,timestamp 105528895 53827698> (DF) [tos 0x10] 
21:41:47.569133 client.4427 > server.22: . ack 12937 win 15928 <nop,nop,timestamp 53827813 105528895> (DF) [tos 0x10] 
21:41:49.001039 client.4427 > server.22: P 144:192(48) ack 12937 win 17376 <nop,nop,timestamp 53827954 105528895> (DF) [tos 0x10] 
21:41:49.001073 server.22 > client.4427: . 28049:29497(1448) ack 192 win 17328 <nop,nop,timestamp 105529049 53827954> (DF) [tos 0x10] 
21:41:49.001085 server.22 > client.4427: P 29497:30313(816) ack 192 win 17328 <nop,nop,timestamp 105529049 53827954> (DF) [tos 0x10] 
21:41:49.109131 client.4427 > server.22: . ack 12937 win 17376 <nop,nop,timestamp 53827967 105528895> (DF) [tos 0x10] 

It's been a while since I have had to analyze TCP dumps, but it looks
to me like the server received an ack at 21:41:47.569133 for byte
12937 but did not resume transmission until the client's next packet,
which repeats that ack, arrived at 21:41:49.001039.  The starting and
stopping continues every few seconds.  The only other interesting
thing I see is the client sending duplicate acks for byte 11489.
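
The idle periods in a dump like this can be picked out mechanically.
The following is a rough sketch, not a general tcpdump parser: it
assumes each line starts with an HH:MM:SS.uuuuuu timestamp, ignores
midnight wraparound, and the 1.0 second threshold and the function
name are arbitrary choices.  The sample lines are taken from the
fragment above with the TCP options trimmed.

```python
import re
from datetime import datetime

# A few lines from the dump, options trimmed for brevity.
DUMP = """\
21:41:46.396051 client.4427 > server.22: . ack 11489 win 17376
21:41:46.418208 client.4427 > server.22: . ack 11489 win 17376
21:41:47.460903 server.22 > client.4427: . 11489:12937(1448) ack 144 win 17376
21:41:47.569133 client.4427 > server.22: . ack 12937 win 15928
21:41:49.001039 client.4427 > server.22: P 144:192(48) ack 12937 win 17376
21:41:49.001073 server.22 > client.4427: . 28049:29497(1448) ack 192 win 17328
"""

LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d\.\d{6}) (\S+) > (\S+):")


def find_stalls(text, threshold=1.0):
    """Return (gap_seconds, previous_line, next_line) for each pair of
    consecutive packets separated by at least `threshold` seconds."""
    stalls = []
    prev_time = prev_line = None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a packet line
        t = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        if prev_time is not None:
            gap = (t - prev_time).total_seconds()
            if gap >= threshold:
                stalls.append((gap, prev_line, line))
        prev_time, prev_line = t, line
    return stalls


for gap, before, after in find_stalls(DUMP):
    print(f"{gap:.3f}s idle between:\n  {before}\n  {after}")
```

On these six lines it flags two gaps of a second or more: one before
the server sends 11489:12937, and one between the ack for 12937 and
the client's next packet.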

Running netstat -p tcp -s on the server shows a retransmit timeout
for each output pause.  Full TCP stats:

    689765 packets sent
	    208566 data packets (90677298 bytes)
	    1046 data packets (1187590 bytes) retransmitted
	    1 resend initiated by MTU discovery
	    292504 ack-only packets (21123 delayed)
	    0 URG only packets
	    11551 window probe packets
	    139170 window update packets
	    36928 control packets
    906752 packets received
	    167629 acks (for 90004170 bytes)
	    10803 duplicate acks
	    0 acks for unsent data
	    706255 packets (792771342 bytes) received in-sequence
	    468 completely duplicate packets (5045 bytes)
	    15 old duplicate packets
	    10 packets with some dup. data (202 bytes duped)
	    480 out-of-order packets (241868 bytes)
	    6 packets (6 bytes) of data after window
	    6 window probes
	    3812 window update packets
	    33 packets received after close
	    2 discarded for bad checksums
	    0 discarded for bad header offset fields
	    0 discarded because packet too short

There are no IP errors.
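
To put the retransmit counters in proportion, the ratios can be
worked out directly from the figures above.  This is just arithmetic
on the pasted numbers.  The counters are cumulative for the whole
machine, so they say little about any single stalled session.

```python
# Counters copied from the netstat -p tcp -s output above.
data_packets, data_bytes = 208566, 90677298
rexmit_packets, rexmit_bytes = 1046, 1187590

packet_rate = rexmit_packets / data_packets      # fraction of data packets resent
byte_rate = rexmit_bytes / data_bytes            # fraction of data bytes resent
avg_rexmit_size = rexmit_bytes / rexmit_packets  # mean size of a resent packet

print(f"retransmitted packets: {packet_rate:.2%}")
print(f"retransmitted bytes:   {byte_rate:.2%}")
print(f"average retransmitted packet: {avg_rexmit_size:.0f} bytes")
```

The overall rate is small, which fits the observation that ordinary
web traffic on these machines looked fine.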

I see exactly the same behavior on 3 -stable machines running kernels
from late October and early November.  Another -stable machine with
a kernel from late September does pause but not as consistently as
the later kernel machines do.  The client machine is running a
kernel from early November.  All machines have fxp cards nailed at
100Mb/s full duplex, connected to a Cisco 2924 with all ports nailed
at 100Mb/s full duplex.  I am not seeing any link level errors on the
machines or the switch.

The pauses occur with or without newreno.  Another difference between
the machine that works better and the others that don't is that the
ones that reliably hang are SMP machines.  Setting
machdep.smp_active=0 does not change anything.  The same test works
fine on SMP machines in my office with kernels from the same time
period.

This is interesting: the same test in an ssh session from a 4.3-BETA
machine to the same server pauses very briefly every minute or so
but that could be a true dropped packet.  I do see the retransmit
counter on the server increment at the same rate.  Same results
with a W98 putty session running in vmware on a -stable machine.

Something is borked.

John Capo
