From owner-freebsd-stable@FreeBSD.ORG  Sat Mar  6 09:48:09 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 31F0816A4CE
	for <freebsd-stable@freebsd.org>;
	Sat,  6 Mar 2004 09:48:09 -0800 (PST)
Received: from ganymede.hub.org (u46n208.hfx.eastlink.ca [24.222.46.208])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C55C843D3F
	for <freebsd-stable@freebsd.org>;
	Sat,  6 Mar 2004 09:48:08 -0800 (PST)	(envelope-from scrappy@hub.org)
Received: by ganymede.hub.org (Postfix, from userid 1000)
	id 5A46B39A01; Sat,  6 Mar 2004 13:48:08 -0400 (AST)
Received: from localhost (localhost [127.0.0.1])
	by ganymede.hub.org (Postfix) with ESMTP id 513EF39573
	for <freebsd-stable@freebsd.org>;
	Sat,  6 Mar 2004 13:48:08 -0400 (AST)
Date: Sat, 6 Mar 2004 13:48:08 -0400 (AST)
From: "Marc G. Fournier" <scrappy@hub.org>
To: freebsd-stable@freebsd.org
Message-ID: <20040306130937.N71806@ganymede.hub.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Subject: Odd network issue ... *very* slow scp between two servers
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 06 Mar 2004 17:48:09 -0000


I have two servers on the same network switch, sitting one on top of the
other ... one is running an em device, the other an fxp device ...

A plain ftp transfer of a 1 MB file between these two servers gives:

1038785 bytes received in 85.91 seconds (11.81 KB/s)

Between the fxp server and another fxp server on the same switch, the
exact same file gives:

1038785 bytes received in 0.09 seconds (10.64 MB/s)
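To put the two ftp results side by side, here is a small sketch that recomputes the rates from the byte count and elapsed time. It assumes ftp(1) uses 1024-based KB and MB, which the first result bears out; the helper name xfer_rate is hypothetical. By these numbers the em-to-fxp transfer is roughly 920 times slower (10.64 * 1024 / 11.81).

```sh
#!/bin/sh
# xfer_rate: print bytes / seconds / divisor to two decimal places.
# $1 = bytes, $2 = seconds, $3 = divisor (1024 for KB/s, 1048576 for MB/s).
# Assumption: ftp(1) reports 1024-based units.
xfer_rate() {
    awk -v b="$1" -v s="$2" -v d="$3" 'BEGIN { printf "%.2f\n", b / s / d }'
}

xfer_rate 1038785 85.91 1024      # em <-> fxp, KB/s
xfer_rate 1038785 0.09 1048576    # fxp <-> fxp, MB/s
```

The second line prints 11.01 rather than 10.64 because ftp displays the elapsed time rounded to 0.09 seconds; 10.64 MB/s corresponds to roughly 0.093 seconds.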

Now, I have ipaudit running on all the servers to monitor bandwidth.
Yesterday, the fxp server (the one I just downloaded to from another fxp
server at 10.64MB/s) did 11535.73M of traffic in total, while the em
server did 11766.46M ... so the two carry almost the same daily load.

Now, in /var/log/messages on the em server (neptune), I am getting these RST lines:

Mar  6 12:35:38 neptune /kernel: Limiting open port RST response from 700 to 200 packets per second
Mar  6 12:35:39 neptune /kernel: Limiting open port RST response from 636 to 200 packets per second
Mar  6 12:35:41 neptune /kernel: Limiting open port RST response from 523 to 200 packets per second
Mar  6 12:35:46 neptune /kernel: Limiting open port RST response from 386 to 200 packets per second
Mar  6 12:35:55 neptune /kernel: Limiting open port RST response from 238 to 200 packets per second
Mar  6 13:34:25 neptune /kernel: Limiting open port RST response from 799 to 200 packets per second
Mar  6 13:34:27 neptune /kernel: Limiting open port RST response from 637 to 200 packets per second
Mar  6 13:34:28 neptune /kernel: Limiting open port RST response from 503 to 200 packets per second
Mar  6 13:34:32 neptune /kernel: Limiting open port RST response from 343 to 200 packets per second
Mar  6 13:34:42 neptune /kernel: Limiting open port RST response from 206 to 200 packets per second

and they seem to show up quite regularly:

neptune# gzcat /var/log/messages.0.gz | grep RST | wc -l
      95

where messages.0.gz covers Mar 5 14:47:28 through Mar 6 11:30:52.
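A raw line count hides how the messages cluster. The sketch below groups the "Limiting open port RST response" lines by hour and reports, for each hour, how many there were and the highest "from N" rate. It assumes the standard syslog prefix "Mon DD HH:MM:SS"; the helper name rst_by_hour is hypothetical.

```sh
#!/bin/sh
# rst_by_hour: read syslog lines on stdin; for every hour print
# "<Mon> <DD> <HH>:00 <count> <peak rate>" for the open-port RST messages.
# Output is sorted lexically, which is fine within a single rotated log.
rst_by_hour() {
    awk '/Limiting open port RST response/ {
        key = $1 " " $2 " " substr($3, 1, 2) ":00"
        count[key]++
        # the rate that was clamped is the field right after "from"
        n = 0
        for (i = 1; i < NF; i++)
            if ($i == "from") { n = $(i + 1) + 0; break }
        if (n > peak[key]) peak[key] = n
    }
    END { for (k in count) print k, count[k], peak[k] }' | sort
}

# Usage on the rotated log:
#   gzcat /var/log/messages.0.gz | rst_by_hour
```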

But shouldn't setting:

net.inet.tcp.blackhole: 0 -> 2

help? Or did I misread the man page? Even if it should, it hasn't: I'm
still only getting ~13KB/s on that same file ...
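For reference, blackhole(4) describes net.inet.tcp.blackhole=2 as dropping TCP segments that arrive on closed ports instead of answering them with a RST. The messages above are about open port RSTs, which may be why the setting makes no visible difference to them (the 200 in those messages should be the net.inet.icmp.icmplim limit). A minimal /etc/sysctl.conf fragment, assuming the setting is meant to survive the planned reboot; the UDP line is an optional companion setting:

```
# /etc/sysctl.conf (read at boot)
# 2 = drop every TCP segment sent to a closed port, without replying with RST
net.inet.tcp.blackhole=2
# 1 = drop UDP datagrams sent to closed ports, without a port-unreachable reply
net.inet.udp.blackhole=1
```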

There is nothing else in /var/log/messages that points to a problem with
processes, drives, or anything else, and the load on the machine right
now is only 1.3 ...

vmstat -i shows a high interrupt rate for the em device (244/s on em0, vs 74/s on fxp0 on the other server):

neptune# uptime
 1:43PM  up 57 days,  3:08, 5 users, load averages: 1.38, 1.32, 0.97
neptune# vmstat -i
interrupt                   total       rate
ahd0 irq16                     15          0
ahd1 irq17              932228686        188
em0 irq18              1205773331        244
clk irq0                493596903         99
rtc irq8                631819522        128
Total                  3263418457        661

vs

mars# uptime
 1:43PM  up 77 days,  9:50, 3 users, load averages: 7.44, 7.73, 6.28
mars# vmstat -i
interrupt                   total       rate
fxp0 irq5               499794285         74
ahc0 irq11                     15          0
ahc1 irq15              915710622        136
fdc0 irq6                       4          0
clk irq0                668800403         99
rtc irq8                856196939        128
Total                  2940502268        439
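The "rate" column of vmstat -i is the total count divided by the time since boot, so it is a lifetime average rather than a current reading. The first helper below reproduces the column from the totals and uptimes shown above (it gives 244 for em0 and 74 for fxp0); the second computes current per-device rates from two snapshots taken a known number of seconds apart. Both helper names are hypothetical, and the parser assumes device lines have exactly four fields ("em0 irq18 <total> <rate>"), as in the output above.

```sh
#!/bin/sh
# avg_rate: interrupts per second since boot, truncated like vmstat -i.
# $1 = total, $2 = days, $3 = hours, $4 = minutes of uptime (from uptime(1)).
avg_rate() {
    awk -v t="$1" -v d="$2" -v h="$3" -v m="$4" \
        'BEGIN { printf "%d\n", t / (d * 86400 + h * 3600 + m * 60) }'
}

# live_rate: per-device interrupts per second between two saved snapshots.
# $1 = earlier snapshot file, $2 = later snapshot file, $3 = seconds between.
# Header and "Total" lines have three fields and are skipped.
live_rate() {
    awk -v secs="$3" '
        NF == 4 && FNR == NR { before[$1 " " $2] = $3; next }
        NF == 4 && ($1 " " $2) in before {
            printf "%s %d\n", $1 " " $2, ($3 - before[$1 " " $2]) / secs
        }' "$1" "$2"
}

avg_rate 1205773331 57 3 8    # em0 on neptune
avg_rate 499794285 77 9 50    # fxp0 on mars

# Current rates over a 10 second window:
#   vmstat -i > /tmp/irq.1; sleep 10; vmstat -i > /tmp/irq.2
#   live_rate /tmp/irq.1 /tmp/irq.2 10
```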

The fxp device is running (autonegotiated):
        media: Ethernet autoselect (100baseTX <full-duplex>)

while the em device is running (set explicitly, no autoselect):
        media: Ethernet 100baseTX <full-duplex>
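One difference between the two media lines is that fxp0 autonegotiated while em0 has its media set by hand. A well-known cause of transfers in the KB/s range is a duplex mismatch: if the NIC is forced to 100baseTX full-duplex but the switch port is left on autonegotiation, the switch will typically fall back to half duplex. Whether the switch port here is actually on autonegotiation is an assumption to check on the switch itself. The sketch only classifies an ifconfig media line; the helper name media_mode is hypothetical.

```sh
#!/bin/sh
# media_mode: classify an ifconfig "media:" line passed as $1.
# Prints "auto" when the interface autonegotiates, "forced" when the media
# was set explicitly, "unknown" for anything else.
media_mode() {
    case "$1" in
        *autoselect*) echo auto ;;
        *media:*)     echo forced ;;
        *)            echo unknown ;;
    esac
}

# Usage:
#   media_mode "$(ifconfig em0 | grep 'media:')"
# To let em0 autonegotiate like fxp0 (the link will renegotiate, so this is
# best done from the console):
#   ifconfig em0 media autoselect
```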

And finally, the em server was last upgraded to:
	4.9-STABLE #4: Tue Jan  6 00:59:37 AST 2004

while the fxp server is almost ancient:
	4.9-PRERELEASE #2: Sat Sep 20 14:42:25 ADT 2003

I'm going to reboot the server on Monday, when a tech is easily
accessible in case of a problem ... but before I do that, is there
anything I can do to debug this? Maybe something I can look at that
would show a 'leak'?
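Two read-only commands that could be run before the reboot: netstat -m, which reports mbuf usage including requests that were denied (where a network buffer leak would show up), and netstat -i, which lists per-interface input errors, output errors and collisions. On a link that really is running full duplex the Coll column should stay at zero, so a non-zero, growing count on em0 would point at the link itself. The hypothetical helper below pulls those three counters out of netstat -i output; it assumes the default layout whose last five columns are Ipkts Ierrs Opkts Oerrs Coll (no -d, so no Drop column).

```sh
#!/bin/sh
# link_errors: read `netstat -i` output on stdin and print Ierrs, Oerrs and
# Coll for each link-level (<Link#N>) row. Columns are counted from the right
# because the Address column is empty for some interfaces, such as lo0.
link_errors() {
    awk '$3 ~ /^<Link/ {
        printf "%s ierrs=%s oerrs=%s coll=%s\n", $1, $(NF-3), $(NF-1), $NF
    }'
}

# Usage:
#   netstat -i | link_errors
#   netstat -m
```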

Thanks ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664