Date:      Sun, 15 Apr 2001 17:10:02 +0200 (CEST)
From:      Remy Nonnenmacher <remy@boostworks.com>
To:        freebsd-net@freebsd.org
Subject:   fxp+bridge: highly suspect syndrom.
Message-ID:  <200104151507.f3FF7iC97018@luxren2.boostworks.com>

I'm looking for any ideas about a very strange problem:

Context:

I'm using an interposed machine (TRACER) that bridges between two
interfaces. Two other machines (UA and OS) are connected through the
interposed machine (all machines run 4.2-RELEASE):

	UA ------- TRACER ------- OS

All machines use fxp interfaces. Connections are made with crossover
cables.

The UA machine runs a loader that requests about 7000 HTML pages from
the OS (one connection per page). NMBCLUSTERS has been raised on all
machines to handle this load.
During a direct test (without TRACER), UA and OS run at about 15% of
their maximum load and the network is solid red (100 Mbit/s from OS to
UA). No glitches, no problems.

The TRACER uses the standard bridging code. It handles up to 4x100
Mbit/s at 25-30% load, moving around 30,000 packets/sec between the
interfaces, so there is no load or capacity problem with this machine.
This has been tested by pumping data between OS and UA.

Problem:

When the TRACER is interposed _and_ the specific loader test (the 7000
connect/data transfer/disconnect cycles) is run, _and_ more than one
connection is active at a time, then between 3 and 50 times during the
test a packet is sent by the OS, received by the TRACER, and not
forwarded until the OS retransmits it.

And now, hold your belt and grab your suspenders: the only packet that
suffers from the problem is a SYN+ACK from the OS to the UA. _Never any
other one_!

The sequence is always exactly the same:

	UA		TRACER			OS

	SYN---------------SYN------------------>SYN
	                     SYN+ACK<-----------SYN+ACK
	                     			.
	                     			.
	                     SYN+ACK<-----------SYN+ACK (repeated)
	SYN+ACK<------------
	SYN+ACK(rep)<-------

It is as if the TRACER's OS-side interface received the first SYN+ACK
but the driver did not get an interrupt until the repeated frame came in.
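To count how often this sequence occurs over a whole run, one could pair the tcpdump captures taken on both TRACER interfaces and flag every SYN+ACK that appears on the UA side only after the OS has already sent it more than once. This is a minimal sketch, assuming the captures have already been parsed into a hypothetical `Pkt` record (timestamp, interface side, TCP flags, connection key); the record layout and the parsing step are assumptions, not part of the setup described here.

```python
from dataclasses import dataclass

SYN = 0x02  # TCP flag bits
ACK = 0x10

@dataclass(frozen=True)
class Pkt:  # hypothetical parsed-capture record
    ts: float    # capture timestamp on the TRACER, in seconds
    side: str    # "os" = OS-facing interface, "ua" = UA-facing interface
    flags: int   # TCP flags byte
    conn: tuple  # connection key, e.g. (src, sport, dst, dport); the bridge does not rewrite it

def late_synacks(pkts):
    """Find SYN+ACKs that the TRACER forwarded only after the OS had resent them.

    Returns (conn, first_os_copy_ts, first_ua_forward_ts, os_copies_seen) tuples,
    where os_copies_seen (> 1) counts OS-side copies up to the first forward.
    """
    os_copies = {}  # conn -> timestamps of SYN+ACKs seen on the OS side
    first_fwd = {}  # conn -> timestamp of the first SYN+ACK seen on the UA side
    for p in sorted(pkts, key=lambda p: p.ts):
        if (p.flags & (SYN | ACK)) != (SYN | ACK):
            continue  # only SYN+ACK segments are of interest
        if p.side == "os":
            os_copies.setdefault(p.conn, []).append(p.ts)
        elif p.side == "ua":
            first_fwd.setdefault(p.conn, p.ts)  # keep only the earliest forward
    late = []
    for conn, times in os_copies.items():
        fwd = first_fwd.get(conn)
        if fwd is None:
            continue  # never forwarded within this capture
        before = [t for t in times if t <= fwd]
        if len(before) > 1:
            late.append((conn, before[0], fwd, len(before)))
    return late
```

A normally forwarded SYN+ACK yields no entry; a connection whose first SYN+ACK only leaves on the UA side together with the retransmission yields one entry with `os_copies_seen == 2`, matching the diagram above.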

What I have checked:

- The bridge does not use ipfw to check packets (meaning: nothing knows
or cares about TCP flags).
- Instrumented drivers: no packet loss, no discards.
- Checked for aging of the bridge table: no table aging occurred.
- Instrumented bridge: no DROPS, no UNKNOWN, all packets forwarded
   regularly.

What I have tried (no effect on the problem):

- Changing the PCI IRQ for the interfaces
- Using the same IRQ for both interfaces
- Swapping fxp0/fxp1 on the TRACER
- Changing cards (using a dual Pro100, fxp again)
- Switching from the stock 4.2-RELEASE fxp driver to Jonathan
Lemon's version (applied the patchset and rebuilt the kernel).

What works (changes that make the problem disappear):

- Using a DEC 2114X-based card on the TRACER, on the OS side only. (Note
that swapping the sides ('de' on the UA side and 'fxp' on the OS side)
makes the problem reappear (*).)
- Running only one connection at a time (one connect/transfer/disconnect
phase at a time).

What has a strong influence on the problem (lowers its frequency):

- Hammering the network with a packet flood during the loader test
- Running tcpdump on the TRACER's interfaces
- Running two independent loader tests (**)


Notes:

(*) This is why I believe the packet really is received by the TRACER
machine, rather than held on the OS side for one second. (Outgoing
packets captured by tcpdump are not necessarily physically sent on the
wire.)

(**) Note that the problem appears while the loader is waiting for a
connection to be established. During this time it does not terminate
other existing connections. When two independent tests are run, this
looks like the 'hammering' condition above.

-------------------

Given that the problem, in the worst case, repeats almost exactly at
one-second tick intervals (i.e. one second running, one second paused),
may I conclude that something is wrong that relates to one of the
following (or a combination of them):

- Interrupts not being delivered to, processed by, or caught by the fxp
driver when there is no activity other than finishing packet
transmission.
- A race condition occurring at one-second intervals with something that
  runs periodically (fetching fxp statistics?).
- "'Genius' work inside the 8255x chip" (heavily double-quoted).
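One way to put numbers on the "one-second tick" hypothesis is to take the time at which each stall begins (for example, the first OS-side timestamp of each late SYN+ACK in the TRACER captures) and check whether those times share a common phase modulo one second. The sketch below uses the mean resultant length from circular statistics; this is a technique swapped in for illustration, not something from the original investigation, and the function names are hypothetical. The unknown offset between the kernel's periodic timer and wall-clock seconds does not matter, because only the clustering of phases is measured.

```python
import math

def _phase_sums(times, period):
    # Map each event time onto the unit circle by its phase within `period`.
    angles = [2 * math.pi * t / period for t in times]
    return sum(math.cos(a) for a in angles), sum(math.sin(a) for a in angles)

def phase_concentration(times, period=1.0):
    """Mean resultant length: ~1.0 if events share a phase, ~0.0 if spread evenly."""
    if not times:
        raise ValueError("need at least one event time")
    c, s = _phase_sums(times, period)
    return math.hypot(c, s) / len(times)

def mean_phase(times, period=1.0):
    """Average phase of the events within `period`, in the same units as `times`."""
    if not times:
        raise ValueError("need at least one event time")
    c, s = _phase_sums(times, period)
    return (math.atan2(s, c) / (2 * math.pi) * period) % period
```

For n stalls at unrelated times the concentration is typically on the order of 1/sqrt(n), so a value close to 1 over many stalls would point at some 1 Hz activity. It would show correlation only, not that fxp statistics polling is the cause.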

I have detailed files (head turning slowly, deep slow voice, slight
German accent, half-open eyes ;) ... containing tcpdump traces for
anyone interested. I'll keep drilling to locate the cause.

Any ideas welcome.

Thanks.

RN.
IhM






