Date: Sun, 15 Apr 2001 17:10:02 +0200 (CEST)
From: Remy Nonnenmacher <remy@boostworks.com>
To: freebsd-net@freebsd.org
Subject: fxp+bridge: highly suspect syndrom.
Message-ID: <200104151507.f3FF7iC97018@luxren2.boostworks.com>
I'm looking for ideas about a very strange problem.

Context:

I'm using an interposed machine (TRACER) running bridge between two
interfaces. Two other machines (UA and OS) are connected via the
interposed machine (all machines are 4.2-REL):

    UA ------- TRACER ------- OS

All machines use fxp interfaces. Connections are made with crossover
cables.

The UA machine runs a loader that requests about 7000 HTML pages from
the OS (one connection per page). NMBCLUSTERS is bumped up everywhere to
handle this load.

During a direct test (without TRACER), UA and OS run at about 15% of
their maximum load and the network is solid red (100 Mbit/s from OS to
UA). No glitches, no problems.

The TRACER employs the standard bridging code. It supports up to
4x100 Mbit/s at 25-30% load, moving around 30,000 packets/sec between
the interfaces. There is no load or limit problem with this machine;
this has been tested by pumping data between OS and UA.

Problem:

When the TRACER is interposed _and_ the specific loader test (the 7000
connect/data transfer/disconnect cycles) is run, _and_ more than one
connection is running at a time, then, between 3 and 50 times during
the test, a packet is sent by the OS, received by the TRACER, and not
forwarded until it is repeated by the OS.

And now, hold your belt and grasp your suspenders: the only packet that
suffers from the problem is a SYN+ACK from the OS to the UA. _Never any
other one_!

The sequence is always exactly the same:

    UA                 TRACER                  OS
    SYN--------------->SYN-------------------->SYN
                       SYN+ACK<----------------SYN+ACK
                          .
                          .
                       SYN+ACK<----------------SYN+ACK (repeated)
    SYN+ACK<-----------
    SYN+ACK(rep)<------

It is as if the TRACER<->OS interface received the first SYN+ACK but did
not get an interrupt until the repeated frame came in.

What I have checked:
- The bridge does not use ipfw to check packets (meaning: nothing knows
  or cares about flags).
- Instrumented drivers: no packet loss, no discards.
- Checked for aging of the bridge table: no table aging occurred.
- Instrumented bridge: no DROPS, no UNKNOWN, all packets forwarded
  regularly.

What I have tried (no effect on the problem):
- Changing the PCI IRQ for the interfaces
- Using the same IRQ for the two interfaces
- Swapping fxp0/fxp1 at TRACER level
- Changing cards (using a dual Pro100, fxp again)
- Switching from the regular 4.2-RELEASE fxp driver to Jonathan Lemon's
  one (applied the patchset and rebuilt the kernel).

What works (changes that make the problem disappear):
- Using a dec2114X-based card on TRACER, on the OS side only. (Note that
  exchanging the sides ('de' on the UA side and 'fxp' on the OS side)
  makes the problem arise again (*).)
- Making only one connection at a time (one connect/transfer/disconnect
  phase at a time).

What has a strong influence on the problem (lowers its frequency):
- Hammering the network with a packet flood during the loader test
- Running tcpdump on the TRACER interfaces
- Running two independent loader tests (**)

Notes:

(*) This is why I believe the packet is received by the TRACER machine,
rather than being held on the OS side for one second. (Outgoing packets
caught via tcpdump are not necessarily 'physically sent on the wire'.)

(**) Note that the problem appears when the loader is waiting for a
connection to be established. During this time, it does not terminate
other existing connections. When two independent tests are run, it looks
like the previous 'hammering' condition.

-------------------

From the observation that the problem, in the worst case, repeats quite
exactly at one-second tick intervals (i.e. one second running, one second
pause), may I conclude that something is wrong that relates to (or is a
combination of):
- Interrupts not transmitted, processed or caught by the fxp driver when
  there is no activity other than finishing transmitting packets.
- A race condition happening at one-second intervals with something that
  repeats regularly (retrieving fxp stats?)
- "'Genius' work inside the 8255X component" (highly double-quoted).

I have detailed files (head turning slowly, deep slow voice, slight
German accent, half-open eyes ;) .. containing tcpdump traces for those
interested.

I'll keep drilling to locate the cause. Any ideas welcome.

Thanks.

RN.
IhM

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
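
For anyone who wants to check their own traces for the same pattern, the symptom described above (a SYN+ACK seen twice, with a gap close to one second) can be looked for with something like the following Python sketch. This is only an illustration under assumptions: the `Pkt` record, its field names and the flag strings ("S", "SA", "A") are hypothetical and are not tcpdump's output format, and the sample packets are made up, not taken from the traces mentioned in this message.

```python
from collections import defaultdict
from typing import NamedTuple


class Pkt(NamedTuple):
    """Hypothetical, already-parsed packet record (not a tcpdump format)."""
    ts: float    # capture timestamp, in seconds
    src: str     # sender as "host:port"
    dst: str     # receiver as "host:port"
    flags: str   # "S" = SYN, "SA" = SYN+ACK, "A" = ACK, ...
    seq: int     # TCP sequence number


def repeated_synacks(packets):
    """Return (src, dst, seq, gap_seconds) for each SYN+ACK seen again.

    Two SYN+ACKs count as the same packet when sender, receiver and
    sequence number all match; the gap is the time between consecutive
    sightings.
    """
    seen = defaultdict(list)
    for p in packets:
        if p.flags == "SA":
            seen[(p.src, p.dst, p.seq)].append(p.ts)

    repeats = []
    for (src, dst, seq), stamps in seen.items():
        stamps.sort()
        for first, later in zip(stamps, stamps[1:]):
            repeats.append((src, dst, seq, later - first))
    return repeats


# Made-up sample: one connection whose SYN+ACK is repeated after about
# one second, and one connection that completes normally.
sample = [
    Pkt(0.000, "ua:1025", "os:80", "S", 100),
    Pkt(0.001, "os:80", "ua:1025", "SA", 900),
    Pkt(0.010, "ua:1026", "os:80", "S", 200),
    Pkt(0.011, "os:80", "ua:1026", "SA", 950),
    Pkt(0.012, "ua:1026", "os:80", "A", 201),
    Pkt(1.001, "os:80", "ua:1025", "SA", 900),
    Pkt(1.002, "ua:1025", "os:80", "A", 101),
]

for src, dst, seq, gap in repeated_synacks(sample):
    print(f"{src} -> {dst} seq={seq} repeated after {gap:.3f}s")
```

Running the same check on a capture from each TRACER interface would show on which side the first copy of the SYN+ACK was seen, which is the distinction note (*) is concerned with.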